Microservices Expo: Article

Improving the Efficiency of SOA-Based Applications

Using an Application Grid with large XML documents to build SOA applications that scale linearly and predictably

According to Moore's Law [1], the number of transistors on an integrated circuit doubles about every two years, and processing speed and storage capacity have broadly followed that pace since the invention of the integrated circuit in 1958.

Yet our propensity for building larger, more complex software systems in anticipation of these improvements seems to inevitably outpace the exponential growth in the capacity available to support them. SOA is becoming more broadly adopted, along with the practice of using XML to communicate data between services, and applications are being pushed to Internet scale more rapidly. In the face of your application's success, the potential to overwhelm your systems has become very real, and it may happen when you least expect it.

How do we get ahead of this trend? Given that memory and storage are always increasing in the realm of enterprise computing, software needs to keep up with the pace. We need to architect from the beginning using the proper approach toward achieving linear scalability with predictable latency. Data files and feeds are increasing in size, requiring more processing, and becoming more cumbersome to manage with software designed to materialize entire files before consuming them. In some cases, the operations that are to be performed require multiple input sources to be consumed before processing can begin.

Those who are building the eXtreme Transaction Processing (XTP) style of applications - such as Telco call setup and billing, online gaming, securities trading, risk management, and online travel booking - understand this challenge well. The broader use case that is applicable across more industries is web applications that need to scale up to Internet volumes, against backend systems that were never designed to handle that kind of traffic.

Boundary Costs
In discussions with customers about scaling a SOA with predictable latency, the term that often comes up is "Boundary Costs." To put this in context, consider the following scenario - an XML document that may have originated from an internal application, a database, or an external business partner, or that was perhaps converted from an EDI document, needs to be processed by a number of services, which are coordinated by a BPEL process or an ESB process pipeline. The common approach is to place the XML document on the bus and have the bus invoke the services in accordance with the process definition, passing the XML document as part of the service request payload. Each service that needs to process that data will access the XML accordingly. Interaction with a database may also occur. This approach, as illustrated in Figure 1, sounds simple enough.

Figure 1: Calling services using BPEL process or Service Bus pipeline

However, in practice there are challenges to scalability when using this approach. What is the cost of crossing the boundary from one service to the next? How many times is that cost incurred in the context of invoking a simple business process? What if the XML document is really large, in the multi-megabyte range, or there are thousands of them, or both?

Compounding this challenge is the reality that most IT environments are a mixture of platforms and technologies. Regardless of how efficient your process engine or service bus might be, the processing at the service endpoint might still become a bottleneck. A recent conversation at a customer site revealed a 15-step business process that normally takes 15 seconds to run but, of late, has been violating its 30-second SLA under peak loads. The developers had spent the better part of the past two years optimizing and tuning every last bit of performance out of each of those 15 services, and the remaining culprit identified for the poor end-to-end latency was the boundary cost between the services. A detailed examination revealed that each of the 15 service calls was spending 1-2 seconds in an open source web service toolkit parsing and marshaling the XML payload. This is not intended to disparage open source web services toolkits; it simply illustrates that parsing and marshaling XML at the endpoints can introduce latency that adds up quickly.
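
To make the arithmetic in this scenario concrete, the following sketch multiplies the per-call parsing and marshaling overhead by the number of service calls. The class and method names are hypothetical and exist only for illustration; the figures (15 calls, 1-2 seconds per call, a 30-second SLA) are the ones quoted above.

```java
// Hypothetical helper for illustration only; the figures come from the scenario above.
final class BoundaryCost {
    /** Total seconds spent crossing service boundaries in one run of the process. */
    static double totalSeconds(int serviceCalls, double secondsPerCall) {
        return serviceCalls * secondsPerCall;
    }

    public static void main(String[] args) {
        int calls = 15;
        double slaSeconds = 30.0;
        double best = totalSeconds(calls, 1.0);   // 15 calls at 1 second each
        double worst = totalSeconds(calls, 2.0);  // 15 calls at 2 seconds each
        System.out.println("Boundary cost, best case:  " + best + " s");
        System.out.println("Boundary cost, worst case: " + worst + " s");
        System.out.println("Worst case reaches the SLA: " + (worst >= slaSeconds));
    }
}
```

At 1 second per call the boundary cost alone equals the 15-second normal run time, and at 2 seconds per call it consumes the entire 30-second SLA before any business logic runs.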

As illustrated in Figure 2, each service invoked needs to read the XML payload from its on-the-wire serialization form, and parse the XML into a native Java or .NET object form to be processed by the business logic. In addition, if database interaction is required, an object-to-relational mapping also needs to occur. Finally, the inverse of those steps needs to occur in order to generate a response to the service request and send it along to the next downstream service in accordance with the business process that is coordinating the interaction between the services.

Figure 2: Service request boundary cost between XML to Object to Relational and back again for each invocation

A popular approach for dealing with XML in a SOA is to use web services and XMLBeans. Using XMLBeans, objects are typically created by fully materializing the inputs and outputs, as this allows for maximum usability and processing. In-memory processing may include sorting, filtering, or aggregation operations, all of which increase the overall memory required to deal with each call. This strategy does not scale, because the memory needed grows with the size and number of documents, and it cannot be applied to many of the use cases in this area. Many products support streaming of XML, but streaming alone may limit the ability to do anything meaningful without putting the data somewhere else first.

What if there were a way to take this information and store it in an application grid, a place where the size of the data and the processing capability can far eclipse that of any single machine or process? The application grid can utilize the combined memory and processing power of multiple machines in order to complete an operation, such as the application of a complex formula or filter across an enormous data set. The application grid also provides the ability to hold the data for longer periods of time beyond the cycle of a single service request, survive server restarts, and even work across network boundaries.

If we could combine the power of the grid for data storage and manipulation with the efficiency of streaming, the result would be a highly scalable system capable of processing much more information than before. Using a combination of complementary technologies here, we achieve our goal of spreading compute operations across a distributed network of machines, and we lessen the processing and memory requirements of our data consumers - SOA services, application servers, and client applications. We also remove the need to use a database for intermediate storage of data while it is (or simply so it can be) processed. By using an application grid we can also implement patterns where we pass around references to data, rather than the data, resulting in huge efficiency gains in the communications layer, and dramatically reducing or eliminating the boundary cost.
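
The pattern of passing a reference instead of the data can be sketched in a few lines. This is a minimal single-process illustration, not a grid product's API: a `ConcurrentHashMap` stands in for the application grid, and the names `ReferencePassing`, `store`, `countOrders`, and `wireBytes` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class ReferencePassing {
    // Stand-in for the application grid; a real grid would spread this map across machines.
    static final Map<String, String> GRID = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_ID = new AtomicLong();

    /** Stores the document once and returns the key that services pass to one another. */
    static String store(String xml) {
        String key = "doc-" + NEXT_ID.incrementAndGet();
        GRID.put(key, xml);
        return key;
    }

    /** A service that receives only the key and reads the data from the grid when needed. */
    static int countOrders(String key) {
        String xml = GRID.get(key);
        int count = 0;
        for (int i = xml.indexOf("<order>"); i >= 0; i = xml.indexOf("<order>", i + 1)) {
            count++;
        }
        return count;
    }

    /** Size of a message as it would travel between services, in UTF-8 bytes. */
    static int wireBytes(String message) {
        return message.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        StringBuilder xml = new StringBuilder("<orders>");
        for (int i = 0; i < 1000; i++) {
            xml.append("<order>").append(i).append("</order>");
        }
        xml.append("</orders>");

        String key = store(xml.toString());
        System.out.println("payload bytes per hop:   " + wireBytes(xml.toString()));
        System.out.println("reference bytes per hop: " + wireBytes(key));
        System.out.println("orders seen by service:  " + countOrders(key));
    }
}
```

Each hop between services now carries a key of a few bytes rather than the full document, and no service has to parse or re-serialize the payload just to hand it on.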

This article includes a code example that covers the use case of processing large XML files in an application grid. In a typical XML file, there are usually elements that repeat without any predetermined limit. Using a StAX parser to handle streaming XML, and JAXB to handle conversion between XML and Java objects, we can extract these repeating elements from the XML stream and put them on the application grid as individual objects. The implementation can populate the grid with these objects, and do so with a limited amount of memory consumption. Once populated, the grid can process the data across the multiple machines that constitute the grid. Each grid member processes an operation or a filter and passes intermediate results to the grid client, which then assembles them into a final result set.
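
As a rough illustration of the streaming step (not the authors' implementation), the sketch below uses the StAX `XMLStreamReader` from the JDK to pull repeating elements out of a stream one at a time and put each on a map. Several things here are assumptions: the `order` element and its `id` and `amount` attributes are invented, a plain `Map` stands in for the grid, and the element-to-object mapping is written by hand where JAXB would be used, so that the example has no dependencies outside the JDK.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingLoader {
    /** Hypothetical repeating element; a JAXB-bound class would take its place. */
    static final class Order {
        final String id;
        final double amount;
        Order(String id, double amount) { this.id = id; this.amount = amount; }
    }

    /** Streams the document and stores one object per <order> element; returns the count. */
    static int load(Reader xml, Map<String, Order> grid) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            int loaded = 0;
            while (reader.hasNext()) {
                // Only the current event is held in memory, never the whole document.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "order".equals(reader.getLocalName())) {
                    String id = reader.getAttributeValue(null, "id");
                    double amount = Double.parseDouble(reader.getAttributeValue(null, "amount"));
                    grid.put(id, new Order(id, amount));
                    loaded++;
                }
            }
            reader.close();
            return loaded;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML stream", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders><order id=\"a\" amount=\"10.5\"/><order id=\"b\" amount=\"2\"/></orders>";
        Map<String, Order> grid = new HashMap<>();  // stand-in for the application grid
        System.out.println("objects loaded: " + load(new StringReader(xml), grid));
    }
}
```

Because each element is stored and then released, the loader's memory use depends on the size of one element rather than the size of the file.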

What Is an Application Grid?
An application grid is a horizontally scalable, agent-based, in-memory storage engine for application state data. It effectively provides a distributed shared memory pool that can be linearly scaled across a heterogeneous grid of machines consisting of any combination of high-end and lower-cost commodity hardware. Using an application grid in an application simultaneously provides performance, scalability, and reliability for in-memory data.

One way that an application utilizes an application grid is to use API-level interfaces that mimic the Java HashMap, .NET Dictionary, or JPA interfaces. An alternate approach is to use a service-level interface from a SOA environment. As applications or services place data into the application grid, a group of constantly cooperating caching servers coordinate updates to data objects, as well as their backups, using cluster-wide concurrency control.

As shown in Figure 3, the request to put data to the map is taken over by the application grid and transported across a highly efficient networking protocol to the grid node P, which owns the primary instance data. The primary node in turn copies the updated value to the secondary node B for backup, and then returns control to the service.

Figure 3: Application grid clustering ensures primary / backup of in-memory data on separate machines.

The application grid stores data across multiple machines as it sees fit, with complete location transparency. A unique hash key value is all that is necessary to retrieve the stored data at a future point, regardless of where the application grid chose to store the data. This frees the application logic from dealing with complex location dependencies and manual partitioning schemes. If one or more nodes in the grid fail, or can't be reached due to network failure, the application grid will immediately react to the failure and rebalance the data across the remaining healthy nodes. This can happen even if the failing node had been participating in an autonomous update operation. In Figure 4, the primary owner 'P' of a piece of data fails while in the midst of retrieving data for the service. The get() request is immediately routed to the backup node and a new primary / backup pair is allotted.
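
The primary / backup behavior described above can be imitated with a toy, single-process simulation. This is only a sketch under strong assumptions: there is no networking or concurrency control, the `ToyGrid` class and all of its methods are hypothetical, and rebalancing is done naively by re-putting every surviving entry, which a real grid would not do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ToyGrid {
    /** One grid member holding primary and backup copies in memory. */
    static final class Node {
        final Map<String, String> primary = new HashMap<>();
        final Map<String, String> backup = new HashMap<>();
        boolean alive = true;
    }

    private final List<Node> nodes = new ArrayList<>();

    ToyGrid(int size) {
        for (int i = 0; i < size; i++) nodes.add(new Node());
    }

    /** First healthy node at or after the given index, wrapping around; -1 if none. */
    private int firstAlive(int start) {
        for (int step = 0; step < nodes.size(); step++) {
            int i = (start + step) % nodes.size();
            if (nodes.get(i).alive) return i;
        }
        return -1;
    }

    /** The key's hash alone decides which healthy node owns the primary copy. */
    int primaryOf(String key) {
        return firstAlive(Math.floorMod(key.hashCode(), nodes.size()));
    }

    /** The next healthy node after the primary holds the backup; -1 if there is none. */
    int backupOf(String key) {
        int p = primaryOf(key);
        int b = firstAlive(p + 1);
        return b == p ? -1 : b;
    }

    void put(String key, String value) {
        int p = primaryOf(key);
        if (p < 0) throw new IllegalStateException("no healthy nodes");
        nodes.get(p).primary.put(key, value);
        int b = backupOf(key);
        if (b >= 0) nodes.get(b).backup.put(key, value);
    }

    String get(String key) {
        int p = primaryOf(key);
        return p < 0 ? null : nodes.get(p).primary.get(key);
    }

    /** Simulates a node failure: its memory is lost and survivors are rebalanced. */
    void fail(int index) {
        Node dead = nodes.get(index);
        dead.alive = false;
        dead.primary.clear();
        dead.backup.clear();
        Map<String, String> survivors = new HashMap<>();
        for (Node n : nodes) {
            survivors.putAll(n.backup);
            survivors.putAll(n.primary);
            n.primary.clear();
            n.backup.clear();
        }
        for (Map.Entry<String, String> e : survivors.entrySet()) {
            put(e.getKey(), e.getValue());  // allots a new primary / backup pair
        }
    }
}
```

A caller would create `new ToyGrid(3)`, `put` some entries, call `fail(primaryOf(key))`, and still read the same value back with `get(key)`, because every entry had a second copy on a different node before the failure.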

Figure 4: Application grid provides continual failover of in-memory state data

The data stored in the grid can be anything from simple variables to complex objects or even large XML documents. In our case we chose to fragment what would have been very large XML documents into smaller parts and store those XML fragments as Java objects in the application grid. This allows us to run parallel queries against the data using the Java APIs.

The application grid supports a range of operations including parallel processing of queries, events, and transactions. For large datasets, an entire collection of data may be put to the grid as a single operation, and the grid can disperse the contents of the collection across multiple primary and backup nodes in order to scale. In more advanced applications, the grid may even execute business logic directly and in parallel on data storage nodes, and do so with data and logic affinity such that the logic executes on the same machine that is storing the data that the logic is operating on.
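
The parallel query pattern, in which each member filters its own data and the client assembles the partial results, can be approximated in one JVM with a thread pool. In this sketch each inner list plays the role of the data held by one grid member and each pool thread plays the role of that member's processing; `ScatterGather` and `query` are hypothetical names, and a real grid would ship the filter to remote machines instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class ScatterGather {
    /** Runs the filter on every partition in parallel and merges the partial results. */
    static <T> List<T> query(List<List<T>> partitions, Predicate<T> filter) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            // Scatter: one task per partition, so the logic runs next to the data it reads.
            List<Future<List<T>>> partials = new ArrayList<>();
            for (List<T> partition : partitions) {
                Callable<List<T>> task = () -> {
                    List<T> hits = new ArrayList<>();
                    for (T item : partition) {
                        if (filter.test(item)) hits.add(item);
                    }
                    return hits;
                };
                partials.add(pool.submit(task));
            }
            // Gather: the client assembles intermediate results in partition order.
            List<T> result = new ArrayList<>();
            for (Future<List<T>> partial : partials) {
                result.addAll(partial.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the matching items travel back to the caller, which is the property that lets a query over a very large data set return a small result without moving the data itself.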

More Stories By Dave Chappell

David Chappell is vice president and chief technologist for SOA at Oracle Corporation, and is driving the vision for Oracle’s SOA on App Grid initiative.

More Stories By Andrew Gregory

Andrew Gregory is currently a Sales Consultant at Oracle Corporation. He has worked in Development, Product Support, Infrastructure, and Sales over 13 years in the industry.

Most Recent Comments
jhv1blz5 07/03/09 10:31:00 AM EDT

The article validated SOA as an IT architecture paradigm that can be leveraged in many ways. Taking data storage, scalability and application performance to a nifty level using SOA Application Grid infrastructure will no doubt enhance data and application performance on Oracle architecture platforms, it also has the promise of a cost effective and efficient IT delivery model. The very benefits of SOA.
