Microservices Expo: Article

Improving the Efficiency of SOA-Based Applications

Using an Application Grid with large XML documents to build SOA applications that scale linearly and predictably

According to Moore's Law [1], the number of transistors on an integrated circuit, and with it processing power and storage capacity, has been doubling about every two years since the invention of the integrated circuit in 1958.

Yet our propensity for building larger, more complex software systems in anticipation of these improvements seems to inevitably outpace the exponential growth in the capacity available to support them. SOA is becoming more broadly adopted, along with the practice of using XML to communicate data between services and the push to scale applications to Internet volumes. The more successful your application becomes, the more real the potential to overwhelm your systems, and that may happen when you least expect it.

How do we get ahead of this trend? Memory and storage keep increasing in enterprise computing, and software needs to keep pace. We need to architect from the beginning for linear scalability with predictable latency. Data files and feeds are increasing in size, requiring more processing, and becoming more cumbersome to manage with software designed to materialize entire files before consuming them. In some cases, the operations to be performed require multiple input sources to be consumed before processing can begin.

Those who are building the eXtreme Transaction Processing (XTP) style of applications - such as Telco call setup and billing, online gaming, securities trading, risk management, and online travel booking - understand this challenge well. The broader use case that is applicable across more industries is web applications that need to scale up to Internet volumes, against backend systems that were never designed to handle that kind of traffic.

Boundary Costs
In discussions with customers about scaling a SOA with predictable latency, the term that often comes up is "Boundary Costs." To put this in context, consider the following scenario - an XML document that may have originated from an internal application, database, an external business partner, or perhaps converted from an EDI document, needs to be processed by a number of services, which are coordinated by a BPEL process or an ESB process pipeline. The common approach is to place the XML document on the bus and have the bus invoke the services in accordance with the process definition, passing the XML document as part of the service request payload. Each service that needs to process that data will access the XML accordingly. Interaction with a database may also occur. This approach, as illustrated in Figure 1, sounds simple enough.

Figure 1: Calling services using BPEL process or Service Bus pipeline

However, in practice this approach poses challenges to scalability. What is the cost of crossing the boundary from one service to the next? How many times is that cost incurred while invoking a simple business process? What if the XML document is very large, in the multi-megabyte range, or there are thousands of them, or both?

Compounding this challenge is the reality that most IT environments are a mixture of platforms and technologies. Regardless of how efficient your process engine or service bus might be, the processing at the service endpoint might still become a bottleneck. A recent conversation at a customer site revealed a 15-step business process that normally takes 15 seconds to run but has lately been violating its 30-second SLA under peak loads. The developers had spent the better part of the past two years tuning every last bit of performance out of each of those 15 services, and the remaining culprit identified for the poor end-to-end latency was the boundary cost between the services. A detailed examination revealed that each of the 15 service calls was spending 1-2 seconds in an open source web service toolkit parsing and marshaling the XML payload. This is not intended to disparage open source web services toolkits; it simply illustrates that parsing and marshaling XML at the endpoints introduces latency that adds up quickly.
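To make the arithmetic in that anecdote concrete, the following minimal sketch multiplies the number of service calls by the per-call parsing and marshaling time. The class and method names are hypothetical and exist only for illustration; the figures (15 calls, 1-2 seconds each) are the ones quoted above.

```java
// Hypothetical helper, for illustration only: estimates cumulative boundary cost.
final class BoundaryCostEstimate {

    /** Total seconds spent parsing and marshaling XML across all service calls. */
    static double totalSeconds(int serviceCalls, double secondsPerCall) {
        return serviceCalls * secondsPerCall;
    }

    public static void main(String[] args) {
        int calls = 15;                          // steps in the business process
        double low = totalSeconds(calls, 1.0);   // 1 second per call
        double high = totalSeconds(calls, 2.0);  // 2 seconds per call
        // prints "Boundary cost: 15 to 30 seconds"
        System.out.printf("Boundary cost: %.0f to %.0f seconds%n", low, high);
    }
}
```

At the upper end of that range, the boundary cost alone is as large as the 30-second SLA, before any business logic runs.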

As illustrated in Figure 2, each service invoked needs to read the XML payload from its on-the-wire serialized form and parse the XML into a native Java or .NET object form to be processed by the business logic. In addition, if database interaction is required, an object-to-relational mapping also needs to occur. Finally, the inverse of those steps needs to occur in order to generate a response to the service request and send it along to the next downstream service in accordance with the business process that is coordinating the interaction between the services.

Figure 2: Service request boundary cost between XML to Object to Relational and back again for each invocation

A popular approach for dealing with XML in a SOA is to use web services and XMLBeans. Using XMLBeans, objects are typically created by fully materializing the inputs and outputs, as this allows for maximum usability and processing. In-memory processing may include sorting, filtering, or aggregation operations, all of which increase the overall memory required to deal with each call. This strategy is not scalable and cannot be applied to many of the use cases in this area. Many products support streaming of XML, but this may limit the ability to do anything meaningful without putting the data somewhere else first.

What if there was a way to take this information and store it in an application grid, a place where the size of the data and the processing capability can far eclipse that of any single machine or process? The application grid can utilize the combined memory and processing power of multiple machines in order to complete an operation, such as the application of a complex formula or filter across an enormous data set. The application grid also provides the ability to hold the data for longer periods of time beyond the cycle of a single service request, survive server restarts, and even work across network boundaries.

If we could combine the power of the grid for data storage and manipulation with the efficiency of streaming, the result would be a highly scalable system capable of processing much more information than before. Using a combination of complementary technologies here, we achieve our goal of spreading compute operations across a distributed network of machines, and we lessen the processing and memory requirements of our data consumers - SOA services, application servers, and client applications. We also remove the need to use a database for intermediate storage of data while it is (or simply so it can be) processed. By using an application grid we can also implement patterns where we pass around references to data, rather than the data, resulting in huge efficiency gains in the communications layer, and dramatically reducing or eliminating the boundary cost.
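The idea of passing references rather than data can be sketched as follows. This is only an illustration under stated assumptions: a single local `ConcurrentHashMap` stands in for the application grid, and the names `ReferencePassing`, `store`, and `payloadLength` are hypothetical. A real grid would be distributed across machines.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: one local map stands in for a distributed application grid.
final class ReferencePassing {
    static final Map<String, String> GRID = new ConcurrentHashMap<>();

    /** Stores the payload once and returns the key that services pass to each other. */
    static String store(String xmlPayload) {
        String key = UUID.randomUUID().toString();
        GRID.put(key, xmlPayload);
        return key;
    }

    /** Hypothetical service step: receives only the key and reads the data when needed. */
    static int payloadLength(String key) {
        return GRID.get(key).length();
    }

    public static void main(String[] args) {
        String key = store("<orders><order id=\"1\"/></orders>");
        // Only the key crosses each service boundary; the document stays in the grid.
        System.out.println("service A sees " + payloadLength(key) + " characters");
        System.out.println("service B sees " + payloadLength(key) + " characters");
    }
}
```

Because each service receives a fixed-size key, the size of the message on the bus no longer grows with the size of the document.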

This article includes a code example that covers the use case of processing large XML files in an application grid. In a typical XML file, there are usually elements that repeat without any pre-determined limit. Using a StAX parser to handle streaming XML, and JAXB to handle conversion between XML and Java objects, we can extract these repeating elements from the XML stream and put them on the application grid as individual objects. The implementation can populate the grid with these objects, and do so with a limited amount of memory consumption. Once populated, the grid can process the data across the multiple machines that constitute the grid. Each grid member processes an operation or a filter and passes intermediate results to the grid client, which then assembles them into a final result set.
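The streaming step described here might look roughly like the sketch below. It is not the authors' code, and it makes several assumptions: the repeating element is a hypothetical `<order id="..." amount="..."/>`, a plain `Map` stands in for the grid, and the attributes are mapped to a Java object by hand rather than with JAXB so that the sketch depends only on the JDK's StAX API.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StreamingLoader {

    /** Hypothetical repeating element: <order id="..." amount="..."/>. */
    static final class Order {
        final String id;
        final double amount;
        Order(String id, double amount) { this.id = id; this.amount = amount; }
    }

    /**
     * Reads the XML one event at a time and puts each <order> into the grid stand-in.
     * Only one element is materialized at a time, so memory use stays bounded.
     */
    static int load(Reader xml, Map<String, Order> grid) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "order".equals(reader.getLocalName())) {
                    String id = reader.getAttributeValue(null, "id");
                    double amount = Double.parseDouble(reader.getAttributeValue(null, "amount"));
                    grid.put(id, new Order(id, amount)); // one small object per repeating element
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML stream", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Order> grid = new ConcurrentHashMap<>(); // stand-in for the grid
        String xml = "<orders><order id=\"a\" amount=\"1.5\"/><order id=\"b\" amount=\"2\"/></orders>";
        System.out.println("loaded " + load(new StringReader(xml), grid) + " orders");
    }
}
```

With JAXB on the classpath, the hand-written attribute mapping could be replaced by unmarshalling each repeating element from the same stream reader.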

What Is an Application Grid?
An application grid is a horizontally scalable, agent-based, in-memory storage engine for application state data. It effectively provides a distributed shared memory pool that can be linearly scaled across a heterogeneous grid of machines consisting of any combination of high-end and lower-cost commodity hardware. Using an application grid in an application simultaneously provides performance, scalability, and reliability for in-memory data.

One way that an application utilizes an application grid is to use API-level interfaces that mimic the Java HashMap, .NET Dictionary, or JPA interfaces. An alternate approach is to use a service-level interface from a SOA environment. As applications or services place data into the application grid, a group of constantly cooperating caching servers coordinates updates to data objects, as well as their backups, using cluster-wide concurrency control.

As shown in Figure 3, the request to put data to the map is taken over by the application grid and transported across a highly efficient networking protocol to the grid node P, which owns the primary instance data. The primary node in turn copies the updated value to the secondary node B for backup, and then returns control to the service.

Figure 3: Application grid clustering ensures primary / backup of in-memory data on separate machines.

The application grid stores data across multiple machines as it sees fit, with complete location transparency. A unique hash key value is all that is necessary to retrieve the stored data at a future point, regardless of where the application grid chose to store the data. This frees the application logic from dealing with complex location dependencies and manual partitioning schemes. If one or more nodes in the grid fail, or can't be reached due to network failure, the application grid will immediately react to the failure and rebalance the data across the remaining healthy nodes. This can happen even if the failing node had been participating in an autonomous update operation. In Figure 4, the primary owner 'P' of a piece of data fails while in the midst of retrieving data for the service. The get() request is immediately routed to the backup node and a new primary / backup pair is allotted.

Figure 4: Application grid provides continual failover of in-memory state data
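The put and failover behavior shown in Figures 3 and 4 can be approximated in a few lines. This is a toy, single-process sketch, not how any particular grid product is implemented: the class `MiniGrid`, its ring-style choice of backup node, and the `failPrimaryOf` method are all assumptions made for illustration. It needs at least two nodes, and it leaves out the rebalancing step in which a new primary / backup pair is allotted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: every "node" is an in-process map rather than a separate machine.
final class MiniGrid {

    /** One grid member holding its share of the entries in memory. */
    static final class Node {
        final Map<String, String> data = new HashMap<>();
        boolean alive = true;
    }

    private final List<Node> nodes = new ArrayList<>();

    MiniGrid(int size) {
        for (int i = 0; i < size; i++) nodes.add(new Node());
    }

    // The key's hash picks the primary; the next node in the ring holds the backup.
    private int primaryIndex(String key) { return Math.floorMod(key.hashCode(), nodes.size()); }
    private int backupIndex(String key) { return (primaryIndex(key) + 1) % nodes.size(); }

    void put(String key, String value) {
        nodes.get(primaryIndex(key)).data.put(key, value); // write to primary P
        nodes.get(backupIndex(key)).data.put(key, value);  // copy to backup B before returning
    }

    String get(String key) {
        Node primary = nodes.get(primaryIndex(key));
        if (primary.alive) return primary.data.get(key);
        return nodes.get(backupIndex(key)).data.get(key);  // primary is down: route to backup
    }

    /** Simulates the failure of whichever node is primary for the given key. */
    void failPrimaryOf(String key) { nodes.get(primaryIndex(key)).alive = false; }

    public static void main(String[] args) {
        MiniGrid grid = new MiniGrid(3);
        grid.put("order-42", "<order id=\"42\"/>");
        grid.failPrimaryOf("order-42");
        System.out.println(grid.get("order-42")); // still readable, served by the backup
    }
}
```

The caller only ever supplies the key; which node answers is decided inside the grid, which is the location transparency described above.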

The data stored in the grid can be anything from simple variables to complex objects or even large XML documents. In our case we chose to fragment what would have been very large XML documents into smaller parts and store those XML fragments as Java objects in the application grid. This allows us to run parallel queries against the data using the Java APIs.

The application grid supports a range of operations including parallel processing of queries, events, and transactions. For large datasets, an entire collection of data may be put to the grid as a single operation, and the grid can disperse the contents of the collection across multiple primary and backup nodes in order to scale. In more advanced applications, the grid may even execute business logic directly and in parallel on data storage nodes, and do so with data and logic affinity such that the logic executes on the same machine that is storing the data that the logic is operating on.
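The pattern of running logic next to the data and combining partial results at the client can be sketched with a thread pool standing in for the grid members. Everything here is an assumption for illustration: each inner list plays the role of one storage node's partition, and `sumAbove` is a hypothetical filter-and-aggregate operation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelAggregation {

    /** Filters and sums every partition in parallel, then combines the partial results. */
    static double sumAbove(List<List<Double>> partitions, double threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Double>> partials = new ArrayList<>();
            for (List<Double> partition : partitions) {
                // Each task only touches its own partition, mimicking data and logic affinity.
                Callable<Double> task = () -> partition.stream()
                        .filter(v -> v > threshold)
                        .mapToDouble(Double::doubleValue)
                        .sum();
                partials.add(pool.submit(task));
            }
            double total = 0;
            for (Future<Double> partial : partials) {
                total += partial.get(); // the client assembles the intermediate results
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while aggregating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A partition task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Double>> partitions = Arrays.asList(
                Arrays.asList(1.0, 5.0), Arrays.asList(10.0), Arrays.asList(2.0, 20.0));
        System.out.println(sumAbove(partitions, 4.0)); // prints 35.0
    }
}
```

Only one number per partition travels back to the caller, rather than the partition's contents, which is what keeps the communication cost low as the data set grows.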

More Stories By Dave Chappell

David Chappell is vice president and chief technologist for SOA at Oracle Corporation, and is driving the vision for Oracle’s SOA on App Grid initiative.

More Stories By Andrew Gregory

Andrew Gregory is currently a Sales Consultant at Oracle Corporation. He has worked in Development, Product Support, Infrastructure, and Sales over 13 years in the industry.
