SOA & WOA: Article

Improving the Efficiency of SOA-Based Applications

Using an Application Grid with large XML documents to build SOA applications that scale linearly and predictably

Moore's Law [1] observes that the number of transistors on an integrated circuit doubles about every two years, and processing speed and storage capacity have broadly followed that curve since the integrated circuit was invented in 1958.

Yet our propensity for building larger, more complex software systems in anticipation of these improvements seems to inevitably outpace the exponential growth in the capacity available to support them. SOA is being adopted more broadly, along with the practice of using XML to communicate data between services, and applications are being pushed to Internet scale more rapidly. When your application succeeds, the potential to overwhelm your systems becomes very real, and it may happen when you least expect it.

How do we get ahead of this trend? Given that memory and storage are always increasing in the realm of enterprise computing, software needs to keep pace. We need to architect for linear scalability with predictable latency from the beginning. Data files and feeds are increasing in size, requiring more processing, and becoming more cumbersome to manage with software designed to materialize entire files before consuming them. In some cases, the operations to be performed require multiple input sources to be consumed before processing can begin.

Those who are building the eXtreme Transaction Processing (XTP) style of applications - such as Telco call setup and billing, online gaming, securities trading, risk management, and online travel booking - understand this challenge well. The broader use case that is applicable across more industries is web applications that need to scale up to Internet volumes, against backend systems that were never designed to handle that kind of traffic.

Boundary Costs
In discussions with customers about scaling a SOA with predictable latency, the term that often comes up is "Boundary Costs." To put this in context, consider the following scenario - an XML document, which may have originated from an internal application, a database, or an external business partner, or may have been converted from an EDI document, needs to be processed by a number of services that are coordinated by a BPEL process or an ESB process pipeline. The common approach is to place the XML document on the bus and have the bus invoke the services in accordance with the process definition, passing the XML document as part of the service request payload. Each service that needs to process that data accesses the XML accordingly. Interaction with a database may also occur. This approach, as illustrated in Figure 1, sounds simple enough.

Figure 1: Calling services using BPEL process or Service Bus pipeline

However, in practice there are challenges to scalability when using this approach. What is the cost of crossing the boundary from one service to the next? How many times is that cost incurred when invoking a simple business process? What if the XML document is really large, in the multi-megabyte range, or there are thousands of them, or both?

Compounding this challenge is the reality that most IT environments are a mixture of platforms and technologies. Regardless of how efficient your process engine or service bus might be, the processing at the service endpoint might still become a bottleneck. A recent conversation at a customer site revealed a 15-step business process that normally takes 15 seconds to run but has lately been violating its 30-second SLA under peak loads. The developers had spent the better part of the past two years tuning every last bit of performance out of each of those 15 services, and the remaining culprit identified for the poor end-to-end latency was the boundary cost between the services. A detailed examination revealed that each of the 15 service calls was spending 1-2 seconds in an open source web service toolkit parsing and marshaling the XML payload. This is not intended to disparage open source web services toolkits; it simply illustrates that parsing and marshaling XML at the endpoints introduces latency that adds up quickly.
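The arithmetic behind this anecdote can be sketched in a few lines. The figures (15 service calls, 1-2 seconds of parsing and marshaling per call) come from the scenario above; the class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical helper: estimates the latency spent purely on XML parsing
// and marshaling at service boundaries, ignoring all business logic.
final class BoundaryCost {

    // Total seconds spent at the boundaries for a process that makes the
    // given number of service calls, each paying perCallSeconds.
    static double totalSeconds(int serviceCalls, double perCallSeconds) {
        return serviceCalls * perCallSeconds;
    }

    public static void main(String[] args) {
        int calls = 15;                          // steps in the business process
        double low = totalSeconds(calls, 1.0);   // lower bound from the anecdote
        double high = totalSeconds(calls, 2.0);  // upper bound from the anecdote
        System.out.printf("Boundary cost: %.0f to %.0f seconds%n", low, high);
    }
}
```

At 1-2 seconds per call, the 15 calls spend 15 to 30 seconds at the boundaries alone, which is enough to reach the 30-second SLA before any business logic runs.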

As illustrated in Figure 2, each service invoked needs to read the XML payload from its on-the-wire serialization form and parse the XML into a native Java or .NET object form to be processed by the business logic. In addition, if database interaction is required, an additional object-to-relational mapping needs to occur. Finally, the inverse of those steps needs to occur in order to generate a response to the service request and send it along to the next downstream service, in accordance with the business process that is coordinating the interaction between the services.

Figure 2: Service request boundary cost between XML to Object to Relational and back again for each invocation

A popular approach for dealing with XML in a SOA is to use web services and XMLBeans. Using XMLBeans, objects are typically created by fully materializing the inputs and outputs, as this allows for maximum usability and processing. In-memory processing may include sorting, filtering, or aggregation operations, all of which increase the overall memory required to deal with each call. This strategy is not scalable and cannot be applied to many of the use cases in this area. Many products support streaming of XML, but this may limit the ability to do anything meaningful without putting the data somewhere else first.

What if there were a way to take this information and store it in an application grid, a place where the size of the data and the processing capability can far eclipse that of any single machine or process? The application grid can utilize the combined memory and processing power of multiple machines in order to complete an operation, such as the application of a complex formula or filter across an enormous data set. The application grid also provides the ability to hold the data for longer periods of time beyond the cycle of a single service request, survive server restarts, and even work across network boundaries.

If we could combine the power of the grid for data storage and manipulation with the efficiency of streaming, the result would be a highly scalable system capable of processing much more information than before. Using a combination of complementary technologies here, we achieve our goal of spreading compute operations across a distributed network of machines, and we lessen the processing and memory requirements of our data consumers - SOA services, application servers, and client applications. We also remove the need to use a database for intermediate storage of data while it is (or simply so it can be) processed. By using an application grid we can also implement patterns where we pass around references to data, rather than the data, resulting in huge efficiency gains in the communications layer, and dramatically reducing or eliminating the boundary cost.
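The reference-passing pattern mentioned above can be illustrated with a rough sketch. Everything here is an assumption made for illustration: a `ConcurrentHashMap` in a single JVM stands in for the application grid, and the names `PassByReference`, `store`, `fetch`, and `wireBytes` are hypothetical rather than part of any product API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: services exchange a short key instead of the XML itself.
final class PassByReference {

    // Stand-in for the application grid (assumption: a real grid spans machines).
    static final Map<String, String> GRID = new ConcurrentHashMap<>();

    // Stores the payload once and returns the key that services pass around.
    static String store(String xmlPayload) {
        String key = UUID.randomUUID().toString();
        GRID.put(key, xmlPayload);
        return key;
    }

    // A service that needs the data looks it up by key.
    static String fetch(String key) {
        return GRID.get(key);
    }

    // Bytes carried in service requests when the same message crosses every hop.
    static long wireBytes(String message, int hops) {
        return (long) message.getBytes(StandardCharsets.UTF_8).length * hops;
    }

    public static void main(String[] args) {
        String payload = "<order>" + "x".repeat(1_000_000) + "</order>";
        String key = store(payload);
        int hops = 15; // one hop per service call in the business process
        System.out.println("By value:     " + wireBytes(payload, hops) + " bytes");
        System.out.println("By reference: " + wireBytes(key, hops) + " bytes");
    }
}
```

The comparison counts only the bytes carried in the service requests. A service that needs the data still reads it from the grid, ideally only the parts it actually uses.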

This article includes a code example that covers the use case of processing large XML files in an application grid. In a typical XML file, there are usually elements that repeat without any predetermined limit. Using a StAX parser to handle streaming XML, and JAXB to handle conversion between XML and Java objects, we can extract these repeating elements from the XML stream and put them on the application grid as individual objects. The implementation can populate the grid with these objects, and do so with a limited amount of memory consumption. Once populated, the grid can process the data across the multiple machines that constitute the grid. Each grid member processes an operation or a filter and passes intermediate results to the grid client, which then assembles them into a final result set.
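A minimal sketch of the streaming step might look like the following. It is not the article's own example: to stay self-contained it maps each repeating element to an object by hand instead of using JAXB, uses a plain `Map` as a stand-in for the grid, and assumes a hypothetical document shape of `<order id="...">amount</order>` elements. The names `StreamingGridLoader` and `Order` are likewise hypothetical.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: stream repeating elements into a grid-like map
// without ever materializing the whole document.
final class StreamingGridLoader {

    // Hypothetical domain object for one repeating <order> element.
    static final class Order {
        final String id;
        final double amount;
        Order(String id, double amount) { this.id = id; this.amount = amount; }
    }

    // Reads the XML one event at a time, puts each <order> into the grid,
    // and returns the number of elements loaded.
    static int load(Reader xml, Map<String, Order> grid) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            int count = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "order".equals(reader.getLocalName())) {
                        String id = reader.getAttributeValue(null, "id");
                        // Hand-written mapping; JAXB would take over this step.
                        double amount = Double.parseDouble(reader.getElementText().trim());
                        grid.put(id, new Order(id, amount));
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders><order id=\"A1\">10.5</order><order id=\"A2\">20.0</order></orders>";
        Map<String, Order> grid = new ConcurrentHashMap<>(); // stand-in for the grid
        System.out.println("Loaded " + load(new StringReader(xml), grid) + " orders");
    }
}
```

Only one element is held in memory at a time, so memory use stays roughly constant regardless of document size. With JAXB on the classpath, the hand-written mapping could be replaced by unmarshalling directly from the `XMLStreamReader` while it is positioned on the start of each repeating element.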

What Is an Application Grid?
An application grid is a horizontally scalable, agent-based, in-memory storage engine for application state data. It effectively provides a distributed shared memory pool that can be scaled linearly across a heterogeneous grid of machines, consisting of any combination of high-end and lower-cost commodity hardware. Using an application grid gives an application's in-memory data performance, scalability, and reliability at the same time.

One way for an application to use an application grid is through API-level interfaces that mimic the Java HashMap, .NET Dictionary, or JPA interfaces. An alternate approach is to use a service-level interface from a SOA environment. As applications or services place data into the application grid, a group of constantly cooperating caching servers coordinates updates to data objects, as well as their backups, using cluster-wide concurrency control.

As shown in Figure 3, the request to put data to the map is taken over by the application grid and transported across a highly efficient networking protocol to the grid node P, which owns the primary instance data. The primary node in turn copies the updated value to the secondary node B for backup, and then returns control to the service.

Figure 3: Application grid clustering ensures primary / backup of in-memory data on separate machines.
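The put path in Figure 3 can be approximated with a toy model. This is a sketch under loud assumptions: each "node" is a plain in-memory map in one JVM, the primary is chosen by key hash, the backup is simply the next node, and there is no network protocol or concurrency control. The class `PrimaryBackupGrid` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of primary / backup placement; not a real grid API.
final class PrimaryBackupGrid {

    // Each node is modeled as an in-memory map; a real grid uses separate machines.
    private final List<Map<String, String>> nodes = new ArrayList<>();

    PrimaryBackupGrid(int nodeCount) {
        if (nodeCount < 2) throw new IllegalArgumentException("need at least two nodes");
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
    }

    // The key's hash picks the primary node; callers never choose a location.
    int primaryFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // The backup lives on the next node, so it is never on the primary's node.
    int backupFor(String key) {
        return (primaryFor(key) + 1) % nodes.size();
    }

    // Writes the primary copy, then the backup copy, then returns to the caller.
    void put(String key, String value) {
        nodes.get(primaryFor(key)).put(key, value);
        nodes.get(backupFor(key)).put(key, value);
    }

    // Reads are served by the primary owner of the key.
    String get(String key) {
        return nodes.get(primaryFor(key)).get(key);
    }

    // Inspection helper for the sketch: does the given node hold a copy?
    boolean nodeHolds(int node, String key) {
        return nodes.get(node).containsKey(key);
    }

    public static void main(String[] args) {
        PrimaryBackupGrid grid = new PrimaryBackupGrid(3);
        grid.put("order-42", "<order/>");
        System.out.println("primary=" + grid.primaryFor("order-42")
                + " backup=" + grid.backupFor("order-42"));
    }
}
```

The caller only ever supplies a key and a value; where the two copies land is decided entirely inside the grid.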

The application grid stores data across multiple machines with complete location transparency as it sees fit. A unique hash key value is all that is necessary to retrieve the stored data at a future point, regardless of where the application grid chose to store the data. This prevents the application logic from dealing with complex location dependencies and manual partitioning schemes. If one or more nodes in the grid fails, or can't be reached due to network failure, the application grid will immediately react to the failure and rebalance the data across the remaining healthy nodes. This can happen even if the failing node had been participating in an autonomous update operation. In Figure 4, the primary owner 'P' of a piece of data fails while in the midst of retrieving data for the service. The get() request is immediately routed to the backup node and a new primary / backup pair is allotted.

Figure 4: Application grid provides continual failover of in-memory state data
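The failover behavior in Figure 4 can be modeled in the same spirit. This is a single-JVM sketch, not how any particular grid product implements failover: failure is simulated by calling a hypothetical `fail` method, the surviving backup is promoted to primary, and a new backup is chosen from the remaining nodes. All names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of failover; node failure is simulated by fail().
final class FailoverGrid {

    private final Map<Integer, Map<String, String>> nodes = new TreeMap<>(); // live nodes
    private final Map<String, int[]> owners = new HashMap<>();               // key -> {primary, backup}

    FailoverGrid(int nodeCount) {
        if (nodeCount < 3) throw new IllegalArgumentException("sketch needs at least three nodes");
        for (int i = 0; i < nodeCount; i++) nodes.put(i, new HashMap<>());
    }

    void put(String key, String value) {
        int[] pair = owners.get(key);
        if (pair == null) {
            int primary = pickNode(key, -1);
            pair = new int[] {primary, pickNode(key, primary)};
            owners.put(key, pair);
        }
        nodes.get(pair[0]).put(key, value); // primary copy
        nodes.get(pair[1]).put(key, value); // backup copy
    }

    // Always served by whichever node is currently the primary for the key.
    String get(String key) {
        int[] pair = owners.get(key);
        return pair == null ? null : nodes.get(pair[0]).get(key);
    }

    int primaryOf(String key) { return owners.get(key)[0]; }
    int backupOf(String key) { return owners.get(key)[1]; }

    // Removes a node, promotes backups where needed, and re-creates lost copies.
    void fail(int failedNode) {
        if (nodes.size() <= 2) throw new IllegalStateException("sketch keeps two live nodes");
        nodes.remove(failedNode);
        for (Map.Entry<String, int[]> entry : owners.entrySet()) {
            int[] pair = entry.getValue();
            if (pair[0] == failedNode) {
                pair[0] = pair[1];            // backup becomes the new primary
            } else if (pair[1] != failedNode) {
                continue;                     // this key lost no copy
            }
            pair[1] = pickNode(entry.getKey(), pair[0]); // allot a new backup
            nodes.get(pair[1]).put(entry.getKey(), nodes.get(pair[0]).get(entry.getKey()));
        }
    }

    // Picks a live node by key hash, skipping the excluded node id.
    private int pickNode(String key, int exclude) {
        List<Integer> candidates = new ArrayList<>();
        for (int id : nodes.keySet()) if (id != exclude) candidates.add(id);
        return candidates.get(Math.floorMod(key.hashCode(), candidates.size()));
    }

    public static void main(String[] args) {
        FailoverGrid grid = new FailoverGrid(3);
        grid.put("order-42", "<order/>");
        grid.fail(grid.primaryOf("order-42"));
        System.out.println("After failover: " + grid.get("order-42"));
    }
}
```

The caller's `get` is unchanged before and after the failure, which is the point of location transparency: only the grid's internal ownership table moves.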

The data stored in the grid can be anything from simple variables to complex objects or even large XML documents. In our case we chose to fragment what would have been very large XML documents into smaller parts and store those XML fragments as Java objects in the application grid. This allows us to run parallel queries against the data using the Java APIs.

The application grid supports a range of operations including parallel processing of queries, events, and transactions. For large datasets, an entire collection of data may be put to the grid as a single operation, and the grid can disperse the contents of the collection across multiple primary and backup nodes in order to scale. In more advanced applications, the grid may even execute business logic directly and in parallel on data storage nodes, and do so with data and logic affinity such that the logic executes on the same machine that is storing the data that the logic is operating on.
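The idea of running logic in parallel next to the data, then combining partial results at the client, can be sketched with standard Java streams. This is only an analogy under stated assumptions: each inner list stands in for the data held by one grid member, a parallel stream stands in for the grid's distributed execution, and the name `ParallelGridQuery` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one partial result per partition, combined by the caller.
final class ParallelGridQuery {

    // Sums all values above the threshold across every partition.
    static double sumAbove(List<List<Double>> partitions, double threshold) {
        return partitions.parallelStream()
                // Each partition is filtered and summed on its own, touching
                // only the data it holds (the "data and logic affinity" idea).
                .mapToDouble(partition -> partition.stream()
                        .mapToDouble(Double::doubleValue)
                        .filter(value -> value > threshold)
                        .sum())
                // The client assembles the partial sums into the final result.
                .sum();
    }

    public static void main(String[] args) {
        List<List<Double>> partitions = Arrays.asList(
                Arrays.asList(5.0, 50.0),   // data on member 1
                Arrays.asList(100.0),       // data on member 2
                Arrays.asList(1.0, 2.0));   // data on member 3
        System.out.println("Total above 10: " + sumAbove(partitions, 10.0));
    }
}
```

Only the small partial sums cross partition boundaries; the underlying values never leave the partition that stores them.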

More Stories By Dave Chappell

David Chappell is vice president and chief technologist for SOA at Oracle Corporation, and is driving the vision for Oracle’s SOA on App Grid initiative.

More Stories By Andrew Gregory

Andrew Gregory is currently a Sales Consultant at Oracle Corporation. He has worked in Development, Product Support, Infrastructure, and Sales over 13 years in the industry.

