By Dave Chappell, Andrew Gregory
June 3, 2009 08:30 AM EDT
According to Moore's Law, processing speed and storage capacity have been doubling about every two years since the invention of the integrated circuit in 1958.
Yet it seems that our propensity for building larger, more complex software systems that anticipate these improvements inevitably outpaces the exponential growth in the capacity to support them. SOA is becoming more broadly adopted, along with the practice of using XML to communicate data between services, and applications are being scaled to Internet volumes more rapidly. In the face of your application's success, the potential to overwhelm your systems has become very real, and it may happen when you least expect it.
How do we get ahead of this trend? Given that memory and storage are always increasing in the realm of enterprise computing, software needs to keep up with the pace. We need to architect from the beginning using the proper approach toward achieving linear scalability with predictable latency. Data files and feeds are increasing in size, requiring more processing, and becoming more cumbersome to manage with software designed to materialize entire files before consuming them. In some cases, the operations that are to be performed require multiple input sources to be consumed before processing can begin.
Those who are building the eXtreme Transaction Processing (XTP) style of applications - such as Telco call setup and billing, online gaming, securities trading, risk management, and online travel booking - understand this challenge well. The broader use case that is applicable across more industries is web applications that need to scale up to Internet volumes, against backend systems that were never designed to handle that kind of traffic.
In discussions with customers about scaling a SOA with predictable latency, the term that often comes up is "Boundary Costs." To put this in context, consider the following scenario - an XML document, which may have originated from an internal application, a database, or an external business partner, or perhaps been converted from an EDI document, needs to be processed by a number of services that are coordinated by a BPEL process or an ESB process pipeline. The common approach is to place the XML document on the bus and have the bus invoke the services in accordance with the process definition, passing the XML document as part of the service request payload. Each service that needs to process that data will access the XML accordingly. Interaction with a database may also occur. This approach, as illustrated in Figure 1, sounds simple enough.
Figure 1: Calling services using BPEL process or Service Bus pipeline
However, in practice there are challenges to scalability when using this approach. What is the cost of crossing the boundary from one service to the next? How many times is that cost incurred in the context of invoking a simple business process? What if the XML document is really large, in the multi-megabyte range, or there are thousands of them, or both?
Compounding this challenge is the reality that most IT environments are a mixture of platforms and technologies. Regardless of how efficient your process engine or service bus might be, the processing at the service endpoint might still become a bottleneck. A recent conversation at a customer site revealed a 15-step business process that normally takes 15 seconds to run, but of late under peak loads it is violating its 30-second SLA. The developers had spent the better part of the past two years optimizing and tuning every last bit of performance out of each one of those 15 services, and the remaining culprit identified for the poor end-to-end latency is the boundary cost between the services. A detailed examination revealed that each of the 15 service calls was spending 1-2 seconds in an open source web service toolkit doing parsing and marshaling of the XML payload. This is not intended to be a disparaging comment about open source web services toolkits, but is simply illustrating the point that parsing and marshaling of XML at the endpoints can introduce latency that can add up pretty quickly.
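To make the arithmetic in this anecdote concrete, here is a minimal sketch that totals the per-hop parsing and marshaling time. The class and method names are hypothetical; the only figures used are the ones quoted above (15 service calls, 1-2 seconds each, a 30-second SLA).

```java
public class BoundaryCost {

    /** Total seconds spent parsing and marshaling XML across all service hops. */
    public static double totalSeconds(int serviceCalls, double secondsPerCall) {
        return serviceCalls * secondsPerCall;
    }

    public static void main(String[] args) {
        int calls = 15;          // steps in the business process described above
        double slaSeconds = 30;  // the SLA quoted above

        double best = totalSeconds(calls, 1.0);   // every hop costs 1 second
        double worst = totalSeconds(calls, 2.0);  // every hop costs 2 seconds

        System.out.printf("boundary cost: %.0f to %.0f s against a %.0f s SLA%n",
                best, worst, slaSeconds);
    }
}
```

Even before any business logic runs, the boundary cost alone accounts for the normal 15-second run time at the low end and consumes the whole SLA at the high end.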
As illustrated in Figure 2, each service invoked needs to read the XML payload from its on-the-wire serialization form, and parse the XML into a native Java or .NET object form to be processed by the business logic. In addition, if database interaction is required, an object-to-relational mapping must also occur. Finally, the inverse of those steps needs to occur in order to generate a response to the service request and send it along to the next downstream service in accordance with the business process that is coordinating the interaction between the services.
Figure 2: Service request boundary cost between XML to Object to Relational and back again for each invocation
A popular approach for dealing with XML in a SOA is to use web services and XMLBeans. Using XMLBeans, objects are typically created by fully materializing the inputs and outputs, as this allows for maximum usability and processing. In-memory processing may include sorting, filtering, or aggregation operations, all of which increase the overall memory required to deal with each call. This strategy is not scalable and cannot be applied to many of the use cases in this area. Many products support streaming of XML, but this may limit the ability to do anything meaningful without putting the data somewhere else first.
What if there was a way to take this information and store it in an application grid, a place where the size of the data and the processing capability can far eclipse that of any single machine or process? The application grid can utilize the combined memory and processing power of multiple machines in order to complete an operation, such as the application of a complex formula or filter across an enormous data set. The application grid also provides the ability to hold the data for longer periods of time beyond the cycle of a single service request, survive server restarts, and even work across network boundaries.
If we could combine the power of the grid for data storage and manipulation with the efficiency of streaming, the result would be a highly scalable system capable of processing much more information than before. Using a combination of complementary technologies here, we achieve our goal of spreading compute operations across a distributed network of machines, and we lessen the processing and memory requirements of our data consumers - SOA services, application servers, and client applications. We also remove the need to use a database for intermediate storage of data while it is (or simply so it can be) processed. By using an application grid we can also implement patterns where we pass around references to data, rather than the data, resulting in huge efficiency gains in the communications layer, and dramatically reducing or eliminating the boundary cost.
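The pattern of passing references rather than data can be sketched as follows. This is an illustration under stated assumptions, not a product API: a `ConcurrentHashMap` stands in for a grid-backed map, and the names `ReferencePassing`, `publish`, and `payloadLength` are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class ReferencePassing {

    // Assumption: a local map stands in for the grid. A real application grid
    // would spread these entries across many machines behind a similar interface.
    private static final Map<String, String> GRID = new ConcurrentHashMap<>();

    /** Stores the payload once and returns the small key that services pass along. */
    public static String publish(String xmlPayload) {
        String key = UUID.randomUUID().toString();
        GRID.put(key, xmlPayload);
        return key;
    }

    /** A downstream service receives only the key and looks up what it needs. */
    public static int payloadLength(String key) {
        return GRID.get(key).length();
    }

    public static void main(String[] args) {
        String key = publish("<order id=\"a1\"><item>widget</item></order>");
        // Only the key crosses each service boundary, not the document itself.
        System.out.println("reference passed between services: " + key);
        System.out.println("payload size seen by downstream service: " + payloadLength(key));
    }
}
```

The message that crosses each service boundary is then a short key whose size is independent of the document, which is where the saving in the communications layer comes from.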
This article includes a code example that covers the use case of processing large XML files in an application grid. In a typical XML file, there are usually elements that repeat without any predetermined limit. Using a StAX parser to handle streaming XML, and JAXB to handle conversion between XML and Java objects, we can extract these repeating elements from the XML stream and put them on the application grid as individual objects. The implementation can populate the grid with these objects, and do so with a limited amount of memory consumption. Once populated, the grid can process the data across the multiple machines that constitute the grid. Each grid member processes an operation or a filter and passes intermediate results to the grid client, which then assembles them into a final result set.
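As a rough illustration of the streaming half of this idea (it is not the authors' implementation), the sketch below uses the standard StAX `XMLStreamReader` to pull repeating elements out of a stream one at a time. Several things here are assumptions made for the example: the `<order>` element and its `id` and `amount` attributes are hypothetical, attributes are read by hand where the article would use JAXB, and a plain `Map` stands in for the grid.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingLoader {

    /** Hypothetical value object for one repeating <order> element. */
    public static final class Order {
        public final String id;
        public final double amount;

        public Order(String id, double amount) {
            this.id = id;
            this.amount = amount;
        }
    }

    /**
     * Streams the document and puts each <order> into the map as soon as it
     * is seen, so the whole file is never materialized in memory.
     * Returns the number of orders loaded.
     */
    public static int load(Reader xml, Map<String, Order> grid) {
        int count = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "order".equals(reader.getLocalName())) {
                        String id = reader.getAttributeValue(null, "id");
                        double amount = Double.parseDouble(reader.getAttributeValue(null, "amount"));
                        grid.put(id, new Order(id, amount)); // stand-in for a grid put
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML input", e);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<orders>"
                + "<order id=\"a1\" amount=\"10.5\"/>"
                + "<order id=\"a2\" amount=\"4.5\"/>"
                + "</orders>";
        Map<String, Order> grid = new HashMap<>();
        int loaded = load(new StringReader(xml), grid);
        System.out.println("orders loaded: " + loaded);
    }
}
```

Because each element is handed off as soon as its start tag is read, memory use in the loader depends on the size of one element rather than the size of the file.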
What Is an Application Grid?
An application grid is a horizontally scalable, agent-based, in-memory storage engine for application state data. This effectively provides a distributed shared memory pool that can be linearly scaled across a heterogeneous grid of machines consisting of any combination of high-end and lower-cost commodity hardware. Use of an application grid in an application simultaneously provides performance, scalability, and reliability for in-memory data.
One way that an application utilizes an application grid is to use API-level interfaces that mimic the Java HashMap, .NET Dictionary, or JPA interfaces. An alternate approach is to use a service-level interface from a SOA environment. As applications or services place data into the application grid, a group of constantly cooperating caching servers coordinates updates to data objects, as well as their backups, using cluster-wide concurrency control.
As shown in Figure 3, the request to put data to the map is taken over by the application grid and transported across a highly efficient networking protocol to the grid node P, which owns the primary instance data. The primary node in turn copies the updated value to the secondary node B for backup, and then returns control to the service.
Figure 3: Application grid clustering ensures primary / backup of in-memory data on separate machines.
The application grid stores data across multiple machines with complete location transparency as it sees fit. A unique hash key value is all that is necessary to retrieve the stored data at a future point, regardless of where the application grid chose to store the data. This frees the application logic from dealing with complex location dependencies and manual partitioning schemes. If one or more nodes in the grid fail, or can't be reached due to network failure, the application grid will immediately react to the failure and rebalance the data across the remaining healthy nodes. This can happen even if the failing node had been participating in an autonomous update operation. In Figure 4, the primary owner 'P' of a piece of data fails while in the midst of retrieving data for the service. The get() request is immediately routed to the backup node and a new primary / backup pair is allotted.
Figure 4: Application grid provides continual failover of in-memory state data
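The primary / backup behavior in Figures 3 and 4 can be approximated in a few lines. The sketch below is a single-process toy under stated assumptions, not how any particular grid product is built: `PartitionedStore` and its methods are hypothetical names, "nodes" are in-memory maps, the backup is simply the next node in the list, and the rebalancing step that allots a new primary / backup pair is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedStore {

    private final List<Map<String, String>> nodes = new ArrayList<>();
    private final boolean[] alive;

    public PartitionedStore(int nodeCount) {
        if (nodeCount < 2) {
            throw new IllegalArgumentException("need at least two nodes for a backup");
        }
        alive = new boolean[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
            alive[i] = true;
        }
    }

    /** The key's hash alone decides which node owns the primary copy. */
    public int primaryOf(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    /** Assumption for this sketch: the backup lives on the next node. */
    public int backupOf(String key) {
        return (primaryOf(key) + 1) % nodes.size();
    }

    /** Writes to the primary, then copies to the backup, then returns. */
    public void put(String key, String value) {
        int p = primaryOf(key);
        int b = backupOf(key);
        if (alive[p]) nodes.get(p).put(key, value);
        if (alive[b]) nodes.get(b).put(key, value);
    }

    /** Reads from the primary, or from the backup if the primary is down. */
    public String get(String key) {
        int p = primaryOf(key);
        int node = alive[p] ? p : backupOf(key);
        return alive[node] ? nodes.get(node).get(key) : null;
    }

    /** Simulates losing a node together with everything it held in memory. */
    public void fail(int node) {
        alive[node] = false;
        nodes.get(node).clear();
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(3);
        store.put("order-1", "<order id=\"1\"/>");
        store.fail(store.primaryOf("order-1"));
        System.out.println("after primary failure: " + store.get("order-1"));
    }
}
```

The caller only ever supplies the key. Which node answers, and whether that node is the primary or the backup, stays hidden behind `get()`.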
The data stored in the grid can be anything from simple variables to complex objects or even large XML documents. In our case we chose to fragment what would have been very large XML documents into smaller parts and store those XML fragments as Java objects in the application grid. This allows us to do parallel queries against the data using the Java APIs.
The application grid supports a range of operations including parallel processing of queries, events, and transactions. For large datasets, an entire collection of data may be put to the grid as a single operation, and the grid can disperse the contents of the collection across multiple primary and backup nodes in order to scale. In more advanced applications, the grid may even execute business logic directly and in parallel on data storage nodes, and do so with data and logic affinity such that the logic executes on the same machine that is storing the data that the logic is operating on.
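Running logic next to the data and combining partial results on the client can be sketched with a thread pool. This is a local stand-in under stated assumptions rather than a grid API: each inner list plays the role of one storage node's partition, each pool thread plays the role of that node, and `ParallelAggregate.filteredSum` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoublePredicate;

public class ParallelAggregate {

    /**
     * Applies the filter and sums the matches inside each partition in
     * parallel, then adds the partial sums together on the calling side.
     */
    public static double filteredSum(List<List<Double>> partitions, DoublePredicate filter) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<Double>> partials = new ArrayList<>();
            for (List<Double> partition : partitions) {
                // The task only touches its own partition (data and logic affinity).
                Callable<Double> task = () -> {
                    double sum = 0;
                    for (double value : partition) {
                        if (filter.test(value)) {
                            sum += value;
                        }
                    }
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            double total = 0;
            for (Future<Double> partial : partials) {
                total += partial.get(); // the client assembles the final result
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Double>> partitions = Arrays.asList(
                Arrays.asList(1.0, 5.0),
                Arrays.asList(10.0, 2.0));
        System.out.println("sum of values >= 5: " + filteredSum(partitions, v -> v >= 5));
    }
}
```

Only one number per partition travels back to the client, however many entries each partition holds.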
jhv1blz5 07/03/09 10:31:00 AM EDT
The article validated SOA as an IT architecture paradigm that can be leveraged in many ways. Taking data storage, scalability and application performance to a nifty level using SOA Application Grid infrastructure will no doubt enhance data and application performance on Oracle architecture platforms, it also has the promise of a cost effective and efficient IT delivery model. The very benefits of SOA.