By Mark Little
September 23, 2002 12:00 AM EDT
Atomic transactions are a well-known technique for guaranteeing consistency in the presence of failures. The ACID properties of atomic transactions ensure that, even in complex business applications, consistency of state is preserved.
Transactions are best viewed as "short-lived" entities operating in a closely coupled environment, performing stable state changes to the system; they are less well suited for structuring "long-lived" application functions (e.g., running for hours, days, etc.) and running in a loosely coupled environment like the Web. Long-lived atomic transactions (as typically occur in business-to-business interactions) may reduce the concurrency in the system to an unacceptable level by holding on to resources (e.g., locks) for a long time; further, if such an atomic transaction rolls back, much valuable work already performed could be undone. As a result, there have been various extended transaction models where strict ACID properties can be relaxed in a controlled manner. Until recently, translating these models into the world of Web services had not been attempted. However, the OASIS Business Transactions Protocol, specified by a collaboration of several companies, has tried to address this issue.
With the advent of Web services, the Web is being populated by service providers who wish to take advantage of this large B2B space. However, there are still important security and fault-tolerance considerations that must be addressed. One of these is the fact that the Web frequently suffers from failures that can affect both the performance and consistency of applications that run over it.
Atomic transactions are a well-known technique for guaranteeing consistency in the presence of failures. (Note: I will not use the term transaction in place of atomic transaction since in the B2B space this has different connotations.) The ACID properties of atomic transactions (Atomicity, Consistency, Isolation, Durability) ensure that even in complex business applications consistency of state is preserved, despite concurrent accesses and failures. This is an extremely useful fault-tolerance technique, especially when multiple, possibly remote, resources are involved.
The structuring mechanisms available within traditional atomic transaction systems are sequential and concurrent composition of transactions. These mechanisms are sufficient if an application function can be represented as a single atomic transaction. As Web services evolved as a means to integrate processes and applications at an inter-enterprise level, traditional transaction semantics and protocols have proven inappropriate. Web services-based transactions differ from traditional transactions in that they execute over long periods, they require commitments to the transaction to be "negotiated" at runtime, and isolation levels have to be relaxed.
As a result, there have been various extended transaction models, in which strict ACID properties can be relaxed in a controlled manner. Until recently, translating these models into the world of Web services had not been attempted. However, the OASIS Business Transactions Protocol (BTP), specified by a collaboration of several companies, has tried to address this issue. In this article we'll first consider why traditional atomic transactions are insufficient for long-running B2B activities, and then describe how BTP has attempted to solve these problems.
Why ACID Transactions Are Too Strong
ACID transactions by themselves are inadequate for structuring long-lived applications. To ensure ACID-ity between multiple participants, a multiphase (typically two-phase) consensus mechanism is required (see Figure 1). During the first (preparation) phase, an individual participant must make durable any state changes that occurred during the scope of the atomic transaction, such that these changes can either be rolled back (undone) or committed later, once consensus on the transaction outcome has been reached among all participants. In other words, the original state must not be lost at this point, as the atomic transaction could still roll back. If any failure occurs during the first phase, all participants are forced to undo their changes. Otherwise, in the second (commitment) phase, participants may "overwrite" the original state with the state made durable during the first phase.
In order to guarantee consensus, a two-phase commit is necessarily a blocking protocol. After returning the phase 1 response, each participant that returned a commit response must remain blocked until it has received the coordinator's phase 2 message telling it what to do. Until it receives this message, any resources used by the participant are unavailable for use by other atomic transactions, since allowing such use could result in non-ACID behavior. If the coordinator fails before delivery of the second phase message, these resources remain blocked until it recovers. In addition, if a participant fails after phase 1, but before the coordinator can deliver its final commit decision, the atomic transaction cannot be completed until the participant recovers: all participants must see both phases of the commit protocol in order to guarantee ACID semantics. There is no implied time limit between a coordinator sending the first phase message of the commit protocol and its sending the second, commit phase message; there could be seconds or hours between them.
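The two phases described above can be sketched in a few lines of Python. This is only an illustrative, in-memory model under simplifying assumptions: the names Participant, Vote, and two_phase_commit are hypothetical, nothing is written to stable storage, and coordinator or participant crashes are not modeled. It shows the original state being preserved during preparation and resources staying locked until the second-phase message arrives.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


class Participant:
    """Hypothetical in-memory participant (a real one would use stable storage)."""

    def __init__(self, name, state, can_prepare=True):
        self.name = name
        self.state = state              # original (committed) state
        self.pending = None             # state recorded during phase 1
        self.can_prepare = can_prepare  # assumption: lets us simulate a "no" vote
        self.locked = False

    def prepare(self, new_state):
        # Phase 1: keep the original state, record the new one, then block.
        if not self.can_prepare:
            return Vote.ROLLBACK
        self.pending = new_state
        self.locked = True              # unavailable to others until phase 2
        return Vote.COMMIT

    def commit(self):
        # Phase 2 (commit): overwrite the original state with the prepared one.
        if self.pending is not None:
            self.state = self.pending
        self.pending = None
        self.locked = False

    def rollback(self):
        # Phase 2 (rollback): discard the prepared state; the original survives.
        self.pending = None
        self.locked = False


def two_phase_commit(work):
    """work is a list of (participant, new_state) pairs; returns True on commit."""
    votes = [participant.prepare(new_state) for participant, new_state in work]
    decision = all(vote is Vote.COMMIT for vote in votes)
    for participant, _ in work:
        if decision:
            participant.commit()
        else:
            participant.rollback()
    return decision
```

Note that every participant that voted to commit stays locked between the two loops; in a real system that window is where a coordinator failure would leave resources blocked.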
Therefore, structuring certain activities from long-running atomic transactions can reduce the amount of concurrency within an application or (in the event of failures) require work to be performed again. For example, there are certain classes of application where it is known that resources acquired within an atomic transaction can be released "early," rather than having to wait until the atomic transaction terminates; in the event of the atomic transaction rolling back, however, certain compensation activities may be necessary to restore the system to a consistent state. Such compensation activities (which may perform forward or backward recovery) will typically be application specific, may not be necessary at all, or may be more efficiently dealt with by the application. One approach is to structure long-running activities as many independent, short-duration atomic transactions that together form a "logical" long-running transaction. This structure allows an activity to acquire and use resources for only the required duration of this long-running activity.
In Figure 2 an application activity (shown by the dotted ellipse) has been split into many different, coordinated, short-duration atomic transactions. Assume that the application activity is concerned with booking a taxi (t1), reserving a table at a restaurant (t2), reserving a seat at the theater (t3), booking a room at a hotel (t4), and so on. If all of these operations were performed as a single atomic transaction, then resources acquired during t1 would not be released until the atomic transaction has terminated. If subsequent activities t2, t3, etc., do not require those resources, then they will be needlessly unavailable to other clients.
However, if failures and concurrent access occur during the lifetime of these individual transactional activities, then the behavior of the entire "logical long-running transaction" may not possess ACID properties. Therefore, some form of (application-specific) compensation may be required to attempt to return the state of the system to consistency. For example, let's assume that t4 aborts. Further assume that the application can continue to make forward progress, but in order to do so must now undo some state changes made prior to the start of t4 (by t1, t2, or t3). New activities are started: tc1 is a compensation activity that will attempt to undo state changes performed by, say, t2 and t3, and the application continues once tc1 has completed. tc5' and tc6' are new activities that continue after compensation, e.g., since it was not possible to reserve the theater, restaurant, and hotel, it is decided to book tickets at the cinema. Obviously, other forms of composition are possible.
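A rough sketch of this compensation pattern follows. It assumes each short step is an independent unit of work that may register an optional compensator, and that compensators run in reverse order when a later step aborts; the function run_activity, the helper names, and the step labels are hypothetical and chosen only to mirror the taxi, restaurant, theater, and hotel example.

```python
def run_activity(steps, fallback=None):
    """Run short, independent steps in order.

    steps: list of (name, action, compensate); compensate may be None.
    If a step raises, compensators of the completed steps run in reverse
    order, then the optional fallback steps are attempted.
    Returns a list of strings describing what happened.
    """
    log = []
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"{name} aborted")
            # Application-specific compensation for earlier, already
            # completed steps (only those that registered a compensator).
            for done_name, undo in reversed(completed):
                if undo is not None:
                    undo()
                    log.append(f"compensated {done_name}")
            if fallback:
                log.extend(run_activity(fallback))
            return log
        completed.append((name, compensate))
        log.append(f"{name} done")
    return log


def succeed():
    """Stand-in for a short atomic transaction that commits."""


def no_rooms():
    """Stand-in for a short atomic transaction that aborts."""
    raise RuntimeError("no rooms available")


evening = [
    ("t1 taxi", succeed, None),           # assumed not to need compensation
    ("t2 restaurant", succeed, succeed),
    ("t3 theater", succeed, succeed),
    ("t4 hotel", no_rooms, None),
]
cinema = [("cinema", succeed, None)]
events = run_activity(evening, fallback=cinema)
```

Which steps need compensating, and whether recovery goes forward or backward, is a decision for the application; the reverse-order policy here is just one plausible choice.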
Properties of a Web Service-Based Transaction
The fundamental question addressed here is: what properties must a transaction model possess in order to support business-to-business interactions? To begin to answer that, we need to understand what we mean by a business transaction.
A business relationship is any distributed state that is maintained by two or more parties and is subject to some contractual constraints previously agreed to by those parties. A business transaction can therefore be considered as a consistent change in the state of a business relationship between parties. Each party in a business transaction holds its own application state corresponding to the business relationship with other parties in that transaction. During the course of a business transaction, this state may change.
In the Web services domain, information about business transactions is communicated in XML documents. However, how those documents are exchanged by the different parties involved (e.g., e-mail or HTTP) may be a function of the environment, type of business relationship, or other business or logistical factors. Therefore, mandating a specific XML carrier protocol may be too restrictive.
Since business relationships imply a level of value to the parties associated by those relationships, achieving some level of consensus among these parties is important. Not all participants within a particular business transaction have to see the same outcome; a specific transaction may possess multiple consensus groups.
In addition to understanding the outcomes, a participant within a business transaction may need to support provisional or tentative state changes during the course of the transaction. Such parties must also support the completion of a business transaction, either through confirmation (final effect) or cancellation (counter-effect). In general, what it means to confirm or cancel work done within a business transaction will be for the participant to determine.
For example, an application may choose to perform changes as provisional effects and make them visible to other business transactions. It may store necessary information to undo these changes at the same time. On confirmation, it may simply discard these "undo" changes, or on cancellation it may apply these "undo" changes. An application can employ such a compensation-based approach or take a conventional "rollback" approach. It is with these properties in mind that we can discuss the Business Transaction Protocol.
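The compensation-based approach just described might look something like the following minimal sketch. The class ProvisionalStock and its methods are hypothetical; the point is only that the provisional effect is applied immediately, undo information is stored alongside it, and confirmation or cancellation decides what happens to that undo information.

```python
class ProvisionalStock:
    """Hypothetical participant that applies changes at once and keeps undo data."""

    def __init__(self, copies):
        self.copies = copies
        self._undo = []          # quantities to put back if we are cancelled

    def take(self, quantity):
        # Provisional effect: immediately visible to other business transactions.
        if quantity > self.copies:
            raise ValueError("not enough stock")
        self.copies -= quantity
        self._undo.append(quantity)

    def confirm(self):
        # Final effect: the change stands, so the undo information is discarded.
        self._undo.clear()

    def cancel(self):
        # Counter-effect: apply the stored undo information, newest first.
        while self._undo:
            self.copies += self._undo.pop()
```

A conventional "rollback" participant would instead hide the change until confirmation; either implementation can sit behind the same confirm and cancel operations.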
The Business Transaction Protocol
B2B interactions may be complex, involving many parties, spanning many different organisations, and potentially lasting for hours or days. For example, the process of ordering and delivering parts for a computer may involve different suppliers, and may only be considered to have completed once the parts are delivered to their final destination. Most business-to-business collaborative applications require transactional support in order to guarantee consistent outcome and correct execution. These applications often involve long-running computations, loosely coupled systems, and components that do not share data, location, or administration; it is then difficult to incorporate ACID transactions within such architectures. Furthermore, most collaborative business process management systems support complex, long-running processes in which undoing tasks that have already completed may be necessary in order to effect recovery or to choose another acceptable execution path.
For example, an online bookshop may well reserve books for an individual for a specific period of time, but if the individual does not purchase the books within that time period they will be "put back onto the shelf" for others to purchase; to do otherwise could result in the shop never selling a single book. Furthermore, because it is not possible for anyone to have an infinite supply of stock, some examples of online shops may appear to users to reserve items for them, but in fact if other users want to purchase them first they may be allowed to (i.e., the same book may be "reserved" for multiple users concurrently); a user may subsequently find that the item is no longer available, or may have to be ordered especially for them. If these examples were modelled using atomic transactions, then the reservation process would require the book to be locked for the duration of the atomic transaction - it would have to be available, and could not be acquired by (sold to) another user. When the atomic transaction commits, the book will be removed from stock and mailed to the user. However, if a failure occurs during the commitment protocol, the book may remain locked for an indeterminate amount of time (or until manual intervention occurs).
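To make the contrast with locking concrete, here is a small sketch of the time-limited reservation described above. It is an assumption-laden toy: the class Shelf, its method names, and the use of plain numbers for time are all hypothetical, and each customer holds at most one copy. Unlike a lock held by an atomic transaction, a hold simply lapses when its period ends and the book goes "back onto the shelf."

```python
class Shelf:
    """Hypothetical stock of one title with time-limited holds."""

    def __init__(self, copies):
        self.copies = copies
        self.holds = {}                      # customer -> time the hold expires

    def _expire(self, now):
        # Drop every hold whose period has ended.
        expired = [c for c, until in self.holds.items() if until <= now]
        for customer in expired:
            del self.holds[customer]

    def available(self, now):
        self._expire(now)
        return self.copies - len(self.holds)

    def reserve(self, customer, now, hold_for):
        if self.available(now) <= 0:
            return False
        self.holds[customer] = now + hold_for
        return True

    def purchase(self, customer, now):
        self._expire(now)
        if customer not in self.holds:
            return False                     # hold lapsed or never existed
        del self.holds[customer]
        self.copies -= 1
        return True
```

For instance, with a single copy, a second customer is refused while the first hold is live but can reserve as soon as it expires, after which the first customer's purchase attempt fails.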
As a result, the use of traditional atomic transactions with strict ACID properties (e.g., systems that implement the JTS specification [SUN99]) is considered too restrictive for many types of applications.
The Organization for the Advancement of Structured Information Standards (OASIS) Business Transaction Protocol (BTP) is a transaction protocol that meets the requirement for Web-based, long-running collaborative business applications. BTP is designed to support applications that are disparate in time, location, and administration and thus require transactional support beyond classical ACID transactions. In short, BTP is a protocol for ensuring consistent outcomes from participating parties in a business transaction.
Note: It is important to realize that the term "transaction" in this sense does not mean atomic transaction, although ACID semantics can be obtained if required.
Consensus of Opinion
In general, a business transaction requires the capability for certain participants to be structured into a consensus group such that all of the members in a grouping have the same result. Different participants within the same business transaction may belong to different consensus groups. The business logic then controls how each group completes. In this way, a business transaction may cause a subset of the groups it naturally creates to perform the work it asks, while asking the other groups to undo the work.
Consider the situation shown in Figure 4, in which a user is booking a holiday, has provisionally reserved a flight ticket and taxi to the airport, and is now looking for travel insurance. The first consensus group holds Flights and Taxi, since neither of these can occur independently. The user may then decide to visit multiple insurance sites (called A and B in this example), and as he goes may reserve the quotes he likes. So, A may quote $50, which is just within budget, but the user may want to try B just in case he can find a cheaper price, without losing the initial quote. If the quote from B is less than that from A, the user may cancel A while confirming both the flights and the insurance from B. Each insurance site may therefore occur within its own consensus group. This is not possible when using ACID transactions.
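The holiday scenario can be sketched as follows. This is not BTP's wire protocol or any vendor API; Group and book_holiday are hypothetical names, and the quote from insurer B is an invented figure (the text only gives A's $50). The sketch shows the business logic confirming one consensus group and cancelling another within the same business transaction.

```python
class Group:
    """Hypothetical consensus group: every member sees the same outcome."""

    def __init__(self, members):
        self.members = list(members)

    def complete(self, confirm):
        outcome = "confirmed" if confirm else "cancelled"
        return {member: outcome for member in self.members}


def book_holiday(quotes):
    """quotes maps insurer name -> quoted price; the cheapest one is confirmed."""
    travel = Group(["flight", "taxi"])       # flight and taxi succeed or fail together
    insurers = {name: Group([f"insurance {name}"]) for name in quotes}
    winner = min(quotes, key=quotes.get)     # business logic picks the outcome

    results = travel.complete(confirm=True)
    for name, group in insurers.items():
        results.update(group.complete(confirm=(name == winner)))
    return results


# A's price comes from the example above; B's price is an assumption.
outcome = book_holiday({"A": 50, "B": 45})
```

Here the flight and taxi are confirmed together, insurance B is confirmed, and insurance A is cancelled.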
BTP uses a two-phase completion protocol to guarantee atomicity of decisions but does not imply specific implementations. To enforce this distinction, rather than call the second-phase operations of the termination protocol "commit" and "rollback," as is the case in an ACID transaction environment, they are called "confirm" and "cancel" respectively, with the intention of decoupling the phases from any preconceptions of specific backward-compensation implementations.
It's important to stress that although BTP uses a two-phase protocol, it does not imply ACID transactions. How implementations of prepare, confirm, and cancel are provided is a back-end implementation decision. Issues to do with consistency and isolation of data are also back-end choices and not imposed or assumed by BTP. A BTP implementation is primarily concerned with two-phase coordination of abstract entities (participants).
In a traditional transaction system, the application or user has very few verbs with which to control the transaction. Typically, these are "begin," "commit," and "roll back," corresponding to starting a transaction, committing a transaction, and rolling back a transaction respectively. When an application asks for a transaction to commit, the coordinator will execute the entire two-phase commit protocol, as described earlier, before returning an outcome to the application (what BTP terms a closed-top commit protocol). The elapsed time between the execution of the first phase and the second phase is typically milliseconds to seconds, but is entirely under the control of the coordinator.
However, the actual two-phase protocol does not impose any restrictions on the time between executing the first and second phases. Obviously, the longer this period takes, the more chance there is for a failure to occur and the longer (critical) resources remain locked or isolated from other users. This is the reason why most ACID transaction systems attempt to keep this time frame to a minimum and why they do not work well in environments like the Web.
BTP, on the other hand, took the approach of allowing the time between these phases to be set by the application by expanding the verbs available to include explicit control over both phases of the termination protocol, i.e., "prepare," "confirm," and "cancel" - what BTP terms an open-top commit protocol. The application has complete control over when it can tell a transaction to prepare and, using whatever business logic is required, it can later determine which transaction(s) to confirm or cancel. This ability is a powerful tool for applications.
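The difference between the two styles can be illustrated with a small state machine. The class OpenTopTransaction and the function closed_top_commit are hypothetical names, and the allowed transitions (cancel permitted at any point before a final outcome) are an assumption of this sketch rather than a statement of the BTP specification.

```python
class OpenTopTransaction:
    """Hypothetical transaction whose two phases are driven by the application."""

    def __init__(self, name):
        self.name = name
        self.status = "active"

    def prepare(self):
        if self.status != "active":
            raise RuntimeError(f"cannot prepare from {self.status}")
        self.status = "prepared"

    def confirm(self):
        if self.status != "prepared":
            raise RuntimeError(f"cannot confirm from {self.status}")
        self.status = "confirmed"

    def cancel(self):
        # Assumption: cancel is allowed until a final outcome has been reached.
        if self.status in ("confirmed", "cancelled"):
            raise RuntimeError(f"cannot cancel from {self.status}")
        self.status = "cancelled"


def closed_top_commit(tx):
    """Traditional 'commit': both phases run back to back, out of the caller's hands."""
    tx.prepare()
    tx.confirm()
    return tx.status
```

With the open-top verbs, an application can prepare several transactions, apply its business logic for as long as it needs, and only then confirm some and cancel others; closed_top_commit gives it no such window.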
Atoms and Cohesions
To address the specific requirements of business transactions, BTP introduced two types of extended transactions, atoms and cohesions, both using the open-top completion protocol.
In my next article, I'll take a closer look at the architecture of BTP and how XML is involved in it. I'll also look at the Web services stack and how BTP is used.