Welcome!

SOA & WOA Authors: Dana Gardner, Gary Kaiser, Elizabeth White, Noel Wurst, Rachel A. Dines

Related Topics: SOA & WOA

SOA & WOA: Article

The Business Transaction Protocol: Transactions for a New Age

The Business Transaction Protocol: Transactions for a New Age

Atomic transactions are a well-known technique for guaranteeing consistency in the presence of failures. The ACID properties of atomic transactions ensure that, even in complex business applications, consistency of state is preserved.

Transactions are best viewed as "short-lived" entities operating in a closely-coupled environment, performing stable state changes to the system; they are less well suited for structuring "long-lived" application functions (e.g., running for hours, days, etc.) and running in a loosely coupled environment like the Web. Long-lived atomic transactions (as typically occur in business-to-business interactions) may reduce the concurrency in the system to an unacceptable level by holding on to resources (e.g., locks) for a long time; further, if such an atomic transaction rolls back, much valuable work already performed could be undone. As a result, there have been various extended transactions models where strict ACID properties can be relaxed in a controlled manner. Until recently, translating these models into the world of Web services had not been attempted. However, the OASIS Business Transactions Protocol, specified by a collaboration of several companies, has tried to address this issue.

Introduction
With the advent of Web services, the Web is being populated by service providers who wish to take advantage of this large B2B space. However, there are still important security and fault-tolerance considerations that must be addressed. One of these is the fact that the Web frequently suffers from failures that can affect both the performance and consistency of applications that run over it.

Atomic transactions are a well-known technique for guaranteeing consistency in the presence of failures. (Note: I will not use the term transaction in place of atomic transaction since in the B2B space this has different connotations.) The ACID properties of atomic transactions (Atomicity, Consistency, Isolation, Durability) ensure that even in complex business applications consistency of state is preserved, despite concurrent accesses and failures. This is an extremely useful fault-tolerance technique, especially when multiple, possibly remote, resources are involved.

The structuring mechanisms available within traditional atomic transaction systems are sequential and concurrent composition of transactions. These mechanisms are sufficient if an application function can be represented as a single atomic transaction. As Web services evolved as a means to integrate processes and applications at an inter-enterprise level, traditional transaction semantics and protocols have proven inappropriate. Web services-based transactions differ from traditional transactions in that they execute over long periods, they require commitments to the transaction to be "negotiated" at runtime, and isolation levels have to be relaxed.

As a result, there have been various extended transactions models, in which strict ACID properties can be relaxed in a controlled manner. Until recently, translating these models into the world of Web services had not been attempted. However, the OASIS Business Transactions Protocol (BTP), specified by a collaboration of several companies, has tried to address this issue. In this article we'll first consider why traditional atomic transactions are insufficient for long-running B2B activities, and then describe how the BTP protocol has attempted to solve these problems.

Why ACID Transactions Are Too Strong
ACID transactions by themselves are inadequate for structuring long-lived applications. To ensure ACID-ity between multiple participants, a multiphase (typically two) consensus mechanism is required (see Figure 1). During the first (preparation) phase, an individual participant must make durable any state changes that occurred during the scope of the atomic transaction, such that these changes can either be rolled back (undone) or committed later once consensus to the transaction outcome has been determined among all participants, i.e., any original state must not be lost at this point, as the atomic transaction could still roll back. Assuming no failures occurred during the first phase (in which case all participants will be forced to undo their changes), in the second (commitment) phase, participants may "overwrite" the original state with the state made durable during the first phase.

 

In order to guarantee consensus, a two-phase commit is necessarily a blocking protocol. After returning the phase 1 response, each participant that returned a commit response must remain blocked until it has received the coordinator's phase 2 message telling it what to do. Until they receive this message, any resources used by the participant are unavailable for use by other atomic transactions, since to do so may result in non-ACID behavior. If the coordinator fails before delivery of the second phase message these resources remain blocked until it recovers. In addition, if a participant fails after phase 1, but before the coordinator can deliver its final commit decision, the atomic transaction cannot be completed until the participant recovers: all participants must see both phases of the commit protocol in order to guarantee ACID semantics. There is no implied time limit between a coordinator sending the first phase message of the commit protocol and it sending the second, commit phase message; there could be seconds or hours between them.

Therefore, structuring certain activities from long-running atomic transactions can reduce the amount of concurrency within an application or (in the event of failures) require work to be performed again. For example, there are certain classes of application where it is known that resources acquired within an atomic transaction can be released "early," rather than having to wait until the atomic transaction terminates; in the event of the atomic transaction rolling back, however, certain compensation activities may be necessary to restore the system to a consistent state. Such compensation activities (which may perform forward or backward recovery) will typically be application specific, may not be necessary at all, or may be more efficiently dealt with by the application. For example, long-running activities can be structured as many independent, short-duration atomic transactions, to form a "logical" long-running transaction. This structure allows an activity to acquire and use resources for only the required duration of this long-running activity. In Figure 2 an application activity (shown by the dotted ellipse) has been split into many different, coordinated, short-duration atomic transactions. Assume that the application activity is concerned with booking a taxi (t1), reserving a table at a restaurant (t2), reserving a seat at the theater (t3), booking a room at a hotel (t4), and so on. If all of these operations were performed as a single atomic transaction, then resources acquired during t1 would not be released until the atomic transaction has terminated. If subsequent activities t2, t3, etc., do not require those resources, then they will be needlessly unavailable to other clients.

 

However, if failures and concurrent access occur during the lifetime of these individual transactional activities, then the behavior of the entire "logical long-running transaction" may not possess ACID properties. Therefore, some form of (application-specific) compensation may be required to attempt to return the state of the system to consistency. For example, let's assume that t4 aborts. Further assume that the application can continue to make forward progress, but in order to do so must now undo some state changes made prior to the start of t4 (by t1, t2, or t3). New activities are started; tc1 is a compensation activity that will attempt to undo state changes performed by, say, t2 and t3, which will continue the application once tc1 has completed. tc5' and tc6' are new activities that continue after compensation, e.g. since it was not possible to reserve the theater, restaurant, and hotel, it is decided to book tickets at the cinema. Obviously, other forms of composition are possible.

 

Properties of a Web Service-Based Transaction
The fundamental question addressed here is what properties must a transaction model possess in order to support business-to-business interactions? To begin to answer that, we need to understand what we mean by a business transaction.

A business relationship is any distributed state maintained by two or more parties and is subject to some contractual constraints previously agreed to by those parties. A business transaction can therefore be considered as a consistent change in the state of a business relationship between parties. Each party in a business transaction holds its own application state corresponding to the business relationship with other parties in that transaction. During the course of a business transaction, this state may change.

In the Web services domain, information about business transactions is communicated in XML documents. However, how those documents are exchanged by the different parties involved (e.g., e-mail or HTTP) may be a function of the environment, type of business relationship, or other business or logistical factors. Therefore, mandating a specific XML carrier protocol may be too restrictive.

Since business relationships imply a level of value to the parties associated by those relationships, achieving some level of consensus among these parties is important. Not all participants within a particular business transaction have to see the same outcome; a specific transaction may possess multiple consensus groups.

In addition to understanding the outcomes, a participant within a business transaction may need to support provisional or tentative state changes during the course of the transaction. Such parties must also support the completion of a business transaction, either through confirmation (final effect) or cancellation (counter-effect). In general, what it means to confirm or cancel work done within a business transaction will be for the participant to determine.

For example, an application may choose to perform changes as provisional effects and make them visible to other business transactions. It may store necessary information to undo these changes at the same time. On confirmation, it may simply discard these "undo", changes, or on cancellation it may apply these "undo" changes. An application can employ such a compensation-based approach or take a conventional "rollback" approach. It is with these properties in mind that we can discuss the Business Transaction Protocol.

The Business Transaction Protocol
B2B interactions may be complex, involving many parties, spanning many different organisations, and potentially lasting for hours or days, e.g., the process of ordering and delivering parts for a computer may involve different suppliers, and may only be considered to have completed once the parts are delivered to their final destination. Most business-to-business collaborative applications require transactional support in order to guarantee consistent outcome and correct execution. These applications often involve long-running computations, loosely coupled systems, and components that do not share data, location, or administration; it is then difficult to incorporate ACID transactions within such architectures. Furthermore, most collaborative business process management systems support complex, long-running processes in which undoing tasks that have already completed may be necessary in order to effect recovery or to choose another acceptable execution path.

For example, an online bookshop may well reserve books for an individual for a specific period of time, but if the individual does not purchase the books within that time period they will be "put back onto the shelf" for others to purchase; to do otherwise could result in the shop never selling a single book. Furthermore, because it is not possible for anyone to have an infinite supply of stock, some examples of online shops may appear to users to reserve items for them, but in fact if other users want to purchase them first they may be allowed to (i.e., the same book may be "reserved" for multiple users concurrently); a user may subsequently find that the item is no longer available, or may have to be ordered especially for them. If these examples were modelled using atomic transactions, then the reservation process would require the book to be locked for the duration of the atomic transaction - it would have to be available, and could not be acquired by (sold to) another user. When the atomic transaction commits, the book will be removed from stock and mailed to the user. However, if a failure occurs during the commitment protocol, the book may remain locked for an indeterminate amount of time (or until manual intervention occurs).

As a result, the use of traditional atomic transactions with strict ACID properties (e.g., systems that implement the JTS specification [SUN99]) is considered too restrictive for many types of applications.

The Organization for the Advancement of Structured Information Standards (OASIS) Business Transaction Protocol (BTP) is a transaction protocol that meets the requirement for Web-based, long-running collaborative business applications. BTP is designed to support applications that are disparate in time, location, and administration and thus require transactional support beyond classical ACID transactions. In short, BTP is a protocol for ensuring consistent outcomes from participating parties in a business transaction.

Note: It is important to realize that the term "transaction" in this sense does not mean atomic transaction, although ACID semantics can be obtained if required.

Consensus of Opinion
In general, a business transaction requires the capability for certain participants to be structured into a consensus group such that all of the members in a grouping have the same result. Different participants within the same business transaction may belong to different consensus groups. The business logic then controls how each group completes. In this way, a business transaction may cause a subset of the groups it naturally creates to perform the work it asks, while asking the other groups to undo the work.

Consider the situation shown in Figure 4, in which a user is booking a holiday, has provisionally reserved a flight ticket and taxi to the airport, and is now looking for travel insurance. The first consensus group holds Flights and Taxi, since neither of these can occur independently. The user may then decide to visit multiple insurance sites (called A and B in this example), and as he goes may reserve the quotes he likes. So, A may quote $50, which is just within budget, but the user may want to try B just in case he can find a cheaper price, without losing the initial quote. If the quote from B is less than that from A, the user may cancel A while confirming both the flights and the insurance from B. Each insurance site may therefore occur within its own consensus group. This is not possible when using ACID transactions.

 

BTP uses a two-phase completion protocol to guarantee atomicity of decisions but does not imply specific implementations. To enforce this distinction, rather than call the second phases of the termination protocol "commit" and "rollback" as is the case in an ACID transaction environment, they are called "confirm" and "cancel" respectively, with the intention of decoupling the phases from any preconceptions of specific backward-compensation implementations.

It's important to stress that although BTP uses a two-phase protocol, it does not imply ACID transactions. How implementations of prepare, confirm, and cancel are provided is a back-end implementation decision. Issues to do with consistency and isolation of data are also back-end choices and not imposed or assumed by BTP. A BTP implementation is primarily concerned with two-phase coordination of abstract entities (participants).

Open-Top Coordination
In a traditional transaction system, the application or user has very few verbs with which to control the transaction. Typically, these are "begin," "commit," and "roll back," corresponding to starting a transaction, committing a transaction, and rolling back a transaction respectively. When an application asks for a transaction to commit, the coordinator will execute the entire two-phase commit protocol, as described earlier, before returning an outcome to the application (what BTP terms a closed-top commit protocol). The elapse time between the execution of the first phase and the second phase is typically milliseconds to seconds, but is entirely under the control of the coordinator.

However, the actual two-phase protocol does not impose any restrictions on the time between executing the first and second phases. Obviously, the longer this period takes, the more chance there is for a failure to occur and the longer (critical) resources remain locked or isolated from other users. This is the reason why most ACID transaction systems attempt to keep this time frame to a minimum and why they do not work well in environments like the Web.

BTP, on the other hand, took the approach of allowing the time between these phases to be set by the application by expanding the verbs available to include explicit control over both phases of the term, i.e., "prepare," "confirm," and "cancel" - what BTP terms an open-top commit protocol. The application has complete control over when it can tell a transaction to prepare and, using whatever business logic is required, it can later determine which transaction(s) to confirm or cancel. This ability is a powerful tool for applications.

Atoms and Cohesions
To address the specific requirements of business transactions, BTP introduced two types of extended transactions, both using the open-top completion protocol:

  • Atom: An atom is the typical way in which "transactional" work performed on Web services is scoped. The outcome of an atom is guaranteed to be consistent such that all enlisted participants will see the same outcome, which will be either to accept (confirm) the work or reject (cancel) it.
  • Cohesion: This type of transaction was introduced in order to relax atomicity and allow for the selection of work to be confirmed or cancelled based on higher-level business rules. Atoms are the typical participants within a cohesion, but unlike an atom, a cohesion may give different outcomes to its participants such that some of them may confirm while the remainder cancel. In essence, the two-phase protocol for a cohesion is parameterized to allow a user to specify precisely which participants to prepare and which to cancel. The strategy underpinning cohesions is that they better model long-running business activities in which services enroll in atoms that represent specific units of work and as the business activity progresses, may encounter conditions that allow it to cancel or prepare these units, with the caveat that it may be many hours or days before the cohesion arrives at its confirm-set: the set of participants that it requires to confirm in order to successfully terminate the business activity. Once the confirm-set has been determined, the cohesion collapses down to being an atom: all members of the confirm-set see the same outcome.

    Looking Ahead
    In my next article, I'll take a closer look at the architecture of BTP and how XML is involved in it. I'll also look at the Web services stack and how BTP is used.

    References

  • BTP: www.oasis-open.org/committees/business-transactions
  • OMG (1995) "CORBAservices: Common Object Services Specification." OMG Document Number 95-3-31. March.
  • Sun Microsystems Inc. (1999) "Java Transaction API 1.0.1 (JTA)," April.
  • Sun Microsystems Inc. (2002) "XML Transactioning API for Java (JAXTX)." www.jcp.org/jsr/detail/156.jsp.
  • More Stories By Mark Little

    Mark Little was Chief Architect, Transactions for Arjuna Technologies Ltd, a UK-based company specialising in the development of reliable middleware that was recently acquired by JBoss, Inc. Before Arjuna, Mark was a Distinguished Engineer/Architect within HP Arjuna Labs in Newcastle upon Tyne, England, where he led the HP-TS and HP-WST teams, developing J2EE and Web services transactions products respectively. He is one of the primary authors of the OMG Activity Service specification and is on the expert group for the same work in J2EE (JSR 95). He is also the specification lead for JSR 156: Java API for XML Transactions. He's on the OTS Revision Task Force and the OASIS Business Transactions Protocol specification. Before joining HP he was for over 10 years a member of the Arjuna team within the University of Newcastle upon Tyne (where he continues to have a Visiting Fellowship). His research within the Arjuna team included replication and transactions support, which include the construction of an OTS/JTS compliant transaction processing system. Mark has published extensively in the Web Services Journal, Java Developer's Journal and other journals and magazines. He is also the co-author of several books including “Java and Transactions for Systems Professionals” and “The J2EE 1.4 Bible.”

    Comments (0)

    Share your thoughts on this story.

    Add your comment
    You must be signed in to add a comment. Sign-in | Register

    In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.