Click here to close now.


Microservices Expo Authors: Jason Bloomberg, Victoria Livschitz, Lori MacVittie, Elizabeth White, Ian Goldsmith

Related Topics: Microservices Expo

Microservices Expo: Article

The Business Transaction Protocol: Transactions for a New Age

The Business Transaction Protocol: Transactions for a New Age

Atomic transactions are a well-known technique for guaranteeing consistency in the presence of failures. The ACID properties of atomic transactions ensure that, even in complex business applications, consistency of state is preserved.

Transactions are best viewed as "short-lived" entities operating in a closely-coupled environment, performing stable state changes to the system; they are less well suited for structuring "long-lived" application functions (e.g., running for hours, days, etc.) and running in a loosely coupled environment like the Web. Long-lived atomic transactions (as typically occur in business-to-business interactions) may reduce the concurrency in the system to an unacceptable level by holding on to resources (e.g., locks) for a long time; further, if such an atomic transaction rolls back, much valuable work already performed could be undone. As a result, there have been various extended transactions models where strict ACID properties can be relaxed in a controlled manner. Until recently, translating these models into the world of Web services had not been attempted. However, the OASIS Business Transactions Protocol, specified by a collaboration of several companies, has tried to address this issue.

With the advent of Web services, the Web is being populated by service providers who wish to take advantage of this large B2B space. However, there are still important security and fault-tolerance considerations that must be addressed. One of these is the fact that the Web frequently suffers from failures that can affect both the performance and consistency of applications that run over it.

Atomic transactions are a well-known technique for guaranteeing consistency in the presence of failures. (Note: I will not use the term transaction in place of atomic transaction since in the B2B space this has different connotations.) The ACID properties of atomic transactions (Atomicity, Consistency, Isolation, Durability) ensure that even in complex business applications consistency of state is preserved, despite concurrent accesses and failures. This is an extremely useful fault-tolerance technique, especially when multiple, possibly remote, resources are involved.

The structuring mechanisms available within traditional atomic transaction systems are sequential and concurrent composition of transactions. These mechanisms are sufficient if an application function can be represented as a single atomic transaction. As Web services evolved as a means to integrate processes and applications at an inter-enterprise level, traditional transaction semantics and protocols have proven inappropriate. Web services-based transactions differ from traditional transactions in that they execute over long periods, they require commitments to the transaction to be "negotiated" at runtime, and isolation levels have to be relaxed.

As a result, there have been various extended transactions models, in which strict ACID properties can be relaxed in a controlled manner. Until recently, translating these models into the world of Web services had not been attempted. However, the OASIS Business Transactions Protocol (BTP), specified by a collaboration of several companies, has tried to address this issue. In this article we'll first consider why traditional atomic transactions are insufficient for long-running B2B activities, and then describe how the BTP protocol has attempted to solve these problems.

Why ACID Transactions Are Too Strong
ACID transactions by themselves are inadequate for structuring long-lived applications. To ensure ACID-ity between multiple participants, a multiphase (typically two) consensus mechanism is required (see Figure 1). During the first (preparation) phase, an individual participant must make durable any state changes that occurred during the scope of the atomic transaction, such that these changes can either be rolled back (undone) or committed later once consensus to the transaction outcome has been determined among all participants, i.e., any original state must not be lost at this point, as the atomic transaction could still roll back. Assuming no failures occurred during the first phase (in which case all participants will be forced to undo their changes), in the second (commitment) phase, participants may "overwrite" the original state with the state made durable during the first phase.


In order to guarantee consensus, a two-phase commit is necessarily a blocking protocol. After returning the phase 1 response, each participant that returned a commit response must remain blocked until it has received the coordinator's phase 2 message telling it what to do. Until they receive this message, any resources used by the participant are unavailable for use by other atomic transactions, since to do so may result in non-ACID behavior. If the coordinator fails before delivery of the second phase message these resources remain blocked until it recovers. In addition, if a participant fails after phase 1, but before the coordinator can deliver its final commit decision, the atomic transaction cannot be completed until the participant recovers: all participants must see both phases of the commit protocol in order to guarantee ACID semantics. There is no implied time limit between a coordinator sending the first phase message of the commit protocol and it sending the second, commit phase message; there could be seconds or hours between them.

Therefore, structuring certain activities from long-running atomic transactions can reduce the amount of concurrency within an application or (in the event of failures) require work to be performed again. For example, there are certain classes of application where it is known that resources acquired within an atomic transaction can be released "early," rather than having to wait until the atomic transaction terminates; in the event of the atomic transaction rolling back, however, certain compensation activities may be necessary to restore the system to a consistent state. Such compensation activities (which may perform forward or backward recovery) will typically be application specific, may not be necessary at all, or may be more efficiently dealt with by the application. For example, long-running activities can be structured as many independent, short-duration atomic transactions, to form a "logical" long-running transaction. This structure allows an activity to acquire and use resources for only the required duration of this long-running activity. In Figure 2 an application activity (shown by the dotted ellipse) has been split into many different, coordinated, short-duration atomic transactions. Assume that the application activity is concerned with booking a taxi (t1), reserving a table at a restaurant (t2), reserving a seat at the theater (t3), booking a room at a hotel (t4), and so on. If all of these operations were performed as a single atomic transaction, then resources acquired during t1 would not be released until the atomic transaction has terminated. If subsequent activities t2, t3, etc., do not require those resources, then they will be needlessly unavailable to other clients.


However, if failures and concurrent access occur during the lifetime of these individual transactional activities, then the behavior of the entire "logical long-running transaction" may not possess ACID properties. Therefore, some form of (application-specific) compensation may be required to attempt to return the state of the system to consistency. For example, let's assume that t4 aborts. Further assume that the application can continue to make forward progress, but in order to do so must now undo some state changes made prior to the start of t4 (by t1, t2, or t3). New activities are started; tc1 is a compensation activity that will attempt to undo state changes performed by, say, t2 and t3, which will continue the application once tc1 has completed. tc5' and tc6' are new activities that continue after compensation, e.g. since it was not possible to reserve the theater, restaurant, and hotel, it is decided to book tickets at the cinema. Obviously, other forms of composition are possible.


Properties of a Web Service-Based Transaction
The fundamental question addressed here is what properties must a transaction model possess in order to support business-to-business interactions? To begin to answer that, we need to understand what we mean by a business transaction.

A business relationship is any distributed state maintained by two or more parties and is subject to some contractual constraints previously agreed to by those parties. A business transaction can therefore be considered as a consistent change in the state of a business relationship between parties. Each party in a business transaction holds its own application state corresponding to the business relationship with other parties in that transaction. During the course of a business transaction, this state may change.

In the Web services domain, information about business transactions is communicated in XML documents. However, how those documents are exchanged by the different parties involved (e.g., e-mail or HTTP) may be a function of the environment, type of business relationship, or other business or logistical factors. Therefore, mandating a specific XML carrier protocol may be too restrictive.

Since business relationships imply a level of value to the parties associated by those relationships, achieving some level of consensus among these parties is important. Not all participants within a particular business transaction have to see the same outcome; a specific transaction may possess multiple consensus groups.

In addition to understanding the outcomes, a participant within a business transaction may need to support provisional or tentative state changes during the course of the transaction. Such parties must also support the completion of a business transaction, either through confirmation (final effect) or cancellation (counter-effect). In general, what it means to confirm or cancel work done within a business transaction will be for the participant to determine.

For example, an application may choose to perform changes as provisional effects and make them visible to other business transactions. It may store necessary information to undo these changes at the same time. On confirmation, it may simply discard these "undo", changes, or on cancellation it may apply these "undo" changes. An application can employ such a compensation-based approach or take a conventional "rollback" approach. It is with these properties in mind that we can discuss the Business Transaction Protocol.

The Business Transaction Protocol
B2B interactions may be complex, involving many parties, spanning many different organisations, and potentially lasting for hours or days, e.g., the process of ordering and delivering parts for a computer may involve different suppliers, and may only be considered to have completed once the parts are delivered to their final destination. Most business-to-business collaborative applications require transactional support in order to guarantee consistent outcome and correct execution. These applications often involve long-running computations, loosely coupled systems, and components that do not share data, location, or administration; it is then difficult to incorporate ACID transactions within such architectures. Furthermore, most collaborative business process management systems support complex, long-running processes in which undoing tasks that have already completed may be necessary in order to effect recovery or to choose another acceptable execution path.

For example, an online bookshop may well reserve books for an individual for a specific period of time, but if the individual does not purchase the books within that time period they will be "put back onto the shelf" for others to purchase; to do otherwise could result in the shop never selling a single book. Furthermore, because it is not possible for anyone to have an infinite supply of stock, some examples of online shops may appear to users to reserve items for them, but in fact if other users want to purchase them first they may be allowed to (i.e., the same book may be "reserved" for multiple users concurrently); a user may subsequently find that the item is no longer available, or may have to be ordered especially for them. If these examples were modelled using atomic transactions, then the reservation process would require the book to be locked for the duration of the atomic transaction - it would have to be available, and could not be acquired by (sold to) another user. When the atomic transaction commits, the book will be removed from stock and mailed to the user. However, if a failure occurs during the commitment protocol, the book may remain locked for an indeterminate amount of time (or until manual intervention occurs).

As a result, the use of traditional atomic transactions with strict ACID properties (e.g., systems that implement the JTS specification [SUN99]) is considered too restrictive for many types of applications.

The Organization for the Advancement of Structured Information Standards (OASIS) Business Transaction Protocol (BTP) is a transaction protocol that meets the requirement for Web-based, long-running collaborative business applications. BTP is designed to support applications that are disparate in time, location, and administration and thus require transactional support beyond classical ACID transactions. In short, BTP is a protocol for ensuring consistent outcomes from participating parties in a business transaction.

Note: It is important to realize that the term "transaction" in this sense does not mean atomic transaction, although ACID semantics can be obtained if required.

Consensus of Opinion
In general, a business transaction requires the capability for certain participants to be structured into a consensus group such that all of the members in a grouping have the same result. Different participants within the same business transaction may belong to different consensus groups. The business logic then controls how each group completes. In this way, a business transaction may cause a subset of the groups it naturally creates to perform the work it asks, while asking the other groups to undo the work.

Consider the situation shown in Figure 4, in which a user is booking a holiday, has provisionally reserved a flight ticket and taxi to the airport, and is now looking for travel insurance. The first consensus group holds Flights and Taxi, since neither of these can occur independently. The user may then decide to visit multiple insurance sites (called A and B in this example), and as he goes may reserve the quotes he likes. So, A may quote $50, which is just within budget, but the user may want to try B just in case he can find a cheaper price, without losing the initial quote. If the quote from B is less than that from A, the user may cancel A while confirming both the flights and the insurance from B. Each insurance site may therefore occur within its own consensus group. This is not possible when using ACID transactions.


BTP uses a two-phase completion protocol to guarantee atomicity of decisions but does not imply specific implementations. To enforce this distinction, rather than call the second phases of the termination protocol "commit" and "rollback" as is the case in an ACID transaction environment, they are called "confirm" and "cancel" respectively, with the intention of decoupling the phases from any preconceptions of specific backward-compensation implementations.

It's important to stress that although BTP uses a two-phase protocol, it does not imply ACID transactions. How implementations of prepare, confirm, and cancel are provided is a back-end implementation decision. Issues to do with consistency and isolation of data are also back-end choices and not imposed or assumed by BTP. A BTP implementation is primarily concerned with two-phase coordination of abstract entities (participants).

Open-Top Coordination
In a traditional transaction system, the application or user has very few verbs with which to control the transaction. Typically, these are "begin," "commit," and "roll back," corresponding to starting a transaction, committing a transaction, and rolling back a transaction respectively. When an application asks for a transaction to commit, the coordinator will execute the entire two-phase commit protocol, as described earlier, before returning an outcome to the application (what BTP terms a closed-top commit protocol). The elapse time between the execution of the first phase and the second phase is typically milliseconds to seconds, but is entirely under the control of the coordinator.

However, the actual two-phase protocol does not impose any restrictions on the time between executing the first and second phases. Obviously, the longer this period takes, the more chance there is for a failure to occur and the longer (critical) resources remain locked or isolated from other users. This is the reason why most ACID transaction systems attempt to keep this time frame to a minimum and why they do not work well in environments like the Web.

BTP, on the other hand, took the approach of allowing the time between these phases to be set by the application by expanding the verbs available to include explicit control over both phases of the term, i.e., "prepare," "confirm," and "cancel" - what BTP terms an open-top commit protocol. The application has complete control over when it can tell a transaction to prepare and, using whatever business logic is required, it can later determine which transaction(s) to confirm or cancel. This ability is a powerful tool for applications.

Atoms and Cohesions
To address the specific requirements of business transactions, BTP introduced two types of extended transactions, both using the open-top completion protocol:

  • Atom: An atom is the typical way in which "transactional" work performed on Web services is scoped. The outcome of an atom is guaranteed to be consistent such that all enlisted participants will see the same outcome, which will be either to accept (confirm) the work or reject (cancel) it.
  • Cohesion: This type of transaction was introduced in order to relax atomicity and allow for the selection of work to be confirmed or cancelled based on higher-level business rules. Atoms are the typical participants within a cohesion, but unlike an atom, a cohesion may give different outcomes to its participants such that some of them may confirm while the remainder cancel. In essence, the two-phase protocol for a cohesion is parameterized to allow a user to specify precisely which participants to prepare and which to cancel. The strategy underpinning cohesions is that they better model long-running business activities in which services enroll in atoms that represent specific units of work and as the business activity progresses, may encounter conditions that allow it to cancel or prepare these units, with the caveat that it may be many hours or days before the cohesion arrives at its confirm-set: the set of participants that it requires to confirm in order to successfully terminate the business activity. Once the confirm-set has been determined, the cohesion collapses down to being an atom: all members of the confirm-set see the same outcome.

    Looking Ahead
    In my next article, I'll take a closer look at the architecture of BTP and how XML is involved in it. I'll also look at the Web services stack and how BTP is used.


  • BTP:
  • OMG (1995) "CORBAservices: Common Object Services Specification." OMG Document Number 95-3-31. March.
  • Sun Microsystems Inc. (1999) "Java Transaction API 1.0.1 (JTA)," April.
  • Sun Microsystems Inc. (2002) "XML Transactioning API for Java (JAXTX)."
  • More Stories By Mark Little

    Mark Little was Chief Architect, Transactions for Arjuna Technologies Ltd, a UK-based company specialising in the development of reliable middleware that was recently acquired by JBoss, Inc. Before Arjuna, Mark was a Distinguished Engineer/Architect within HP Arjuna Labs in Newcastle upon Tyne, England, where he led the HP-TS and HP-WST teams, developing J2EE and Web services transactions products respectively. He is one of the primary authors of the OMG Activity Service specification and is on the expert group for the same work in J2EE (JSR 95). He is also the specification lead for JSR 156: Java API for XML Transactions. He's on the OTS Revision Task Force and the OASIS Business Transactions Protocol specification. Before joining HP he was for over 10 years a member of the Arjuna team within the University of Newcastle upon Tyne (where he continues to have a Visiting Fellowship). His research within the Arjuna team included replication and transactions support, which include the construction of an OTS/JTS compliant transaction processing system. Mark has published extensively in the Web Services Journal, Java Developer's Journal and other journals and magazines. He is also the co-author of several books including “Java and Transactions for Systems Professionals” and “The J2EE 1.4 Bible.”

    Comments (0)

    Share your thoughts on this story.

    Add your comment
    You must be signed in to add a comment. Sign-in | Register

    In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

    @MicroservicesExpo Stories
    DevOps Summit at Cloud Expo 2014 Silicon Valley was a terrific event for us. The Qubell booth was crowded on all three days. We ran demos every 30 minutes with folks lining up to get a seat and usually standing around. It was great to meet and talk to over 500 people! My keynote was well received and so was Stan's joint presentation with RingCentral on Devops for BigData. I also participated in two Power Panels – ‘Women in Technology’ and ‘Why DevOps Is Even More Important than You Think,’ both ...
    Somebody call the buzzword police: we have a serious case of microservices-washing in progress. The term “microservices-washing” is derived from “whitewashing,” meaning to hide some inconvenient truth with bluster and nonsense. We saw plenty of cloudwashing a few years ago, as vendors and enterprises alike pretended what they were doing was cloud, even though it wasn’t. Today, the hype around microservices has led to the same kind of obfuscation, as vendors and enterprise technologists alike ar...
    Application availability is not just the measure of “being up”. Many apps can claim that status. Technically they are running and responding to requests, but at a rate which users would certainly interpret as being down. That’s because excessive load times can (and will be) interpreted as “not available.” That’s why it’s important to view ensuring application availability as requiring attention to all its composite parts: scalability, performance, and security.
    I’ve been thinking a bit about microservices (μServices) recently. My immediate reaction is to think: “Isn’t this just yet another new term for the same stuff, Web Services->SOA->APIs->Microservices?” Followed shortly by the thought, “well yes it is, but there are some important differences/distinguishing factors.” Microservices is an evolutionary paradigm born out of the need for simplicity (i.e., get away from the ESB) and alignment with agile (think DevOps) and scalable (think Containerizati...
    The cloud has reached mainstream IT. Those 18.7 million data centers out there (server closets to corporate data centers to colocation deployments) are moving to the cloud. In his session at 17th Cloud Expo, Achim Weiss, CEO & co-founder of ProfitBricks, will share how two companies – one in the U.S. and one in Germany – are achieving their goals with cloud infrastructure. More than a case study, he will share the details of how they prioritized their cloud computing infrastructure deployments ...
    Clearly the way forward is to move to cloud be it bare metal, VMs or containers. One aspect of the current public clouds that is slowing this cloud migration is cloud lock-in. Every cloud vendor is trying to make it very difficult to move out once a customer has chosen their cloud. In his session at 17th Cloud Expo, Naveen Nimmu, CEO of Clouber, Inc., will advocate that making the inter-cloud migration as simple as changing airlines would help the entire industry to quickly adopt the cloud wit...
    Apps and devices shouldn't stop working when there's limited or no network connectivity. Learn how to bring data stored in a cloud database to the edge of the network (and back again) whenever an Internet connection is available. In his session at 17th Cloud Expo, Bradley Holt, Developer Advocate at IBM Cloud Data Services, will demonstrate techniques for replicating cloud databases with devices in order to build offline-first mobile or Internet of Things (IoT) apps that can provide a better, ...
    “All our customers are looking at the cloud ecosystem as an important part of their overall product strategy. Some see it evolve as a multi-cloud / hybrid cloud strategy, while others are embracing all forms of cloud offerings like PaaS, IaaS and SaaS in their solutions,” noted Suhas Joshi, Vice President – Technology, at Harbinger Group, in this exclusive Q&A with Cloud Expo Conference Chair Roger Strukhoff.
    Jack Welch, the former CEO of GE once said - “If the rate of change on the outside is happening faster than the rate of change on the inside, the end is in sight.” This rings truer than ever – especially because business success is inextricably associated with those organizations who’ve got really good at delivering high-quality software innovations – innovations that disrupt existing markets and carve out new ones. Like the businesses they’ve helped digitally transform, DevOps teams and Conti...
    This week, the team assembled in NYC for @Cloud Expo 2015 and @ThingsExpo 2015. For the past four years, this has been a must-attend event for MetraTech. We were happy to once again join industry visionaries, colleagues, customers and even competitors to share and explore the ways in which the Internet of Things (IoT) will impact our industry. Over the course of the show, we discussed the types of challenges we will collectively need to solve to capitalize on the opportunity IoT presents.
    Docker is hot. However, as Docker container use spreads into more mature production pipelines, there can be issues about control of Docker images to ensure they are production-ready. Is a promotion-based model appropriate to control and track the flow of Docker images from development to production? In his session at DevOps Summit, Fred Simon, Co-founder and Chief Architect of JFrog, will demonstrate how to implement a promotion model for Docker images using a binary repository, and then show h...
    All we need to do is have our teams self-organize, and behold! Emergent design and/or architecture springs up out of the nothingness! If only it were that easy, right? I follow in the footsteps of so many people who have long wondered at the meanings of such simple words, as though they were dogma from on high. Emerge? Self-organizing? Profound, to be sure. But what do we really make of this sentence?
    SYS-CON Events announced today that HPM Networks will exhibit at the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. For 20 years, HPM Networks has been integrating technology solutions that solve complex business challenges. HPM Networks has designed solutions for both SMB and enterprise customers throughout the San Francisco Bay Area.
    DevOps is speeding towards the IT world like a freight train and the hype around it is deafening. There is no reason to be afraid of this change as it is the natural reaction to the agile movement that revolutionized development just a few years ago. By definition, DevOps is the natural alignment of IT performance to business profitability. The relevance of this has yet to be quantified but it has been suggested that the route to the CEO’s chair will come from the IT leaders that successfully ma...
    Mobile has become standard in the enterprise with smartphones and tablets common in the workplace. Anywhere, anytime access to company systems is expected and systems must work flawlessly on these devices! This demand is requiring that corporate IT departments figure out the best mobile strategy to follow. This eBook looks at how to kick start your mobile application strategy.
    Even though you are running an agile development process, that doesn’t necessarily mean that your performance testing is being conducted in a truly agile way. Saving performance testing for a “final sprint” before release still treats it like a waterfall development step, with all the cost and risk that comes with that. In this post, we will show you how to make load testing happen early and often by putting SLAs on the agile task board.
    Today, we are in the middle of a paradigm shift as we move from managing applications on VMs and containers to embracing everything that the cloud and XaaS (Everything as a Service) has to offer. In his session at 17th Cloud Expo, Kevin Hoffman, Advisory Solutions Architect at Pivotal Cloud Foundry, will provide an overview of 12-factor apps and migrating enterprise apps to the cloud. Kevin Hoffman is an Advisory Solutions Architect for Pivotal Cloud Foundry, and has spent the past 20 years b...
    Go ahead. Name a cloud environment that doesn't include load balancing as the key enabler of elastic scalability. I've got coffee... so it's good, take your time... Exactly. Load balancing - whether implemented as traditional high availability pairs or clustering - provides the means by which applications (and infrastructure, in many cases) scale horizontally. It is load balancing that is at the heart of elastic scalability models, and that provides a means to ensure availability and even imp...
    SYS-CON Events announced today that HPM Networks will exhibit at the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. For 20 years, HPM Networks has been integrating technology solutions that solve complex business challenges. HPM Networks has designed solutions for both SMB and enterprise customers throughout the San Francisco Bay Area.
    SYS-CON Events announced today that has been named a "Bronze Sponsor" of SYS-CON's @DevOpsSummit Silicon Valley, which will take place November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. provides open-source software ELK turned into a log analytics platform that is simple, infinitely- scalable, highly available, and secure.