The Business Transaction Protocol: Transactions for a New Age

Atomic transactions are a well-known technique for guaranteeing consistency in the presence of failures. The ACID properties of atomic transactions ensure that, even in complex business applications, consistency of state is preserved.

Transactions are best viewed as "short-lived" entities operating in a closely coupled environment, performing stable state changes to the system; they are less well suited for structuring "long-lived" application functions (e.g., running for hours, days, etc.) and running in a loosely coupled environment like the Web. Long-lived atomic transactions (as typically occur in business-to-business interactions) may reduce the concurrency in the system to an unacceptable level by holding on to resources (e.g., locks) for a long time; further, if such an atomic transaction rolls back, much valuable work already performed could be undone. As a result, there have been various extended transaction models in which strict ACID properties can be relaxed in a controlled manner. Until recently, translating these models into the world of Web services had not been attempted. However, the OASIS Business Transaction Protocol, specified by a collaboration of several companies, has tried to address this issue.

Introduction
With the advent of Web services, the Web is being populated by service providers who wish to take advantage of this large B2B space. However, there are still important security and fault-tolerance considerations that must be addressed. One of these is the fact that the Web frequently suffers from failures that can affect both the performance and consistency of applications that run over it.

Atomic transactions are a well-known technique for guaranteeing consistency in the presence of failures. (Note: I will not use the term transaction in place of atomic transaction since in the B2B space this has different connotations.) The ACID properties of atomic transactions (Atomicity, Consistency, Isolation, Durability) ensure that even in complex business applications consistency of state is preserved, despite concurrent accesses and failures. This is an extremely useful fault-tolerance technique, especially when multiple, possibly remote, resources are involved.

The structuring mechanisms available within traditional atomic transaction systems are sequential and concurrent composition of transactions. These mechanisms are sufficient if an application function can be represented as a single atomic transaction. As Web services evolved as a means to integrate processes and applications at an inter-enterprise level, traditional transaction semantics and protocols have proven inappropriate. Web services-based transactions differ from traditional transactions in that they execute over long periods, they require commitments to the transaction to be "negotiated" at runtime, and isolation levels have to be relaxed.

As a result, there have been various extended transaction models, in which strict ACID properties can be relaxed in a controlled manner. Until recently, translating these models into the world of Web services had not been attempted. However, the OASIS Business Transaction Protocol (BTP), specified by a collaboration of several companies, has tried to address this issue. In this article we'll first consider why traditional atomic transactions are insufficient for long-running B2B activities, and then describe how BTP has attempted to solve these problems.

Why ACID Transactions Are Too Strong
ACID transactions by themselves are inadequate for structuring long-lived applications. To ensure ACID-ity between multiple participants, a multiphase (typically two) consensus mechanism is required (see Figure 1). During the first (preparation) phase, an individual participant must make durable any state changes that occurred during the scope of the atomic transaction, such that these changes can either be rolled back (undone) or committed later, once consensus on the transaction outcome has been determined among all participants. That is, any original state must not be lost at this point, because the atomic transaction could still roll back. If no failures occurred during the first phase (a failure there forces all participants to undo their changes), then in the second (commitment) phase participants may "overwrite" the original state with the state made durable during the first phase.

 

In order to guarantee consensus, a two-phase commit is necessarily a blocking protocol. After returning the phase 1 response, each participant that returned a commit response must remain blocked until it has received the coordinator's phase 2 message telling it what to do. Until it receives this message, any resources used by the participant are unavailable for use by other atomic transactions, since releasing them could result in non-ACID behavior. If the coordinator fails before delivery of the second phase message, these resources remain blocked until it recovers. In addition, if a participant fails after phase 1, but before the coordinator can deliver its final commit decision, the atomic transaction cannot be completed until the participant recovers: all participants must see both phases of the commit protocol in order to guarantee ACID semantics. There is no implied time limit between a coordinator sending the first phase message of the commit protocol and it sending the second, commit phase message; there could be seconds or hours between them.
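The two phases described above can be sketched in a few lines of Python. This is a minimal, in-memory illustration only: the Participant class, the Vote enum, and the two_phase_commit function are hypothetical names invented for this sketch, and it leaves out durable logging, failure recovery, and networking.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


class Participant:
    """Hypothetical participant holding one value under transaction control."""

    def __init__(self, name, value, can_prepare=True):
        self.name = name
        self.value = value          # original (committed) state
        self.pending = None         # state recorded during phase 1
        self.locked = False         # resources stay blocked while prepared
        self.can_prepare = can_prepare

    def prepare(self, new_value):
        # Phase 1: record the new state without losing the original.
        if not self.can_prepare:
            return Vote.ROLLBACK
        self.pending = new_value
        self.locked = True
        return Vote.COMMIT

    def commit(self):
        # Phase 2: overwrite the original state with the prepared state.
        if self.pending is not None:
            self.value = self.pending
        self.pending, self.locked = None, False

    def rollback(self):
        # Phase 2: discard the prepared state; the original is untouched.
        self.pending, self.locked = None, False


def two_phase_commit(updates):
    """Run both phases over (participant, new_value) pairs; True on commit."""
    votes = [p.prepare(v) for p, v in updates]
    decision = all(vote is Vote.COMMIT for vote in votes)
    for p, _ in updates:
        if decision:
            p.commit()
        else:
            p.rollback()
    return decision
```

Note that every prepared participant keeps locked set to True until the second loop reaches it; in a real system that window is where the blocking behavior discussed above arises.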

Therefore, structuring certain activities from long-running atomic transactions can reduce the amount of concurrency within an application or (in the event of failures) require work to be performed again. For example, there are certain classes of application where it is known that resources acquired within an atomic transaction can be released "early," rather than having to wait until the atomic transaction terminates; in the event of the atomic transaction rolling back, however, certain compensation activities may be necessary to restore the system to a consistent state. Such compensation activities (which may perform forward or backward recovery) will typically be application specific, may not be necessary at all, or may be more efficiently dealt with by the application.

For example, long-running activities can be structured as many independent, short-duration atomic transactions, to form a "logical" long-running transaction. This structure allows an activity to acquire and use resources for only the required duration of this long-running activity. In Figure 2 an application activity (shown by the dotted ellipse) has been split into many different, coordinated, short-duration atomic transactions. Assume that the application activity is concerned with booking a taxi (t1), reserving a table at a restaurant (t2), reserving a seat at the theater (t3), booking a room at a hotel (t4), and so on. If all of these operations were performed as a single atomic transaction, then resources acquired during t1 would not be released until the atomic transaction has terminated. If subsequent activities t2, t3, etc., do not require those resources, then they will be needlessly unavailable to other clients.

 

However, if failures and concurrent access occur during the lifetime of these individual transactional activities, then the behavior of the entire "logical long-running transaction" may not possess ACID properties. Therefore, some form of (application-specific) compensation may be required to attempt to return the state of the system to consistency. For example, let's assume that t4 aborts. Further assume that the application can continue to make forward progress, but in order to do so must now undo some state changes made prior to the start of t4 (by t1, t2, or t3). New activities are started: tc1 is a compensation activity that will attempt to undo state changes performed by, say, t2 and t3, and the application continues once tc1 has completed. tc5' and tc6' are new activities that continue after compensation; e.g., since it was not possible to reserve the theater, restaurant, and hotel, it is decided to book tickets at the cinema. Obviously, other forms of composition are possible.
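As a rough illustration of structuring an activity from short steps with compensation, here is a minimal Python sketch. The run_activity function and its tuple layout are assumptions made for this example. It also adopts one simple policy (undo every completed step, in reverse order, on the first failure), whereas the scenario above compensates only some steps and then continues forward.

```python
def run_activity(steps):
    """Run short, independent steps in order.

    `steps` is a list of (name, action, compensation) tuples. Each action
    returns True on success. On the first failure, the compensations for
    the steps already completed run in reverse order.
    Returns (completed, compensated) lists of step names.
    """
    completed, compensated = [], []
    undo_stack = []
    for name, action, compensation in steps:
        if action():
            # The step's own short transaction has finished, so its
            # resources are already released at this point.
            completed.append(name)
            undo_stack.append((name, compensation))
        else:
            while undo_stack:
                done_name, undo = undo_stack.pop()
                undo()
                compensated.append(done_name)
            break
    return completed, compensated
```

A caller would pass one tuple per booking (taxi, restaurant, and so on), with the compensation being whatever application-specific action cancels that booking.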

 

Properties of a Web Service-Based Transaction
The fundamental question addressed here is what properties must a transaction model possess in order to support business-to-business interactions? To begin to answer that, we need to understand what we mean by a business transaction.

A business relationship is any distributed state that is maintained by two or more parties and is subject to some contractual constraints previously agreed to by those parties. A business transaction can therefore be considered as a consistent change in the state of a business relationship between parties. Each party in a business transaction holds its own application state corresponding to the business relationship with other parties in that transaction. During the course of a business transaction, this state may change.

In the Web services domain, information about business transactions is communicated in XML documents. However, how those documents are exchanged by the different parties involved (e.g., e-mail or HTTP) may be a function of the environment, type of business relationship, or other business or logistical factors. Therefore, mandating a specific XML carrier protocol may be too restrictive.

Since business relationships imply a level of value to the parties associated by those relationships, achieving some level of consensus among these parties is important. Not all participants within a particular business transaction have to see the same outcome; a specific transaction may possess multiple consensus groups.

In addition to understanding the outcomes, a participant within a business transaction may need to support provisional or tentative state changes during the course of the transaction. Such parties must also support the completion of a business transaction, either through confirmation (final effect) or cancellation (counter-effect). In general, what it means to confirm or cancel work done within a business transaction will be for the participant to determine.

For example, an application may choose to perform changes as provisional effects and make them visible to other business transactions. It may store the information necessary to undo these changes at the same time. On confirmation, it may simply discard these "undo" changes; on cancellation, it may apply them. An application can employ such a compensation-based approach or take a conventional "rollback" approach. It is with these properties in mind that we can discuss the Business Transaction Protocol.
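The compensation-based approach just described might look like the following Python sketch. The StockParticipant class and its method names are hypothetical and chosen only for illustration; a real participant would also need to persist its undo information.

```python
class StockParticipant:
    """Hypothetical participant that applies changes provisionally."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self.undo_log = {}   # transaction id -> list of (item, quantity)

    def reserve(self, tx_id, item, quantity):
        # Provisional effect: applied at once and visible to other work.
        if self.stock.get(item, 0) < quantity:
            return False
        self.stock[item] -= quantity
        self.undo_log.setdefault(tx_id, []).append((item, quantity))
        return True

    def confirm(self, tx_id):
        # Final effect: the change stands, so the undo information is dropped.
        self.undo_log.pop(tx_id, None)

    def cancel(self, tx_id):
        # Counter-effect: apply the stored undo information.
        for item, quantity in self.undo_log.pop(tx_id, []):
            self.stock[item] += quantity
```

Because confirm discards the undo information, a later cancel for the same transaction id has nothing left to apply.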

The Business Transaction Protocol
B2B interactions may be complex, involving many parties, spanning many different organisations, and potentially lasting for hours or days. For example, the process of ordering and delivering parts for a computer may involve different suppliers, and may only be considered to have completed once the parts are delivered to their final destination. Most business-to-business collaborative applications require transactional support in order to guarantee a consistent outcome and correct execution. These applications often involve long-running computations, loosely coupled systems, and components that do not share data, location, or administration; it is then difficult to incorporate ACID transactions within such architectures. Furthermore, most collaborative business process management systems support complex, long-running processes in which undoing tasks that have already completed may be necessary in order to effect recovery or to choose another acceptable execution path.

For example, an online bookshop may well reserve books for an individual for a specific period of time, but if the individual does not purchase the books within that time period they will be "put back onto the shelf" for others to purchase; to do otherwise could result in the shop never selling a single book. Furthermore, because it is not possible for anyone to have an infinite supply of stock, some examples of online shops may appear to users to reserve items for them, but in fact if other users want to purchase them first they may be allowed to (i.e., the same book may be "reserved" for multiple users concurrently); a user may subsequently find that the item is no longer available, or may have to be ordered especially for them. If these examples were modelled using atomic transactions, then the reservation process would require the book to be locked for the duration of the atomic transaction - it would have to be available, and could not be acquired by (sold to) another user. When the atomic transaction commits, the book will be removed from stock and mailed to the user. However, if a failure occurs during the commitment protocol, the book may remain locked for an indeterminate amount of time (or until manual intervention occurs).
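To make the first bookshop scenario concrete, the sketch below holds a copy for a fixed period and returns it to stock if it is not purchased in time. The Bookshop class, the hold_seconds parameter, and the explicit now argument (used instead of a real clock so the behavior is easy to follow) are all assumptions of this example.

```python
class Bookshop:
    """Hypothetical shop that holds a book for a limited time."""

    def __init__(self, copies, hold_seconds):
        self.copies = copies
        self.hold_seconds = hold_seconds
        self.holds = {}   # customer -> expiry time

    def _expire(self, now):
        # Put expired reservations "back onto the shelf".
        expired = [c for c, expiry in self.holds.items() if expiry <= now]
        for customer in expired:
            del self.holds[customer]
            self.copies += 1

    def reserve(self, customer, now):
        self._expire(now)
        if self.copies == 0 or customer in self.holds:
            return False
        self.copies -= 1
        self.holds[customer] = now + self.hold_seconds
        return True

    def purchase(self, customer, now):
        self._expire(now)
        # The sale succeeds only while the hold is still in force.
        return self.holds.pop(customer, None) is not None
```

Unlike a lock held by an atomic transaction, the hold here lapses on its own, so a customer who never returns cannot keep the copy unavailable indefinitely.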

As a result, the use of traditional atomic transactions with strict ACID properties (e.g., systems that implement the JTS specification [SUN99]) is considered too restrictive for many types of applications.

The Organization for the Advancement of Structured Information Standards (OASIS) Business Transaction Protocol (BTP) is a transaction protocol that meets the requirement for Web-based, long-running collaborative business applications. BTP is designed to support applications that are disparate in time, location, and administration and thus require transactional support beyond classical ACID transactions. In short, BTP is a protocol for ensuring consistent outcomes from participating parties in a business transaction.

Note: It is important to realize that the term "transaction" in this sense does not mean atomic transaction, although ACID semantics can be obtained if required.

Consensus of Opinion
In general, a business transaction requires the capability for certain participants to be structured into a consensus group such that all of the members in a grouping have the same result. Different participants within the same business transaction may belong to different consensus groups. The business logic then controls how each group completes. In this way, a business transaction may cause a subset of the groups it naturally creates to perform the work it asks, while asking the other groups to undo the work.

Consider the situation shown in Figure 4, in which a user is booking a holiday, has provisionally reserved a flight ticket and taxi to the airport, and is now looking for travel insurance. The first consensus group holds Flights and Taxi, since neither of these can occur independently. The user may then decide to visit multiple insurance sites (called A and B in this example), and as he goes may reserve the quotes he likes. So, A may quote $50, which is just within budget, but the user may want to try B just in case he can find a cheaper price, without losing the initial quote. If the quote from B is less than that from A, the user may cancel A while confirming both the flights and the insurance from B. Each insurance site may therefore occur within its own consensus group. This is not possible when using ACID transactions.
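The holiday example can be expressed as a small Python sketch in which every member of a consensus group receives the same outcome, while different groups may receive different ones. The decide_outcomes function, the group names, and the price for insurer B are illustrative assumptions; only the $50 quote from A comes from the example above.

```python
def decide_outcomes(groups, chosen):
    """Map each participant to "confirm" or "cancel".

    `groups` maps a consensus group name to its participants; every member
    of a group gets the same outcome. `chosen` is the set of group names
    that the business logic wants confirmed.
    """
    return {
        participant: ("confirm" if name in chosen else "cancel")
        for name, members in groups.items()
        for participant in members
    }


groups = {
    "travel": ["flight", "taxi"],   # neither can occur independently
    "insurer_a": ["quote_a"],
    "insurer_b": ["quote_b"],
}
quotes = {"insurer_a": 50, "insurer_b": 45}   # B's price is made up
cheapest = min(quotes, key=quotes.get)
outcomes = decide_outcomes(groups, {"travel", cheapest})
```

With these figures the flight, taxi, and quote from B are confirmed and the quote from A is cancelled.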

 

BTP uses a two-phase completion protocol to guarantee atomicity of decisions but does not imply specific implementations. To enforce this distinction, rather than call the second-phase operations of the termination protocol "commit" and "rollback," as is the case in an ACID transaction environment, they are called "confirm" and "cancel" respectively, with the intention of decoupling the phases from any preconceptions of specific backward-compensation implementations.

It's important to stress that although BTP uses a two-phase protocol, it does not imply ACID transactions. How implementations of prepare, confirm, and cancel are provided is a back-end implementation decision. Issues to do with consistency and isolation of data are also back-end choices and not imposed or assumed by BTP. A BTP implementation is primarily concerned with two-phase coordination of abstract entities (participants).

Open-Top Coordination
In a traditional transaction system, the application or user has very few verbs with which to control the transaction. Typically, these are "begin," "commit," and "roll back," corresponding to starting a transaction, committing a transaction, and rolling back a transaction respectively. When an application asks for a transaction to commit, the coordinator will execute the entire two-phase commit protocol, as described earlier, before returning an outcome to the application (what BTP terms a closed-top commit protocol). The elapsed time between the execution of the first phase and the second phase is typically milliseconds to seconds, but is entirely under the control of the coordinator.

However, the actual two-phase protocol does not impose any restrictions on the time between executing the first and second phases. Obviously, the longer this period takes, the more chance there is for a failure to occur and the longer (critical) resources remain locked or isolated from other users. This is the reason why most ACID transaction systems attempt to keep this time frame to a minimum and why they do not work well in environments like the Web.

BTP, on the other hand, took the approach of allowing the time between these phases to be set by the application by expanding the verbs available to include explicit control over both phases of the termination protocol, i.e., "prepare," "confirm," and "cancel" - what BTP terms an open-top commit protocol. The application has complete control over when it can tell a transaction to prepare and, using whatever business logic is required, it can later determine which transaction(s) to confirm or cancel. This ability is a powerful tool for applications.
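The difference between the two styles can be sketched as follows. OpenTopCoordinator and Recorder are hypothetical classes written for this article and are not the BTP API; in particular, requiring a successful prepare before confirm is a simplification of this sketch.

```python
class OpenTopCoordinator:
    """Hypothetical coordinator exposing each phase as a separate verb."""

    def __init__(self, participants):
        self.participants = participants   # objects with prepare/confirm/cancel
        self.state = "active"

    def prepare(self):
        # Phase 1 only; the caller decides when (and whether) phase 2 runs.
        ok = all([p.prepare() for p in self.participants])
        self.state = "prepared" if ok else "cancelled"
        if not ok:
            for p in self.participants:
                p.cancel()
        return ok

    def confirm(self):
        if self.state != "prepared":
            raise RuntimeError("confirm requires a successful prepare")
        for p in self.participants:
            p.confirm()
        self.state = "confirmed"

    def cancel(self):
        if self.state in ("confirmed", "cancelled"):
            return
        for p in self.participants:
            p.cancel()
        self.state = "cancelled"

    def closed_top_commit(self):
        # Closed-top: both phases run back to back inside one call.
        if self.prepare():
            self.confirm()
        return self.state == "confirmed"


class Recorder:
    """Minimal participant that records the verbs it receives."""

    def __init__(self, will_prepare=True):
        self.will_prepare = will_prepare
        self.calls = []

    def prepare(self):
        self.calls.append("prepare")
        return self.will_prepare

    def confirm(self):
        self.calls.append("confirm")

    def cancel(self):
        self.calls.append("cancel")
```

In the open-top style the application calls prepare, applies whatever business logic it needs for as long as it needs, and only then calls confirm or cancel; closed_top_commit shows the traditional single-call alternative for comparison.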

Atoms and Cohesions
To address the specific requirements of business transactions, BTP introduced two types of extended transactions, both using the open-top completion protocol:

  • Atom: An atom is the typical way in which "transactional" work performed on Web services is scoped. The outcome of an atom is guaranteed to be consistent such that all enlisted participants will see the same outcome, which will be either to accept (confirm) the work or reject (cancel) it.
  • Cohesion: This type of transaction was introduced in order to relax atomicity and allow for the selection of work to be confirmed or cancelled based on higher-level business rules. Atoms are the typical participants within a cohesion, but unlike an atom, a cohesion may give different outcomes to its participants such that some of them may confirm while the remainder cancel. In essence, the two-phase protocol for a cohesion is parameterized to allow a user to specify precisely which participants to prepare and which to cancel. The strategy underpinning cohesions is that they better model long-running business activities. Services enroll in atoms that represent specific units of work and, as the business activity progresses, it may encounter conditions that allow it to cancel or prepare these units. It may be many hours or days before the cohesion arrives at its confirm-set: the set of participants that it requires to confirm in order to successfully terminate the business activity. Once the confirm-set has been determined, the cohesion collapses down to being an atom: all members of the confirm-set see the same outcome.
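The relationship between atoms, cohesions, and the confirm-set can be sketched as below. The Atom and Cohesion classes and their methods are hypothetical names for illustration and do not reproduce the BTP message set; participants inside each atom are omitted to keep the example short.

```python
class Atom:
    """Hypothetical atom: all enlisted participants see the same outcome."""

    def __init__(self, name):
        self.name = name
        self.outcome = None   # None until "confirmed" or "cancelled"
        self.prepared = False

    def prepare(self):
        self.prepared = True
        return True

    def confirm(self):
        self.outcome = "confirmed"

    def cancel(self):
        self.outcome = "cancelled"


class Cohesion:
    """Hypothetical cohesion: the caller selects which atoms to confirm."""

    def __init__(self):
        self.atoms = {}

    def enroll(self, atom):
        self.atoms[atom.name] = atom

    def cancel(self, name):
        # Individual atoms can be cancelled while the activity is running.
        self.atoms[name].cancel()

    def confirm(self, confirm_set):
        """Confirm exactly the atoms in `confirm_set`; cancel the rest."""
        for name, atom in self.atoms.items():
            if name not in confirm_set and atom.outcome is None:
                atom.cancel()
        chosen = [self.atoms[name] for name in confirm_set]
        # From here the confirm-set behaves like one atom: all or nothing.
        ok = all(a.outcome is None and a.prepare() for a in chosen)
        for atom in chosen:
            if ok:
                atom.confirm()
            elif atom.outcome is None:
                atom.cancel()
        return ok
```

Passing a confirm-set that includes an already cancelled atom makes the whole confirm fail, which mirrors the point that the confirm-set itself is atomic.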

Looking Ahead
In my next article, I'll take a closer look at the architecture of BTP and how XML is involved in it. I'll also look at the Web services stack and how BTP is used.

References

  • BTP: www.oasis-open.org/committees/business-transactions
  • OMG (1995) "CORBAservices: Common Object Services Specification." OMG Document Number 95-3-31. March.
  • Sun Microsystems Inc. (1999) "Java Transaction API 1.0.1 (JTA)," April.
  • Sun Microsystems Inc. (2002) "XML Transactioning API for Java (JAXTX)." www.jcp.org/jsr/detail/156.jsp.
About the Author
Mark Little was Chief Architect, Transactions for Arjuna Technologies Ltd, a UK-based company specialising in the development of reliable middleware that was recently acquired by JBoss, Inc. Before Arjuna, Mark was a Distinguished Engineer/Architect within HP Arjuna Labs in Newcastle upon Tyne, England, where he led the HP-TS and HP-WST teams, developing J2EE and Web services transactions products respectively. He is one of the primary authors of the OMG Activity Service specification and is on the expert group for the same work in J2EE (JSR 95). He is also the specification lead for JSR 156: Java API for XML Transactions. He's on the OTS Revision Task Force and the OASIS Business Transactions Protocol specification. Before joining HP he was for over 10 years a member of the Arjuna team within the University of Newcastle upon Tyne (where he continues to have a Visiting Fellowship). His research within the Arjuna team included replication and transactions support, which included the construction of an OTS/JTS compliant transaction processing system. Mark has published extensively in the Web Services Journal, Java Developer's Journal and other journals and magazines. He is also the co-author of several books including “Java and Transactions for Systems Professionals” and “The J2EE 1.4 Bible.”
