Overcoming Web Services Challenges with Smart Design

Web services have become a very popular method for connecting distributed systems. The open standards-based techniques that Web services leverage provide many benefits in an enterprise computing environment, including cross-platform interoperability, simple firewall configuration, and limited software deployment requirements.

However, integrating distributed transactions via Web services presents special challenges, especially in relation to interacting with established transaction coordinators. In this article, I'll examine some of the problems, and suggest architectural best practices to avoid or mitigate them.

Challenges with Web Services
Enterprise architects and developers who use Web services to integrate transactional systems need to be aware of the current Web service challenges. Ignoring or not intelligently planning for these issues can lead to poor system performance or corrupt application data, particularly if the Web services are used as an RPC mechanism.

No Transaction-Control Providers Today
Modern distributed systems have come to rely on application servers to act as transaction-control managers for the various resources involved in a transaction. Most databases and messaging products, some file systems, and all transaction coordinators support the Open Group's XA specification for distributed transactions. The ability to coordinate and commit or roll back a transaction across these multiple types of datastores has simplified application development and management.

The Web services model suffers from a lack of any XA-compliant, two-phase commit resource controllers. Several groups have begun work on defining a transaction-control mechanism for Web services, including:

  • OASIS: Business Transaction Protocol
  • ebXML: Business Collaboration Protocol
  • Tentative Hold Protocol

However, none of these protocols has yet been finalized and there isn't overwhelming agreement between the various Web services tool vendors on a standard. Web services developers cannot reasonably expect to see a supported production implementation of any one of these standards anytime soon. This means that Web services developers and architects need to carefully consider how they will include Web service transactions among their other database or message queue updates.

HTTP Isn't Reliable
Many problems arise from the basic protocol used to communicate between computers. Most Web services are built on top of HTTP, which is a "best effort" protocol, with no guarantee that a message has reached its destination. In addition, the public Internet isn't a reliable network for any particular packet or message. If a client calls a Web service and doesn't get a response within the timeout period, the client can't be sure of whether the request was received, partially processed, or "lost on the 'Net." Worse yet, if an application retries the request, will the request be duplicated or cause an error? (Two orders entered? Two credit card charges?)

Longer Transaction Time
In general, a Web service "call" will take longer to execute than an equivalent direct query against a local database. The call will be slower due to the HTTP overhead, the XML overhead, and the network overhead to a remote server. In most cases, applications and data providers aren't optimized for XML transport but are translating data from a native format into an XML format. Similar to the effect of distributed object calls, the differences in performance between Web services and traditional database queries need to be factored into the application architecture to prevent unexpectedly poor performance.

Read-Only Is Simple
A Web service that provides read-only access to a datastore can effectively avoid concerns about transactional integrity. This is by far the simplest approach to handling transactional integration (i.e., do nothing!) and can work for many examples of Web services, including public information retrieval services like weather forecasts, enterprise-wide directory or authentication services, read access to the corporate customer master file, and inventory-level queries.

Note that the Web service need only be read-only from the client perspective. The service provider may log information, create audit trails, and update usage databases. The important distinction is that the client isn't relying on the server to record any information. In these situations, if the request doesn't complete successfully, the client application can choose whatever error-handling mechanism is appropriate for the user, including retrying the request, ignoring the error, or querying a different datastore, without fear of duplication or further errors.
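
The freedom to retry or switch sources can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the function names and the two stand-in sources (a failing remote service and a local cache) are hypothetical.

```python
from typing import Callable, Optional, Sequence


def read_with_fallback(sources: Sequence[Callable[[], str]],
                       attempts_per_source: int = 2) -> Optional[str]:
    """Try each read-only source in order, retrying freely.

    Retrying is safe only because the calls do not change any state
    that the client depends on.
    """
    for source in sources:
        for _ in range(attempts_per_source):
            try:
                return source()
            except ConnectionError:
                continue  # safe to retry: nothing was updated
    return None  # caller decides whether to ignore the error


# Hypothetical stand-ins for a remote service and a local cache.
def flaky_inventory_service() -> str:
    raise ConnectionError("no response within timeout")


def cached_inventory() -> str:
    return "widgets: 42"
```

Calling `read_with_fallback([flaky_inventory_service, cached_inventory])` falls through to the cached value after the remote attempts fail.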

Read-Only Doesn't Always Cut It
Not every API can be treated as a read-only request. Most real services will support some level of data updating. For instance, a Web service that supports order entry will need to record the customer order. Let's consider two types of applications that use these services - batch and online - which can generally be distinguished by whether or not a user is waiting for the response of the Web service.

In a batch-oriented system, the user isn't actively waiting for the response to the Web service request. In this case, the immediate response time isn't critical, as long as the request is eventually processed. A batch system has some simple options for handling uncertain delivery. The system can simply queue the request and continue to retry until a final success or failure response is received. In fact, the leading Web service vendors, including BEA, webMethods, and Microsoft, all support these features as part of their standard application server products.
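
The queue-and-retry loop might look something like the following sketch. The outcome names and the `send` callable are assumptions made for illustration; they are not any vendor's API. Resending is only safe if the server can recognize a repeated request, which is the job of the transaction identifier described under Handling Retries.

```python
from collections import deque
from typing import Callable, Deque, Dict

# Possible outcomes of one delivery attempt (names are illustrative).
SUCCESS, REJECTED, NO_RESPONSE = "success", "rejected", "no_response"


def drain_queue(pending: Deque[str],
                send: Callable[[str], str],
                max_passes: int = 10) -> Dict[str, str]:
    """Retry each queued request until it gets a final answer.

    SUCCESS and REJECTED are final; NO_RESPONSE puts the request back
    on the queue for a later pass.
    """
    results: Dict[str, str] = {}
    for _ in range(max_passes):
        if not pending:
            break
        for _ in range(len(pending)):
            request = pending.popleft()
            outcome = send(request)
            if outcome == NO_RESPONSE:
                pending.append(request)  # try again on the next pass
            else:
                results[request] = outcome
    return results
```

A production system would normally wait between passes and persist the queue; both are left out here to keep the sketch short.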

On the other hand, applications with online transactions are dependent on the response from the Web service for immediate application flow or logic. In these circumstances the user cannot proceed (and accomplish his or her goal) until the Web service has completed processing the request. If the service is known to be unreliable, a custom retry mechanism may be built into the client application to retry a request; however, the application cannot reasonably retry more than once, because the user won't patiently wait, but will abandon the requested action. If the user is a paying customer, your business has just lost revenue! Examples of such time-critical actions include authenticating a user or performing credit card authorization prior to delivery of electronic products.
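
For the online case, a bounded retry could be sketched as follows. The five-second budget and the function name are assumptions chosen for illustration; the point is only that the client makes at most one retry and then gives up.

```python
import time
from typing import Callable, Optional


def call_with_one_retry(call: Callable[[], str],
                        budget_seconds: float = 5.0) -> Optional[str]:
    """Attempt the call, retrying at most once within the time budget."""
    deadline = time.monotonic() + budget_seconds
    for _ in range(2):  # the original attempt plus a single retry
        if time.monotonic() >= deadline:
            break
        try:
            return call()
        except TimeoutError:
            continue
    return None  # give up and let the application tell the user
```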

Handling Retries
Due to the uncertain nature of Web service transactions, it's essential to architect Web service APIs to support a retry or request-status mechanism. This mechanism enables a client application to either retry a request without fear of error, or at a minimum, determine the status of a request to ensure that a request isn't duplicated. The key to supporting retries is to ensure that every request type supports a unique transaction identifier used by both the client and server to identify the request.

Any client that wishes to inquire about the status of a request can use the transaction identifier to query the server. For a request that was received and processed by the server, the server can respond with the results, or an "in-process" status. If the server hasn't received the request, the client can resend the request safely (see Figure 1). This approach enables the client to avoid duplicate entries and have a measure of confidence that the request was correctly processed.
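
From the client's side, the status check described above can be sketched like this. `OrderService` is a hypothetical in-memory stand-in for the remote Web service, and its method names are assumptions rather than part of any real API.

```python
from typing import Dict, Optional


class OrderService:
    """In-memory stand-in for the remote Web service (hypothetical)."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def submit(self, txn_id: str, order: str) -> str:
        self._results[txn_id] = f"accepted:{order}"
        return self._results[txn_id]

    def status(self, txn_id: str) -> Optional[str]:
        # None means the server never saw this transaction ID.
        return self._results.get(txn_id)


def recover_after_timeout(service: OrderService, txn_id: str, order: str) -> str:
    """After a timeout, ask for the status before deciding to resend."""
    known = service.status(txn_id)
    if known is not None:
        return known  # request arrived; do not submit it again
    return service.submit(txn_id, order)  # never received; safe to resend
```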

When implementing the transaction identifier, you must consider several competing priorities. To support robust transactions, the client needs to know the transaction ID before submitting the request. The server must enforce uniqueness across all clients and maintain security and privacy of data. A balance must be achieved in which the client generates the ID, usually based on a standard UUID algorithm. The server validates that the transaction ID is unique, rejecting any duplicate requests that don't originate from the same client. If a single client submits a duplicate request, the server should respond with the same result as for the original request.
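
One way to picture the server-side rules is the sketch below. The class and function names are hypothetical, the store is in memory only, and a real service would also need to authenticate the client rather than trust a `client_id` string.

```python
import uuid
from typing import Dict, Tuple


class DuplicateTransactionError(Exception):
    """Raised when a different client reuses an existing transaction ID."""


class IdempotentServer:
    def __init__(self) -> None:
        # txn_id -> (client that owns it, stored result)
        self._seen: Dict[str, Tuple[str, str]] = {}

    def handle(self, client_id: str, txn_id: str, payload: str) -> str:
        if txn_id in self._seen:
            owner, result = self._seen[txn_id]
            if owner != client_id:
                raise DuplicateTransactionError(txn_id)
            return result  # same client retried: replay original result
        result = f"order #{len(self._seen) + 1} recorded: {payload}"
        self._seen[txn_id] = (client_id, result)
        return result


def new_transaction_id() -> str:
    """Client side: generate the ID before the request is sent."""
    return str(uuid.uuid4())
```

Because the client creates the ID with `new_transaction_id()` before sending, it can repeat the same call after a timeout and receive the original result instead of creating a second order.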

No Two-Phase Commit
Until widespread support for a two-phase commit protocol for Web services develops, application architects will need to plan around the inability to roll back requests. The straightforward approach to handling a nontransactional resource, whether it's a Web service, file system, or other service, is to submit the request to that resource as the last step in a distributed transaction. The outcome of the Web service request will determine if the overall transaction is committed or aborted. This approach can lead to some interesting application logic, in which confirmation records are written to a database before the request is submitted. Since the database transaction can be rolled back, the application has ensured the data will not be in an inconsistent state if the Web service fails. If the application were to wait until after the Web service call to record the confirmation, it may fail to record the result even if the request is successfully processed (see Figure 2).
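
The ordering described here can be sketched with a local database and a stand-in for the Web service. SQLite is used only so the example is self-contained; the table name and the `call_web_service` callable are assumptions.

```python
import sqlite3
from typing import Callable


def open_database() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE confirmations (order_text TEXT)")
    db.commit()
    return db


def place_order(db: sqlite3.Connection,
                call_web_service: Callable[[str], bool],
                order: str) -> bool:
    """Write local records first, call the Web service last.

    The Web service outcome decides whether the local database
    transaction is committed or rolled back.
    """
    try:
        db.execute("INSERT INTO confirmations (order_text) VALUES (?)", (order,))
        accepted = call_web_service(order)  # last step, cannot be undone
    except Exception:
        db.rollback()
        raise
    if accepted:
        db.commit()
    else:
        db.rollback()
    return accepted
```

A failure between the service's success and the local commit is still possible, so this ordering narrows the window for inconsistency rather than closing it.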

One requirement of this approach is that you break larger transactions into subtransactions with only one Web service call participating in each subtransaction. Each subtransaction should be semantically correct by itself, with no direct dependency on the following steps. Consider how an order entry system using Web service interfaces for both a corporate customer master and a particular manufacturing execution system (MES) should handle an order for a new customer. If the system logically breaks the overall process into two separate transactions, one to add a new customer to the customer master and a second to place the order, it can avoid any problems with attempting to roll back the first Web service call. If we are unable to connect to the MES to process the order request, the customer has still been logically and completely added to the customer master (see Figure 3).
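
The new-customer example might be structured as in the sketch below. The two callables stand in for the customer master and MES Web services and are hypothetical; each one represents a complete subtransaction.

```python
from typing import Callable, List


def enter_order_for_new_customer(add_customer: Callable[[str], None],
                                 submit_order: Callable[[str, str], None],
                                 customer: str, order: str) -> List[str]:
    """Run two independent subtransactions, one Web service call each."""
    completed: List[str] = []
    add_customer(customer)  # subtransaction 1: customer master
    completed.append("customer_added")
    try:
        submit_order(customer, order)  # subtransaction 2: MES order
        completed.append("order_placed")
    except ConnectionError:
        # The customer stays in the customer master; only the order
        # step needs to be retried later.
        pass
    return completed
```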

In a larger system that utilizes functionality provided by many Web services, it may not be possible to isolate different system calls. In this case, it's imperative that the application architecture be carefully planned to minimize the potential for inconsistent data between the participating subsystems.

Queued Processing
In the particular example of submitting to an MES, an alternative approach would be to delay submission. If the MES doesn't apply additional validation rules that cause the order to be rejected, we can assume that the order will be accepted. By queuing the order request for later submission to the MES, we can retry as often as necessary to succeed. This approach simplifies the transactional layout, allowing the application to consolidate to only a single logical transaction. However, the application logic may become convoluted, with the customer added to the customer master after the order is placed in the message queue for delivery (see Figure 4).

Because queuing systems and databases will both participate in a distributed transaction, we can roll back the queue insert if the customer-add fails. In this way, we can correctly control the data consistency. As an additional side benefit, by delaying delivery to the MES, we also avoid requiring additional 24-hour availability on the existing manufacturing system simply to support the distributed Web service architecture.
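
As a rough sketch of the single logical transaction, the example below uses one SQLite database with two tables in place of a real message queue and database enlisted in an XA transaction. That substitution, the table names, and the duplicate-name failure are all assumptions made to keep the example runnable.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (name TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE mes_queue (customer TEXT, order_text TEXT)")
    db.commit()
    return db


def enter_order(db: sqlite3.Connection, customer: str, order: str) -> bool:
    """Queue the MES request and add the customer as one unit of work."""
    try:
        db.execute("INSERT INTO mes_queue VALUES (?, ?)", (customer, order))
        db.execute("INSERT INTO customers VALUES (?)", (customer,))
        db.commit()
        return True
    except sqlite3.IntegrityError:
        db.rollback()  # the queued request is undone with the customer-add
        return False
```

Here a duplicate customer name stands in for any failure of the customer-add; when it occurs, the queued MES request disappears along with it.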

Careful Resource Use
The above discussion focused on integrating a Web service update with local transactional resources, and earlier I "dismissed" integrating a read-only Web service as easy. Well, nothing in programming is truly easy, and this is no exception. The longer average call time for a remote system call (like a Web service) can negatively impact the behavior of a transactional monitor. In order to coordinate all the actions within a transaction, the transaction manager needs to lock each resource that has been modified for the lifetime of the transaction. This prevents any other user or process from reading new or changed values until after the transaction has been committed. When a Web service call is executed in the middle of a transaction block, the transaction locks will be held until the Web service completes, because the transaction hasn't yet been committed.

The increase in a single transaction's time can potentially have a snowball effect on a high-volume system. By holding locks for one second longer, this transaction will prevent other transactions that need the same resources from even beginning for an extra second. That delay can in turn delay other transactions, and so on. If a system were executing tens or hundreds of transactions per second, adding a single Web service call in the middle of the transaction could wreak havoc with the system throughput. A simple approach to avoiding this is to perform read-only queries before beginning any XA transactions. The thread that calls the Web service will still be blocked while waiting, but it won't be holding any transaction locks during that time.
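
A back-of-the-envelope calculation shows the scale of the effect. It assumes the worst case, in which every transaction needs the same exclusive lock, and the timings (10 ms of local work, a one-second Web service call) are illustrative rather than measured.

```python
def max_throughput_per_second(lock_hold_seconds: float) -> float:
    """Upper bound on transactions per second when every transaction
    must hold the same exclusive lock for lock_hold_seconds."""
    if lock_hold_seconds <= 0:
        raise ValueError("lock hold time must be positive")
    return 1.0 / lock_hold_seconds


# Illustrative numbers only: 10 ms of local work, 1 s Web service call.
local_only = max_throughput_per_second(0.010)
with_remote_call = max_throughput_per_second(0.010 + 1.0)
```

With these assumed figures the ceiling drops from about 100 transactions per second to just under one.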

Returning to the order entry system, if we need to check the customer's maximum credit in order to enter the order, we can retrieve the maximum credit amount before beginning the order entry. Note that this isn't perfect, in that another system may update the credit amounts after we've read the file. However, that's one of the chances for incorrect data created as a result of the lack of a Web service transaction protocol.
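
The credit check could be arranged as in this sketch, where the slow remote read completes before the local transaction begins. SQLite and the `fetch_credit_limit` callable are stand-ins chosen for illustration.

```python
import sqlite3
from typing import Callable


def open_orders_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    db.commit()
    return db


def enter_order(db: sqlite3.Connection,
                fetch_credit_limit: Callable[[str], float],
                customer: str, amount: float) -> bool:
    # The slow remote read happens first, before any transaction starts,
    # so no database locks are held while waiting on the network.
    limit = fetch_credit_limit(customer)
    if amount > limit:
        return False
    # Only the short local write runs inside the transaction. The limit
    # may have changed since it was read; this sketch accepts that risk.
    db.execute("INSERT INTO orders VALUES (?, ?)", (customer, amount))
    db.commit()
    return True
```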

In this article, I looked at some of the challenges of integrating Web services into transactional systems. Web services provide an easy way to integrate distributed applications via standard protocols, but there isn't yet a standard mechanism for implementing transactional control. Many of the ideas presented here apply equally well to any nontransactional, high-latency external system call. Hopefully, the architectural guidelines and suggestions will help you build more robust distributed systems.


In any enterprise computing system, certain operations need to be done as a single atomic unit. Either all of the operations need to succeed or all need to fail in order to keep the system's data internally consistent. All enterprise-class database-management systems support the notion of a transaction to meet exactly this requirement. The transaction coordinator makes sure that all of the database writes (updates or inserts) to a single database are either committed or rolled back as a single unit of work. As distributed systems evolved, the Open Group developed the XA Specification to enable multiple transactional systems to cooperate in a single distributed transaction.

An XA-compliant transaction is a two-phase transaction with multiple systems participating. Each of the distributed systems has a resource manager, which controls the transaction state on a single database, message queue, file system, etc. A central transaction manager coordinates the final commit or rollback of all the resources in the transaction. Once each of the resource managers has communicated its ability to commit the transaction, the transaction manager issues a command to all of the systems to commit. If any one of the systems doesn't communicate success in the first phase, the transaction manager commands each of the systems to roll back.
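
The two phases can be illustrated with a toy model. This is a simplified sketch and not the XA interface itself: the class and method names are invented for the example, and recovery after a crash between the phases is ignored.

```python
from typing import List


class ResourceManager:
    """Toy resource manager; real ones implement the XA interfaces."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(resources: List[ResourceManager]) -> bool:
    # Phase 1: every resource manager must vote that it can commit.
    votes = [rm.prepare() for rm in resources]
    # Phase 2: commit everywhere, or roll back everywhere.
    if all(votes):
        for rm in resources:
            rm.commit()
        return True
    for rm in resources:
        rm.rollback()
    return False
```

A Web service has no equivalent of `prepare()`, which is why it cannot take part in this exchange.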

The Java Transaction API (JTA) provides the Java mapping of the XA interfaces; on Windows, COM+ transactions are coordinated by the Microsoft Distributed Transaction Coordinator (MS DTC), which also supports XA.

More Stories By Dave Rader

David Rader is a partner with Fusion Technologies, a consulting company specializing in application integration and Web services development. Most recently, he's been developing XML-based applications in the small business portal space.
