
Microservices Expo: Article

Overcoming Web Services Challenges with Smart Design


Web services have become a very popular method for connecting distributed systems. The open standards-based techniques that Web services leverage provide many benefits in an enterprise computing environment, including cross-platform interoperability, simple firewall configuration, and limited software deployment requirements.

However, integrating distributed transactions via Web services presents special challenges, especially in relation to interacting with established transaction coordinators. In this article, I'll examine some of the problems, and suggest architectural best practices to avoid or mitigate them.

Challenges with Web Services
Enterprise architects and developers who use Web services to integrate transactional systems need to be aware of the current Web service challenges. Ignoring these issues, or failing to plan for them intelligently, can lead to poor system performance or corrupt application data, particularly if the Web services are used as an RPC mechanism.

No Transaction-Control Providers Today
Modern distributed systems have come to rely on application servers to act as transaction-control managers for the various resources involved in a transaction. Most databases and messaging products, some file systems, and all transaction coordinators support the Open Group's XA specification for distributed transactions. The ability to coordinate and commit or roll back a transaction across these multiple types of datastores has simplified application development and management.

The Web services model suffers from a lack of any XA-compliant, two-phase commit resource controllers. Several groups have begun work on defining a transaction-control mechanism for Web services, including:

  • OASIS: Business Transaction Protocol
  • ebXML: Business Collaboration Protocol
  • Tentative Hold Protocol
However, none of these protocols has yet been finalized, and there is no broad agreement among the various Web services tool vendors on a standard. Web services developers cannot reasonably expect to see a supported production implementation of any one of these standards anytime soon. This means that Web services developers and architects need to carefully consider how they will include Web service transactions among their other database or message queue updates.

HTTP Isn't Reliable
Many problems arise from the basic protocol used to communicate between computers. Most Web services are built on top of HTTP, which is a "best effort" protocol, with no guarantee that a message has reached its destination. In addition, the public Internet isn't a reliable network for any particular packet or message. If a client calls a Web service and doesn't get a response within the timeout period, the client can't be sure of whether the request was received, partially processed, or "lost on the 'Net." Worse yet, if an application retries the request, will the request be duplicated or cause an error? (Two orders entered? Two credit card charges?)

Longer Transaction Time
In general, a Web service "call" will take longer to execute than an equivalent direct query against a local database. The call will be slower due to the HTTP overhead, the XML overhead, and the network overhead to a remote server. In most cases, applications and data providers aren't optimized for XML transport but are translating data from a native format into an XML format. Similar to the effect of distributed object calls, the differences in performance between Web services and traditional database queries need to be factored into the application architecture to prevent unexpectedly poor performance.

Read-Only Is Simple
A Web service that provides read-only access to a datastore can effectively avoid concerns about transactional integrity. This is by far the simplest approach to handling transactional integration (i.e., do nothing!) and can work for many examples of Web services, including public information retrieval services like weather forecasts, enterprise-wide directory or authentication services, read access to the corporate customer master file, and inventory-level queries.

Note that the Web service need only be read-only from the client perspective. The service provider may log information, create audit trails, and update usage databases. The important distinction is that the client isn't relying on the server to record any information. In these situations, if the request doesn't complete successfully, the client application can choose whatever error-handling mechanism is appropriate for the user, including retrying the request, ignoring the error, or querying a different datastore, without fear of duplication or further errors.

Read-Only Doesn't Always Cut It
Not every API can be treated as a read-only request. Most real services will support some level of data updating, for instance, a Web service that supports order entry will need to record the customer order. Let's consider two types of applications that use these services - batch and online - which can generally be distinguished by whether or not a user is waiting for the response of the Web service.

In a batch-oriented system, the user isn't actively waiting for the response to the Web service request. In this case, the immediate response time isn't critical, as long as the request is eventually processed. A batch system has some simple options for handling uncertain delivery. The system can simply queue the request and continue to retry until a final success or failure response is received. In fact, the leading Web service vendors, including BEA, webMethods, and Microsoft, all support these features as part of their standard application server products.
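The queue-and-retry behavior described above can be sketched as follows. The `send` callable and its string statuses are hypothetical stand-ins for a real transport and its responses; this is a minimal sketch under those assumptions, not any vendor's API.

```python
import time

def submit_with_retry(request, send, max_attempts=5, delay_seconds=0.0):
    """Batch-style retry: keep resending until a definitive answer arrives.

    `send` is a hypothetical transport callable that returns "success" or
    "failure", or raises IOError on a timeout/network error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = send(request)
        except IOError:              # transport error: outcome unknown, retry
            time.sleep(delay_seconds)
            continue
        if result in ("success", "failure"):
            return result            # a definitive response ends the loop
    return "unknown"                 # give up; flag for manual reconciliation
```

Because no user is waiting, the delay between attempts can be generous, and the loop only terminates on a definitive success or failure from the service.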

On the other hand, applications with online transactions are dependent on the response from the Web service for immediate application flow or logic. In these circumstances the user cannot proceed (and accomplish his or her goal) until the Web service has completed processing the request. If the service is known to be unreliable, a custom retry mechanism may be built into the client application to retry a request; however, the application cannot reasonably retry more than once, because the user won't patiently wait, but will abandon the requested action. If the user is a paying customer, your business has just lost revenue! Examples of such time-critical actions include authenticating a user or performing credit card authorization prior to delivery of electronic products.

Handling Retries
Due to the uncertain nature of Web service transactions, it's essential to architect Web service APIs to support a retry or request-status mechanism. This mechanism enables a client application to either retry a request without fear of error, or at a minimum, determine the status of a request to ensure that a request isn't duplicated. The key to supporting retries is to ensure that every request type supports a unique transaction identifier used by both the client and server to identify the request.

Any client that wishes to inquire about the status of a request can use the transaction identifier to query the server. For a request that was received and processed by the server, the server can respond with the results, or an "in-process" status. If the server hasn't received the request, the client can resend the request safely (see Figure 1). This approach enables the client to avoid duplicate entries and have a measure of confidence that the request was correctly processed.

When implementing the transaction identifier, you must consider several competing priorities. To support robust transactions, the client needs to know the transaction ID before submitting the request. The server must enforce uniqueness across all clients and maintain security and privacy of data. A balance must be achieved in which the client generates the ID, usually based on a standard UUID algorithm. The server validates that the transaction ID is unique, rejecting any duplicate requests that don't originate from the same client. If a single client submits a duplicate request, the server should respond with the same result as for the original request.
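The client-generated-ID scheme above might look like the following sketch. The in-memory `TransactionServer` and its `process` method are illustrative stand-ins; a real server would persist results and verify that a duplicate ID originates from the same client.

```python
import uuid

class TransactionServer:
    """Sketch of server-side duplicate detection keyed by a client UUID."""

    def __init__(self):
        self._results = {}   # transaction_id -> stored result

    def process(self, transaction_id, payload):
        if transaction_id in self._results:
            # Duplicate request: return the original result, do no new work.
            return self._results[transaction_id]
        result = f"processed:{payload}"   # stand-in for real business logic
        self._results[transaction_id] = result
        return result

# Client side: generate the ID *before* the first submission, so that a
# retry after a timeout reuses the same identifier.
tx_id = str(uuid.uuid4())
server = TransactionServer()
first = server.process(tx_id, "order-42")
retry = server.process(tx_id, "order-42")   # safe: no duplicate order
```

The retry returns the stored result of the original request, so the client can resend freely without creating a second order.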

No Two-Phase Commit
Until widespread support for a two-phase commit protocol for Web services develops, application architects must plan around the inability to roll back requests. The straightforward approach to handling a nontransactional resource, whether it's a Web service, file system, or other service, is to submit the request to that resource as the last step in a distributed transaction. The outcome of the Web service request will determine if the overall transaction is committed or aborted. This approach can lead to some interesting application logic, in which confirmation records are written to a database before the request is submitted. Since the database transaction can be rolled back, the application has ensured the data will not be in an inconsistent state if the Web service fails. If the application were to wait until after the Web service call to record the confirmation, it may fail to record the result even if the request is successfully processed (see Figure 2).
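The call-the-service-last ordering can be sketched as follows. `FakeDatabase` is a toy in-memory stand-in with commit/rollback semantics, and `call_service` is a hypothetical callable; neither represents a real transaction API.

```python
class FakeDatabase:
    """Minimal stand-in for a transactional datastore."""

    def __init__(self):
        self.committed = []      # durable rows
        self._pending = []       # rows written inside the open transaction

    def write(self, row):
        self._pending.append(row)

    def commit(self):
        self.committed += self._pending
        self._pending = []

    def rollback(self):
        self._pending = []

def place_order(db, call_service, order):
    # Write the confirmation record first, inside the open transaction...
    db.write(("confirmation", order))
    try:
        call_service(order)      # ...the non-transactional step goes LAST
    except IOError:
        db.rollback()            # service failed: undo the local writes
        return False
    db.commit()                  # service succeeded: make the writes durable
    return True
```

If the Web service call fails, the pending confirmation is rolled back with the rest of the local transaction, so the database never claims success for an order the service never received.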

One requirement of this approach is that you break larger transactions into subtransactions with only one Web service call participating in each subtransaction. Each subtransaction should be semantically correct by itself, with no direct dependency on the following steps. Consider how an order entry system using Web service interfaces for both a corporate customer master and a particular manufacturing execution system (MES) should handle an order for a new customer. If the system logically breaks the overall process into two separate transactions, one to add a new customer to the customer master and a second to place the order, it can avoid any problems with attempting to roll back the first Web service call. If we are unable to connect to the MES to process the order request, the customer has still been logically and completely added to the customer master (see Figure 3).
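A sketch of splitting the process into two self-contained subtransactions. `add_customer` and `place_order` are hypothetical callables standing in for the customer-master and MES Web service interfaces.

```python
def enter_order_for_new_customer(add_customer, place_order, customer, order):
    """Two independent subtransactions; each is semantically valid alone."""
    add_customer(customer)        # subtransaction 1: commits on its own
    try:
        place_order(order)        # subtransaction 2: the MES call
        return "order placed"
    except IOError:
        # The customer add stands by itself and needs no rollback;
        # only the order submission must be retried later.
        return "customer added; order pending retry"
```

Because the customer add is complete and correct by itself, a failure in the second subtransaction leaves nothing to undo in the first.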

In a larger system that utilizes functionality provided by many Web services, it may not be possible to isolate different system calls. In this case, it's imperative that the application architecture be carefully planned to minimize the potential for inconsistent data between the participating subsystems.

Queued Processing
In the particular example of submitting to an MES, an alternative approach would be to delay submission. If the MES doesn't apply additional validation rules that cause the order to be rejected, we can assume that order will be accepted. By queuing the order request for later submission to the MES, we can retry as often as necessary to succeed. This approach simplifies the transactional layout, allowing the application to consolidate to only a single logical transaction. However, the application logic may become convoluted, with the customer added to customer master after the order is placed in the message queue for delivery (see Figure 4).

Because queuing systems and databases will both participate in a distributed transaction, we can roll back the queue insert if the customer-add fails. In this way, we can correctly control the data consistency. As an additional side benefit, by delaying delivery to the MES, we also avoid requiring additional 24-hour availability on the existing manufacturing system simply to support the distributed Web service architecture.
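The transactional-enqueue idea might look like the following sketch, where `TxQueue` is a toy stand-in for a queuing product that participates in the same distributed transaction as the database, and `db_add_customer` is a hypothetical callable.

```python
class TxQueue:
    """Toy queue that commits or rolls back with the enclosing transaction."""

    def __init__(self):
        self.delivered = []      # messages visible to the MES consumer
        self._pending = []       # messages enqueued but not yet committed

    def enqueue(self, msg):
        self._pending.append(msg)

    def commit(self):
        self.delivered += self._pending
        self._pending = []

    def rollback(self):
        self._pending = []

def take_order(db_add_customer, queue, customer, order):
    queue.enqueue(order)          # queue the MES submission first
    try:
        db_add_customer(customer) # then add the customer to the master
    except ValueError:
        queue.rollback()          # customer add failed: drop the queued msg
        return False
    queue.commit()                # both operations succeed or both fail
    return True
```

The queued order only becomes visible for delivery to the MES when the customer add commits, so the two updates stay consistent.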

Careful Resource Use
The above discussion focused on integrating a Web service update with local transactional resources, and earlier I "dismissed" integrating a read-only Web service as easy. Well, nothing in programming is truly easy, and this is no exception. The longer average call time for a remote system call (like a Web service) can negatively impact the behavior of a transactional monitor. In order to coordinate all the actions within a transaction, the transaction manager needs to lock each resource that has been modified for the lifetime of the transaction. This prevents any other user or process from reading new or changed values until after the transaction has been committed. When a Web service call is executed in the middle of a transaction block, the transaction locks will be held until the Web service completes, because the transaction hasn't yet been committed.

The increase in a single transaction's time can have a snowball effect on a high-volume system. By holding locks open for one second longer, this transaction will prevent other transactions that need the same resources from even beginning for an extra second. That delay in starting can delay other transactions, and so on. If a system is executing tens or hundreds of transactions per second, adding a single Web service call in the middle of a transaction can wreak havoc with system throughput. A simple way to avoid this is to perform read-only queries before beginning any XA transactions: the thread that calls the Web service still blocks while waiting, but it holds no transaction locks while it does so.

Returning to the order entry system, if we need to check the customer's maximum credit in order to enter the order, we can retrieve the maximum credit amount before beginning the order entry transaction. Note that this isn't perfect, in that another system may update the credit amounts after we've read them. However, that's one of the risks of stale or incorrect data that results from the lack of a Web service transaction protocol.
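The read-before-transaction ordering from the credit example can be sketched like this. `read_credit_limit` and `write_order` are hypothetical callables: the first stands in for the slow remote Web service lookup, the second for a short local transaction that begins, writes, and commits.

```python
def enter_order(read_credit_limit, write_order, order):
    """Run the slow remote read BEFORE any transaction locks are taken."""
    limit = read_credit_limit(order["customer"])   # slow Web service call
    if order["amount"] > limit:
        return "rejected"
    # Only now start the short-lived local transaction; no locks were
    # held during the remote call above.
    write_order(order)
    return "accepted"
```

The trade-off is the window between the read and the write, during which the credit limit could change; the article's point is that accepting that window is cheaper than holding locks across the remote call.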

In this article, I looked at some of the challenges of integrating Web services into transactional systems. Web services provide an easy way to integrate distributed applications via standard protocols, but there isn't yet a standard mechanism for implementing transactional control. Many of the ideas presented here apply equally well to any nontransactional, high-latency external system call. Hopefully, the architectural guidelines and suggestions will help you build more robust distributed systems.


In any enterprise computing system, certain operations need to be done as a single atomic unit: either all of the operations succeed or all fail, in order to keep the system's data internally consistent. All enterprise-class database-management systems support the notion of a transaction to meet exactly this requirement. The transaction coordinator makes sure that all of the database writes (updates or inserts) to a single database are either committed or rolled back as a single unit of work. As distributed systems evolved, the Open Group developed the XA Specification to enable multiple transactional systems to cooperate in a single distributed transaction.

An XA-compliant transaction is a two-phase transaction with multiple systems participating. Each of the distributed systems has a resource manager, which controls the transaction state on the single database, message queue, file system, etc. A central transaction manager coordinates the final commit or rollback of all the resources in the transaction. Once each of the resource managers has communicated its ability to commit the transaction, the transaction manager issues a command to all of the systems to commit. If any one of the systems doesn't communicate success in the first phase, the transaction manager commands each of the systems to roll back.
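The two-phase protocol can be illustrated with the following toy sketch. `FakeRM` and its prepare/commit/rollback methods are illustrative stand-ins, not the actual XA or JTA interfaces.

```python
class FakeRM:
    """Illustrative resource manager with prepare/commit/rollback."""

    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        return self.can_commit    # phase-1 vote: ready to commit?

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def two_phase_commit(resource_managers):
    # Phase 1: ask every participant whether it can commit.
    prepared = all(rm.prepare() for rm in resource_managers)
    # Phase 2: commit everywhere, or roll back everywhere.
    for rm in resource_managers:
        rm.commit() if prepared else rm.rollback()
    return prepared
```

A single "no" vote in phase 1 forces every participant to roll back, which is exactly the atomicity guarantee the sidebar describes.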

The Java Transaction API (JTA) is the Java implementation of the XA specification; COM+ is Microsoft's XA-compliant transaction manager.

More Stories By Dave Rader

David Rader is a partner with Fusion Technologies, a consulting company specializing in application integration and Web services development. Most recently, he's been developing XML-based applications in the small business portal space.

