Microservices Expo: Article

When Exceptions Are the Rule

Achieving reliable and traceable service-oriented architectures

Every now and then, an IT glitch makes national news. Just a few weeks ago, I read in the paper about an airline that mistakenly sold thousands of roundtrip tickets online at a fare of just a few dollars each. The airline lost hundreds of thousands of dollars from the mistake, though that's really just the tip of the iceberg. The National Institute of Standards and Technology (NIST) estimates that software errors cost the U.S. economy $59.5 billion per year. Because nearly 80 percent of such errors are discovered after applications have been put into production, exceptions also have a significant impact on the productivity and effectiveness of your IT staff and production support teams. And that's to say nothing of forgone revenue due to poor customer service.

We call these unexpected conditions "exceptions," though they happen all the time. They are as unavoidable as they are harmful.

Unlike my highly publicized airline example, most exceptions are known only to customers, IT teams, and line-of-business managers, and these groups typically find out in that order. It's the customer or the end user of an SOA-based business application who is usually the first to witness the consequences of an exception. Common symptoms include opaque messages (such as "Sorry, unable to process request at this time") on Web sites. Such seemingly mild errors eventually translate into more disruptive business exceptions such as delayed orders, lost packages, rejected insurance claims, and so on.

Due to their distributed and heterogeneous nature, services-based systems are inherently vulnerable to exceptions. Exceptions in SOA environments can be broadly categorized into three classes:

  • System-Level Exceptions result from XML/SOAP processing errors or transmission failures. They surface as SOAP faults with inconsistent fault codes.
  • Application-Level Exceptions often come from incorrect message semantics or logical errors within the application. Incorrect data, unchecked boundary conditions, and unexpected service or client responses can be the cause.
  • Business-Level Exceptions denote unacceptable business states in a transaction and are not necessarily technology-related issues. These surface as events that violate best practices, compliance laws, regulations, or business policies mandated by business managers. They include both technology-driven issues (such as an important order not processed within 24 hours) and human errors (such as an incorrect shipping address in a purchase order).
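
To make the three classes concrete, here is a minimal sketch of how an observed problem might be sorted into them. The `ObservedEvent` record, its field names, and the precedence order are assumptions made for illustration; they are not taken from any particular SOA product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExceptionLevel(Enum):
    SYSTEM = "system"
    APPLICATION = "application"
    BUSINESS = "business"


@dataclass
class ObservedEvent:
    """Hypothetical record of one problem seen in a transaction."""
    soap_fault_code: Optional[str] = None   # set when a SOAP fault came back
    validation_error: Optional[str] = None  # set when message content failed a check
    policy_violated: Optional[str] = None   # set when a business policy was broken


def classify(event: ObservedEvent) -> Optional[ExceptionLevel]:
    """Map an event to one of the three classes; None means no exception."""
    # Assumed precedence: transport problems first, then content, then policy.
    if event.soap_fault_code is not None:
        return ExceptionLevel.SYSTEM
    if event.validation_error is not None:
        return ExceptionLevel.APPLICATION
    if event.policy_violated is not None:
        return ExceptionLevel.BUSINESS
    return None
```

For instance, an order that sits unprocessed for more than 24 hours would carry only `policy_violated` and would be classed as a business-level exception, even though no service ever returned an error.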

SOA: A Haven for Exceptions
Any developer will tell you that 80 percent of the time required to diagnose an exception is spent simply trying to replicate the scenario. That's due to all the effort of searching through log files and iteratively recoding to add more information to the log. With distributed SOA, possibly stretched across geographies, this task is even more challenging.

Managing exceptions has traditionally been an expensive, extremely manual effort performed by often dedicated application maintenance teams. Since their clues reside in multiple messages that span different services in the business application, exceptions in SOA systems are even harder to detect and diagnose. To begin with, the applications themselves are seldom instrumented to proactively alert on specific exceptions. At best, an application surfaces exceptions as incoherent entries in an error log, such as "Error 00021C: Transaction rejected." These exceptions might be uncovered during routine maintenance of the application. However, as noted earlier, it's the phone calls from vexed customers that usually make IT staff aware of exceptions. Business operations teams seldom come to know about exceptions until it is too late to respond.

So what are you to do about it? You could just shrug your shoulders, accept that exceptions are going to happen, and hope you're not next week's headline news. Or you could look for ways to detect, diagnose, and remedy exceptions before they bring your business to a standstill. Let's explore ways to go about the latter option.

Managing Exceptions in Services-Based Systems
To understand how to manage exceptions in SOA-based environments, start by considering the requisite capabilities. Exceptions must be detected as they occur. To do so, IT and business operations must be able to specify the criteria for spotting exceptions in live business transactions. Typically, you'd look for message patterns that indicate unusual business activity. These may include incongruent reference data, discrepancies in data fields, error messages, and error codes. Sometimes, criteria can be crafted for detecting very specific conditions - for example, "raise an exception if a premier customer's order is rejected due to mainframe error code D234200." Other times it's the absence of a message or pattern that's the symptom of an exception.
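
As a rough sketch of such criteria, the code below expresses the premier-customer rule as a predicate over messages and adds a simple absence check. The message fields (`type`, `order_id`, `customer_tier`, `status`, `error_code`) and the message type names are hypothetical; a real system would read them from its own message schemas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]  # hypothetical flattened view of one service message


@dataclass
class Rule:
    name: str
    matches: Callable[[Message], bool]


def premier_order_rejected(msg: Message) -> bool:
    """The specific condition quoted in the text, as a predicate."""
    return (
        msg.get("customer_tier") == "premier"
        and msg.get("status") == "rejected"
        and msg.get("error_code") == "D234200"
    )


def detect(messages: List[Message], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (rule name, order id) for every message that matches a rule."""
    hits = []
    for msg in messages:
        for rule in rules:
            if rule.matches(msg):
                hits.append((rule.name, msg.get("order_id", "")))
    return hits


def missing_confirmations(messages: List[Message]) -> List[str]:
    """Absence pattern: orders that were submitted but never confirmed."""
    submitted = {m["order_id"] for m in messages if m.get("type") == "order_submitted"}
    confirmed = {m["order_id"] for m in messages if m.get("type") == "order_confirmed"}
    return sorted(submitted - confirmed)
```

Keeping each rule as data plus a predicate means operations staff can add or retire criteria without touching the services being watched.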

Since it's impossible to anticipate all patterns, operations teams need to cast as wide a net as possible across their business systems to trap the maximum number of exceptions. As a fallback, they must be able to trace and record all distributed transactions and diagnose this data for the root causes of exceptions.
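
One common way to make recorded transactions diagnosable is to stamp every message with a shared correlation identifier and group the records afterward. The sketch below assumes each record carries `correlation_id`, `timestamp`, `service`, and `status` fields; those names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Record = Dict[str, object]  # hypothetical log record for one service hop


def group_by_transaction(records: List[Record]) -> Dict[str, List[Record]]:
    """Rebuild each distributed transaction from records scattered across services."""
    traces: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        traces[str(rec["correlation_id"])].append(rec)
    for steps in traces.values():
        steps.sort(key=lambda r: r["timestamp"])  # order the hops in time
    return dict(traces)


def first_failure(trace: List[Record]) -> Optional[Record]:
    """Return the earliest hop that did not succeed, a likely root-cause candidate."""
    for step in trace:
        if step["status"] != "ok":
            return step
    return None
```

The earliest failing hop is only a starting point for diagnosis, but it narrows the search from every log file to one service and one moment.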

IT and business teams must know about exceptions immediately. It might be important to alert one or more individuals across different teams based on the nature of the exception. For example, the error code is of interest to the IT staff, while the rejected purchase order and the customer details are important to business types.
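
A minimal sketch of that split: one detected exception is turned into two alerts, each carrying only the fields its audience cares about. The audience names and the fields on the exception record are assumptions for illustration.

```python
from typing import Dict

ExceptionRecord = Dict[str, str]  # hypothetical detected exception


def build_alerts(exc: ExceptionRecord) -> Dict[str, Dict[str, str]]:
    """Produce one alert per audience from a single detected exception."""
    return {
        # IT staff need the technical detail to start diagnosis.
        "it_operations": {
            "error_code": exc["error_code"],
            "service": exc["service"],
        },
        # Business staff need to know which order and customer are affected.
        "business_operations": {
            "order_id": exc["order_id"],
            "customer": exc["customer"],
        },
    }
```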

The notified personnel must then be able to quickly analyze the situation, understand its cause, and implement a cure. To accomplish this, they need to know not only the exception message pattern but also the context of the business transaction in which it occurred. IT operations must be able to diagnose and resolve the exception in seconds or minutes rather than hours or days. Similarly, business operations must be able to learn about exceptions in real time in order to formulate a resolution before customer service is affected.

For some exceptions, the resolution is clear. In such cases it's important to resolve the exception in-flight by applying automated exception-handling actions.
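
One way to picture in-flight resolution is a registry that maps known exception types to corrective actions, with everything else left for people to handle. The exception type name and the postal-code fix below are invented for the example.

```python
from typing import Callable, Dict, Tuple

Order = Dict[str, str]
Handler = Callable[[Order], Order]


def normalize_postal_code(order: Order) -> Order:
    """Example corrective action: strip spaces and upper-case the postal code."""
    fixed = dict(order)  # leave the original message untouched
    fixed["postal_code"] = order["postal_code"].replace(" ", "").upper()
    return fixed


# Hypothetical registry of exception types whose resolution is clear.
HANDLERS: Dict[str, Handler] = {
    "MALFORMED_POSTAL_CODE": normalize_postal_code,
}


def resolve_in_flight(exception_type: str, order: Order) -> Tuple[bool, Order]:
    """Apply the registered action if one exists; otherwise report it unresolved."""
    handler = HANDLERS.get(exception_type)
    if handler is None:
        return False, order  # no automated cure, so escalate to a person
    return True, handler(order)
```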

Why Traditional Approaches Don't Work for SOA
Programmatic exception-handling models have been the mainstay of exception management in business applications. The compilation stage detects and eliminates syntactic errors. Anticipated anomalous business conditions are detected and handled via embedded logic, either in the application source code or in the business process driving the application. Business Process Management (BPM) systems often handle exceptions in process definitions by hardwiring the process definition with corrective actions for a well-defined set of exceptions that might occur while executing the process. Unanticipated conditions and process exits are handled by writing the condition to a log file.
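
The programmatic model described above can be sketched in a few lines: an anticipated condition gets a hardwired corrective action, and anything unanticipated is written to a log. The order-processing functions and the `InsufficientStock` condition are hypothetical.

```python
import logging
from typing import Dict

logger = logging.getLogger("orders")


class InsufficientStock(Exception):
    """An anticipated business condition with a known corrective action."""


def reserve_stock(stock: Dict[str, int], sku: str, quantity: int) -> None:
    if stock[sku] < quantity:
        raise InsufficientStock(sku)
    stock[sku] -= quantity


def process_order(stock: Dict[str, int], sku: str, quantity: int) -> str:
    try:
        reserve_stock(stock, sku, quantity)
        return "accepted"
    except InsufficientStock:
        # Anticipated: the corrective action is embedded in the code.
        return "backordered"
    except Exception:
        # Unanticipated: all the application does is record it in the log.
        logger.exception("Unanticipated failure for sku=%s", sku)
        return "failed"
```

Everything the code's author did not foresee ends up in the final branch, as a log entry inside a single service.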

Debugging and testing practices aim to isolate and eliminate logical errors. Quality assurance teams spend countless hours putting the software through scripted production simulations. Then it's up to the consumers of the production systems to report any exceptions to the technical support organization. IT operations staff members depend on applications and system logs to diagnose problems reported by customers. Patches are applied to applications if problems are deemed severe. Additionally, Network Systems Management (NSM) software is used to isolate runtime failures in the hardware or in elements of the physical layer and to trigger alerts.

More Stories By Sean Fitts

Sean Fitts is the chief systems architect for AmberPoint, Inc., the leading provider of SOA management software. Prior to AmberPoint, Sean held positions as lead architect and engineer at Forte Software, where he guided the overall architecture of the SynerJ product suite. Sean also held positions as senior software engineer at Sybase and Management Dynamics. He has a Bachelor of Science in Electrical Engineering/Computer Science from Princeton University. He has been awarded a patent, and has another pending.
