By Sean Fitts
September 10, 2005 03:00 PM EDT
Every now and then, an IT glitch makes national news. Just a few weeks ago, I read in the paper about an airline that mistakenly sold thousands of roundtrip tickets online at a fare of just a few dollars each. The airline lost hundreds of thousands of dollars from the mistake, though that's really just the tip of the iceberg. The National Institute of Standards and Technology (NIST) estimates that application errors cost the U.S. economy $59.5 billion per year. Because nearly 80 percent of such errors are discovered after applications have been put into production, exceptions also have a significant impact on the productivity and effectiveness of your IT staff and production support teams. And that's to say nothing of forgone revenue due to poor customer service.
We call these unexpected conditions "exceptions," though they happen all the time. They are as unavoidable as they are harmful.
Unlike my highly publicized airline example, the people who know about exceptions are usually limited to customers, IT teams, and line-of-business managers - and they typically find out about exceptions in that order. It's the customer or the end user of an SOA-based business application who is usually the first to witness the consequences of an exception. Common symptoms include opaque messages (such as "Sorry, unable to process request at this time") on Web sites. Such seemingly mild errors eventually translate into more disruptive business exceptions such as delayed orders, lost packages, rejected insurance claims, and so on.
Due to their distributed and heterogeneous nature, services-based systems are inherently vulnerable to exceptions. Exceptions in SOA environments can be broadly categorized into three classes:
- System-Level Exceptions result from XML/SOAP processing errors or transmission failures. They surface as SOAP faults with inconsistent fault codes.
- Application-Level Exceptions often come from incorrect message semantics or logical errors within the application. Incorrect data, unchecked boundary conditions, and unexpected service or client responses can be the cause.
- Business-Level Exceptions denote unacceptable business states in a transaction and are not necessarily technology-related issues. These surface as events that violate best practices, compliance laws, regulations, or business policies mandated by business managers. They include both technology-driven issues (such as an important order not processed within 24 hours) and human errors (such as an incorrect shipping address in a purchase order).
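To make the three classes concrete, here is a minimal sketch of how a monitoring component might sort what it observes into them. The `Observation` fields and the `classify` function are hypothetical names invented for this illustration; a real system would derive these signals from its own messages and policies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExceptionClass(Enum):
    SYSTEM = "system"
    APPLICATION = "application"
    BUSINESS = "business"


@dataclass
class Observation:
    """Hypothetical summary of what was seen for one message in a transaction."""
    soap_fault_code: Optional[str] = None   # set when a SOAP fault was returned
    payload_valid: bool = True              # False when data or semantics are wrong
    policy_violation: Optional[str] = None  # e.g. "order not processed within 24 hours"


def classify(obs: Observation) -> Optional[ExceptionClass]:
    """Return the exception class for an observation, or None if nothing is wrong."""
    if obs.soap_fault_code is not None:
        return ExceptionClass.SYSTEM        # XML/SOAP processing or transmission failure
    if not obs.payload_valid:
        return ExceptionClass.APPLICATION   # bad data or logic inside the application
    if obs.policy_violation is not None:
        return ExceptionClass.BUSINESS      # unacceptable business state
    return None
```

The checks run from the lowest layer up, on the assumption that a transport failure makes the higher-level signals unreliable. That ordering is a design choice of this sketch, not a rule.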
Any developer will tell you that 80 percent of the time required to diagnose an exception is spent simply trying to replicate the scenario. That's due to all the effort of searching through log files and iteratively recoding to add more information to the log. With distributed SOA, possibly stretched across geographies, this task is even more challenging.
Managing exceptions has traditionally been an expensive, extremely manual effort, often performed by dedicated application maintenance teams. Exceptions in SOA systems are even harder to detect and diagnose because their clues reside in multiple messages that span different services in the business application. To begin with, the applications themselves are seldom instrumented to proactively alert on specific exceptions. At best, an application surfaces exceptions as incoherent entries in an error log, such as "Error 00021C: Transaction rejected." These exceptions might be uncovered during routine maintenance of the application. However, as noted earlier, it's the phone calls from vexed customers that usually make IT staff aware of exceptions. Business operations teams seldom come to know about exceptions until it is too late to respond.
So what are you to do about it? You could just shrug your shoulders, accept that exceptions are going to happen, and hope you're not next week's headline news. Or you could look for ways to detect, diagnose, and remedy exceptions before they bring your business to a standstill. Let's explore ways to go about the latter option.
Managing Exceptions in Services-Based Systems
To understand how to manage exceptions in SOA-based environments, start by considering the requisite capabilities. Exceptions must be detected as they occur. To do so, IT and business operations must be able to specify the criteria for spotting exceptions in live business transactions. Typically, you'd look for message patterns that indicate unusual business activity. These may include incongruent reference data, discrepancies in data fields, error messages, and error codes. Sometimes, criteria can be crafted for detecting very specific conditions - for example "raise an exception if a premier customer's order is rejected due to mainframe error code D234200." Other times it's the absence of a message or pattern that's the symptom of an exception.
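Detection criteria of this kind can be written as small predicates over messages. The sketch below assumes a hypothetical `Message` record with invented field names. It shows one rule for the premier-customer example and one for the "absence of a message" case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical view of one message observed in a business transaction."""
    transaction_id: str
    kind: str                         # e.g. "order_submitted", "order_rejected"
    customer_tier: str = "standard"
    error_code: Optional[str] = None


def premier_order_rejected(msg: Message) -> bool:
    """Rule: a premier customer's order was rejected with mainframe code D234200."""
    return (
        msg.kind == "order_rejected"
        and msg.customer_tier == "premier"
        and msg.error_code == "D234200"
    )


def missing_outcome(messages: List[Message]) -> List[str]:
    """Rule: transactions that were submitted but never confirmed or rejected."""
    submitted = {m.transaction_id for m in messages if m.kind == "order_submitted"}
    closed = {
        m.transaction_id
        for m in messages
        if m.kind in ("order_confirmed", "order_rejected")
    }
    return sorted(submitted - closed)
```

On live traffic, the absence rule would also need a time window (how long to wait before a missing outcome counts as an exception). This sketch leaves that out.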
Since it's impossible to anticipate all patterns, operations teams need to cast as wide a net as possible across their business systems to trap the maximum number of exceptions. As a fallback, they must be able to trace and record all distributed transactions and diagnose this data for the root causes of exceptions.
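As a rough illustration of the fallback, the following sketch records every message under a transaction identifier so that the full history can be pulled up later for diagnosis. The `TransactionTrace` class and its record layout are assumptions made for this example. It keeps everything in memory, which a production system would not.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One recorded step: (timestamp, service name, message text).
Record = Tuple[float, str, str]


class TransactionTrace:
    """Hypothetical in-memory store of messages, keyed by transaction ID."""

    def __init__(self) -> None:
        self._by_txn: Dict[str, List[Record]] = defaultdict(list)

    def record(self, transaction_id: str, timestamp: float,
               service: str, text: str) -> None:
        """Store one observed message, regardless of which service reported it."""
        self._by_txn[transaction_id].append((timestamp, service, text))

    def history(self, transaction_id: str) -> List[Record]:
        """Return the transaction's messages in time order."""
        return sorted(self._by_txn.get(transaction_id, []))
```

Because messages from different services may arrive out of order, `history` sorts by timestamp. That assumes the services' clocks are reasonably well synchronized.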
IT and business teams must know about exceptions immediately. It might be important to alert one or more individuals across different teams based on the nature of the exception. For example, the error code is of interest to the IT staff, while the rejected purchase order and the customer details are important to business types.
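One way to picture this routing is to give each audience only the fields it cares about. In the sketch below, the team names and field names are hypothetical, and the actual delivery of the alert (e-mail, pager, console) is deliberately omitted.

```python
from typing import Any, Dict

# Which fields each audience receives; names are illustrative assumptions.
AUDIENCE_FIELDS = {
    "it_operations": ("error_code", "service"),
    "business_operations": ("purchase_order", "customer"),
}


def build_alerts(exception: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split one detected exception into a tailored alert per audience."""
    return {
        audience: {name: exception[name] for name in fields if name in exception}
        for audience, fields in AUDIENCE_FIELDS.items()
    }
```

For example, an exception carrying an error code, a service name, a purchase order number, and a customer name yields one alert with the first two fields for IT staff and another with the last two for the business team.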
The notified personnel must then be able to quickly analyze the situation, understand its cause, and implement a cure. To accomplish this, they need to know not only the exception message pattern but also the context of the business transaction in which it occurred. IT operations must be able to diagnose and resolve the exception in seconds or minutes instead of hours or days. Similarly, business operations must be able to learn about exceptions in real time in order to formulate a resolution before customer service is affected.
For some exceptions, the resolution is clear. In such cases it's important to resolve the exception in-flight by applying automated exception-handling actions.
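A simple way to sketch in-flight resolution is a lookup table from a known exception to its corrective action, with escalation to a person as the default. The pairing of error code D234200 with a resubmit action is purely hypothetical, as are the function names.

```python
from typing import Callable, Dict

Action = Callable[[dict], str]


def resubmit(exc: dict) -> str:
    """Hypothetical automated action: send the transaction through again."""
    return f"resubmitted {exc['transaction_id']}"


def escalate(exc: dict) -> str:
    """Default when no automated resolution is known: hand off to people."""
    return f"escalated {exc['transaction_id']} to operations"


# Known exceptions and their automated resolutions (illustrative only).
HANDLERS: Dict[str, Action] = {"D234200": resubmit}


def resolve(exc: dict) -> str:
    """Apply the automated action for this exception, or escalate."""
    action = HANDLERS.get(exc.get("error_code"), escalate)
    return action(exc)
```

Keeping the table separate from the application code means new resolutions can be added without touching the services themselves.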
Why Traditional Approaches Don't Work for SOA
Programmatic exception-handling models have been the mainstay of exception management in business applications. The compilation stage detects and eliminates syntactic errors. Anticipated anomalous business conditions are detected and handled via embedded logic, either in the application source code or in the business process driving the application. Business Process Management (BPM) systems often handle exceptions in process definitions by hardwiring the process definition with corrective actions for a well-defined set of exceptions that might occur while executing the process. Unanticipated conditions and process exits are handled by writing the condition to a log file.
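The programmatic model described above looks roughly like the following sketch: one anticipated condition is handled by embedded logic, and anything unanticipated ends up in a log. The `process_order` function, the `InsufficientStock` exception, and the order fields are invented for illustration.

```python
import logging

logger = logging.getLogger("orders")


class InsufficientStock(Exception):
    """An anticipated business condition with a hardwired corrective action."""


def process_order(order: dict, stock: dict) -> str:
    try:
        sku = order["sku"]
        quantity = order["quantity"]
        if quantity > stock[sku]:
            raise InsufficientStock(sku)
        stock[sku] -= quantity
        return "accepted"
    except InsufficientStock:
        # Anticipated: the corrective action is embedded in the code.
        return "backordered"
    except Exception:
        # Unanticipated: all the application can do is write to the log.
        logger.exception("Unanticipated failure while processing order")
        return "failed"
```

The second `except` branch shows the weakness of the model. The log entry is created inside one service, with no view of the other messages that make up the same business transaction.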
Debugging and testing practices aim to isolate and eliminate logical errors. Quality assurance teams spend countless hours putting the software through scripted production simulations. Then it's up to the consumers of the production systems to report any exceptions to the technical support organization. IT operations staff members depend on applications and system logs to diagnose problems reported by customers. Patches are applied to applications if problems are deemed severe. Additionally, Network Systems Management (NSM) software is used to isolate runtime failures in the hardware or in elements of the physical layer and to trigger alerts.
Copyright © 2005 SYS-CON Media, Inc. — All Rights Reserved.
Sean Fitts is the chief systems architect for AmberPoint, Inc., the leading provider of SOA management software. Prior to AmberPoint, Sean held positions as lead architect and engineer at Forte Software, where he guided the overall architecture of the SynerJ product suite. Sean also held positions as senior software engineer at Sybase and Management Dynamics. He has a Bachelor of Science in Electrical Engineering/Computer Science from Princeton University. He has been awarded a patent, and has another pending.