SOA Performance: Monitoring Bottlenecks in an Ultra-Heterogeneous Environment

The Microsoft Word of functional requirements

To state the obvious: with mission-critical applications, your mission will fail around the same time your applications do. This truism is of immediate concern to .NET developers involved with Service Oriented Architectures (SOA), the loosely coupled software services that now support all kinds of business processes, including supply chains and customer-facing online applications. Failures or even brief slowdowns can take immense tolls because, well, the company's mission itself is affected.

But monitoring application performance in a SOA can be especially challenging because loosely coupled architectures introduce a paradox: while the communication between disparate systems has been simplified, the environment in which these services are deployed is exponentially more complex. This complexity makes it ever more difficult to manage the overall system performance and reliability because the bottlenecks are ever more difficult to trace.

It's not that SOAs are themselves inherently complex. They only get that way after they grow. After all, SOAs are really nothing more than a collection of services that communicate with each other, either through simple data passing or with multiple services coordinating some activity. SOAs have grown in popularity in part because the enabling technology can take so many forms. For example, an inventory management system might be composed of multiple services - search, price quotes, and invoice generation - that are each exposed through a Web Service. Or a customer's request might get routed from an ASP.NET application to multiple backends including SQL Server and external Web Service providers. Or HTTP requests might be transformed into Java Message Service (JMS) messages through an Enterprise Service Bus. Or remote services might be exposed through portlets into customer-facing applications via Web Services for Remote Portlets (WSRP).

Where things get complicated is when the size of the SOA grows past the ability of mere mortals like us to track it. And SOAs do tend to grow: the ease with which services can be loosely coupled makes expansion both tempting and quick. As the services multiply, so do the connection points between them. Growth takes many forms - the addition of message brokers, integration brokers, enterprise service buses, portal servers, and Web Services.

While adding any of these creates a measure of complexity, Web Services are the place where seemingly intractable problems are most likely to arise, because Web Services have an especially wide lineage. An application behind the service might reside on a Windows or Linux server or a mainframe. It may reside in your building or across the country. And it may have been written last week, or created 25 years ago.

Consider a COBOL mainframe application built in-house in 1985 to store actuarial data. The original developers have long since retired, but they did a good job - the database keeps on ticking. In the pre-SOA world, this was a self-contained system with no exposure to the outside world. Now, by putting a SOA wrapper around it, this legacy system is suddenly part of a brand new Web self-service application. It may talk to other apps, including the Web interface itself, that were built in the last six months. And most important, the customers for this actuarial data are no longer just the in-house statisticians, but casual shoppers on the Internet whose expectations have been set by companies like Amazon and eBay.

SOA's use of legacy systems is convenient and cheap, because developers don't have to reinvent the wheel. The insurance company with the hand-built legacy mainframe application may also be linking in systems from SAP, Oracle, and software from a consulting company purchased two years ago. Information flow in this environment gets complicated. It can move through an application that's spawned by an application server, then hit a database and a CRM system - all with systems made by different companies and running on different classes of hardware.

But this convenience can result in a performance sinkhole because the entire system's throughput depends, in part, on aging systems created in a bygone era, when real-time response wasn't even a design consideration. The weakest link in the chain can take the whole system down. When any SOA service encounters problems, the performance and the availability of the entire SOA application are directly affected.

In other words, SOA is a hyper-heterogeneous environment - a superb way to leverage legacy systems and applications - but a bear to troubleshoot. When all goes well, end users and IT staff alike blissfully go about their business. But when even a single service snags, the application bogs down and, if it's mission-critical, heads can roll.

The Solution: Deep Monitoring
The biggest problem associated with a SOA environment is very human: when problems arise, who do you call and what do you tell them? Did the system snag surface somewhere upstream, in a local application, or in the interaction between the SOA app and the service? Because all of these are possibilities, and human nature being what it is, fingers tend to be pointed away from the culpable party, toward someone else - anyone else. "Is the problem at your end, my end, or in between?" "Is our system not responding quickly enough to remote queries, or is the problem at your end with the way you handle the returned data?" Nailing any of this down gets tougher as SOAs grow because a bottleneck in one service can affect several others.

Given the complexity of the hyper-heterogeneous SOA environment, deep system monitoring is essential - and that is true for both the .NET and Java worlds. On the .NET side, monitoring should provide comprehensive views into each .NET application, identifying performance problems whether they're inside .NET's common language runtime environment or in supporting systems such as databases. Visibility should be at the transaction level, in addition to the aggregate level of performance metrics. Obviously, the technology shouldn't itself be a bottleneck; it should incur low overhead across hardware and operating systems. The system should be rapidly deployable, with no need for application developers to write additional code for application performance management. When problems occur, the monitoring system should identify which ones affect users and customers the most, thereby helping administrators triage problems based on severity, then identify the right owners to solve them. This last point is crucial: monitoring is the answer to the vendor blame game because it effectively documents where the problem is taking place. Monitoring enables application personnel on your side of the fence to point out that a service request that once took 30 seconds to process is now taking 10 minutes. With that kind of data, the blame game comes to an end.
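To illustrate why transaction-level visibility matters alongside aggregate metrics, here is a minimal Python sketch. The service names, timings, and function names are all hypothetical and are not taken from any monitoring product; the point is only that an average signals trouble without saying where, while the per-transaction view names the offending request.

```python
from statistics import mean

# Hypothetical per-transaction timings in seconds, keyed by service name.
transactions = [
    ("search", 0.4), ("search", 0.5), ("quote", 0.6),
    ("invoice", 0.5), ("quote", 600.0), ("search", 0.4),
]

def aggregate_latency(records):
    """Average latency across all transactions (the aggregate view)."""
    return mean(seconds for _, seconds in records)

def slow_transactions(records, threshold_seconds):
    """Individual transactions slower than the threshold (the transaction view)."""
    return [(svc, s) for svc, s in records if s > threshold_seconds]

def worst_service(records):
    """Service with the most total time spent, a rough proxy for impact."""
    totals = {}
    for svc, seconds in records:
        totals[svc] = totals.get(svc, 0.0) + seconds
    return max(totals, key=totals.get)

if __name__ == "__main__":
    print("aggregate:", aggregate_latency(transactions))
    print("slow:", slow_transactions(transactions, 30.0))
    print("owner to call:", worst_service(transactions))
```

In this sketch the 30-second threshold is an arbitrary assumption; a real system would derive it from observed behavior rather than hard-code it.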

Single Pane of Glass
The best way to do monitoring is to add probes to .NET applications through a standard Microsoft interface. The probes collect and send performance metrics back to a central server. This approach provides deep analysis without impacting system performance - which can be a problem with other approaches, especially as monitored data volumes increase.
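The probe idea can be sketched in a few lines of Python. This is an analogy only: it is not the Microsoft interface the article refers to, and the names `probe`, `collected`, and `price_quote` are made up for illustration. The in-memory list stands in for the central server.

```python
import time
from functools import wraps

# Stand-in for the central server; a real probe would ship metrics elsewhere.
collected = []

def probe(name, sink=collected, clock=time.perf_counter):
    """Wrap a function so each call reports its elapsed time to the sink."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = clock()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even if the call raises, so failures are timed too.
                sink.append({"name": name, "seconds": clock() - start})
        return wrapper
    return decorate

@probe("price_quote")
def price_quote(item_id):
    """Hypothetical service call being monitored."""
    return {"item": item_id, "price": 9.99}

if __name__ == "__main__":
    price_quote(7)
    print(collected)
```

The application code inside `price_quote` is unchanged; the measurement is attached from outside, which is the property the article asks for when it says developers should not have to write additional code.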

The system should be able to automatically discover the components that .NET applications rely on, such as those in ASP.NET, Active Server Methods (ASMX), ADO.NET, Enterprise Services, Directory Services, and .NET Messaging. And each view should have sufficient depth, so that visibility into, say, ADO.NET reaches all the way down to the individual SQL statements. By identifying the relationships among these .NET components, the system enables IT personnel to follow the flow of transactions through multiple systems. It can isolate specific slowdowns and provide detailed information for resolving them.

Ideally, the system should also monitor the Windows environment through integrated Perfmon metrics, helping IT administrators understand the relationship between the Windows environment and the performance and availability of the .NET application. Monitoring of Web Services that are produced and consumed by .NET applications should include live views of individual transactions, the number and nature of Web Service faults, and details of component interactions.

Because SOA often encompasses both .NET and Java, the system should also be able to track Java applications running on the Java Platform, Enterprise Edition (Java EE). A solution that uses a common technology foundation in both will enable IT administrators to integrate the monitoring of the two environments and correlate performance between .NET and Java applications.

Making Sense of the Data
To make sense of all this data, a well-designed, flexible dashboard is critical. It shouldn't just list raw metrics, but provide both graphing and reporting features. The ability to customize is important and should be available to both technical and non-technical personnel. That's because operations personnel who identify and triage performance issues may have little or no expertise in programming or the .NET Framework, yet may need to specify how they interact with the system. The same is true for a CIO, who will usually want the high-level overview. Regardless of their job description, all stakeholders will need a "single pane of glass" - that is, a single place to view all performance information for the disparate applications, with a uniform performance scale so that, in evaluating the health of the system, "apples" aren't compared to "oranges." Real-time monitoring is a given, and the performance of every application component should be replayable as well. Such a "flight data recorder" feature enables IT administrators to recreate and examine the conditions that cause errors, including intermittent errors that can be especially difficult to observe in real time.
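One simple way to picture a "flight data recorder" is a bounded buffer that keeps the most recent samples and can replay any time window from them. The Python sketch below rests on that assumption; the class name, fields, and capacity are hypothetical and do not describe how any particular product stores its history.

```python
from collections import deque

class FlightRecorder:
    """Keep the most recent samples and replay any time window from them."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest sample when full.
        self._samples = deque(maxlen=capacity)

    def record(self, timestamp, component, seconds):
        """Store one measurement for a component at a given time."""
        self._samples.append((timestamp, component, seconds))

    def replay(self, start, end):
        """Return retained samples with start <= timestamp <= end, oldest first."""
        return [s for s in self._samples if start <= s[0] <= end]

if __name__ == "__main__":
    recorder = FlightRecorder(capacity=3)
    for t in range(1, 6):
        recorder.record(t, "ADO.NET", 0.1 * t)
    # Only the three most recent samples are still available for replay.
    for sample in recorder.replay(0, 10):
        print(sample)
```

The fixed capacity is the design trade-off: it bounds memory use, at the cost of losing history older than the buffer can hold.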

Staying Alert
A good monitoring system, such as Introscope for Microsoft .NET from CA Wily Technology, will go beyond a plain alarm function to provide a wider overview - an especially important feature in a complex SOA environment. For example, an automatic base-lining function can show what normal performance looks like for any service, thereby establishing what constitutes an abnormal bottleneck. Some systems also provide a grid that diagrammatically maps all related SOA services along with their interconnections, thereby providing an aerial view.
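As a rough illustration of base-lining, the Python sketch below treats "normal" as the mean of past response times and flags anything more than a few standard deviations above it. This is a generic statistical rule chosen for the example, not the method used by Introscope or any other product, and the multiplier `k` and the sample values are assumptions.

```python
from statistics import mean, stdev

def baseline(samples):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)

def is_abnormal(value, samples, k=3.0):
    """True when value exceeds the baseline mean by more than k deviations."""
    mu, sigma = baseline(samples)
    return value > mu + k * sigma

if __name__ == "__main__":
    # Hypothetical response times in seconds for one service under normal load.
    normal = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8]
    print(is_abnormal(1.1, normal))  # within the usual range
    print(is_abnormal(5.0, normal))  # far above the usual range
```

A learned baseline lets a service that normally takes one second and a service that normally takes thirty each be judged against its own history instead of a single fixed alarm threshold.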

The system should be able to notify administrators onscreen, through e-mail, or by cell phone or pager when predefined events occur. If appropriate, the system should be able to use native APIs or .NET messages to handle the events, so that notification can be integrated with existing management solutions such as Microsoft Operations Manager.

Conclusion
The rise of SOAs that use applications built on .NET technologies, as well as Java EE, requires IT administrators to implement proven processes for application performance management. By simplifying the management of complex Web applications, a good monitoring system is indispensable in improving the customer experience, ensuring superior service delivery, and achieving business goals.

More Stories By Patrick Chang

Patrick Chang is a senior product manager at Wily Technology, the leader in enterprise application management. He is responsible for defining product strategy, driving product specification, and working with industry partners to develop innovative management solutions for the enterprise application market.
