Click here to close now.



Welcome!

Microservices Expo Authors: Pat Romanski, Cloud Best Practices Network, Elizabeth White, Automic Blog, Anders Wallgren

Related Topics: Java IoT, Industrial IoT, Microservices Expo, IoT User Interface, Agile Computing, @CloudExpo

Java IoT: Article

Evolving an APM Strategy for the 21st Century

How the approach to APM has evolved to adapt to a complex ecosystem

I started in the web performance industry - well before Application Performance Management (APM) existed - during a time when external, single page measurement ruled the land. In an ecosystem where no other solutions existed, it was the top of the data chain to support the rapidly evolving world of web applications. This was an effective approach to APM, as most online applications were self-contained and, compared to the modern era, relatively simple in their design.

A state-of-the-art web application, circa 2000

Soon, a new solution rose to the top of the ecosystem - the synthetic, multi-step business process, played back either in a browser or a browser simulator. By evolving beyond the single-page measurement, this more complex data collection methodology was able to provide a view into the most critical business processes, delivering repeatable baseline and benchmark data that could be used by operations and business teams to track the health of the applications and identify when and where issues occurred.

These multi-step processes ruled the ecosystem for nearly a decade, evolving to include finer detail, deeper analytics, wider browser selection, and greater geographic coverage. But, like anything at the apex of an ecosystem, even this approach began to show that it couldn't answer every question.

In the modern online application environment, companies are delivering data to multiple browsers and mobile devices while creating increasingly sophisticated applications. These applications are developed using a combination of in-house code, commercial and open source packages and servers, and outside services to extend the application beyond what the in-house team specializes in.

This growth and complexity means that the traditional, stand-alone tools are no longer complex and "smart" enough to help customers actually solve the problems they face in their online applications. This means that a new approach, the next step in APM evolution, was needed to displace the current technologies at the top of the pyramid.

A state-of-the-art online application, circa 2013

This ecosystem, with multiple, sometimes competing, data streams makes it extremely difficult to answer the seemingly simple question of What is happening?, and sometimes nearly impossible to answer the important question of And why does it matter to us?

Let's walk through a performance issue and show how the approach to APM has evolved to adapt to the complex ecosystem, and why we find that it requires a sophisticated, integrated approach to allow the flood of data to turn into a concentrated stream of actionable information.

Starting with synthetic data, we already have two unique perspectives that provide a broader scope of data than the traditional datacenter-only approach. By combining Backbone (traditional datacenter synthetic monitoring) with data from the Last Mile (data collected from end-user competitors running the same scripts that are run from the Backbone), the clear differences in performance appear, giving companies an idea that the datacenter-only approach needs to be extended by collecting data from a source that is much closer to the customers that use the monitored application.

Outside-In Data Capture Perspectives used to provide the user experience data for online applications

Using a real-world scenario, let's follow the diagnostic process of a detected issue from the initial synthetic errors to the deepest level of impact, and see how a new, integrated APM solution can help resolve issues in an effective, efficient, and actionable way.

Starting with a three-hour snapshot of synthetic data, it's apparent that there is an issue almost halfway through this period, affecting primarily the Backbone measurements.

Examination of Individual Synthetic Measurements to identify outliers and errors

The clear cluster of errors (red squares in the scatter plot) around 17:30 is seen to be affecting Backbone only by filtering out the blue Last Mile measurements. After this filtering, zooming in allows us to quickly see that these errors are focused on the Backbone measurement perspective.

Filtered Scatter Plot Data Showing the Backbone Perspective, Focusing on the Errors

Examining the data shows that they are all script playback issues related to a missing element on the step, preventing the next action in the script from being executed.

A waterfall chart showing that the script execution failed due to an expected page element not appearing

But there are two questions that need to be answered: Why? And Does this matter? What's interesting is that as good as the synthetic tool is, this is as far as it can go. This forces teams to investigate the issue further and replicate it using other tools, wasting precious time.

But an evolved APM strategy doesn't stop here. By isolating the time period and error, the modern, integrated toolset can now ask and answer both those questions, and extend the information to: Who else was affected?

In the above instance, we know that the issue occurred from Pennsylvania. By using a user-experience monitoring (UEM) tool that captures data from all incoming visitors, we can filter the data to examine just the synthetic test visit.

Already, we have extended the data provided by the synthetic measurement. By drilling down further, it immediately becomes clear what the issue was.

Click on "Chart" takes over 60 seconds of Server Time

And then, the final step, what was happening on the server-side? Well, it's clear that one layer of the application was causing the issue and eventually the server timed out.

Issue is in the Crystaldecision API - Something to pass to the developers and QA team!

So, the element that was needed to make the script move forward wasn't there because the process that was generating the element timed out. When the agent decided to attempt the action, the missing element caused the script to fail.

This integrated approach has identified that the Click on ‘Chart' action is one of potential concern and we can now go back and look at all instances of this action in the past 24 hours to see if there are visits that encountered a similar issue. It's clear that this is a serious issue that needs to be investigated. The following screenshot shows all click-on chart actions that experienced this problem including those from REAL users that were also impacted by this problem.

A list of all visitors - Synthetic and Real - affected by the click on "Chart" issue in a 24-hour period, indicating a high priority issue

From an error on a Synthetic chart, we have quickly been able to move down to an issue that has been repeated multiple times over the past 24 hours, affecting not only synthetic users but also real users. Exporting all of this data and sending it to the QA and development teams will allow them to focus their efforts on the critical area.

This integrated approach has shown what has been proven in ecosystems all throughout the world, whether they are in nature or in applications: a tightly integrated group that seamlessly works together is far more effective than an individual. With many eyes, perspectives, and complementary areas of expertise, the team approach has provided far more data to solve the problem than any one of the perspectives could have on its own.

More Stories By Stephen Pierzchala

With more than a decade in the web performance industry, Stephen Pierzchala has advised many organizations, from Fortune 500 to startups, in how to improve the performance of their web applications by helping them develop and evolve the unique speed, conversion, and customer experience metrics necessary to effectively measure, manage, and evolve online web and mobile applications that improve performance and increase revenue. Working on projects for top companies in the online retail, financial services, content delivery, ad-delivery, and enterprise software industries, he has developed new approaches to web performance data analysis. Stephen has led web performance methodology, CDN Assessment, SaaS load testing, technical troubleshooting, and performance assessments, demonstrating the value of the web performance. He noted for his technical analyses and knowledge of Web performance from the outside-in.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@MicroservicesExpo Stories
SYS-CON Events announced today that Alert Logic, Inc., the leading provider of Security-as-a-Service solutions for the cloud, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Alert Logic, Inc., provides Security-as-a-Service for on-premises, cloud, and hybrid infrastructures, delivering deep security insight and continuous protection for customers at a lower cost than traditional security solutions. Ful...
At the heart of the Cloud Native model is a microservices application architecture, and applying this to a telco SDN scenario offers enormous opportunity for product innovation and competitive advantage. For example in the ETSI NFV Ecosystem white paper they describe one of the product markets that SDN might address to be the Home sector. Vendors like Alcatel market SDN-based solutions for the home market, offering Home Gateways – A virtual residential gateway (vRGW) where service provider...
SYS-CON Events announced today that VAI, a leading ERP software provider, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. VAI (Vormittag Associates, Inc.) is a leading independent mid-market ERP software developer renowned for its flexible solutions and ability to automate critical business functions for the distribution, manufacturing, specialty retail and service sectors. An IBM Premier Business Part...
SYS-CON Events announced today that Catchpoint Systems, Inc., a provider of innovative web and infrastructure monitoring solutions, has been named “Silver Sponsor” of SYS-CON's DevOps Summit at 18th Cloud Expo New York, which will take place June 7-9, 2016, at the Javits Center in New York City, NY. Catchpoint is a leading Digital Performance Analytics company that provides unparalleled insight into customer-critical services to help consistently deliver an amazing customer experience. Designed...
SYS-CON Events announced today that AppNeta, the leader in performance insight for business-critical web applications, will exhibit and present at SYS-CON's @DevOpsSummit at Cloud Expo New York, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. AppNeta is the only application performance monitoring (APM) company to provide solutions for all applications – applications you develop internally, business-critical SaaS applications you use and the networks that deli...
In the Bimodal model we find two areas of IT - the traditional kind where the main concern is keeping the lights on and the IT focusing on agility and speed, where everything needs to be faster. Today companies are investing in new technologies and processes to emulate their most agile competitors. Gone are the days of waterfall development and releases only every few months. Today's IT and the business it powers demands performance akin to a supercar - everything needs to be faster, every sc...
The (re?)emergence of Microservices was especially prominent in this week’s news. What are they good for? do they make sense for your application? should you take the plunge? and what do Microservices mean for your DevOps and Continuous Delivery efforts? Continue reading for more on Microservices, containers, DevOps culture, and more top news from the past week. As always, stay tuned to all the news coming from@ElectricCloud on DevOps and Continuous Delivery throughout the week and retweet/favo...
In most cases, it is convenient to have some human interaction with a web (micro-)service, no matter how small it is. A traditional approach would be to create an HTTP interface, where user requests will be dispatched and HTML/CSS pages must be served. This approach is indeed very traditional for a web site, but not really convenient for a web service, which is not intended to be good looking, 24x7 up and running and UX-optimized. Instead, talking to a web service in a chat-bot mode would be muc...
More and more companies are looking to microservices as an architectural pattern for breaking apart applications into more manageable pieces so that agile teams can deliver new features quicker and more effectively. What this pattern has done more than anything to date is spark organizational transformations, setting the foundation for future application development. In practice, however, there are a number of considerations to make that go beyond simply “build, ship, and run,” which changes ho...
SYS-CON Events announced today that Interoute, owner-operator of one of Europe's largest networks and a global cloud services platform, has been named “Bronze Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2015 at the Javits Center in New York, New York. Interoute is the owner-operator of one of Europe's largest networks and a global cloud services platform which encompasses 12 data centers, 14 virtual data centers and 31 colocation centers, with connections to 195 ad...
As software organizations continue to invest in achieving Continuous Delivery (CD) of their applications, we see increased interest in microservices architectures, which–on the face of it–seem like a natural fit for enabling CD. In microservices (or its predecessor, “SOA”), the business functionality is decomposed into a set of independent, self-contained services that communicate with each other via an API. Each of the services has their own application release cycle, and are developed and depl...
With microservices, SOA and distributed architectures becoming more popular, it is becoming increasingly harder to keep track of where time is spent in a distributed application when trying to diagnose performance problems. Distributed tracing systems attempt to address this problem by following application requests across service boundaries, persisting metadata along the way that provide context for fine-grained performance monitoring.
The battle over bimodal IT is heating up. Now that there’s a reasonably broad consensus that Gartner’s advice about bimodal IT is deeply flawed – consensus everywhere except perhaps at Gartner – various ideas are springing up to fill the void. The bimodal problem, of course, is well understood. ‘Traditional’ or ‘slow’ IT uses hidebound, laborious processes that would only get in the way of ‘fast’ or ‘agile’ digital efforts. The result: incoherent IT strategies and shadow IT struggles that lead ...
SYS-CON Events announced today that Commvault, a global leader in enterprise data protection and information management, has been named “Bronze Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Commvault is a leading provider of data protection and information management...
If we look at slow, traditional IT and jump to the conclusion that just because we found its issues intractable before, that necessarily means we will again, then it’s time for a rethink. As a matter of fact, the world of IT has changed over the last ten years or so. We’ve been experiencing unprecedented innovation across the board – innovation in technology as well as in how people organize and accomplish tasks. Let’s take a look at three differences between today’s modern, digital context...
Web performance issues and advances have been gaining a stronger presence in the headlines as people are becoming more aware of its impact on virtually every business, and 2015 was no exception. We saw a myriad of major outages this year hit some of the biggest corporations, as well as some technology integrations and other news that we IT Ops aficionados find very exciting. This past year has offered several opportunities for growth and evolution in the performance realm — even the worst failu...
Are you someone who knows that the number one rule in DevOps is “Don’t Panic”? Especially when it comes to making Continuous Delivery changes inside your organization? Are you someone that theorizes that if anyone implements real automation changes, the solution will instantly become antiquated and be replaced by something even more bizarre and inexplicable?
Welcome to the first top DevOps news roundup of 2016! At the end of last year, we saw some great predictions for 2016. While we’re excited to kick off the new year, this week’s top posts reminded us to take a second to slow down and really understand the current state of affairs. For example, do you actually know what microservices are – or aren’t? What about DevOps? Does the emphasis still fall mostly on the development side? This week’s top news definitely got the wheels turning and just migh...
Test automation is arguably the most important innovation to the process of QA testing in software development. The ability to automate regression testing and other repetitive test cases can significantly reduce the overall production time for even the most complex solutions. As software continues to be developed for new platforms – including mobile devices and the diverse array of endpoints that will be created during the rise of the Internet of Things - automation integration will have a huge ...
Providing a full-duplex communication channel over a single TCP connection, WebSocket is the most efficient protocol for real-time responses over the web. If you’re utilizing WebSocket technology, performance testing will boil down to simulating the bi-directional nature of your application. Introduced with HTML5, the WebSocket protocol allows for more interaction between a browser and website, facilitating real-time applications and live content. WebSocket technology creates a persistent conne...