
Microservices Expo Authors: Elizabeth White, Aruna Ravichandran, Liz McMillan, Pat Romanski, Cameron Van Orman


Java IoT: Article

Why Response Times Are Often Measured Incorrectly

Response time measurements and how to interpret them

Response times are in many, if not most, cases the basis for performance analysis. When they are within expected boundaries, everything is OK. When they get too high, we start optimizing our applications.

So response times play a central role in performance monitoring and analysis. In virtualized and cloud environments they are the most accurate performance metric you can get. Very often, however, people measure and interpret response times the wrong way. This is more than reason enough to discuss response time measurements and how to interpret them. I will therefore discuss typical measurement approaches, the misunderstandings related to them, and how to improve them.

Averaging information away
When measuring response times, we cannot look at each and every single measurement. Even in very small production systems the number of transactions is unmanageable. Therefore measurements are aggregated for a certain timeframe. Depending on the monitoring configuration this might be seconds, minutes or even hours.

While this aggregation helps us to easily understand response times in large volume systems, it also means that we are losing information. The most common approach to measurement aggregation is using averages. This means the collected measurements are averaged and we are working with the average instead of the real values.

The problem with averages is that in many cases they do not reflect what is happening in the real world. There are two main reasons why working with averages leads to wrong or misleading results.

When measurements are highly volatile, the average is not representative of the response times actually measured. If our measurements range from 1 to 4 seconds, the average might be around 2 seconds, which certainly does not represent what many of our users perceive.

So averages provide only little insight into real-world performance. Instead of working with averages, you should use percentiles. If you talk to people who have been working in the performance space for some time, they will tell you that the only reliable metrics to work with are percentiles. In contrast to averages, percentiles tell you which response time a given share of your users did not exceed, and therefore how many users experienced response times slower than a certain threshold. If the 50th percentile, for example, is 2.5 seconds, the response times for 50 percent of your users were less than or equal to 2.5 seconds. As you can see, this approach is far closer to reality than using averages.
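To make the difference concrete, here is a minimal Python sketch that computes the average and a few percentiles for one aggregation window. It assumes the nearest-rank definition of a percentile (tools differ; some interpolate between neighbouring values), and the response times are made-up sample values in the 1 to 4 second range mentioned above.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest measured value such that at
    least p percent of all measurements are less than or equal to it."""
    if not values:
        raise ValueError("no measurements")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times (seconds) collected in one aggregation window
response_times = [1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 1.5, 3.8, 3.9, 4.0]

print(f"average: {mean(response_times):.2f} s")  # average: 2.04 s
for p in (50, 90, 95):
    print(f"p{p}: {percentile(response_times, p):.2f} s")
# p50: 1.30 s
# p90: 3.90 s
# p95: 4.00 s
```

No request in this sample actually took 2.04 seconds: most finished in about 1.3 seconds, while the slowest 30 percent took around 4 seconds, and only the percentiles show both groups.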

Figure: Percentiles and Average of a Measurement Series

The only potential downside of percentiles is that they require more data to be stored than averages do. While calculating an average only requires the sum and count of all measurements, percentiles require the whole range of measured values, because their calculation depends on how the values are distributed. This is also the reason why not all performance management tools support them.
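The storage difference can be sketched in Python. The average aggregator below keeps only a sum and a count. As a contrast, the sketch uses a fixed-bucket histogram, one common compromise that keeps one counter per response-time range and can approximate percentiles without storing every measurement; this is an illustrative assumption, not a description of how any particular tool works, and the bucket boundaries are hypothetical.

```python
import bisect
import math


class AverageAggregator:
    """Keeps only a sum and a count: constant memory, distribution lost."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, seconds):
        self.total += seconds
        self.count += 1

    def average(self):
        return self.total / self.count


class HistogramAggregator:
    """Keeps one counter per response-time bucket so that percentiles can
    be approximated without storing every single measurement."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra bucket for values above the largest upper bound
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def add(self, seconds):
        # First bucket whose upper bound is >= seconds
        index = bisect.bisect_left(self.upper_bounds, seconds)
        self.counts[index] += 1

    def percentile_upper_bound(self, p):
        """Upper bound of the bucket containing the p-th percentile
        (nearest rank); math.inf if it falls into the overflow bucket."""
        total = sum(self.counts)
        if total == 0:
            raise ValueError("no measurements")
        target = math.ceil(p * total / 100)
        running = 0
        for index, count in enumerate(self.counts):
            running += count
            if running >= target:
                if index < len(self.upper_bounds):
                    return self.upper_bounds[index]
                return math.inf
        return math.inf


averages = AverageAggregator()
histogram = HistogramAggregator([0.5, 1.0, 2.0, 4.0, 8.0])  # hypothetical buckets (s)
for seconds in [1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 1.5, 3.8, 3.9, 4.0]:
    averages.add(seconds)
    histogram.add(seconds)

print(round(averages.average(), 2))          # 2.04
print(histogram.percentile_upper_bound(50))  # 2.0
print(histogram.percentile_upper_bound(90))  # 4.0
```

The histogram needs one counter per bucket instead of one entry per measurement, but it can only tell which bucket a percentile falls into, so its precision depends entirely on how the bucket boundaries are chosen.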

Putting all in a box
Another important question when aggregating data is which data you use as the basis of your aggregations. If you mix together data for different transaction types, like the start page, a search and a credit card validation, the results will be of little value, because you are comparing apples and oranges. So in addition to working with percentiles, you also need to split transaction types properly so that the data underlying your calculations fits together.

The concept of splitting transactions by their business function is often referred to as business transaction management (BTM). While the field of BTM is wide, the basic idea is to distinguish transactions in an application by logical parameters like what they do or where they come from. Examples would be a “put into cart” transaction or the requests of a certain user.

Only a combination of both approaches ensures that the response times you measure are a solid basis for performance analysis.
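The combination of both approaches can be sketched in a few lines of Python: requests are first mapped to business transactions, and percentiles are then computed per transaction. The URL-based rules, transaction names and sample data below are hypothetical; real BTM products can classify by many more attributes.

```python
import math
import re
from collections import defaultdict

# Hypothetical business-transaction rules: (transaction name, URL pattern)
RULES = [
    ("put into cart", re.compile(r"^/cart/add")),
    ("search", re.compile(r"^/search")),
    ("start page", re.compile(r"^/$")),
]


def classify(url):
    """Map a request URL to a business transaction name."""
    for name, pattern in RULES:
        if pattern.match(url):
            return name
    return "other"


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]


def p90_by_transaction(requests):
    """requests: iterable of (url, response_time_seconds) pairs."""
    groups = defaultdict(list)
    for url, seconds in requests:
        groups[classify(url)].append(seconds)
    return {name: percentile(times, 90) for name, times in groups.items()}


requests = [
    ("/", 0.3), ("/", 0.4),
    ("/search?q=tv", 1.8), ("/search?q=hifi", 2.2),
    ("/cart/add?id=7", 0.9), ("/cart/add?id=9", 1.1),
]
print(p90_by_transaction(requests))
# {'start page': 0.4, 'search': 2.2, 'put into cart': 1.1}
print(percentile([t for _, t in requests], 90))  # 2.2 when everything is mixed
```

The mixed 90th percentile of 2.2 seconds is simply the search time; it says nothing about the start page, which in this sample is more than five times faster.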

Far from the real world
Another point to consider with response times is where they are measured. Most people measure response times on the server side and implicitly assume that they represent what real users see. Server-side response times may be down to 500 milliseconds and everyone thinks everything is fine, while users experience response times of several seconds.

The reason is that server-side response times ignore many factors that influence end-user response times. First of all, server-side measurements neglect the time it takes to transfer data over the network to the end users. This easily adds half a second or more to your response times.

Figure: Server vs. Client Response Time

At the same time, server-side response times often measure only the initial document sent to the user. All images, JavaScript and CSS files that are required to render a page properly are not included in this calculation at all. Experts like Steve Souders even say that only 10 percent of the overall response time is influenced by the server side. Even if we consider this an extreme scenario, it is obvious that server-side metrics alone do not provide a solid basis for understanding end-user performance.
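The effect can be illustrated with a small calculation. The Python sketch below uses a made-up waterfall for a single page view: the server spends 300 ms on the initial HTML document, but the page is only complete once the last image has arrived. All names and numbers are hypothetical, and the end of the last download is only a lower bound for what the user waits, since rendering and script execution come on top.

```python
from dataclasses import dataclass


@dataclass
class Fetch:
    """One download of a page view, as offsets from the start of navigation."""
    name: str
    start_ms: float
    end_ms: float


def server_share(document_server_ms, fetches):
    """Share of the page load attributable to server processing of the
    initial document. The end of the last download is used as page-load
    time, which still ignores rendering and script execution afterwards."""
    page_load_ms = max(f.end_ms for f in fetches)
    return document_server_ms / page_load_ms


# Hypothetical waterfall of a single page view
fetches = [
    Fetch("index.html", 0, 600),   # includes 300 ms of server processing
    Fetch("styles.css", 600, 900),
    Fetch("app.js", 620, 1500),
    Fetch("hero.jpg", 700, 2000),
]
print(f"{server_share(300, fetches):.0%}")  # 15%
```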

The situation gets even worse with JavaScript-heavy Web 2.0 applications where a great portion of the application logic is executed within the browser. In this case server-side metrics cannot be taken as representative for end-user performance at all.

Not measuring what you want to know
A common approach to solve this problem is synthetic transaction monitoring. This approach often claims to be “close to the end-user”. Commercial providers offer a huge number of locations around the world from which you can test the performance of pre-defined transactions. While this provides better insight into the performance end users perceive, it is not the full truth.

The most important thing to understand is how these measurements are collected. There are two approaches: emulators or real browsers. From my personal perspective, any approach that does not use real browsers should be avoided, as real browsers are what your users use. They are the only way to get accurate measurements.

The issue with using synthetic transactions for performance measurement is that they are not about real users. Your synthetic transactions might run pretty fast, but the guy with a slow internet connection who just wants to book a $5,000 holiday (OK, a rare case) still sees 10-second response times. Is it the fault of your application? No. Do you care? Yes, because this is your business. Additionally, synthetic transaction monitoring cannot cover all of your transactions. You cannot really book a holiday every couple of minutes, so in the end your monitoring covers only a portion of your transactions.

This does not mean that there is no value in using synthetic transactions. They are great for getting informed about availability or network problems that might affect your users, but they do not represent what your users actually see. As a consequence, they do not serve as a solid basis for performance improvements.

Measuring at the End-User Level
The only way to get real user performance metrics is to measure from within the users’ browsers. There are two approaches to do this: you can use a tool like the free dynaTrace Ajax Edition, which uses a browser plug-in to collect performance data, or you can inject JavaScript code to get performance metrics. The W3C now also has a number of standardization activities for browser performance APIs. The Navigation Timing specification is already supported by recent browsers, and the Resource Timing specification is being standardized as well. Open-source implementations like Boomerang provide a convenient way to access performance data within the browser. Products like dynaTrace UEM go further by providing a highly scalable backend and full integration into your server-side systems.

The main idea is to inject custom JavaScript code that captures timing information like the beginning of a request, DOM ready and fully loaded. While these events are sufficient for “classic” web applications, they are not enough for Web 2.0 applications that execute a lot of client-side code. In this case the JavaScript code has to be instrumented as well.
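As a sketch of what happens with such timing data once it reaches a monitoring backend, the Python function below derives the events mentioned above from attribute names of the Navigation Timing Level 1 `performance.timing` interface. The beacon format and the sample values are assumptions. Newer browsers expose most of the same attribute names through `PerformanceNavigationTiming` entries (Navigation Timing Level 2), with times relative to the start of navigation instead of epoch milliseconds.

```python
def navigation_phases(t):
    """Derive page-load phases from Navigation Timing (Level 1) values.

    t maps PerformanceTiming attribute names to epoch milliseconds, as a
    hypothetical beacon script might send them from performance.timing.
    """
    start = t["navigationStart"]
    return {
        # Request sent until first byte received: server time plus network latency
        "request_to_first_byte_ms": t["responseStart"] - t["requestStart"],
        # Transfer of the HTML document itself
        "document_download_ms": t["responseEnd"] - t["responseStart"],
        # "DOM ready": the DOMContentLoaded event starts
        "dom_ready_ms": t["domContentLoadedEventStart"] - start,
        # "Fully loaded": the load event starts after images, scripts and CSS
        "fully_loaded_ms": t["loadEventStart"] - start,
    }


base = 1_700_000_000_000  # arbitrary epoch timestamp for the example
beacon = {
    "navigationStart": base,
    "requestStart": base + 50,
    "responseStart": base + 450,
    "responseEnd": base + 600,
    "domContentLoadedEventStart": base + 1100,
    "loadEventStart": base + 2200,
}
print(navigation_phases(beacon))
# {'request_to_first_byte_ms': 400, 'document_download_ms': 150,
#  'dom_ready_ms': 1100, 'fully_loaded_ms': 2200}
```

Note that request-to-first-byte already mixes server processing with network latency; separating the two requires server-side data as well.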

Is it enough to measure on the client-side?
The question now is whether it is enough to measure performance from the end-user perspective. If we know how our web application performs for each user we have enough information to see whether an application is slow or fast. If we then combine this data with information like geo location, browser and connection speed we know for which users a problem exists. So from a pure monitoring perspective this is enough.
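Finding out for which users a problem exists can be sketched as grouping end-user measurements by one context attribute at a time and flagging groups whose percentile exceeds a threshold. The Python below assumes beacons have already been enriched with attributes such as browser and country; the field names, threshold and data are hypothetical.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]


def slow_segments(measurements, dimension, threshold_s, p=90):
    """Group end-user measurements by one context attribute and return the
    segments whose p-th percentile load time exceeds threshold_s."""
    groups = defaultdict(list)
    for m in measurements:
        groups[m[dimension]].append(m["load_s"])
    result = {}
    for segment, times in groups.items():
        value = percentile(times, p)
        if value > threshold_s:
            result[segment] = value
    return result


# Hypothetical beacon data enriched with browser and country
measurements = [
    {"browser": "Firefox", "country": "AT", "load_s": 1.2},
    {"browser": "Firefox", "country": "US", "load_s": 1.4},
    {"browser": "IE", "country": "AT", "load_s": 4.8},
    {"browser": "IE", "country": "US", "load_s": 5.1},
    {"browser": "Chrome", "country": "US", "load_s": 1.0},
]
print(slow_segments(measurements, "browser", threshold_s=3.0))  # {'IE': 5.1}
print(slow_segments(measurements, "country", threshold_s=3.0))  # {'AT': 4.8, 'US': 5.1}
```

Grouping by country flags every country, while grouping by browser isolates the problem to a single browser, which is the kind of hint this combination of data can give.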

In case of problems, however, we want to go beyond monitoring. Monitoring only tells us that we have a problem; it does not help us find its cause. Especially when we measure end-user performance, our information is less rich than with development-centric approaches. We could still use a development-focused tool like dynaTrace Ajax Edition for production troubleshooting. This, however, requires installing custom software on an end user’s machine. While this might be an option in SaaS environments, it is not in a typical eCommerce scenario.

The only way to gain this level of insight for diagnostic purposes is to collect information from the browser as well as from the server side, to get a holistic view of application performance. As discussed, averaged metrics are not enough in this case, and aggregated data in general does not provide the insight we need. Instead of aggregated information, we need to be able to identify the requests of a user’s browser and relate them to the corresponding server-side requests.
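At its core, relating browser and server data is a join on an identifier that both sides know. The Python sketch below shows such a join under a stated assumption: a hypothetical correlation id that the server assigns to each response and the injected JavaScript reports back with its timing data. This is not necessarily how dynaTrace UEM or any other product implements it; field names and data are made up.

```python
from collections import defaultdict


def correlate(browser_actions, server_requests):
    """Attach server-side requests to the browser action that caused them.

    Assumes (hypothetically) that the server assigns a correlation id to each
    page or XHR response and that the injected JavaScript sends this id back
    together with the timing it measured in the browser.
    """
    by_id = defaultdict(list)
    for request in server_requests:
        by_id[request["correlation_id"]].append(request)

    joined = []
    for action in browser_actions:
        requests = by_id.get(action["correlation_id"], [])
        slowest = max(requests, key=lambda r: r["server_ms"], default=None)
        joined.append({
            "action": action["name"],
            "user_ms": action["user_ms"],
            "server_urls": [r["url"] for r in requests],
            "slowest_server_url": slowest["url"] if slowest else None,
            "slowest_server_ms": slowest["server_ms"] if slowest else 0,
        })
    return joined


actions = [{"name": "click 'Book now'", "correlation_id": "a1", "user_ms": 4200}]
server = [
    {"correlation_id": "a1", "url": "/booking/check", "server_ms": 350},
    {"correlation_id": "a1", "url": "/booking/confirm", "server_ms": 2900},
    {"correlation_id": "b2", "url": "/search", "server_ms": 120},
]
for row in correlate(actions, server):
    print(row)
```

For the booking click, the joined row shows that a single server request, `/booking/confirm`, accounts for 2.9 of the 4.2 seconds the user waited, which points the investigation to the server side.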

Figure: Client/Server Drill Down of Pages and Actions

The figure below shows an architecture based on (and abstracted from) dynaTrace UEM, which provides this functionality. It shows browser-side and server-side data capturing combined on a transactional basis, plus a centralized performance repository for analysis.

Figure: Architecture for End-To-End User Experience Monitoring

Conclusion
There are many options for where and how to measure response times. Depending on what we want to achieve, each of them provides more or less accurate data. For the analysis of server-side problems, measuring on the server side is enough. We have to be aware, however, that this does not reflect the response times of our end users. It is a purely technical metric for optimizing the way we create content and service requests. The prerequisite for meaningful measurements is that we separate different transaction types properly.

Measurements from anything but the end-user’s perspective can only be used to optimize your technical infrastructure and only indirectly the performance of end users. Only performance measurements in the browser enable you to understand and optimize user-perceived performance.


More Stories By Alois Reitbauer

Alois Reitbauer is Chief Technical Strategist at Dynatrace. He has spent most of his career building monitoring tools and fine-tuning application performance. A regular conference speaker, blogger, author, and sushi maniac, Alois currently shares his professional time between Linz, Boston, and San Francisco.
