Why Response Times Are Often Measured Incorrectly

Response time measurements and how to interpret them

Response times are in many cases, if not in most, the basis for performance analysis. When they are within expected boundaries, everything is OK. When they get too high, we start optimizing our applications.

So response times play a central role in performance monitoring and analysis. In virtualized and cloud environments they are the most accurate performance metric you can get. Very often, however, people measure and interpret response times the wrong way. This is more than reason enough to discuss the topic of response time measurements and how to interpret them. Therefore I will discuss typical measurement approaches, the related misunderstandings and how to improve measurement approaches.

Averaging information away
When measuring response times, we cannot look at each and every single measurement. Even in very small production systems the number of transactions is unmanageable. Therefore measurements are aggregated for a certain timeframe. Depending on the monitoring configuration this might be seconds, minutes or even hours.

While this aggregation helps us to easily understand response times in large volume systems, it also means that we are losing information. The most common approach to measurement aggregation is using averages. This means the collected measurements are averaged and we are working with the average instead of the real values.

The problem with averages is that in many cases they do not reflect what is happening in the real world. There are two main reasons why working with averages leads to wrong or misleading results.

When measurements are highly volatile, the average is not representative of the response times actually measured. If our measurements range from 1 to 4 seconds, the average might be around 2 seconds, which certainly does not represent what many of our users perceive.

So averages provide only little insight into real-world performance. Instead of working with averages, you should use percentiles. If you talk to people who have been working in the performance space for some time, they will tell you that the only reliable metrics to work with are percentiles. In contrast to averages, percentiles tell you what share of your users experienced response times at or below a certain value. If the 50th percentile, for example, is 2.5 seconds, this means that the response times for 50 percent of your users were less than or equal to 2.5 seconds. As you can see, this approach is far closer to reality than using averages.

Figure: Percentiles and Average of a Measurement Series

The only potential downside with percentiles is that they require more data to be stored than averages do. While average calculation only requires the sum and count of all measurements, percentiles require a whole range of measurement values as their calculation is more complex. This is also the reason why not all performance management tools support them.
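
To make the difference concrete, here is a minimal sketch in Python. The sample values are made up for illustration, and the nearest-rank method used here is only one of several common percentile definitions; a monitoring tool may well use a different one. Note that the average needs only the sum and the count, while the percentile needs the whole sorted series.

```python
import math


def average(values):
    """Arithmetic mean: only the sum and the count are needed."""
    return sum(values) / len(values)


def percentile(values, p):
    """Nearest-rank percentile: the smallest measurement such that at
    least p percent of all measurements are less than or equal to it.
    Unlike the average, this needs the full series of values."""
    if not values:
        raise ValueError("no measurements")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in seconds: most requests are fast,
# but a few users wait close to four seconds.
response_times = [1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 1.5, 3.8, 3.9, 4.0]

print("average:", round(average(response_times), 2))
print("median :", percentile(response_times, 50))
print("p90    :", percentile(response_times, 90))
```

In this made-up series the average lands at about 2 seconds, a value that no user actually experienced, while the median and the 90th percentile show that most users were well below it and a noticeable group was far above it.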

Putting all in a box
Another important question when aggregating data is which data you use as the basis of your aggregations. If you mix together data for different transaction types like the start page, a search and a credit card validation, the results will be of little value, as you are comparing apples and oranges. So in addition to ensuring that you are working with percentiles, it is necessary to split transaction types properly so that the data that forms the basis for your calculations fits together.

The concept of splitting transactions by their business function is often referred to as business transaction management. While the field of BTM is wide, the basic idea is to distinguish transactions in an application by logical parameters like what they do or where they come from. An example would be a “put into cart” transaction or the requests of a certain user.

Only a combination of both approaches ensures that the response times you measure are a solid basis for performance analysis.
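
A small sketch of combining both approaches could look like the following. The transaction names and timings are invented for illustration, and the nearest-rank percentile is an assumption rather than what any particular tool does.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list of measurements."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def percentiles_by_transaction(measurements, p):
    """Group (transaction_type, seconds) pairs by type and compute the
    p-th percentile separately for each transaction type."""
    grouped = defaultdict(list)
    for transaction_type, seconds in measurements:
        grouped[transaction_type].append(seconds)
    return {name: percentile(times, p) for name, times in grouped.items()}


# Hypothetical measurements for three very different transaction types.
measurements = [
    ("start_page", 0.3), ("start_page", 0.4), ("start_page", 0.5),
    ("search", 1.2), ("search", 1.5), ("search", 2.1),
    ("credit_card_validation", 3.0), ("credit_card_validation", 3.4),
]

# Everything in one box: one number that describes no transaction well.
mixed_p90 = percentile([seconds for _, seconds in measurements], 90)

# Split by business transaction: one number per transaction type.
per_type_p90 = percentiles_by_transaction(measurements, 90)

print("mixed p90:", mixed_p90)
for name, value in per_type_p90.items():
    print(f"{name} p90: {value}")
```

The mixed value is dominated by the slowest transaction type and says nothing about whether the start page or the search got slower, which is exactly the information the per-type values preserve.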

Far from the real world
Another point to consider with response times is where they are measured. Most people measure response times at the server-side and implicitly assume that they represent what real users see. While server-side response times are down to 500 milliseconds and everyone thinks everything is fine, users might experience response times of several seconds.

The reason is that server-side response times leave out many of the factors that influence end-user response times. First of all, server-side measurements neglect the network transfer time to the end users. This easily adds half a second or more to your response times.

Figure: Server vs. Client Response Time

At the same time, server-side response times often only measure the initial document sent to the user. All images, JavaScript and CSS files that are required to render a page properly are not included in this calculation at all. Experts like Steve Souders even say that only 10 percent of the overall response time is influenced by the server side. Even if we consider this an extreme scenario, it is obvious that basing performance management solely on server-side metrics does not provide a solid basis for understanding end-user performance.

The situation gets even worse with JavaScript-heavy Web 2.0 applications where a great portion of the application logic is executed within the browser. In this case server-side metrics cannot be taken as representative for end-user performance at all.

Not measuring what you want to know
A common approach to solve this problem is to use synthetic transaction monitoring. This approach often claims to be “close to the end-user”. Commercial providers offer a huge number of locations around the world from where you can test the performance of pre-defined transactions. While this provides better insight into what the perceived performance of end-users is, it is not the full truth.

The most important thing to understand is how these measurements are collected. There are two approaches to collect this data: via emulators or real browsers. From my very personal perspective any approach that does not use real browsers should be avoided as real browsers are also what your users use. They are the only way to get accurate measurements.

The issue with using synthetic transactions for performance measurement is that it is not about real users. Your synthetic transactions might run pretty fast, but that guy with a slow internet connection who just wants to book a $5,000 holiday (ok, a rare case) still sees 10-second response times. Is it the fault of your application? No. Do you care? Yes, because this is your business. Additionally, synthetic transaction monitoring cannot monitor all of your transactions. You cannot really book a holiday every couple of minutes, so in the end your monitoring covers only a portion of your transactions.

This does not mean that there is no value in using synthetic transactions. They are great for being informed about availability or network problems that might affect your users, but they do not represent what your users actually see. As a consequence, they do not serve as a solid basis for performance improvements.

Measuring at the End-User Level
The only way to get real user performance metrics is to measure from within the users’ browser. There are two approaches to do this. You can use a tool like the free dynaTrace Ajax Edition, which uses a browser plug-in to collect performance data, or you can inject JavaScript code to get performance metrics. The W3C now also has a number of standardization activities for browser performance APIs. The Navigation Timing Specification, which recent browsers already support, and the Resource Timing Specification are two of them. Open-source implementations like Boomerang provide a convenient way to access performance data within the browser. Products like dynaTrace UEM go further by providing a highly scalable backend and full integration into your server-side systems.

The main idea is to inject custom JavaScript code which captures timing information like the beginning of a request, DOM ready and fully loaded. While these events are sufficient for “classic” web applications they are not enough for Web 2.0 applications which execute a lot of client-side code. In this case the JavaScript code has to be instrumented as well.
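
As a rough sketch of what happens with these events once they are captured: assume the injected script sends the raw timestamps to a collector as a small record. The field names below follow Navigation Timing attribute names, but the record format, the function name and the values are assumptions made for this illustration, not the format of any particular product. The collector could then derive the three timings mentioned above like this.

```python
def page_timings(beacon):
    """Derive user-facing durations (in milliseconds) from raw
    timestamps reported by the browser. All timestamps are assumed
    to be milliseconds on the same clock."""
    start = beacon["navigationStart"]
    return {
        # Time until the first byte of the response arrived.
        "first_byte_ms": beacon["responseStart"] - start,
        # Time until the DOM was ready.
        "dom_ready_ms": beacon["domContentLoadedEventStart"] - start,
        # Time until the page was fully loaded.
        "fully_loaded_ms": beacon["loadEventEnd"] - start,
    }


# Hypothetical beacon as it might arrive from one page view.
beacon = {
    "navigationStart": 1_000_000,
    "responseStart": 1_000_450,
    "domContentLoadedEventStart": 1_001_800,
    "loadEventEnd": 1_003_200,
}

timings = page_timings(beacon)
for name, value in timings.items():
    print(f"{name}: {value}")
```

In this invented page view the first byte arrives after 450 milliseconds, but the user waits 3.2 seconds until the page is fully loaded, which is the gap a purely server-side measurement would never show.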

Is it enough to measure on the client-side?
The question now is whether it is enough to measure performance from the end-user perspective. If we know how our web application performs for each user we have enough information to see whether an application is slow or fast. If we then combine this data with information like geo location, browser and connection speed we know for which users a problem exists. So from a pure monitoring perspective this is enough.

In case of problems, however, we want to go beyond monitoring. Monitoring only tells us that we have a problem but does not help in finding the cause of the problem. Especially when we measure end-user performance our information is less rich compared to development-centric approaches. We could still use a development-focused tool like dynaTrace Ajax Edition for production troubleshooting. This however requires installing custom software on an end user’s machine. While this might be an option for SaaS environments this is not the case in a typical eCommerce scenario.

The only way to gain this level of insight for diagnostic purposes is to collect information from the browser as well as the server side, giving a holistic view of application performance. As discussed, averaged or otherwise aggregated metrics do not provide the insight we need in this case. Instead, we need to be able to identify the requests of a user’s browser and relate them to the corresponding server-side requests.
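
One way to picture this is a shared identifier that travels with every request. The sketch below assumes that the browser-side script and the server-side agent both record a common `request_id`; that field, the record layout and all values are hypothetical and only serve to illustrate the idea of relating the two sides on a per-request basis.

```python
def correlate(browser_records, server_records):
    """Join browser-side and server-side measurements of the same
    request via a shared request_id (a hypothetical field)."""
    server_by_id = {record["request_id"]: record for record in server_records}
    joined = []
    for browser in browser_records:
        server = server_by_id.get(browser["request_id"])
        if server is None:
            # No server-side counterpart was recorded for this request.
            continue
        joined.append({
            "request_id": browser["request_id"],
            "user": browser["user"],
            "action": browser["action"],
            "client_ms": browser["duration_ms"],
            "server_ms": server["duration_ms"],
            # Time spent outside the server: network, rendering, scripts.
            "outside_server_ms": browser["duration_ms"] - server["duration_ms"],
        })
    return joined


# Hypothetical records captured for two user actions.
browser_records = [
    {"request_id": "r1", "user": "u42", "action": "search", "duration_ms": 2600},
    {"request_id": "r2", "user": "u42", "action": "put into cart", "duration_ms": 900},
]
server_records = [
    {"request_id": "r1", "duration_ms": 480},
    {"request_id": "r2", "duration_ms": 700},
]

drill_down = correlate(browser_records, server_records)
for row in drill_down:
    print(row)
```

With the two sides related per request, it becomes visible that the slow search in this invented example spends most of its time outside the server, while the cart action is dominated by server-side processing.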

Figure: Client/Server Drill Down of Pages and Actions

The figure below shows an architecture based on (and abstracted from) dynaTrace UEM, which provides this functionality. It shows the combination of browser and server-side data capturing on a transactional basis and a centralized performance repository for analysis.

 

Figure: Architecture for End-To-End User Experience Monitoring

Conclusion
There are many ways to measure response times, and many places to measure them. Depending on what we want to achieve, each of them provides more or less accurate data. For the analysis of server-side problems, measuring at the server side is enough. We have to be aware, however, that this does not reflect the response times of our end users. It is a purely technical metric for optimizing the way we create content and service requests. The prerequisite for meaningful measurements is that we separate different transaction types properly.

Measurements from anything but the end-user’s perspective can only be used to optimize your technical infrastructure and only indirectly the performance of end users. Only performance measurements in the browser enable you to understand and optimize user-perceived performance.

More Stories By Alois Reitbauer

Alois Reitbauer is Chief Technical Strategist at Dynatrace. He has spent most of his career building monitoring tools and fine-tuning application performance. A regular conference speaker, blogger, author, and sushi maniac, Alois currently shares his professional time between Linz, Boston, and San Francisco.
