Why Response Times Are Often Measured Incorrectly

Response time measurements and how to interpret them

Response times are in many cases, if not most, the basis for performance analysis. When they are within expected boundaries, everything is OK. When they get too high, we start optimizing our applications.

So response times play a central role in performance monitoring and analysis. In virtualized and cloud environments they are the most accurate performance metric you can get. Very often, however, people measure and interpret response times the wrong way. This is more than reason enough to discuss the topic of response time measurements and how to interpret them. Therefore I will discuss typical measurement approaches, the related misunderstandings and how to improve measurement approaches.

Averaging information away
When measuring response times, we cannot look at each and every single measurement. Even in very small production systems the number of transactions is unmanageable. Therefore measurements are aggregated for a certain timeframe. Depending on the monitoring configuration this might be seconds, minutes or even hours.

While this aggregation helps us to easily understand response times in large volume systems, it also means that we are losing information. The most common approach to measurement aggregation is using averages. This means the collected measurements are averaged and we are working with the average instead of the real values.

The problem with averages is that in many cases they do not reflect what is happening in the real world. There are two main reasons why working with averages leads to wrong or misleading results.

When measurements are highly volatile, the average is not representative of the response times actually measured. If our measurements range from 1 to 4 seconds, the average might be around 2 seconds, which certainly does not represent what many of our users perceive.

So averages provide only little insight into real-world performance. Instead of working with averages you should use percentiles. If you talk to people who have been working in the performance space for some time, they will tell you that the only reliable metrics to work with are percentiles. In contrast to averages, percentiles tell you how many users perceived response times slower than a certain threshold. If the 50th percentile, for example, is 2.5 seconds, this means that the response times for 50 percent of your users were less than or equal to 2.5 seconds. As you can see, this approach is far closer to reality than using averages.

Percentiles and Average of a Measurement Series

The only potential downside with percentiles is that they require more data to be stored than averages do. While average calculation only requires the sum and count of all measurements, percentiles require a whole range of measurement values as their calculation is more complex. This is also the reason why not all performance management tools support them.
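To make the difference concrete, here is a minimal sketch comparing the average with the 50th and 90th percentiles of one measurement series. The sample values are invented for illustration, and the nearest-rank method used here is only one of several common percentile definitions; monitoring tools may calculate percentiles differently.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest measurement such that
    at least p percent of all measurements are less than or equal to it."""
    if not values:
        raise ValueError("no measurements")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)  # the whole series is needed, not just sum and count
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in seconds, ranging from 1 to 4 seconds.
response_times = [1.0, 1.1, 1.2, 1.2, 1.3, 1.4, 1.5, 3.8, 3.9, 4.0]

average = sum(response_times) / len(response_times)
p50 = percentile(response_times, 50)
p90 = percentile(response_times, 90)

print(f"average: {average:.2f} s")  # about 2 seconds, a value no user actually saw
print(f"50th percentile: {p50} s")  # half of the requests finished within this time
print(f"90th percentile: {p90} s")  # the slow requests the average hides
```

In this made-up series no request took anywhere near the average of roughly 2 seconds: most were faster, and a sizable minority took almost twice as long.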

Putting all in a box
Another important question when aggregating data is which data you use as the basis of your aggregations. If you mix together data for different transaction types, like the start page, a search and a credit card validation, the results will be of little value because the base data is a mix of apples and oranges. So in addition to ensuring that you are working with percentiles, it is necessary to split transaction types properly so that the data that forms the basis for your calculations fits together.

The concept of splitting transactions by their business function is often referred to as business transaction management. While the field of BTM is wide, the basic idea is to distinguish transactions in an application by logical parameters like what they do or where they come from. An example would be a “put into cart” transaction or the requests of a certain user.

Only a combination of both approaches ensures that the response times you measure are a solid basis for performance analysis.
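Combining the two approaches could look like the following sketch, which groups measurements by transaction type before calculating a percentile for each group. The transaction names and timings are hypothetical, and the nearest-rank percentile is an assumption, not the method of any particular tool.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, for 0 < p <= 100."""
    if not values or not 0 < p <= 100:
        raise ValueError("need measurements and 0 < p <= 100")
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]


def p90_by_transaction_type(records):
    """Group (transaction type, seconds) pairs by type and return
    the 90th percentile for each type separately."""
    grouped = defaultdict(list)
    for tx_type, seconds in records:
        grouped[tx_type].append(seconds)
    return {tx_type: percentile(times, 90) for tx_type, times in grouped.items()}


# Hypothetical measurements: (transaction type, response time in seconds).
measurements = [
    ("start_page", 0.4), ("start_page", 0.5), ("start_page", 0.6), ("start_page", 0.5),
    ("search", 1.8), ("search", 2.2), ("search", 2.0), ("search", 2.4),
    ("credit_card_validation", 3.5), ("credit_card_validation", 4.1),
]

# Mixing everything together yields one number that describes no transaction well.
mixed_p90 = percentile([seconds for _, seconds in measurements], 90)

# Splitting by type yields one meaningful number per business transaction.
split_p90 = p90_by_transaction_type(measurements)

print("mixed:", mixed_p90)
for tx_type, value in split_p90.items():
    print(tx_type, value)
```

With this invented data the mixed 90th percentile of 3.5 seconds says nothing useful about the start page, whose own 90th percentile is 0.6 seconds.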

Far from the real world
Another point to consider with response times is where they are measured. Most people measure response times at the server-side and implicitly assume that they represent what real users see. While server-side response times are down to 500 milliseconds and everyone thinks everything is fine, users might experience response times of several seconds.

The reason is that server-side response times ignore many of the factors that influence end-user response times. First of all, server-side measurements neglect the network transfer time to the end users. This easily adds half a second or more to your response times.

Server vs. Client Response Time

At the same time, server-side response times often only measure the initial document sent to the user. All images, JavaScript and CSS files that are required to render a page properly are not included in this calculation at all. Experts like Steve Souders even say that only 10 percent of the overall response time is influenced by the server side. Even if we consider this an extreme scenario, it is obvious that basing performance management solely on server-side metrics does not provide a solid basis for understanding end-user performance.

The situation gets even worse with JavaScript-heavy Web 2.0 applications where a great portion of the application logic is executed within the browser. In this case server-side metrics cannot be taken as representative for end-user performance at all.

Not measuring what you want to know
A common approach to solve this problem is to use synthetic transaction monitoring. This approach often claims to be “close to the end-user”. Commercial providers offer a huge number of locations around the world from where you can test the performance of pre-defined transactions. While this provides better insight into what the perceived performance of end-users is, it is not the full truth.

The most important thing to understand is how these measurements are collected. There are two approaches to collect this data: via emulators or real browsers. From my very personal perspective any approach that does not use real browsers should be avoided as real browsers are also what your users use. They are the only way to get accurate measurements.

The issue with using synthetic transactions for performance measurement is that it is not about real users. Your synthetic transactions might run pretty fast, but that guy with a slow internet connection who just wants to book a $5,000 holiday (ok, a rare case) still sees 10-second response times. Is it the fault of your application? No. Do you care? Yes, because this is your business. Additionally, synthetic transaction monitoring cannot monitor all of your transactions. You cannot really book a holiday every couple of minutes, so in the end your monitoring covers only a portion of your transactions.

This does not mean that there is no value in using synthetic transactions. They are great for staying informed about availability or network problems that might affect your users, but they do not represent what your users actually see. As a consequence, they do not serve as a solid basis for performance improvements.

Measuring at the End-User Level
The only way to get real user performance metrics is to measure from within the user's browser. There are two approaches to do this. You can use a tool like the free dynaTrace Ajax Edition, which uses a browser plug-in to collect performance data, or you can inject JavaScript code to get performance metrics. The W3C now also has a number of standardization activities for browser performance APIs. The Navigation Timing Specification is already supported by recent browsers, as is the Resource Timing Specification. Open-source implementations like Boomerang provide a convenient way to access performance data within the browser. Products like dynaTrace UEM go further by providing a highly scalable backend and full integration into your server-side systems.

The main idea is to inject custom JavaScript code which captures timing information like the beginning of a request, DOM ready and fully loaded. While these events are sufficient for “classic” web applications they are not enough for Web 2.0 applications which execute a lot of client-side code. In this case the JavaScript code has to be instrumented as well.
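As a rough illustration of what can be done with such timing information once it has been collected, the sketch below derives three durations from timestamps named after attributes of the Navigation Timing Specification (navigationStart, responseStart, domContentLoadedEventEnd, loadEventEnd). How the injected script transmits these values is not covered here; the dictionary format, the function name and the sample values are assumptions made for this example.

```python
def page_timings(beacon):
    """Derive durations in milliseconds from Navigation Timing style
    timestamps (milliseconds since the epoch) reported by a browser."""
    start = beacon["navigationStart"]
    return {
        # Time until the first byte of the response arrived in the browser.
        "time_to_first_byte_ms": beacon["responseStart"] - start,
        # Time until the DOM was ready.
        "dom_ready_ms": beacon["domContentLoadedEventEnd"] - start,
        # Time until the page, including its resources, was fully loaded.
        "fully_loaded_ms": beacon["loadEventEnd"] - start,
    }


# Hypothetical values as an injected script might report them.
beacon = {
    "navigationStart": 1_000_000,
    "responseStart": 1_000_480,
    "domContentLoadedEventEnd": 1_001_900,
    "loadEventEnd": 1_003_200,
}

timings = page_timings(beacon)
for name, value in timings.items():
    print(name, value)
```

In this invented example the first byte arrives after about half a second, while the user waits more than three seconds for the fully loaded page.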

Is it enough to measure on the client-side?
The question now is whether it is enough to measure performance from the end-user perspective. If we know how our web application performs for each user we have enough information to see whether an application is slow or fast. If we then combine this data with information like geo location, browser and connection speed we know for which users a problem exists. So from a pure monitoring perspective this is enough.

In case of problems, however, we want to go beyond monitoring. Monitoring only tells us that we have a problem but does not help in finding the cause of the problem. Especially when we measure end-user performance our information is less rich compared to development-centric approaches. We could still use a development-focused tool like dynaTrace Ajax Edition for production troubleshooting. This however requires installing custom software on an end user’s machine. While this might be an option for SaaS environments this is not the case in a typical eCommerce scenario.

The only way to gain this level of insight for diagnostic purposes is to collect information from the browser as well as the server side to get a holistic view of application performance. As discussed, using averaged metrics is not enough in this case. Aggregated data does not provide the insight we need. So instead of aggregated information, we need to be able to identify the requests of a user's browser and relate them to server-side requests.
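One way to picture this is a join over a shared identifier. The sketch below assumes that every browser-side action and every server-side request carries the same hypothetical correlation_id; this is an illustrative assumption, not a description of how dynaTrace UEM or any other product tags requests.

```python
from collections import defaultdict


def drill_down(client_actions, server_requests):
    """Attach to each browser-side action the server-side requests
    that share its (hypothetical) correlation ID."""
    by_id = defaultdict(list)
    for request in server_requests:
        by_id[request["correlation_id"]].append(request)

    result = []
    for action in client_actions:
        related = by_id.get(action["correlation_id"], [])
        server_ms = sum(r["duration_ms"] for r in related)
        result.append({
            "action": action["name"],
            "client_ms": action["duration_ms"],
            "server_ms": server_ms,
            # Rough estimate only: assumes the server requests did not overlap.
            "outside_server_ms": action["duration_ms"] - server_ms,
            "server_requests": related,
        })
    return result


# Hypothetical captured data.
client_actions = [
    {"correlation_id": "a1", "name": "put into cart", "duration_ms": 4200},
]
server_requests = [
    {"correlation_id": "a1", "path": "/cart/add", "duration_ms": 350},
    {"correlation_id": "a1", "path": "/cart/summary", "duration_ms": 150},
    {"correlation_id": "b2", "path": "/search", "duration_ms": 900},
]

report = drill_down(client_actions, server_requests)
for entry in report:
    print(entry["action"], entry["client_ms"], entry["server_ms"], entry["outside_server_ms"])
```

For this invented action the server accounts for 500 milliseconds of the 4.2 seconds the user experienced, which points the investigation toward the network and the browser rather than the server.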

Client/Server Drill Down of Pages and Actions

The figure below shows an architecture based on (and abstracted from) dynaTrace UEM which provides this functionality. It shows the combination of browser and server-side data capturing on a transactional basis and a centralized performance repository for analysis.

Architecture for End-To-End User Experience Monitoring

Conclusion
There are many ways to measure response times, differing in both where and how the measurements are taken. Depending on what we want to achieve, each one of them provides more or less accurate data. For the analysis of server-side problems, measuring at the server side is enough. We have to be aware, however, that this does not reflect the response times of our end users. It is a purely technical metric for optimizing the way we create content and service requests. The prerequisite for meaningful measurements is that we separate different transaction types properly.

Measurements from anything but the end user's perspective can only be used to optimize your technical infrastructure, and they improve the performance experienced by end users only indirectly. Only performance measurements in the browser enable you to understand and optimize user-perceived performance.

More Stories By Alois Reitbauer

Alois Reitbauer is Chief Technical Strategist at Dynatrace. He has spent most of his career building monitoring tools and fine-tuning application performance. A regular conference speaker, blogger, author, and sushi maniac, Alois currently shares his professional time between Linz, Boston, and San Francisco.
