Microservices Expo: Article

Fact Finders: Sorting Out the Truth in Real User Monitoring

Go Real with the right expectations

On my recent visits to Velocity, WebPerfDay and Apps World in London, Real User Monitoring (RUM) was the hot topic. That got me thinking about the differences between vendors. They all promise the same thing at widely varying prices - from free to a couple of thousand US dollars. What I found out is that there IS a big difference and - depending on what you want to do with RUM - you need to understand the capabilities and limitations of the available solutions.

The false claim of 100% Coverage
What all vendors claim to do is capture data from 100% of your users. When you look closer, you see that many of these solutions - especially the "Freemiums" - rely on the W3C Navigation Timings. So my question is: How can I cover ALL users with W3C timings when these timings are NOT AVAILABLE in all browsers?

W3C timings are only available in newer browsers. So what about IE6, IE7, IE8, the whole Safari browser family, and older Firefox and Chrome instances? Looking at current statistics, they add up to 35% of the overall market share (http://www.w3counter.com/globalstats.php). Vendors that rely on these timings and claim to capture every user's experience are simply not being accurate.

The performance impact of monitoring
After finding that out I asked myself: "Are there any more deficiencies to be found?"

I first thought about the collection mechanism, which reminded me of the challenges all Web Analytics tools have. Data collection relies on the browser's onUnload event: the RUM tool has to collect data until the last second of the page's lifecycle and then send it off. Most SaaS vendors use an image GET request to send the data to their collection instances. Modern browsers optimize this case, on the reasoning "Why should a browser download an image if the page is about to die?" Browsers like Chrome either do not execute the request at all or do not wait for the response once the data has been sent. So again, I am losing data from my real end users.

The workaround some vendors have put in place is a timeout in the onUnload event handler. I have seen timeouts of up to 500 ms, which delay the next page that gets loaded. We want to improve the user experience and performance, but these tools force the user to wait longer to move to the next page.

So we are losing all the old browsers and additionally the modern ones that do not execute the data collection requests. We are now far away from 100% coverage.

Do the math
Another argument you always hear is that the RUM solution allows you to find out more about the end user environment's impact on page performance. The geographical region of the end user, the browsers, the OS or device can result in slow page performance. But does this really work?

Let's do some simple math and figure out what this means for a page with 1 000 000 visits a day:

  • 1 000 000 overall visits/day
  • 1 000 000 - 35% (visits from browsers with no W3C timing support) = 650 000
  • 650 000 - 20% (visits whose data is not sent at all or arrives incomplete) = 520 000
  • 520 000 captured visits per day

Figure 1: Only 52% of visitors are captured by most RUM vendors due to limitations of browsers
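
The funnel above can be reproduced in a few lines. This is only a sketch of the arithmetic: the function name `captured_visits` and its parameter names are made up for illustration, and the 35% and 20% shares are the figures used in this article, not measurements of any particular site.

```python
def captured_visits(total_visits: int, unsupported_share: float, lost_share: float) -> int:
    """Visits that remain after two successive losses.

    unsupported_share: share of visits from browsers without W3C timings.
    lost_share: share of the remaining visits whose data never arrives
                or arrives incomplete.
    """
    with_timing_support = total_visits * (1 - unsupported_share)
    captured = with_timing_support * (1 - lost_share)
    return round(captured)


daily_visits = 1_000_000
captured = captured_visits(daily_visits, unsupported_share=0.35, lost_share=0.20)

print(captured)                           # 520000
print(f"{captured / daily_visits:.0%}")   # 52%
```

Plugging in your own browser mix and an estimate of lost beacons gives a rough upper bound on the coverage you can expect.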

So we have reduced our base from 1 000 000 to 520 000. Let's start with the breakdown into the different groupings:

  • 520 000 broken down by 100 countries
  • 520 000/100 = 5 200 visits/country/day
  • 5 200 visits per country broken down by 20 browser versions
  • 5 200/20 = 260 visits/country/browser version/day

Let's break the 260 visits down further by 10 operating systems:

  • 260/10 = 26 visits/country/browser version/operating system/day

We want to have data on an hourly basis:

  • 26/24 ≈ 1 visit/country/browser version/operating system/hour

**1 000 000 visits per day ≈ 1 visit/country/browser version/operating system/hour! We have done no sampling, we have only country-level data, and we are looking at visits, not page views!**

To clarify: In this calculation I assume that the visits are evenly distributed over all countries, browser versions and operating systems. I do not take into account that most solutions sample at a rate of 1-20%, and I count visits with multiple page views instead of unique URIs. This is therefore a best-case scenario; in reality it can be even worse.
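
To see how quickly the numbers shrink, the segmentation can be written as a small function. This is a sketch under the same even-distribution assumption as the calculation above; the name `visits_per_segment_hour` and the `sampling_rate` parameter are hypothetical and only serve to try out the 1-20% sampling rates mentioned in the text.

```python
def visits_per_segment_hour(
    captured_per_day: float,
    countries: int = 100,
    browser_versions: int = 20,
    operating_systems: int = 10,
    hours: int = 24,
    sampling_rate: float = 1.0,
) -> float:
    """Average visits per country/browser version/OS combination per hour.

    Assumes visits are spread evenly over all combinations and all hours.
    """
    segments = countries * browser_versions * operating_systems
    return captured_per_day * sampling_rate / segments / hours


# No sampling: the best case from the calculation above.
print(round(visits_per_segment_hour(520_000), 2))                      # 1.08

# With sampling at 20% and at 1% of captured visits.
print(round(visits_per_segment_hour(520_000, sampling_rate=0.20), 2))  # 0.22
print(round(visits_per_segment_hour(520_000, sampling_rate=0.01), 2))  # 0.01
```

With sampling switched on, most combinations see less than one visit per hour, which is far too little to say anything meaningful about a single segment.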

So then, why is Real User Monitoring so popular?...
...because it helps you to improve your users' experience! How can that work when we know that we might not capture data from all our end users? You only have to change your expectations of what you want to achieve with Real User Monitoring.

What you should expect from your RUM solution is:

  • Support for all browsers - not only the new browsers
  • A reliable data sending mechanism
  • W3C timings support
  • Functional Health information like errors from JavaScript and HTTP - not only timings
  • AJAX/XHR-requests timing - not only timings for page loads
  • The click path of a whole visit - not only separate page views
  • Support for desktop browsers, mobile browsers and native mobile applications in a combined view
  • Landing and Exit page analysis

If your selected solution provides all these features, you can go a step further: instead of only monitoring your users, you can do real User Experience Management (UEM). The following short examples show what that allows you to do.

Example 1: JavaScript Errors - Which one to fix first?
If your RUM/UEM solution provides you with JavaScript errors, you can start fixing problems right away. It should be able to show you which messages appear how often in which browser, as shown in Figure 2.

Figure 2: Detailed JavaScript error messages are captured for every visit and are easily accessible, grouped by browser, OS or geo-location
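
The kind of ranking behind such a view can be sketched as a simple count per message and browser. The records below are invented for illustration and do not come from any real tool; `rank_errors` is a hypothetical helper, and a real solution would read the captured errors from its own data store.

```python
from collections import Counter

# Hypothetical sample: one (error message, browser) pair per captured error.
errors = [
    ("TypeError: undefined is not a function", "Chrome 23"),
    ("'console' is undefined", "IE 8"),
    ("'console' is undefined", "IE 8"),
    ("'console' is undefined", "IE 7"),
    ("TypeError: undefined is not a function", "Chrome 23"),
    ("'console' is undefined", "IE 8"),
]


def rank_errors(records):
    """Return ((message, browser), count) pairs, most frequent first."""
    return Counter(records).most_common()


for (message, browser), count in rank_errors(errors):
    print(f"{count:>3}  {browser:<10} {message}")
```

The combination at the top of the list is the first candidate to fix, because it affects the most captured visits.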

Example 2: Why are my customers leaving my web site?
With UEM you can not only see that your customers are leaving your web site, you can also figure out whether they had technical issues (see Figure 3).

Figure 3: Looking at Exit Pages and correlating them with Failure Rate, Performance and User Experience allows us to quickly identify why visitors leave the website on these pages

Example 3: What did my customer do on the application before he called our support center?
Having every visit and all actions available makes it easy for the support center employees to look up the visit information as part of the triage process (see Figure 4).

Figure 4: Seeing all actions the visitor really executed on the website helps speed up the complaint process as all facts are available

Example 4: Correlating Performance to Business
Analyzing the performance of every single visit and all actions not only allows us to pinpoint problems on individual pages, certain browsers or geographical regions. It also allows us to correlate problems in the application with business results. Knowing how much revenue is lost due to degraded performance gives application owners better arguments when discussing investments in the infrastructure or additional R&D resources. The dashboard shown in Figure 5 correlates Response Time with the number of Visitors by Continent and the generated Orders. Infrastructure problems that lead to performance problems in the application can then easily be correlated with lost revenue:

Figure 5: Correlating Business Values such as number of Orders with Page Performance and Infrastructure Health opens a new line of communication between Business and Application Owners

Conclusion
W3C timings give us great insight, but they are only available in newer browsers. Be aware of what your RUM solution vendor promises you, and do not forget the simple math. Set your expectations right and look for solutions that support visits and health indicators like HTTP errors and JavaScript errors. Go Real with the right expectations.

More Stories By Klaus Enzenhofer

Klaus Enzenhofer has several years of experience and expertise in the field of Web Performance Optimization and User Experience Management. He works as Technical Strategist in the Center of Excellence Team at dynaTrace Software. In this role he influences the development of the dynaTrace Application Performance Management Solution and the Web Performance Optimization Tool dynaTrace AJAX Edition. He mainly gathered his experience in web and performance by developing and running large-scale web portals at Tiscover GmbH.
