Microservices Expo Authors: Liz McMillan, Elizabeth White, Derek Weeks, Hollis Tibbetts, Yeshim Deniz

Related Topics: Microservices Expo, Java IoT, IoT User Interface, Agile Computing, @CloudExpo

Microservices Expo: Article

Fact Finders: Sorting Out the Truth in Real User Monitoring

Go Real with the right expectations

On my recent visits to Velocity, WebPerfDay and Apps World in London, Real User Monitoring (RUM) was the hot topic. That triggered my thinking about the differences between vendors. They all promise the same for a varying range of prices - from free to a couple thousand US dollars. What I found out is that there IS a big difference and - depending on what you want to do with RUM - you want to make sure you understand the capabilities and limitations of the available solutions.

The false claim of 100% Coverage
What all vendors claim to do is capture data from 100% of your users. When looking closer you see that many of these solutions - especially the "Freemiums" - rely on theW3C Navigation Timings. So my question is: How can I cover ALL Users with W3C timings when these timings are NOT AVAILABLE on all browsers?

W3C timings are only available on new browsers. So - what about the IE6, IE7, IE8, the whole Safari Browser family, older Firefox and Chrome instances? Looking at current statistics they sum up to 35% of the overall market share (http://www.w3counter.com/globalstats.php). The statements of vendors that rely on these timings to capture all users experience are simply not accurate.

The performance impact of monitoring
After finding that out I just asked myself: "Are there anymore deficiencies that can be found?"

I first thought about the collection mechanism which reminded me of the challenges all the Web Analytics tools have. Data collection relies on the browsers onUnload event. The RUM tools have to collect the data till the last second of the lifecycle of the page and then send it off. Most SaaS solution vendors are using an image GET request to send the data to the collection instances. Modern browsers are optimizing this event because "Why should a Browser download an image if the page is about to die?"Modern browsers like Chrome optimized this use case and simply do not execute the request at all or do not wait for response if the data got sent. So again- I am losing data from my real end users. The work around some of the vendors put in place is putting a timeout in the onUnLoad-event. I've seen timeouts with up to 500ms which impact the next page that gets loaded. We want to improve the user experience/performance but these tools are forcing the user to wait longer to move to the next page.

So we are losing all the old browsers and additionally the modern ones that do not execute the data collection requests. We are now far away from 100% coverage.

Do the math
Another argument you always hear is that the RUM solution allows you to find out more about the end user environment's impact on page performance. The geographical region of the end user, the browsers, the OS or device can result in slow page performance. But does this really work?

Let's do some simple math and figure out what this means to a page with 1 000 000 visits a day:

  • 1 000 000 over all visits/day
  • 1 000 000 - 35% visits with no W3C timing support in the browser
  • 650 000- 20% not sending the data correct at all or incomplete
  • 520 000 captured visits per day

Figure 1: Only 52% of visitors are captured by most RUM vendors due to limitations of browsers

So we have reduced or base from 1 000 000 to 520 000. Let's start with the break down into the different goupings:

  • 520000 broken down by 100 countries
  • 520000/100 = 5200 visits/country/day
  • 5200 visits per country broken down by 20 Browser Versions
  • 5200/20 = 260 visits/country/browser version/day

Let's break the 260 visits further down by  10 operating system:

  • 260/10 = 26 visits/country/browser version/operating system/day

We want to have date on an hourly basis:

  • 26/24 ~ 1 visits/country/browser version/operating system/hour

**1 000 000 visits per day =~ 1 visits/country/browser version/operating system/hour! We have done no sampling, we have only country level data, we are looking at visits and not page views!**

To clarify: In this calculation I assume that the visits are evenly distributed over all countries but do not take into account that most solutions do sampling at a rate of 1-20% and look at visits with multiple page views instead of unique URIs - this seems to me as a best case scenario. In reality it can be even worse.

So then, why is Real User Monitoring so popular?...
...because it helps you to improve your Users experience! How can that work after knowing that we might not capture data from all our end users? You only have to change your expectations of what you want to achieve with Real User Monitoring.

What you should expect from your RUM solution is:

  • Support for all browsers - not only the new browsers
  • A reliable data sending mechanism
  • W3C timings support
  • Functional Health information like errors from JavaScript and HTTP - not only timings
  • AJAX/XHR-requests timing - not only timings for page loads
  • The click path of a whole visit - not only separate page views
  • Support for desktop browsers, mobile browsers and mobile native applications in combined view
  • Landing and Exit page analysis

If your selected solution provides all these features to you can go an additional step further and not only monitor your users, you can do real User Experience Management (UEM). I just want to point out what that allows you to do in some short examples.

Example 1: JavaScript Errors - Which one to fix first?
If your RUM- UEM solution provides you with JavaScript errors you can start fixing problems right away. It should be able to show you which messages appear how often in which browser, shown in Figure 2.

Figure 2: Detailed JavaScript error messages are captured for every visit and easy accessible grouped by browser, OS or geo-location

Example 2: Why are my customers leaving my web site?
With the UEM you are now able to not only see that your customers are leaving your web site. You can also figure out if they had technical issues (see Figure 3).

Figure 3: Looking at Exit Pages and correlating it with Failure Rate, Performance and User Experience allows us to quickly identify why visitors leave the website on these pages

Example 3: What did my customer do on the application before he called our support center?
Having every visit and all actions available makes it easy for the support center employees to look up the visit information as part of the triage process (see Figure 4).

Figure 4: Seeing all actions the visitor really executed on the website helps speed up the complaint process as all facts are available

Example 4: Correlating Performance to Business
Analyzing the performance of every single visit and all actions not only allows us to pinpoint problems on individual pages, certain browsers or geographical regions. It also allows us to correlate problems in the application to business. Knowing how much revenue is lost due to declined performance gives application owners better arguments when discussing investments in the infrastructure or additional R&D resources. The dashboard shown in Figure 5correlates Response Time with the number of Visitors by Continent and the generated Orders. Problems in the infrastructure that lead to performance problems of the application can then easily be correlated to lost revenue:

Figure 5: Correlating Business Values such as number of Orders with Page Performance and Infrastructure Health opens a new of communication between Business and Application Owners

W3C timings give us great insight but it is only available in new browsers. Be aware of what your RUM solution vendor promises to you and do not forget about the simple math. Set your expectations right and look for solutions that support visits and health indicators like HTTP errors and JavaScript errors. Go Real with the right expectations.

More Stories By Klaus Enzenhofer

Klaus Enzenhofer has several years of experience and expertise in the field of Web Performance Optimization and User Experience Management. He works as Technical Strategist in the Center of Excellence Team at dynaTrace Software. In this role he influences the development of the dynaTrace Application Performance Management Solution and the Web Performance Optimization Tool dynaTrace AJAX Edition. He mainly gathered his experience in web and performance by developing and running large-scale web portals at Tiscover GmbH.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

@MicroservicesExpo Stories
Today every business relies on software to drive the innovation necessary for a competitive edge in the Application Economy. This is why collaboration between development and operations, or DevOps, has become IT’s number one priority. Whether you are in Dev or Ops, understanding how to implement a DevOps strategy can deliver faster development cycles, improved software quality, reduced deployment times and overall better experiences for your customers.
SYS-CON Events announced today that Super Micro Computer, Inc., a global leader in Embedded and IoT solutions, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 7-9, 2017, at the Javits Center in New York City, NY. Supermicro (NASDAQ: SMCI), the leading innovator in high-performance, high-efficiency server technology, is a premier provider of advanced server Building Block Solutions® for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, HPC and ...
@DevOpsSummit has been named the ‘Top DevOps Influencer' by iTrend. iTrend processes millions of conversations, tweets, interactions, news articles, press releases, blog posts - and extract meaning form them and analyzes mobile and desktop software platforms used to communicate, various metadata (such as geo location), and automation tools. In overall placement, @DevOpsSummit ranked as the number one ‘DevOps Influencer' followed by @CloudExpo at third, and @MicroservicesE at 24th.
Explosive growth in connected devices. Enormous amounts of data for collection and analysis. Critical use of data for split-second decision making and actionable information. All three are factors in making the Internet of Things a reality. Yet, any one factor would have an IT organization pondering its infrastructure strategy. How should your organization enhance its IT framework to enable an Internet of Things implementation? In his session at @ThingsExpo, James Kirkland, Red Hat's Chief Arch...
When people aren’t talking about VMs and containers, they’re talking about serverless architecture. Serverless is about no maintenance. It means you are not worried about low-level infrastructural and operational details. An event-driven serverless platform is a great use case for IoT. In his session at @ThingsExpo, Animesh Singh, an STSM and Lead for IBM Cloud Platform and Infrastructure, will detail how to build a distributed serverless, polyglot, microservices framework using open source tec...
“Being able to take needless work out of the system is more important than being able to put more work into the system.” This is one of my favorite quotes from Gene Kim’s book, The Phoenix Project, and it plays directly into why we're announcing the DevOps Express initiative today. Tracing the Steps. For years now, I have witnessed needless work being performed across the DevOps industry. No, not within our clients DevOps and continuous delivery practices. I have seen it in the buyer’s journe...
A completely new computing platform is on the horizon. They’re called Microservers by some, ARM Servers by others, and sometimes even ARM-based Servers. No matter what you call them, Microservers will have a huge impact on the data center and on server computing in general. Although few people are familiar with Microservers today, their impact will be felt very soon. This is a new category of computing platform that is available today and is predicted to have triple-digit growth rates for some ...
Without lifecycle traceability and visibility across the tool chain, stakeholders from Planning-to-Ops have limited insight and answers to who, what, when, why and how across the DevOps lifecycle. This impacts the ability to deliver high quality software at the needed velocity to drive positive business outcomes. In his general session at @DevOpsSummit at 19th Cloud Expo, Eric Robertson, General Manager at CollabNet, will discuss how customers are able to achieve a level of transparency that e...
SYS-CON Events announced today that Enzu will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Enzu’s mission is to be the leading provider of enterprise cloud solutions worldwide. Enzu enables online businesses to use its IT infrastructure to their competitive advantage. By offering a suite of proven hosting and management services, Enzu wants companies to focus on the core of their online busine...
24Notion is full-service global creative digital marketing, technology and lifestyle agency that combines strategic ideas with customized tactical execution. With a broad understand of the art of traditional marketing, new media, communications and social influence, 24Notion uniquely understands how to connect your brand strategy with the right consumer. 24Notion ranked #12 on Corporate Social Responsibility - Book of List.
In his keynote at 19th Cloud Expo, Sheng Liang, co-founder and CEO of Rancher Labs, will discuss the technological advances and new business opportunities created by the rapid adoption of containers. With the success of Amazon Web Services (AWS) and various open source technologies used to build private clouds, cloud computing has become an essential component of IT strategy. However, users continue to face challenges in implementing clouds, as older technologies evolve and newer ones like Docke...
Just over a week ago I received a long and loud sustained applause for a presentation I delivered at this year’s Cloud Expo in Santa Clara. I was extremely pleased with the turnout and had some very good conversations with many of the attendees. Over the next few days I had many more meaningful conversations and was not only happy with the results but also learned a few new things. Here is everything I learned in those three days distilled into three short points.
The reason I believe digital transformation is not only more than a fad, but is actually a life-or-death imperative for every business and IT executive on the planet is simple: there will be no place for an “industrial enterprise” in a digital world. Transformation, by definition, is a metamorphosis from one state to another, wholly new state. As such, a true digital transformation must be the act of transforming an industrial-era organization into something wholly different – the Digital Enter...
In his session at 19th Cloud Expo, Claude Remillard, Principal Program Manager in Developer Division at Microsoft, will contrast how his team used config as code and immutable patterns for continuous delivery of microservices and apps to the cloud. He will show the immutable patterns helps developers do away with most of the complexity of config as code-enabling scenarios such as rollback, zero downtime upgrades with far greater simplicity. He will also have live demos of building immutable pipe...
Application transformation and DevOps practices are two sides of the same coin. Enterprises that want to capture value faster, need to deliver value faster – time value of money principle. To do that enterprises need to build cloud-native apps as microservices by empowering teams to build, ship, and run in production. In his session at @DevOpsSummit at 19th Cloud Expo, Neil Gehani, senior product manager at HPE, will discuss what every business should plan for how to structure their teams to d...
When we talk about the impact of BYOD and BYOA and the Internet of Things, we often focus on the impact on data center architectures. That's because there will be an increasing need for authentication, for access control, for security, for application delivery as the number of potential endpoints (clients, devices, things) increases. That means scale in the data center. What we gloss over, what we skip, is that before any of these "things" ever makes a request to access an application it had to...
SYS-CON Events announced today that Transparent Cloud Computing (T-Cloud) Consortium will exhibit at the 19th International Cloud Expo®, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. The Transparent Cloud Computing Consortium (T-Cloud Consortium) will conduct research activities into changes in the computing model as a result of collaboration between "device" and "cloud" and the creation of new value and markets through organic data proces...
The evolution of JavaScript and HTML 5 to support a genuine component based framework (Web Components) with the necessary tools to deliver something close to a native experience including genuine realtime networking (UDP using WebRTC). HTML5 is evolving to offer built in templating support, the ability to watch objects (which will speed up Angular) and Web Components (which offer Angular Directives). The native level support will offer a massive performance boost to frameworks having to fake all...
In many organizations governance is still practiced by phase or stage gate peer review, and Agile projects are forced to accommodate, which leads to WaterScrumFall or worse. But governance criteria and policies are often very weak anyway, out of date or non-existent. Consequently governance is frequently a matter of opinion and experience, highly dependent upon the experience of individual reviewers. As we all know, a basic principle of Agile methods is delegation of responsibility, and ideally ...
Apache Hadoop is a key technology for gaining business insights from your Big Data, but the penetration into enterprises is shockingly low. In fact, Apache Hadoop and Big Data proponents recognize that this technology has not yet achieved its game-changing business potential. In his session at 19th Cloud Expo, John Mertic, director of program management for ODPi at The Linux Foundation, will explain why this is, how we can work together as an open data community to increase adoption, and the i...