Welcome!

Microservices Expo Authors: Elizabeth White, Pat Romanski, Liz McMillan, Anders Wallgren, AppDynamics Blog

Related Topics: Microservices Expo, Java IoT, IoT User Interface, Agile Computing, @CloudExpo

Microservices Expo: Article

Fact Finders: Sorting Out the Truth in Real User Monitoring

Go Real with the right expectations

On my recent visits to Velocity, WebPerfDay and Apps World in London, Real User Monitoring (RUM) was the hot topic. That triggered my thinking about the differences between vendors. They all promise the same for a varying range of prices - from free to a couple thousand US dollars. What I found out is that there IS a big difference and - depending on what you want to do with RUM - you want to make sure you understand the capabilities and limitations of the available solutions.

The false claim of 100% Coverage
What all vendors claim to do is capture data from 100% of your users. When looking closer you see that many of these solutions - especially the "Freemiums" - rely on theW3C Navigation Timings. So my question is: How can I cover ALL Users with W3C timings when these timings are NOT AVAILABLE on all browsers?

W3C timings are only available on new browsers. So - what about the IE6, IE7, IE8, the whole Safari Browser family, older Firefox and Chrome instances? Looking at current statistics they sum up to 35% of the overall market share (http://www.w3counter.com/globalstats.php). The statements of vendors that rely on these timings to capture all users experience are simply not accurate.

The performance impact of monitoring
After finding that out I just asked myself: "Are there anymore deficiencies that can be found?"

I first thought about the collection mechanism which reminded me of the challenges all the Web Analytics tools have. Data collection relies on the browsers onUnload event. The RUM tools have to collect the data till the last second of the lifecycle of the page and then send it off. Most SaaS solution vendors are using an image GET request to send the data to the collection instances. Modern browsers are optimizing this event because "Why should a Browser download an image if the page is about to die?"Modern browsers like Chrome optimized this use case and simply do not execute the request at all or do not wait for response if the data got sent. So again- I am losing data from my real end users. The work around some of the vendors put in place is putting a timeout in the onUnLoad-event. I've seen timeouts with up to 500ms which impact the next page that gets loaded. We want to improve the user experience/performance but these tools are forcing the user to wait longer to move to the next page.

So we are losing all the old browsers and additionally the modern ones that do not execute the data collection requests. We are now far away from 100% coverage.

Do the math
Another argument you always hear is that the RUM solution allows you to find out more about the end user environment's impact on page performance. The geographical region of the end user, the browsers, the OS or device can result in slow page performance. But does this really work?

Let's do some simple math and figure out what this means to a page with 1 000 000 visits a day:

  • 1 000 000 over all visits/day
  • 1 000 000 - 35% visits with no W3C timing support in the browser
  • 650 000- 20% not sending the data correct at all or incomplete
  • 520 000 captured visits per day

Figure 1: Only 52% of visitors are captured by most RUM vendors due to limitations of browsers

So we have reduced or base from 1 000 000 to 520 000. Let's start with the break down into the different goupings:

  • 520000 broken down by 100 countries
  • 520000/100 = 5200 visits/country/day
  • 5200 visits per country broken down by 20 Browser Versions
  • 5200/20 = 260 visits/country/browser version/day

Let's break the 260 visits further down by  10 operating system:

  • 260/10 = 26 visits/country/browser version/operating system/day

We want to have date on an hourly basis:

  • 26/24 ~ 1 visits/country/browser version/operating system/hour

**1 000 000 visits per day =~ 1 visits/country/browser version/operating system/hour! We have done no sampling, we have only country level data, we are looking at visits and not page views!**

To clarify: In this calculation I assume that the visits are evenly distributed over all countries but do not take into account that most solutions do sampling at a rate of 1-20% and look at visits with multiple page views instead of unique URIs - this seems to me as a best case scenario. In reality it can be even worse.

So then, why is Real User Monitoring so popular?...
...because it helps you to improve your Users experience! How can that work after knowing that we might not capture data from all our end users? You only have to change your expectations of what you want to achieve with Real User Monitoring.

What you should expect from your RUM solution is:

  • Support for all browsers - not only the new browsers
  • A reliable data sending mechanism
  • W3C timings support
  • Functional Health information like errors from JavaScript and HTTP - not only timings
  • AJAX/XHR-requests timing - not only timings for page loads
  • The click path of a whole visit - not only separate page views
  • Support for desktop browsers, mobile browsers and mobile native applications in combined view
  • Landing and Exit page analysis

If your selected solution provides all these features to you can go an additional step further and not only monitor your users, you can do real User Experience Management (UEM). I just want to point out what that allows you to do in some short examples.

Example 1: JavaScript Errors - Which one to fix first?
If your RUM- UEM solution provides you with JavaScript errors you can start fixing problems right away. It should be able to show you which messages appear how often in which browser, shown in Figure 2.

Figure 2: Detailed JavaScript error messages are captured for every visit and easy accessible grouped by browser, OS or geo-location

Example 2: Why are my customers leaving my web site?
With the UEM you are now able to not only see that your customers are leaving your web site. You can also figure out if they had technical issues (see Figure 3).

Figure 3: Looking at Exit Pages and correlating it with Failure Rate, Performance and User Experience allows us to quickly identify why visitors leave the website on these pages

Example 3: What did my customer do on the application before he called our support center?
Having every visit and all actions available makes it easy for the support center employees to look up the visit information as part of the triage process (see Figure 4).

Figure 4: Seeing all actions the visitor really executed on the website helps speed up the complaint process as all facts are available

Example 4: Correlating Performance to Business
Analyzing the performance of every single visit and all actions not only allows us to pinpoint problems on individual pages, certain browsers or geographical regions. It also allows us to correlate problems in the application to business. Knowing how much revenue is lost due to declined performance gives application owners better arguments when discussing investments in the infrastructure or additional R&D resources. The dashboard shown in Figure 5correlates Response Time with the number of Visitors by Continent and the generated Orders. Problems in the infrastructure that lead to performance problems of the application can then easily be correlated to lost revenue:

Figure 5: Correlating Business Values such as number of Orders with Page Performance and Infrastructure Health opens a new of communication between Business and Application Owners

Conclusion
W3C timings give us great insight but it is only available in new browsers. Be aware of what your RUM solution vendor promises to you and do not forget about the simple math. Set your expectations right and look for solutions that support visits and health indicators like HTTP errors and JavaScript errors. Go Real with the right expectations.

More Stories By Klaus Enzenhofer

Klaus Enzenhofer has several years of experience and expertise in the field of Web Performance Optimization and User Experience Management. He works as Technical Strategist in the Center of Excellence Team at dynaTrace Software. In this role he influences the development of the dynaTrace Application Performance Management Solution and the Web Performance Optimization Tool dynaTrace AJAX Edition. He mainly gathered his experience in web and performance by developing and running large-scale web portals at Tiscover GmbH.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@MicroservicesExpo Stories
Data-as-a-Service is the complete package for the transformation of raw data into meaningful data assets and the delivery of those data assets. In her session at 18th Cloud Expo, Lakshmi Randall, an industry expert, analyst and strategist, will address: What is DaaS (Data-as-a-Service)? Challenges addressed by DaaS Vendors that are enabling DaaS Architecture options for DaaS
One of the bewildering things about DevOps is integrating the massive toolchain including the dozens of new tools that seem to crop up every year. Part of DevOps is Continuous Delivery and having a complex toolchain can add additional integration and setup to your developer environment. In his session at @DevOpsSummit at 18th Cloud Expo, Miko Matsumura, Chief Marketing Officer of Gradle Inc., will discuss which tools to use in a developer stack, how to provision the toolchain to minimize onboa...
SYS-CON Events announced today that Catchpoint Systems, Inc., a provider of innovative web and infrastructure monitoring solutions, has been named “Silver Sponsor” of SYS-CON's DevOps Summit at 18th Cloud Expo New York, which will take place June 7-9, 2016, at the Javits Center in New York City, NY. Catchpoint is a leading Digital Performance Analytics company that provides unparalleled insight into customer-critical services to help consistently deliver an amazing customer experience. Designed...
With the proliferation of both SQL and NoSQL databases, organizations can now target specific fit-for-purpose database tools for their different application needs regarding scalability, ease of use, ACID support, etc. Platform as a Service offerings make this even easier now, enabling developers to roll out their own database infrastructure in minutes with minimal management overhead. However, this same amount of flexibility also comes with the challenges of picking the right tool, on the right ...
SYS-CON Events announced today that Interoute, owner-operator of one of Europe's largest networks and a global cloud services platform, has been named “Bronze Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2015 at the Javits Center in New York, New York. Interoute is the owner-operator of one of Europe's largest networks and a global cloud services platform which encompasses 12 data centers, 14 virtual data centers and 31 colocation centers, with connections to 195 ad...
Microservices are all the rage right now — and the industry is still learning, experimenting, and developing patterns, for successfully designing, deploying and managing Microservices in the real world. Are you considering jumping on the Microservices-wagon? Do Microservices make sense for your particular use case? What are some of the “gotchas” you should be aware of? This morning on #c9d9 we had experts from popular chat app Kik, SMB SaaS platform Yodle and hosted CI solution Semaphore sha...
SYS-CON Events announced today that Commvault, a global leader in enterprise data protection and information management, has been named “Bronze Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Commvault is a leading provider of data protection and information management...
Microservices are a type of software architecture where large applications are made up of small, self-contained units working together through APIs that are not dependent on a specific language. Each service has a limited scope, concentrates on a specific task and is highly independent. This setup allows IT managers and developers to build systems in a modular way. In his book, “Building Microservices,” Sam Newman said microservices are small, focused components built to do a single thing very w...
With an estimated 50 billion devices connected to the Internet by 2020, several industries will begin to expand their capabilities for retaining end point data at the edge to better utilize the range of data types and sheer volume of M2M data generated by the Internet of Things. In his session at @ThingsExpo, Don DeLoach, CEO and President of Infobright, will discuss the infrastructures businesses will need to implement to handle this explosion of data by providing specific use cases for filte...
SYS-CON Events announced today that Avere Systems, a leading provider of enterprise storage for the hybrid cloud, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Avere delivers a more modern architectural approach to storage that doesn’t require the overprovisioning of storage capacity to achieve performance, overspending on expensive storage media for inactive data or the overbuilding of data centers ...
CIOs and those charged with running IT Operations are challenged to deliver secure, audited, and reliable compute environments for the applications and data for the business. Behind the scenes these tasks are often accomplished by following onerous time-consuming processes and often the management of these environments and processes will be outsourced to multiple IT service providers. In addition, the division of work is often siloed into traditional "towers" that are not well integrated for cro...
SYS-CON Events announced today that Alert Logic, Inc., the leading provider of Security-as-a-Service solutions for the cloud, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Alert Logic, Inc., provides Security-as-a-Service for on-premises, cloud, and hybrid infrastructures, delivering deep security insight and continuous protection for customers at a lower cost than traditional security solutions. Ful...
In most cases, it is convenient to have some human interaction with a web (micro-)service, no matter how small it is. A traditional approach would be to create an HTTP interface, where user requests will be dispatched and HTML/CSS pages must be served. This approach is indeed very traditional for a web site, but not really convenient for a web service, which is not intended to be good looking, 24x7 up and running and UX-optimized. Instead, talking to a web service in a chat-bot mode would be muc...
More and more companies are looking to microservices as an architectural pattern for breaking apart applications into more manageable pieces so that agile teams can deliver new features quicker and more effectively. What this pattern has done more than anything to date is spark organizational transformations, setting the foundation for future application development. In practice, however, there are a number of considerations to make that go beyond simply “build, ship, and run,” which changes ho...
SYS-CON Events announced today that AppNeta, the leader in performance insight for business-critical web applications, will exhibit and present at SYS-CON's @DevOpsSummit at Cloud Expo New York, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. AppNeta is the only application performance monitoring (APM) company to provide solutions for all applications – applications you develop internally, business-critical SaaS applications you use and the networks that deli...
WebSocket is effectively a persistent and fat pipe that is compatible with a standard web infrastructure; a "TCP for the Web." If you think of WebSocket in this light, there are other more hugely interesting applications of WebSocket than just simply sending data to a browser. In his session at 18th Cloud Expo, Frank Greco, Director of Technology for Kaazing Corporation, will compare other modern web connectivity methods such as HTTP/2, HTTP Streaming, Server-Sent Events and new W3C event APIs ...
SYS-CON Events announced today that Men & Mice, the leading global provider of DNS, DHCP and IP address management overlay solutions, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. The Men & Mice Suite overlay solution is already known for its powerful application in heterogeneous operating environments, enabling enterprises to scale without fuss. Building on a solid range of diverse platform support,...
The (re?)emergence of Microservices was especially prominent in this week’s news. What are they good for? do they make sense for your application? should you take the plunge? and what do Microservices mean for your DevOps and Continuous Delivery efforts? Continue reading for more on Microservices, containers, DevOps culture, and more top news from the past week. As always, stay tuned to all the news coming from@ElectricCloud on DevOps and Continuous Delivery throughout the week and retweet/favo...
How is your DevOps transformation coming along? How do you measure Agility? Reliability? Efficiency? Quality? Success?! How do you optimize your processes? This morning on #c9d9 we talked about some of the metrics that matter for the different stakeholders throughout the software delivery pipeline. Our panelists shared their best practices.
In a previous article, I demonstrated how to effectively and efficiently install the Dynatrace Application Monitoring solution using Ansible. In this post, I am going to explain how to achieve the same results using Chef with our official dynatrace cookbook available on GitHub and on the Chef Supermarket. In the following hands-on tutorial, we’ll also apply what we see as good practice on working with and extending our deployment automation blueprints to suit your needs.