@DevOpsSummit: Article

Real User Monitoring | @DevOpsSummit

Enterprises want to understand how to analyze performance in ways that positively impact business metrics

With online viewership and sales growing rapidly, enterprises want to understand how to analyze performance in ways that positively impact business metrics. Deeper insight into the user experience is needed to understand why conversions are dropping or bounce rates are increasing or, preferably, what has been helping these metrics improve.

The digital performance management industry has evolved as application performance management companies have broadened their scope beyond synthetic testing, which simulates users loading specific pages at regular intervals, to include web and mobile testing and real user monitoring (RUM). As synthetic monitoring gained popularity, performance engineers realized that it was not capturing the variations experienced by real end users. This led to the introduction of RUM - the process of capturing, analyzing and reporting data from a real end user's interaction with a website. RUM has been around for more than a decade, but the technology is still in its infancy.

Five factors contributing to the shift towards RUM to complement synthetic testing

Ability to measure third-party resources
Websites are complex, with many different resources affecting performance. While there is no fully reliable way to count third-party scripts, the number of third-party components is clearly growing: the average web page now requests over 30% of its resources from third-party domains, as shown in Figure 1. These components serve multiple purposes, including user tracking, ad insertion, and A/B testing. Understanding the impact these components have on the end user experience is critical.

Figure 1 - Growth in third party vs first party resources per page, 2011-2015
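As a rough illustration of how the third-party share of a page might be estimated, the sketch below classifies a list of resource URLs by hostname. The function names are hypothetical, and the rule "the site's own domain or any subdomain of it counts as first party" is an assumption made for this example, not a description of how any particular RUM product classifies resources.

```typescript
// Returns true when the resource is served from a host that is neither the
// first-party domain nor one of its subdomains (assumed classification rule).
function isThirdParty(resourceUrl: string, firstPartyDomain: string): boolean {
  const host = new URL(resourceUrl).hostname.toLowerCase();
  const site = firstPartyDomain.toLowerCase();
  return !(host === site || host.endsWith("." + site));
}

// Fraction (0..1) of the given resources that come from third-party hosts.
function thirdPartyShare(resourceUrls: string[], firstPartyDomain: string): number {
  if (resourceUrls.length === 0) {
    return 0;
  }
  const thirdPartyCount = resourceUrls.filter((url) =>
    isThirdParty(url, firstPartyDomain)
  ).length;
  return thirdPartyCount / resourceUrls.length;
}
```

In a browser, the URL list could be taken from the `name` field of the entries returned by `performance.getEntriesByType("resource")`.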

Mobile matters
With more users accessing applications primarily on mobile devices, understanding mobile performance is increasingly important. Metrics must be captured from desktop and mobile devices alike. Just because an application performs well on a desktop does not mean it will perform well on a mobile device. If you have or want to have mobile customers, ensure you are able to capture metrics from them. Mobile presents unique challenges, such as congestion and latency, that can have significant impacts on page performance.

With a growing mobile user base, RUM data is frequently correlated with bandwidth measured in the last mile to determine whether a performance problem is the result of unpredictable last-mile conditions. This need is increasingly seen in many major Asian economies, where a mobile phone is the primary means of internet access for a large proportion of consumers. Major eCommerce players in Asia report that over 65% of transactions are made from mobile devices. With such a large customer base, monitoring performance on the mobile web and understanding how carriers affect that performance is critical to doing business. Some businesses have therefore instrumented the ability to profile the level of user experience to expect on each carrier.

Validate performance for specific users or geographies
Synthetic measurements may not be available from all geographies. To understand why a service level agreement in a specific region is not being met, the only way to capture information may be through real users in that geographic location. Real user measurements also enable customers to validate whether issues reported by synthetic testing are widespread across the user base, localized to particular geographies, or limited to the synthetic test tools themselves.
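One way to check whether a problem is widespread or regional is to compare a percentile of real user load times across regions. The sketch below is a minimal example under assumed names: the `Sample` shape, the region labels, and the choice of a nearest-rank percentile are all illustrative.

```typescript
// One real-user measurement; the field names are hypothetical.
interface Sample {
  region: string;
  loadMs: number;
}

// Nearest-rank percentile of a non-empty list (p between 0 and 100).
function percentile(values: number[], p: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Groups samples by region and returns the requested percentile for each.
function percentileByRegion(samples: Sample[], p: number): Map<string, number> {
  const grouped = new Map<string, number[]>();
  for (const sample of samples) {
    const values = grouped.get(sample.region) ?? [];
    values.push(sample.loadMs);
    grouped.set(sample.region, values);
  }
  const result = new Map<string, number>();
  grouped.forEach((values, region) => {
    result.set(region, percentile(values, p));
  });
  return result;
}
```

If one region's 95th percentile is far above the others while synthetic tests elsewhere look healthy, the issue is more likely to be localized than global.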

Continuous Delivery
As more organizations move to a continuous delivery model, synthetic tests may need to be frequently re-scripted. As the time to deliver and release content decreases, organizations are looking at ways to quickly gather performance data. Some have decided the fastest way to gather performance metrics on a just-released page or feature is through data from real users.

Native applications
As organizations evolve from mobile websites to native apps, the need to gather metrics from these applications becomes increasingly important.

What features should you look for in a RUM solution?
Knowing that you need a RUM solution is the first step. The second step is identifying what features are required to meet your business needs. With a variety of solutions available in the market, identifying the must-have and the nice-to-have features is important to find the best fit. Here are a few features you should consider.

Real-time and actionable data
Most RUM tools display insights in a dashboard in near real time. This information can be coupled with near-real-time tracking information from business analytics tools like Google Analytics. Performance data from RUM solutions should be cross-checked against metrics such as site visits, conversions, user location and device/browser insights. Many website operators continuously monitor changes in business metrics, since such changes can indicate performance problems; cross-checking also helps them filter out false positives and isolated performance issues.

User experience timings
Trends in performance optimization testing have moved away from metrics like time to first byte (TTFB) and page load towards measurements more accurately reflecting the user experience - such as start render and speed index. A user does not necessarily care when the content on the bottom of the page has loaded - when critical resources have been loaded and the page appears usable is what matters. Ensure the metrics you are gathering accurately reflect what you are attempting to measure and optimize.

Granular information
While page-level metrics are a good start, they reveal neither precisely which resources are causing content to load slowly nor how relevant each metric is. Combining resource timing for specific elements with where each element sits on the page (above or below "the fold") can help organizations filter out the noise and collect actionable information. The Intersection Observer API can help identify which elements are within the viewport (above the fold) and which are below it, so you can prioritize what to remedy.
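The idea of combining resource timing with fold position can be sketched as a simple filter. The example below assumes that each resource has already been paired with the vertical position of the element that displays it; the `PlacedResource` shape, the function name, and the "slow" cutoff are hypothetical, and in a browser the position might come from `getBoundingClientRect()` or from Intersection Observer callbacks.

```typescript
// A resource joined with the on-page position of the element that uses it.
interface PlacedResource {
  url: string;
  top: number; // distance in pixels from the top of the page at load time
  durationMs: number; // fetch duration, e.g. from a resource timing entry
}

// Returns the slow resources that render above the fold, slowest first.
function slowAboveFold(
  resources: PlacedResource[],
  viewportHeight: number,
  slowMs: number
): PlacedResource[] {
  return resources
    .filter((r) => r.top < viewportHeight && r.durationMs >= slowMs)
    .sort((a, b) => b.durationMs - a.durationMs);
}
```

Filtering this way keeps a slow footer image from drowning out a slow hero image that users actually wait for.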

Impact of ads
With large numbers of pages being populated with ads, understanding the impact of the ads is important. RUM tools can identify both the performance impact of an ad in terms of when the ad was fetched and how long it took to download, as well as user engagement - such as how many users watched a video ad in its entirety.

Correlation to business metrics
While there have been many articles describing the impact of performance on business in eCommerce companies - for example, impact on conversions - the same isn't true for media companies. Media companies are more interested in scroll depth, virality of content, and session length. Soasta recently announced an Activity Impact Score as a way to correlate web performance to session length. Measurements like the Activity Impact Score help non-eCommerce companies measure and monitor engagement and how performance can negatively or positively impact user engagement. Further, with bonuses tied to metrics such as page views, organizations are increasingly scrutinizing RUM metrics and insisting on verifying the integrity of these tools.
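A very simple way to look for a relationship between performance and engagement is to bucket sessions by page load time and compare the average session length in each bucket. The sketch below does only that; it is not the Activity Impact Score, whose calculation is not described here, and the `Session` shape and bucket width are assumptions for illustration.

```typescript
// One user session; the field names are hypothetical.
interface Session {
  loadMs: number; // page load time observed for the session
  sessionSeconds: number; // how long the user stayed
}

// Maps the lower bound of each load-time bucket to the mean session length.
function meanSessionByLoadBucket(
  sessions: Session[],
  bucketMs: number
): Map<number, number> {
  const sums = new Map<number, { total: number; count: number }>();
  for (const session of sessions) {
    const bucket = Math.floor(session.loadMs / bucketMs) * bucketMs;
    const agg = sums.get(bucket) ?? { total: 0, count: 0 };
    agg.total += session.sessionSeconds;
    agg.count += 1;
    sums.set(bucket, agg);
  }
  const means = new Map<number, number>();
  sums.forEach((agg, bucket) => {
    means.set(bucket, agg.total / agg.count);
  });
  return means;
}
```

A steady drop in mean session length from faster to slower buckets suggests that performance is affecting engagement, although a comparison like this shows correlation rather than cause.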

End device support & ease of measurement
With the plethora of device types and browsers on the market, you need to ensure the RUM solution implemented will capture traffic from the majority of your users. In some Asian countries, over 35% of browsers and devices are unknown, which presents an interesting challenge: should you just forget about these users, or find a way to reliably measure performance on these unknown devices?

Another important factor to consider is how easy it is to enable RUM measurements. Does it require manual instrumentation of every web page, or is this done automatically by injecting a script?

End to end perspective
Performance issues can frequently originate anywhere between the server and the end user: in the origin infrastructure, the delivery network, the last mile, or the end user's device. Zeroing in on the problem quickly requires correlating metrics from the end user, the last mile, the delivery network and the server.

Dynamic thresholds and alerts
The connectivity of an end user's device can change throughout the day. At work, they may be browsing the internet on a high-speed connection; on the commute home, they may be on their mobile device with high latency and congestion; and at night, they may be at home on a DSL or fiber connection. Expecting the same level of performance at all times is unrealistic. Having the ability to set variable thresholds is more indicative of the real user experience.
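Variable thresholds can be as simple as a lookup keyed on the user's connection quality. The sketch below uses the four `effectiveType` values defined by the Network Information API (`navigator.connection.effectiveType`, which not every browser exposes); the threshold numbers and function names are hypothetical placeholders, not recommendations.

```typescript
// Connection classes reported by the Network Information API.
type EffectiveType = "slow-2g" | "2g" | "3g" | "4g";

// Assumed page-load alert thresholds per connection class, in milliseconds.
const LOAD_THRESHOLD_MS: Record<EffectiveType, number> = {
  "slow-2g": 12000,
  "2g": 8000,
  "3g": 5000,
  "4g": 3000,
};

// Fallback used when the browser does not report a connection class.
const DEFAULT_THRESHOLD_MS = 4000;

function thresholdFor(connection?: EffectiveType): number {
  return connection ? LOAD_THRESHOLD_MS[connection] : DEFAULT_THRESHOLD_MS;
}

// True when a measured load time exceeds the threshold for its connection.
function shouldAlert(loadMs: number, connection?: EffectiveType): boolean {
  return loadMs > thresholdFor(connection);
}
```

With these placeholder values, a 4.5-second load would raise an alert on a 4g connection but not on a 3g one, which is the behavior the variable-threshold idea is after.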

What solutions exist today
In addition to commercial solutions like Soasta, New Relic, and Google Analytics' Site Speed, there are three specifications from the W3C that enable you to build your own solution - navigation timing, resource timing and user timing. Browser support for these specifications varies, with navigation timing having the greatest adoption, since it has been available the longest.

Navigation timing captures the timing of various events as a page loads, from the start of navigation (including any redirects, the DNS lookup and the connection setup) through the HTTP request and response until all content has been received, parsed, and executed by the browser. This provides high-level information on the overall page load time and can be used to get details on items such as DNS lookups and latency.

Figure 2 shows the various timings available from the navigation timing API:

Figure 2 - Navigation timing events

Among many metrics that can be computed using the navigation timing events, the following are most often used:

  • TimeToFirstByte = responseStart - requestStart
  • TimeToInteractive = domInteractive - requestStart
  • TimeToPageLoad = loadEventEnd - requestStart
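The three formulas above translate directly into code. The sketch below is a minimal example that takes the four timestamps as plain numbers so it can run anywhere; the interface and function names are hypothetical.

```typescript
// The subset of navigation timing attributes the formulas above rely on.
interface NavigationTimestamps {
  requestStart: number;
  responseStart: number;
  domInteractive: number;
  loadEventEnd: number;
}

interface PageMetrics {
  timeToFirstByte: number;
  timeToInteractive: number;
  timeToPageLoad: number;
}

// Applies the three formulas; all values are in milliseconds.
function computePageMetrics(t: NavigationTimestamps): PageMetrics {
  return {
    timeToFirstByte: t.responseStart - t.requestStart,
    timeToInteractive: t.domInteractive - t.requestStart,
    timeToPageLoad: t.loadEventEnd - t.requestStart,
  };
}
```

In a browser, the input could be the entry returned by `performance.getEntriesByType("navigation")[0]`, read after the load event has finished, because `loadEventEnd` is 0 until then.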

While page-level information is helpful, you may want to know how various resources on a page perform. This is where the resource timing specification comes in. Resource timing enables you to collect complete timing information for any resource within a page, with some restrictions for security purposes. The resource timings available for the request and response are shown in Figure 3.

Figure 3 - Resource timing events
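A resource timing entry can be split into phases in much the same way. The sketch below is illustrative: the interface lists only the attributes it uses, the names `ResourceEntry` and `breakDown` are hypothetical, and treating `requestStart === 0` as the sign of a restricted entry is a heuristic assumed for this example. It reflects the security restriction mentioned above: cross-origin resources served without a `Timing-Allow-Origin` header report 0 for their detailed timestamps, so only the total time is available for them.

```typescript
// Subset of PerformanceResourceTiming attributes used in this sketch.
interface ResourceEntry {
  name: string; // resource URL
  startTime: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
}

interface ResourceBreakdown {
  url: string;
  totalMs: number;
  detailed: boolean; // false when only the total time is available
  dnsMs?: number;
  connectMs?: number;
  waitMs?: number; // request sent until first byte of the response
  downloadMs?: number;
}

function breakDown(entry: ResourceEntry): ResourceBreakdown {
  const totalMs = entry.responseEnd - entry.startTime;
  // Assumed heuristic: restricted cross-origin entries expose 0 here.
  if (entry.requestStart === 0) {
    return { url: entry.name, totalMs, detailed: false };
  }
  return {
    url: entry.name,
    totalMs,
    detailed: true,
    dnsMs: entry.domainLookupEnd - entry.domainLookupStart,
    connectMs: entry.connectEnd - entry.connectStart,
    waitMs: entry.responseStart - entry.requestStart,
    downloadMs: entry.responseEnd - entry.responseStart,
  };
}
```

In a browser, entries with these attributes are returned by `performance.getEntriesByType("resource")`.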

Once the navigation and resource timing specifications had made page-level and resource-level data available, the next step was to provide the ability to gather custom metrics to understand where an application is spending the most time. The user timing specification allows marks to be inserted in code, enabling the measurement of time deltas between various marks. This makes it possible to determine information like when a hero image is displayed, when fonts are loaded, and when scripts are done blocking.
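The mark-and-measure pattern can be sketched as a small helper. The `mark`, `measure` and `getEntriesByName` calls below are the ones defined by the user timing specification and exposed on the browser's `performance` object; the `UserTiming` interface, the `timeStep` helper and the mark-naming scheme are hypothetical, introduced only so the example is self-contained.

```typescript
// The slice of the Performance interface this sketch needs.
interface UserTiming {
  mark(name: string): unknown;
  measure(name: string, startMark: string, endMark: string): unknown;
  getEntriesByName(
    name: string,
    type?: string
  ): ReadonlyArray<{ duration: number }>;
}

// Runs a step between two marks and returns how long it took.
function timeStep<T>(
  perf: UserTiming,
  label: string,
  step: () => T
): { result: T; durationMs: number } {
  perf.mark(label + ":start");
  const result = step();
  perf.mark(label + ":end");
  // Record the delta between the two marks under the label itself.
  perf.measure(label, label + ":start", label + ":end");
  const entries = perf.getEntriesByName(label, "measure");
  const last = entries[entries.length - 1];
  return { result, durationMs: last ? last.duration : NaN };
}
```

In a browser, the global `performance` object would be passed as the first argument, for example `timeStep(performance, "render-hero", renderHero)`, where `renderHero` stands for whatever application code is being timed.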

Evolving quality measurements
As quality measurements evolve, they will become better at providing actionable insights that recommend specific improvements to mitigate performance bottlenecks - not only at the browser end point, but from an end-to-end perspective.

Increasingly, RUM measurements will leverage machine learning to more deeply understand traffic patterns and dynamically adapt to changing patterns.

RUM measurements will evolve to include the time a given resource starts to execute and completes execution in the browser.

Also, device-agnostic solutions will no doubt emerge. Metrics need to be captured across the entire spectrum of user endpoints. Not gathering statistics from large percentages of users whose browsers don't support the technology leaves gaping blind spots in the visibility you have on the end user experience.

*    *    *

RUM gives organizations the ability to isolate and identify the cause of performance degradation in a web application, whether it is related to the browser, third-party content, the network provider, the CDN, or infrastructure. RUM is a piece of the puzzle; when used in conjunction with other tools and analytics, it can be used to quickly recommend web application optimizations.

More Stories By Krishnan Manjeri

Krishnan is a seasoned product manager and is currently a Director of Product Management at InstartLogic, responsible for Data Platform, Analytics and Performance. He has nearly two decades of experience leading and delivering solutions, in capacities ranging from Engineering to Marketing and Product Management, for a variety of Fortune 500 companies in the areas of Analytics, Telecommunication Networks, Application Delivery and Security. He has extensive experience leading cross-functional teams and delivering multiple millions of dollars in revenue in both the Enterprise and Service Provider markets. He has an MS in Computer Science from Case Western Reserve University and an MBA from Santa Clara University. He holds a couple of patents in the area of Networking and Security.
