Microservices Expo: Article

With Confidence Through the Holiday Season: Manage Success in Production

In our last two articles, we discussed what we learned from last year's holiday season and what we can do in the preparation phase for this year's event. In this blog we show you the dashboards and data points you need throughout the holiday season to make it a success.

The top goal for eCommerce sites is to ensure high conversion rates, as conversions translate directly into business. IT's responsibility is to ensure that consumers can use the eCommerce site in an "enjoyable" way. But there is much more to this than measuring the uptime or response time of your services. The dashboards shown here are taken from eCommerce sites that use them to monitor the health of their applications and infrastructure as well as end user satisfaction and conversion rate.

#1: Infrastructure and Application Health

Dashboards need to show the impact of system health on applications, services and processes. If no systems are impacted, it is a lower priority to deal with high CPU, memory, ...

Applications ultimately run on an IT infrastructure, whether the machines are physical, virtualized, or running in the cloud. Ensuring a healthy infrastructure is the key requirement for IT. But it is more important to know whether there is an immediate impact on the hosted applications, services and processes. Before upgrading any hardware, reconfiguring your routing tables, or bouncing your application, it is important to understand whether the issue actually impacts the application and the end user. Just because you run at 95% CPU doesn't mean it's a problem - maybe your developers just built a perfect system that consumes all available resources in an optimal manner.

You need a dashboard that alerts on system monitoring issues but also takes into account the applications, services, and processes running on those systems. Are these impacted by the resource shortage or not? Once you know what is actually impacted, that answer dictates your action.
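The prioritization rule described above can be sketched in a few lines of Python. This is only an illustration: the `Service` and `Host` classes, the function names, and every threshold (90% CPU, 1.5x baseline, 2% errors) are assumptions made for the example and are not taken from any particular monitoring product.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    response_time_ms: float   # current response time
    baseline_ms: float        # normal response time for this service
    error_rate: float         # fraction of failed requests, 0.0 to 1.0


@dataclass
class Host:
    name: str
    cpu_pct: float
    services: list = field(default_factory=list)


def is_impacted(svc, slowdown_factor=1.5, max_error_rate=0.02):
    """A service counts as impacted if it is much slower than its
    baseline or fails more often than tolerated (assumed thresholds)."""
    return (svc.response_time_ms > slowdown_factor * svc.baseline_ms
            or svc.error_rate > max_error_rate)


def alert_priority(host, cpu_limit=90.0):
    """Return (priority, impacted service names) for one host."""
    impacted = [s.name for s in host.services if is_impacted(s)]
    if impacted:
        return "high", impacted   # users feel it: act now
    if host.cpu_pct >= cpu_limit:
        return "low", impacted    # busy host, but nothing is suffering
    return "ok", impacted
```

With these assumed thresholds, a host at 95% CPU whose services still respond within their baseline yields a "low" priority alert, while the same CPU reading combined with a slow or failing service yields "high".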

#2: Application Performance

What are the key performance indicators per application? Are end users impacted by bad response times or failures? Is it the app or the underlying infrastructure?

The second dashboard you need focuses on the application, its performance, and impact on the end user. It answers the following critical questions:

  1. How much traffic is currently on the page? Is it still climbing? Is it outside the norm?
  2. Do we have an unusually high failure rate, e.g., failed credit card transactions, abandoned carts?
  3. What is the overall response time and is it violating my baseline?
  4. Is the application impacted by unhealthy app or web servers, e.g., high GC activity?
  5. Are the hosts (physical, virtual or in the cloud) running into CPU, Memory or I/O limits?
  6. Are end users currently impacted when accessing the app? Are they leaving because of bad user experience?
  7. What is the current conversion rate or are we making money?
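Question 3 above asks whether the response time violates a baseline. One simple way to express that check, shown here as a sketch, is to compare the current value against the mean of recent history plus a multiple of its standard deviation. The function name, the three-sigma tolerance, and the sample numbers are assumptions for illustration; real monitoring tools typically use more elaborate, seasonality-aware baselines.

```python
from statistics import mean, stdev


def violates_baseline(history, current, tolerance=3.0):
    """Return True if `current` is above mean + tolerance * stdev of `history`.

    history: recent response times (e.g. in ms) observed under normal load.
    With fewer than two samples no baseline can be computed, so the
    function reports no violation.
    """
    if len(history) < 2:
        return False
    upper_limit = mean(history) + tolerance * stdev(history)
    return current > upper_limit


# Hypothetical response times in milliseconds
recent = [200, 210, 190, 205, 195]
print(violates_baseline(recent, 215))   # within the normal band
print(violates_baseline(recent, 260))   # clearly above the baseline
```

The same comparison can be applied to traffic volume or failure rate to answer questions 1 and 2.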

#3: Regional Availability and User Experience

How is user experience in our target markets? Any regional availability or performance problems?

The first two dashboards in this blog analyzed performance from within our datacenter. The dashboards in this section focus on performance as perceived from the outside, that is, from the real end user's perspective. You need to know if your app is not reachable from a specific region, or if the conversion rate drops even though your servers are doing fine. These two dashboards answer the following important questions for you:

  1. Is my site reachable from my key regional markets?
  2. If it is not reachable: how long did the outage last and did it impact users?
  3. How many users do we have per region and what was their user experience?
  4. How does the traffic per region develop over time?
  5. How is our conversion rate over time and how many orders do we actually get in?
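Questions 1 and 2 above can be answered from periodic reachability checks run from each region. The following sketch assumes each check is recorded as a `(region, minute, ok)` tuple; that record format and the function name are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def regional_availability(checks):
    """Summarize reachability checks per region.

    checks: iterable of (region, minute, ok) tuples, one per check.
    Returns, per region, the availability in percent and the longest
    run of consecutive failed checks (a proxy for outage duration).
    """
    by_region = defaultdict(list)
    for region, minute, ok in checks:
        by_region[region].append((minute, ok))

    report = {}
    for region, samples in by_region.items():
        samples.sort()                       # order checks by time
        up = sum(1 for _, ok in samples if ok)
        longest = current = 0
        for _, ok in samples:
            current = 0 if ok else current + 1
            longest = max(longest, current)
        report[region] = {
            "availability_pct": 100.0 * up / len(samples),
            "longest_outage_checks": longest,
        }
    return report
```

Multiplying `longest_outage_checks` by the check interval gives an approximate outage length, which can then be compared with the traffic and conversion numbers for the same region and time window.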

How are conversion rate and number of orders evolving over time? If we have a drop in conversions, is it related to a regional problem or to general system health issues in our data center?

#4: Real User Experience on the Conversion Funnel

Learn how users move through the conversion funnel, where they drop off, and how response time and end user experience (Apdex) impact the conversion funnel.

You need a dedicated dashboard for all important actions along your conversion funnel. That includes landing pages and actions such as search, product details, add to cart and checkout. The dashboard helps you to understand:

  1. How many users do you have at each step of the conversion funnel?
  2. Do they encounter problems during a particular action and is that the reason for a drop?
  3. How fast is each step and does it have an impact on end user experience?
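The two measurements behind these questions are easy to state precisely. The sketch below computes the drop-off between consecutive funnel steps and an Apdex score per step, using the standard Apdex definition: responses up to a threshold T are "satisfied", responses up to 4T are "tolerating" and count half, and anything slower is "frustrated". The function names, step names, and sample numbers are assumptions for illustration.

```python
def funnel_dropoff(step_counts):
    """step_counts: ordered list of (step_name, users_reaching_step).

    Returns a list of (from_step, to_step, drop_fraction) for each
    pair of consecutive steps.
    """
    drops = []
    for (prev_name, prev_users), (name, users) in zip(step_counts, step_counts[1:]):
        drop = (prev_users - users) / prev_users if prev_users else 0.0
        drops.append((prev_name, name, drop))
    return drops


def apdex(response_times, t):
    """Apdex = (satisfied + tolerating / 2) / total samples.

    satisfied:  response time <= t
    tolerating: t < response time <= 4 * t
    Returns None when there are no samples.
    """
    if not response_times:
        return None
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical funnel: users reaching each step
funnel = [("search", 1000), ("product details", 600),
          ("add to cart", 150), ("checkout", 120)]
for src, dst, drop in funnel_dropoff(funnel):
    print(f"{src} -> {dst}: {drop:.0%} drop-off")
```

Putting the Apdex score of a step next to its drop-off makes it visible whether an unusually large drop coincides with poor response times at that step.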

#5: Third Party Monitoring Dashboard

How fast is static content delivered by Akamai & Co? Are there spikes or outages that impact my end users?

Most eCommerce sites rely on third-party content, which impacts not only the feature set of the site but also its performance and, with that, end user experience. Third Party Monitoring requires a view from two different angles: the third parties that are included directly in your website or mobile app, and the external services you call from your backend.

These are the questions your Third Party Monitoring dashboard has to answer:

  1. Are the resources delivered via CDN fast or do we have regional problems?
  2. Is the integrated social media (Facebook, LinkedIn, Xing...) slow?
  3. Are the backend services seeing failed requests to the integrated third parties?
  4. Is the performance of the third party good?
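As a rough sketch of how these questions translate into numbers, the following Python function groups request timings by host and reports the median duration and failure rate for every host that is not your own. The `(url, duration_ms, status_code)` record format, the function name, and the example hosts are assumptions; in practice the data would come from browser resource timings or backend call tracing.

```python
from collections import defaultdict
from statistics import median
from urllib.parse import urlparse


def third_party_summary(requests, first_party_host):
    """requests: iterable of (url, duration_ms, status_code) tuples.

    Returns per third-party host: request count, median duration in ms,
    and the fraction of requests that failed (HTTP status >= 400).
    """
    by_host = defaultdict(list)
    for url, duration_ms, status in requests:
        host = urlparse(url).hostname
        if host is None or host == first_party_host:
            continue                          # skip our own content
        by_host[host].append((duration_ms, status))

    summary = {}
    for host, samples in by_host.items():
        failures = sum(1 for _, status in samples if status >= 400)
        summary[host] = {
            "requests": len(samples),
            "median_ms": median([d for d, _ in samples]),
            "failure_rate": failures / len(samples),
        }
    return summary
```

Running the same summary per region would show whether a slow CDN is a global or a regional problem.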

How fast and reliable are third party services such as Facebook or Google APIs? Do they impact the failure rate of my application?

#6: Desktop Web vs. Mobile Web vs. Mobile App Dashboard

Get to know your users: what devices do they use and does that impact user experience?

Your potential customers can use desktops, tablets or smartphones to access your site. Their connection may be anything from fast WiFi to slow dial-up. All of this impacts user experience. In order to analyze performance and optimize your site for these different browsers, devices and connection speeds, you need a dashboard that tells you:

  1. How many users are accessing my portal via Mobile App or Mobile Browser?
  2. What are the top browsers used? Do we need specific optimized pages for older browsers?
  3. Do we need to optimize for lower bandwidths, e.g., by using better image compression?
  4. Is there a difference between the Key Performance Indicators (KPI) depending on the different types of devices, browsers, mobile native vs. mobile web?
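Question 4 above comes down to computing the same KPIs separately for each client type. A minimal sketch follows, assuming each session is a dictionary with the hypothetical keys `client`, `load_time_s` and `converted`; the segment names are also just examples.

```python
from collections import defaultdict


def kpis_by_segment(sessions):
    """sessions: iterable of dicts with keys 'client', 'load_time_s', 'converted'.

    Returns per client type: share of total traffic, average load time
    in seconds, and conversion rate.
    """
    groups = defaultdict(list)
    for session in sessions:
        groups[session["client"]].append(session)

    total = sum(len(group) for group in groups.values())
    report = {}
    for client, group in groups.items():
        report[client] = {
            "traffic_share": len(group) / total,
            "avg_load_time_s": sum(s["load_time_s"] for s in group) / len(group),
            "conversion_rate": sum(1 for s in group if s["converted"]) / len(group),
        }
    return report
```

The same grouping works for browser version or connection type by swapping the key used to build the groups.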

When disaster strikes: Collaborate with R&D

It is likely that you will have smaller hiccups throughout the holiday season. To avoid lengthy and painful war room situations, it is important to level up your monitoring system and provide the data your engineering team needs to speed up error resolution. Here is a list of capabilities that will speed up triage and error resolution:

  1. Capture all actions of each visitor
  2. Collect crashes, JavaScript errors, and iOS/Android exceptions from your mobile app
  3. Provide method level visibility on the server side including context information such as method arguments and return values
  4. Provide the ability to take memory heap dumps and to access all requested application performance metrics, e.g., connection pool, thread count, heap sizes, ...
  5. Capture this data with tools that developers already use and that also allow sharing data from different environments

This level of detail is what developers need to understand exactly what went wrong, instead of digging through gigabytes of log files.

Crash information from mobile native apps, broken down by mobile device and version, makes it easy to fix specific problems.

Conclusion

Having these types of dashboards makes it easy to monitor the success of the holiday season, to react to problems, and to prevent larger damage by executing the right actions. Make sure you do not waste your time on problems that are not real or that you cannot fix, e.g., a complaint from a single user or a regional ISP outage. Focus on those problems that impact a large number of users and that you can fix. This will make sure you keep the conversion rate high and business flowing.

For further reading, check out our other recent blogs, such as DevOps Survival Guide: 2013 Online Holiday Shopping Season and With Confidence into the Holiday Season: Verifying Readiness in Test / Pre-Production.

More Stories By Klaus Enzenhofer

Klaus Enzenhofer has several years of experience and expertise in the field of Web Performance Optimization and User Experience Management. He works as Technical Strategist in the Center of Excellence Team at dynaTrace Software. In this role he influences the development of the dynaTrace Application Performance Management Solution and the Web Performance Optimization Tool dynaTrace AJAX Edition. He mainly gathered his experience in web and performance by developing and running large-scale web portals at Tiscover GmbH.
