Welcome!

Microservices Expo Authors: Zakia Bouachraoui, Elizabeth White, Pat Romanski, Liz McMillan, Yeshim Deniz

Related Topics: @DevOpsSummit, Microservices Expo, Containers Expo Blog, @CloudExpo, SDN Journal

@DevOpsSummit: Blog Feed Post

Measuring and Monitoring: Apps and Stacks By @LMacVittie | @DevOpsSummit #DevOps #Microservices

One of the charter responsibilities of DevOps is measuring and monitoring applications once they're in production

One of the charter responsibilities of DevOps (because it's a charter responsibility of ops) is measuring and monitoring applications once they're in production. That means both performance and availability. Which means a lot more than folks might initially think because generally speaking what you measure and monitor is a bit different depending on whether you're looking at performance or availability*.

There are four primary variables you want to monitor and measure in order to have the operational data necessary to make any adjustments necessary to maintain performance and availability:

  • Connectivity

    • This determines whether or not upstream devices (ultimately, the client) can reach the app (IP). This is the most basic of tests and tells you absolutely nothing about the application except that the underlying network is reachable. While that is important, of course, connectivity is implied by the successful execution of monitors up the stack and thus the information available from a simple connectivity test is not generally useful for performance or availability monitoring. ICMP pings can also be detrimental in that they generate traffic and activity on systems that, in hyper-scale environments, can actually negatively impact performance.
  • Capacity

    • This measure is critical to both performance and availability, and measures how close to "full" the connection capacity (TCP) of a given instance is. These variables are measured against known values usually obtained during pre-release stress / load tests that determine how many connections an app instance can maintain before becoming overwhelmed and performance degrades.
  • App Status

    • This simple but important measure determines whether the application (the HTTP stack) is actually working. This is generally accomplished by sending an HTTP request and verifying that the response includes an HTTP 200 response. Any other response is generally considered an error. Systems can be instructed to retry this test multiple times and after a designated number of failures, the app instance is flagged as out of service.
  • Availability

    • This is often ignored but is key to determining if the application is responding correctly or not. This type of monitoring requires that the monitor be able to make a request and compare the actual results against a known "good" result. These are often synthetic transactions that test the app and its database connectivity to ensure that the entire stack is working properly.

measuring

App Status and Availability can be measured either actively or passively (in band). When measured actively, a monitor initiates a request to the application and verifies its response. This is a "synthetic" transaction; a "fake" transaction used to measure performance and availability. When measured passively, a monitor spies on real transactions and verifies responses without interference. It is more difficult to measure availability based on application content verification with a passive monitor than an active one as a passive monitor is unlikely to be able to verify responses against known ones because it doesn't control what requests are being made. The benefit of a passive monitor is that it isn't consuming resources on the app instance in order to execute a test and it is measuring real performance for real users.

You'll notice that there's a clear escalation "up the stack" from IP -> TCP -> HTTP -> Application. That's not coincidental. Each layer of the stack is a critical component in the communication that occurs between a client and the application. Each one provides key information that is important to measuring both performance and availability.

The thing is that while the application may be responsible for responding to queries about its status in terms of resource utilization (CPU, memory, I/O), everything else is generally collected external to the application, from an upstream service. Most often that upstream service is going to be a proxy or load balancer, as in addition to monitoring status and performance it needs those measurements to enable decisions regarding scale and availability. It has to know how many connections an app has right now because at some point (a predetermined threshold) it is going to have to start distributing load differently. Usually to a new instance.

In a DevOps world where automation and orchestration are in play, this process can be automated or at least triggered by the recognition that a threshold has been reached. But only if the proxy is actually monitoring and measuring the variables that might trigger that process.

But to do that, you've got to monitor and measure the right things. Simply sending out a ping every five seconds tells you the core network is up, available and working but says nothing about the capacity of the app platform (the web or application server) or whether or not the application is actually responding to requests. HTTP 500, anyone?

It's not the case that you must monitor everything. As you move up the stack some things are redundant. After all, if you can open a TCP connection you can assume that the core network is available. If you can send an HTTP request and get a response, well, you get the picture.

What's important is to figure out what you need to know - connectivity, capacity, status and availability - and monitor it so you can measure it and take decisive action based on that data.

Monitoring and measuring of performance and availability should be application specific; that is, capacity of an app isn't just about the platform and what max connections are set to in the web server configuration. The combination of users, content, and processing within the application make capacity a very app-specific measurement. That means the systems that need that data must be aligned better with each application to ensure not only optimal performance and availability but efficiency of resources.

That's one of the reason traditionally "network" services like load balancing and proxies are becoming the responsibility of DevOps rather than NetOps.

* Many variables associated with availability - like system load - also directly impact performance and can thus be used as part of the performance equation.

Read the original blog entry...

More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.

Microservices Articles
When building large, cloud-based applications that operate at a high scale, it’s important to maintain a high availability and resilience to failures. In order to do that, you must be tolerant of failures, even in light of failures in other areas of your application. “Fly two mistakes high” is an old adage in the radio control airplane hobby. It means, fly high enough so that if you make a mistake, you can continue flying with room to still make mistakes. In his session at 18th Cloud Expo, Lee A...
In his general session at 19th Cloud Expo, Manish Dixit, VP of Product and Engineering at Dice, discussed how Dice leverages data insights and tools to help both tech professionals and recruiters better understand how skills relate to each other and which skills are in high demand using interactive visualizations and salary indicator tools to maximize earning potential. Manish Dixit is VP of Product and Engineering at Dice. As the leader of the Product, Engineering and Data Sciences team at D...
Lori MacVittie is a subject matter expert on emerging technology responsible for outbound evangelism across F5's entire product suite. MacVittie has extensive development and technical architecture experience in both high-tech and enterprise organizations, in addition to network and systems administration expertise. Prior to joining F5, MacVittie was an award-winning technology editor at Network Computing Magazine where she evaluated and tested application-focused technologies including app secu...
Containers and Kubernetes allow for code portability across on-premise VMs, bare metal, or multiple cloud provider environments. Yet, despite this portability promise, developers may include configuration and application definitions that constrain or even eliminate application portability. In this session we'll describe best practices for "configuration as code" in a Kubernetes environment. We will demonstrate how a properly constructed containerized app can be deployed to both Amazon and Azure ...
Modern software design has fundamentally changed how we manage applications, causing many to turn to containers as the new virtual machine for resource management. As container adoption grows beyond stateless applications to stateful workloads, the need for persistent storage is foundational - something customers routinely cite as a top pain point. In his session at @DevOpsSummit at 21st Cloud Expo, Bill Borsari, Head of Systems Engineering at Datera, explored how organizations can reap the bene...
Using new techniques of information modeling, indexing, and processing, new cloud-based systems can support cloud-based workloads previously not possible for high-throughput insurance, banking, and case-based applications. In his session at 18th Cloud Expo, John Newton, CTO, Founder and Chairman of Alfresco, described how to scale cloud-based content management repositories to store, manage, and retrieve billions of documents and related information with fast and linear scalability. He addresse...
The now mainstream platform changes stemming from the first Internet boom brought many changes but didn’t really change the basic relationship between servers and the applications running on them. In fact, that was sort of the point. In his session at 18th Cloud Expo, Gordon Haff, senior cloud strategy marketing and evangelism manager at Red Hat, will discuss how today’s workloads require a new model and a new platform for development and execution. The platform must handle a wide range of rec...
SYS-CON Events announced today that DatacenterDynamics has been named “Media Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY. DatacenterDynamics is a brand of DCD Group, a global B2B media and publishing company that develops products to help senior professionals in the world's most ICT dependent organizations make risk-based infrastructure and capacity decisions.
Discussions of cloud computing have evolved in recent years from a focus on specific types of cloud, to a world of hybrid cloud, and to a world dominated by the APIs that make today's multi-cloud environments and hybrid clouds possible. In this Power Panel at 17th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the importance of customers being able to use the specific technologies they need, through environments and ecosystems that expose their APIs to make true ...
In his keynote at 19th Cloud Expo, Sheng Liang, co-founder and CEO of Rancher Labs, discussed the technological advances and new business opportunities created by the rapid adoption of containers. With the success of Amazon Web Services (AWS) and various open source technologies used to build private clouds, cloud computing has become an essential component of IT strategy. However, users continue to face challenges in implementing clouds, as older technologies evolve and newer ones like Docker c...