Welcome!

Microservices Expo Authors: Elizabeth White, Liz McMillan, Craig Lowell, Greg O'Connor, Daniel Khan

Related Topics: @CloudExpo, Industrial IoT, Microservices Expo, Agile Computing, Cloud Security

@CloudExpo: Blog Post

Amazon Outage

Alerting to catch infrastructure problems

You don’t have to be a pre-cog to find and deal with infrastructure and application problems; you just need good monitoring.  We had quite a day Monday during the EC2 EBS availability incident.  Thanks to some early alerts - which started coming in about 2.5 hours before AWS started reporting problems - our ops team was able to intervene and make sure that our customers’ data was safe and sound. I’ll start with screenshots of what we saw and experienced, then get into what metrics to watch and alert on in your environment, as well as how to do so in TraceView.

10:30 AM EST: Increased disk latency, data pipeline backup
Around 10 am, we started to notice that writes weren’t moving through our pipeline as smoothly as before.  Sure enough, pretty soon we started seeing alerts about elevated DB load and disk latency.  Here’s what it looked like:

Figure 1: At 10 AM, we saw elevated DB load and disk latency.

12:30 PM EST: Diverting pipeline to S3 instead of EBS, pulling out hair

1:30 PM EST: Frontend offline, AWS incident report

Our workload is very write-heavy, so we first noticed performance problems there, but pretty soon reads made by our frontend were also affected, as a growing fraction of our customer’s data became affected by the mounting EBS problems.  At a certain point, any file I/O to affected EBS volumes would cause processes to enter an uninterruptible state, causing our MySQL servers to hang.  Here’s a view of the impact on our query sharding service:

Figure 2: Impact on our query sharding service.

6 PM EST: Debate whether backup restore or AWS EBS recovery will finish faster

9 PM EST: Back online

AWS started bringing volumes back online that evening.  During the downtime, we continued to collect customer performance data, diverting the pipeline to S3 until our databases came back online. Once the disks were back, we were able to get frontend servers back online, and spun up more pipeline workers to plow through the queued trace backlog as we replayed it from S3. Latency: the functional test of performance metrics You might be surprised by this, but monitoring latency is often the easiest and surest way to catch serious problems.  It’s the functional test of system: if any of the gears in the system being monitored start getting jammed, it will likely manifest in increased latency. However, latency can be noisy—how can we make this measurement more controlled, or to extend my testing metaphor, closer to a unit test?  Using TraceView, you can set alerts not only on the latency of your application, but also on individual layers of the stack, or particular URLs/controllers.  The performance of a predictable query load over time is a great way to detect aberrant database performance, for instance.

Figure 3: Use alerting to detect aberrant database performance.

Alerting on All of the Metrics
When looking at cases of infrastructure degradation, host-level metrics is where the buck stops.  Configuration is usually a pain: install agents on each machine and set thresholds.  We think the best alerts are at the intersection of easy and actionable.  With TraceView, you can set up a single alert and have it cover all hosts in an app.

Related Articles

The 5 Critical Things You Need to Know to Assure Optimal Performance in the Cloud

Verifying Network Performance SLAs

Who’s Managing the Performance of Your Cloud Applications?

More Stories By Dan Kuebrich

Dan Kuebrich is a web performance geek, currently working on Application Performance Management at AppNeta. He was previously a founder of Tracelytics (acquired by AppNeta), and before that worked on AmieStreet/Songza.com.

@MicroservicesExpo Stories
With so much going on in this space you could be forgiven for thinking you were always working with yesterday’s technologies. So much change, so quickly. What do you do if you have to build a solution from the ground up that is expected to live in the field for at least 5-10 years? This is the challenge we faced when we looked to refresh our existing 10-year-old custom hardware stack to measure the fullness of trash cans and compactors.
The emerging Internet of Everything creates tremendous new opportunities for customer engagement and business model innovation. However, enterprises must overcome a number of critical challenges to bring these new solutions to market. In his session at @ThingsExpo, Michael Martin, CTO/CIO at nfrastructure, outlined these key challenges and recommended approaches for overcoming them to achieve speed and agility in the design, development and implementation of Internet of Everything solutions wi...
A company’s collection of online systems is like a delicate ecosystem – all components must integrate with and complement each other, and one single malfunction in any of them can bring the entire system to a screeching halt. That’s why, when monitoring and analyzing the health of your online systems, you need a broad arsenal of different tools for your different needs. In addition to a wide-angle lens that provides a snapshot of the overall health of your system, you must also have precise, ...
It's been a busy time for tech's ongoing infatuation with containers. Amazon just announced EC2 Container Registry to simply container management. The new Azure container service taps into Microsoft's partnership with Docker and Mesosphere. You know when there's a standard for containers on the table there's money on the table, too. Everyone is talking containers because they reduce a ton of development-related challenges and make it much easier to move across production and testing environm...
Node.js and io.js are increasingly being used to run JavaScript on the server side for many types of applications, such as websites, real-time messaging and controllers for small devices with limited resources. For DevOps it is crucial to monitor the whole application stack and Node.js is rapidly becoming an important part of the stack in many organizations. Sematext has historically had a strong support for monitoring big data applications such as Elastic (aka Elasticsearch), Cassandra, Solr, S...
DevOps at Cloud Expo, taking place Nov 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long dev...
There's a lot of things we do to improve the performance of web and mobile applications. We use caching. We use compression. We offload security (SSL and TLS) to a proxy with greater compute capacity. We apply image optimization and minification to content. We do all that because performance is king. Failure to perform can be, for many businesses, equivalent to an outage with increased abandonment rates and angry customers taking to the Internet to express their extreme displeasure.
Monitoring of Docker environments is challenging. Why? Because each container typically runs a single process, has its own environment, utilizes virtual networks, or has various methods of managing storage. Traditional monitoring solutions take metrics from each server and applications they run. These servers and applications running on them are typically very static, with very long uptimes. Docker deployments are different: a set of containers may run many applications, all sharing the resource...
Right off the bat, Newman advises that we should "think of microservices as a specific approach for SOA in the same way that XP or Scrum are specific approaches for Agile Software development". These analogies are very interesting because my expectation was that microservices is a pattern. So I might infer that microservices is a set of process techniques as opposed to an architectural approach. Yet in the book, Newman clearly includes some elements of concept model and architecture as well as p...
SYS-CON Events announced today that 910Telecom will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Housed in the classic Denver Gas & Electric Building, 910 15th St., 910Telecom is a carrier-neutral telecom hotel located in the heart of Denver. Adjacent to CenturyLink, AT&T, and Denver Main, 910Telecom offers connectivity to all major carriers, Internet service providers, Internet backbones and ...
This complete kit provides a proven process and customizable documents that will help you evaluate rapid application delivery platforms and select the ideal partner for building mobile and web apps for your organization.
DevOps at Cloud Expo – being held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real results. Am...

Modern organizations face great challenges as they embrace innovation and integrate new tools and services. They begin to mature and move away from the complacency of maintaining traditional technologies and systems that only solve individual, siloed problems and work “well enough.” In order to build...

The post Gearing up for Digital Transformation appeared first on Aug. 29, 2016 04:30 PM EDT  Reads: 1,657

Internet of @ThingsExpo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devices - comp...
As the world moves toward more DevOps and Microservices, application deployment to the cloud ought to become a lot simpler. The Microservices architecture, which is the basis of many new age distributed systems such as OpenStack, NetFlix and so on, is at the heart of Cloud Foundry - a complete developer-oriented Platform as a Service (PaaS) that is IaaS agnostic and supports vCloud, OpenStack and AWS. Serverless computing is revolutionizing computing. In his session at 19th Cloud Expo, Raghav...
19th Cloud Expo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterpri...
The following fictional case study is a composite of actual horror stories I’ve heard over the years. Unfortunately, this scenario often occurs when in-house integration teams take on the complexities of DevOps and ALM integration with an enterprise service bus (ESB) or custom integration. It is written from the perspective of an enterprise architect tasked with leading an organization’s effort to adopt Agile to become more competitive. The company has turned to Scaled Agile Framework (SAFe) as ...

Let's just nip the conflation of these terms in the bud, shall we?

"MIcro" is big these days. Both microservices and microsegmentation are having and will continue to have an impact on data center architecture, but not necessarily for the same reasons. There's a growing trend in which folks - particularly those with a network background - conflate the two and use them to mean the same thing.

They are not.

One is about the application. The other, the network. T...

If you are within a stones throw of the DevOps marketplace you have undoubtably noticed the growing trend in Microservices. Whether you have been staying up to date with the latest articles and blogs or you just read the definition for the first time, these 5 Microservices Resources You Need In Your Life will guide you through the ins and outs of Microservices in today’s world.
This is a no-hype, pragmatic post about why I think you should consider architecting your next project the way SOA and/or microservices suggest. No matter if it’s a greenfield approach or if you’re in dire need of refactoring. Please note: considering still keeps open the option of not taking that approach. After reading this, you will have a better idea about whether building multiple small components instead of a single, large component makes sense for your project. This post assumes that you...