Top Two Features of Self-Healing Microservices
By Martin Goodwell

Microservices-based environments are more complex than their monolithic counterparts. To operate microservices environments with the same level of convenience that you’ve come to expect from operating self-contained monolithic application environments, you need to have the right tools in place and rely on best practices that will keep your microservices healthy.

We’ve noticed a growing number of microservices environments deployed by our customers, and the trend only seems to be accelerating. We recently asked some of these customers about their experiences. Most of their responses were in line with expectations. Two issues, however, caught us by surprise.

The two issues that surprised us
While increased complexity and the proliferation of services are challenges that most customers contend with, it’s the orchestration layer that seems to cause the most trouble. The complexity of microservices environments requires automated operation and self-healing capabilities, which means that some sort of orchestration layer is required.

Automated orchestration
The first issue for many customers is that they haven’t even considered using a tool for orchestration. Because they don’t have an orchestration layer in place they’ve been frustrated from the beginning. After a few deployments, maintaining different instances of services simply became too complicated.

This is unfortunate, because plenty of good open-source tools are available to facilitate the operation of microservices. For example, Netflix OSS, one of the toolsets most commonly used by our customers, provides a number of utilities that Netflix has developed over the years.

Of the dozens of available Netflix OSS tools and libraries, the Eureka service registry and the Hystrix circuit breaker are the most popular and likely also the easiest to use (Spring Cloud is an easy way to get started with those technologies quickly, by the way). The service registry allows your microservices to register themselves upon startup. Any client that intends to access a microservice can look up the available endpoint addresses via Eureka. This reduces the risk of accessing an unavailable service to near zero.
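To make the register-then-look-up idea concrete, here is a minimal in-memory sketch. It is not Eureka's API: the ServiceRegistry class, its method names, and the addresses are all hypothetical, and a real registry would also handle heartbeats, replication, and client-side caching.

```python
import random
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory registry; illustrates the idea, not Eureka's API."""

    def __init__(self):
        # service name -> set of endpoint addresses currently registered
        self._endpoints = defaultdict(set)

    def register(self, service, address):
        """Called by a service instance when it starts up."""
        self._endpoints[service].add(address)

    def deregister(self, service, address):
        """Called on shutdown, or when an instance is found to be gone."""
        self._endpoints[service].discard(address)

    def lookup(self, service):
        """Return the addresses of all instances currently registered."""
        return sorted(self._endpoints[service])

    def pick(self, service):
        """Pick one registered instance at random; raise if there is none."""
        candidates = self.lookup(service)
        if not candidates:
            raise LookupError(f"no registered instance of {service!r}")
        return random.choice(candidates)


# Hypothetical usage: two instances register, one goes away again.
registry = ServiceRegistry()
registry.register("orders", "10.0.0.5:8080")
registry.register("orders", "10.0.0.6:8080")
registry.deregister("orders", "10.0.0.5:8080")
print(registry.lookup("orders"))
```

Because clients only ever see addresses that are currently registered, a stopped instance drops out of the candidate list instead of being called and timing out.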

Netflix OSS Eureka and Hystrix at a glance

Circuit breakers keep your business going
The Hystrix circuit breaker makes your service calls more resilient by keeping track of each endpoint’s status. Normally you face expensive request timeouts when an endpoint becomes unavailable. Hystrix saves you from such timeouts by “breaking” the connection to the endpoint (this is why Hystrix is called a “circuit breaker”). It then reports the service as unavailable so that subsequent requests don’t run into the same timeouts. After a short sleep window, Hystrix lets a single trial request through to see whether the service is available again.
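The circuit-breaker pattern itself fits in a few lines. The sketch below is a simplified illustration, not Hystrix's API or implementation: the CircuitBreaker class, its parameters, and the default values are hypothetical. It opens after a number of consecutive failures, rejects calls immediately while open, and allows one trial call once a reset timeout has passed. It is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Toy breaker: consecutive-failure threshold plus a reset timeout."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast instead of waiting for a timeout.
                raise CircuitOpenError("circuit open; call rejected")
            # Reset timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            trial_failed = self._opened_at is not None
            if trial_failed or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success: close the circuit and forget earlier failures.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote call, for example breaker.call(fetch_orders), and treat CircuitOpenError as the signal to serve a fallback right away.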

With the risk of accessing unavailable services reduced to near-zero and expensive timeouts reduced to a minimum, your website keeps your content online and your visitors happy. You probably already know how much money you lose with each additional second of wait time, so using a service registry and a circuit breaker is a no-brainer in my opinion.

The second problem reported by our customers really caught us by surprise: a large number of customers report that their orchestration layers cause increased troubleshooting effort when problems are encountered.

Self-healing systems impact troubleshooting?
We were puzzled by this at first because the orchestration layer is supposed to keep an environment healthy and self-healing, quite the opposite of what our customers were reporting. Once we dug into this issue a little deeper, however, the cause of the additional troubleshooting effort became obvious. These customers were initially looking for problems only in their services, not in the orchestration layer itself. So when the orchestration layer did fail, customers spent considerable time eliminating the possible root causes in their own code before getting around to investigating the orchestration layer.

These experiences taught us and our customers an important lesson:

Orchestration layers need to be monitored
You need a proper monitoring tool in place to know when your orchestration layer fails. You’re probably used to monitoring your database connection pools, your service queues, and of course your API performance metrics. You need to apply the same monitoring strategy to your orchestration layer.

Eureka and Hystrix are well-crafted pieces of code that allow for monitoring of all the important orchestration-layer related metrics.

Increased service lookup times and failing connections indicate that your orchestration layer needs maintenance.
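As a rough illustration of those two signals, the sketch below wraps any lookup function, records how long each lookup takes, and counts failures. The LookupStats class and its thresholds are hypothetical values chosen for the example, not figures from Eureka, Hystrix, or any monitoring product.

```python
import time


class LookupStats:
    """Collects lookup durations and failure counts for a lookup callable."""

    def __init__(self):
        self.durations = []  # seconds per lookup, successful or not
        self.failures = 0    # lookups that raised an exception

    def timed(self, lookup, *args):
        """Run lookup(*args), recording its duration and any failure."""
        start = time.perf_counter()
        try:
            return lookup(*args)
        except Exception:
            self.failures += 1
            raise
        finally:
            self.durations.append(time.perf_counter() - start)

    def needs_attention(self, max_avg_seconds=0.05, max_failure_rate=0.1):
        """True if average lookup time or failure rate exceeds its threshold."""
        if not self.durations:
            return False
        avg = sum(self.durations) / len(self.durations)
        rate = self.failures / len(self.durations)
        return avg > max_avg_seconds or rate > max_failure_rate
```

In practice you would feed these numbers to your monitoring tool rather than evaluate them in process, but the point stands: both values are cheap to collect at the call site.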

The lessons we gleaned from our customers’ experiences lead us to the conclusion that monitoring of the orchestration layer should be an integral feature of all full-stack monitoring tools. Service registries, circuit breakers, and other orchestration tools should be part of all newly created environments, as should database connection pools and messaging queues.

Short interruptions are handled automatically by the orchestration layer. Ongoing problems indicate that the orchestration layer is in trouble.

Keep control over your environment
Ideally, your orchestration layer is a simple set of tools that hardly ever fails. Unfortunately, things that hardly ever fail are not the initial focus of troubleshooting. So, when your orchestration layer fails, you may have a hard time learning about it quickly.

Orchestration-layer metrics can be a great indicator of looming problems. In cases of bad performance, take a look at your orchestration-layer metrics. The orchestration layer may not be your hottest trouble spot, but when there is trouble there, you’ll be glad to know about it.

Do you know your metrics?
Is your environment based on microservices? Do you actively use service registries and circuit breakers? If so, do you know how much time service lookups consume? Do you know how often fallback mechanisms need to jump in? And do you know if your environment is truly self-healing?
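If you are not sure how often your fallbacks jump in, counting them is straightforward. The following sketch uses hypothetical names (call_with_fallback, fallback_ratio, and the "recs" service) and a plain in-process counter. It is not how Hystrix reports its metrics.

```python
from collections import Counter

# Global counters keyed by "<name>.calls" and "<name>.fallbacks".
fallback_stats = Counter()


def call_with_fallback(name, primary, fallback):
    """Try primary(); on any exception, count a fallback and use fallback()."""
    fallback_stats[f"{name}.calls"] += 1
    try:
        return primary()
    except Exception:
        fallback_stats[f"{name}.fallbacks"] += 1
        return fallback()


def fallback_ratio(name):
    """Share of calls for this name that were answered by the fallback."""
    calls = fallback_stats[f"{name}.calls"]
    return fallback_stats[f"{name}.fallbacks"] / calls if calls else 0.0


def unavailable():
    raise TimeoutError("recommendation service did not answer")


# Hypothetical usage: one healthy call, one call that needs the fallback.
call_with_fallback("recs", lambda: ["item-1"], lambda: [])
call_with_fallback("recs", unavailable, lambda: [])
print(fallback_ratio("recs"))
```

A ratio that climbs over time suggests the environment is leaning on its fallbacks rather than actually healing.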

If you have Netflix OSS components in use, you can add Netflix OSS monitoring by downloading the plugin file from our GitHub repo at JMX-Extensions/Netflix OSS extensions/plugin.json. Activating it is just a simple matter of uploading the file to your Dynatrace Ruxit environment.

In one of our upcoming releases, this feature will be available out of the box. Haven’t tried Dynatrace Ruxit yet? Take the free trial and see how resilient your environment is. You may be surprised at how much Ruxit’s auto-detection and zero-configuration simplify monitoring setup.

The post Top 2 features of self-healing microservices appeared first on #monitoringlife.
