Microservices Expo: Blog Feed Post

Top Two Features of Self-Healing Microservices
By Martin Goodwell

Microservices-based environments are more complex than their monolithic counterparts. To operate microservices environments with the same level of convenience that you’ve come to expect from operating self-contained monolithic application environments, you need to have the right tools in place and rely on best practices that will keep your microservices healthy.

We’ve noticed a growing number of microservices environments deployed by our customers, and the trend shows no sign of slowing. We recently asked some of these customers about their experiences. Most of their responses were in line with expectations. Two issues, however, caught us by surprise.

The two issues that surprised us
While increased complexity and the proliferation of services are challenges that most customers contend with, it’s the orchestration layer that seems to cause the most trouble. The complexity of microservices environments requires automated operation and self-healing capabilities, which means that some sort of orchestration layer is required.

Automated orchestration
The first issue for many customers is that they haven’t even considered using a tool for orchestration. Because they don’t have an orchestration layer in place, they’ve been frustrated from the beginning. After a few deployments, maintaining different instances of services simply became too complicated.

This is too bad because plenty of good open-source tools are available to facilitate the operation of microservices. For example, Netflix OSS—one of the toolsets most commonly used by our customers—provides a number of utilities that Netflix has developed over the years.

Of the dozens of available Netflix OSS tools and libraries, the Eureka service registry and the Hystrix circuit breaker are the most popular and likely also the easiest to use (Spring Cloud is a really easy way to get started with those technologies quickly, by the way). The service registry allows your microservices to register themselves upon startup. Any client that intends to access a microservice can look up the available endpoint addresses via Eureka. This reduces the risk of accessing an unavailable service to near zero.
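
To make the register-then-look-up idea concrete, here is a minimal in-memory sketch in Python. It is not Eureka's API: the ServiceRegistry class, its method names, and the 30-second lease are assumptions made purely for illustration. Instances register on startup, renew their entry with a heartbeat, and drop out of lookup results once the lease lapses.

```python
import time


class ServiceRegistry:
    """Hypothetical in-memory registry; a stand-in for a real one such as Eureka."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self._lease_seconds = lease_seconds  # assumed lease length
        self._clock = clock                  # injectable so tests can control time
        self._instances = {}                 # service name -> {endpoint: last heartbeat}

    def register(self, service, endpoint):
        """Called by a service instance on startup."""
        self._instances.setdefault(service, {})[endpoint] = self._clock()

    # Renewing a lease simply refreshes the timestamp, so it reuses register().
    heartbeat = register

    def deregister(self, service, endpoint):
        """Called by a service instance on clean shutdown."""
        self._instances.get(service, {}).pop(endpoint, None)

    def lookup(self, service):
        """Return endpoints whose lease has not expired, in sorted order."""
        now = self._clock()
        live = [
            endpoint
            for endpoint, last_seen in self._instances.get(service, {}).items()
            if now - last_seen <= self._lease_seconds
        ]
        return sorted(live)
```

A client would call lookup("orders") before each request and pick one of the returned endpoints. An instance that crashed without deregistering stops appearing once its lease runs out.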

[Figure: Netflix OSS Eureka and Hystrix at a glance]

Circuit breakers keep your business going
The Hystrix circuit breaker makes your service calls more resilient by keeping track of each endpoint’s status. Normally you face expensive request timeouts when an endpoint becomes unavailable. Hystrix saves you from such timeouts by “breaking” the connection to the endpoint (this is why Hystrix is called a “circuit breaker”). It then reports the service as unavailable so that subsequent requests don’t run into the same timeouts. After a short wait, Hystrix lets a trial request through to see whether the service is available again.
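
The circuit breaker pattern itself is small enough to sketch in a few lines of Python. This is not Hystrix code: the CircuitBreaker class, the failure_threshold and reset_timeout parameters, and the CircuitOpenError exception are all hypothetical names chosen for illustration. The sketch opens the circuit after a number of consecutive failures, rejects calls immediately while open, and lets a single trial call through once the reset timeout has elapsed.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker; not the Hystrix implementation."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    def call(self, func, fallback=None):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast instead of waiting for another timeout.
                if fallback is None:
                    raise CircuitOpenError("circuit is open")
                return fallback()
            # Reset timeout elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self._failures += 1
            trial_failed = self._opened_at is not None
            if trial_failed or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()     # open, or re-open after a failed trial
            if fallback is None:
                raise
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping a remote call as breaker.call(fetch_price, fallback=lambda: cached_price) means that once the endpoint is known to be down, callers get the cached value at once rather than each paying for a timeout. Both fetch_price and cached_price are placeholders here.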

With the risk of accessing unavailable services reduced to near zero and expensive timeouts reduced to a minimum, your website keeps your content online and your visitors happy. You probably already know how much money you lose with each additional second of wait time, so using a service registry and a circuit breaker is a no-brainer in my opinion.

The second problem reported by our customers really caught us by surprise: a large number of customers report that their orchestration layers cause increased troubleshooting effort when problems are encountered.

Self-healing systems impact troubleshooting?
We were puzzled by this at first because the orchestration layer is supposed to keep an environment healthy and self-healing, quite the opposite of what our customers were reporting. Once we dug into this issue a little deeper, however, the cause of the additional troubleshooting effort became obvious. These customers were initially looking for problems only in their services, not in the orchestration layer itself. So when the orchestration layer did fail, customers spent considerable time eliminating the possible root causes in their own code before getting around to investigating the orchestration layer.

These experiences taught us and our customers an important lesson:

Orchestration layers need to be monitored
You need a proper monitoring tool in place to know when your orchestration layer fails. You’re probably used to monitoring your database connection pools, your service queues, and of course your API performance metrics. You need to apply the same monitoring strategy to your orchestration layer.

Eureka and Hystrix are well-crafted pieces of code that allow for monitoring of all the important orchestration-layer related metrics.

Increased service lookup times and failing connections indicate that your orchestration layer needs maintenance.

The lessons we gleaned from our customers’ experiences lead us to the conclusion that monitoring of the orchestration layer should be an integral feature of all full-stack monitoring tools. Service registries, circuit breakers, and other orchestration tools should be part of all newly created environments, just like database connection pools and messaging queues.
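
As a rough illustration of watching those two signals, the Python sketch below keeps a sliding window of lookup durations and connection outcomes and reports when either crosses a threshold. The OrchestrationMonitor class, its method names, and the default thresholds are assumptions for this example only; they are not part of Eureka, Hystrix, or any monitoring product.

```python
from collections import deque
from statistics import mean


class OrchestrationMonitor:
    """Hypothetical sliding-window monitor for orchestration-layer metrics."""

    def __init__(self, window=100, max_avg_lookup_ms=50.0, max_failure_rate=0.1):
        self._lookups_ms = deque(maxlen=window)  # most recent lookup durations
        self._outcomes = deque(maxlen=window)    # True = connection succeeded
        self.max_avg_lookup_ms = max_avg_lookup_ms  # assumed threshold
        self.max_failure_rate = max_failure_rate    # assumed threshold

    def record_lookup(self, duration_ms):
        self._lookups_ms.append(duration_ms)

    def record_connection(self, succeeded):
        self._outcomes.append(bool(succeeded))

    def avg_lookup_ms(self):
        return mean(self._lookups_ms) if self._lookups_ms else 0.0

    def failure_rate(self):
        if not self._outcomes:
            return 0.0
        return self._outcomes.count(False) / len(self._outcomes)

    def alerts(self):
        """Return a list of human-readable warnings; empty when all is well."""
        found = []
        if self.avg_lookup_ms() > self.max_avg_lookup_ms:
            found.append("slow service lookups")
        if self.failure_rate() > self.max_failure_rate:
            found.append("failing connections")
        return found
```

In practice the record_lookup and record_connection calls would sit next to the registry lookup and the circuit-breaker call, so that a non-empty alerts() list points at the orchestration layer before anyone starts searching application code.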

Short interruptions are handled automatically by the orchestration layer. Ongoing problems indicate that the orchestration layer is in trouble.

Keep control over your environment
Ideally, your orchestration layer is a simple set of tools that hardly ever fails. Unfortunately, things that hardly ever fail are not the initial focus of troubleshooting. So, when your orchestration layer fails, you may have a hard time learning about it quickly.

Orchestration-layer metrics can be a great indicator of looming problems. In cases of bad performance, take a look at your orchestration-layer metrics. The orchestration layer may not be your hottest trouble spot, but when there is trouble there, you’ll be glad to know about it.

Do you know your metrics?
Is your environment based on microservices? Do you actively use service registries and circuit breakers? If so, do you know how much time service lookups consume? Do you know how often fallback mechanisms need to jump in? And do you know if your environment is truly self-healing?

If you have Netflix OSS components in use, you can add Netflix OSS monitoring by downloading the plugin file from our GitHub repo at JMX-Extensions/Netflix OSS extensions/plugin.json. Activating it is simply a matter of uploading the file to your Dynatrace Ruxit environment.

In one of our upcoming releases, this feature will be available out of the box. Haven’t tried Dynatrace Ruxit yet? Take the free trial and see how resilient your environment is. You may be surprised at how much Ruxit’s auto-detection and zero-configuration simplify monitoring setup.

The post Top 2 features of self-healing microservices appeared first on #monitoringlife.