By Dynatrace Blog
April 23, 2016 11:00 AM EDT
Top Two Features of Self Healing Microservices
By Martin Goodwell
Microservices-based environments are more complex than their monolithic counterparts. To operate microservices environments with the same level of convenience that you’ve come to expect from operating self-contained monolithic application environments, you need to have the right tools in place and rely on best practices that will keep your microservices healthy.
We’ve noticed a growing number of microservices environments deployed by our customers, and the trend only seems to be accelerating. We recently asked some of these customers about their experiences. Most of their responses were in line with expectations. Two issues, however, caught us by surprise.
The two issues that surprised us
While increased complexity and the proliferation of services are challenges that most customers contend with, it’s the orchestration layer that seems to cause them the most trouble. The complexity of microservices environments requires automated operation and self-healing capabilities. This means that some sort of orchestration layer is required.
The first issue for many customers is that they haven’t even considered using a tool for orchestration. Because they don’t have an orchestration layer in place they’ve been frustrated from the beginning. After a few deployments, maintaining different instances of services simply became too complicated.
This is too bad because plenty of good open-source tools are available to facilitate the operation of microservices. For example, Netflix OSS—one of the toolsets most commonly used by our customers—provides a number of utilities that Netflix has developed over the years.
Of the dozens of available Netflix OSS tools and libraries, the Eureka service registry and the Hystrix circuit breaker are the most popular and likely also the easiest to use (Spring Cloud is a really easy way to get started with those technologies quickly, by the way). The service registry allows your microservices to register themselves upon startup. Any client that intends to access a microservice can look up the available endpoint addresses via Eureka. This reduces the risk of accessing an unavailable service to near zero.
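To make the register-and-look-up idea concrete, here is a minimal in-memory sketch in Python. It is not Eureka's API: the `ServiceRegistry` class, its method names, the service name, and the addresses are all hypothetical, chosen only to illustrate the pattern described above.

```python
import random


class ServiceRegistry:
    """Toy in-memory registry; a hypothetical stand-in, not Eureka's API."""

    def __init__(self):
        self._endpoints = {}  # service name -> set of endpoint addresses

    def register(self, service, address):
        # Called by a service instance when it starts up.
        self._endpoints.setdefault(service, set()).add(address)

    def deregister(self, service, address):
        # Called when an instance shuts down or is found to be unavailable.
        self._endpoints.get(service, set()).discard(address)

    def lookup(self, service):
        # Clients ask for the currently registered endpoints before calling.
        return sorted(self._endpoints.get(service, set()))

    def pick(self, service):
        # Choose one registered endpoint at random; fail if none is registered.
        endpoints = self.lookup(service)
        if not endpoints:
            raise LookupError(f"no registered endpoint for {service!r}")
        return random.choice(endpoints)


# Hypothetical service name and addresses, for illustration only.
registry = ServiceRegistry()
registry.register("checkout", "10.0.0.5:8080")
registry.register("checkout", "10.0.0.6:8080")
registry.deregister("checkout", "10.0.0.5:8080")
print(registry.lookup("checkout"))
```

Because clients only ever see endpoints that are currently registered, an instance that has gone away is simply never handed out. A real registry would also need heartbeats or leases to notice instances that disappear without deregistering, which this sketch leaves out.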
Netflix OSS Eureka and Hystrix at a glance
Circuit breakers keep your business going
The Hystrix circuit breaker makes your service calls more resilient by keeping track of each endpoint’s status. Normally you face expensive request timeouts when an endpoint becomes unavailable. Hystrix saves you from such timeouts by “breaking” the connection to the endpoint (this is why Hystrix is called a “circuit breaker”). It then reports the service as unavailable so that subsequent requests don’t run into the same timeouts. Hystrix then continues to poll the service in the background to see when it’s available again.
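The circuit-breaker pattern itself fits in a few lines. The Python sketch below is an illustration under stated assumptions, not Hystrix's implementation or API: `CircuitBreaker`, `failure_threshold`, and `reset_timeout` are hypothetical names, and instead of polling in the background this version lets a single trial call through once the reset timeout has elapsed.

```python
import time


class CircuitBreaker:
    """Toy circuit breaker; a hypothetical sketch, not Hystrix's API."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at = None                      # None means closed

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Circuit is open: fail fast instead of waiting for a timeout.
                return fallback()
            # Reset timeout elapsed: let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        # Success: close the circuit and forget earlier failures.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller wraps each remote call, for example `breaker.call(fetch_prices, lambda: cached_prices)`, where `fetch_prices` and `cached_prices` are placeholders for your own remote call and fallback value. While the circuit is open, the fallback is returned immediately and the unavailable endpoint is not contacted at all.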
With the risk of accessing unavailable services reduced to near-zero and expensive timeouts reduced to a minimum, your website keeps your content online and your visitors happy. You probably already know how much money you lose with each additional second of wait time, so using a service registry and a circuit breaker is a no-brainer in my opinion.
The second problem reported by our customers really caught us by surprise: a large number of customers report that their orchestration layers cause increased troubleshooting effort when problems are encountered.
Self-healing systems impact troubleshooting?
We were puzzled by this at first because the orchestration layer is supposed to keep an environment healthy and self-healing, which is quite the opposite of what our customers were reporting. Once we dug into this issue a little deeper, however, the cause of the additional troubleshooting effort became obvious. These customers were initially looking for problems only in their services, not in the orchestration layer itself. So when the orchestration layer did fail, customers spent considerable time eliminating the possible root causes in their own code before getting around to investigating the orchestration layer.
These experiences taught us and our customers an important lesson:
Orchestration layers need to be monitored
You need a proper monitoring tool in place to know when your orchestration layer fails. You’re probably used to monitoring your database connection pools, your service queues, and of course your API performance metrics. You need to apply the same monitoring strategy to your orchestration layer.
Eureka and Hystrix are well-crafted pieces of code that allow for monitoring of all the important orchestration-layer related metrics.
Increased service lookup times and failing connections indicate that your orchestration layer needs maintenance.
The lessons we gleaned from our customers’ experiences lead us to the conclusion that monitoring of the orchestration layer should be an integral feature of all full-stack monitoring tools. Service registries, circuit breakers, and other orchestration tools should be part of all newly created environments, as should database connection pools and messaging queues.
Short interruptions are handled automatically by the orchestration layer. Ongoing problems indicate that the orchestration layer is in trouble.
Keep control over your environment
Ideally, your orchestration layer is a simple set of tools that hardly ever fails. Unfortunately, things that hardly ever fail are not the initial focus of troubleshooting. So, when your orchestration layer fails, you may have a hard time learning about it quickly.
Orchestration-layer metrics can be a great indicator of looming problems. In cases of bad performance, take a look at your orchestration-layer metrics. The orchestration layer may not be your hottest trouble spot, but when there is trouble there, you’ll be glad to know about it.
Do you know your metrics?
Is your environment based on microservices? Do you actively use service registries and circuit breakers? If so, do you know how much time service lookups consume? Do you know how often fallback mechanisms need to jump in? And do you know if your environment is truly self-healing?
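If you can't answer those questions yet, even a few in-process counters are a start. The Python sketch below records how long service lookups take and how often a fallback had to step in. It is a hypothetical illustration only: `OrchestrationMetrics` and its methods are invented names, not part of Eureka, Hystrix, or any Dynatrace product.

```python
import time
from statistics import mean


class OrchestrationMetrics:
    """Hypothetical in-process counters for orchestration-layer metrics."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock        # injectable for testing
        self.lookup_seconds = []   # duration of each service lookup
        self.calls = 0             # service calls attempted
        self.fallbacks = 0         # calls answered by a fallback

    def timed_lookup(self, lookup, service):
        # Wrap any lookup function and record how long it took.
        start = self._clock()
        try:
            return lookup(service)
        finally:
            self.lookup_seconds.append(self._clock() - start)

    def record_call(self, used_fallback):
        # Count every call, and separately those that needed the fallback.
        self.calls += 1
        if used_fallback:
            self.fallbacks += 1

    def summary(self):
        return {
            "lookups": len(self.lookup_seconds),
            "avg_lookup_seconds": mean(self.lookup_seconds) if self.lookup_seconds else 0.0,
            "fallback_rate": self.fallbacks / self.calls if self.calls else 0.0,
        }
```

A rising average lookup time or fallback rate in `summary()` is the kind of signal that points at the orchestration layer rather than at your own service code.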
If you have Netflix OSS components in use, you can add Netflix OSS monitoring by downloading the plugin file from our GitHub repo at JMX-Extensions/Netflix OSS extensions/plugin.json. Activating it is just a simple matter of uploading the file to your Dynatrace Ruxit environment.
In one of our upcoming releases, this feature will be available out of the box. Ruxit provides immediate value out of the box. Haven’t tried Dynatrace Ruxit yet? Take the free trial and see how resilient your environment is. You may be surprised at how Ruxit’s auto-detection and zero-configuration have simplified monitoring setup.