By Dynatrace Blog
April 23, 2016 11:00 AM EDT
Top Two Features of Self Healing Microservices
By Martin Goodwell
Microservices-based environments are more complex than their monolithic counterparts. To operate microservices environments with the same level of convenience that you’ve come to expect from operating self-contained monolithic application environments, you need to have the right tools in place and rely on best practices that will keep your microservices healthy.
We’ve noticed an increasing number of microservices environments deployed by our customers, and the trend shows no sign of slowing. We recently asked some of these customers about their experiences. Most of their responses were in line with expectations. Two issues, however, caught us by surprise.
The two issues that surprised us
While increased complexity and the proliferation of services are challenges that most customers contend with, it’s the orchestration layer that seems to cause them the most trouble. The complexity of microservices environments requires automated operation and self-healing capabilities. This means that some sort of orchestration layer is required.
The first issue for many customers is that they haven’t even considered using a tool for orchestration. Because they don’t have an orchestration layer in place they’ve been frustrated from the beginning. After a few deployments, maintaining different instances of services simply became too complicated.
This is too bad because plenty of good open-source tools are available to facilitate the operation of microservices. For example, Netflix OSS—one of the toolsets most commonly used by our customers—provides a number of utilities that Netflix has developed over the years.
Of the dozens of available Netflix OSS tools and libraries, Eureka service registry and Hystrix circuit breaker are the most popular and likely also the easiest to use (Spring Cloud is a really easy way to get started with those technologies quickly, by the way). The service registry allows your microservices to register themselves upon startup. Any client that intends to access a microservice can look up the available endpoint addresses via Eureka. This reduces the risk of accessing an unavailable service to near zero.
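To make the register-then-look-up idea concrete, here is a minimal sketch in Python. It is a toy in-memory registry, not Eureka's API: the class name `ServiceRegistry`, its methods, the service name, and the addresses are all hypothetical and chosen only for illustration. A real registry runs as its own service and also tracks heartbeats.

```python
from collections import defaultdict


class ServiceRegistry:
    """Toy in-memory registry (hypothetical; not how Eureka is implemented)."""

    def __init__(self):
        self._instances = defaultdict(list)  # service name -> endpoint addresses
        self._cursors = {}                    # service name -> round-robin position

    def register(self, service, address):
        """Called by a service instance when it starts up."""
        if address not in self._instances[service]:
            self._instances[service].append(address)

    def deregister(self, service, address):
        """Called on shutdown, or when an instance is considered dead."""
        if address in self._instances[service]:
            self._instances[service].remove(address)

    def lookup(self, service):
        """Return one registered endpoint, rotating through the instances."""
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no instance registered for {service!r}")
        position = self._cursors.get(service, 0) % len(instances)
        self._cursors[service] = position + 1
        return instances[position]


# Usage: two instances register themselves, a client looks one up.
registry = ServiceRegistry()
registry.register("checkout", "10.0.0.5:8080")
registry.register("checkout", "10.0.0.6:8080")
endpoint = registry.lookup("checkout")
```

Because clients only ever receive addresses that are currently registered, an instance that has been deregistered simply stops being handed out.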
Netflix OSS Eureka and Hystrix at a glance
Circuit breakers keep your business going
Hystrix circuit breaker makes your service calls more resilient by keeping track of each endpoint’s status. Normally you face expensive request timeouts when an endpoint becomes unavailable. Hystrix saves you from such timeouts by “breaking” the connection to the endpoint (this is why Hystrix is called a “circuit breaker”). It then reports the service as unavailable so that subsequent requests fail fast instead of running into the same timeouts. After a short wait, Hystrix lets a trial request through to check whether the service is available again.
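The circuit-breaker pattern itself fits in a few lines. The following Python sketch is an illustration of the pattern only, not Hystrix's API or configuration: `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `sleep_window` are hypothetical names, and counting consecutive failures is a simplifying assumption. The sketch is also not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the endpoint."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, sleep_window=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.sleep_window = sleep_window            # seconds to wait before a trial call
        self._clock = clock                         # injectable so tests can control time
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.sleep_window:
                # Fail fast instead of waiting for another timeout.
                raise CircuitOpenError("endpoint marked unavailable")
            # Sleep window elapsed: fall through and let one trial call probe the endpoint.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()     # open (or re-open) the circuit
            raise
        self._failures = 0
        self._opened_at = None                      # a success closes the circuit
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_price, item_id)` with a hypothetical `fetch_price` function, and catch `CircuitOpenError` to serve a fallback such as a cached value.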
With the risk of accessing unavailable services reduced to near-zero and expensive timeouts reduced to a minimum, your website keeps your content online and your visitors happy. You probably already know how much money you lose with each additional second of wait time, so using a service registry and a circuit breaker is a no-brainer in my opinion.
The second problem reported by our customers really caught us by surprise: a large number of customers report that their orchestration layers cause increased troubleshooting effort when problems are encountered.
Self-healing systems impact troubleshooting?
We were puzzled by this at first because the orchestration layer is supposed to keep an environment healthy and self-healing, quite the opposite of what our customers were reporting. Once we dug into this issue a little deeper, however, the cause of the additional troubleshooting effort became obvious. These customers were initially looking for problems only in their services, not the orchestration layer itself. So when the orchestration layer did fail, customers spent considerable time eliminating the possible root causes in their own code before getting around to investigating the orchestration layer.
These experiences taught us and our customers an important lesson:
Orchestration layers need to be monitored
You need a proper monitoring tool in place to know when your orchestration layer fails. You’re probably used to monitoring your database connection pools, your service queues, and of course your API performance metrics. You need to apply the same monitoring strategy to your orchestration layer.
Eureka and Hystrix are well-crafted pieces of code that allow for monitoring of all the important orchestration-layer related metrics.
Increased service lookup times and failing connections indicate that your orchestration layer needs maintenance.
The lessons we gleaned from our customers’ experiences lead us to the conclusion that monitoring of the orchestration layer should be an integral feature of all full-stack monitoring tools. Service registries, circuit breakers, and other orchestration tools should be part of all newly created environments, as should database connection pools and messaging queues.
Short interruptions are handled automatically by the orchestration layer. Ongoing problems indicate that the orchestration layer is in trouble.
Keep control over your environment
Ideally, your orchestration layer is a simple set of tools that hardly ever fails. Unfortunately, things that hardly ever fail are not the initial focus of troubleshooting. So, when your orchestration layer fails, you may have a hard time learning about it quickly.
Orchestration-layer metrics can be a great indicator of looming problems. In cases of bad performance, take a look at your orchestration layer metrics. The orchestration layer may not be your hottest trouble spot, but when there is trouble there, you’ll be glad to know about it.
Do you know your metrics?
Is your environment based on microservices? Do you actively use service registries and circuit breakers? If so, do you know how much time service lookups consume? Do you know how often fallback mechanisms need to jump in? And do you know if your environment is truly self-healing?
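Answering these questions takes little more than a few counters. The Python sketch below shows one way to track lookup times and fallback usage by hand. The class `OrchestrationMetrics` and its threshold values are hypothetical examples, not recommendations, and the sketch does not reflect how Eureka, Hystrix, or any monitoring product exposes these numbers.

```python
import statistics
import time
from dataclasses import dataclass, field


@dataclass
class OrchestrationMetrics:
    lookup_seconds: list = field(default_factory=list)  # one entry per registry lookup
    calls: int = 0                                       # service calls attempted
    fallbacks: int = 0                                   # calls answered by a fallback

    def record_lookup(self, seconds):
        self.lookup_seconds.append(seconds)

    def record_call(self, used_fallback):
        self.calls += 1
        if used_fallback:
            self.fallbacks += 1

    def mean_lookup(self):
        return statistics.mean(self.lookup_seconds) if self.lookup_seconds else 0.0

    def fallback_rate(self):
        return self.fallbacks / self.calls if self.calls else 0.0

    def warnings(self, max_mean_lookup=0.05, max_fallback_rate=0.10):
        """Return readable warnings; the default thresholds are arbitrary examples."""
        found = []
        if self.mean_lookup() > max_mean_lookup:
            found.append("service lookups are slow")
        if self.fallback_rate() > max_fallback_rate:
            found.append("fallbacks are firing often")
        return found


# Usage: time a lookup, record the outcome of the call, then check for warnings.
metrics = OrchestrationMetrics()
start = time.perf_counter()
# ... perform a service registry lookup here ...
metrics.record_lookup(time.perf_counter() - start)
metrics.record_call(used_fallback=False)
current_warnings = metrics.warnings()
```

An occasional fallback is the self-healing mechanism doing its job. A fallback rate or lookup time that stays high is the signal to look at the orchestration layer itself.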
If you have Netflix OSS components in use, you can add Netflix OSS monitoring by downloading the plugin file from our GitHub repo at JMX-Extensions/Netflix OSS extensions/plugin.json. Activating it is just a simple matter of uploading the file to your Dynatrace Ruxit environment.
In one of our upcoming releases, we will make this feature available out-of-the-box. Ruxit provides immediate value to you out-of-the-box. Haven’t tried Dynatrace Ruxit yet? Take the free trial and see how resilient your environment is. You may be surprised at how Ruxit’s auto-detection and zero-configuration have simplified monitoring setup.