Genesis of a Genetic Algorithm

Understanding GAs from business to implementation

Have you ever wondered how a good idea is transformed into business value? Have you ever thought about how someone takes an abstract idea and produces business value from apparent nothingness? Have you ever asked what you can do to leverage the assets at your disposal for the greater good? If so, then sit tight, because you are about to take a journey that will equip your inquisitiveness with the step-by-step actions FireScope took when it transformed its corporate-wide historical metrics from a passive data asset into the next level of business intelligence.

With unique insight, FireScope is able to identify potentially hidden relationships among your IT assets and reveal cause-and-effect metrics you might not have even known existed. This wizardry is accomplished using a distinctive genetic algorithm solution, yet it is as simple to execute as a few mouse clicks. In this article we will introduce the business idea that triggered the subsequent investigations, which led to the initial analysis and ultimately to the completion of FireScope's genetic algorithm implementation. The journey covers several topics, including the business proposition, data normalization, and genetic modeling. By the end of it you'll recognize that the combination of metric collection and data comparison delivered by FireScope Inc. is unmatched in the IT industry.

But do take caution, because the latter portion of this article is presented from the perspective that you already possess a basic understanding of what a genetic algorithm is, as well as some of the concepts used in modeling one. Fear not if you are not yet there, as you can still garner significant benefit from this article without understanding its nuts and bolts. Let the journey begin.

FireScope from 30,000 feet
As background, a functioning FireScope deployment gathers metrics from all forms of existing IT assets, normalizes those metrics, provides historical analysis of them, and, most importantly, provides service views for worldwide operations that are unparalleled in the IT industry.

FireScope collects a vast array of corporate-wide metrics. Some examples are CPU utilization, disk storage, host temperature, interface traffic, and memory utilization. Others include database metrics, JMX metrics, NetApp metrics, VMware metrics, web server response time, and many more than we have the time or space to mention. It is also important to understand that every metric gathered by FireScope is collected on its own schedule; even two or more metrics with the same collection interval are not collected at exactly the same instant. With this universe of data, the natural question became: what can we do to leverage this asset and deliver the next level of business intelligence?

Revealing hidden secrets
If you've been around the IT world long enough, then you've likely experienced a time when an update was made to a web server that invoked new or existing services on an application server, which in turn caused deadlocks on your database server. Unfortunately, the deadlocks did not occur in test because they were load related, and as a result they weren't uncovered until your public-facing application was placed under heavy load on Cyber Monday. Don't worry, you didn't need those sales anyway!

Now to be fair, anyone who consistently monitors their IT infrastructure can identify that their web server is not performing as expected. Furthermore, if you know all of the relationships between your web servers, application servers, and database servers, you can even set up static alerts that draw your attention to the notion that one layer of your business is impacting another layer. But where the story gets really interesting is if you fall into one or more of the following categories:

  1. The alerting values that you set are either too high or too low.
  2. You have not taken the time to set up alerts for related IT assets.
  3. You are not aware of the relationships between your IT assets.
  4. You do not properly monitor your IT assets.

I hope you can see that this is a very complex world that we live in. To detect an impending catastrophic corporate shutdown, you need tools that can discover that independent metrics collected from independent servers are impacting one another. This is exactly the problem FireScope set out to solve when it postulated the use of a genetic algorithm to provide optimal search heuristics and uncover hidden relationships in the very same metrics it had already collected from your IT assets. But not so fast: in order to accomplish this goal we first need the ability to compare disparate metrics.

Not all time is created equal
As noted above, FireScope collects a universe of metrics from a universe of IT assets, and each is collected on its own schedule. As an example, consider CPU utilization taken from two different hosts, both polled on 30-second intervals. Since each is polled on its own schedule, the collection times never line up exactly, so we couldn't claim the two metrics had similar signatures. Yet another problem arises when you compare two metrics with different collection intervals, such as CPU utilization collected every 30 seconds on one asset versus CPU utilization collected every 5 minutes on another. How can these be easily compared?

FireScope needed the ability to easily normalize the time domain from all collected metrics in order to fairly and accurately compare metrics collected on independent schedules. As it turns out, FireScope already trends all metrics that have numeric representations. You can think of trending as averaging over time. But for the purposes of FireScope's genetic algorithm, the trending operation also contributed an effective normalization in the time domain for all metrics having a numeric representation. So problem one is solved, because all numeric metrics are trended every hour starting on the hour.
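As a rough illustration of the idea, the sketch below averages a metric's raw samples into one value per clock hour, which is all that time-domain normalization requires. This is not FireScope's actual code; the function and field names are hypothetical.

```python
from collections import defaultdict

def hourly_trend(samples):
    """samples: iterable of (unix_timestamp_seconds, numeric_value) pairs."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour_start = ts - (ts % 3600)          # snap the sample to the top of its hour
        buckets[hour_start].append(value)
    # One averaged value per hour, in time order: the normalized time domain.
    return [(hour, sum(vals) / len(vals)) for hour, vals in sorted(buckets.items())]
```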

Not all units have the same value
Let's trade one for one. I'll give you a nickel for every dollar you give me. Sound fair? Yeah, I didn't think you would go for that either, but the thought brings to light the next challenge: comparing metrics that have differing units. The problem becomes even more challenging when you consider that in some instances the units might not even be from the same domain! Consider the table below, which details just a few metric/unit pairings from your IT assets:

Metric                      Units
Interface Traffic           Bytes/second
CPU Utilization             Percentage
Host Temperature            Degrees F or C
Web Server Response Time    Seconds

How can "Degrees F" be compared to "Percent CPU Utilization"? FireScope needed the ability to compare differing metrics collected across your IT infrastructure each having potentially different metric/unit representations.

Well, what if we compared the relative rise and fall of collected values instead of the values themselves? That is, instead of comparing raw values, we compare the increase or decrease in each metric over time. In short, FireScope calculates the tangent, or rate of change, between hourly trends for all collected metrics as a pre-processing step for its genetic algorithm. As you'll see later, this series of data is used to form a "gene," or genetic sequence, which is then used to compare one metric against another and determine how closely the two signatures rise or fall within the same time frame.
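A minimal sketch of this pre-processing step, assuming the hourly trend output from the earlier sketch (again, illustrative only, not FireScope's implementation): take the hour-over-hour change in each metric so that signals can be compared by how they move rather than by their magnitudes.

```python
def rate_of_change(trend):
    """trend: list of (hour_start, averaged_value) pairs, one per hour."""
    # Hour-over-hour change; the digitization step shown later maps these
    # changes onto a common alphabet so differing units remain comparable.
    return [v1 - v0 for (_, v0), (_, v1) in zip(trend, trend[1:])]
```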

Action and reaction
If you're in IT, you will most likely care if a rise in web traffic caused a delayed rise in CPU activity on another system which in turn caused a general slowdown in your business response times.

The third level of value that FireScope delivers with its genetic algorithm solution is the ability to search out cause-and-effect metrics within your IT assets. By applying a sliding-window comparison of a selected metric against the searched metrics, FireScope can detect that deviations in one metric appear before or after a target metric anomaly. In doing so, the notion that one metric caused another to deviate is displayed graphically in a simple time-series display. Furthermore, this cause-and-effect rendering can appear in either direction.

  1. The target metric may have deviated because it is impacted by some other metric
  2. The target metric deviation may have impacted some other metric in your system
  3. Or both of the above are true and your system has multiple cause-and-effect metrics

But once again you may not have even known that these metrics have cause-and-effect relationships or that the relationships exhibit delayed response signatures.

Starting the analysis
Let's assume that you fall into category 3 above, which implies that you are actively monitoring your IT assets but might not be aware of all of the relationships between them. Let's also assume that you are experiencing a slow-down in one of your important business operations, but other than the apparent slow-down you can't quite explain why this portion of your business is slow. You observe that CPU load is higher than normal on one system, displayed via a historical graph of that system's CPU load. You start FireScope's analytics and select this same metric. You provide the analysis start time by selecting the data area just prior to the point where the CPU load started to rise. Next, you select the analysis end time by marking the data area after the CPU returned to normal, or you select "now" because the CPU load is still high. As a last step you trigger the genetic algorithm analysis, asking FireScope to search out other metrics that exhibit a similar response during the same timeframe as the selected metric. The result of the analysis is the top five metrics that most closely match the signature of the selected metric. The value of the analysis is that these results may well include metrics that caused the spike in CPU load under evaluation, or metrics that are themselves being impacted by the selected metric.

Components of a genetic algorithm
A genetic algorithm is an optimized search solution that attempts to mimic the process of natural evolution. Natural evolution uses genes, chromosomes, genetic mutation, genetic crossover, and multiple generations to produce improvements in nature. Of course, natural evolution sometimes produces defects or mutations, but just as in nature, these apparent defects can turn out to be extremely valuable in the evolutionary process. Genetic algorithms simulate these concepts in software, expressing an optimized search in terms of the very same evolutionary constructs.

FireScope's implementation draws on several of these genetic algorithm constructs; the sections that follow provide a brief synopsis of each step in the process.

GA building blocks, the "gene"
Let's use a bottom-up approach and talk about genes first. As mentioned above, FireScope compares the relative change over time of disparate metric values. This comparison is accomplished by first digitizing the comparison values into a representative alphabet, where each character represents the slope, or tangent angle, between two sequential trend values. Since time is always increasing in this domain, the only relevant values fall between -π/2 (-90 degrees) and π/2 (90 degrees). FireScope divides this range into 90 distinct buckets, each representing one letter of a 90-character alphabet. The digitization process allows for optimized comparison and also filters out small changes that are not significant enough to impact the comparison. The series of alphabetic values representing one metric's digitized changes is encoded into a gene, ordered from the earliest time slot under consideration to the latest.
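Here is a hedged sketch of that digitization, building on the rate-of-change sketch above; the bucket math and the choice of 90 printable characters are assumptions made for illustration, not FireScope's published encoding.

```python
import math

ALPHABET = [chr(ord('!') + i) for i in range(90)]    # any 90 printable symbols will do

def encode_gene(changes):
    """changes: hour-over-hour deltas from rate_of_change()."""
    gene = []
    for delta in changes:
        angle = math.atan(delta)                       # always within (-pi/2, pi/2)
        bucket = int((angle + math.pi / 2) / math.pi * 90)
        gene.append(ALPHABET[min(bucket, 89)])         # clamp the upper edge
    return "".join(gene)
```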

GA building blocks, the "chromosome"
Moving up in our bottom-up approach is the formation of multiple genes into a chromosome. FireScope's goal is to identify the five metrics that are most closely related to a pre-selected target metric, over a time range slightly wider than the selected metric's time range. Accordingly, FireScope creates a chromosome from five genes, each built from the digitized trend values of a candidate metric. The resulting chromosome represents one possible solution from among millions of combinations. As the genetic algorithm progresses, it searches among those millions of possible chromosomes for the one whose five metrics best match the trend pattern of the target metric.
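In code, a chromosome can be as simple as the structure below (hypothetical names, used only to make the later sketches concrete): five candidate metrics, their encoded genes, and a fitness score.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Chromosome:
    metric_ids: List[str]          # identifiers of the five candidate metrics
    genes: List[str]               # their encoded genes, aligned with metric_ids
    fitness: float = 0.0           # how closely the five genes match the target gene
```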

GA building blocks, population/fitness/scoring
The first selection of genes to form the initial chromosome population is a purely random selection from all existing numeric trend data. FireScope creates an initial population of several hundred chromosomes (possible solutions) and then applies a fitness algorithm to it. The fitness algorithm assigns each chromosome a numeric value representing how closely it matches the target metric. The chromosomes that score highest are chosen for mating to produce the next generation, the intent being that improved next-generation chromosomes are the natural result of combining the best parents of the prior generation.
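FireScope has not published its fitness function, so the sketch below substitutes a simple per-character distance between each gene and the target gene purely to show the shape of the scoring and selection step.

```python
def gene_similarity(gene, target_gene):
    n = min(len(gene), len(target_gene))
    if n == 0:
        return 0.0
    diff = sum(abs(ord(a) - ord(b)) for a, b in zip(gene, target_gene))
    return 1.0 / (1.0 + diff / n)                     # higher means more similar

def score(chromosome, target_gene):
    chromosome.fitness = sum(gene_similarity(g, target_gene) for g in chromosome.genes)
    return chromosome.fitness

def select_parents(population, target_gene, keep=50):
    for c in population:
        score(c, target_gene)
    return sorted(population, key=lambda c: c.fitness, reverse=True)[:keep]
```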

GA building blocks, mating
This process is sometimes referred to as genetic crossover, and it is the process of selecting some genes (metrics) from each of two different parents to produce a new child chromosome. The new child chromosome contains five genes drawn via crossover, or mating, from two high-scoring parents of the prior generation. This chromosome, like all others, represents one possible solution from among the millions of possible combinations of five metrics that might most closely resemble the signature of the pre-selected target metric.
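Continuing the same illustrative sketches, a single-point crossover over the five gene slots might look like this; the split point and the single-point scheme are assumptions, as many crossover variants exist.

```python
import random

def crossover(parent_a, parent_b):
    cut = random.randint(1, 4)                 # split point within the five gene slots
    return Chromosome(
        metric_ids=parent_a.metric_ids[:cut] + parent_b.metric_ids[cut:],
        genes=parent_a.genes[:cut] + parent_b.genes[cut:],
    )
```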

GA building blocks, "mutations"
After mating, a small proportion of the new generation of chromosomes is chosen to be randomly mutated. FireScope uses the mutation process to inject previously unexplored genes (metrics) into the search. If a chromosome is selected for mutation, a random gene is replaced by a gene from a metric that has not yet been explored. Mutation can improve the overall score of the selected chromosome or degrade it; however, this randomness has been shown to improve GA search capabilities, just as mutation in nature sometimes provides improvement through natural genetic evolution.
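A sketch of that step, again with hypothetical names; unexplored_pool stands in for whatever source of not-yet-tried metric/gene pairs the real system maintains, and the mutation rate is illustrative.

```python
import random

MUTATION_RATE = 0.02                             # illustrative value, not FireScope's

def mutate(chromosome, unexplored_pool):
    if unexplored_pool and random.random() < MUTATION_RATE:
        slot = random.randrange(len(chromosome.genes))
        metric_id, gene = unexplored_pool.pop()
        chromosome.metric_ids[slot] = metric_id   # swap in an unexplored metric
        chromosome.genes[slot] = gene
    return chromosome
```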

Cause-and-effect
In comparing other metrics against a target metric, FireScope evaluates a longer timeframe than the time range the user selected for the target metric, comparing trend values both before and after the target window. By expanding the search window and sliding the target metric over the searched metric, FireScope can detect whether a searched metric may have caused the target metric to deviate from its normal signature, or whether the target metric caused other metrics to deviate from theirs. This determination is made by measuring the similarity of the target metric to the searched metric at each offset. The approach has nothing to do with genetic algorithms per se; it is simply a higher-level value extracted from FireScope's genetic algorithm implementation.
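The sketch below shows the idea, reusing gene_similarity from the scoring sketch: slide the target gene across a wider stretch of the searched metric's gene and report the offset of the best match. Reading a negative offset as "the searched metric moved first" (possible cause) and a positive offset as "it moved afterwards" (possible effect) is our illustrative interpretation, not FireScope's documented output.

```python
def best_offset(target_gene, searched_gene, max_lag):
    """searched_gene should cover 2*max_lag more hourly slots than target_gene."""
    best_lag, best_sim = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        start = max_lag + lag
        window = searched_gene[start:start + len(target_gene)]
        sim = gene_similarity(window, target_gene)
        if sim > best_sim:
            best_lag, best_sim = lag, sim
    return best_lag, best_sim
```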

GA building blocks, "completion"
After several thousand generations have been evaluated and the improvement in scoring has slowed to an acceptable level, the genetic algorithm completes and the top-scoring chromosome from the last generation is selected as the best solution. This chromosome contains five genes representing the metrics that most closely match the target metric. Each gene (metric) from the top-scoring chromosome can then be displayed back to the user for further investigation, on the same graph as the selected target metric.
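Tying the earlier sketches together, a simplified end-to-end loop might look like the following; random_population() is a hypothetical helper that builds chromosomes from randomly chosen trended metrics, and the stopping thresholds are illustrative rather than FireScope's actual values.

```python
import random

def run_ga(target_gene, random_population, unexplored_pool,
           max_generations=10_000, min_improvement=1e-4):
    population = random_population()
    best_fitness = 0.0
    best = None
    for _ in range(max_generations):
        parents = select_parents(population, target_gene)    # scores and ranks everyone
        if parents[0].fitness - best_fitness < min_improvement:
            best = parents[0]
            break                                            # improvement has slowed enough
        best, best_fitness = parents[0], parents[0].fitness
        children = [mutate(crossover(random.choice(parents), random.choice(parents)),
                           unexplored_pool)
                    for _ in range(len(population) - len(parents))]
        population = parents + children
    return best.metric_ids                                   # the five best-matching metrics
```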

Conclusion
If this is your first exposure to genetic algorithm techniques, it can be overwhelming to try to absorb all of the terminology. Concepts such as genes, chromosomes, mutation, and fitness algorithms are difficult to conceptualize. While FireScope uses genetic algorithm techniques, it is important to understand that this approach is simply a way to achieve optimal search times and deliver the business value of revealing possibly unknown relationships between disparate metrics collected throughout your IT assets.

It is also instructive to realize that the real ingenuity in this approach is not in the application of the genetic algorithm, but rather in the normalization techniques that were applied to deliver the ability to compare disparate metrics. While the application of the genetic algorithm does provide optimized search results, these results could not have been achieved if it weren't for the initial work of normalizing the time domain, normalizing the value domain, and implementing the sliding window analysis that delivers the ability to uncover delayed cause-and-effect metrics hidden within your IT infrastructure.

FireScope's genetic algorithm, coupled with your inquisitiveness, forms a near-superhuman capability that exists nowhere else in the IT industry. As with all supernatural powers, you must use them wisely!

More Stories By Pete Whitney

Pete Whitney is a Solutions Architect for Cloudera. His primary role at Cloudera is guiding and assisting Cloudera's clients through successful adoption of Cloudera's Enterprise Data Hub and surrounding technologies.

Previously Pete served as VP of Cloud Development for FireScope Inc. In the advertising industry, Pete designed and delivered DG Fastchannel’s internet-based advertising distribution architecture. Pete also excelled in other areas, including design enhancements to robotic machine vision systems for FSI International Inc. These enhancements included mathematical changes for improved accuracy, improved speed, and automated calibration. He also designed a narrow-spectrum light source and a narrow-spectrum band-pass camera filter for controlled machine vision imaging.

Pete graduated Cum Laude from the University of Texas at Dallas, and holds a BS in Computer Science. Pete can be contacted via Email at [email protected]
