Genesis of a Genetic Algorithm

Understanding GAs from business to implementation

Have you ever wondered how a good idea is transformed into business value? Have you ever thought about how someone takes an abstract idea and produces business value from apparent nothingness? Have you ever asked what you can do to leverage the assets at your disposal for the greater good? If so, then sit tight, because you are about to take a journey through the step-by-step actions FireScope took when it transformed its corporate-wide historical metrics from a passive data asset into the next level of business intelligence.

With unique insight, FireScope is able to identify potentially hidden relationships among your IT assets and reveal cause-and-effect metrics you might not have even known existed. This is accomplished with a distinctive genetic algorithm solution, yet it is as simple to execute as a few mouse clicks. In this article we introduce the business idea that triggered the subsequent investigation, which led to the initial analysis and ultimately to the completion of FireScope's genetic algorithm implementation. The journey covers several topics, including the business proposition, data normalization, and genetic modeling. By the end, you'll recognize that the combination of metric collection and data comparison delivered by FireScope Inc. is unmatched in the IT industry.

A word of caution: the latter portion of this article assumes you already possess a basic understanding of what a genetic algorithm is, as well as some of the concepts used in modeling one. Fear not if you are not yet there; you can still gain significant benefit from this article without understanding its nuts and bolts. Let the journey begin.

FireScope from 30,000 feet
As background, a functioning FireScope deployment can gather metrics from all forms of existing IT assets, normalize the gathered metrics, provide historical analysis of those metrics, and, most importantly, provide service views for worldwide operations that are unparalleled in the IT industry.

FireScope collects a vast array of corporate-wide metrics. Some examples are CPU utilization, disk storage, host temperature, interface traffic, and memory utilization. Others include database metrics, JMX metrics, NetApp metrics, VMware metrics, web server response time, and many more than we have the time or space to mention. It is also important to understand that every metric gathered by FireScope is collected on its own schedule; even two metrics with the same collection interval are not collected at exactly the same instant. So with this universe of data, the natural question became: what can we do to leverage this asset and deliver the next level of business intelligence?

Revealing hidden secrets
If you've been around the IT world long enough, then you've likely experienced a time when an update to a web server invoked new or existing services on an application server, which in turn caused deadlocks on your database server. Unfortunately, the deadlocks did not occur in test because they were load related, and as a result they weren't uncovered until your public-facing application was placed under heavy load on Cyber Monday. Don't worry, you didn't need those sales anyway!

Now, to be fair, anyone who consistently monitors their IT infrastructure can identify that a web server is not performing as expected. Furthermore, if you know all of the relationships between your web servers, application servers, and database servers, you can even set up static alerts that draw your attention to one layer of your business impacting another. But the story gets really interesting if you fall into one or more of the following categories:

  1. The alerting values that you set are either too high or too low.
  2. You have not taken the time to set up alerts for related IT assets.
  3. You are not aware of the relationships between your IT assets.
  4. You do not properly monitor your IT assets.

I hope you can see that this is a very complex world we live in. To detect a catastrophic corporate shutdown, you need tools that can discover that independent metrics collected from independent servers are impacting one another. This is exactly the problem FireScope set out to solve when it proposed using a genetic algorithm to provide optimal search heuristics and uncover hidden relationships in the very same metrics it had already collected from your IT assets. But not so fast: to accomplish this goal, we first need the ability to compare disparate metrics.

Not all time is created equal
As noted above, FireScope collects a universe of metrics from a universe of IT assets, and each is collected on its own schedule. As an example, consider CPU utilization taken from two different hosts, both polled on 30-second intervals. Since each is polled on its own schedule, we couldn't really say the two metrics had similar signatures if the collection times were not aligned. Yet another problem arises when you compare two metrics with different collection intervals, such as CPU utilization collected every 30 seconds on one asset versus every 5 minutes on another. How can these be easily compared?

FireScope needed the ability to normalize the time domain of all collected metrics in order to fairly and accurately compare metrics collected on independent schedules. As it turns out, FireScope already trends all metrics that have numeric representations; you can think of trending as averaging over time. For the purposes of FireScope's genetic algorithm, that trending operation also provides an effective normalization in the time domain for every numeric metric. Problem one is solved, because all numeric metrics are trended every hour, starting on the hour.
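
To make the time normalization concrete, here is a minimal Python sketch of hourly trending, assuming raw samples arrive as (timestamp, value) pairs. The function name and data shapes are illustrative, not FireScope's actual API.

    from collections import defaultdict

    def hourly_trend(samples):
        # Average raw (unix_timestamp, value) samples into hourly buckets.
        # Whatever the native collection interval, every numeric metric
        # collapses to one value per hour, aligned on the hour, so any two
        # metrics can be compared slot for slot.
        buckets = defaultdict(list)
        for ts, value in samples:
            hour = ts - (ts % 3600)      # truncate to the top of the hour
            buckets[hour].append(value)
        return {hour: sum(vals) / len(vals)
                for hour, vals in sorted(buckets.items())}

    # Two metrics polled on different schedules now share the same hourly keys.
    cpu_30s  = hourly_trend((1700000000 + i * 30, 40 + i % 5) for i in range(240))
    cpu_5min = hourly_trend((1700000000 + i * 300, 55 + i % 3) for i in range(24))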

Not all units have the same value
Let's trade one for one: I'll give you a nickel for every dollar you give me. Sound fair? Yeah, I didn't think you would go for that either, but the thought brings to light the next challenge: comparing metrics with differing units. The problem becomes even more challenging when you consider that in some instances the units might not even be from the same domain. Consider the table below, which lists just a few metric/unit pairings from your IT assets:

Metric                        Units
Interface Traffic             Bytes/second
CPU Utilization               Percentage
Host Temperature              Degrees F or C
Web Server Response Time      Seconds
How can "Degrees F" be compared to "Percent CPU Utilization"? FireScope needed the ability to compare differing metrics collected across your IT infrastructure each having potentially different metric/unit representations.

Well, what if we compared the relative rise and fall of collected values instead of the values themselves? Then we would be comparing not the values but the increase or decrease in each metric over time. In short, FireScope calculates the tangent, or rate of change, between hourly trends for all collected metrics as a pre-processing step for its genetic algorithm. As you'll see later, this series of data is used to form a "gene," or genetic sequence, which is then used to compare one metric against another and determine how closely the two signatures rise or fall within the same time frame.
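
As an illustration of that pre-processing step, here is a small Python sketch that turns an hourly trend into a series of rate-of-change angles. It assumes the values have already been scaled to a comparable range before the arctangent is taken, a detail the article does not spell out.

    import math

    def hourly_slopes(trend):
        # `trend` maps hour -> average value for one metric (see the hourly
        # trending sketch above). Values are assumed to be pre-scaled so the
        # angle of the rise or fall between consecutive hours is meaningful
        # across metrics with very different native units.
        hours = sorted(trend)
        slopes = []
        for prev, curr in zip(hours, hours[1:]):
            delta = trend[curr] - trend[prev]   # change over one hourly step
            slopes.append(math.atan(delta))     # rate of change as an angle in (-pi/2, pi/2)
        return slopes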

Action and reaction
If you're in IT, you most likely care whether a rise in web traffic caused a delayed rise in CPU activity on another system, which in turn caused a general slowdown in your business response times.

The third level of value that FireScope delivers with its genetic algorithm solution is the ability to search out cause-and-effect metrics within your IT assets. By applying a sliding-window comparison of a selected metric against the searched metrics, FireScope can detect whether deviations in one metric appear before or after a target metric anomaly. The notion that one metric caused another to deviate is then displayed graphically in a simple time-series view. Furthermore, this cause-and-effect rendering can appear in either direction:

  1. The target metric may have deviated because it was impacted by some other metric.
  2. The target metric's deviation may have impacted some other metric in your system.
  3. Both of the above are true, and your system has multiple cause-and-effect metrics.

But once again, you may not have even known that these metrics have cause-and-effect relationships, or that the relationships exhibit delayed response signatures.

Starting the analysis
Let's assume that you fall into category 3 above: you are actively monitoring your IT assets, but you might not be aware of all of the relationships between them. Let's also assume that you are experiencing a slowdown in one of your important business operations, but other than the apparent slowdown you can't quite explain why this portion of your business is slow. You observe, via a historical graph, that CPU load is higher than normal on one system.

You start FireScope's analytics and select this same metric. You set the analysis start time by selecting the data area just prior to the point where the CPU load started to rise. Next, you set the analysis end time by marking the data area after the CPU returned to normal, or you select now if the CPU load is still high. As a last step, you trigger the genetic algorithm analysis, asking FireScope to search out other metrics that exhibit a similar response during the same timeframe as the selected metric.

The result of the analysis is the top five metrics that most closely match the signature of the selected metric. The value of the analysis is that these metrics may well include ones that caused the spike in CPU load being evaluated, or ones that are being impacted by the selected metric.

Components of a genetic algorithm
A genetic algorithm is an optimized search technique that mimics the process of natural evolution. Natural evolution uses genes, chromosomes, genetic mutation, genetic crossover, and multiple generations to produce improvements in nature. Of course, natural evolution sometimes produces defects or mutations, but just as in nature, these apparent defects can turn out to be extremely valuable in the evolutionary process. Genetic algorithms simulate these same concepts in software to express an optimized search solution.

The building blocks below illustrate FireScope's use of several genetic algorithm constructs and provide a brief synopsis of the genetic algorithm process.

GA building blocks, the "gene"
Let's take a bottom-up approach and talk about genes first. As mentioned above, FireScope compares the relative change over time of disparate metric values. This comparison is accomplished by first digitizing the comparison values into a representative alphabet, where each character represents the change in slope, or tangent, between two sequential trend values. Since time is always increasing in this domain, the only relevant values fall between -π/2 (-90 degrees) and π/2 (90 degrees). FireScope divides this range into 90 distinct buckets, each representing one letter of a 90-character alphabet. The digitization process allows for optimized comparison and also filters out small changes that are not significant enough to affect the comparison. The series of alphabetic values representing one metric's digitized values is encoded into a gene, ordered from the earliest time slot under consideration to the latest.
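
A minimal sketch of that digitization in Python: slope angles in (-π/2, π/2) are mapped onto 90 equal buckets, each represented by one character. The specific characters in the alphabet are an assumption; the article only states that there are 90 buckets.

    import math
    import string

    # Illustrative 90-symbol alphabet; the article specifies 90 buckets but
    # not the characters FireScope actually uses.
    ALPHABET = (string.ascii_letters + string.digits + string.punctuation)[:90]

    def encode_gene(angles):
        # Digitize a series of slope angles in (-pi/2, pi/2) into a gene
        # string: the range is split into 90 equal buckets, each angle maps
        # to one letter, and a metric's hourly behavior becomes a compact,
        # comparable string.
        width = math.pi / 90                          # bucket width across (-pi/2, pi/2)
        letters = []
        for angle in angles:
            idx = int((angle + math.pi / 2) / width)  # shift into [0, pi), then bucket
            letters.append(ALPHABET[min(idx, 89)])    # clamp the +pi/2 edge case
        return "".join(letters)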

GA building blocks, the "chromosome"
Moving up in our bottom-up approach is the formation of multiple genes into a chromosome. FireScope's goal is to identify the five metrics most closely related to a pre-selected target metric, over a time range slightly wider than the selected metric's time range. Accordingly, FireScope creates a chromosome from five genes, each the digitized trend values of some other metric. The resulting chromosome represents one possible solution from among millions of combinations. As the genetic algorithm progresses, it searches among those millions of candidate chromosomes for the one whose five genes best match the trend pattern of the target metric.

GA building blocks, population/fitness/scoring
The initial chromosome population is formed by purely random selection of genes from all existing numeric trend data. FireScope creates an initial population of several hundred chromosomes (possible solutions) and then applies a fitness algorithm to that population. The fitness algorithm assigns each chromosome a numeric value representing how closely it matches the target metric. The chromosomes that score highest are chosen to mate and produce the next generation, the intent being that improved next-generation chromosomes are the natural result of combining the best parents of the prior generation.
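
The sketch below illustrates a random initial population and one plausible fitness function over genes encoded as equal-length strings of the slope alphabet. FireScope's exact scoring is not disclosed, so the similarity measure here (one minus the average bucket distance) is purely illustrative.

    import random

    def gene_similarity(gene, target_gene, alphabet):
        # Score how closely two equal-length genes rise and fall together:
        # 1.0 means identical slope buckets at every hour, 0.0 means
        # maximally different.
        index = {ch: i for i, ch in enumerate(alphabet)}
        span = len(alphabet) - 1
        diffs = [abs(index[a] - index[b]) / span for a, b in zip(gene, target_gene)]
        return 1.0 - sum(diffs) / len(diffs)

    def fitness(chromosome, genes_by_metric, target_gene, alphabet):
        # A chromosome is a tuple of five metric IDs; its score is the summed
        # similarity of their genes to the target metric's gene.
        return sum(gene_similarity(genes_by_metric[m], target_gene, alphabet)
                   for m in chromosome)

    def initial_population(all_metric_ids, size=300):
        # Purely random first generation: each chromosome names five distinct metrics.
        return [tuple(random.sample(all_metric_ids, 5)) for _ in range(size)]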

GA building blocks, mating
This process, sometimes referred to as genetic crossover, selects some genes (metrics) from each of two different parents to produce a new child chromosome. The child chromosome consists of five genes made via crossover, or mating, of two high-scoring parents from the prior generation. This chromosome, like every other, represents one possible solution from among millions: the five metrics that might most closely resemble the signature of the pre-selected target metric.
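
One plausible crossover operator, continuing the same sketch: each child gene slot is drawn from one of the two parents while keeping the five metrics distinct. The article does not specify FireScope's exact crossover scheme.

    import random

    def crossover(parent_a, parent_b):
        # Mate two high-scoring chromosomes (tuples of five metric IDs) into
        # one child: each slot is filled from one parent or the other,
        # skipping duplicates so the child still names five distinct metrics.
        pool = list(parent_a) + list(parent_b)
        random.shuffle(pool)
        child = []
        for metric in pool:
            if metric not in child:
                child.append(metric)
            if len(child) == 5:
                break
        return tuple(child)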

GA building blocks, "mutations"
After mating, a small proportion of the new generation of chromosomes is chosen to be randomly mutated. FireScope uses the mutation process to inject previously unexplored genes (metrics) into the search. If a chromosome is selected for mutation, a random gene is replaced by a gene from a metric that has not yet been explored. Mutation can either improve or degrade the overall score of the selected chromosome, but this randomness has been shown to improve a GA's search capabilities, just as mutation in nature sometimes provides improvement through natural evolution.
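
A hedged sketch of that mutation step: with a small probability, one gene is swapped for a metric the search has not yet tried. The 5% rate is an illustrative choice; the article only says a small proportion of chromosomes are mutated.

    import random

    def mutate(chromosome, all_metric_ids, rate=0.05):
        # With probability `rate`, replace one randomly chosen gene with a
        # metric that does not yet appear in the chromosome, injecting
        # unexplored metrics into the search.
        if random.random() >= rate:
            return chromosome
        unexplored = [m for m in all_metric_ids if m not in chromosome]
        if not unexplored:
            return chromosome
        genes = list(chromosome)
        genes[random.randrange(len(genes))] = random.choice(unexplored)
        return tuple(genes)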

Cause-and-effect
In comparing other metrics against a target metric, FireScope evaluates a longer timeframe than the range the user selected for the target. The searched metric's trend values are compared both before and after the target metric's window. By expanding the search window and sliding the target metric over the searched metric, FireScope can detect whether a searched metric may have caused the target metric to deviate from its normal signature, or whether the target metric caused other metrics to deviate from theirs. This determination rests simply on the similarity of the target metric to the searched metric at different offsets. The approach has nothing to do with genetic algorithms per se; it is a higher-level value extracted from FireScope's genetic algorithm implementation.
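
The sliding-window idea can be sketched as follows: the target's slope series is compared against a searched metric's series at a range of hour offsets, and the best-scoring offset indicates whether the searched metric leads (a possible cause) or lags (a possible effect). The scoring function and the six-hour window are assumptions made for illustration.

    def best_lag(searched_slopes, target_slopes, max_lag=6):
        # `searched_slopes` is assumed to span `max_lag` extra hours on each
        # side of the target window, so it can be compared at every offset
        # from -max_lag to +max_lag. A negative best lag suggests the searched
        # metric deviates before the target (possible cause); a positive lag
        # suggests it deviates after (possible effect).
        n = len(target_slopes)
        best = (0, float("-inf"))
        for lag in range(-max_lag, max_lag + 1):
            window = searched_slopes[max_lag + lag : max_lag + lag + n]
            if len(window) < n:
                continue
            # Illustrative score: negative mean absolute slope difference.
            score = -sum(abs(a - b) for a, b in zip(window, target_slopes)) / n
            if score > best[1]:
                best = (lag, score)
        return best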

GA building blocks, "completion"
After several thousand generations have been evaluated and the improvement in scoring has slowed to an acceptable level, the genetic algorithm completes, and the top-scoring chromosome from the last generation is selected as the best solution. This chromosome contains the five genes representing the metrics that most closely match the target metric. Each gene (metric) from the top-scoring chromosome can be displayed back to the user for further investigation, plotted on the same graph as the selected target metric.
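
A simple convergence check of the kind described above might look like this; the window size and minimum-gain threshold are illustrative, since the article only says improvement must slow to an acceptable level.

    def converged(best_scores, window=500, min_gain=0.001):
        # `best_scores` holds the best fitness from each generation so far.
        # Stop once the gain over the last `window` generations drops below
        # `min_gain`; the top chromosome of the final generation is the answer.
        if len(best_scores) <= window:
            return False
        return best_scores[-1] - best_scores[-window - 1] < min_gain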

Conclusion
If this is your first exposure to genetic algorithm techniques, the terminology can be overwhelming. Concepts such as genes, chromosomes, mutation, and fitness algorithms are difficult to conceptualize at first. While FireScope uses genetic algorithm techniques, it is important to understand that the approach is simply a means of achieving optimal search times in order to deliver the business value: revealing possibly unknown relationships between disparate metrics collected throughout your IT assets.

It is also instructive to realize that the real ingenuity in this approach lies not in the application of the genetic algorithm, but in the normalization techniques that make disparate metrics comparable. While the genetic algorithm provides optimized search results, those results could not have been achieved without the initial work of normalizing the time domain, normalizing the value domain, and implementing the sliding-window analysis that uncovers delayed cause-and-effect metrics hidden within your IT infrastructure.

FireScope's genetic algorithm, coupled with your inquisitiveness, forms a near-superhuman capability that exists nowhere else in the IT industry. As with all supernatural powers, you must use them wisely!

More Stories By Pete Whitney

Pete Whitney is a Solutions Architect for Cloudera. His primary role at Cloudera is guiding and assisting Cloudera's clients through successful adoption of Cloudera's Enterprise Data Hub and surrounding technologies.

Previously, Pete served as VP of Cloud Development for FireScope Inc. In the advertising industry, Pete designed and delivered DG Fastchannel's internet-based advertising distribution architecture. Pete also excelled in other areas, including design enhancements to robotic machine vision systems for FSI International Inc. These enhancements included mathematical changes for improved accuracy, improved speed, and automated calibration. He also designed a narrow-spectrum light source and a narrow-spectrum band-pass camera filter for controlled machine vision imaging.

Pete graduated cum laude from the University of Texas at Dallas and holds a BS in Computer Science. Pete can be contacted via email at [email protected]
