@BigDataExpo: Article

Detecting Anomalies that Matter!

Like needles in a haystack

As Netuitive's Chief Data Scientist, I am fortunate to work closely with some of the world's largest banks, telcos, and eCommerce companies. Increasingly, the executives I speak with at these companies are no longer focused on just detecting application performance anomalies - they want to understand the impact those anomalies have on the business. For example: "Is the current slowdown in the payment service impacting sales?"

You can think of it as detecting IT operations anomalies that really matter - but this is easier said than done.

Like Needles in a Haystack
When it comes to IT analytics, there is a general notion that the more monitoring data you are able to consume, analyze, and correlate, the more accurate your results will be. Just pile all that infrastructure, application performance, and business metric data together and good things are bound to happen, right?

Larger organizations typically have access to voluminous data being generated from dozens of monitoring tools that are tracking thousands of infrastructure and application components.  At the same time, these companies often track hundreds of business metrics using a totally different set of tools.

The problem is that, collectively, these monitoring tools do not communicate with each other.  Not only is it hard to get holistic visibility into the performance and health of a particular business service, it's even harder to discover complex anomalies that have business impact.

Anomalies are Like Snowflakes
Compounding the challenge is the fact that no two anomalies are alike.  Anomalies that matter have multiple facets.  They reflect a composite behavior of many layers of interacting and inter-dependent components.  Additionally, they can be cleverly disguised or hidden in a haze of visible but insignificant noise.  No matter how many graphs and charts you display on the largest LCD monitor you can find - the type of scalable real-time analysis required to find and expose what's important is humanly impossible.

Enter IT Operations Analytics
Analytics such as statistical machine learning allow us to understand the "normal" behavior of each resource we are tracking - be it a single IT component, web service, application, or business process. Additional algorithms help us find patterns and correlations between the thousands of IT and business metrics that matter in a critical service.
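
To make the idea concrete, here is a minimal sketch in Python of both steps: learning a "normal" baseline for one metric, then checking whether an IT metric moves together with a business metric. It is an illustration only, not Netuitive's algorithms. The function names, the trailing-window z-score rule, the window and threshold values, and the sample data are all assumptions made up for this example.

```python
from statistics import mean, stdev


def baseline_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # A flat history (sigma == 0) gives no scale to compare against, so skip it.
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up data: payment-service latency (ms) and completed orders per minute.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101]
orders_per_min = [50, 49, 51, 50, 50, 50, 20, 50]

anomalies = baseline_anomalies(latency_ms)         # [6]: the latency spike
correlation = pearson(latency_ms, orders_per_min)  # strongly negative
print(anomalies, round(correlation, 3))
```

In this toy data the latency spike is flagged, and the strongly negative correlation with orders suggests the slowdown coincides with lost sales. A real deployment would need far more history, and a correlation on its own does not establish cause.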

The Shift Towards IT Operations Analytics is Already Happening
This is not about the future.  It's about what companies are doing today.

Several years ago, thought-leading enterprises (primarily large banks with critical revenue-driving services) began experimenting with a new breed of IT analytics platform. These companies' electronic and web-facing businesses had so much revenue (and reputation) at stake that they needed to find the anomalies that matter - the ones that were truly indicative of current or impending problems.

Starting with an almost "blank slate", these forward-thinking companies began developing open IT analytics platforms that easily integrated any type of data source in real time to provide a comprehensive view of patterns and relationships between IT infrastructure and business service performance. This was only possible with technologies that leveraged sophisticated data integration, knowledge modeling, and analytics to discover and capture the unique behavior of complex business services.  Anything less would fail, because, like snowflakes, no two anomalies are alike.

The Continuous Need for Algorithm Research
The online banking system at one bank is different from the online system at the next bank. And the transaction slowdown that occurred last week may have a totally different root cause from the one two months ago. Even more interesting are external factors such as seasonality and its effects on demand. For example, payment companies see increased workload around holidays such as Thanksgiving and Mother's Day, whereas gaming/betting companies' demand is driven more by events such as the NFL Playoffs or the World Series.
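
One simple way to account for seasonality is to compare each observation only with earlier observations from the same seasonal slot (the same day type, hour of week, or holiday period). The Python sketch below shows the idea. The slot labels, the threshold, the `min_history` parameter, and the data are hypothetical, and this is not a description of any particular product's method.

```python
from collections import defaultdict
from statistics import mean, stdev


def seasonal_anomalies(samples, threshold=3.0, min_history=3):
    """samples: (slot, value) pairs in time order, where `slot` is a seasonal
    key such as "weekday" or "weekend". Each value is compared only with
    earlier values from the same slot."""
    history = defaultdict(list)
    flagged = []
    for index, (slot, value) in enumerate(samples):
        past = history[slot]
        if len(past) >= min_history:
            mu, sigma = mean(past), stdev(past)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append(index)
        # Simplification: flagged values are still added to the history.
        past.append(value)
    return flagged


# Made-up workload: weekends normally run about three times the weekday load.
samples = [
    ("weekday", 100), ("weekend", 300),
    ("weekday", 104), ("weekend", 310),
    ("weekday", 98), ("weekend", 295),
    ("weekday", 101), ("weekend", 305),
    ("weekday", 300),  # weekend-sized load on a weekday
]

flagged = seasonal_anomalies(samples)  # [8]
print(flagged)
```

A load of 305 on a weekend passes unnoticed because it is normal for that slot, while 300 on a weekday is flagged. A single global threshold would either miss the second case or raise an alarm on every weekend.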

For this reason, analytics research is an ongoing endeavor at Netuitive, driven in part by customer needs and in part by advances in technology. Once Netuitive technology is installed in an enterprise and is integrating data collected across multiple layers in the service stack, behavior learning begins immediately. As time passes, the statistical algorithms have more observations to draw on, which leads to increasing confidence in both the anomalies detected and the proactive forecasts. Additionally, customer domain knowledge can be layered into Netuitive's real-time analysis in the form of knowledge bases and supervised learning algorithms. The Research Group at Netuitive works closely with our Professional Services Group, as well as directly with customers, to regularly review actual delivered alarm quality, tune the algorithms that we have, and identify new algorithms that would deliver greater value in an actionable timeframe.

Since Netuitive's software architecture allows for "pluggable" algorithms, we can incrementally introduce new analytics capabilities easily, at first in an experimental or laboratory setting and ultimately, once verified, into production.
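
For readers who want a picture of what "pluggable" can mean in practice, here is a minimal Python sketch of a detector registry with an experimental stage and a production stage. It is a generic illustration of the pattern, not Netuitive's architecture or API. `DetectorRegistry`, `above_limit`, the stage names, and the data are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# A detector takes a metric series and returns the indices it considers anomalous.
Detector = Callable[[Sequence[float]], List[int]]


class DetectorRegistry:
    """Holds named detectors, each tagged as experimental or production."""

    STAGES = ("experimental", "production")

    def __init__(self) -> None:
        self._detectors: Dict[str, Detector] = {}
        self._stages: Dict[str, str] = {}

    def register(self, name: str, detector: Detector,
                 stage: str = "experimental") -> None:
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self._detectors[name] = detector
        self._stages[name] = stage

    def promote(self, name: str) -> None:
        """Move a verified detector into production."""
        if name not in self._detectors:
            raise KeyError(name)
        self._stages[name] = "production"

    def run(self, values: Sequence[float],
            stage: str = "production") -> Dict[str, List[int]]:
        """Run every detector registered at the given stage."""
        return {
            name: detector(values)
            for name, detector in self._detectors.items()
            if self._stages[name] == stage
        }


def above_limit(limit: float) -> Detector:
    """Build a trivial detector that flags values above a fixed limit."""
    def detect(values: Sequence[float]) -> List[int]:
        return [i for i, v in enumerate(values) if v > limit]
    return detect


registry = DetectorRegistry()
registry.register("static_limit", above_limit(200.0), stage="production")
registry.register("tight_limit", above_limit(101.0))  # experimental by default

latency_ms = [100, 102, 98, 250]
production_alarms = registry.run(latency_ms)                 # static_limit only
lab_alarms = registry.run(latency_ms, stage="experimental")  # tight_limit only
registry.promote("tight_limit")
after_promotion = registry.run(latency_ms)                   # both detectors
```

Because every detector shares one call signature, a new algorithm can be evaluated alongside production alarms without touching the production path, and promoting it changes only its stage.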

The IT operations management market has matured over the past two decades to the point that most critical components are well instrumented. The data is there, and mainstream IT organizations (not just visionary early adopters) realize that analytics deliver measurable and tangible value. My vision and challenge is to get our platform to the point where customers can easily customize the algorithms on their own, as their needs and IT infrastructure evolve over time. Platforms need to reach this point because of the endless variety of ways in which enterprises must discover and remediate "anomalies that matter".

Stay tuned. In an upcoming blog post, I will drill down on some specific industry examples of algorithms we developed as part of large enterprise IT analytics platform solutions.

More Stories By Elizabeth A. Nichols, Ph.D

As Chief Data Scientist for Netuitive, Elizabeth A. Nichols, Ph.D. leads development of algorithms, models, and analytics. This includes enriching the company’s current portfolio as well as developing new analytics to support current and emerging technologies and IT-dependent business services across multiple industry sectors.

Previously, Dr. Nichols co-founded PlexLogic, a provider of open analytics services for quantitative data analysis, risk modeling and data visualization. In her role as CTO and Chief Data Scientist, she developed a cloud platform for collecting, cleansing and correlating data from heterogeneous sources, computing metrics, applying algorithms and models, and visualizing results. Prior to PlexLogic, Dr. Nichols co-founded and served as CTO for ClearPoint Metrics, a security metrics software platform that was eventually sold to nCircle. Prior to ClearPoint Metrics, Dr. Nichols served in technical advisory and leadership positions at CA, Legent Corp, BladeLogic, and Digital Analysis Corp. At CA, she was VP of Research and Development and Lead Architect for agent instrumentation and analytics for CA Unicenter. After receiving a Ph.D. in Mathematics from Duke University, she began her career as an operations research analyst developing war gaming models for the US Army.
