Make Sense of Errors and Logging By @Stackify | @DevOpsSummit [#DevOps]

While errors and logs are often instrumental to diagnosing application issues, getting the most out of them isn't easy

Three Ways to Make Sense of Errors & Logging
By Craig Ferril

Errors and log files are two of the most important tools a developer has for finding the source of a problem. If you're like most developers, your approach to capturing and using errors and logs is fairly straightforward. You probably send log output to a file or a log aggregation product. You may send notifications when errors occur, either by emailing directly from your code or through an error monitoring product.

What's lacking from these approaches is something a bit more holistic, comprehensive, and contextual. The trouble comes in two forms:

  • If you rely solely on logs to track, isolate, and make sense of your errors, there's often far more noise than signal, especially when the same error is thrown over and over again or when your log files are spread across numerous servers.
  • If you focus primarily on errors, either emailing on each occurrence or using an error monitoring program, you remove the relevant logs from the picture altogether, leaving you without the context you need to determine the root cause.

In this article, I'll cover three ways you can make sense of your errors and logs together:

  1. Aggregation - If you're developing an application that runs on a single server, finding all of your logs isn't an issue for you. But it's far more likely that your applications are hosted on multiple servers for availability, scalability, and redundancy, which makes it harder to access errors and logging data easily and centrally. Tools exist to aggregate logs in various standard formats (assuming you have access), which is a good step in the right direction, given the potential for numerous separate log files as well as log file rotation and retention issues. The right answer is to implement a solution that aggregates logs and errors with development in mind. That way, you can be sure you are collecting every piece of information necessary and that it is presented in a way that's geared toward developers.
  2. Error De-duplication - While aggregation ensures that all of your logs and errors end up in a central location, it can also produce a lot of noise that buries the truly valuable insights in your logs. Moving beyond simply aggregating log statements, toward deriving fast insights from your logs and errors, means implementing a strategy that de-duplicates errors and attaches additional information to each occurrence of the error, without forcing you to wade through an endless stream of error statements in a log. Treating individual errors as first-class items of interest, rather than as yet another line in the log file, gives you top-level visibility, lets you configure notification and resolution strategies focused on a specific exception, and, with the right platform, gives you an anchor point for seeing only the log statements related to that error (rather than sifting through all log statements to find the ones that matter). This all adds up to a strategy that filters out the noise and focuses your efforts on just what you care about.
  3. Analysis - Even if you can aggregate your data and associate the error and logging data with each other, you are still left with a very long chronological list of things your application did (and didn't do - thus, the exceptions). Several needs still have to be addressed before we can truly say we can make sense of this data set: seeing the frequency of errors, tying exceptions on one server to methods and processes on another, searching quickly through this massive data set, and even just jumping quickly to a particular point in time. All of these, and more, need to be part of the solution to properly make sense of the data you have.
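
To make the three ideas concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how Stackify or any particular product works. The record fields, the `request_id` correlation key, the helper names, and the sample data are all hypothetical, and the "fingerprint" is simply the message with numbers masked out, which is only one of many possible ways to decide that two errors are the same.

```python
import heapq
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime
    server: str
    level: str
    message: str
    request_id: str = ""  # hypothetical key tying log lines to one request


@dataclass
class ErrorGroup:
    fingerprint: str
    count: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    servers: set = field(default_factory=set)
    request_ids: set = field(default_factory=set)


def aggregate(*streams):
    """Aggregation: merge per-server streams (each in time order) into one timeline."""
    return list(heapq.merge(*streams, key=lambda r: r.timestamp))


def fingerprint(message):
    """Mask hex values and numbers so repeats of one error share a key."""
    return re.sub(r"0x[0-9a-fA-F]+|\d+", "<n>", message)


def deduplicate(records):
    """De-duplication: collapse ERROR records into one ErrorGroup per fingerprint."""
    groups = {}
    for r in records:
        if r.level != "ERROR":
            continue
        key = fingerprint(r.message)
        g = groups.setdefault(key, ErrorGroup(key))
        g.count += 1
        g.first_seen = r.timestamp if g.first_seen is None else min(g.first_seen, r.timestamp)
        g.last_seen = r.timestamp if g.last_seen is None else max(g.last_seen, r.timestamp)
        g.servers.add(r.server)
        if r.request_id:
            g.request_ids.add(r.request_id)
    return groups


def related_logs(records, group):
    """Analysis: keep only the log lines that share a request with this error."""
    return [r for r in records if r.request_id in group.request_ids]


def rec(ts, server, level, message, request_id=""):
    return LogRecord(datetime.fromisoformat(ts), server, level, message, request_id)


# Made-up sample data standing in for log files on two servers.
web1 = [
    rec("2015-01-01T10:00:00", "web1", "INFO", "GET /orders/17", "a1"),
    rec("2015-01-01T10:00:01", "web1", "ERROR", "Timeout loading order 17", "a1"),
]
web2 = [
    rec("2015-01-01T10:00:00", "web2", "INFO", "GET /health", "b7"),
    rec("2015-01-01T10:00:01", "web2", "INFO", "GET /orders/42", "c3"),
    rec("2015-01-01T10:00:02", "web2", "ERROR", "Timeout loading order 42", "c3"),
]

timeline = aggregate(web1, web2)
for key, group in deduplicate(timeline).items():
    print(key, group.count, sorted(group.servers))
    for r in related_logs(timeline, group):
        print("   ", r.timestamp.time(), r.server, r.level, r.message)
```

In this sketch the two timeout errors collapse into a single group that records how often the error occurred and on which servers, and the unrelated health-check line drops out of the view for that error. A real system would need a sturdier fingerprint (for example, exception type plus stack trace) and an index for search, which this example leaves out.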

While errors and logs are often instrumental to diagnosing application issues, getting the most out of them isn't easy. If you're using a narrowly focused tool or rolling your own solution, it's likely you're either struggling to quickly get to the data you need when you need it, or you're trying to find a needle in a haystack (or, perhaps more apt, a needle in a needle stack). Creating effective error notifications, error de-duplication, log aggregation and analysis, and seamless correlation between errors and just the log statements that are relevant presents an especially difficult challenge. Getting it right requires tremendous custom development, a mix of custom development on top of a product that offers a partial solution, or possibly adopting multiple solutions that each only solve part of the problem. That is, of course, unless you use Stackify Smart Error and Log Management!

To get a more in-depth look at evolving your application troubleshooting, read the whitepaper 3 Steps to Evolve your Application Troubleshooting.
