
Can Not See the Forest for the Trees (Part 4)



 

This article is part of a five-part series covering some of the ways to deal with the top challenges for IT Operations and how machine learning techniques can be applied to address them.

 

  • Having Trouble Finding The Root Cause
  • Stuck In Reactive Firefighting Mode?
  • Not Sure What is Important
  • Can not See the Forest for the Trees
  • Overwhelmed with false alarms?

Part 4 of a 5 part series

-  -  -  -

How does one distinguish and recognize high-level situations and understand what is really going on? Consider a use case where an organization has 600,000 events per hour across 40,000 servers. These generate in the range of 47,000 help desk tickets, which result in approximately 2000 Level-2 escalations per year. In other words, this is 66 Level-2 escalations per day. That is a significant number of escalations to deal with.

A typical Level-1 enterprise monitoring experience consists of the following:

There are different applications, different tools and many alerts, each indicating where an issue is and what to focus on. While this provides a lot of data, it is still not clear what is going on or what the individual dots refer to. How are the results correlated? Are they just repeated alerts?

Machine Learning can help in this situation through clustering. There are two fundamentally different approaches to clustering: Bottom-up Clustering and Top-down Clustering.

Bottom-up Clustering

The first approach is Bottom-up Clustering. In this approach, the algorithm examines all the data and tries to group it into reasonable chunks. Once similar events are found, they are grouped together. The procedure is repeated until the remaining groups are too different from each other to be merged. Finally, each chunk of data is assigned a common description that explains its significance.

One of the advantages of the Bottom-up Clustering approach is that it is completely unsupervised. That means that it doesn’t need any human intervention. The algorithm runs on the data and automatically extracts interesting groups.
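The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not Evolven's implementation: it assumes alerts are plain text messages, measures similarity as word overlap (Jaccard), and uses a hypothetical merge threshold `min_similarity`. The function names `bottom_up_cluster` and `describe` are made up for this example.

```python
from itertools import combinations

def jaccard(a, b):
    """Word-overlap similarity: 1.0 for identical sets, 0.0 for disjoint ones."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def bottom_up_cluster(messages, min_similarity=0.5):
    """Greedy bottom-up clustering of alert messages (illustrative only)."""
    # Every message starts as its own cluster:
    # (member messages, words shared by all members).
    clusters = [([m], set(m.lower().split())) for m in messages]
    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]),
        )
        if jaccard(clusters[i][1], clusters[j][1]) < min_similarity:
            break  # the remaining clusters are too different to merge
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] & clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

def describe(cluster):
    """Common description: the words that every message in the cluster shares."""
    members, shared = cluster
    return " ".join(w for w in members[0].lower().split() if w in shared)

# Made-up alerts, for illustration only.
alerts = [
    "disk usage high on db01",
    "disk usage high on db02",
    "login failed for user alice",
    "login failed for user bob",
    "disk usage high on db03",
]
for cluster in bottom_up_cluster(alerts):
    print(len(cluster[0]), "alerts:", describe(cluster))
```

No labels or rules are supplied here. The groups and their descriptions come from the data alone, which is what makes the approach unsupervised.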

Top-down Clustering

The other approach is Top-down Clustering. This approach is based on the notion that the operators already know what might happen in the system, so they can try to match incoming events against those expectations. For example, in a manual deployment, the operators might expect some changes to take place in the system, as well as some alerts to appear.

This approach requires some human intervention: rules have to be specified, or templates have to be applied.
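A minimal sketch of such a template, assuming an operator describes an expected activity by a time window and a few words likely to appear in related alerts. The `Template` and `Alert` classes, their fields, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Template:
    name: str             # activity the operators expect, e.g. a deployment
    start_hour: float     # start of the expected window (24-hour clock)
    end_hour: float       # end of the expected window
    keywords: frozenset   # words expected in alerts caused by the activity

@dataclass(frozen=True)
class Alert:
    hour: float
    message: str

def match(alert, templates):
    """Return the name of the first template that explains the alert, else None."""
    words = set(alert.message.lower().split())
    for t in templates:
        in_window = t.start_hour <= alert.hour <= t.end_hour
        if in_window and words & t.keywords:
            return t.name
    return None

# Made-up template and alerts, for illustration only.
templates = [
    Template("manual deployment", 8.0, 9.0, frozenset({"restart", "unreachable"})),
]
print(match(Alert(8.2, "service app01 restart detected"), templates))
print(match(Alert(14.0, "service app01 restart detected"), templates))
```

The second alert has the same text but falls outside the expected window, so no template claims it and it stays unexplained.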

For instance, at 8 am a manual server migration began that caused a couple of alerts. By using clustering, this group of alerts can be identified. First, the alerts are aggregated by application layer using Bottom-up Clustering. Next, these clusters are correlated using a Top-down approach. By combining both clustering approaches, it becomes evident that these alerts came from the same action in the IT system: the server migration.
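The two stages can be chained as in the sketch below. It rests on simplifying assumptions that are not from the original: each alert is a `(minute_of_day, layer)` pair, the bottom-up stage chains alerts in the same layer whose timestamps lie within a hypothetical `max_gap` of each other, and the top-down stage uses a known activity window supplied by the operators.

```python
from collections import defaultdict

def cluster_by_layer_and_time(alerts, max_gap=10):
    """Bottom-up stage: per application layer, chain alerts that are close in time.

    Splitting a sorted timeline wherever the gap exceeds max_gap yields the
    same groups as single-linkage agglomerative clustering cut at max_gap.
    """
    by_layer = defaultdict(list)
    for minute, layer in alerts:
        by_layer[layer].append(minute)
    clusters = []
    for layer, minutes in by_layer.items():
        minutes.sort()
        current = [minutes[0]]
        for m in minutes[1:]:
            if m - current[-1] <= max_gap:
                current.append(m)
            else:
                clusters.append((layer, current))
                current = [m]
        clusters.append((layer, current))
    return clusters

def correlate(clusters, activities):
    """Top-down stage: attach each cluster to a known activity window.

    activities maps a name to (start_minute, end_minute). A cluster is
    attributed to the first activity whose window contains its first alert.
    """
    result = defaultdict(list)
    for layer, minutes in clusters:
        label = next(
            (name for name, (start, end) in activities.items()
             if start <= minutes[0] <= end),
            "unexplained",
        )
        result[label].append((layer, minutes))
    return dict(result)

# Made-up data: a migration window starting at 8 am (minute 480).
alerts = [(482, "web"), (485, "web"), (483, "database"),
          (490, "database"), (700, "web")]
activities = {"server migration": (480, 510)}
print(correlate(cluster_by_layer_and_time(alerts), activities))
```

In this toy data, the web and database clusters that start shortly after 8 am both end up under the server migration, while the isolated late alert remains unexplained.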

Then, a few hours after the server was migrated, someone implements a manual change request, again generating alerts. Finally, when a new version of the application is deployed, many new alerts appear. Looking only at a handful of specific dots on the diagram, one would have no idea what is going on.

Instead, one needs to group these events into meaningful chunks to get a good idea of what happened.

That would result in something looking like this:

The server was migrated, some changes were implemented and the new version was deployed. Using Machine Learning techniques, one gets much better insight into what happened to the system.

Instead of just looking at the trees, you can see the forest.


About the Author
Bostjan Kaluza, PhD

Boštjan Kaluža is the Chief Data Scientist at Evolven. He is also a dedicated researcher in artificial intelligence and intelligent systems, machine learning, predictive analytics and anomaly detection. Prior to Evolven, Boštjan served as a senior researcher in the Department of Intelligent Systems at the Jozef Stefan Institute, the leading Slovenian scientific research institution, and led research projects involving pattern and anomaly detection, machine learning and predictive analytics.

 

Focusing on the detection of suspicious behavior and data analysis, Boštjan has published numerous articles in professional journals and delivered conference papers. In 2013, Boštjan published his first book on data science, Instant Weka How-to, exploring how to leverage machine learning using Weka. Boštjan is now working on his second book, Practical Machine Learning in Java, scheduled to be published later this year. Boštjan is also the author of, and a contributor to, a number of patents in the areas of anomaly detection and pattern recognition.

 

Boštjan earned his PhD at Jožef Stefan International Postgraduate School in Ljubljana, Slovenia, rigorously defending a doctoral dissertation entitled Detection of Anomalous and Suspicious Behavior Patterns.