
Not Sure What is Important (Part 3)


This article is part of a five-part series covering ways to deal with the top challenges for IT Operations and how machine learning techniques can be applied to address them.

 

  • Having Trouble Finding The Root Cause
  • Stuck In Reactive Firefighting Mode?
  • Not Sure What is Important
  • Can not See the Forest for the Trees
  • Overwhelmed with false alarms?

Part 3 of a 5 part series

-  -  -  -

This use case depicts the scenario where one is simply not sure what data is important.

This example looks at CPU consumption averaged over one-minute intervals, as shown in the next image.

A typical algorithm based on dynamic thresholds looks at the variation of the signal and sets a threshold that covers most of the variance. An example of such a transformation is shown in the next figure.
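As a rough illustration of the dynamic-threshold idea described above, here is a minimal Python sketch. It assumes the threshold is the rolling mean plus `k` standard deviations over the preceding `window` samples; the function names, the window size, and the value of `k` are hypothetical choices for illustration, not the algorithm of any particular monitoring product.

```python
from statistics import mean, pstdev


def dynamic_threshold(samples, window=60, k=3.0):
    """Return one upper threshold per sample, computed from the
    `window` samples that precede it (assumed rule: mean + k * std)."""
    thresholds = []
    for i in range(len(samples)):
        history = samples[max(0, i - window):i]
        if len(history) < 2:
            # Not enough history yet: never alert on these samples.
            thresholds.append(float("inf"))
            continue
        thresholds.append(mean(history) + k * pstdev(history))
    return thresholds


def breaches(samples, thresholds):
    """Indices of samples that rise above their threshold."""
    return [i for i, (s, t) in enumerate(zip(samples, thresholds)) if s > t]


# One-minute CPU averages (percent) with a single spike at index 10.
cpu = [20] * 10 + [90] + [20] * 5
alerts = breaches(cpu, dynamic_threshold(cpu, window=5))
```

Note that the spike is flagged whether it comes from a scheduled backup or from a real incident: the threshold only sees the magnitude of the signal, not its cause.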

This figure shows a spike in CPU consumption every 12 hours. While this may appear to be a cause for concern, the large spike is actually caused by an automated backup script. Yet when a real issue does occur, the spike it produces looks exactly like the backup spikes in this view, even though there is in fact an incident.

Typical reporting sets a dynamic threshold and raises an alert whenever a value rises above it. Reporting such an alert every 12 hours generates many alerts that no one actually cares about: the backup script was expected to run and cause the CPU to spike. In fact, it is more interesting to know if the backup did not happen. It is equally important to know whether a spike in CPU consumption was actually caused by a critical incident.
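The three cases mentioned here (an expected backup spike, a backup that did not run, and a spike with no scheduled cause) can be separated once the schedule is known. The sketch below assumes spike times and scheduled backup times are given in minutes and that a spike within `tolerance` minutes of a scheduled time counts as the backup; `classify_spikes` and its parameters are hypothetical names used only for illustration.

```python
def classify_spikes(spike_times, expected_times, tolerance=10):
    """Split observed spikes into expected and unexpected ones, and
    list scheduled times for which no spike was seen.

    All times are minutes since the start of the observation period.
    """
    expected, unexpected = [], []
    matched = set()
    for t in spike_times:
        # Find a scheduled time close enough to explain this spike.
        hit = next((e for e in expected_times if abs(t - e) <= tolerance), None)
        if hit is None:
            unexpected.append(t)   # candidate incident
        else:
            expected.append(t)     # routine backup, no alert needed
            matched.add(hit)
    missed = [e for e in expected_times if e not in matched]
    return expected, unexpected, missed


# Backups scheduled every 12 hours (720 minutes).
routine, suspicious, missed = classify_spikes(
    spike_times=[2, 400, 1445],
    expected_times=[0, 720, 1440],
)
```

In this example the spike at minute 400 has no scheduled explanation, and the backup expected at minute 720 never produced a spike; both are arguably more useful to report than the two routine spikes.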

Self-Adaptive Learning

Instead of relying upon dynamic thresholds, machine learning techniques like Self-Adaptive Learning can be applied to focus on the anomalies that could indicate an incident. This approach starts with a learning period to establish a baseline for how the system functions, then classifies new behaviors against it. If a new behavior is similar to what was already seen, it is treated as normal, and any spikes that appear are considered expected and part of the routine.
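To make the "learn a baseline, then classify new behavior" idea concrete, here is a minimal sketch. It is not Evolven's algorithm: it assumes each window of CPU samples is summarized by three hand-picked features and that a new window is "expected" if it lies within a fixed distance of any window seen during the learning period. The class name, the features, the 50% level, and the tolerance are all assumptions.

```python
from math import dist


def summarize(window):
    """Reduce a window of CPU samples (percent) to three features:
    average, peak, and percentage of samples above 50% (assumed level)."""
    high = sum(1 for v in window if v > 50) / len(window)
    return (sum(window) / len(window), max(window), 100 * high)


class Baseline:
    """Remembers behavior seen during a learning period."""

    def __init__(self, tolerance=15.0):
        self.tolerance = tolerance  # max feature distance still "similar"
        self.known = []

    def learn(self, windows):
        for w in windows:
            self.known.append(summarize(w))

    def is_expected(self, window):
        features = summarize(window)
        return any(dist(features, k) <= self.tolerance for k in self.known)


# Learning period: a quiet hour and an hour containing a backup spike.
quiet = [20] * 60
backup = [20] * 55 + [90] * 5
baseline = Baseline()
baseline.learn([quiet, backup])

similar_backup = baseline.is_expected([22] * 55 + [88] * 5)  # looks like a backup
sustained_load = baseline.is_expected([80] * 60)             # unlike anything learned
```

A window shaped like the learned backup is accepted without an alert, while an hour of sustained high load matches nothing in the baseline and would be surfaced to an operator.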

However, when something different happens, for example, average CPU usage goes from 20% to 80% or a spike lasts longer than usual, the behavior is flagged as different from what was seen before. If the change persists for a certain period of time, it is considered permanent: the system 'learns' it as a new pattern in the expected behavior, and no further alerts are delivered for it.
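The adaptation step can be sketched as follows. This assumes behavior is tracked as a single average level per window and that a change is treated as permanent after `patience` consecutive deviating windows; both the class and these parameters are hypothetical simplifications of the idea, not a description of a specific product.

```python
class AdaptiveLevel:
    """Alerts on a level change, then adopts it if it persists."""

    def __init__(self, baseline, tolerance=10.0, patience=3):
        self.baseline = baseline    # expected average CPU (percent)
        self.tolerance = tolerance  # allowed deviation before alerting
        self.patience = patience    # deviating windows before re-learning
        self.pending = []

    def observe(self, window_avg):
        """Return True if this window should raise an alert."""
        if abs(window_avg - self.baseline) <= self.tolerance:
            self.pending.clear()
            return False
        self.pending.append(window_avg)
        if len(self.pending) >= self.patience:
            # The change persisted: learn it as the new normal.
            self.baseline = sum(self.pending) / len(self.pending)
            self.pending.clear()
            return False
        return True


detector = AdaptiveLevel(baseline=20)
alerts = [detector.observe(v) for v in [20, 80, 80, 80, 80, 21]]
```

With these settings the first two windows at 80% raise alerts, the third causes 80% to be adopted as the new baseline, and later windows at 80% are silent. A subsequent drop back to about 20% is then itself reported as a change.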

What deserves an operator's attention are the unusual changes in how the resource is consumed.

Applied to specific KPIs, Self-Adaptive Learning helps avoid false alarms by learning what typical system behavior looks like, letting operators focus on what is truly important.


About the Author
Bostjan Kaluza, PhD

Boštjan Kaluža is the Chief Data Scientist at Evolven. He has done extensive research in artificial intelligence and intelligent systems, machine learning, predictive analytics and anomaly detection. Prior to Evolven, Boštjan served as a senior researcher in the Department of Intelligent Systems at the Jozef Stefan Institute, the leading Slovenian scientific research institution, and led research projects involving pattern and anomaly detection, machine learning and predictive analytics.

 

Focusing on the detection of suspicious behavior and data analysis, Boštjan has published numerous articles in professional journals and delivered conference papers. In 2013, Boštjan published his first book on data science, Instant Weka How-to, exploring how to leverage machine learning using Weka. Boštjan is now working on his second book, Practical Machine Learning in Java, scheduled to be published later this year. Boštjan is also an author of and contributor to a number of patents in the areas of anomaly detection and pattern recognition.

 

Boštjan earned his PhD at Jožef Stefan International Postgraduate School in Ljubljana, Slovenia, rigorously defending a doctoral dissertation entitled Detection of Anomalous and Suspicious Behavior Patterns.