Algorithm of the Week: Naive Bayes
In machine learning, Naïve Bayes is a supervised learning method for classification problems. Given a set of attribute values, a Naïve Bayes classifier predicts a probability distribution over a set of outcomes rather than a single exact outcome. This distribution can be used as a degree of certainty, that is, how sure the classifier is of its prediction. Unlike the nearest-neighbor approach, which defers all work until a query arrives (lazy learning), Bayesian methods are eager learners: when given a training set, they immediately analyze the data and build a model.
Naïve Bayes classifiers are a family of probabilistic classifiers based on applying Bayes' theorem (named after Thomas Bayes) with the naive assumption that the features are conditionally independent given the class. This assumption lets the classifier capture uncertainty in a principled way while keeping the computation simple: instead of expanding the joint probability of all features with the full chain rule, it determines the probability of each outcome by simply multiplying per-feature conditional probabilities. Hence, every feature gets a say in determining which label should be assigned to a given input value.
Using Bayesian probability terminology, the equation can be described as:
$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$
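As a worked example of Bayes' theorem (posterior = prior × likelihood / evidence), the short Python sketch below uses invented numbers: a 10% prior that an alert is suspect, and made-up likelihoods of observing high CPU usage under each label. The evidence term is computed with the law of total probability. The function and variable names are hypothetical, chosen only for illustration.

```python
def posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Bayes' theorem: posterior = prior * likelihood / evidence."""
    return prior * likelihood / evidence

# Hypothetical numbers, invented for illustration only.
p_suspect = 0.1                 # prior P(suspect)
p_cpu_given_suspect = 0.3       # likelihood P(high CPU | suspect)
p_cpu_given_irrelevant = 0.6    # likelihood P(high CPU | irrelevant)

# Evidence P(high CPU), summed over both labels (law of total probability).
p_cpu = p_suspect * p_cpu_given_suspect + (1 - p_suspect) * p_cpu_given_irrelevant

print(round(posterior(p_suspect, p_cpu_given_suspect, p_cpu), 4))  # prints 0.0526
```

Observing high CPU usage lowers the belief that the alert is suspect from 10% to about 5%, because in this made-up data high CPU is twice as likely under the "irrelevant" label.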
Input: A dataset of feature vectors and corresponding outcomes.
Output: An estimate of the probability of each label, obtained by combining the contribution from each feature with the prior probability of that label.
A Naïve Bayes classifier assumes that the value of a particular feature is independent of the value of any other feature, given the class variable. For example, a vehicle may be considered a car if it has four wheels, five doors, and weighs about 3000 pounds. A Naïve Bayes classifier treats each of these features as an independent contribution to the probability that the vehicle is a car, regardless of any possible correlations between the number of wheels, the number of doors, and the vehicle's weight.
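The vehicle example can be sketched in a few lines of Python. All probabilities here are invented for illustration; the point is only the mechanics: each class's prior is multiplied by one conditional probability per observed feature, and the products are normalized so they sum to 1. The function name `naive_bayes_posterior` and the class labels are hypothetical.

```python
from math import prod

# Hypothetical priors and per-feature likelihoods, invented for illustration.
priors = {"car": 0.6, "not_car": 0.4}
# P(observed feature value | class) for a vehicle with
# four wheels, five doors, and a weight of about 3000 lb.
likelihoods = {
    "car":     {"wheels=4": 0.90, "doors=5": 0.40, "weight~3000lb": 0.50},
    "not_car": {"wheels=4": 0.30, "doors=5": 0.05, "weight~3000lb": 0.20},
}

def naive_bayes_posterior(priors, likelihoods):
    # Naive assumption: each feature contributes an independent factor.
    scores = {c: priors[c] * prod(likelihoods[c].values()) for c in priors}
    total = sum(scores.values())  # evidence, used to normalize the scores
    return {c: s / total for c, s in scores.items()}

print(naive_bayes_posterior(priors, likelihoods))
```

Note that no joint table such as P(wheels, doors, weight | car) is ever needed; that is exactly the simplification the independence assumption buys, at the cost of ignoring correlations between the features.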
Naïve Bayes is a conditional probability model. A problem instance to be classified is represented by a vector
$$\mathbf{x} = (x_1, \dots, x_n)$$
of n features, and the classifier assigns to this instance the probabilities $p(C_k \mid x_1, \dots, x_n)$ for each of the K possible outcomes or classes $C_k$.
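Writing Bayes' theorem for a feature vector $\mathbf{x} = (x_1, \dots, x_n)$ and a class $C_k$, and then applying the naive conditional-independence assumption $p(x_1, \dots, x_n \mid C_k) = \prod_{i=1}^{n} p(x_i \mid C_k)$, gives the standard Naïve Bayes factorization and decision rule. The denominator is the same for every class, so it can be dropped when only the most probable class is needed:
$$p(C_k \mid x_1, \dots, x_n) = \frac{p(C_k)\, p(x_1, \dots, x_n \mid C_k)}{p(x_1, \dots, x_n)} \propto p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$$
$$\hat{y} = \underset{k \in \{1, \dots, K\}}{\arg\max}\; p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$$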
Combining prior knowledge and observed data, Bayesian classification provides practical learning algorithms and offers a useful perspective for understanding and evaluating many learning algorithms. It calculates explicit probabilities for hypotheses and is robust to noise in the input data.
The Naïve Bayes algorithm is simple and effective, and should be one of the first methods to try on a (supervised) classification problem. Typical diagnostic and predictive applications include:
- Medical diagnoses
- Spam recognition (spam filtering)
- Choosing web pages which might be interesting for a particular user
- Studying genome characteristics
- Churn prediction in marketing
- IT environment stability
Applying Naïve Bayes to IT Operations Analytics (ITOA)
The Naïve Bayes classifier is a general-purpose, simple-to-implement algorithm that works well in many applications. It combines the prior probability of each outcome with the likelihood of the currently observed evidence to estimate how likely each outcome is. This approach can be applied to the large data stores routinely collected in day-to-day IT operations, enabling IT Operations Analytics tools to check whether an event is expected or not.
IT specialists need to see the big picture and understand, for instance, how an unexpected change in a system component could affect stability. Suppose the usage monitor tool regularly triggers an alert at 15-minute intervals. When a new alert occurs, you look at various alert properties such as:
- affected component (CPU, network, disk, …)
- criticality (warning, critical, …)
- the number of alerts in the last period (1 hr, 2 hrs, 12 hrs)
In addition, there is a database of past alerts with known outcomes as perceived by an IT operator; for example, each alert is labeled as either suspect or irrelevant.
Now suppose a new alert about high CPU usage appears. The Naïve Bayes classifier draws on the known outcomes of past alerts, including earlier high-CPU alerts, to form its prior belief. If such alerts happen frequently and are irrelevant in most cases, the new one will be classified as expected and not addressed. However, an alert warning about low disk space is not expected. It contradicts the previous belief about the stable state of the system, prompting IT operations to investigate further and helping them narrow in on the area where a risk is emerging.
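A minimal sketch of this alert scenario in Python, assuming categorical features (affected component, criticality, and a coarse "low"/"high" bucket for the number of recent alerts) and a small, invented history of operator-labeled alerts. The class name `CategoricalNaiveBayes`, the bucket names, and the add-one (Laplace) smoothing are illustrative choices, not part of any particular ITOA product.

```python
from collections import Counter, defaultdict
from math import exp, log

# Hypothetical labeled alert history, invented for illustration:
# (component, criticality, recent-alert-count bucket) -> operator label.
history = [
    (("cpu", "warning", "high"), "irrelevant"),
    (("cpu", "warning", "high"), "irrelevant"),
    (("cpu", "critical", "high"), "irrelevant"),
    (("cpu", "warning", "low"), "irrelevant"),
    (("network", "warning", "low"), "irrelevant"),
    (("disk", "critical", "low"), "suspect"),
    (("network", "critical", "low"), "suspect"),
    (("cpu", "critical", "low"), "suspect"),
]

class CategoricalNaiveBayes:
    """Minimal categorical Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, rows):
        # Eager learning: all count tables are built up front.
        self.class_counts = Counter(label for _, label in rows)
        self.n = len(rows)
        self.feature_counts = defaultdict(Counter)  # (class, i) -> value counts
        self.values = defaultdict(set)              # i -> values seen for feature i
        for features, label in rows:
            for i, v in enumerate(features):
                self.feature_counts[(label, i)][v] += 1
                self.values[i].add(v)
        return self

    def predict_proba(self, features):
        log_scores = {}
        for c, count in self.class_counts.items():
            score = log(count / self.n)  # log prior P(c)
            for i, v in enumerate(features):
                num = self.feature_counts[(c, i)][v] + self.alpha
                den = count + self.alpha * len(self.values[i])
                score += log(num / den)  # log P(feature i = v | c), smoothed
            log_scores[c] = score
        # Normalize in log space to get a distribution over the labels.
        m = max(log_scores.values())
        unnorm = {c: exp(s - m) for c, s in log_scores.items()}
        z = sum(unnorm.values())
        return {c: u / z for c, u in unnorm.items()}

model = CategoricalNaiveBayes().fit(history)
print(model.predict_proba(("cpu", "warning", "high")))   # frequent high-CPU alert
print(model.predict_proba(("disk", "critical", "low")))  # rare low-disk alert
```

With this invented history, the routine high-CPU warning comes out overwhelmingly "irrelevant", while the low-disk critical alert comes out mostly "suspect". Smoothing keeps a value never seen with a label (such as a disk alert among the irrelevant ones) from forcing that label's probability to zero, and summing logarithms instead of multiplying raw probabilities avoids numeric underflow when there are many features.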