1 (866) 447-2526 Resources Events Blog

Algorithm of the Week: Apriori

Blog

Algorithm of the Week: Apriori


 

Definition

Apriori algorithm is a classic algorithm used for frequent pattern mining and association rule learning over transactional databases such as collections of items bought by customers, or details of website frequentation. By identifying the frequent individual items in a database and extending them to larger and larger item sets, Apriori can determine the association rules, which highlight general trends about a database. The Apriori approach is usually used in "Market Basket Analysis" or MBA for short. Detecting changes in basket composition, size and value, can offer business owners new understanding of their customer buying behavior, providing insight into which products tend to be purchased together and which are most amenable to promotion.

Example database with 4 items and 5 transactions

ds alg apriori1

Implementation

Aprori algorithm constructs a set of patterns, where each pattern consists of subset of items, for example, pattern 1 = {Item A, Item B}. Each pattern is described with support, which counts the number of occurrences in the database. Intuitively, it tells us how many evidencessupport pattern. For example, pattern {milk, bread} in the example has support 2, while pattern {milk, bread, butter} has support only 1.

Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time, and works by eliminating most large sets as candidates by looking first at smaller sets and recognizing that a large set cannot be frequent unless all its subsets are. The algorithm terminates when no further successful extensions are found.

In this example, we have an extremely small transactional database, where we see the items and transactions. In actual applications, a rule needs a support threshold of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions. 

Input: Transaction Database, D; Item T, Minimum Support Smin
Output: Frequent Item sets, F

Basic Objects: 

  • Cj: candidate item sets of size j
  • Fj: frequent item sets of size j
Apriori (D,T,Smin) // Apriori Algorithm
  Begin    
 
  k :=1;                  
// Initialize the item set size
 
  Ck :=Ut∈D {{i}};
// Start with single candidate
 
  Fk :=prune (Ek,T,Smin)
// Determine whether item is frequent?
 
  While Fk≠ ø do begin
// Repeat till frequent item sets available
 
  Ck+1 :=candidate (Fk)     
// Create candidate with 1 item more
 
  Fk+1 :=prune (Ck+1,T,Smin)) 
// Determine whether frequent item sets
 
  k := k+1;                           
// Increment the item counter
 
  End                  
 
 
  Return
// Return the frequent item sets
  End    

Practical Examples

Apriori's frequent item set mining and association rule induction are powerful methods for finding regularities in a wide variety of areas:

  • Retail: Build a list of frequently purchased item pairs from tracking sales data.
  • Credit card transactions: Provide insight into the next product that customers are likely to purchase (based on credit card purchases, such as rental cars and hotel rooms).
  • Telecommunication service purchases: Determine how to bundle services together to maximize revenue based on the services purchased by telecommunications customers (call waiting, call forwarding, DSL, speed call, and so on).
  • Banking services: Launch different financial products and services to provide personalized customer service, making a bank's marketing policy more targeted.
  • Insurance claims: Find unusual combinations of insurance claims to show fraud and spark further investigations.
  • Medical patient histories: Provide indications of likely complications based on certain combinations of treatments.

Applying Apriori to IT Operations Analytics (ITOA)

Based on records of large number of transactions, the Apriori approach is well suited to be applied to the data routinely collected in day to-day IT operations, enabling IT Operations Analytics tools to detect frequent patterns, and identify critical changes. IT specialists need to see the big picture and understand, for example, how a problem on a database could impact an application server. 

For a specific day, IT operations may take in a variety of alerts presenting them in a transactional database. Using the Apriori algorithm, IT Operations Analytics tools can correlate and detect the frequent patterns of alerts appearing together. This can lead to better understanding how one component impacts another. 

With identified alert patterns, it is possible to apply predictive analytics. For example, a particular database server hosts a web application and suddenly an alert about a database is triggered. By looking into frequent patterns identified by Apriori, this means that the IT staff needs to take action before the web application is impacted. 

Apriori can also discover alert events originating from the same IT event. For example, every time a new user is added, six changes in the Windows operating systems are detected. Next, in Application portfolio management IT may face multiple alerts, showing that transactional time in a database is high. Should all these issues originate from the same source (such as getting hundreds of alerts about changes that are all due to a Windows update), this frequent pattern mining can help to quickly cut through number of alerts, allowing IT operations to focus on truly critical changes.

Your Turn
What algorithms have an impact on IT Operations Analytics?

About the Author
Bostjan Kaluza, PhD

Boštjan Kaluža is the Chief Data Scientist at Evolven. He's also a hardcore researcher who's done a lot of research into artificial intelligence and intelligent systems, machine learning, predictive analytics and anomaly detection. Prior to Evolven, Boštjan served as a senior researcher in the Department of Intelligent Systems at the Jozef Stefan Institute, the leading Slovenian scientific research institution and led research projects involving pattern and anomaly detection, machine learning and predictive analytics.

 

Focusing on the detection of suspicious behavior and data analysis, Boštjan has published numerous articles in professional journals and delivered conference papers. In 2013, Boštjan published his first book on data science, Instant Weka How-to, exploring how to leverage machine learning using Weka. Boštjan is now working on his second book Practical Machine Learning in Java, scheduled to be published later this year. Boštjan is also the author and contributor to a number of patents in the areas of anomaly detection and pattern recognition.

 

Boštjan earned his PhD at Jožef Stefan International Postgraduate School in Ljubljana, Slovenia, rigorously defending a doctoral dissertation entitled Detection of Anomalous and Suspicious Behavior Patterns.