Identify Patterns in Data Using Association Rules

- Pentaho

Introduction

Databases usually contain large amounts of data. Deriving insights from that data is an essential part of most data science work. Association rule mining is one of several methods used for finding interesting relationships between attributes.

Association Rule Mining

Association rule mining is a procedure for finding frequent patterns, correlations, associations or causal structures in data sets held in databases and other data repositories. Market basket analysis is a well-known example. Placing related products together in a store makes it easier for a customer to purchase them; for example, a customer who buys bread may also look for related products such as jam or cheese. Related products are therefore placed near each other, which reduces the time the customer spends shopping and puts the relationships found between products to use.

The following is a list of concepts that are integral to association rule mining:

Support – Fraction of transactions containing an item set.

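Confidence – For a rule {X} → {Y}, the probability that {Y} occurs in a transaction given that {X} is present, i.e. the support of {X, Y} divided by the support of {X}.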

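Lift – The confidence of the rule divided by the expected confidence. The expected confidence is the value the confidence would take if {X} and {Y} were independent of each other, which is simply the frequency (support) of {Y}. A lift above 1 indicates that {X} and {Y} occur together more often than independence would predict.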
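The three measures can be computed directly from a list of transactions. The following is a minimal sketch in Python; the transaction data and the function names (support, confidence, lift) are invented for illustration and do not come from any particular tool.

```python
# Toy transactions, made up for illustration only.
transactions = [
    {"bread", "jam"},
    {"bread", "cheese"},
    {"bread", "jam", "cheese"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule X -> Y: support(X and Y) / support(X).

    Assumes X occurs in at least one transaction; otherwise this
    divides by zero.
    """
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Confidence of X -> Y divided by the support of Y."""
    return confidence(x, y, transactions) / support(y, transactions)


print(support({"bread"}, transactions))
print(confidence({"bread"}, {"jam"}, transactions))
print(lift({"bread"}, {"jam"}, transactions))
```

In this toy data, bread appears in three of the four transactions and bread with jam in two, so the rule {bread} → {jam} has a confidence of about 0.67 and a lift of about 1.33.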

How is rule mining done?

First, item sets that are frequently bought together are identified by setting a threshold (the minimum support value). Single items are counted first and those that satisfy the threshold are retained; these are then combined into larger candidate item sets, which are counted and pruned in the same way, and the step is repeated until no new item set satisfies the threshold. Second, each frequent item set is split by binary partition into an antecedent {X} and a consequent {Y}, and candidate rules whose confidence falls below a second threshold (the minimum confidence) are pruned. This procedure produces a set of association rules that satisfy both the minimum support and the minimum confidence conditions.
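The two phases described above can be sketched in Python as follows. This is a simplified, Apriori-style illustration under stated assumptions, not the implementation used by any specific product: the function names (frequent_itemsets, association_rules), the toy transactions and the threshold values are all hypothetical, and the candidate generation skips the subset-pruning optimisation that production implementations normally apply.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Phase 1: level-wise search for item sets meeting min_support."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}  # maps item set -> support
    k = 1
    while candidates:
        # Count each candidate and keep those that satisfy the threshold.
        level = {}
        for c in candidates:
            sup = sum(c <= t for t in transactions) / n
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)
        k += 1
        # Join surviving item sets into candidates that are one item larger.
        candidates = list({a | b for a in level for b in level if len(a | b) == k})
    return frequent


def association_rules(frequent, min_confidence):
    """Phase 2: split each frequent item set into X -> Y, prune by confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                y = itemset - x
                # Every subset of a frequent item set is itself frequent,
                # so its support is already in the dictionary.
                conf = sup / frequent[x]
                if conf >= min_confidence:
                    rules.append((x, y, sup, conf))
    return rules


# Toy data and thresholds, chosen only for illustration.
toy = [
    {"bread", "jam"},
    {"bread", "cheese"},
    {"bread", "jam", "cheese"},
    {"milk"},
]
frequent = frequent_itemsets(toy, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.6)
for x, y, sup, conf in rules:
    print(sorted(x), "->", sorted(y), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Raising min_support shrinks the set of frequent item sets before any rules are formed, while raising min_confidence only filters the rules generated from them.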

Conclusion

Association rule mining can help marketers organise products in stores so that related items are placed near each other, which reduces the time customers spend looking for them. It can also help prioritise the stocking of products that are most frequently bought together.