Pattern Recognition using K-NN Algorithm

- Pentaho

Introduction

k-Nearest Neighbours is a non-parametric classification algorithm. It can be used both for classification and regression. It is essentially a method of classification of the most similar points in the data used for training a model. kNN works best on small data sets that do not have many features. kNN assumes that similar things exist in close proximity. kNN plots the data points on a graph and gauges the similarity between them based on their distance. The performance of kNN is influenced by:

  • The distance metric used to locate the nearest neighbours.
  • The distance rule used to derive classifications.
  • The number of neighbours used to classify new samples (k values)

Concepts and Application of kNN

The kNN algorithm works on the basis of minimum distance from the query instance to the training samples to determine the K-nearest neighbours. In other words, if you are similar to your neighbours, then you are one of them. For example, if apple looks more similar to banana, orange, and melon (fruits) than donkey, dog and cat (animals), then most likely apple is a fruit.

Consider the image shown above, there are two classes in the data i.e, the blue square and red triangle. The green dot needs to be sorted out either in the mentioned two classes. K-NN sorts this issue by finding the nearest neighbors and classifying the data into the appropriate class. In this example, the green dot will mostly be classified as red triangle.

Applications/Use-Cases:

K-Nearest Neighbor techniques for Pattern Recognition are often used for theft prevention in the modern retail business. You’re accustomed to seeing CCTV cameras around almost every store you visit, you might imagine that there is someone in the back room monitoring these cameras for suspicious activity, and perhaps that is how things were done in the past. But today, a modern surveillance system is intelligent enough to analyze and interpret video data on its own, without a need for human assistance.

K-Nearest Neighbor is also used in Retail to detect patterns in credit card usage. Many new transaction-scrutinizing software applications use kNN algorithms to analyze register data and spot unusual patterns that indicate suspicious activity. For example, if register data indicates that a lot of customer information is being entered manually rather than through automated scanning and swiping, this could indicate that the employee who’s using that register is in fact stealing customer’s personal information. Or if register data indicates that a particular good is being returned or exchanged multiple times, this could indicate that employees are misusing the return policy or trying to make money from doing fake returns.

Other Application’s of K-NN:

  • Concept Searching
  • Recommender Systems

III. Conclusion

The K-Nearest Neighbors (KNN) algorithm is a simple, supervised machine learning algorithm that can be used to solve classification problems. It’s easy to implement and understand, but has a major drawback of becoming significantly slow as the size of the data grows.