KNN (K-Nearest Neighbours)

Solomon

KNN

KNN is a distance-based machine learning algorithm: it works purely on distance metrics between datapoints, which makes it highly interpretable, and it is often used as a baseline model in machine learning projects. This article describes the idea behind KNN and its variations.

Geometric Intuition

Consider two-dimensional data with two labels. If we plot the data on a graph, the geometric problem statement is: which class does a query point x_q belong to? The solution is to choose the K datapoints nearest to x_q, take a majority vote of the neighbours' labels, and declare that x_q belongs to the winning category.
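
For a concrete picture, here is a minimal sketch of this majority vote using scikit-learn's KNeighborsClassifier; the toy 2-D points and labels below are made up purely for illustration.

```python
# Majority-vote KNN on a tiny made-up 2-D dataset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1, 1], [1, 2], [2, 1],      # negative cluster
                    [6, 6], [6, 7], [7, 6]])     # positive cluster
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)        # K = 3
knn.fit(X_train, y_train)

x_q = np.array([[6.5, 6.2]])                     # query point
print(knn.predict(x_q))                          # majority vote of the 3 nearest labels -> [1]
```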

If the labels are highly mixed and cannot be separated, feature engineering can be done to derive new features that are more separable.

Mixed labels can be identified through visualization; if the data is hard to visualize, dimensionality reduction techniques can be employed.

Identifying an Outlier

Since KNN is a distance-based algorithm, an outlier in the data will impact the model. For instance, if one positive point lies very far away, say 100 units, while all the other points are separated by 0 to 0.5 units, a query point x_q that actually belongs to the positive class can be declared negative because of that single outlier. It is a good idea to remove outliers; below is one way of doing so.

  1. Take each datapoint in the dataset.
  2. Compute the distance to its nearest neighbour and store all these distances.
  3. Sort these distances and take the 99th percentile value.
  4. Points whose nearest-neighbour distance is greater than the 99th percentile value can be treated as outliers and removed.
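
A rough sketch of this procedure, assuming the training data is a NumPy array X of shape (n, d):

```python
# Drop points whose nearest-neighbour distance is beyond the 99th percentile.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def remove_outliers(X, percentile=99):
    # k=2 because the closest neighbour of every point is the point itself.
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    dist, _ = nn.kneighbors(X)
    first_nn_dist = dist[:, 1]                       # distance to the first real neighbour
    threshold = np.percentile(first_nn_dist, percentile)
    return X[first_nn_dist <= threshold]             # keep points within the threshold
```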

Distance Measures

The most commonly used distance measures are:

a) Euclidean

b) Manhattan

c) LP Norm / Minkowski Distance

d) Hamming Distance

e) Cosine Distance

Euclidean Distance

Euclidean distance is the ordinary straight-line distance between two points, d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xd - yd)^2), and it is also called the L2 norm of the difference vector.

Manhattan Distance

The Manhattan distance between two points is the sum of the absolute differences of their coordinates, d(x, y) = |x1 - y1| + |x2 - y2| + ... + |xd - yd|, which is also called the L1 norm of the difference vector.

Lp Norm / Minkowski Distance

This is a generic representation of the above two norms, d(x, y) = (|x1 - y1|^p + ... + |xd - yd|^p)^(1/p): p = 2 gives the Euclidean distance and p = 1 gives the Manhattan distance.

Hamming Distance

This is used to calculate the distance between boolean (binary) points. The Hamming distance is simply the number of positions at which the binary values differ.

Sklearn's KNN uses Minkowski (with p = 2, i.e. Euclidean) as the default metric.
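
As a sanity check, the four measures above can be written out directly in NumPy; the two example vectors are arbitrary:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))            # L2 norm of (x - y)
manhattan = np.sum(np.abs(x - y))                    # L1 norm of (x - y)
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)    # Lp norm; p=2 -> Euclidean, p=1 -> Manhattan

a = np.array([1, 0, 1, 1])
b = np.array([1, 1, 0, 1])
hamming = np.sum(a != b)                             # number of positions where the bits differ
```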

Cosine Distance

The cosine of the angle between two datapoints can be used to pick nearest neighbours, since it captures how similar they are. For instance, if x1 and x3 are similar (collinear) vectors, the angle between them is zero and cos(0) = 1, whereas between x1 and x2 the value of cos(theta) will be less than 1.

In terms of the dot product, the cosine of the angle is defined as the ratio of the dot product to the product of the L2 norms, cos(theta) = (x1 . x2) / (||x1|| ||x2||), and if x1 and x2 are unit vectors it is simply the dot product of x1 and x2. So to measure the similarity between two datapoints, just take the dot product: the higher the value, the more similar they are. Cosine distance is then 1 - cosine similarity.
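
A quick sketch of that computation in NumPy, with two arbitrary example vectors:

```python
import numpy as np

x1 = np.array([1.0, 2.0])
x2 = np.array([2.0, 1.0])

cos_sim = np.dot(x1, x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))
cos_dist = 1 - cos_sim      # 0 for collinear vectors, grows as the angle grows
```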

Time and Space Complexity

The inputs to the KNN algorithm are D_train, the K value, and the query point x_q.

At run time, for the given query point, we iterate over the entire dataset, compute the distance to each training point, and keep the K smallest distances in an array.

The time complexity is O(nd), where n is the number of datapoints and d is the dimension. We include d because d subtractions (one per dimension) are needed to compute the distance between x_q and each training point x_i. The space complexity is also O(nd), since the whole training set has to be kept in memory.
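
A brute-force sketch of that run-time loop (written here with vectorised NumPy rather than an explicit Python loop; the function name and arguments are illustrative):

```python
# One KNN query: n distance computations, each costing d subtractions -> O(nd).
import numpy as np

def knn_predict(X_train, y_train, x_q, k=3):
    dists = np.sqrt(np.sum((X_train - x_q) ** 2, axis=1))    # n Euclidean distances
    nearest = np.argsort(dists)[:k]                           # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                          # majority vote among the k labels
```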

Limitations of KNN

KNN has large space and time complexity, so other versions of KNN are used, such as the k-d tree, which works on the basis of binary search, and LSH (Locality Sensitive Hashing).

Hyperparameter — K

The K value in KNN is a hyperparameter, which can be selected by doing k-fold cross-validation.
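
For example, a sketch of selecting K with scikit-learn's GridSearchCV; the dataset from make_classification is only a stand-in for real training data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)  # toy data

param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)
print(search.best_params_)                                        # the K that scored best
```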

KNN in Regression

In regression, given a query point x_q, find the K nearest neighbours and take the mean of their labels as the prediction.
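
A minimal sketch of this with scikit-learn's KNeighborsRegressor on a made-up 1-D dataset:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_train = np.array([[1], [2], [3], [10], [11], [12]])    # toy 1-D feature
y_train = np.array([1.0, 1.2, 0.9, 5.1, 5.0, 4.8])

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, y_train)
print(reg.predict([[11.5]]))                              # mean of the 3 nearest labels
```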

Weighted KNN

Instead of giving every neighbour an equal say, give each neighbour a weight based on its distance from the query point. The weight can simply be the inverse of the distance.

For example, a plain majority vote may conclude that the query point belongs to the positive class, but if the query point is actually much closer to the negative class points, that is only picked up when we do weighted KNN.
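
A small sketch of the same situation with scikit-learn, where weights="distance" replaces the plain majority vote with an inverse-distance weighted vote; the 1-D toy data is chosen so that the two votes disagree:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0.0], [0.1], [2.0], [2.1], [2.2]])   # two close negatives, three far positives
y_train = np.array([0, 0, 1, 1, 1])
x_q = [[0.3]]                                              # query sits right next to the negatives

uniform = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
weighted = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_train, y_train)
print(uniform.predict(x_q), weighted.predict(x_q))         # majority vote -> [1], weighted vote -> [0]
```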

K-D Tree

a) Plot the points on the axes.

b) Pick an axis, say the x-axis, and project the points onto it.

c) Compute the median and draw a plane that separates the points at the median.

d) Do the same for the y-axis, and alternate this until we get to the leaf nodes.

Based on these planes we can write simple if-else decision conditions, e.g. if x > 3, look into that region, and keep applying the conditions to locate the point. This reduces the query time: vanilla KNN searches the entire space, whereas the k-d tree narrows down the search space.
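
A sketch of k-d tree lookup using scikit-learn's KDTree on a random toy dataset; the tree is built once, and each query only descends the relevant branches instead of scanning all n points:

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.random((1000, 2))                    # toy 2-D dataset

tree = KDTree(X, leaf_size=40)               # build the k-d tree once
dist, ind = tree.query([[0.5, 0.5]], k=3)    # distances and indices of the 3 nearest points
```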

KD-Tree Limitations

For 2-dimensional data, at the final stage the query point needs to look at the 4 adjacent regions to make sure the true nearest neighbour is found.

For 3-D data it needs to look at 8 adjacent regions, so in general it looks at 2^d adjoining regions. For 20-dimensional data, that is 2^20, around 1 million adjacent regions.

Time Complexity of KD-Tree

The time complexity is O(log(n)) for 1-NN in the favourable case, but degrades towards O(2^d log(n)), so as d grows to 10, 11, 12 and beyond, the time complexity increases rapidly.

So, KD-Tree is good when the dimension is small.

Variations of KD-Tree

A few variations of the k-d tree are the Quad tree, the Ball tree (available in scikit-learn) and the Octree.

Locality Sensitive Hashing

Given a datapoint, the idea is to hash similar points into the same bucket using Locality Sensitive Hashing, so that at query time only the points in the query's bucket need to be compared.
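
A minimal, illustrative sketch of one common LSH scheme, random-hyperplane hashing for cosine similarity; the dimensions, number of planes and data here are all made up:

```python
# Each point gets a short bit signature (which side of each random hyperplane
# it falls on); points sharing a signature land in the same bucket.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, n_planes = 5, 8
planes = rng.normal(size=(n_planes, d))            # random hyperplane normals

def signature(x):
    return tuple((planes @ x > 0).astype(int))     # 8-bit bucket key

X = rng.normal(size=(100, d))
buckets = defaultdict(list)
for i, x in enumerate(X):
    buckets[signature(x)].append(i)                # hash each point into its bucket

x_q = rng.normal(size=d)
candidates = buckets[signature(x_q)]               # only compare against this bucket
```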

Conclusion

One advantage of KNN is that it makes no assumptions about the data, so it can be used on both linearly and non-linearly separable problems. The downside is memory: the entire training set has to be stored so that distances to all the other datapoints can be computed at query time.
