# \$k\$-nearest neighbors algorithm

Institute for Environmental and Spatial Analysis...University of North Georgia

## 1   Introduction

A distance-based nonparametric algorithm by Fix and Hodges (1951). \$k\$ is a hyperparameter, which is used to control the learning process, not a parameter, which is learned during training.

## 2   Applications

Supervised machine learning for data classification

Regression

## 3   Exercise: \$k\$-nearest neighbors classification

### 3.1   Import modules

``````import matplotlib.pyplot as plt
import numpy as np``````

### 3.2   Define hyperparameters

``````# specify the number of nearest neighbors
k = 3

# specify the size of training data
train_n = 100``````

### 3.3   Generate training data

``````# make this exercise repeatable
np.random.seed(1)

# generate training data
train_x = np.random.rand(train_n, 2)
# generate training labels
train_y = np.random.randint(2, size=train_n)``````

### 3.4   Generate new data

``````# generate new data
new_x = np.random.rand(1, 2)``````

### 3.5   Plot data

``````# plot data
plt.scatter(train_x[train_y==0,0], train_x[train_y==0,1], color='red', marker='^')
plt.scatter(train_x[train_y==1,0], train_x[train_y==1,1], color='blue', marker='s')
plt.scatter(new_x[:,0], new_x[:,1], color='green')
plt.show()``````

### 3.6   Calculate Euclidean distances

``````# calculate Euclidean distances between the new data and training data
dist = ((train_x[:,0]-new_x[:,0])**2 + (train_x[:,1]-new_x[:,1])**2)**0.5``````

### 3.7   Find \$k\$-nearest neighbors

``````# sort by distance
sort_idx = np.argsort(dist)

# extract k-nearest neighbors
knn_idx = sort_idx[0:k]
knn_x = train_x[knn_idx]
knn_y = train_y[knn_idx]
knn_dist = dist[knn_idx]``````

### 3.8   Calculate the new class

``````# calculate the new class
# https://stackoverflow.com/a/6252400
knn_counts = np.bincount(knn_y)
new_y = knn_counts.argmax()

print(f'counts: {knn_counts}')
print(f'new class: {new_y}')``````

### 3.9   Plot everything

``````# plot everything
plt.scatter(train_x[train_y==0,0], train_x[train_y==0,1], color='red', marker='^')
plt.scatter(train_x[train_y==1,0], train_x[train_y==1,1], color='blue', marker='s')
plt.scatter(new_x[:,0], new_x[:,1], color='green')

for r in knn_dist:
# https://stackoverflow.com/a/29184075
circle = plt.Circle((new_x[:,0], new_x[:,1]), r, color='gray', fill=False)

plt.show()``````

## 5   Exercise: Tiling-color-based forest classification using the \$k\$-nearest neighbors algorithm

``````nrows, ncols, _ = im.shape

for i in range(0, nrows, tile_size):
i2 = min(i + tile_size, nrows)
for j in range(0, ncols, tile_size):
j2 = min(j + tile_size, ncols)
tile = im[i:i2,j:j2]``````