# K-Nearest Neighbors (KNN) Pseudocode
K-Nearest Neighbors (KNN) is one of the simplest yet most effective machine learning algorithms. It is a non-parametric, instance-based learning method that makes predictions based on similarity to the training examples.
## The Algorithm

### KNN Classification
```
ALGORITHM: K-Nearest Neighbors Classification
═════════════════════════════════════════════════════════════════
INPUT:
    X_train: Training feature matrix of shape (n_samples, n_features)
    y_train: Training labels of shape (n_samples,)
    x_query: New sample to classify, of shape (n_features,)
    k: Number of neighbors to consider
    distance_metric: Function to compute distance (default: Euclidean)

OUTPUT:
    predicted_class: The predicted class label
    probabilities: Class probability distribution (optional)

PROCEDURE:
    1. // Compute distances to all training samples
       distances ← empty list
       FOR i = 1 TO n_samples:
           d ← distance_metric(x_query, X_train[i])
           APPEND (d, y_train[i], i) to distances

    2. // Sort by distance (ascending)
       SORT distances by first element

    3. // Select the k nearest neighbors
       k_nearest ← first k elements of distances

    4. // Extract labels of the k nearest neighbors
       neighbor_labels ← [label for (_, label, _) in k_nearest]

    5. // Majority voting
       class_counts ← count occurrences of each class in neighbor_labels
       predicted_class ← class with maximum count

    6. // Compute probabilities (optional)
       probabilities ← class_counts / k

    7. RETURN predicted_class, probabilities
```
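The classification procedure above can be sketched in a few lines of NumPy (a minimal brute-force version for illustration, not an optimized implementation):

```python
from collections import Counter

import numpy as np

def knn_classify(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training samples."""
    # Steps 1-2: Euclidean distances from the query to every training sample
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: majority vote over the neighbors' labels
    counts = Counter(y_train[i] for i in nearest)
    predicted_class = counts.most_common(1)[0][0]
    # Step 6: empirical class probabilities among the k neighbors
    probabilities = {label: n / k for label, n in counts.items()}
    return predicted_class, probabilities

# Toy example: two well-separated clusters on a line
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
pred, probs = knn_classify(X, y, np.array([0.3]), k=3)  # pred is 0
```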
### KNN Regression

```
ALGORITHM: K-Nearest Neighbors Regression
═════════════════════════════════════════════════════════════════
INPUT:
    X_train: Training feature matrix of shape (n_samples, n_features)
    y_train: Training target values of shape (n_samples,)
    x_query: New sample to predict
    k: Number of neighbors to consider
    distance_metric: Function to compute distance
    weighting: 'uniform' or 'distance'

OUTPUT:
    predicted_value: The predicted continuous value

PROCEDURE:
    1. // Compute distances to all training samples
       distances ← empty list
       FOR i = 1 TO n_samples:
           d ← distance_metric(x_query, X_train[i])
           APPEND (d, y_train[i]) to distances

    2. // Sort by distance (ascending)
       SORT distances by first element

    3. // Select the k nearest neighbors
       k_nearest ← first k elements of distances

    4. IF weighting = 'uniform':
           // Simple average
           predicted_value ← mean([y for (_, y) in k_nearest])
       ELSE IF weighting = 'distance':
           // Weighted average (inverse distance)
           weights ← [1/d for (d, _) in k_nearest]
           // Handle d = 0: give that neighbor a very large weight
           values ← [y for (_, y) in k_nearest]
           predicted_value ← Σ(weights[i] × values[i]) / Σ(weights)

    5. RETURN predicted_value
```
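A matching NumPy sketch of the regression variant; here the d = 0 case is handled with a small epsilon rather than an explicit "very large weight" branch, an equivalent trick assumed for brevity:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3, weighting="uniform"):
    """Predict a continuous value from the k nearest training targets."""
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    if weighting == "uniform":
        # Simple average of the neighbors' targets
        return float(np.mean(y_train[nearest]))
    # Inverse-distance weights; epsilon keeps an exact match (d = 0) finite
    # while still letting it dominate the weighted average
    weights = 1.0 / (distances[nearest] + 1e-12)
    return float(np.sum(weights * y_train[nearest]) / np.sum(weights))

# Toy example: targets follow y = 2x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
```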
## Common Distance Metrics

```
FUNCTION: Distance Metrics
═════════════════════════════════════════════════════════════════
Euclidean Distance (L2):
    d(x, y) = √(Σᵢ (xᵢ - yᵢ)²)

Manhattan Distance (L1):
    d(x, y) = Σᵢ |xᵢ - yᵢ|

Minkowski Distance (Lp):
    d(x, y) = (Σᵢ |xᵢ - yᵢ|ᵖ)^(1/p)

Cosine Distance:
    d(x, y) = 1 - (x · y) / (‖x‖ × ‖y‖)

Hamming Distance (for categorical features):
    d(x, y) = Σᵢ 𝟙(xᵢ ≠ yᵢ)
```
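The metrics above translate directly into NumPy (straightforward reference implementations; libraries such as `scipy.spatial.distance` provide tuned versions):

```python
import numpy as np

def euclidean(x, y):
    # L2: square root of summed squared differences
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x, y):
    # L1: summed absolute differences
    return float(np.sum(np.abs(x - y)))

def minkowski(x, y, p=3):
    # Lp: generalizes L1 (p=1) and L2 (p=2)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

def cosine_distance(x, y):
    # 1 minus the cosine of the angle between x and y
    return float(1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def hamming(x, y):
    # Count of positions where the entries differ
    return int(np.sum(x != y))

a, b = np.array([3.0, 4.0]), np.array([0.0, 0.0])
```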
## Optimized KNN with KD-Tree

For large datasets, spatial data structures such as the KD-tree speed up the neighbor search:
```
ALGORITHM: KNN with KD-Tree
═════════════════════════════════════════════════════════════════
BUILD KD-TREE:

FUNCTION build_kdtree(points, depth=0):
    IF points is empty:
        RETURN null
    axis ← depth MOD n_features
    SORT points by axis dimension
    median_idx ← floor(length(points) / 2)
    RETURN Node(
        point: points[median_idx],
        left:  build_kdtree(points[:median_idx], depth + 1),
        right: build_kdtree(points[median_idx+1:], depth + 1)
    )

SEARCH K-NEAREST:

FUNCTION search_knn(node, query, k, depth=0, heap=empty max-heap):
    IF node is null:
        RETURN heap

    // Add the current node to the candidate set
    d ← distance(query, node.point)
    IF size(heap) < k:
        PUSH (d, node.point) to heap
    ELSE IF d < max distance in heap:
        REPLACE max in heap with (d, node.point)

    axis ← depth MOD n_features
    diff ← query[axis] - node.point[axis]

    // Search the side containing the query first
    IF diff < 0:
        first, second ← node.left, node.right
    ELSE:
        first, second ← node.right, node.left
    search_knn(first, query, k, depth + 1, heap)

    // Search the other side only if the splitting plane is closer
    // than the current k-th nearest candidate
    IF size(heap) < k OR |diff| < max distance in heap:
        search_knn(second, query, k, depth + 1, heap)

    RETURN heap
```
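In practice it is rarely worth hand-rolling the tree; SciPy's `cKDTree` (assumed available here) implements both build and k-nearest query, and a brute-force comparison makes a quick sanity check:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))   # 1000 training points in 3 dimensions
query = np.zeros(3)

# Build once (O(n log n)), then each query is roughly O(log n)
tree = cKDTree(X)
dists, idx = tree.query(query, k=5)  # 5 nearest: distances and row indices

# Brute force finds the same neighbors
brute = np.argsort(np.linalg.norm(X - query, axis=1))[:5]
```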
## Complexity Analysis

| Operation | Brute Force | KD-Tree (balanced) | Ball Tree |
|---|---|---|---|
| Training | $O(1)$ | $O(n \log n)$ | $O(n \log n)$ |
| Query | $O(n \cdot m)$ | $O(k \log n)$ avg | $O(k \log n)$ avg |
| Space | $O(n \cdot m)$ | $O(n)$ | $O(n)$ |
Where $n$ = samples, $m$ = features, $k$ = neighbors.
Note: KD-Trees degrade to $O(n)$ in high dimensions (curse of dimensionality).
## Choosing K
```
ALGORITHM: Cross-Validation for K Selection
═════════════════════════════════════════════════════════════════
INPUT:
    X, y: Dataset
    k_range: List of k values to test (e.g., [1, 3, 5, 7, 9, ...])
    n_folds: Number of cross-validation folds

PROCEDURE:
    best_k ← 1
    best_score ← -∞
    FOR k in k_range:
        scores ← cross_validate(KNN(k), X, y, n_folds)
        mean_score ← average(scores)
        IF mean_score > best_score:
            best_score ← mean_score
            best_k ← k
    RETURN best_k
```
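With scikit-learn, the same search takes a few lines (a sketch using `cross_val_score` on the Iris dataset):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

best_k, best_score = 1, -np.inf
for k in [1, 3, 5, 7, 9, 11]:
    # Mean accuracy over 5 stratified folds for this value of k
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
```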
## Key Insights

- **Lazy Learning**: No explicit training phase; all computation happens at prediction time.
- **Feature Scaling**: Essential. Features with larger ranges otherwise dominate the distance calculations.
- **Curse of Dimensionality**: Performance degrades in high dimensions; consider dimensionality reduction first.
- **Odd K for Binary Classification**: An odd k avoids ties in the majority vote.
- **Weighted Voting**: Giving closer neighbors more influence often improves results.
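The feature-scaling point is easy to demonstrate: in the sketch below (assuming scikit-learn), a hypothetical second feature is pure noise with a huge range, and without scaling it swamps the informative feature:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Feature 0 spans ~[0, 1] and determines the label; feature 1 is noise in ~[0, 10000]
X = np.column_stack([rng.random(200), rng.random(200) * 10_000])
y = (X[:, 0] > 0.5).astype(int)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

raw = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
scaled = make_pipeline(
    StandardScaler(), KNeighborsClassifier(n_neighbors=5)
).fit(X_train, y_train)

acc_raw = raw.score(X_test, y_test)        # distances dominated by the noisy feature
acc_scaled = scaled.score(X_test, y_test)  # improves once features are comparable
```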
## References
- Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory. https://ieeexplore.ieee.org/document/1053964
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning, Chapter 13. Springer. https://hastie.su.domains/ElemStatLearn/
- scikit-learn Documentation: KNeighborsClassifier. https://scikit-learn.org/stable/modules/neighbors.html
- Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Communications of the ACM.