Class KNeighborsOptions<T>
Configuration options for K-Nearest Neighbors classifiers.
public class KNeighborsOptions<T> : ClassifierOptions<T>
Type Parameters
- T: The data type used for calculations.
Inheritance
- ClassifierOptions<T> → KNeighborsOptions<T>
Remarks
K-Nearest Neighbors (KNN) is a simple, instance-based learning algorithm that classifies samples based on the majority class among their k nearest neighbors in the feature space.
For Beginners: KNN is like asking your neighbors for advice!
When you need to classify a new sample:
- Find the k training samples closest to it
- Look at what classes those neighbors belong to
- Predict the most common class among those neighbors
Example: To predict if a movie is "Action" or "Comedy":
- Find 5 similar movies (based on runtime, budget, etc.)
- If 4 are Action and 1 is Comedy, predict "Action"
Key settings:
- K (NNeighbors): How many neighbors to consider (default: 5)
- Metric: How to measure distance (Euclidean, Manhattan, etc.)
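The voting procedure described above can be sketched in a few lines. This is an illustrative Python sketch of the general KNN idea (the class documented here is C#), not the library's implementation:

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training samples."""
    # Brute-force search: Euclidean distance to every training sample.
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(X_train, y_train)
    )
    # Take the labels of the k closest samples and vote.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Movie example from above: features = (runtime_minutes, budget_millions)
X_train = [(120, 150), (110, 140), (130, 160), (95, 30), (100, 120)]
y_train = ["Action", "Action", "Action", "Comedy", "Action"]
print(knn_predict(X_train, y_train, (115, 145), k=5))  # 4 of 5 are Action -> "Action"
```

The feature values and movie labels here are made up for illustration; in practice features should be scaled to comparable ranges before distances are computed.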
Properties
Algorithm
Gets or sets the algorithm used to compute nearest neighbors.
public KNNAlgorithm Algorithm { get; set; }
Property Value
- KNNAlgorithm
The search algorithm. Default is Auto.
Remarks
Auto chooses the best algorithm based on data characteristics. BruteForce computes all pairwise distances (O(n*d) per query). KDTree uses a tree structure for faster queries in low dimensions. BallTree is better for high-dimensional data.
For Beginners: How to find neighbors efficiently.
- Auto: Let the algorithm choose (recommended)
- BruteForce: Compare to every training sample (exact, but slow on large datasets)
- KDTree: Use a tree structure (fast for small dimensions)
- BallTree: Better for many dimensions
LeafSize
Gets or sets the leaf size for tree-based algorithms.
public int LeafSize { get; set; }
Property Value
- int
The leaf size for KDTree or BallTree. Default is 30.
Remarks
This affects the speed of tree construction and query, as well as memory requirements, but not which neighbors are found. Larger values create shallower trees.
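A quick back-of-the-envelope illustration of why larger leaf sizes give shallower trees: a balanced space-partitioning tree stops splitting once a node holds no more than LeafSize points, so its depth is roughly log2(n / leaf_size). The numbers below are illustrative, not taken from this library:

```python
import math

n = 100_000  # number of training samples
for leaf_size in (10, 30, 100):
    depth = math.ceil(math.log2(n / leaf_size))
    print(leaf_size, depth)  # larger leaves -> shallower tree
```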
Metric
Gets or sets the distance metric used to find nearest neighbors.
public DistanceMetric Metric { get; set; }
Property Value
- DistanceMetric
The distance metric. Default is Euclidean.
Remarks
The choice of metric affects which points are considered "nearest." Euclidean distance works well for continuous features with similar scales. Manhattan distance can be better for high-dimensional data.
For Beginners: This determines how we measure "closeness."
- Euclidean: Straight-line distance (like a bird flying)
- Manhattan: Distance along axes (like walking city blocks)
- Minkowski: Generalization of both (with parameter p)
Euclidean is the most common choice for most problems.
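The "bird flying" versus "city blocks" intuition can be checked with a small Python sketch (illustrative only; the library computes these internally):

```python
def euclidean(a, b):
    """Straight-line distance: sqrt of summed squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """City-block distance: sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))

a, b = (0, 0), (3, 4)
print(euclidean(a, b))  # 5.0 (the straight line of a 3-4-5 triangle)
print(manhattan(a, b))  # 7   (3 blocks east + 4 blocks north)
```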
NNeighbors
Gets or sets the number of neighbors to use for classification.
public int NNeighbors { get; set; }
Property Value
- int
The number of neighbors (k). Default is 5.
Remarks
Smaller values of k make the model more sensitive to noise but can capture local patterns. Larger values provide smoother decision boundaries but may miss local patterns.
For Beginners: K is the number of neighbors to ask for their opinion.
- K = 1: Only look at the single closest neighbor (very sensitive to noise)
- K = 5: Look at 5 closest neighbors (good balance)
- K = 20: Look at 20 neighbors (smoother but may ignore local patterns)
Odd values are often preferred to avoid ties in binary classification.
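The noise-sensitivity trade-off is easy to see with made-up data. Suppose the single closest neighbor happens to be a mislabeled outlier (an assumed scenario for illustration):

```python
from collections import Counter

# Labels of the neighbors, ordered closest first; the nearest one is noise.
neighbors = ["B", "A", "A", "A", "A"]

for k in (1, 3, 5):
    vote = Counter(neighbors[:k]).most_common(1)[0][0]
    print(k, vote)  # k=1 follows the noisy point; k=3 and k=5 recover "A"
```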
P
Gets or sets the power parameter for the Minkowski metric.
public double P { get; set; }
Property Value
- double
The Minkowski power parameter. Default is 2 (Euclidean).
Remarks
Only used when Metric is Minkowski. p = 1 is equivalent to Manhattan distance. p = 2 is equivalent to Euclidean distance.
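The two special cases can be verified numerically with a small sketch of the Minkowski formula, (sum |x - y|^p)^(1/p):

```python
def minkowski(a, b, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # 7.0 -> same as Manhattan distance
print(minkowski(a, b, 2))  # 5.0 -> same as Euclidean distance
```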
Weights
Gets or sets the weight function used in prediction.
public WeightingScheme Weights { get; set; }
Property Value
- WeightingScheme
The weighting scheme. Default is Uniform.
Remarks
Uniform weighting treats all neighbors equally. Distance weighting gives closer neighbors more influence on the prediction.
For Beginners: Should all neighbors have equal say?
- Uniform: Every neighbor's vote counts equally
- Distance: Closer neighbors count more (weight = 1/distance)
Distance weighting often works better because the closest neighbors are usually more relevant.
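The difference between the two schemes can be sketched with assumed neighbor distances. Here uniform voting would pick "B" (two votes to one), but 1/distance weighting lets the single very close "A" dominate:

```python
from collections import defaultdict

# (distance, label) pairs for the k nearest neighbors (made-up values).
neighbors = [(0.1, "A"), (2.0, "B"), (2.5, "B")]

scores = defaultdict(float)
for dist, label in neighbors:
    scores[label] += 1.0 / dist  # closer neighbors get a larger weight
print(max(scores, key=scores.get))  # "A": score 10.0 beats B's 0.9
```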