Class SVMBase<T>

Namespace
AiDotNet.Classification.SVM
Assembly
AiDotNet.dll

Base class for Support Vector Machine classifiers.

public abstract class SVMBase<T> : ProbabilisticClassifierBase<T>, IProbabilisticClassifier<T>, IDecisionFunctionClassifier<T>, IClassifier<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>

Type Parameters

T

The numeric data type used for calculations (e.g., float, double).

Inheritance
ProbabilisticClassifierBase<T>
SVMBase<T>
Implements
IFullModel<T, Matrix<T>, Vector<T>>
IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>
IParameterizable<T, Matrix<T>, Vector<T>>
ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>
IGradientComputable<T, Matrix<T>, Vector<T>>

Remarks

Support Vector Machines (SVMs) are powerful classifiers that find the optimal hyperplane separating classes with maximum margin. This base class provides common functionality for SVM implementations.

For Beginners: Training an SVM is like finding the best possible line (or curve) to separate different groups. Unlike methods that settle for any line that works, an SVM finds the best line by maximizing the gap (margin) between the line and the nearest points from each class.

Key SVM concepts:

  • Margin: The gap between the decision boundary and the nearest training points
  • Support Vectors: The training points closest to the decision boundary
  • Kernel Trick: A way to handle non-linear boundaries without explicitly computing new features
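
Example

The sketch below shows the typical workflow these concepts fit into. It is a minimal illustration rather than exact API: the LinearSvmClassifier<T> subclass name, the Train method, and the Matrix<T>/Vector<T> constructors are assumptions; only DecisionFunction comes from this page.

// Minimal workflow sketch. LinearSvmClassifier, Train, and the Matrix/Vector
// constructors below are placeholders; substitute the concrete types you use.
var features = new Matrix<double>(new double[,]
{
    { 0.0, 0.0 },   // class 0
    { 0.2, 0.1 },   // class 0
    { 1.0, 1.0 },   // class 1
    { 0.9, 1.1 }    // class 1
});
var labels = new Vector<double>(new double[] { 0, 0, 1, 1 });

var svm = new LinearSvmClassifier<double>();   // hypothetical SVMBase<double> subclass
svm.Train(features, labels);                   // assumed training entry point

// Raw decision values: positive on one side of the learned boundary, negative on the other.
Matrix<double> decision = svm.DecisionFunction(features);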

Constructors

SVMBase(SVMOptions<T>?, IRegularization<T, Matrix<T>, Vector<T>>?)

Initializes a new instance of the SVMBase class.

protected SVMBase(SVMOptions<T>? options = null, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)

Parameters

options SVMOptions<T>

Configuration options for the SVM.

regularization IRegularization<T, Matrix<T>, Vector<T>>

Optional regularization strategy.
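
Example

A brief sketch of how the two constructor arguments are usually supplied through a concrete subclass. Only the (options, regularization) parameter shape comes from this page; the parameterless SVMOptions<T> constructor, the commented property names, and the MyKernelSvm<T> subclass are assumptions.

var options = new SVMOptions<double>();
// options.C = 1.0;      // assumed property names, shown for orientation only
// options.Gamma = 0.5;

IRegularization<double, Matrix<double>, Vector<double>>? regularization = null;  // optional argument

// A concrete SVM forwards both arguments to the protected SVMBase<double> constructor.
var svm = new MyKernelSvm<double>(options, regularization);   // MyKernelSvm<T> is a placeholder subclass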

Fields

_dualCoef

The dual coefficients for the support vectors.

protected Matrix<T>? _dualCoef

Field Value

Matrix<T>

_intercept

The bias terms for each classifier.

protected Vector<T>? _intercept

Field Value

Vector<T>

_supportVectors

The support vectors learned during training.

protected Matrix<T>? _supportVectors

Field Value

Matrix<T>

Properties

NSupportVectors

Gets the number of support vectors.

public int NSupportVectors { get; }

Property Value

int

The count of support vectors, or 0 if not trained.

Options

Gets the SVM-specific options.

protected SVMOptions<T> Options { get; }

Property Value

SVMOptions<T>

SupportVectors

Gets the support vectors learned during training.

public Matrix<T>? SupportVectors { get; }

Property Value

Matrix<T>

The matrix of support vectors, or null if not applicable or not trained. Each row is a support vector.

Remarks

Support vectors are the training samples that lie closest to the decision boundary. They are the most "informative" samples and completely define the decision boundary.

For Beginners: Support vectors are the key training examples that define where the decision boundary goes. If you removed them, the classifier would change. Other training points (that are far from the boundary) don't affect the decision boundary at all.

A classifier with fewer support vectors relative to training samples has learned a simpler model.
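
Example

A short sketch of inspecting the learned support vectors, assuming svm is an already-trained concrete SVMBase<double> subclass and trainingSamples is the number of rows that were used for training.

int trainingSamples = 500;   // placeholder count for illustration

Console.WriteLine($"Support vectors: {svm.NSupportVectors}");

if (svm.SupportVectors != null)
{
    // Each row of the matrix is one support vector.
    double ratio = (double)svm.NSupportVectors / trainingSamples;
    Console.WriteLine($"Fraction of training data kept as support vectors: {ratio:P1}");
    // A small fraction suggests a simpler boundary; a fraction near 1.0 often
    // indicates a very hard problem or overfitting.
}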

Methods

ApplyGradients(Vector<T>, T)

Applies pre-computed gradients to update the model parameters.

public override void ApplyGradients(Vector<T> gradients, T learningRate)

Parameters

gradients Vector<T>

The gradient vector to apply.

learningRate T

The learning rate for the update.

Remarks

Updates parameters using: θ = θ - learningRate * gradients

For Beginners: After computing gradients (seeing which direction to move), this method actually moves the model in that direction. The learning rate controls how big of a step to take.

Distributed Training: In DDP/ZeRO-2, this applies the synchronized (averaged) gradients after communication across workers. Each worker applies the same averaged gradients to keep parameters consistent.
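
Example

A minimal sketch of one manual update step pairing ComputeGradients with ApplyGradients; it assumes svm is a concrete SVMBase<double> subclass with a default loss function and that batchX/batchY are an existing Matrix<double>/Vector<double> pair.

// One manual step: compute gradients without touching the model,
// then apply them at a chosen learning rate (θ = θ - learningRate * gradients).
double learningRate = 0.01;

Vector<double> gradients = svm.ComputeGradients(batchX, batchY);  // parameters unchanged here
svm.ApplyGradients(gradients, learningRate);                      // parameters updated here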

ComputeGradients(Matrix<T>, Vector<T>, ILossFunction<T>?)

Computes gradients of the loss function with respect to model parameters for the given data, WITHOUT updating the model parameters.

public override Vector<T> ComputeGradients(Matrix<T> input, Vector<T> target, ILossFunction<T>? lossFunction = null)

Parameters

input Matrix<T>

The input data.

target Vector<T>

The target/expected output.

lossFunction ILossFunction<T>

The loss function to use for gradient computation. If null, uses the model's default loss function.

Returns

Vector<T>

A vector containing gradients with respect to all model parameters.

Remarks

This method performs a forward pass, computes the loss, and back-propagates to compute gradients, but does NOT update the model's parameters. The parameters remain unchanged after this call.

Distributed Training: In DDP/ZeRO-2, each worker calls this to compute local gradients on its data batch. These gradients are then synchronized (averaged) across workers before applying updates. This ensures all workers compute the same parameter updates despite having different data.

For Meta-Learning: After adapting a model on a support set, you can use this method to compute gradients on the query set. These gradients become the meta-gradients for updating the meta-parameters.

For Beginners: Think of this as "dry run" training:

  • The model sees what direction it should move (the gradients)
  • But it doesn't actually move (parameters stay the same)
  • You get to decide what to do with this information (average with others, inspect, modify, etc.)

Exceptions

InvalidOperationException

Thrown when lossFunction is null and the model has no default loss function.
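
Example

The data-parallel pattern described in the remarks can be sketched as follows: each worker computes local gradients on its own batch, the gradient vectors are averaged, and every replica applies the same result. No communication API is assumed; the averaging is written out by hand, and the Vector<double> Length property, indexer, and array constructor are assumptions, as are the pre-built batch variables.

// Two replicas of the same model compute gradients on different batches.
Vector<double> g0 = svm.ComputeGradients(batchX0, batchY0);
Vector<double> g1 = svm.ComputeGradients(batchX1, batchY1);

// Element-wise average (in real DDP this is an all-reduce across processes).
var averaged = new double[g0.Length];
for (int i = 0; i < averaged.Length; i++)
{
    averaged[i] = (g0[i] + g1[i]) / 2.0;
}

// Every replica applies the same averaged gradients, keeping parameters in sync.
svm.ApplyGradients(new Vector<double>(averaged), learningRate: 0.01);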

ComputeKernel(Vector<T>, Vector<T>)

Computes the kernel between two vectors.

protected T ComputeKernel(Vector<T> x, Vector<T> y)

Parameters

x Vector<T>

First vector.

y Vector<T>

Second vector.

Returns

T

The kernel value K(x, y).

ComputeLaplacianKernel(Vector<T>, Vector<T>)

Computes Laplacian kernel: K(x, y) = exp(-gamma * ||x - y||_1)

protected T ComputeLaplacianKernel(Vector<T> x, Vector<T> y)

Parameters

x Vector<T>
y Vector<T>

Returns

T

ComputeLinearKernel(Vector<T>, Vector<T>)

Computes linear kernel: K(x, y) = x · y

protected T ComputeLinearKernel(Vector<T> x, Vector<T> y)

Parameters

x Vector<T>
y Vector<T>

Returns

T

ComputePolynomialKernel(Vector<T>, Vector<T>)

Computes polynomial kernel: K(x, y) = (gamma * x · y + coef0)^degree

protected T ComputePolynomialKernel(Vector<T> x, Vector<T> y)

Parameters

x Vector<T>
y Vector<T>

Returns

T

ComputeRBFKernel(Vector<T>, Vector<T>)

Computes RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)

protected T ComputeRBFKernel(Vector<T> x, Vector<T> y)

Parameters

x Vector<T>
y Vector<T>

Returns

T

ComputeSigmoidKernel(Vector<T>, Vector<T>)

Computes sigmoid kernel: K(x, y) = tanh(gamma * x · y + coef0)

protected T ComputeSigmoidKernel(Vector<T> x, Vector<T> y)

Parameters

x Vector<T>
y Vector<T>

Returns

T
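
Example

To make the kernel formulas above concrete, the standalone snippet below evaluates the linear, RBF, and sigmoid kernels by hand on two small points; inside a subclass, the protected Compute*Kernel helpers compute the same quantities using the configured gamma and coef0.

// Worked example of the kernel formulas on two 2-D points (no library types needed).
double[] x = { 1.0, 2.0 };
double[] y = { 2.0, 0.0 };
double gamma = 0.5;
double coef0 = 0.0;

// Linear: K(x, y) = x · y = 1*2 + 2*0 = 2
double linear = x[0] * y[0] + x[1] * y[1];

// RBF: K(x, y) = exp(-gamma * ||x - y||^2), with ||x - y||^2 = (1-2)^2 + (2-0)^2 = 5
double squaredDistance = Math.Pow(x[0] - y[0], 2) + Math.Pow(x[1] - y[1], 2);
double rbf = Math.Exp(-gamma * squaredDistance);        // exp(-2.5) ≈ 0.0821

// Sigmoid: K(x, y) = tanh(gamma * x · y + coef0) = tanh(1) ≈ 0.7616
double sigmoid = Math.Tanh(gamma * linear + coef0);

Console.WriteLine($"linear = {linear}, rbf = {rbf:F4}, sigmoid = {sigmoid:F4}");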

DecisionFunction(Matrix<T>)

Computes the decision function for the input samples.

public abstract Matrix<T> DecisionFunction(Matrix<T> input)

Parameters

input Matrix<T>

The input feature matrix, where each row is a sample.

Returns

Matrix<T>

A matrix of decision values. For binary classification, this is a single column representing the signed distance to the decision boundary. For multi-class, the shape depends on the multi-class strategy (OvR vs OvO).

Remarks

The decision function provides the "raw" output of the classifier before any probability calibration. For SVMs, this is the signed distance to the separating hyperplane.

For Beginners: This gives you the classifier's "confidence" without converting to probabilities.

Use this when you want to:

  • Apply custom thresholds for classification
  • Understand how confident the classifier is
  • Create your own probability calibration
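
Example

For instance, a custom threshold can be applied directly to the raw decision values of a trained binary classifier, as sketched below (the single decision column is the binary layout described above; the Matrix<double> Rows property and indexer are assumptions, and svm/testX are an existing trained model and test matrix).

// Binary case: one column holding the signed distance to the separating hyperplane.
Matrix<double> decision = svm.DecisionFunction(testX);

// The default SVM rule thresholds at 0; a stricter cutoff accepts only confident positives.
double threshold = 0.5;
var predictions = new int[decision.Rows];
for (int i = 0; i < decision.Rows; i++)
{
    predictions[i] = decision[i, 0] >= threshold ? 1 : 0;
}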

GetGamma()

Gets the gamma value, computing it automatically if not specified.

protected T GetGamma()

Returns

T

GetModelMetadata()

Gets metadata about the model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

A ModelMetadata object containing information about the model.

Remarks

This method returns metadata about the model, including its type, feature count, complexity, description, and additional information specific to classification.

For Beginners: Model metadata provides information about the model itself, rather than the predictions it makes. This includes details about the model's structure (like how many features it uses) and characteristics (like how many classes it can predict). This information can be useful for understanding and comparing different models.

GetParameters()

Gets all model parameters as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all model parameters.

Remarks

This method returns a vector containing all model parameters for use with optimization algorithms or model comparison.

For Beginners: This method packages all the model's parameters into a single collection. This is useful for optimization algorithms that need to work with all parameters at once.

GetRow(Matrix<T>, int)

Extracts a row from a matrix as a vector.

protected Vector<T> GetRow(Matrix<T> matrix, int row)

Parameters

matrix Matrix<T>
row int

Returns

Vector<T>

SetParameters(Vector<T>)

Sets the parameters for this model.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all model parameters.

Exceptions

ArgumentException

Thrown when the parameters vector has an incorrect length.

WithParameters(Vector<T>)

Creates a new instance of the model with specified parameters.

public override IFullModel<T, Matrix<T>, Vector<T>> WithParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all model parameters.

Returns

IFullModel<T, Matrix<T>, Vector<T>>

A new model instance with the specified parameters.

Exceptions

ArgumentException

Thrown when the parameters vector has an incorrect length.
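
Example

A short sketch tying GetParameters, SetParameters, and WithParameters together for a trained model svm; only members documented on this page are used.

// Snapshot the current parameters as a single vector.
Vector<double> parameters = svm.GetParameters();

// WithParameters leaves 'svm' untouched and returns a fresh model built from the vector.
IFullModel<double, Matrix<double>, Vector<double>> copy = svm.WithParameters(parameters);

// SetParameters mutates the model in place; a vector of the wrong length
// throws an ArgumentException.
svm.SetParameters(parameters);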