Class SmoteAugmenter<T>
Namespace: AiDotNet.Augmentation.Tabular
Assembly: AiDotNet.dll
Implements SMOTE (Synthetic Minority Over-sampling Technique) for imbalanced datasets.
public class SmoteAugmenter<T> : TabularAugmenterBase<T>, IAugmentation<T, Matrix<T>>
Type Parameters
T: The numeric type used for calculations.
Inheritance: AugmentationBase<T, Matrix<T>> → TabularAugmenterBase<T> → SmoteAugmenter<T>
Implements: IAugmentation<T, Matrix<T>>
Remarks
For Beginners: SMOTE creates new synthetic samples for the minority class by interpolating between existing minority samples and their nearest neighbors. This helps balance imbalanced datasets where one class has far fewer samples than others.
How it works:
- For each minority sample, find its k nearest neighbors among the other minority samples
- Randomly select one of these neighbors
- Create a new sample at a random point on the line segment between the original sample and the selected neighbor
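The three steps above can be sketched in C# as follows. This is a minimal illustration on plain double[] rows, assuming Euclidean distance and an interpolation gap drawn uniformly from [0, 1); SmoteSketch and its methods are hypothetical names, not the AiDotNet implementation, which works on Matrix<T>.

```csharp
using System;
using System.Linq;

// Minimal SMOTE sketch on plain double[] rows. SmoteSketch and its methods are
// hypothetical illustrations, not part of the AiDotNet API.
public static class SmoteSketch
{
    // Euclidean distance between two feature vectors of equal length.
    public static double Distance(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int f = 0; f < a.Length; f++)
        {
            double d = a[f] - b[f];
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }

    // Indices of the k nearest minority rows to row `index`, excluding the row itself.
    public static int[] NearestNeighbors(double[][] minority, int index, int k) =>
        Enumerable.Range(0, minority.Length)
            .Where(j => j != index)
            .OrderBy(j => Distance(minority[index], minority[j]))
            .Take(k)
            .ToArray();

    // Produces `count` synthetic rows. For each one: take a minority row x (cycling
    // through the rows in order), pick one of its k nearest neighbors nn at random,
    // draw gap ~ U[0, 1), and emit x + gap * (nn - x), a point on the segment x..nn.
    public static double[][] Generate(double[][] minority, int k, int count, Random rng)
    {
        if (minority.Length < 2)
            throw new ArgumentException("SMOTE needs at least two minority samples.");
        if (k < 1)
            throw new ArgumentOutOfRangeException(nameof(k));

        // Neighbor lists depend only on the minority data, so compute them once.
        int[][] neighbors = Enumerable.Range(0, minority.Length)
            .Select(i => NearestNeighbors(minority, i, k))
            .ToArray();

        var synthetic = new double[count][];
        for (int s = 0; s < count; s++)
        {
            int i = s % minority.Length;
            double[] x = minority[i];
            double[] nn = minority[neighbors[i][rng.Next(neighbors[i].Length)]];
            double gap = rng.NextDouble();
            synthetic[s] = x.Select((v, f) => v + gap * (nn[f] - v)).ToArray();
        }
        return synthetic;
    }
}
```

Cycling through the minority rows in order spreads the synthetic samples evenly across them; the library may schedule its draws differently.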
When to use:
- Classification with severe class imbalance (e.g., fraud detection, rare disease)
- When the minority class has too few samples to learn from
- When undersampling the majority class would lose too much information
When NOT to use:
- When classes are already balanced
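- For regression tasks (SMOTE is designed for discrete class labels; use a regression-specific technique)
- When many features are categorical (interpolating category codes produces meaningless values; use SMOTE-NC instead)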
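To see why plain SMOTE is a poor fit for categorical data, consider a one-hot encoded feature: interpolation produces fractional values that match no category. SMOTE-NC, described in the same Chawla et al. paper, instead assigns each nominal feature the value that occurs most often among the k nearest neighbors. The hypothetical CategoricalDemo class below illustrates both points; it does not reproduce SMOTE-NC's modified distance metric.

```csharp
using System;
using System.Linq;

// Hypothetical helpers (not the AiDotNet API) showing why plain SMOTE
// mishandles categorical features and how SMOTE-NC picks nominal values.
public static class CategoricalDemo
{
    // Plain SMOTE interpolation applied to every feature, including one-hot columns.
    public static double[] Interpolate(double[] x, double[] neighbor, double gap) =>
        x.Select((v, i) => v + gap * (neighbor[i] - v)).ToArray();

    // SMOTE-NC style choice for a nominal feature: the value that occurs most often
    // among the neighbors (ties go to the value seen first, since GroupBy keeps
    // first-occurrence order and OrderByDescending is stable).
    public static string MajorityCategory(string[] neighborValues) =>
        neighborValues.GroupBy(v => v)
                      .OrderByDescending(g => g.Count())
                      .First().Key;
}
```

With gap = 0.25, interpolating the one-hot rows [1, 0, 0] and [0, 1, 0] gives [0.75, 0.25, 0], which is not a valid category.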
Reference: Chawla et al., "SMOTE: Synthetic Minority Over-sampling Technique" (2002)
Constructors
SmoteAugmenter(int, double, double)
Creates a new SMOTE augmenter.
public SmoteAugmenter(int kNeighbors = 5, double samplingRatio = 1, double probability = 1)
Parameters
kNeighbors (int): Number of nearest neighbors to consider for each minority sample (default: 5).
samplingRatio (double): Ratio of synthetic samples to generate, relative to the number of minority samples (default: 1.0).
probability (double): Probability of applying this augmentation (default: 1.0).
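The probability parameter controls whether a given call augments at all. A minimal sketch of such a gate follows, assuming one uniform draw per call; ProbabilityGate is a hypothetical helper, and the actual logic lives in the AugmentationBase<T, Matrix<T>> base class and may differ.

```csharp
using System;

// Hypothetical probability gate (assumption: one uniform draw per call).
// The real check in AugmentationBase<T, Matrix<T>> may be implemented differently.
public static class ProbabilityGate
{
    // Returns true when the augmentation should run for this call:
    // always for probability >= 1, never for probability <= 0,
    // otherwise with chance `probability`.
    public static bool ShouldApply(double probability, Random rng) =>
        probability >= 1.0 || (probability > 0.0 && rng.NextDouble() < probability);
}
```

With the default of 1.0, every call augments; a value such as 0.3 would augment roughly three calls in ten.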
Properties
KNeighbors
Gets the number of nearest neighbors to consider.
public int KNeighbors { get; }
Property Value
int
Remarks
Default: 5
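Higher values let the interpolation partner come from a wider neighborhood, which produces more diverse synthetic samples but can also pair points that are far apart. Each sample has at most n - 1 other minority samples to choose from (where n is the number of minority samples), so KNeighbors should be smaller than n.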
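The effect of KNeighbors can be seen on a single feature. In this hypothetical NeighborDemo helper (not the AiDotNet API), the minority values 0, 1, 2 and 10 give the point 0 the neighbor set {1} when k = 1 but {1, 2, 10} when k = 3, so a larger k lets synthetic samples land anywhere between 0 and 10, including the sparse region near the outlier.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not the AiDotNet API): the k nearest neighbors of one
// minority point among the other minority points, using a single feature.
public static class NeighborDemo
{
    public static double[] Neighbors(double[] minority, int index, int k)
    {
        // A point has at most n - 1 other neighbors, so k is effectively capped there.
        int effectiveK = Math.Min(k, minority.Length - 1);
        return minority.Where((_, j) => j != index)
                       .OrderBy(v => Math.Abs(v - minority[index]))
                       .Take(effectiveK)
                       .ToArray();
    }
}
```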
SamplingRatio
Gets the sampling ratio for synthetic sample generation.
public double SamplingRatio { get; }
Property Value
double
Remarks
Default: 1.0 (generate as many synthetic samples as original minority samples)
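Values above 1.0 generate proportionally more synthetic samples and values below 1.0 generate fewer; for example, 0.5 yields roughly half as many synthetic samples as there are minority samples.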
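The relationship between SamplingRatio and the number of synthetic rows is simple arithmetic. The sketch below assumes the count is SamplingRatio times the number of minority samples, rounded to the nearest integer; the rounding rule is an assumption, and SamplingRatioMath is a hypothetical helper.

```csharp
using System;

// Hypothetical helpers; the rounding rule AiDotNet actually uses is an assumption here.
public static class SamplingRatioMath
{
    // Expected number of synthetic rows: samplingRatio * minorityCount, rounded.
    public static int SyntheticCount(int minorityCount, double samplingRatio) =>
        (int)Math.Round(samplingRatio * minorityCount, MidpointRounding.AwayFromZero);

    // Ratio that would bring the minority class up to the majority class size.
    public static double RatioToBalance(int minorityCount, int majorityCount) =>
        (double)(majorityCount - minorityCount) / minorityCount;
}
```

For example, a dataset with 100 minority and 900 majority samples needs a ratio of (900 - 100) / 100 = 8.0, which yields 800 synthetic rows and brings the minority class to 900.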
Methods
ApplyAugmentation(Matrix<T>, AugmentationContext<T>)
Performs the actual augmentation on the input data (override of the base-class method).
protected override Matrix<T> ApplyAugmentation(Matrix<T> data, AugmentationContext<T> context)
Parameters
data (Matrix<T>): The input data.
context (AugmentationContext<T>): The augmentation context.
Returns
Matrix<T>: The augmented data.
ApplySmoteWithLabels(Matrix<T>, Vector<T>, AugmentationContext<T>)
Applies SMOTE and returns the original minority samples combined with the generated synthetic samples, together with their labels.
public (Matrix<T> Data, Vector<T> Labels) ApplySmoteWithLabels(Matrix<T> minorityData, Vector<T> minorityLabels, AugmentationContext<T> context)
Parameters
minorityData (Matrix<T>): Matrix containing only minority class samples.
minorityLabels (Vector<T>): Labels for the minority class samples, one per row of minorityData.
context (AugmentationContext<T>): The augmentation context.
Returns
(Matrix<T> Data, Vector<T> Labels): A tuple of (combined data, combined labels) containing both the original and the synthetic samples.
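The combined result returned by ApplySmoteWithLabels can be pictured as the original minority rows followed by the synthetic rows, with every synthetic row labeled with the minority class. The sketch below assumes that ordering and a single minority label; CombineSketch is hypothetical and works on plain arrays rather than Matrix<T> and Vector<T>.

```csharp
using System;
using System.Linq;

// Hypothetical sketch (not the AiDotNet implementation): original rows first,
// then synthetic rows, each synthetic row given the minority label.
public static class CombineSketch
{
    public static (double[][] Data, double[] Labels) Combine(
        double[][] minorityData, double[] minorityLabels, double[][] synthetic)
    {
        if (minorityLabels.Length == 0 || minorityData.Length != minorityLabels.Length)
            throw new ArgumentException("Each minority row needs exactly one label.");

        // Synthetic samples inherit the (single) minority class label.
        double minorityLabel = minorityLabels[0];
        double[][] data = minorityData.Concat(synthetic).ToArray();
        double[] labels = minorityLabels
            .Concat(Enumerable.Repeat(minorityLabel, synthetic.Length))
            .ToArray();
        return (data, labels);
    }
}
```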
GenerateSyntheticSamples(Matrix<T>, AugmentationContext<T>)
Applies SMOTE to generate synthetic samples for the minority class.
public Matrix<T> GenerateSyntheticSamples(Matrix<T> minorityData, AugmentationContext<T> context)
Parameters
minorityData (Matrix<T>): Matrix containing only minority class samples.
context (AugmentationContext<T>): The augmentation context.
Returns
Matrix<T>: Matrix containing only the synthetic samples (the original data is NOT included).
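Because GenerateSyntheticSamples expects only minority-class rows and returns only the new rows, the caller typically filters the full dataset by label first and appends the result afterwards. The hypothetical MinoritySplit helper below sketches that filtering step on plain arrays; it is not part of the AiDotNet API.

```csharp
using System;
using System.Linq;

// Hypothetical preprocessing helper: select the minority-class rows that
// GenerateSyntheticSamples expects as input.
public static class MinoritySplit
{
    // Returns the label with the fewest rows (ties go to the label seen first).
    public static double MinorityLabel(double[] labels) =>
        labels.GroupBy(l => l).OrderBy(g => g.Count()).First().Key;

    // Selects the rows whose label equals `label`, preserving their order.
    public static double[][] RowsWithLabel(double[][] data, double[] labels, double label) =>
        data.Where((_, i) => labels[i] == label).ToArray();
}
```

After generation, append the synthetic rows, each labeled with the minority label, to the full training set, since the returned matrix does not include the original data.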
GetParameters()
Gets the parameters of this augmentation.
public override IDictionary<string, object> GetParameters()
Returns
IDictionary<string, object>: A dictionary mapping parameter names to their values.