Class Samplers

Namespace
AiDotNet.Data.Sampling
Assembly
AiDotNet.dll

Static factory class for creating data samplers with beginner-friendly methods.

public static class Samplers
Inheritance
object
Samplers

Remarks

Samplers provides factory methods for creating various sampling strategies used during training. These samplers control which data points are selected and in what order.

For Beginners: Sampling strategies determine how you pick data from your dataset. Different strategies can help with:

  • Class imbalance (use weighted sampling)
  • Curriculum learning (start with easy examples, progress to hard)
  • Active learning (focus on uncertain examples)

Common Patterns:

// Random sampling (default, good for most cases)
var randomSampler = Samplers.Random(dataSize);

// Balanced sampling for imbalanced classes
var balancedSampler = Samplers.Balanced(labels, numClasses);

// Curriculum learning (easy to hard)
var curriculumSampler = Samplers.Curriculum(difficulties, totalEpochs);

Methods

ActiveLearning(int, ActiveLearningStrategy, double, int?)

Creates an active learning sampler that prioritizes uncertain samples.

public static ActiveLearningSampler<double> ActiveLearning(int datasetSize, ActiveLearningStrategy strategy = ActiveLearningStrategy.Uncertainty, double diversityWeight = 0.3, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

strategy ActiveLearningStrategy

The active learning selection strategy.

diversityWeight double

Weight for diversity in hybrid strategy (0-1).

seed int?

Optional random seed for reproducibility.

Returns

ActiveLearningSampler<double>

An active learning sampler.

Remarks

For Beginners: Active learning prioritizes samples where the model is most uncertain. This is especially useful when you can only label a limited number of samples. Call MarkAsLabeled() to mark samples that have been labeled, and UpdateUncertainty() to update uncertainty scores after each batch.

Common strategies:

  • Uncertainty: Focus on samples with highest uncertainty (entropy, margin, etc.)
  • Diversity: Focus on diverse samples using clustering
  • Hybrid: Combine uncertainty and diversity
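
Example (sketch): the factory call below uses the documented signature; the per-batch update calls are shown commented out with assumed parameter shapes, since only the method names MarkAsLabeled and UpdateUncertainty are documented here.

// Hybrid strategy: balance uncertainty against diversity.
var sampler = Samplers.ActiveLearning(
    datasetSize: 10_000,
    strategy: ActiveLearningStrategy.Hybrid,
    diversityWeight: 0.3,
    seed: 42);

// After each model pass (parameter shapes assumed):
// sampler.UpdateUncertainty(sampleIndex, uncertaintyScore);
// sampler.MarkAsLabeled(sampleIndex);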

ActiveLearning<T>(int, ActiveLearningStrategy, double, int?)

Creates an active learning sampler that prioritizes uncertain samples.

public static ActiveLearningSampler<T> ActiveLearning<T>(int datasetSize, ActiveLearningStrategy strategy = ActiveLearningStrategy.Uncertainty, double diversityWeight = 0.3, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

strategy ActiveLearningStrategy

The active learning selection strategy.

diversityWeight double

Weight for diversity in hybrid strategy (0-1).

seed int?

Optional random seed for reproducibility.

Returns

ActiveLearningSampler<T>

An active learning sampler.

Type Parameters

T

The numeric type for uncertainty scores.

Remarks

For Beginners: Active learning prioritizes samples where the model is most uncertain. This is especially useful when you can only label a limited number of samples. Call MarkAsLabeled() to mark samples that have been labeled, and UpdateUncertainty() to update uncertainty scores after each batch.

Common strategies:

  • Uncertainty: Focus on samples with highest uncertainty (entropy, margin, etc.)
  • Diversity: Focus on diverse samples using clustering
  • Hybrid: Combine uncertainty and diversity

Balanced(IReadOnlyList<int>, int, int?)

Creates a balanced sampler that oversamples minority classes.

public static WeightedSampler<double> Balanced(IReadOnlyList<int> labels, int numClasses, int? seed = null)

Parameters

labels IReadOnlyList<int>

The class labels for each sample.

numClasses int

The number of classes.

seed int?

Optional random seed for reproducibility.

Returns

WeightedSampler<double>

A weighted sampler configured for class balancing.

Remarks

For Beginners: Use this for imbalanced datasets where some classes have many fewer examples than others. The sampler will select minority class examples more often to balance the training.

Example:

// If you have 1000 samples of class A and 100 samples of class B,
// this sampler will pick class B samples ~10x more often
var sampler = Samplers.Balanced(labels, numClasses: 2);

Curriculum(IEnumerable<double>, int, CurriculumStrategy, int?)

Creates a curriculum learning sampler that starts with easy samples.

public static CurriculumSampler<double> Curriculum(IEnumerable<double> difficulties, int totalEpochs = 100, CurriculumStrategy strategy = CurriculumStrategy.Linear, int? seed = null)

Parameters

difficulties IEnumerable<double>

Difficulty score for each sample (0 = easiest, 1 = hardest).

totalEpochs int

Total number of epochs for curriculum completion. Default is 100.

strategy CurriculumStrategy

The curriculum progression strategy.

seed int?

Optional random seed for reproducibility.

Returns

CurriculumSampler<double>

A curriculum sampler.

Remarks

For Beginners: Curriculum learning trains on easy examples first, then gradually introduces harder ones. This often leads to better and faster learning.

Example:

// Difficulty scores: 0 = easy, 1 = hard
var difficulties = ComputeDifficultyScores(data);
var sampler = Samplers.Curriculum(difficulties);  // Uses default 100 epochs

// In early epochs, mainly easy samples
// In later epochs, mix of easy and hard samples

Curriculum<T>(IEnumerable<T>, int, CurriculumStrategy, int?)

Creates a curriculum learning sampler that starts with easy samples.

public static CurriculumSampler<T> Curriculum<T>(IEnumerable<T> difficulties, int totalEpochs, CurriculumStrategy strategy = CurriculumStrategy.Linear, int? seed = null)

Parameters

difficulties IEnumerable<T>

Difficulty score for each sample (0 = easiest, 1 = hardest).

totalEpochs int

Total number of epochs for curriculum completion.

strategy CurriculumStrategy

The curriculum progression strategy.

seed int?

Optional random seed for reproducibility.

Returns

CurriculumSampler<T>

A curriculum sampler.

Type Parameters

T

The numeric type for difficulty scores.

Remarks

For Beginners: Curriculum learning trains on easy examples first, then gradually introduces harder ones. This often leads to better and faster learning.

Example:

// Difficulty scores: 0 = easy, 1 = hard
var difficulties = ComputeDifficultyScores(data);
var sampler = Samplers.Curriculum(difficulties, totalEpochs: 100);

// In early epochs, mainly easy samples
// In later epochs, mix of easy and hard samples

Importance(int, double, bool, int?)

Creates an importance sampler that prioritizes high-loss samples.

public static ImportanceSampler<double> Importance(int datasetSize, double smoothingFactor = 0.2, bool stabilize = true, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

smoothingFactor double

Smoothing factor to prevent extreme sampling (0.1-0.5 recommended).

stabilize bool

Whether to clip extreme importance values.

seed int?

Optional random seed for reproducibility.

Returns

ImportanceSampler<double>

An importance sampler.

Remarks

For Beginners: Importance sampling focuses training on samples the model currently gets wrong (high loss). This can speed up training by focusing on hard examples. Call SetImportances() or UpdateImportance() after each batch to update importance scores.
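
Example (sketch): the factory call uses the documented signature; the update calls are shown commented out with assumed parameter shapes, since only the names SetImportances and UpdateImportance are documented here.

var sampler = Samplers.Importance(datasetSize: 50_000, smoothingFactor: 0.2, stabilize: true, seed: 7);

// After each batch, report per-sample losses back to the sampler:
// sampler.UpdateImportance(sampleIndex, lossValue);   // assumed per-sample shape
// sampler.SetImportances(perSampleLosses);            // assumed bulk shape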

Importance<T>(int, double, bool, int?)

Creates an importance sampler that prioritizes high-loss samples.

public static ImportanceSampler<T> Importance<T>(int datasetSize, double smoothingFactor = 0.2, bool stabilize = true, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

smoothingFactor double

Smoothing factor to prevent extreme sampling (0.1-0.5 recommended).

stabilize bool

Whether to clip extreme importance values.

seed int?

Optional random seed for reproducibility.

Returns

ImportanceSampler<T>

An importance sampler.

Type Parameters

T

The numeric type for importance scores.

Remarks

For Beginners: Importance sampling focuses training on samples the model currently gets wrong (high loss). This can speed up training by focusing on hard examples. Call SetImportances() or UpdateImportance() after each batch to update importance scores.

Random(int, int?)

Creates a random sampler that shuffles data each epoch.

public static RandomSampler Random(int dataSize, int? seed = null)

Parameters

dataSize int

The total number of samples.

seed int?

Optional random seed for reproducibility.

Returns

RandomSampler

A random sampler.

Remarks

For Beginners: This is the default and most common sampler. It randomly shuffles your data each epoch, which helps the model generalize better.
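
Example: pass a seed when you need the same shuffle order on every run (trainingData below is a stand-in for your own collection).

// Reproducible shuffling across runs.
var sampler = Samplers.Random(dataSize: trainingData.Count, seed: 1234);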

SelfPaced(int, double, double, int, int?)

Creates a self-paced learning sampler with default parameters.

public static SelfPacedSampler<double> SelfPaced(int datasetSize, double initialLambda = 0.1, double lambdaGrowthRate = 0.1, int totalEpochs = 100, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

initialLambda double

Starting pace parameter (lower = stricter selection). Default is 0.1.

lambdaGrowthRate double

How much lambda increases each epoch. Default is 0.1.

totalEpochs int

Total number of training epochs for the self-paced schedule. Default is 100.

seed int?

Optional random seed for reproducibility.

Returns

SelfPacedSampler<double>

A self-paced sampler.

Remarks

For Beginners: Like curriculum learning, but the difficulty is determined by the model's loss on each sample. Samples the model finds easy are included first. Call UpdateLoss() or UpdateLosses() after each batch to update sample losses.
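
Example (sketch): the factory call uses the documented defaults; the loss-update calls are shown commented out with assumed parameter shapes, since only the names UpdateLoss and UpdateLosses are documented here.

var sampler = Samplers.SelfPaced(datasetSize: 20_000, initialLambda: 0.1, lambdaGrowthRate: 0.1, totalEpochs: 100);

// After each batch, report losses so the sampler can widen its selection as lambda grows:
// sampler.UpdateLoss(sampleIndex, loss);        // assumed per-sample shape
// sampler.UpdateLosses(indices, losses);        // assumed bulk shape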

SelfPaced<T>(int, T, T, int, int?)

Creates a self-paced learning sampler that adapts based on model performance.

public static SelfPacedSampler<T> SelfPaced<T>(int datasetSize, T initialLambda, T lambdaGrowthRate, int totalEpochs = 100, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

initialLambda T

Starting pace parameter (lower = stricter selection).

lambdaGrowthRate T

How much lambda increases each epoch.

totalEpochs int

Total number of training epochs for the self-paced schedule. Default is 100.

seed int?

Optional random seed for reproducibility.

Returns

SelfPacedSampler<T>

A self-paced sampler.

Type Parameters

T

The numeric type for losses.

Remarks

For Beginners: Like curriculum learning, but the difficulty is determined by the model's loss on each sample. Samples the model finds easy are included first. Call UpdateLoss() or UpdateLosses() after each batch to update sample losses.
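
Example: unlike the double overload, initialLambda and lambdaGrowthRate have no defaults here, so both must be supplied in T (float in this illustration).

var sampler = Samplers.SelfPaced<float>(
    datasetSize: 20_000,
    initialLambda: 0.1f,
    lambdaGrowthRate: 0.1f,
    totalEpochs: 100);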

Sequential(int)

Creates a sequential sampler that iterates through data in order.

public static SequentialSampler Sequential(int dataSize)

Parameters

dataSize int

The total number of samples.

Returns

SequentialSampler

A sequential sampler.

Remarks

For Beginners: Use this when you want to iterate through data in the same order every time. Useful for validation/testing or when order matters.
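
Example: deterministic iteration is typical for evaluation (validationData below is a stand-in for your own collection).

// Validation data is visited in the same order every epoch.
var valSampler = Samplers.Sequential(dataSize: validationData.Count);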

Stratified(IReadOnlyList<int>, int, int?)

Creates a stratified sampler that maintains class proportions in each batch.

public static StratifiedSampler Stratified(IReadOnlyList<int> labels, int numClasses, int? seed = null)

Parameters

labels IReadOnlyList<int>

The class labels for each sample.

numClasses int

The number of classes.

seed int?

Optional random seed for reproducibility.

Returns

StratifiedSampler

A stratified sampler.

Remarks

For Beginners: Use this when you want each batch to have the same proportion of classes as your full dataset. This helps prevent batches that are all one class.
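
Example: labels is any per-sample list of class indices you already have (LoadLabels is a hypothetical helper).

// Each batch keeps roughly the same 3-class proportions as the full dataset.
IReadOnlyList<int> labels = LoadLabels();
var sampler = Samplers.Stratified(labels, numClasses: 3, seed: 42);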

Subset(IEnumerable<int>, bool, int?)

Creates a subset sampler that samples from specific indices.

public static SubsetSampler Subset(IEnumerable<int> indices, bool shuffle = false, int? seed = null)

Parameters

indices IEnumerable<int>

The indices to sample from.

shuffle bool

Whether to shuffle the subset indices.

seed int?

Optional random seed for reproducibility.

Returns

SubsetSampler

A subset sampler.

Remarks

For Beginners: Use this when you only want to train on a portion of your data, or when you've pre-computed a specific sampling order.
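
Example: training on a pre-selected 80% split (the index range is chosen purely for illustration; Enumerable.Range requires System.Linq).

// Train on the first 8,000 of 10,000 samples, shuffled within that subset.
var trainIndices = Enumerable.Range(0, 8_000);
var sampler = Samplers.Subset(trainIndices, shuffle: true, seed: 42);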

Weighted<T>(IEnumerable<T>, int, bool, int?)

Creates a weighted sampler that samples based on per-sample weights.

public static WeightedSampler<T> Weighted<T>(IEnumerable<T> weights, int numSamples, bool replacement = true, int? seed = null)

Parameters

weights IEnumerable<T>

The weight for each sample (higher = more likely to be sampled).

numSamples int

Number of samples to draw per epoch.

replacement bool

Whether to sample with replacement. Default is true.

seed int?

Optional random seed for reproducibility.

Returns

WeightedSampler<T>

A weighted sampler.

Type Parameters

T

The numeric type for weights.

Remarks

For Beginners: Use this when some samples are more important than others. Higher weights make a sample more likely to be selected.
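
Example: up-weighting one slice of the data (the split point and weight values are arbitrary illustrations; Enumerable.Range and Select require System.Linq).

// Give the last 1,000 of 10,000 samples twice the selection probability,
// e.g. because they are the most recent data.
var weights = Enumerable.Range(0, 10_000).Select(i => i >= 9_000 ? 2.0 : 1.0);
var sampler = Samplers.Weighted(weights, numSamples: 10_000, replacement: true, seed: 42);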