Class Samplers

Namespace
AiDotNet.Data.Sampling
Assembly
AiDotNet.dll

Static factory class for creating data samplers with beginner-friendly methods.

public static class Samplers
Inheritance
object
Samplers

Remarks

Samplers provides factory methods for creating various sampling strategies used during training. These samplers control which data points are selected and in what order.

For Beginners: Sampling strategies determine how you pick data from your dataset. Different strategies can help with:

  • Class imbalance (use weighted sampling)
  • Curriculum learning (start with easy examples, progress to hard)
  • Active learning (focus on uncertain examples)

Common Patterns:

// Random sampling (default, good for most cases)
var randomSampler = Samplers.Random(dataSize);

// Balanced sampling for imbalanced classes
var balancedSampler = Samplers.Balanced(labels, numClasses);

// Curriculum learning (easy to hard)
var curriculumSampler = Samplers.Curriculum(difficulties, totalEpochs);

Methods

ActiveLearning(int, ActiveLearningStrategy, double, int?)

Creates an active learning sampler that prioritizes uncertain samples.

public static ActiveLearningSampler<double> ActiveLearning(int datasetSize, ActiveLearningStrategy strategy = ActiveLearningStrategy.Uncertainty, double diversityWeight = 0.3, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

strategy ActiveLearningStrategy

The active learning selection strategy.

diversityWeight double

Weight for diversity in hybrid strategy (0-1).

seed int?

Optional random seed for reproducibility.

Returns

ActiveLearningSampler<double>

An active learning sampler.

Remarks

For Beginners: Active learning prioritizes samples where the model is most uncertain. This is especially useful when you can only label a limited number of samples. Call MarkAsLabeled() to mark samples that have been labeled, and UpdateUncertainty() to update uncertainty scores after each batch.

Common strategies:

  • Uncertainty: Focus on samples with highest uncertainty (entropy, margin, etc.)
  • Diversity: Focus on diverse samples using clustering
  • Hybrid: Combine uncertainty and diversity
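
Example (sketch): the factory call below uses the documented signature; the per-batch update calls are shown commented out with assumed parameter shapes, since only the method names MarkAsLabeled and UpdateUncertainty are documented here.

// Hybrid strategy: balance uncertainty against diversity.
var sampler = Samplers.ActiveLearning(
    datasetSize: 10_000,
    strategy: ActiveLearningStrategy.Hybrid,
    diversityWeight: 0.3,
    seed: 42);

// After each model pass (parameter shapes assumed):
// sampler.UpdateUncertainty(sampleIndex, uncertaintyScore);
// sampler.MarkAsLabeled(sampleIndex);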

ActiveLearning<T>(int, ActiveLearningStrategy, double, int?)

Creates an active learning sampler that prioritizes uncertain samples.

public static ActiveLearningSampler<T> ActiveLearning<T>(int datasetSize, ActiveLearningStrategy strategy = ActiveLearningStrategy.Uncertainty, double diversityWeight = 0.3, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

strategy ActiveLearningStrategy

The active learning selection strategy.

diversityWeight double

Weight for diversity in hybrid strategy (0-1).

seed int?

Optional random seed for reproducibility.

Returns

ActiveLearningSampler<T>

An active learning sampler.

Type Parameters

T

The numeric type for uncertainty scores.

Remarks

For Beginners: Active learning prioritizes samples where the model is most uncertain. This is especially useful when you can only label a limited number of samples. Call MarkAsLabeled() to mark samples that have been labeled, and UpdateUncertainty() to update uncertainty scores after each batch.

Common strategies:

  • Uncertainty: Focus on samples with highest uncertainty (entropy, margin, etc.)
  • Diversity: Focus on diverse samples using clustering
  • Hybrid: Combine uncertainty and diversity

Balanced(IReadOnlyList<int>, int, int?)

Creates a balanced sampler that oversamples minority classes.

public static WeightedSampler<double> Balanced(IReadOnlyList<int> labels, int numClasses, int? seed = null)

Parameters

labels IReadOnlyList<int>

The class labels for each sample.

numClasses int

The number of classes.

seed int?

Optional random seed for reproducibility.

Returns

WeightedSampler<double>

A weighted sampler configured for class balancing.

Remarks

For Beginners: Use this for imbalanced datasets where some classes have many fewer examples than others. The sampler will select minority class examples more often to balance the training.

Example:

// If you have 1000 samples of class A and 100 samples of class B,
// this sampler will pick class B samples ~10x more often
var sampler = Samplers.Balanced(labels, numClasses: 2);

Curriculum(IEnumerable<double>, int, CurriculumStrategy, int?)

Creates a curriculum learning sampler that starts with easy samples.

public static CurriculumSampler<double> Curriculum(IEnumerable<double> difficulties, int totalEpochs = 100, CurriculumStrategy strategy = CurriculumStrategy.Linear, int? seed = null)

Parameters

difficulties IEnumerable<double>

Difficulty score for each sample (0 = easiest, 1 = hardest).

totalEpochs int

Total number of epochs for curriculum completion. Default is 100.

strategy CurriculumStrategy

The curriculum progression strategy.

seed int?

Optional random seed for reproducibility.

Returns

CurriculumSampler<double>

A curriculum sampler.

Remarks

For Beginners: Curriculum learning trains on easy examples first, then gradually introduces harder ones. This often leads to better and faster learning.

Example:

// Difficulty scores: 0 = easy, 1 = hard
var difficulties = ComputeDifficultyScores(data);
var sampler = Samplers.Curriculum(difficulties);  // Uses default 100 epochs

// In early epochs, mainly easy samples
// In later epochs, mix of easy and hard samples

Curriculum<T>(IEnumerable<T>, int, CurriculumStrategy, int?)

Creates a curriculum learning sampler that starts with easy samples.

public static CurriculumSampler<T> Curriculum<T>(IEnumerable<T> difficulties, int totalEpochs, CurriculumStrategy strategy = CurriculumStrategy.Linear, int? seed = null)

Parameters

difficulties IEnumerable<T>

Difficulty score for each sample (0 = easiest, 1 = hardest).

totalEpochs int

Total number of epochs for curriculum completion.

strategy CurriculumStrategy

The curriculum progression strategy.

seed int?

Optional random seed for reproducibility.

Returns

CurriculumSampler<T>

A curriculum sampler.

Type Parameters

T

The numeric type for difficulty scores.

Remarks

For Beginners: Curriculum learning trains on easy examples first, then gradually introduces harder ones. This often leads to better and faster learning.

Example:

// Difficulty scores: 0 = easy, 1 = hard
var difficulties = ComputeDifficultyScores(data);
var sampler = Samplers.Curriculum(difficulties, totalEpochs: 100);

// In early epochs, mainly easy samples
// In later epochs, mix of easy and hard samples

Importance(int, double, bool, int?)

Creates an importance sampler that prioritizes high-loss samples.

public static ImportanceSampler<double> Importance(int datasetSize, double smoothingFactor = 0.2, bool stabilize = true, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

smoothingFactor double

Smoothing factor to prevent extreme sampling (0.1-0.5 recommended).

stabilize bool

Whether to clip extreme importance values.

seed int?

Optional random seed for reproducibility.

Returns

ImportanceSampler<double>

An importance sampler.

Remarks

For Beginners: Importance sampling focuses training on samples the model currently gets wrong (high loss). This can speed up training by focusing on hard examples. Call SetImportances() or UpdateImportance() after each batch to update importance scores.
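
Example (sketch): the factory call uses the documented signature; the update calls are shown commented out with assumed parameter shapes, since only the names SetImportances and UpdateImportance are documented here.

var sampler = Samplers.Importance(datasetSize: 50_000, smoothingFactor: 0.2, stabilize: true, seed: 7);

// After each batch, report per-sample losses back to the sampler:
// sampler.UpdateImportance(sampleIndex, lossValue);   // assumed per-sample shape
// sampler.SetImportances(perSampleLosses);            // assumed bulk shape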

Importance<T>(int, double, bool, int?)

Creates an importance sampler that prioritizes high-loss samples.

public static ImportanceSampler<T> Importance<T>(int datasetSize, double smoothingFactor = 0.2, bool stabilize = true, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

smoothingFactor double

Smoothing factor to prevent extreme sampling (0.1-0.5 recommended).

stabilize bool

Whether to clip extreme importance values.

seed int?

Optional random seed for reproducibility.

Returns

ImportanceSampler<T>

An importance sampler.

Type Parameters

T

The numeric type for importance scores.

Remarks

For Beginners: Importance sampling focuses training on samples the model currently gets wrong (high loss). This can speed up training by focusing on hard examples. Call SetImportances() or UpdateImportance() after each batch to update importance scores.

Random(int, int?)

Creates a random sampler that shuffles data each epoch.

public static RandomSampler Random(int dataSize, int? seed = null)

Parameters

dataSize int

The total number of samples.

seed int?

Optional random seed for reproducibility.

Returns

RandomSampler

A random sampler.

Remarks

For Beginners: This is the default and most common sampler. It randomly shuffles your data each epoch, which helps the model generalize better.
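
Example: pass a seed when you need the same shuffle order on every run (trainingData below is a stand-in for your own collection).

// Reproducible shuffling across runs.
var sampler = Samplers.Random(dataSize: trainingData.Count, seed: 1234);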

SelfPaced(int, double, double, int, int?)

Creates a self-paced learning sampler with default parameters.

public static SelfPacedSampler<double> SelfPaced(int datasetSize, double initialLambda = 0.1, double lambdaGrowthRate = 0.1, int totalEpochs = 100, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

initialLambda double

Starting pace parameter (lower = stricter selection). Default is 0.1.

lambdaGrowthRate double

How much lambda increases each epoch. Default is 0.1.

totalEpochs int

Total number of training epochs for the self-paced schedule. Default is 100.

seed int?

Optional random seed for reproducibility.

Returns

SelfPacedSampler<double>

A self-paced sampler.

Remarks

For Beginners: Like curriculum learning, but the difficulty is determined by the model's loss on each sample. Samples the model finds easy are included first. Call UpdateLoss() or UpdateLosses() after each batch to update sample losses.
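
Example (sketch): the factory call uses the documented defaults; the loss-update calls are shown commented out with assumed parameter shapes, since only the names UpdateLoss and UpdateLosses are documented here.

var sampler = Samplers.SelfPaced(datasetSize: 20_000, initialLambda: 0.1, lambdaGrowthRate: 0.1, totalEpochs: 100);

// After each batch, report losses so the sampler can widen its selection as lambda grows:
// sampler.UpdateLoss(sampleIndex, loss);        // assumed per-sample shape
// sampler.UpdateLosses(indices, losses);        // assumed bulk shape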

SelfPaced<T>(int, T, T, int, int?)

Creates a self-paced learning sampler that adapts based on model performance.

public static SelfPacedSampler<T> SelfPaced<T>(int datasetSize, T initialLambda, T lambdaGrowthRate, int totalEpochs = 100, int? seed = null)

Parameters

datasetSize int

The total number of samples in the dataset.

initialLambda T

Starting pace parameter (lower = stricter selection).

lambdaGrowthRate T

How much lambda increases each epoch.

totalEpochs int

Total number of training epochs for the self-paced schedule. Default is 100.

seed int?

Optional random seed for reproducibility.

Returns

SelfPacedSampler<T>

A self-paced sampler.

Type Parameters

T

The numeric type for losses.

Remarks

For Beginners: Like curriculum learning, but the difficulty is determined by the model's loss on each sample. Samples the model finds easy are included first. Call UpdateLoss() or UpdateLosses() after each batch to update sample losses.
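
Example: unlike the double overload, initialLambda and lambdaGrowthRate have no defaults here, so both must be supplied in T (float in this illustration).

var sampler = Samplers.SelfPaced<float>(
    datasetSize: 20_000,
    initialLambda: 0.1f,
    lambdaGrowthRate: 0.1f,
    totalEpochs: 100);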

Sequential(int)

Creates a sequential sampler that iterates through data in order.

public static SequentialSampler Sequential(int dataSize)

Parameters

dataSize int

The total number of samples.

Returns

SequentialSampler

A sequential sampler.

Remarks

For Beginners: Use this when you want to iterate through data in the same order every time. Useful for validation/testing or when order matters.
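
Example: deterministic iteration is typical for evaluation (validationData below is a stand-in for your own collection).

// Validation data is visited in the same order every epoch.
var valSampler = Samplers.Sequential(dataSize: validationData.Count);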

Stratified(IReadOnlyList<int>, int, int?)

Creates a stratified sampler that maintains class proportions in each batch.

public static StratifiedSampler Stratified(IReadOnlyList<int> labels, int numClasses, int? seed = null)

Parameters

labels IReadOnlyList<int>

The class labels for each sample.

numClasses int

The number of classes.

seed int?

Optional random seed for reproducibility.

Returns

StratifiedSampler

A stratified sampler.

Remarks

For Beginners: Use this when you want each batch to have the same proportion of classes as your full dataset. This helps prevent batches that are all one class.
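
Example: labels is any per-sample list of class indices you already have (LoadLabels is a hypothetical helper).

// Each batch keeps roughly the same 3-class proportions as the full dataset.
IReadOnlyList<int> labels = LoadLabels();
var sampler = Samplers.Stratified(labels, numClasses: 3, seed: 42);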

Subset(IEnumerable<int>, bool, int?)

Creates a subset sampler that samples from specific indices.

public static SubsetSampler Subset(IEnumerable<int> indices, bool shuffle = false, int? seed = null)

Parameters

indices IEnumerable<int>

The indices to sample from.

shuffle bool

Whether to shuffle the subset indices.

seed int?

Optional random seed for reproducibility.

Returns

SubsetSampler

A subset sampler.

Remarks

For Beginners: Use this when you only want to train on a portion of your data, or when you've pre-computed a specific sampling order.
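
Example: training on a pre-selected 80% split (the index range is chosen purely for illustration; Enumerable.Range requires System.Linq).

// Train on the first 8,000 of 10,000 samples, shuffled within that subset.
var trainIndices = Enumerable.Range(0, 8_000);
var sampler = Samplers.Subset(trainIndices, shuffle: true, seed: 42);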

Weighted<T>(IEnumerable<T>, int, bool, int?)

Creates a weighted sampler that samples based on per-sample weights.

public static WeightedSampler<T> Weighted<T>(IEnumerable<T> weights, int numSamples, bool replacement = true, int? seed = null)

Parameters

weights IEnumerable<T>

The weight for each sample (higher = more likely to be sampled).

numSamples int

Number of samples to draw per epoch.

replacement bool

Whether to sample with replacement. Default is true.

seed int?

Optional random seed for reproducibility.

Returns

WeightedSampler<T>

A weighted sampler.

Type Parameters

T

The numeric type for weights.

Remarks

For Beginners: Use this when some samples are more important than others. Higher weights make a sample more likely to be selected.
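
Example: up-weighting one slice of the data (the split point and weight values are arbitrary illustrations; Enumerable.Range and Select require System.Linq).

// Give the last 1,000 of 10,000 samples twice the selection probability,
// e.g. because they are the most recent data.
var weights = Enumerable.Range(0, 10_000).Select(i => i >= 9_000 ? 2.0 : 1.0);
var sampler = Samplers.Weighted(weights, numSamples: 10_000, replacement: true, seed: 42);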