Class Samplers
Static factory class for creating data samplers with beginner-friendly methods.
public static class Samplers
Remarks
Samplers provides factory methods for creating various sampling strategies used during training. These samplers control which data points are selected and in what order.
For Beginners: Sampling strategies determine how you pick data from your dataset. Different strategies can help with:
- Class imbalance (use weighted sampling)
- Curriculum learning (start with easy examples, progress to hard)
- Active learning (focus on uncertain examples)
Common Patterns:
// Random sampling (default, good for most cases)
var sampler = Samplers.Random(dataSize);
// Balanced sampling for imbalanced classes
var sampler = Samplers.Balanced(labels, numClasses);
// Curriculum learning (easy to hard)
var sampler = Samplers.Curriculum(difficulties, totalEpochs);
Methods
ActiveLearning(int, ActiveLearningStrategy, double, int?)
Creates an active learning sampler that prioritizes uncertain samples.
public static ActiveLearningSampler<double> ActiveLearning(int datasetSize, ActiveLearningStrategy strategy = ActiveLearningStrategy.Uncertainty, double diversityWeight = 0.3, int? seed = null)
Parameters
datasetSize (int): The total number of samples in the dataset.
strategy (ActiveLearningStrategy): The active learning selection strategy.
diversityWeight (double): Weight for diversity in hybrid strategy (0-1).
seed (int?): Optional random seed for reproducibility.
Returns
- ActiveLearningSampler<double>
An active learning sampler.
Remarks
For Beginners: Active learning prioritizes samples where the model is most uncertain. This is especially useful when you can only label a limited number of samples. Call MarkAsLabeled() to mark samples that have been labeled, and UpdateUncertainty() to update uncertainty scores after each batch.
Common strategies:
- Uncertainty: Focus on samples with highest uncertainty (entropy, margin, etc.)
- Diversity: Focus on diverse samples using clustering
- Hybrid: Combine uncertainty and diversity
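The workflow described in the remarks could be sketched as follows. MarkAsLabeled and UpdateUncertainty are the calls named above; the oracle and model objects, the batch enumeration, and the exact call signatures are hypothetical placeholders:

```csharp
var sampler = Samplers.ActiveLearning(
    datasetSize: 10000,
    strategy: ActiveLearningStrategy.Hybrid,
    diversityWeight: 0.3,
    seed: 42);

// Hypothetical annotation loop: label the samples the sampler selects,
// then report fresh uncertainty scores back after each batch.
foreach (int index in nextBatch)                      // indices chosen by the sampler
{
    var label = oracle.Label(index);                  // hypothetical human/oracle labeling
    sampler.MarkAsLabeled(index);                     // documented above
}
foreach (var (index, score) in model.Uncertainties()) // hypothetical model query
{
    sampler.UpdateUncertainty(index, score);          // documented above; signature assumed
}
```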
ActiveLearning<T>(int, ActiveLearningStrategy, double, int?)
Creates an active learning sampler that prioritizes uncertain samples.
public static ActiveLearningSampler<T> ActiveLearning<T>(int datasetSize, ActiveLearningStrategy strategy = ActiveLearningStrategy.Uncertainty, double diversityWeight = 0.3, int? seed = null)
Parameters
datasetSize (int): The total number of samples in the dataset.
strategy (ActiveLearningStrategy): The active learning selection strategy.
diversityWeight (double): Weight for diversity in hybrid strategy (0-1).
seed (int?): Optional random seed for reproducibility.
Returns
- ActiveLearningSampler<T>
An active learning sampler.
Type Parameters
T: The numeric type for uncertainty scores.
Remarks
For Beginners: Active learning prioritizes samples where the model is most uncertain. This is especially useful when you can only label a limited number of samples. Call MarkAsLabeled() to mark samples that have been labeled, and UpdateUncertainty() to update uncertainty scores after each batch.
Common strategies:
- Uncertainty: Focus on samples with highest uncertainty (entropy, margin, etc.)
- Diversity: Focus on diverse samples using clustering
- Hybrid: Combine uncertainty and diversity
Balanced(IReadOnlyList<int>, int, int?)
Creates a balanced sampler that oversamples minority classes.
public static WeightedSampler<double> Balanced(IReadOnlyList<int> labels, int numClasses, int? seed = null)
Parameters
labels (IReadOnlyList<int>): The class labels for each sample.
numClasses (int): The number of classes.
seed (int?): Optional random seed for reproducibility.
Returns
- WeightedSampler<double>
A weighted sampler configured for class balancing.
Remarks
For Beginners: Use this for imbalanced datasets where some classes have many fewer examples than others. The sampler will select minority class examples more often to balance the training.
Example:
// If you have 1000 samples of class A and 100 samples of class B,
// this sampler will pick class B samples ~10x more often
var sampler = Samplers.Balanced(labels, numClasses: 2);
Curriculum(IEnumerable<double>, int, CurriculumStrategy, int?)
Creates a curriculum learning sampler that starts with easy samples.
public static CurriculumSampler<double> Curriculum(IEnumerable<double> difficulties, int totalEpochs = 100, CurriculumStrategy strategy = CurriculumStrategy.Linear, int? seed = null)
Parameters
difficulties (IEnumerable<double>): Difficulty score for each sample (0 = easiest, 1 = hardest).
totalEpochs (int): Total number of epochs for curriculum completion. Default is 100.
strategy (CurriculumStrategy): The curriculum progression strategy.
seed (int?): Optional random seed for reproducibility.
Returns
- CurriculumSampler<double>
A curriculum sampler.
Remarks
For Beginners: Curriculum learning trains on easy examples first, then gradually introduces harder ones. This often leads to better and faster learning.
Example:
// Difficulty scores: 0 = easy, 1 = hard
var difficulties = ComputeDifficultyScores(data);
var sampler = Samplers.Curriculum(difficulties); // Uses default 100 epochs
// In early epochs, mainly easy samples
// In later epochs, mix of easy and hard samples
Curriculum<T>(IEnumerable<T>, int, CurriculumStrategy, int?)
Creates a curriculum learning sampler that starts with easy samples.
public static CurriculumSampler<T> Curriculum<T>(IEnumerable<T> difficulties, int totalEpochs, CurriculumStrategy strategy = CurriculumStrategy.Linear, int? seed = null)
Parameters
difficulties (IEnumerable<T>): Difficulty score for each sample (0 = easiest, 1 = hardest).
totalEpochs (int): Total number of epochs for curriculum completion.
strategy (CurriculumStrategy): The curriculum progression strategy.
seed (int?): Optional random seed for reproducibility.
Returns
- CurriculumSampler<T>
A curriculum sampler.
Type Parameters
T: The numeric type for difficulty scores.
Remarks
For Beginners: Curriculum learning trains on easy examples first, then gradually introduces harder ones. This often leads to better and faster learning.
Example:
// Difficulty scores: 0 = easy, 1 = hard
var difficulties = ComputeDifficultyScores(data);
var sampler = Samplers.Curriculum(difficulties, totalEpochs: 100);
// In early epochs, mainly easy samples
// In later epochs, mix of easy and hard samples
Importance(int, double, bool, int?)
Creates an importance sampler that prioritizes high-loss samples.
public static ImportanceSampler<double> Importance(int datasetSize, double smoothingFactor = 0.2, bool stabilize = true, int? seed = null)
Parameters
datasetSize (int): The total number of samples in the dataset.
smoothingFactor (double): Smoothing factor to prevent extreme sampling (0.1-0.5 recommended).
stabilize (bool): Whether to clip extreme importance values.
seed (int?): Optional random seed for reproducibility.
Returns
- ImportanceSampler<double>
An importance sampler.
Remarks
For Beginners: Importance sampling focuses training on samples the model currently gets wrong (high loss). This can speed up training by focusing on hard examples. Call SetImportances() or UpdateImportance() after each batch to update importance scores.
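A typical epoch with this sampler might look like the sketch below. UpdateImportance is the method named in the remarks (its exact signature is assumed here); the loader, model, and per-sample loss computation are hypothetical placeholders:

```csharp
var sampler = Samplers.Importance(datasetSize: 10000, smoothingFactor: 0.2, seed: 42);

foreach (var batchIndices in loader.GetBatches(sampler)) // hypothetical data loader
{
    double[] losses = model.TrainOnBatch(batchIndices);  // hypothetical step returning per-sample losses

    // Feed the fresh losses back so high-loss samples are drawn more often.
    for (int i = 0; i < batchIndices.Length; i++)
        sampler.UpdateImportance(batchIndices[i], losses[i]); // documented above; signature assumed
}
```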
Importance<T>(int, double, bool, int?)
Creates an importance sampler that prioritizes high-loss samples.
public static ImportanceSampler<T> Importance<T>(int datasetSize, double smoothingFactor = 0.2, bool stabilize = true, int? seed = null)
Parameters
datasetSize (int): The total number of samples in the dataset.
smoothingFactor (double): Smoothing factor to prevent extreme sampling (0.1-0.5 recommended).
stabilize (bool): Whether to clip extreme importance values.
seed (int?): Optional random seed for reproducibility.
Returns
- ImportanceSampler<T>
An importance sampler.
Type Parameters
T: The numeric type for importance scores.
Remarks
For Beginners: Importance sampling focuses training on samples the model currently gets wrong (high loss). This can speed up training by focusing on hard examples. Call SetImportances() or UpdateImportance() after each batch to update importance scores.
Random(int, int?)
Creates a random sampler that shuffles data each epoch.
public static RandomSampler Random(int dataSize, int? seed = null)
Parameters
dataSize (int): The total number of samples.
seed (int?): Optional random seed for reproducibility.
Returns
- RandomSampler
A random sampler.
Remarks
For Beginners: This is the default and most common sampler. It randomly shuffles your data each epoch, which helps the model generalize better.
SelfPaced(int, double, double, int, int?)
Creates a self-paced learning sampler with default parameters.
public static SelfPacedSampler<double> SelfPaced(int datasetSize, double initialLambda = 0.1, double lambdaGrowthRate = 0.1, int totalEpochs = 100, int? seed = null)
Parameters
datasetSize (int): The total number of samples in the dataset.
initialLambda (double): Starting pace parameter (lower = stricter selection). Default is 0.1.
lambdaGrowthRate (double): How much lambda increases each epoch. Default is 0.1.
totalEpochs (int): Total number of epochs. Default is 100.
seed (int?): Optional random seed for reproducibility.
Returns
- SelfPacedSampler<double>
A self-paced sampler.
Remarks
For Beginners: Like curriculum learning, but the difficulty is determined by the model's loss on each sample. Samples the model finds easy are included first. Call UpdateLoss() or UpdateLosses() after each batch to update sample losses.
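The training loop might look like this sketch. UpdateLosses is the method named above (its signature is assumed); the loader and model are hypothetical placeholders:

```csharp
var sampler = Samplers.SelfPaced(datasetSize: 10000, initialLambda: 0.1,
                                 lambdaGrowthRate: 0.1, totalEpochs: 100, seed: 42);

for (int epoch = 0; epoch < 100; epoch++)
{
    foreach (var batchIndices in loader.GetBatches(sampler)) // hypothetical loader
    {
        double[] losses = model.TrainOnBatch(batchIndices);  // hypothetical step returning losses
        sampler.UpdateLosses(batchIndices, losses);          // documented above; signature assumed
    }
    // As lambda grows each epoch, harder (higher-loss) samples are admitted.
}
```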
SelfPaced<T>(int, T, T, int, int?)
Creates a self-paced learning sampler that adapts based on model performance.
public static SelfPacedSampler<T> SelfPaced<T>(int datasetSize, T initialLambda, T lambdaGrowthRate, int totalEpochs = 100, int? seed = null)
Parameters
datasetSize (int): The total number of samples in the dataset.
initialLambda (T): Starting pace parameter (lower = stricter selection).
lambdaGrowthRate (T): How much lambda increases each epoch.
totalEpochs (int): Total number of epochs. Default is 100.
seed (int?): Optional random seed for reproducibility.
Returns
- SelfPacedSampler<T>
A self-paced sampler.
Type Parameters
T: The numeric type for losses.
Remarks
For Beginners: Like curriculum learning, but the difficulty is determined by the model's loss on each sample. Samples the model finds easy are included first. Call UpdateLoss() or UpdateLosses() after each batch to update sample losses.
Sequential(int)
Creates a sequential sampler that iterates through data in order.
public static SequentialSampler Sequential(int dataSize)
Parameters
dataSize (int): The total number of samples.
Returns
- SequentialSampler
A sequential sampler.
Remarks
For Beginners: Use this when you want to iterate through data in the same order every time. Useful for validation/testing or when order matters.
Stratified(IReadOnlyList<int>, int, int?)
Creates a stratified sampler that maintains class proportions in each batch.
public static StratifiedSampler Stratified(IReadOnlyList<int> labels, int numClasses, int? seed = null)
Parameters
labels (IReadOnlyList<int>): The class labels for each sample.
numClasses (int): The number of classes.
seed (int?): Optional random seed for reproducibility.
Returns
- StratifiedSampler
A stratified sampler.
Remarks
For Beginners: Use this when you want each batch to have the same proportion of classes as your full dataset. This helps prevent batches that are all one class.
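For instance, with an imbalanced binary dataset (the 90/10 split is illustrative):

```csharp
// Each batch mirrors the full dataset's 90/10 class split,
// so no batch ends up containing only the majority class.
var sampler = Samplers.Stratified(labels, numClasses: 2, seed: 42);
```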
Subset(IEnumerable<int>, bool, int?)
Creates a subset sampler that samples from specific indices.
public static SubsetSampler Subset(IEnumerable<int> indices, bool shuffle = false, int? seed = null)
Parameters
indices (IEnumerable<int>): The indices to sample from.
shuffle (bool): Whether to shuffle the subset indices.
seed (int?): Optional random seed for reproducibility.
Returns
- SubsetSampler
A subset sampler.
Remarks
For Beginners: Use this when you only want to train on a portion of your data, or when you've pre-computed a specific sampling order.
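As a sketch, a simple index-based train/validation split (the 80/20 split and dataset size are illustrative; requires System.Linq):

```csharp
using System.Linq;

var allIndices = Enumerable.Range(0, 1000).ToArray();

// Train on a shuffled 80% subset; validate on the remaining 20% in fixed order.
var trainSampler = Samplers.Subset(allIndices.Take(800), shuffle: true, seed: 42);
var valSampler   = Samplers.Subset(allIndices.Skip(800));
```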
Weighted<T>(IEnumerable<T>, int, bool, int?)
Creates a weighted sampler that samples based on per-sample weights.
public static WeightedSampler<T> Weighted<T>(IEnumerable<T> weights, int numSamples, bool replacement = true, int? seed = null)
Parameters
weights (IEnumerable<T>): The weight for each sample (higher = more likely to be sampled).
numSamples (int): Number of samples to draw per epoch.
replacement (bool): Whether to sample with replacement. Default is true.
seed (int?): Optional random seed for reproducibility.
Returns
- WeightedSampler<T>
A weighted sampler.
Type Parameters
T: The numeric type for weights.
Remarks
For Beginners: Use this when some samples are more important than others. Higher weights make a sample more likely to be selected.
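For example (the weight values are illustrative):

```csharp
// A sample with weight 2.0 is drawn roughly four times as often
// as one with weight 0.5 when sampling with replacement.
double[] weights = { 0.5, 1.0, 1.0, 1.5, 2.0 };
var sampler = Samplers.Weighted(weights, numSamples: 5, replacement: true, seed: 42);
```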