Interface IDataSampler

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Defines the contract for sampling indices from a dataset during batch iteration.

public interface IDataSampler

Remarks

Data samplers control how samples are selected for each epoch of training. Different sampling strategies can improve training convergence and handle imbalanced datasets.

For Beginners: A sampler decides which data points to include in each batch and in what order. The default is random sampling, but you might want:

  • Stratified sampling: Ensures each class is represented proportionally in every batch
  • Weighted sampling: Draws underrepresented or important samples more often by assigning them larger weights
  • Curriculum learning: Starts with easy examples and gradually increases difficulty

Example usage:

// Use weighted sampling to handle class imbalance
var sampler = new WeightedSampler<float>(weights);
foreach (var batch in dataLoader.GetBatches(sampler: sampler))
{
    model.TrainOnBatch(batch);
}
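This page does not describe how the library's WeightedSampler<T> works internally. To illustrate the idea only, the sketch below shows one way a weighted sampler could satisfy this interface. Everything in it is an assumption made for illustration: the SketchWeightedSampler class, its constructor parameters, and the local copy of IDataSampler (declared so the snippet compiles without AiDotNet).

```csharp
using System;
using System.Collections.Generic;

// Local stand-in that mirrors the members documented on this page,
// declared here only so the sketch compiles without referencing AiDotNet.
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical sampler (not the library's WeightedSampler<T>): draws indices
// with replacement, with probability proportional to each sample's weight.
public sealed class SketchWeightedSampler : IDataSampler
{
    private readonly double[] _cumulative; // running sum of the weights
    private readonly int _lastPositive;    // last index with a non-zero weight
    private readonly int _numSamples;
    private Random _random = new Random();

    public SketchWeightedSampler(IReadOnlyList<double> weights, int numSamples)
    {
        if (numSamples < 0) throw new ArgumentOutOfRangeException(nameof(numSamples));
        _cumulative = new double[weights.Count];
        _lastPositive = -1;
        double total = 0.0;
        for (int i = 0; i < weights.Count; i++)
        {
            if (weights[i] < 0.0)
                throw new ArgumentException("Weights must be non-negative.", nameof(weights));
            if (weights[i] > 0.0) _lastPositive = i;
            total += weights[i];
            _cumulative[i] = total;
        }
        if (_lastPositive < 0)
            throw new ArgumentException("At least one weight must be positive.", nameof(weights));
        _numSamples = numSamples;
    }

    // The number of draws per epoch, which need not equal the dataset size.
    public int Length => _numSamples;

    public IEnumerable<int> GetIndices()
    {
        double total = _cumulative[_cumulative.Length - 1];
        for (int n = 0; n < _numSamples; n++)
        {
            // Pick a point in [0, total) and find the sample whose weight interval contains it.
            double target = _random.NextDouble() * total;
            yield return Find(target);
        }
    }

    // Linear scan for the first cumulative weight above the target (O(n) per draw).
    private int Find(double target)
    {
        for (int i = 0; i < _cumulative.Length; i++)
        {
            if (target < _cumulative[i]) return i;
        }
        return _lastPositive; // guards against floating-point rounding at the upper edge
    }

    public void OnEpochStart(int epoch) { } // this strategy does not change between epochs

    public void SetSeed(int seed) => _random = new Random(seed);
}
```

Samples with a weight of zero are never drawn, and a sample with twice the weight of another is drawn about twice as often on average. A binary search over the cumulative weights would be the usual choice for large datasets; the linear scan keeps the sketch short.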

Properties

Length

Gets the total number of samples this sampler will produce per epoch.

int Length { get; }

Property Value

int

Remarks

Length may differ from the dataset size: a sampler that oversamples produces more indices per epoch than there are samples, and one that undersamples produces fewer.

Methods

GetIndices()

Returns an enumerable of indices for one epoch of sampling.

IEnumerable<int> GetIndices()

Returns

IEnumerable<int>

An enumerable of sample indices in the order they should be processed.

Remarks

Each call to this method starts a new epoch. The returned indices determine which samples are included and in what order.

For Beginners: This method provides the "shopping list" of data points to include in this round of training. The order matters for learning!
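To make the consumer side concrete, here is a minimal sketch of how the indices from GetIndices() could be grouped into batches. SequentialSampler, SamplerBatching, and ToBatches are hypothetical names introduced for this example, and the IDataSampler declaration is a local copy of the members listed on this page. The library's own data loader may batch differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local stand-in that mirrors the members documented on this page.
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical sampler: visits every sample once per epoch, in dataset order.
public sealed class SequentialSampler : IDataSampler
{
    private readonly int _datasetSize;

    public SequentialSampler(int datasetSize)
    {
        if (datasetSize < 0) throw new ArgumentOutOfRangeException(nameof(datasetSize));
        _datasetSize = datasetSize;
    }

    public int Length => _datasetSize;

    public IEnumerable<int> GetIndices() => Enumerable.Range(0, _datasetSize);

    public void OnEpochStart(int epoch) { } // nothing to adjust between epochs

    public void SetSeed(int seed) { }       // no randomness, so the seed is unused
}

// Hypothetical helper: consumes one epoch of indices and groups them into batches.
public static class SamplerBatching
{
    public static List<int[]> ToBatches(IDataSampler sampler, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<int[]>();
        var current = new List<int>(batchSize);
        foreach (int index in sampler.GetIndices())
        {
            current.Add(index);
            if (current.Count == batchSize)
            {
                batches.Add(current.ToArray());
                current.Clear();
            }
        }
        if (current.Count > 0) batches.Add(current.ToArray()); // final partial batch
        return batches;
    }
}
```

With a dataset of five samples and a batch size of two, this helper yields the batches [0, 1], [2, 3], and [4]. The order of the batches follows the order of the sampler's indices.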

OnEpochStart(int)

Called at the start of each epoch to allow the sampler to adjust its behavior.

void OnEpochStart(int epoch)

Parameters

epoch int

The current epoch number (0-based).

Remarks

This method allows samplers to implement epoch-dependent behavior such as:

  • Curriculum learning: adjusting difficulty thresholds as training progresses
  • Self-paced learning: updating sample inclusion thresholds
  • Active learning: refreshing uncertainty estimates

For Beginners: Some sampling strategies change over time. For example, curriculum learning starts with easy examples and gradually adds harder ones. This method tells the sampler "we're starting epoch N" so it can adjust accordingly.
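As a sketch of epoch-dependent behavior, the hypothetical CurriculumSampler below widens the pool of samples as epochs advance. The per-sample difficulty scores, the warmupEpochs parameter, and the linear growth schedule are all assumptions chosen for illustration. The IDataSampler declaration is a local copy of the documented members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local stand-in that mirrors the members documented on this page.
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical sampler: starts with the easiest samples and admits harder
// ones linearly until the whole dataset is in use after warmupEpochs epochs.
public sealed class CurriculumSampler : IDataSampler
{
    private readonly int[] _easiestFirst; // sample indices sorted by ascending difficulty
    private readonly int _warmupEpochs;
    private int _activeCount;             // how many of the easiest samples are in use

    public CurriculumSampler(IReadOnlyList<double> difficulty, int warmupEpochs)
    {
        _easiestFirst = Enumerable.Range(0, difficulty.Count)
                                  .OrderBy(i => difficulty[i])
                                  .ToArray();
        _warmupEpochs = Math.Max(1, warmupEpochs);
        OnEpochStart(0); // initialise the pool for the first epoch
    }

    // Grows from a fraction of the dataset up to its full size.
    public int Length => _activeCount;

    public void OnEpochStart(int epoch)
    {
        // Epochs are 0-based, so epoch 0 already uses 1/warmupEpochs of the data.
        double fraction = Math.Min(1.0, (epoch + 1) / (double)_warmupEpochs);
        int wanted = (int)Math.Ceiling(fraction * _easiestFirst.Length);
        _activeCount = Math.Min(_easiestFirst.Length, wanted);
    }

    // Returns the currently admitted samples, easiest first.
    public IEnumerable<int> GetIndices() => _easiestFirst.Take(_activeCount);

    public void SetSeed(int seed) { } // deterministic strategy, so the seed is unused
}
```

A caller would invoke OnEpochStart(epoch) before each call to GetIndices(). Note that Length changes from epoch to epoch in this design, so code that sizes progress bars or schedules from Length should read it after OnEpochStart.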

SetSeed(int)

Sets the random seed for reproducible sampling.

void SetSeed(int seed)

Parameters

seed int

The random seed value.

Remarks

Setting a seed makes the sampler produce the same sequence of indices on every run that uses that seed, which is important for reproducibility and debugging.
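One possible way to honor SetSeed is to derive each epoch's random generator from the seed and the epoch number, so that every epoch gets a different order while the whole run stays repeatable. The sketch below assumes that design. SeededShuffleSampler and the way it mixes the seed with the epoch are hypothetical, and the IDataSampler declaration is a local copy of the documented members. The library's samplers may treat the seed differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local stand-in that mirrors the members documented on this page.
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical sampler: shuffles all indices each epoch, reproducibly.
public sealed class SeededShuffleSampler : IDataSampler
{
    private readonly int _datasetSize;
    private int _seed;
    private int _epoch;

    public SeededShuffleSampler(int datasetSize)
    {
        if (datasetSize < 0) throw new ArgumentOutOfRangeException(nameof(datasetSize));
        _datasetSize = datasetSize;
    }

    public int Length => _datasetSize;

    public void SetSeed(int seed) => _seed = seed;

    public void OnEpochStart(int epoch) => _epoch = epoch;

    public IEnumerable<int> GetIndices()
    {
        // Mix seed and epoch into one generator seed (an arbitrary choice for this sketch).
        var random = new Random(unchecked(_seed * 1000003 + _epoch));

        int[] indices = Enumerable.Range(0, _datasetSize).ToArray();

        // Fisher-Yates shuffle: swap each position with a random earlier-or-equal one.
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }
        return indices;
    }
}
```

Two instances given the same seed and the same epoch number return identical sequences, which makes a failing training run repeatable. The exact sequence can still vary between .NET runtime versions, because the algorithm behind System.Random is not guaranteed to stay the same across them.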