Interface IDataSampler
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Defines the contract for sampling indices from a dataset during batch iteration.
public interface IDataSampler
Remarks
Data samplers control how samples are selected for each epoch of training. Different sampling strategies can improve training convergence and handle imbalanced datasets.
For Beginners: A sampler decides which data points to include in each batch and in what order. The default is random sampling, but you might want:
- Stratified sampling: Ensures each class is represented proportionally in every batch
- Weighted sampling: Gives more weight to underrepresented or important samples
- Curriculum learning: Starts with easy examples and gradually increases difficulty
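The stratified bullet above can be illustrated with a short sketch. It assumes integer class labels indexed by sample position. It also declares a local copy of the members documented on this page so that it compiles without the AiDotNet package. The class name SimpleStratifiedSampler is hypothetical and is not part of the library. At each position it emits an index from whichever class has used the smallest fraction of its samples so far. As a result, any contiguous run of indices, and therefore each batch, stays close to the overall class proportions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local mirror of the members documented on this page (assumption: lets the
// sketch compile without AiDotNet; real code would implement
// AiDotNet.Interfaces.IDataSampler instead).
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical stratified sampler: interleaves classes in proportion to their size.
public sealed class SimpleStratifiedSampler : IDataSampler
{
    private readonly int[] _labels;
    private Random _random;

    public SimpleStratifiedSampler(int[] labels, int seed = 0)
    {
        _labels = labels ?? throw new ArgumentNullException(nameof(labels));
        _random = new Random(seed);
    }

    // Every sample is used exactly once per epoch.
    public int Length => _labels.Length;

    public IEnumerable<int> GetIndices()
    {
        // Group sample indices by class label and shuffle within each class.
        List<int[]> groups = Enumerable.Range(0, _labels.Length)
            .GroupBy(i => _labels[i])
            .Select(g => g.OrderBy(_ => _random.Next()).ToArray())
            .ToList();

        int[] taken = new int[groups.Count];
        var result = new List<int>(_labels.Length);
        for (int n = 0; n < _labels.Length; n++)
        {
            // Pick the class that has emitted the smallest fraction of its samples.
            int best = -1;
            double bestFraction = double.MaxValue;
            for (int c = 0; c < groups.Count; c++)
            {
                if (taken[c] == groups[c].Length) continue;
                double fraction = (double)taken[c] / groups[c].Length;
                if (fraction < bestFraction)
                {
                    bestFraction = fraction;
                    best = c;
                }
            }
            result.Add(groups[best][taken[best]++]);
        }
        return result;
    }

    // Stratification does not change between epochs.
    public void OnEpochStart(int epoch) { }

    public void SetSeed(int seed) => _random = new Random(seed);
}
```

With labels {0, 0, 0, 0, 1, 1} and a batch size of three, each batch in this sketch contains two class-0 samples and one class-1 sample.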
Example usage:
// Use weighted sampling to handle class imbalance
var sampler = new WeightedSampler<float>(weights);
foreach (var batch in dataLoader.GetBatches(sampler: sampler))
{
    model.TrainOnBatch(batch);
}
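The default strategy is random sampling, so a minimal implementation of the interface is a sampler that returns a fresh random permutation of 0..n-1 on every call. The sketch below assumes the dataset is addressed by contiguous indices starting at 0. SimpleRandomSampler is a hypothetical name, and the interface is a local copy of the members listed on this page rather than the AiDotNet type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local mirror of the documented members (assumption, not the AiDotNet type).
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical uniform random sampler: each epoch visits every sample once.
public sealed class SimpleRandomSampler : IDataSampler
{
    private readonly int _datasetSize;
    private Random _random;

    public SimpleRandomSampler(int datasetSize, int seed = 0)
    {
        if (datasetSize < 0) throw new ArgumentOutOfRangeException(nameof(datasetSize));
        _datasetSize = datasetSize;
        _random = new Random(seed);
    }

    // One index per sample, so Length equals the dataset size.
    public int Length => _datasetSize;

    public IEnumerable<int> GetIndices()
    {
        // Fisher-Yates shuffle of 0..n-1, computed eagerly so each call is a new epoch.
        int[] indices = Enumerable.Range(0, _datasetSize).ToArray();
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = _random.Next(i + 1); // uniform in [0, i]
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }
        return indices;
    }

    // Plain random sampling has no epoch-dependent behavior.
    public void OnEpochStart(int epoch) { }

    public void SetSeed(int seed) => _random = new Random(seed);
}
```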
Properties
Length
Gets the total number of samples this sampler will produce per epoch.
int Length { get; }
Property Value
- int
Remarks
This may differ from the dataset size: an oversampling strategy can return more indices than there are samples, and an undersampling strategy can return fewer.
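To illustrate how Length can differ from the dataset size, here is a sketch of weighted sampling with replacement in which the caller chooses how many indices to draw per epoch. SimpleWeightedSampler and its constructor parameters are hypothetical; this is not the library's WeightedSampler<T>, whose constructor is not documented on this page. Each index i is drawn with probability proportional to weights[i], using a cumulative-sum table and a binary search.

```csharp
using System;
using System.Collections.Generic;

// Local mirror of the documented members (assumption, not the AiDotNet type).
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical weighted sampler with replacement.
public sealed class SimpleWeightedSampler : IDataSampler
{
    private readonly double[] _cumulative; // running sum of the weights
    private readonly int _numSamples;
    private Random _random;

    public SimpleWeightedSampler(double[] weights, int numSamples, int seed = 0)
    {
        if (weights == null || weights.Length == 0)
            throw new ArgumentException("weights must be non-empty", nameof(weights));
        if (numSamples < 0) throw new ArgumentOutOfRangeException(nameof(numSamples));
        _cumulative = new double[weights.Length];
        double total = 0;
        for (int i = 0; i < weights.Length; i++)
        {
            if (weights[i] < 0)
                throw new ArgumentException("weights must be non-negative", nameof(weights));
            total += weights[i];
            _cumulative[i] = total;
        }
        if (total <= 0) throw new ArgumentException("at least one weight must be positive", nameof(weights));
        _numSamples = numSamples;
        _random = new Random(seed);
    }

    // Set by numSamples, not by the dataset size.
    public int Length => _numSamples;

    public IEnumerable<int> GetIndices()
    {
        double total = _cumulative[_cumulative.Length - 1];
        var result = new int[_numSamples];
        for (int k = 0; k < _numSamples; k++)
        {
            double u = _random.NextDouble() * total;
            // Binary search for the first index whose cumulative weight exceeds u;
            // zero-weight samples can never be selected.
            int lo = 0, hi = _cumulative.Length - 1;
            while (lo < hi)
            {
                int mid = (lo + hi) / 2;
                if (_cumulative[mid] > u) hi = mid; else lo = mid + 1;
            }
            result[k] = lo;
        }
        return result;
    }

    public void OnEpochStart(int epoch) { }

    public void SetSeed(int seed) => _random = new Random(seed);
}
```

Passing a numSamples larger than weights.Length oversamples; passing a smaller value undersamples.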
Methods
GetIndices()
Returns an enumerable of indices for one epoch of sampling.
IEnumerable<int> GetIndices()
Returns
- IEnumerable<int>
An enumerable of sample indices in the order they should be processed.
Remarks
Each call to this method starts a new epoch. The returned indices determine which samples are included and in what order.
For Beginners: This method provides the "shopping list" of data points to include in this round of training. The order matters for learning!
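A consumer such as a data loader would typically call GetIndices() once per epoch and cut the resulting sequence into fixed-size batches. The sketch below assumes that contract. SequentialSampler and SamplerBatching.ToBatches are hypothetical helpers written for illustration, and the interface is again a local copy of the documented members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local mirror of the documented members (assumption, not the AiDotNet type).
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical sampler that yields 0, 1, ..., n - 1 in order.
public sealed class SequentialSampler : IDataSampler
{
    private readonly int _count;
    public SequentialSampler(int count) => _count = count;
    public int Length => _count;
    public IEnumerable<int> GetIndices() => Enumerable.Range(0, _count);
    public void OnEpochStart(int epoch) { }
    public void SetSeed(int seed) { } // nothing random to seed
}

public static class SamplerBatching
{
    // Hypothetical helper: cuts one epoch of indices into batches of batchSize.
    // The final batch is shorter when Length is not a multiple of batchSize.
    public static IEnumerable<int[]> ToBatches(IDataSampler sampler, int batchSize)
    {
        // As an iterator, this check runs on first enumeration, not at call time.
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<int>(batchSize);
        foreach (int index in sampler.GetIndices()) // one call = one epoch
        {
            batch.Add(index);
            if (batch.Count == batchSize)
            {
                yield return batch.ToArray();
                batch.Clear();
            }
        }
        if (batch.Count > 0) yield return batch.ToArray();
    }
}
```

Whether a short final batch is kept or dropped is a design choice; this sketch keeps it.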
OnEpochStart(int)
Called at the start of each epoch to allow the sampler to adjust its behavior.
void OnEpochStart(int epoch)
Parameters
epoch int
The current epoch number (0-based).
Remarks
This method allows samplers to implement epoch-dependent behavior such as:
- Curriculum learning: adjusting difficulty thresholds as training progresses
- Self-paced learning: updating sample inclusion thresholds
- Active learning: refreshing uncertainty estimates
For Beginners: Some sampling strategies change over time. For example, curriculum learning starts with easy examples and gradually adds harder ones. This method tells the sampler "we're starting epoch N" so it can adjust accordingly.
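The curriculum case can be sketched as follows. The sketch assumes each sample has a precomputed difficulty score (lower is easier) and that the admitted share of the data grows linearly with the epoch number. SimpleCurriculumSampler, startFraction, and fractionPerEpoch are hypothetical names; only the member signatures come from this page. The caller is expected to invoke OnEpochStart(epoch) before GetIndices() in each epoch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local mirror of the documented members (assumption, not the AiDotNet type).
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical curriculum sampler: admits the easiest samples first.
public sealed class SimpleCurriculumSampler : IDataSampler
{
    private readonly int[] _easiestFirst;      // sample indices sorted by ascending difficulty
    private readonly double _startFraction;    // share of the data used in epoch 0
    private readonly double _fractionPerEpoch; // share added at each later epoch
    private int _included;
    private Random _random;

    public SimpleCurriculumSampler(double[] difficulty, double startFraction,
                                   double fractionPerEpoch, int seed = 0)
    {
        if (difficulty == null) throw new ArgumentNullException(nameof(difficulty));
        _easiestFirst = Enumerable.Range(0, difficulty.Length)
            .OrderBy(i => difficulty[i])
            .ToArray();
        _startFraction = startFraction;
        _fractionPerEpoch = fractionPerEpoch;
        _random = new Random(seed);
        OnEpochStart(0);
    }

    // Changes from epoch to epoch as harder samples are admitted.
    public int Length => _included;

    public void OnEpochStart(int epoch)
    {
        double fraction = Math.Min(1.0, _startFraction + epoch * _fractionPerEpoch);
        int count = (int)Math.Ceiling(fraction * _easiestFirst.Length);
        // Use at least one sample (if any exist) and never more than the dataset.
        _included = Math.Min(_easiestFirst.Length, Math.Max(1, count));
    }

    public IEnumerable<int> GetIndices()
    {
        // Shuffle only the currently admitted (easiest) samples.
        int[] indices = _easiestFirst.Take(_included).ToArray();
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = _random.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }
        return indices;
    }

    public void SetSeed(int seed) => _random = new Random(seed);
}
```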
SetSeed(int)
Sets the random seed for reproducible sampling.
void SetSeed(int seed)
Parameters
seed int
The random seed value.
Remarks
Setting a seed makes the sampler produce the same sequence of indices whenever it is run with the same seed and the same sequence of calls, which is important for reproducibility and debugging.
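One way to meet this contract, sketched here under stated assumptions, is for each sampler to own a private System.Random and to recreate it in SetSeed, so that reseeding replays the same sequence of epochs. SimpleSubsetSampler is a hypothetical undersampling sampler that draws a fresh random subset each epoch. The .NET documentation does not guarantee that System.Random produces the same sequence for a given seed across major .NET versions, so exact replay is only assured on the same runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local mirror of the documented members (assumption, not the AiDotNet type).
public interface IDataSampler
{
    int Length { get; }
    IEnumerable<int> GetIndices();
    void OnEpochStart(int epoch);
    void SetSeed(int seed);
}

// Hypothetical sampler that draws subsetSize distinct indices per epoch.
public sealed class SimpleSubsetSampler : IDataSampler
{
    private readonly int _datasetSize;
    private readonly int _subsetSize;
    private Random _random;

    public SimpleSubsetSampler(int datasetSize, int subsetSize, int seed = 0)
    {
        if (subsetSize < 0 || subsetSize > datasetSize)
            throw new ArgumentOutOfRangeException(nameof(subsetSize));
        _datasetSize = datasetSize;
        _subsetSize = subsetSize;
        _random = new Random(seed);
    }

    public int Length => _subsetSize;

    public IEnumerable<int> GetIndices()
    {
        // Partial Fisher-Yates: only the first subsetSize positions are shuffled.
        int[] pool = Enumerable.Range(0, _datasetSize).ToArray();
        for (int i = 0; i < _subsetSize; i++)
        {
            int j = _random.Next(i, pool.Length); // uniform in [i, n)
            (pool[i], pool[j]) = (pool[j], pool[i]);
        }
        return pool.Take(_subsetSize).ToArray();
    }

    public void OnEpochStart(int epoch) { }

    // Recreating the generator restarts the random sequence from the seed.
    public void SetSeed(int seed) => _random = new Random(seed);
}
```

Because the generator is private to the sampler, other code that draws random numbers cannot disturb the sampling order between epochs.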