Interface IBatchIterable<TBatch>
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Defines the capability to iterate through data in batches.
public interface IBatchIterable<TBatch>
Type Parameters
TBatch: The type of batch returned by iteration.
Remarks
Data loaders that implement this interface can provide data in batches, which is the standard way to process data during model training.
For Beginners: Instead of feeding your model one example at a time, batching groups multiple examples together. Training in batches is faster (more efficient GPU usage) and often leads to better learning, because gradient estimates averaged over a batch are more stable than single-example gradients. For example, 1,000 samples with a batch size of 32 yield 31 full batches plus one partial batch of 8 samples.
Properties
BatchSize
Gets or sets the number of samples per batch.
int BatchSize { get; set; }
Property Value
- int
HasNext
Gets whether there are more batches available in the current iteration.
bool HasNext { get; }
Property Value
- bool
Methods
GetBatches(int?, bool, bool, int?)
Iterates through all batches in the dataset using lazy evaluation.
IEnumerable<TBatch> GetBatches(int? batchSize = null, bool shuffle = true, bool dropLast = false, int? seed = null)
Parameters
batchSize (int?): Optional batch size override. Uses the default BatchSize if null.
shuffle (bool): Whether to shuffle data before batching. Default is true.
dropLast (bool): Whether to drop the last incomplete batch. Default is false.
seed (int?): Optional random seed for reproducible shuffling.
Returns
- IEnumerable<TBatch>
An enumerable sequence of batches using yield return for memory efficiency.
Remarks
This method provides a PyTorch-style iteration pattern using IEnumerable and yield return for memory-efficient lazy evaluation. Each call creates a fresh iteration, automatically handling reset and shuffle operations.
For Beginners: This is the recommended way to iterate through your data:
foreach (var (xBatch, yBatch) in dataLoader.GetBatches(batchSize: 32, shuffle: true))
{
    // Train on this batch
    model.TrainOnBatch(xBatch, yBatch);
}
Unlike GetNextBatch(), you don't need to call Reset() - each GetBatches() call starts fresh. The yield return pattern means batches are generated on-demand, not all loaded into memory at once.
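To make the lazy-evaluation point concrete, here is a minimal sketch of how an in-memory implementation might back GetBatches with yield return. The InMemoryLoader class, its float[][] sample storage, and the Fisher-Yates shuffle are illustrative assumptions, not AiDotNet types:
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical loader where TBatch is float[][] (one row per sample).
public sealed class InMemoryLoader
{
    private readonly float[][] _samples;
    public int BatchSize { get; set; } = 32;

    public InMemoryLoader(float[][] samples) => _samples = samples;

    public IEnumerable<float[][]> GetBatches(
        int? batchSize = null, bool shuffle = true,
        bool dropLast = false, int? seed = null)
    {
        int size = batchSize ?? BatchSize;
        int[] indices = Enumerable.Range(0, _samples.Length).ToArray();
        if (shuffle)
        {
            var rng = seed.HasValue ? new Random(seed.Value) : new Random();
            for (int i = indices.Length - 1; i > 0; i--) // Fisher-Yates shuffle
            {
                int j = rng.Next(i + 1);
                (indices[i], indices[j]) = (indices[j], indices[i]);
            }
        }
        for (int start = 0; start < indices.Length; start += size)
        {
            int count = Math.Min(size, indices.Length - start);
            if (count < size && dropLast) yield break; // drop the incomplete tail
            // The batch is materialized only when the caller advances the enumerator.
            yield return indices.Skip(start).Take(count)
                                .Select(i => _samples[i]).ToArray();
        }
    }
}
Because the index array is rebuilt at the top of the method, every enumeration starts from a fresh, optionally reshuffled ordering, which is why no explicit Reset() is required.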
GetBatchesAsync(int?, bool, bool, int?, int, CancellationToken)
Asynchronously iterates through all batches with prefetching support.
IAsyncEnumerable<TBatch> GetBatchesAsync(int? batchSize = null, bool shuffle = true, bool dropLast = false, int? seed = null, int prefetchCount = 2, CancellationToken cancellationToken = default)
Parameters
batchSize (int?): Optional batch size override. Uses the default BatchSize if null.
shuffle (bool): Whether to shuffle data before batching. Default is true.
dropLast (bool): Whether to drop the last incomplete batch. Default is false.
seed (int?): Optional random seed for reproducible shuffling.
prefetchCount (int): Number of batches to prefetch ahead. Default is 2.
cancellationToken (CancellationToken): Token to cancel the iteration.
Returns
- IAsyncEnumerable<TBatch>
An async enumerable sequence of batches.
Remarks
This method enables async batch iteration with configurable prefetching, similar to PyTorch's num_workers or TensorFlow's prefetch(). Batches are prepared in the background while the current batch is being processed.
For Beginners: Use this for large datasets or when batch preparation is slow:
await foreach (var (xBatch, yBatch) in dataLoader.GetBatchesAsync(prefetchCount: 2))
{
    // While training on this batch, the next 2 batches are being prepared
    await model.TrainOnBatchAsync(xBatch, yBatch);
}
Prefetching helps hide data-loading latency and is especially useful for:
- Large images that need decoding
- Data that requires preprocessing
- Slow storage (network drives, cloud storage)
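Under the hood, prefetching is typically implemented as a bounded producer/consumer queue. The following is a minimal sketch assuming System.Threading.Channels; the PrefetchAsync helper is hypothetical and not part of AiDotNet:
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical helper: shows how prefetchCount could map to a
// bounded channel that a background task keeps filled.
static async IAsyncEnumerable<TBatch> PrefetchAsync<TBatch>(
    IEnumerable<TBatch> source,
    int prefetchCount,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    var channel = Channel.CreateBounded<TBatch>(prefetchCount);

    // The producer runs ahead of the consumer, but the bounded capacity
    // stops it from preparing more than prefetchCount batches in advance.
    var producer = Task.Run(async () =>
    {
        try
        {
            foreach (var batch in source)
                await channel.Writer.WriteAsync(batch, cancellationToken);
        }
        finally
        {
            channel.Writer.Complete();
        }
    }, cancellationToken);

    await foreach (var batch in channel.Reader.ReadAllAsync(cancellationToken))
        yield return batch;

    await producer; // surface any producer exceptions
}
The bounded capacity is what turns prefetchCount into back-pressure: once prefetchCount batches are waiting, the producer pauses until the consumer catches up.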
GetNextBatch()
Gets the next batch of data.
TBatch GetNextBatch()
Returns
- TBatch
The next batch of data.
Exceptions
- InvalidOperationException
Thrown when no more batches are available.
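A typical manual loop guards each call with HasNext so the exception is never triggered. This sketch assumes the Reset() method implied by the GetBatches remarks and a hypothetical model object:
dataLoader.Reset(); // assumed: restarts the iteration (implied by the GetBatches remarks)
while (dataLoader.HasNext)
{
    var (xBatch, yBatch) = dataLoader.GetNextBatch();
    model.TrainOnBatch(xBatch, yBatch); // hypothetical training call
}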
TryGetNextBatch(out TBatch)
Attempts to get the next batch without throwing if unavailable.
bool TryGetNextBatch(out TBatch batch)
Parameters
batch (TBatch): The batch if available, default otherwise.
Returns
- bool
True if a batch was available, false if iteration is complete.
Remarks
When false is returned, batch contains the default value for TBatch. Callers should check the return value before using batch.
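The Try pattern folds the HasNext check and the exception case into a single call. A sketch, reusing the hypothetical model from the earlier examples:
while (dataLoader.TryGetNextBatch(out var batch))
{
    var (xBatch, yBatch) = batch; // assumes TBatch deconstructs into inputs and targets
    model.TrainOnBatch(xBatch, yBatch);
}
// No exception handling needed: the loop simply exits when iteration is complete.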