Interface IBatchIterable<TBatch>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Defines the capability to iterate through data in batches.

public interface IBatchIterable<TBatch>

Type Parameters

TBatch

The type of batch returned by iteration.

Remarks

Data loaders that implement this interface can provide data in batches, which is the standard way to process data during model training.

For Beginners: Instead of feeding your model one example at a time, batching groups multiple examples together. Training in batches is faster (more efficient GPU usage) and often leads to better learning (smoother gradients).

Properties

BatchSize

Gets or sets the number of samples per batch.

int BatchSize { get; set; }

Property Value

int

HasNext

Gets whether there are more batches available in the current iteration.

bool HasNext { get; }

Property Value

bool

Methods

GetBatches(int?, bool, bool, int?)

Iterates through all batches in the dataset using lazy evaluation.

IEnumerable<TBatch> GetBatches(int? batchSize = null, bool shuffle = true, bool dropLast = false, int? seed = null)

Parameters

batchSize int?

Optional batch size override. Uses default BatchSize if null.

shuffle bool

Whether to shuffle data before batching. Default is true.

dropLast bool

Whether to drop the last incomplete batch. Default is false.

seed int?

Optional random seed for reproducible shuffling.

Returns

IEnumerable<TBatch>

An enumerable sequence of batches using yield return for memory efficiency.

Remarks

This method provides a PyTorch-style iteration pattern using IEnumerable and yield return for memory-efficient lazy evaluation. Each call creates a fresh iteration, automatically handling reset and shuffle operations.

For Beginners: This is the recommended way to iterate through your data:

foreach (var (xBatch, yBatch) in dataLoader.GetBatches(batchSize: 32, shuffle: true))
{
    // Train on this batch
    model.TrainOnBatch(xBatch, yBatch);
}

Unlike GetNextBatch(), you don't need to call Reset(): each GetBatches() call starts fresh. The yield return pattern means batches are generated on demand, not all loaded into memory at once.
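
To make the shuffle, seed, and dropLast semantics concrete, here is a rough sketch of how an implementer might write this method. This is illustrative only, not AiDotNet's actual code: the _samples and _labels fields, the tuple batch type, and the Fisher-Yates shuffle are all assumptions.

```csharp
// Illustrative GetBatches body (requires System, System.Collections.Generic,
// System.Linq). Not AiDotNet's implementation.
public IEnumerable<(double[][] X, double[] Y)> GetBatches(
    int? batchSize = null, bool shuffle = true,
    bool dropLast = false, int? seed = null)
{
    int size = batchSize ?? BatchSize;                       // null => default
    int[] indices = Enumerable.Range(0, _samples.Length).ToArray();

    if (shuffle)
    {
        // Fisher-Yates shuffle; a fixed seed gives a reproducible order.
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }
    }

    for (int start = 0; start < indices.Length; start += size)
    {
        int count = Math.Min(size, indices.Length - start);
        if (count < size && dropLast)
            yield break;                      // skip the incomplete tail batch

        var x = new double[count][];
        var y = new double[count];
        for (int k = 0; k < count; k++)
        {
            x[k] = _samples[indices[start + k]];
            y[k] = _labels[indices[start + k]];
        }
        yield return (x, y);                  // produced on demand, one at a time
    }
}
```

Because the method body uses yield return, no batch is materialized until the caller advances the enumerator, which is what makes each GetBatches() call a fresh, memory-efficient pass.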

GetBatchesAsync(int?, bool, bool, int?, int, CancellationToken)

Asynchronously iterates through all batches with prefetching support.

IAsyncEnumerable<TBatch> GetBatchesAsync(int? batchSize = null, bool shuffle = true, bool dropLast = false, int? seed = null, int prefetchCount = 2, CancellationToken cancellationToken = default)

Parameters

batchSize int?

Optional batch size override. Uses default BatchSize if null.

shuffle bool

Whether to shuffle data before batching. Default is true.

dropLast bool

Whether to drop the last incomplete batch. Default is false.

seed int?

Optional random seed for reproducible shuffling.

prefetchCount int

Number of batches to prefetch ahead. Default is 2.

cancellationToken CancellationToken

Token to cancel the iteration.

Returns

IAsyncEnumerable<TBatch>

An async enumerable sequence of batches.

Remarks

This method enables async batch iteration with configurable prefetching, similar to PyTorch's num_workers or TensorFlow's prefetch(). Batches are prepared in the background while the current batch is being processed.

For Beginners: Use this for large datasets or when batch preparation is slow:

await foreach (var (xBatch, yBatch) in dataLoader.GetBatchesAsync(prefetchCount: 2))
{
    // While training on this batch, the next 2 batches are being prepared
    await model.TrainOnBatchAsync(xBatch, yBatch);
}

Prefetching helps hide data-loading latency, which is especially useful for:

  • Large images that need decoding
  • Data that requires preprocessing
  • Slow storage (network drives, cloud storage)
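
The cancellationToken parameter works as with any IAsyncEnumerable. A sketch of bounding a training pass by a time budget; dataLoader and model are placeholder names, and TrainOnBatchAsync is assumed from the example above:

```csharp
// Cancel the async iteration after a fixed time budget.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(10));
try
{
    await foreach (var (xBatch, yBatch) in
        dataLoader.GetBatchesAsync(cancellationToken: cts.Token))
    {
        await model.TrainOnBatchAsync(xBatch, yBatch);
    }
}
catch (OperationCanceledException)
{
    // Time budget elapsed; progress made so far is kept.
}
```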

GetNextBatch()

Gets the next batch of data.

TBatch GetNextBatch()

Returns

TBatch

The next batch of data.

Exceptions

InvalidOperationException

Thrown when no more batches are available.
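
Because this method throws once the data is exhausted, manual iteration should guard each call with HasNext. A minimal sketch, with dataLoader and model as placeholder names:

```csharp
// Manual iteration: check HasNext before each call so the
// InvalidOperationException is never triggered. Unlike GetBatches(),
// this style leaves the loader exhausted once the loop ends.
while (dataLoader.HasNext)
{
    var batch = dataLoader.GetNextBatch();
    model.TrainOnBatch(batch);
}
```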

TryGetNextBatch(out TBatch)

Attempts to get the next batch without throwing if unavailable.

bool TryGetNextBatch(out TBatch batch)

Parameters

batch TBatch

The next batch if one is available; otherwise the default value for TBatch.

Returns

bool

True if a batch was available, false if iteration is complete.

Remarks

When false is returned, batch contains the default value for TBatch. Callers should check the return value before using batch.
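
This follows the standard .NET Try-pattern, so the loop condition doubles as the completion check. A minimal sketch with placeholder names:

```csharp
// The return value signals completion, so no exception handling
// is needed for the end of the data.
while (dataLoader.TryGetNextBatch(out var batch))
{
    model.TrainOnBatch(batch);
}
// After the loop, batch holds default(TBatch); do not use it.
```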