
Class StreamingDataLoader<T, TInput, TOutput>

Namespace
AiDotNet.Data.Loaders
Assembly
AiDotNet.dll

A data loader that streams data from disk or other sources without loading all data into memory.

public class StreamingDataLoader<T, TInput, TOutput> : StreamingDataLoaderBase<T, TInput, TOutput>, IStreamingDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of input data.

TOutput

The type of output/label data.

Inheritance
StreamingDataLoaderBase<T, TInput, TOutput>
StreamingDataLoader<T, TInput, TOutput>
Implements
IStreamingDataLoader<T, TInput, TOutput>
Inherited Members
Extension Methods

Remarks

StreamingDataLoader is designed for datasets that don't fit in memory. Instead of loading all data upfront, it reads data on-demand from a source, processes it, and yields batches.

For Beginners: When your dataset is too large to fit in RAM (e.g., millions of images or text documents), you can't load it all at once. StreamingDataLoader solves this by reading data piece by piece as needed.
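The on-demand pattern described above can be sketched with only the .NET base library. The sketch is an async iterator that asks a per-index reader for one sample at a time and yields a batch whenever `batchSize` samples have accumulated, so only the current batch is held in memory. `StreamingSketch.StreamBatchesAsync` is a hypothetical name, not an AiDotNet API. Emitting the final partial batch is this sketch's choice. This page does not say what the loader does with leftover samples.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class StreamingSketch
{
    // Hypothetical helper (not part of AiDotNet): reads samples on demand through
    // readSample and yields them in batches, so at most one batch is held in memory.
    public static async IAsyncEnumerable<List<(TInput, TOutput)>> StreamBatchesAsync<TInput, TOutput>(
        int sampleCount,
        Func<int, CancellationToken, Task<(TInput, TOutput)>> readSample,
        int batchSize,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<(TInput, TOutput)>(batchSize);
        for (int i = 0; i < sampleCount; i++)
        {
            // Only the requested sample is read; nothing is loaded upfront.
            batch.Add(await readSample(i, ct));
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<(TInput, TOutput)>(batchSize);
            }
        }

        // Emit the final partial batch, if any (an assumption of this sketch).
        if (batch.Count > 0)
            yield return batch;
    }
}
```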

Example:

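using AiDotNet.Data.Loaders;

// Define how to read individual samples. LoadImageFromDisk, LoadLabelFromDisk,
// and model.TrainOnBatchAsync stand in for your own code.
var loader = new StreamingDataLoader<float, Tensor<float>, int>(
    sampleCount: 1000000,  // 1 million samples
    sampleReader: async (index, ct) =>
    {
        // Forward the token so in-flight reads can be cancelled.
        var image = await LoadImageFromDisk(index, ct);
        var label = await LoadLabelFromDisk(index, ct);
        return (image, label);
    },
    batchSize: 32,
    prefetchCount: 2,  // batches to prefetch (the default)
    numWorkers: 4      // parallel sample-loading workers (the default)
);

await foreach (var (inputs, labels) in loader.GetBatchesAsync())
{
    await model.TrainOnBatchAsync(inputs, labels);
}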
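A `sampleReader` usually maps an index to a byte offset in a file. The sketch below assumes a hypothetical binary layout: every record is `featureCount` float32 values followed by an int32 label, written in the machine's native byte order. Record `i` therefore starts at `i * recordSize`. `FileSampleReader.Create` is an illustrative helper, not part of AiDotNet. Each call opens its own `FileStream` because, with more than one worker, several reads may run at once and must not share a file position.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class FileSampleReader
{
    // Hypothetical on-disk layout (an assumption, not an AiDotNet format):
    // each record is featureCount float32 values followed by one int32 label.
    public static Func<int, CancellationToken, Task<(float[], int)>> Create(string path, int featureCount)
    {
        int recordSize = featureCount * sizeof(float) + sizeof(int);

        return async (index, ct) =>
        {
            var buffer = new byte[recordSize];
            // A separate stream per call, so concurrent workers never share a file position.
            await using var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                FileShare.Read, bufferSize: 4096, useAsync: true);
            stream.Seek((long)index * recordSize, SeekOrigin.Begin);

            // ReadAsync may return fewer bytes than requested, so loop until the record is complete.
            int read = 0;
            while (read < recordSize)
            {
                int n = await stream.ReadAsync(buffer.AsMemory(read, recordSize - read), ct);
                if (n == 0) throw new EndOfStreamException($"Record {index} is truncated.");
                read += n;
            }

            var features = new float[featureCount];
            Buffer.BlockCopy(buffer, 0, features, 0, featureCount * sizeof(float));
            int label = BitConverter.ToInt32(buffer, featureCount * sizeof(float));
            return (features, label);
        };
    }
}
```

Assuming `float[]` is acceptable as `TInput`, the result plugs into the constructor, for example `sampleReader: FileSampleReader.Create("train.bin", featureCount: 784)`. The file name and feature count there are placeholders.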

Constructors

StreamingDataLoader(int, Func<int, CancellationToken, Task<(TInput, TOutput)>>, int, string?, int, int)

Initializes a new instance of the StreamingDataLoader class.

public StreamingDataLoader(int sampleCount, Func<int, CancellationToken, Task<(TInput, TOutput)>> sampleReader, int batchSize, string? name = null, int prefetchCount = 2, int numWorkers = 4)

Parameters

sampleCount int

Total number of samples in the dataset.

sampleReader Func<int, CancellationToken, Task<(TInput, TOutput)>>

Async function that reads a single sample. It receives the sample index and a cancellation token and returns the sample's input and output as a tuple.

batchSize int

Number of samples per batch.

name string

Optional name for the data loader.

prefetchCount int

Number of batches to prefetch. Default is 2.

numWorkers int

Number of parallel workers for sample loading. Default is 4.
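The following hedged sketch illustrates what `prefetchCount` and `numWorkers` control, using `System.Threading.Channels`. A bounded channel of capacity `prefetchCount` lets a background producer run at most that many batches ahead of the consumer. A `SemaphoreSlim` caps concurrent sample reads at `numWorkers`. `PrefetchSketch.BatchesWithPrefetchAsync` is hypothetical and does not necessarily reflect how AiDotNet implements these options.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class PrefetchSketch
{
    // Hypothetical illustration of prefetchCount and numWorkers; not AiDotNet's implementation.
    public static async IAsyncEnumerable<(TInput, TOutput)[]> BatchesWithPrefetchAsync<TInput, TOutput>(
        int sampleCount,
        Func<int, CancellationToken, Task<(TInput, TOutput)>> readSample,
        int batchSize,
        int prefetchCount,
        int numWorkers,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        // At most numWorkers readSample calls run at the same time.
        using var workers = new SemaphoreSlim(numWorkers);
        // Bounded channel: the producer can get at most prefetchCount batches ahead.
        var channel = Channel.CreateBounded<(TInput, TOutput)[]>(prefetchCount);

        async Task<(TInput, TOutput)> ReadLimitedAsync(int index)
        {
            await workers.WaitAsync(cts.Token);
            try { return await readSample(index, cts.Token); }
            finally { workers.Release(); }
        }

        var producer = Task.Run(async () =>
        {
            try
            {
                for (int start = 0; start < sampleCount; start += batchSize)
                {
                    int count = Math.Min(batchSize, sampleCount - start);
                    // Task.WhenAll keeps results in index order even though reads overlap.
                    var next = await Task.WhenAll(Enumerable.Range(start, count).Select(i => ReadLimitedAsync(i)));
                    // Waits here while the channel already holds prefetchCount batches.
                    await channel.Writer.WriteAsync(next, cts.Token);
                }
                channel.Writer.Complete();
            }
            catch (Exception ex)
            {
                // Surface read errors (or cancellation) to the consumer.
                channel.Writer.Complete(ex);
            }
        });

        try
        {
            await foreach (var batch in channel.Reader.ReadAllAsync(cts.Token))
                yield return batch;
        }
        finally
        {
            // Stop the producer if the consumer stops early, then wait for it to finish.
            cts.Cancel();
            await producer;
        }
    }
}
```

The bounded channel provides back-pressure: when the consumer falls behind, `WriteAsync` waits instead of letting buffered batches pile up in memory.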

Properties

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

SampleCount

Gets the total number of samples in the dataset.

public override int SampleCount { get; }

Property Value

int

Remarks

This may be known upfront (e.g., from file metadata) or estimated. For truly streaming sources where the count is unknown, this may return -1.
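The sample count also determines how many batches one pass produces. Here is a small worked sketch, assuming a final partial batch is emitted (this page does not say whether the loader keeps or drops it). With the example's 1,000,000 samples and a batch size of 32, ceil(1,000,000 / 32) = 31,250 batches. `BatchMath.BatchesPerEpoch` is a hypothetical helper that also treats the -1 sentinel as "unknown".

```csharp
using System;

public static class BatchMath
{
    // Hypothetical helper: number of batches in one pass over the data,
    // counting a final partial batch. Returns null when the count is unknown (-1).
    public static int? BatchesPerEpoch(int sampleCount, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        if (sampleCount < 0) return null;
        // Integer ceiling division; long avoids overflow near int.MaxValue.
        return (int)(((long)sampleCount + batchSize - 1) / batchSize);
    }
}
```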

Methods

ReadSampleAsync(int, CancellationToken)

Reads a single sample by index.

protected override Task<(TInput Input, TOutput Output)> ReadSampleAsync(int index, CancellationToken cancellationToken = default)

Parameters

index int

The index of the sample to read.

cancellationToken CancellationToken

A token that can be used to cancel the read.

Returns

Task<(TInput Input, TOutput Output)>

A tuple containing the input and output for the sample.

Remarks

StreamingDataLoaderBase<T, TInput, TOutput> requires derived classes to implement this method to read a single sample from the data source. The batching infrastructure calls it to build batches.
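This split, where the base class owns batching and a derived class supplies only the per-sample read, is the template-method pattern. The sketch below shows it with hypothetical stand-in types (`SampleSourceBase`, `DelegateSampleSource`) rather than AiDotNet's own base class, whose other members are not documented here. The derived class forwards each read to a delegate. That is presumably what StreamingDataLoader does with its `sampleReader` argument, though this page does not state it.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a streaming base class; not AiDotNet's StreamingDataLoaderBase.
public abstract class SampleSourceBase<TInput, TOutput>
{
    public abstract int SampleCount { get; }

    // Derived classes supply the per-sample read...
    protected abstract Task<(TInput Input, TOutput Output)> ReadSampleAsync(int index, CancellationToken cancellationToken = default);

    // ...and the base class owns batching, calling ReadSampleAsync once per index.
    public async Task<(TInput Input, TOutput Output)[]> ReadBatchAsync(int start, int count, CancellationToken cancellationToken = default)
    {
        if (start < 0 || count < 0 || start + count > SampleCount)
            throw new ArgumentOutOfRangeException(nameof(count));
        var reads = Enumerable.Range(start, count).Select(i => ReadSampleAsync(i, cancellationToken));
        return await Task.WhenAll(reads);
    }
}

// Derived class that forwards each read to a user-supplied delegate.
public sealed class DelegateSampleSource<TInput, TOutput> : SampleSourceBase<TInput, TOutput>
{
    private readonly int _sampleCount;
    private readonly Func<int, CancellationToken, Task<(TInput, TOutput)>> _reader;

    public DelegateSampleSource(int sampleCount, Func<int, CancellationToken, Task<(TInput, TOutput)>> reader)
    {
        _sampleCount = sampleCount;
        _reader = reader ?? throw new ArgumentNullException(nameof(reader));
    }

    public override int SampleCount => _sampleCount;

    protected override Task<(TInput Input, TOutput Output)> ReadSampleAsync(int index, CancellationToken cancellationToken = default)
        => _reader(index, cancellationToken);
}
```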