Class FileStreamingDataLoader<T, TInput, TOutput>

Namespace
AiDotNet.Data.Loaders
Assembly
AiDotNet.dll

A streaming data loader that reads from files in a directory.

public class FileStreamingDataLoader<T, TInput, TOutput> : StreamingDataLoaderBase<T, TInput, TOutput>, IStreamingDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable

Type Parameters

T

The numeric type used for calculations.

TInput

The type of input data.

TOutput

The type of output/label data.

Inheritance
StreamingDataLoaderBase<T, TInput, TOutput>
FileStreamingDataLoader<T, TInput, TOutput>
Implements
IStreamingDataLoader<T, TInput, TOutput>

Remarks

FileStreamingDataLoader discovers the files in a directory that match a file pattern and streams them during training. It suits datasets in which each file is one sample, such as image collections.

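For Beginners: If you have a folder of images whose labels are in the file names or in a separate label file, this loader reads the files one by one as they are needed. The fileProcessor function you supply decides how each file becomes an (input, label) pair.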

Example:

// LoadImagePixels and ExtractLabelFromPath are user-supplied helpers, not part of the library.
var loader = new FileStreamingDataLoader<float, float[], int>(
    directory: "path/to/images",
    filePattern: "*.png",
    fileProcessor: async (filePath, ct) =>
    {
        // Turn one file into an (input, label) pair.
        var pixels = await LoadImagePixels(filePath, ct);
        var label = ExtractLabelFromPath(filePath);
        return (pixels, label);
    },
    batchSize: 32
);
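
The example above calls LoadImagePixels and ExtractLabelFromPath, which are not defined on this page. The following is a minimal sketch of what such helpers might look like. The class name SampleFileHelpers, the "<label>_<rest>.png" naming scheme, and the byte-scaling step are all assumptions made for illustration; a real image pipeline would decode the PNG with an imaging library instead of reading raw bytes. File.ReadAllBytesAsync requires .NET Core 2.0 or later.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class SampleFileHelpers
{
    // Hypothetical helper: reads the raw bytes of a file and scales each byte to [0, 1].
    // This does NOT decode the image format; it only shows the async shape of a loader.
    public static async Task<float[]> LoadImagePixels(string filePath, CancellationToken ct)
    {
        byte[] bytes = await File.ReadAllBytesAsync(filePath, ct);
        var pixels = new float[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            pixels[i] = bytes[i] / 255f;
        }
        return pixels;
    }

    // Hypothetical helper: assumes file names such as "7_00042.png",
    // where the text before the first underscore is the integer label.
    public static int ExtractLabelFromPath(string filePath)
    {
        string name = Path.GetFileNameWithoutExtension(filePath);
        int separator = name.IndexOf('_');
        string labelText = separator >= 0 ? name.Substring(0, separator) : name;
        return int.Parse(labelText, CultureInfo.InvariantCulture);
    }
}
```

With these helpers in scope (for example through `using static SampleFileHelpers;`), the fileProcessor lambda in the example compiles as written.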

Constructors

FileStreamingDataLoader(string, string, Func<string, CancellationToken, Task<(TInput, TOutput)>>, int, SearchOption, int, int)

Initializes a new instance of the FileStreamingDataLoader class.

public FileStreamingDataLoader(string directory, string filePattern, Func<string, CancellationToken, Task<(TInput, TOutput)>> fileProcessor, int batchSize, SearchOption searchOption = SearchOption.TopDirectoryOnly, int prefetchCount = 2, int numWorkers = 4)

Parameters

directory string

The directory containing the data files.

filePattern string

The file pattern to match (e.g., "*.png").

fileProcessor Func<string, CancellationToken, Task<(TInput, TOutput)>>

Function that processes a file and returns (input, output).

batchSize int

Number of samples per batch.

searchOption SearchOption

Whether to search only the top-level directory or all subdirectories as well. Defaults to SearchOption.TopDirectoryOnly.

prefetchCount int

Number of batches to prefetch. Defaults to 2.

numWorkers int

Number of parallel workers. Defaults to 4.
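
To check which files a given filePattern and searchOption would select before constructing the loader, the standard System.IO enumeration can be used directly. This is a sketch under the assumption that the loader follows the usual Directory.EnumerateFiles semantics; the page does not state how discovery is implemented or in what order files are returned. FileDiscoveryPreview is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileDiscoveryPreview
{
    // Returns the files matched by the pattern and search option, sorted for a stable order.
    // The sort is an illustration choice, not a documented behaviour of the loader.
    public static string[] Preview(string directory, string filePattern, SearchOption searchOption)
    {
        return Directory
            .EnumerateFiles(directory, filePattern, searchOption)
            .OrderBy(path => path, StringComparer.Ordinal)
            .ToArray();
    }
}
```

Calling `FileDiscoveryPreview.Preview("path/to/images", "*.png", SearchOption.AllDirectories)` lists PNG files in the directory and every subdirectory, while SearchOption.TopDirectoryOnly restricts the result to the directory itself.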

Properties

FilePaths

Gets all file paths in the dataset.

public IReadOnlyList<string> FilePaths { get; }

Property Value

IReadOnlyList<string>

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

SampleCount

Gets the total number of samples in the dataset.

public override int SampleCount { get; }

Property Value

int

Remarks

This may be known upfront (e.g., from file metadata) or estimated. For truly streaming sources where the count is unknown, this may return -1.
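
When the count is known, it can be combined with the batch size to estimate how many batches one pass over the data produces. The sketch below is plain arithmetic and makes no claim about the loader itself: whether FileStreamingDataLoader keeps or drops a final partial batch is not stated on this page, so the hypothetical dropLast flag covers both cases. BatchMath and BatchesPerEpoch are illustrative names.

```csharp
using System;

public static class BatchMath
{
    // Returns the number of batches in one pass, or -1 when the sample count is unknown.
    public static int BatchesPerEpoch(int sampleCount, int batchSize, bool dropLast = false)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }
        if (sampleCount < 0)
        {
            return -1; // unknown count, matching the -1 convention described above
        }
        int fullBatches = sampleCount / batchSize;
        bool hasPartialBatch = sampleCount % batchSize != 0;
        return (hasPartialBatch && !dropLast) ? fullBatches + 1 : fullBatches;
    }
}
```

For 100 samples and a batch size of 32, this gives 4 batches when the partial batch is kept and 3 when it is dropped.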

Methods

ReadSampleAsync(int, CancellationToken)

Reads a single sample by index.

protected override Task<(TInput Input, TOutput Output)> ReadSampleAsync(int index, CancellationToken cancellationToken = default)

Parameters

index int

The index of the sample to read.

cancellationToken CancellationToken

Cancellation token.

Returns

Task<(TInput Input, TOutput Output)>

A tuple containing the input and output for the sample.

Remarks

The base class requires derived classes to implement this method to read a single sample from the data source; this override is that implementation for file-based data. The batching infrastructure calls it to build batches.
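
To illustrate how a per-sample read can be composed into batches, the following sketch assembles one batch from a reader delegate with the same shape as ReadSampleAsync. It is not the library's batching code: BatchingSketch and ReadBatchAsync are hypothetical names, and the real infrastructure also handles prefetching and worker limits that are left out here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BatchingSketch
{
    // Reads samples with indices in [start, start + batchSize), clamped to sampleCount,
    // through the supplied per-sample reader and returns them in index order.
    public static async Task<(TInput Input, TOutput Output)[]> ReadBatchAsync<TInput, TOutput>(
        Func<int, CancellationToken, Task<(TInput Input, TOutput Output)>> readSample,
        int start,
        int batchSize,
        int sampleCount,
        CancellationToken cancellationToken = default)
    {
        int end = Math.Min(start + batchSize, sampleCount);
        var pending = new List<Task<(TInput Input, TOutput Output)>>();
        for (int index = start; index < end; index++)
        {
            cancellationToken.ThrowIfCancellationRequested();
            pending.Add(readSample(index, cancellationToken));
        }
        // Task.WhenAll returns results in the same order as the input tasks.
        return await Task.WhenAll(pending);
    }
}
```

The last batch is shorter than batchSize whenever sampleCount is not a multiple of it, because the end index is clamped to sampleCount.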