Class CsvStreamingDataLoader<T, TInput, TOutput>

Namespace
AiDotNet.Data.Loaders
Assembly
AiDotNet.dll

A streaming data loader that reads from a CSV file line by line.

public class CsvStreamingDataLoader<T, TInput, TOutput> : StreamingDataLoaderBase<T, TInput, TOutput>, IStreamingDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable

Type Parameters

T

The numeric type used for calculations.

TInput

The type of input data.

TOutput

The type of output/label data.

Inheritance
StreamingDataLoaderBase<T, TInput, TOutput>
CsvStreamingDataLoader<T, TInput, TOutput>
Implements
IStreamingDataLoader<T, TInput, TOutput>

Remarks

CsvStreamingDataLoader reads a CSV file line by line without loading the entire file into memory. This is ideal for large tabular datasets.

For Beginners: If you have a large CSV file (gigabytes of data), this loader will read it row by row as needed during training.

Example:

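using System.Globalization;
using System.Linq;
using AiDotNet.Data.Loaders;

var loader = new CsvStreamingDataLoader<float, float[], float>(
    filePath: "large_dataset.csv",
    lineParser: (line, lineNumber) =>
    {
        // Assumes 11 comma-separated columns: 10 features followed by the label.
        var parts = line.Split(',');
        // Parse with the invariant culture so "1.5" reads the same on every machine.
        var features = parts.Take(10)
            .Select(p => float.Parse(p, CultureInfo.InvariantCulture))
            .ToArray();
        var label = float.Parse(parts[10], CultureInfo.InvariantCulture);
        return (features, label);
    },
    batchSize: 256,
    hasHeader: true
);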
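
To make "line by line without loading the entire file into memory" concrete, here is a minimal, self-contained sketch of the general technique. It is not the library's implementation: StreamingSketch and ReadInBatches are hypothetical names used only for illustration, and the sketch groups raw lines rather than parsed samples.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StreamingSketch
{
    // Hypothetical helper, not part of AiDotNet: yields lists of at most batchSize lines.
    public static IEnumerable<List<string>> ReadInBatches(
        string path, int batchSize, bool hasHeader = true)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<string>(batchSize);
        var isFirstLine = true;

        // File.ReadLines enumerates lazily, so only the current line and the
        // current batch are held in memory at any time.
        foreach (var line in File.ReadLines(path))
        {
            if (isFirstLine)
            {
                isFirstLine = false;
                if (hasHeader) continue; // skip the header row
            }

            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines

            batch.Add(line);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize);
            }
        }

        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Because the method is an iterator, nothing is read until the caller starts enumerating, and each batch can be garbage-collected once the caller moves on to the next one.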

Constructors

CsvStreamingDataLoader(string, Func<string, int, (TInput, TOutput)>, int, bool, int, int)

Initializes a new instance of the CsvStreamingDataLoader class.

public CsvStreamingDataLoader(string filePath, Func<string, int, (TInput, TOutput)> lineParser, int batchSize, bool hasHeader = true, int prefetchCount = 2, int numWorkers = 4)

Parameters

filePath string

Path to the CSV file.

lineParser Func<string, int, (TInput, TOutput)>

Function that parses a line, given the line text and its line number, into an (input, output) tuple.

batchSize int

Number of samples per batch.

hasHeader bool

Whether the CSV has a header row to skip.

prefetchCount int

Number of batches to prefetch.

numWorkers int

Number of parallel workers.
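
As an illustration of the parameters above, the following sketch passes every constructor argument explicitly and uses the line number in error messages. The three-feature column layout, the file name, and the ParseLine function are assumptions made for the example. The naive Split(',') does not handle quoted fields that contain commas.

```csharp
using System;
using System.Globalization;
using AiDotNet.Data.Loaders;

// Hypothetical layout: 3 feature columns followed by 1 label column.
const int FeatureCount = 3;

// Hypothetical parser: validates the column count and reports the line number on failure.
(float[], float) ParseLine(string line, int lineNumber)
{
    var parts = line.Split(',');
    if (parts.Length != FeatureCount + 1)
        throw new FormatException(
            $"Line {lineNumber}: expected {FeatureCount + 1} columns, found {parts.Length}.");

    var features = new float[FeatureCount];
    for (var i = 0; i < FeatureCount; i++)
        features[i] = float.Parse(parts[i], NumberStyles.Float, CultureInfo.InvariantCulture);

    var label = float.Parse(parts[FeatureCount], NumberStyles.Float, CultureInfo.InvariantCulture);
    return (features, label);
}

var loader = new CsvStreamingDataLoader<float, float[], float>(
    filePath: "large_dataset.csv",
    lineParser: ParseLine,
    batchSize: 64,
    hasHeader: true,
    prefetchCount: 4,   // overrides the default of 2
    numWorkers: 2);     // overrides the default of 4
```

Failing fast with the line number in the message makes a malformed row in a multi-gigabyte file much easier to locate than a bare parsing exception.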

Properties

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

SampleCount

Gets the total number of samples in the dataset.

public override int SampleCount { get; }

Property Value

int

Remarks

This may be known upfront (e.g., from file metadata) or estimated. For truly streaming sources where the count is unknown, this may return -1.
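
Since the count may be -1, code that derives values from SampleCount should guard against that case. A minimal sketch, assuming a hypothetical helper (BatchMath.BatchesPerEpoch is not part of the library) that computes the number of batches per pass:

```csharp
using System;

static class BatchMath
{
    // Hypothetical helper, not part of AiDotNet.
    // Returns null when the sample count is unknown (negative, e.g. -1).
    public static int? BatchesPerEpoch(int sampleCount, int batchSize, bool dropLast = false)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        if (sampleCount < 0)
            return null;

        var fullBatches = sampleCount / batchSize;
        var hasPartial = sampleCount % batchSize != 0;

        // A trailing partial batch counts only when it exists and is not dropped.
        return dropLast || !hasPartial ? fullBatches : fullBatches + 1;
    }
}
```

For example, BatchMath.BatchesPerEpoch(1000, 256) yields 4 (three full batches plus one partial), and passing dropLast: true yields 3.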

Methods

GetSequentialBatches(int?, bool)

Iterates through the CSV file sequentially without loading all lines into memory.

public IEnumerable<(TInput[] Inputs, TOutput[] Outputs)> GetSequentialBatches(int? batchSize = null, bool dropLast = false)

Parameters

batchSize int?

Batch size to use for this iteration. Optional; defaults to null.

dropLast bool

Whether to drop the last incomplete batch.

Returns

IEnumerable<(TInput[] Inputs, TOutput[] Outputs)>

An enumerable of batches.

Remarks

This method provides true streaming iteration without caching all lines. Use this when memory is constrained and you don't need shuffling.
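
A usage sketch based on the signature above. The file name and the 11-column layout are assumptions carried over from the class-level example, and this page does not state whether the loader must be loaded or initialized before iterating.

```csharp
using System;
using System.Globalization;
using System.Linq;
using AiDotNet.Data.Loaders;

var loader = new CsvStreamingDataLoader<float, float[], float>(
    filePath: "large_dataset.csv",
    lineParser: (line, lineNumber) =>
    {
        // Assumed layout: 10 feature columns followed by the label.
        var parts = line.Split(',');
        var features = parts.Take(10)
            .Select(p => float.Parse(p, CultureInfo.InvariantCulture))
            .ToArray();
        return (features, float.Parse(parts[10], CultureInfo.InvariantCulture));
    },
    batchSize: 256);

var batchIndex = 0;

// Override the batch size for this pass and skip a trailing incomplete batch.
foreach (var (inputs, outputs) in loader.GetSequentialBatches(batchSize: 128, dropLast: true))
{
    // inputs and outputs are parallel arrays: inputs[i] pairs with outputs[i].
    Console.WriteLine($"Batch {batchIndex++}: {inputs.Length} samples, first label = {outputs[0]}");
}
```

Because the batches arrive in file order, this pattern suits evaluation or single-pass preprocessing, where shuffling is not needed.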

ReadSampleAsync(int, CancellationToken)

Reads a single sample by index.

protected override Task<(TInput Input, TOutput Output)> ReadSampleAsync(int index, CancellationToken cancellationToken = default)

Parameters

index int

The index of the sample to read.

cancellationToken CancellationToken

Cancellation token.

Returns

Task<(TInput Input, TOutput Output)>

A tuple containing the input and output for the sample.

Remarks

Derived classes must implement this to read a single sample from the data source. This method is called by the batching infrastructure to build batches.

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:

- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data