Class MemoryMappedStreamingDataLoader<T, TInput, TOutput>

Namespace: AiDotNet.Data.Loaders

Assembly: AiDotNet.dll

A streaming data loader that uses memory-mapped files for efficient random access to large binary datasets.

public class MemoryMappedStreamingDataLoader<T, TInput, TOutput> : StreamingDataLoaderBase<T, TInput, TOutput>, IStreamingDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable, IDisposable

Type Parameters

T: The numeric type used for calculations.
TInput: The type of input data.
TOutput: The type of output/label data.

Inheritance: object

DataLoaderBase<T>

StreamingDataLoaderBase<T, TInput, TOutput>

MemoryMappedStreamingDataLoader<T, TInput, TOutput>

Implements: IStreamingDataLoader<T, TInput, TOutput>

IDataLoader<T>

IResettable

ICountable

IDisposable

Inherited Members: StreamingDataLoaderBase<T, TInput, TOutput>.SampleCount

StreamingDataLoaderBase<T, TInput, TOutput>.TotalCount

StreamingDataLoaderBase<T, TInput, TOutput>.PrefetchCount

StreamingDataLoaderBase<T, TInput, TOutput>.NumWorkers

StreamingDataLoaderBase<T, TInput, TOutput>.ReadSampleAsync(int, CancellationToken)

StreamingDataLoaderBase<T, TInput, TOutput>.AggregateSamples(IList<(TInput Input, TOutput Output)>)

StreamingDataLoaderBase<T, TInput, TOutput>.GetBatches(bool, bool, int?)

StreamingDataLoaderBase<T, TInput, TOutput>.GetBatchesAsync(bool, bool, int?, CancellationToken)

StreamingDataLoaderBase<T, TInput, TOutput>.GetShuffledIndices(bool, int?)

StreamingDataLoaderBase<T, TInput, TOutput>.LoadDataCoreAsync(CancellationToken)

StreamingDataLoaderBase<T, TInput, TOutput>.UnloadDataCore()

DataLoaderBase<T>.Description

DataLoaderBase<T>.IsLoaded

DataLoaderBase<T>.TotalCount

DataLoaderBase<T>.CurrentIndex

DataLoaderBase<T>.BatchSize

DataLoaderBase<T>.BatchCount

DataLoaderBase<T>.CurrentBatchIndex

DataLoaderBase<T>.Progress

DataLoaderBase<T>.Reset()

DataLoaderBase<T>.LoadAsync(CancellationToken)

DataLoaderBase<T>.Unload()

DataLoaderBase<T>.LoadDataCoreAsync(CancellationToken)

DataLoaderBase<T>.OnReset()

DataLoaderBase<T>.EnsureLoaded()

DataLoaderBase<T>.AdvanceIndex(int)

DataLoaderBase<T>.AdvanceBatchIndex()

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Extension Methods: DataPipelineExtensions.ToAsyncPipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)

DataPipelineExtensions.ToPipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)

DataPipelineExtensions.ToSamplePipeline<T, TInput, TOutput>(IStreamingDataLoader<T, TInput, TOutput>, bool, int?)

Remarks

MemoryMappedStreamingDataLoader uses MemoryMappedFile for efficient random access to large datasets stored in binary format. The operating system pages data in and out of physical memory on demand, so datasets larger than available RAM can be read without loading the whole file.

File Format Requirements:

Binary file with fixed-size samples
Each sample is inputSizeBytes + outputSizeBytes bytes
Samples are stored contiguously, optionally preceded by a header of headerSizeBytes bytes
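The layout above can be produced with a plain BinaryWriter. The sketch below uses a hypothetical helper, WriteDataset (not part of AiDotNet). It writes an optional header followed by fixed-size samples, each consisting of a float[] input and an int label. It assumes each sample stores its input bytes before its output bytes, so sample i starts at headerSizeBytes + i * (inputSizeBytes + outputSizeBytes). BinaryWriter always writes little-endian values.

```csharp
using System;
using System.IO;

// Hypothetical helper: writes a file in the fixed-size-sample layout described above.
// Layout (assumed): [header] [sample 0] [sample 1] ... where each sample is
// inputLength 32-bit floats followed by one 32-bit int label, all little-endian.
void WriteDataset(string path, float[][] inputs, int[] labels, int inputLength, byte[] header)
{
    if (inputs.Length != labels.Length)
        throw new ArgumentException("Each input needs exactly one label.");

    using var stream = File.Create(path);
    using var writer = new BinaryWriter(stream);

    writer.Write(header); // raw header bytes, no length prefix; pass an empty array for none

    for (int i = 0; i < inputs.Length; i++)
    {
        if (inputs[i].Length != inputLength)
            throw new ArgumentException($"Sample {i} does not have {inputLength} values.");

        foreach (float value in inputs[i])
            writer.Write(value);   // contributes inputLength * sizeof(float) bytes (inputSizeBytes)
        writer.Write(labels[i]);   // contributes sizeof(int) bytes (outputSizeBytes)
    }
}

// Usage: three 4-value samples behind a 16-byte header.
string demoPath = Path.Combine(Path.GetTempPath(), "demo_samples.bin");
var demoInputs = new[]
{
    new float[] { 0f, 1f, 2f, 3f },
    new float[] { 4f, 5f, 6f, 7f },
    new float[] { 8f, 9f, 10f, 11f },
};
WriteDataset(demoPath, demoInputs, new[] { 7, 8, 9 }, inputLength: 4, header: new byte[16]);
long demoLength = new FileInfo(demoPath).Length; // 16 + 3 * (4 * 4 + 4) = 76
```

A file written this way would be opened with inputSizeBytes = 16, outputSizeBytes = 4 and headerSizeBytes = 16.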

For Beginners: Memory-mapped files let the operating system manage which parts of a large file are in memory. When you access a sample, the OS automatically loads that portion of the file into RAM. This is very efficient for random access patterns like shuffled batch iteration on datasets too large to fit in memory.
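To make the random-access idea concrete, here is a minimal sketch using the standard System.IO.MemoryMappedFiles API. It illustrates the technique and is not AiDotNet's internal implementation. ReadSampleBytes is a hypothetical helper, and the sketch assumes each sample stores its input bytes first and its output bytes second. The view accessor reads only the bytes of the requested sample, so the OS only has to page in the region that is actually touched, regardless of the order in which samples are visited.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper: returns the raw input and output bytes of one sample
// by reading directly at its offset in a memory-mapped file.
(byte[] Input, byte[] Output) ReadSampleBytes(
    MemoryMappedViewAccessor accessor, long headerSizeBytes,
    int inputSizeBytes, int outputSizeBytes, int index)
{
    long sampleSize = (long)inputSizeBytes + outputSizeBytes;
    long offset = headerSizeBytes + index * sampleSize; // start of sample `index`

    var input = new byte[inputSizeBytes];
    var output = new byte[outputSizeBytes];
    accessor.ReadArray(offset, input, 0, inputSizeBytes);                    // input bytes first (assumed)
    accessor.ReadArray(offset + inputSizeBytes, output, 0, outputSizeBytes); // then output bytes
    return (input, output);
}

// Usage: a tiny file with an 8-byte header and samples of 2 input bytes + 1 output byte,
// mapped read-only and read in a shuffled order.
string path = Path.Combine(Path.GetTempPath(), "mmap_demo.bin");
File.WriteAllBytes(path, new byte[]
{
    0, 0, 0, 0, 0, 0, 0, 0, // header
    10, 11, 1,              // sample 0
    20, 21, 2,              // sample 1
    30, 31, 3,              // sample 2
});

using var mmf = MemoryMappedFile.CreateFromFile(
    path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
using var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);

int[] order = { 2, 0, 1 };                     // e.g. one shuffled epoch
var labelsInReadOrder = new byte[order.Length];
for (int k = 0; k < order.Length; k++)
{
    var (_, outputBytes) = ReadSampleBytes(view, 8, 2, 1, order[k]);
    labelsInReadOrder[k] = outputBytes[0];
}
```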

Example:

using System;
using AiDotNet.Data.Loaders;

// Create a memory-mapped loader for binary image data.
// The loader holds file resources, so dispose it when finished.
using var loader = new MemoryMappedStreamingDataLoader<float, float[], int>(
    filePath: "images.bin",
    sampleCount: 60000,
    inputSizeBytes: 784 * sizeof(float),   // 28x28 image as 32-bit floats
    outputSizeBytes: sizeof(int),           // Label as a 32-bit int
    inputDeserializer: bytes =>
    {
        var floats = new float[784];
        for (int i = 0; i < floats.Length; i++)
            floats[i] = BitConverter.ToSingle(bytes, i * sizeof(float)); // machine byte order
        return floats;
    },
    outputDeserializer: bytes => BitConverter.ToInt32(bytes, 0),
    batchSize: 32
);

await foreach (var batch in loader.GetBatchesAsync())
{
    // 'model' is a placeholder for your own training code; it is not part of this loader's API.
    await model.TrainOnBatchAsync(batch.Inputs, batch.Outputs);
}

Constructors

MemoryMappedStreamingDataLoader(string, int, int, int, Func<byte[], TInput>, Func<byte[], TOutput>, int, long, int, int)

Initializes a new instance of the MemoryMappedStreamingDataLoader class.

public MemoryMappedStreamingDataLoader(string filePath, int sampleCount, int inputSizeBytes, int outputSizeBytes, Func<byte[], TInput> inputDeserializer, Func<byte[], TOutput> outputDeserializer, int batchSize, long headerSizeBytes = 0, int prefetchCount = 2, int numWorkers = 4)

Parameters

filePath string: Path to the binary data file.
sampleCount int: Total number of samples in the dataset.
inputSizeBytes int: Size of input data per sample in bytes.
outputSizeBytes int: Size of output/label data per sample in bytes.
inputDeserializer Func<byte[], TInput>: Function to deserialize input bytes to TInput.
outputDeserializer Func<byte[], TOutput>: Function to deserialize output bytes to TOutput.
batchSize int: Number of samples per batch.
headerSizeBytes long: Size of file header to skip in bytes. Default is 0.
prefetchCount int: Number of batches to prefetch. Default is 2.
numWorkers int: Number of parallel workers. Default is 4.
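The two deserializers convert raw bytes into TInput and TOutput. The sketch below assumes that they receive exactly the inputSizeBytes and outputSizeBytes bytes of one sample. It uses System.Buffers.Binary.BinaryPrimitives, available in .NET 5 and later, to decode float pixels and an int label. Unlike BitConverter, which follows the machine's byte order, these calls always decode little-endian data, so the file reads the same on any platform. The variable names are illustrative.

```csharp
using System;
using System.Buffers.Binary;

const int PixelCount = 784; // 28x28 image, as in the class example

// Input deserializer: 784 little-endian 32-bit floats -> float[]
Func<byte[], float[]> inputDeserializer = bytes =>
{
    if (bytes.Length != PixelCount * sizeof(float))
        throw new ArgumentException($"Expected {PixelCount * sizeof(float)} bytes, got {bytes.Length}.");

    var pixels = new float[PixelCount];
    for (int i = 0; i < PixelCount; i++)
        pixels[i] = BinaryPrimitives.ReadSingleLittleEndian(bytes.AsSpan(i * sizeof(float), sizeof(float)));
    return pixels;
};

// Output deserializer: one little-endian 32-bit int label
Func<byte[], int> outputDeserializer = bytes => BinaryPrimitives.ReadInt32LittleEndian(bytes);
```

These delegates can be passed directly as the inputDeserializer and outputDeserializer arguments with inputSizeBytes = 784 * sizeof(float) and outputSizeBytes = sizeof(int).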

Exceptions

ArgumentNullException: Thrown when filePath or deserializers are null.
ArgumentOutOfRangeException: Thrown when sizes are invalid.
FileNotFoundException: Thrown when the file does not exist.

Properties

HeaderSizeBytes

Gets the size of the file header in bytes.

public long HeaderSizeBytes { get; }

Property Value

long

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

SampleCount

Gets the total number of samples in the dataset.

public override int SampleCount { get; }

Property Value

int

Remarks

This may be known upfront (for example, from file metadata) or estimated; for truly streaming sources where the count is unknown, it may return -1. For this loader, the count is supplied through the sampleCount constructor argument.

SampleSizeBytes

Gets the size of each sample in bytes (input + output).

public int SampleSizeBytes { get; }

Property Value

int
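Because every sample has the same size, the expected file length is headerSizeBytes + sampleCount * SampleSizeBytes. The hypothetical helper below (not part of AiDotNet) uses this relation to derive the sampleCount constructor argument from a file's length. It also rejects files whose payload is not a whole number of samples.

```csharp
using System;
using System.IO;

// Hypothetical helper: derives the sample count for a fixed-size-sample file.
// SampleSizeBytes = inputSizeBytes + outputSizeBytes.
int CountSamples(long fileLength, long headerSizeBytes, int inputSizeBytes, int outputSizeBytes)
{
    long sampleSize = (long)inputSizeBytes + outputSizeBytes;
    long payload = fileLength - headerSizeBytes;

    if (sampleSize <= 0 || payload < 0)
        throw new ArgumentOutOfRangeException(nameof(fileLength), "Header or sample sizes are inconsistent with the file.");
    if (payload % sampleSize != 0)
        throw new InvalidDataException($"Payload of {payload} bytes is not a multiple of the {sampleSize}-byte sample size.");

    return checked((int)(payload / sampleSize));
}

// Worked example for the class example's layout: 784 floats + 1 int label per sample, no header.
int sampleSizeBytes = 784 * sizeof(float) + sizeof(int);   // 3,140 bytes
long expectedLength = 60_000L * sampleSizeBytes;           // 188,400,000 bytes
int derivedCount = CountSamples(expectedLength, 0, 784 * sizeof(float), sizeof(int)); // 60,000
```

With a real file, the first argument would be new FileInfo(filePath).Length.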

Methods

Dispose()

Releases all resources used by the memory-mapped data loader.

public void Dispose()

Dispose(bool)

Releases the unmanaged resources and optionally releases the managed resources.

protected void Dispose(bool disposing)

Parameters

disposing bool: true to release both managed and unmanaged resources; false to release only unmanaged resources.

~MemoryMappedStreamingDataLoader()

Finalizer to ensure resources are released.

~MemoryMappedStreamingDataLoader()

ReadSampleAsync(int, CancellationToken)

Reads a single sample by index.

protected override Task<(TInput Input, TOutput Output)> ReadSampleAsync(int index, CancellationToken cancellationToken = default)

Parameters

index int: The index of the sample to read.
cancellationToken CancellationToken: Cancellation token.

Returns

Task<(TInput Input, TOutput Output)>: A tuple containing the input and output for the sample.

Remarks

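Implements the base-class hook for reading a single sample from the data source. Each sample is read from a fixed offset in the memory-mapped file, determined by HeaderSizeBytes, the sample index, and SampleSizeBytes. This method is called by the batching infrastructure to build batches.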

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:

Clear internal data structures
Release file handles or connections
Allow garbage collection of loaded data
