Class MemoryMappedStreamingDataLoader<T, TInput, TOutput>

Namespace
AiDotNet.Data.Loaders
Assembly
AiDotNet.dll

A streaming data loader that uses memory-mapped files for efficient random access to large binary datasets.

public class MemoryMappedStreamingDataLoader<T, TInput, TOutput> : StreamingDataLoaderBase<T, TInput, TOutput>, IStreamingDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable, IDisposable

Type Parameters

T

The numeric type used for calculations.

TInput

The type of input data.

TOutput

The type of output/label data.

Inheritance
StreamingDataLoaderBase<T, TInput, TOutput>
MemoryMappedStreamingDataLoader<T, TInput, TOutput>
Implements
IStreamingDataLoader<T, TInput, TOutput>

Remarks

MemoryMappedStreamingDataLoader uses MemoryMappedFile to provide random access to large datasets stored as fixed-size binary records. The operating system pages data in and out of physical memory on demand, so the dataset can be larger than available RAM.

File Format Requirements:

  • Binary file with fixed-size samples
  • Each sample is inputSizeBytes + outputSizeBytes bytes
  • Samples are stored contiguously, optionally preceded by a header of headerSizeBytes bytes
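
The layout above implies a simple offset calculation for any sample. The following is a minimal sketch, not the library's own code: SampleLayout, OffsetOf, and Write are hypothetical helper names, and it assumes that each record stores the input bytes first and the label bytes second, which the format description does not state explicitly.

```csharp
using System;
using System.IO;

static class SampleLayout
{
    // Byte offset of the sample at `index`, assuming the header comes first
    // and every record is exactly inputSizeBytes + outputSizeBytes long.
    public static long OffsetOf(int index, long headerSizeBytes, int inputSizeBytes, int outputSizeBytes)
        => headerSizeBytes + (long)index * ((long)inputSizeBytes + outputSizeBytes);

    // Writes a file in this layout: header, then for each sample the input
    // floats followed by an int label (assumed ordering: input, then output).
    public static void Write(string path, byte[] header, float[][] inputs, int[] labels)
    {
        using var stream = File.Create(path);
        using var writer = new BinaryWriter(stream);
        writer.Write(header);
        for (int i = 0; i < inputs.Length; i++)
        {
            foreach (float value in inputs[i])
                writer.Write(value);   // 4 bytes, little-endian
            writer.Write(labels[i]);   // 4 bytes, little-endian
        }
    }
}
```

BinaryWriter always writes little-endian values, while BitConverter follows the machine's byte order, so a file written this way round-trips through BitConverter-based deserializers only on little-endian hardware.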

For Beginners: Memory-mapped files let the operating system manage which parts of a large file are in memory. When you access a sample, the OS automatically loads that portion of the file into RAM. This is very efficient for random access patterns like shuffled batch iteration on datasets too large to fit in memory.
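
To make the idea concrete, here is a small sketch of reading a byte range through a memory-mapped file with the standard System.IO.MemoryMappedFiles API. MappedReadSketch and ReadBytes are hypothetical names used only for illustration, and opening the mapping on every call is a simplification; a loader would more plausibly keep the mapping open for its whole lifetime.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedReadSketch
{
    // Reads `count` bytes starting at `offset` without loading the whole file.
    public static byte[] ReadBytes(string path, long offset, int count)
    {
        // Map the existing file read-only; capacity 0 means "use the file's size".
        using var file = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);

        // A view covers only the requested range; the OS pages it in on access.
        using var view = file.CreateViewAccessor(offset, count, MemoryMappedFileAccess.Read);

        var buffer = new byte[count];
        view.ReadArray(0, buffer, 0, count);
        return buffer;
    }
}
```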

Example:

// Create a memory-mapped loader for binary image data.
// The loader implements IDisposable, so a using declaration releases the mapping when it goes out of scope.
using var loader = new MemoryMappedStreamingDataLoader<float, float[], int>(
    filePath: "images.bin",
    sampleCount: 60000,
    inputSizeBytes: 784 * sizeof(float),   // 28x28 image
    outputSizeBytes: sizeof(int),           // Label
    inputDeserializer: (bytes) => {
        var floats = new float[784];
        for (int i = 0; i < 784; i++)
            floats[i] = BitConverter.ToSingle(bytes, i * sizeof(float));
        return floats;
    },
    outputDeserializer: (bytes) => BitConverter.ToInt32(bytes, 0),
    batchSize: 32
);

await foreach (var batch in loader.GetBatchesAsync())
{
    await model.TrainOnBatchAsync(batch.Inputs, batch.Outputs);
}
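
The per-element loop in the input deserializer above can be replaced by a single bulk copy. The sketch below is one possible alternative; Deserializers and ToFloatArray are hypothetical names, not part of AiDotNet. Like BitConverter, Buffer.BlockCopy uses the machine's native byte order.

```csharp
using System;

static class Deserializers
{
    // Converts a byte[] of packed 32-bit floats into a float[] in one copy.
    public static float[] ToFloatArray(byte[] bytes)
    {
        if (bytes == null)
            throw new ArgumentNullException(nameof(bytes));
        if (bytes.Length % sizeof(float) != 0)
            throw new ArgumentException("Byte count must be a multiple of 4.", nameof(bytes));

        var floats = new float[bytes.Length / sizeof(float)];
        // BlockCopy counts in bytes, not elements.
        Buffer.BlockCopy(bytes, 0, floats, 0, bytes.Length);
        return floats;
    }
}
```

Because the method matches Func<byte[], float[]>, it could be passed directly as inputDeserializer: Deserializers.ToFloatArray.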

Constructors

MemoryMappedStreamingDataLoader(string, int, int, int, Func<byte[], TInput>, Func<byte[], TOutput>, int, long, int, int)

Initializes a new instance of the MemoryMappedStreamingDataLoader class.

public MemoryMappedStreamingDataLoader(string filePath, int sampleCount, int inputSizeBytes, int outputSizeBytes, Func<byte[], TInput> inputDeserializer, Func<byte[], TOutput> outputDeserializer, int batchSize, long headerSizeBytes = 0, int prefetchCount = 2, int numWorkers = 4)

Parameters

filePath string

Path to the binary data file.

sampleCount int

Total number of samples in the dataset.

inputSizeBytes int

Size of input data per sample in bytes.

outputSizeBytes int

Size of output/label data per sample in bytes.

inputDeserializer Func<byte[], TInput>

Function to deserialize input bytes to TInput.

outputDeserializer Func<byte[], TOutput>

Function to deserialize output bytes to TOutput.

batchSize int

Number of samples per batch.

headerSizeBytes long

Size of file header to skip in bytes. Default is 0.

prefetchCount int

Number of batches to prefetch. Default is 2.

numWorkers int

Number of parallel workers. Default is 4.

Exceptions

ArgumentNullException

Thrown when filePath, inputDeserializer, or outputDeserializer is null.

ArgumentOutOfRangeException

Thrown when one of the size arguments is outside its valid range.

FileNotFoundException

Thrown when the file does not exist.
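
The documented exceptions do not mention a file that exists but is shorter than the declared layout. A caller may want to check this before constructing the loader. The following is a sketch under that assumption; DatasetFileCheck, ExpectedLength, and EnsureLargeEnough are hypothetical helpers, and whether the constructor performs a similar check itself is not stated here.

```csharp
using System;
using System.IO;

static class DatasetFileCheck
{
    // Minimum file length implied by the constructor arguments.
    public static long ExpectedLength(int sampleCount, int inputSizeBytes, int outputSizeBytes, long headerSizeBytes = 0)
        => headerSizeBytes + (long)sampleCount * ((long)inputSizeBytes + outputSizeBytes);

    // Throws if the file is too short to hold the declared number of samples.
    // FileInfo.Length itself throws FileNotFoundException for a missing file.
    public static void EnsureLargeEnough(string filePath, int sampleCount, int inputSizeBytes, int outputSizeBytes, long headerSizeBytes = 0)
    {
        long expected = ExpectedLength(sampleCount, inputSizeBytes, outputSizeBytes, headerSizeBytes);
        long actual = new FileInfo(filePath).Length;
        if (actual < expected)
            throw new InvalidDataException(
                $"'{filePath}' is {actual} bytes but the layout needs at least {expected}.");
    }
}
```

For the 28x28 image example, 60000 samples of 3140 bytes each require at least 188,400,000 bytes.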

Properties

HeaderSizeBytes

Gets the size of the file header in bytes.

public long HeaderSizeBytes { get; }

Property Value

long

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

SampleCount

Gets the total number of samples in the dataset.

public override int SampleCount { get; }

Property Value

int

Remarks

This may be known upfront (e.g., from file metadata) or estimated. For truly streaming sources where the count is unknown, this may return -1.

SampleSizeBytes

Gets the size of each sample in bytes (input + output).

public int SampleSizeBytes { get; }

Property Value

int

Methods

Dispose()

Releases all resources used by the memory-mapped data loader.

public void Dispose()

Dispose(bool)

Releases the unmanaged resources and optionally releases the managed resources.

protected void Dispose(bool disposing)

Parameters

disposing bool

True to release both managed and unmanaged resources; false to release only unmanaged resources.
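
Dispose(), Dispose(bool), and the finalizer together follow the standard .NET dispose pattern. The sketch below shows that pattern on a hypothetical class, MappedResourceOwner, that owns a MemoryMappedFile; it illustrates the general convention and is not the loader's actual implementation.

```csharp
using System;
using System.IO.MemoryMappedFiles;

class MappedResourceOwner : IDisposable
{
    private readonly MemoryMappedFile _file;
    private bool _disposed;

    public MappedResourceOwner(long capacityBytes)
    {
        // An unnamed, non-persisted mapping, used here only as a disposable resource.
        _file = MemoryMappedFile.CreateNew(null, capacityBytes);
    }

    public bool IsDisposed => _disposed;

    public void Dispose()
    {
        Dispose(true);
        // The finalizer has nothing left to do once Dispose has run.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;   // safe to call more than once
        if (disposing)
        {
            // Managed objects may only be touched when called from Dispose().
            _file.Dispose();
        }
        _disposed = true;
    }

    // Runs only if Dispose() was never called.
    ~MappedResourceOwner() => Dispose(false);
}
```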

~MemoryMappedStreamingDataLoader()

Finalizer to ensure resources are released.

~MemoryMappedStreamingDataLoader()

ReadSampleAsync(int, CancellationToken)

Reads a single sample by index.

protected override Task<(TInput Input, TOutput Output)> ReadSampleAsync(int index, CancellationToken cancellationToken = default)

Parameters

index int

The index of the sample to read.

cancellationToken CancellationToken

Cancellation token.

Returns

Task<(TInput Input, TOutput Output)>

A tuple containing the input and output for the sample.

Remarks

Derived classes must implement this to read a single sample from the data source. This method is called by the batching infrastructure to build batches.

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:

  • Clear internal data structures
  • Release file handles or connections
  • Allow garbage collection of loaded data