Class InMemoryDataLoader<T, TInput, TOutput>

Namespace: AiDotNet.Data.Loaders

Assembly: AiDotNet.dll

A simple in-memory data loader for supervised learning data.

public class InMemoryDataLoader<T, TInput, TOutput> : InputOutputDataLoaderBase<T, TInput, TOutput>, IInputOutputDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<(TInput Features, TOutput Labels)>, IShuffleable

Type Parameters

T: The numeric type used for calculations, typically float or double.
TInput: The input data type (e.g., Matrix<T>, Tensor<T>).
TOutput: The output data type (e.g., Vector<T>, Tensor<T>).

Inheritance: object

DataLoaderBase<T>

InputOutputDataLoaderBase<T, TInput, TOutput>

InMemoryDataLoader<T, TInput, TOutput>

Implements: IInputOutputDataLoader<T, TInput, TOutput>

IDataLoader<T>

IResettable

ICountable

IBatchIterable<(TInput Features, TOutput Labels)>

IShuffleable

Inherited Members: InputOutputDataLoaderBase<T, TInput, TOutput>.NumOps

InputOutputDataLoaderBase<T, TInput, TOutput>.LoadedFeatures

InputOutputDataLoaderBase<T, TInput, TOutput>.LoadedLabels

InputOutputDataLoaderBase<T, TInput, TOutput>.Indices

InputOutputDataLoaderBase<T, TInput, TOutput>.Features

InputOutputDataLoaderBase<T, TInput, TOutput>.Labels

InputOutputDataLoaderBase<T, TInput, TOutput>.FeatureCount

InputOutputDataLoaderBase<T, TInput, TOutput>.OutputDimension

InputOutputDataLoaderBase<T, TInput, TOutput>.BatchSize

InputOutputDataLoaderBase<T, TInput, TOutput>.HasNext

InputOutputDataLoaderBase<T, TInput, TOutput>.IsShuffled

InputOutputDataLoaderBase<T, TInput, TOutput>.GetNextBatch()

InputOutputDataLoaderBase<T, TInput, TOutput>.TryGetNextBatch(out (TInput Features, TOutput Labels))

InputOutputDataLoaderBase<T, TInput, TOutput>.Shuffle(int?)

InputOutputDataLoaderBase<T, TInput, TOutput>.Unshuffle()

InputOutputDataLoaderBase<T, TInput, TOutput>.Split(double, double, int?)

InputOutputDataLoaderBase<T, TInput, TOutput>.OnReset()

InputOutputDataLoaderBase<T, TInput, TOutput>.InitializeIndices(int)

InputOutputDataLoaderBase<T, TInput, TOutput>.ExtractBatch(int[])

InputOutputDataLoaderBase<T, TInput, TOutput>.ValidateSplitRatios(double, double)

InputOutputDataLoaderBase<T, TInput, TOutput>.ComputeSplitSizes(int, double, double)

InputOutputDataLoaderBase<T, TInput, TOutput>.GetBatches(int?, bool, bool, int?)

InputOutputDataLoaderBase<T, TInput, TOutput>.GetBatchesAsync(int?, bool, bool, int?, int, CancellationToken)

DataLoaderBase<T>.IsLoaded

DataLoaderBase<T>.CurrentIndex

DataLoaderBase<T>.BatchSize

DataLoaderBase<T>.BatchCount

DataLoaderBase<T>.CurrentBatchIndex

DataLoaderBase<T>.Progress

DataLoaderBase<T>.Reset()

DataLoaderBase<T>.LoadAsync(CancellationToken)

DataLoaderBase<T>.Unload()

DataLoaderBase<T>.OnReset()

DataLoaderBase<T>.EnsureLoaded()

DataLoaderBase<T>.AdvanceIndex(int)

DataLoaderBase<T>.AdvanceBatchIndex()

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Extension Methods: ParallelBatchLoaderExtensions.WithParallelLoading<TBatch>(IBatchIterable<TBatch>, Func<int[], TBatch>, int, int?, int?)

DataPipelineExtensions.ToPipeline<TBatch>(IBatchIterable<TBatch>)

DataLoaderExtensions.CreateBatchesAsync<TBatch>(IBatchIterable<TBatch>, int?, int)

DataLoaderExtensions.CreateBatches<TBatch>(IBatchIterable<TBatch>, int?)

Remarks

InMemoryDataLoader is the simplest way to create a data loader from existing data. It's ideal for:

- Small to medium datasets that fit in memory
- Quick prototyping and testing
- Converting raw arrays or matrices to the IDataLoader interface

For Beginners: This is the easiest data loader to use. Simply pass your feature data (X) and label data (Y) to the constructor, and you're ready to train!

Example:

// Create feature matrix and label vector
var features = new Matrix<double>(100, 5);  // 100 samples, 5 features
var labels = new Vector<double>(100);       // 100 labels

// Create the loader
var loader = new InMemoryDataLoader<double, Matrix<double>, Vector<double>>(features, labels);

// Use with AiModelBuilder (builder and model are assumed to be created earlier)
var result = await builder
    .ConfigureDataLoader(loader)
    .ConfigureModel(model)
    .BuildAsync();

Constructors

InMemoryDataLoader(TInput, TOutput)

Creates a new in-memory data loader with the specified features and labels.

public InMemoryDataLoader(TInput features, TOutput labels)

Parameters

features TInput: The input features (X data).
labels TOutput: The output labels (Y data).

Remarks

For Beginners: The features are your input data - the information you use to make predictions. The labels are the correct answers you're trying to predict.

For example, if you're predicting house prices:

Features: Square footage, number of bedrooms, location (as numbers)
Labels: The actual house prices

Both must have the same number of samples (rows).

Exceptions

ArgumentNullException: Thrown when features or labels is null.
ArgumentException: Thrown when the sample counts don't match.
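
The two documented failure modes can be illustrated with a small standalone sketch. It uses plain arrays rather than Matrix<T> and Vector<T>, and the helper name ValidateSampleCounts is hypothetical; it mirrors the checks described above and is not the library's own implementation.

```csharp
using System;

// Hypothetical helper mirroring the documented exceptions (not AiDotNet code).
static int ValidateSampleCounts(double[,] features, double[] labels)
{
    if (features is null) throw new ArgumentNullException(nameof(features));
    if (labels is null) throw new ArgumentNullException(nameof(labels));

    // Rows of the feature array are samples; each sample needs exactly one label.
    int featureRows = features.GetLength(0);
    if (featureRows != labels.Length)
        throw new ArgumentException(
            $"Features have {featureRows} samples but labels have {labels.Length}.");
    return featureRows;
}

var sampleFeatures = new double[100, 5];  // 100 samples, 5 features
var matchingLabels = new double[100];     // one label per sample
var shortLabels = new double[90];         // 10 labels missing

int sampleCount = ValidateSampleCounts(sampleFeatures, matchingLabels);

bool mismatchRejected = false;
try { ValidateSampleCounts(sampleFeatures, shortLabels); }
catch (ArgumentException) { mismatchRejected = true; }

Console.WriteLine($"{sampleCount} samples, mismatch rejected: {mismatchRejected}");
```

Checking both conditions up front means a shape problem surfaces at construction time rather than partway through training.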

Properties

Description

Gets a description of the dataset and its intended use.

public override string Description { get; }

Property Value

string

FeatureCount

Gets the number of features per sample.

public override int FeatureCount { get; }

Property Value

int

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

OutputDimension

Gets the number of output dimensions (1 for regression/binary classification, N for multi-class with N classes).

public override int OutputDimension { get; }

Property Value

int

TotalCount

Gets the total number of samples in the dataset.

public override int TotalCount { get; }

Property Value

int

Methods

ExtractBatch(int[])

Extracts a batch of features and labels at the specified indices.

protected override (TInput Features, TOutput Labels) ExtractBatch(int[] indices)

Parameters

indices int[]: The indices of samples to extract.

Returns

(TInput Features, TOutput Labels): A tuple containing the features and labels for the batch.

Remarks

Derived classes must implement this to extract data based on their specific TInput and TOutput types.
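
As a rough illustration of what index-based extraction involves, the sketch below copies the selected rows out of a plain two-dimensional array and its matching label array. The helper name ExtractRows and the use of double[,] and double[] in place of TInput and TOutput are assumptions made for the example; the actual override works on whatever types the loader was constructed with.

```csharp
using System;

// Hypothetical stand-in for ExtractBatch on plain arrays (not AiDotNet code).
static (double[,] Features, double[] Labels) ExtractRows(
    double[,] features, double[] labels, int[] indices)
{
    int featureCount = features.GetLength(1);
    var batchFeatures = new double[indices.Length, featureCount];
    var batchLabels = new double[indices.Length];

    for (int row = 0; row < indices.Length; row++)
    {
        int source = indices[row];  // position of the sample in the full dataset
        for (int col = 0; col < featureCount; col++)
            batchFeatures[row, col] = features[source, col];
        batchLabels[row] = labels[source];
    }
    return (batchFeatures, batchLabels);
}

double[,] x = { { 1, 2 }, { 3, 4 }, { 5, 6 }, { 7, 8 } };
double[] y = { 10, 20, 30, 40 };

// Request samples 2 and 0, in that order.
var batch = ExtractRows(x, y, new[] { 2, 0 });
Console.WriteLine($"{batch.Features[0, 0]}, {batch.Labels[0]}");
```

The batch preserves the order of the requested indices, which is what allows a shuffled index array to produce shuffled batches without moving the underlying data.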

LoadDataCoreAsync(CancellationToken)

Core data loading implementation to be provided by derived classes.

protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)

Parameters

cancellationToken CancellationToken: Cancellation token for async operation.

Returns

Task: A task that completes when loading is finished.

Remarks

Derived classes must implement this to perform actual data loading:

- Load from files, databases, or remote sources
- Parse and validate data format
- Store in appropriate internal structures

Split(double, double, int?)

Creates a train/validation/test split of the data.

public override (IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test) Split(double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)

Parameters

trainRatio double: Fraction of data for training (0.0 to 1.0).
validationRatio double: Fraction of data for validation (0.0 to 1.0).
seed int?: Optional random seed for reproducible splits.

Returns

(IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test): A tuple containing three data loaders: (train, validation, test).

Remarks

The test ratio is implicitly 1 - trainRatio - validationRatio.

For Beginners: Splitting data is crucial for evaluating your model:

- **Training set**: Data the model learns from
- **Validation set**: Data used to tune hyperparameters and prevent overfitting
- **Test set**: Data used only once at the end to get an unbiased performance estimate

Common splits are 60/20/20 or 70/15/15 (train/validation/test).
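
To make the ratio arithmetic concrete, here is a minimal standalone sketch of how split sizes and a seeded index shuffle could be derived. The helper names SplitSizes and ShuffledIndices are hypothetical, and the rounding rule (floor for train and validation, remainder to test) is an assumption; the library's ComputeSplitSizes may round differently.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not AiDotNet code). Assumption: train and validation
// sizes are floored and the test set takes whatever remains.
static (int Train, int Validation, int Test) SplitSizes(
    int total, double trainRatio, double validationRatio)
{
    if (trainRatio < 0 || validationRatio < 0 || trainRatio + validationRatio > 1.0)
        throw new ArgumentException("Ratios must be non-negative and sum to at most 1.");

    int train = (int)Math.Floor(total * trainRatio);
    int validation = (int)Math.Floor(total * validationRatio);
    int test = total - train - validation;  // implicit 1 - trainRatio - validationRatio
    return (train, validation, test);
}

// Fisher-Yates shuffle with a fixed seed, so the same seed gives the same order.
static int[] ShuffledIndices(int total, int seed)
{
    var rng = new Random(seed);
    int[] indices = Enumerable.Range(0, total).ToArray();
    for (int i = total - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);
        (indices[i], indices[j]) = (indices[j], indices[i]);
    }
    return indices;
}

var sizes = SplitSizes(100, 0.7, 0.15);
int[] order = ShuffledIndices(100, seed: 42);

// Carve the shuffled order into three disjoint index sets.
int[] trainIdx = order.Take(sizes.Train).ToArray();
int[] validationIdx = order.Skip(sizes.Train).Take(sizes.Validation).ToArray();
int[] testIdx = order.Skip(sizes.Train + sizes.Validation).ToArray();

Console.WriteLine($"{trainIdx.Length}/{validationIdx.Length}/{testIdx.Length}");
```

Giving the remainder to the test set guarantees the three parts always add up to the full dataset, even when the ratios do not divide the sample count evenly.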

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:

- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data
