Class InMemoryDataLoader<T, TInput, TOutput>

Namespace
AiDotNet.Data.Loaders
Assembly
AiDotNet.dll

A simple in-memory data loader for supervised learning data.

public class InMemoryDataLoader<T, TInput, TOutput> : InputOutputDataLoaderBase<T, TInput, TOutput>, IInputOutputDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<(TInput Features, TOutput Labels)>, IShuffleable

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The input data type (e.g., Matrix<T>, Tensor<T>).

TOutput

The output data type (e.g., Vector<T>, Tensor<T>).

Inheritance
InputOutputDataLoaderBase<T, TInput, TOutput>
InMemoryDataLoader<T, TInput, TOutput>
Implements
IInputOutputDataLoader<T, TInput, TOutput>
IBatchIterable<(TInput Features, TOutput Labels)>

Remarks

InMemoryDataLoader is the simplest way to create a data loader from existing data. It's ideal for:

  • Small to medium datasets that fit in memory
  • Quick prototyping and testing
  • Converting raw arrays or matrices to the IDataLoader interface

For Beginners: This is the easiest data loader to use. Simply pass your feature data (X) and label data (Y) to the constructor, and you're ready to train!

Example:

// Create feature matrix and label vector
var features = new Matrix<double>(100, 5);  // 100 samples, 5 features
var labels = new Vector<double>(100);       // 100 labels

// Create the loader
var loader = new InMemoryDataLoader<double, Matrix<double>, Vector<double>>(features, labels);

// Use with AiModelBuilder
var result = await builder
    .ConfigureDataLoader(loader)
    .ConfigureModel(model)
    .BuildAsync();

Constructors

InMemoryDataLoader(TInput, TOutput)

Creates a new in-memory data loader with the specified features and labels.

public InMemoryDataLoader(TInput features, TOutput labels)

Parameters

features TInput

The input features (X data).

labels TOutput

The output labels (Y data).

Remarks

For Beginners: The features are your input data - the information you use to make predictions. The labels are the correct answers you're trying to predict.

For example, if you're predicting house prices:

  • Features: Square footage, number of bedrooms, location (as numbers)
  • Labels: The actual house prices

Both must have the same number of samples (rows).

Exceptions

ArgumentNullException

Thrown when features or labels is null.

ArgumentException

Thrown when the sample counts don't match.
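
The documented contract (non-null arguments, matching sample counts) can be illustrated with a minimal sketch. It does not use AiDotNet: the SampleCountCheck class is hypothetical, and plain arrays stand in for Matrix<T> and Vector<T>. How the library actually counts samples for a given TInput or TOutput is not shown in this reference.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet. It mirrors the documented
// constructor contract using plain arrays instead of Matrix<T>/Vector<T>.
public static class SampleCountCheck
{
    public static void Validate(double[][] features, double[] labels)
    {
        // Null arguments map to ArgumentNullException, as documented.
        if (features is null) throw new ArgumentNullException(nameof(features));
        if (labels is null) throw new ArgumentNullException(nameof(labels));

        // One label per feature row; a mismatch maps to ArgumentException.
        if (features.Length != labels.Length)
        {
            throw new ArgumentException(
                $"Sample counts differ: {features.Length} feature rows vs {labels.Length} labels.");
        }
    }
}
```

Calling SampleCountCheck.Validate with two feature rows and two labels returns normally; passing two rows and one label throws ArgumentException.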

Properties

Description

Gets a description of the dataset and its intended use.

public override string Description { get; }

Property Value

string

FeatureCount

Gets the number of features per sample.

public override int FeatureCount { get; }

Property Value

int

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

OutputDimension

Gets the number of output dimensions (1 for regression/binary classification, N for multi-class with N classes).

public override int OutputDimension { get; }

Property Value

int

TotalCount

Gets the total number of samples in the dataset.

public override int TotalCount { get; }

Property Value

int

Methods

ExtractBatch(int[])

Extracts a batch of features and labels at the specified indices.

protected override (TInput Features, TOutput Labels) ExtractBatch(int[] indices)

Parameters

indices int[]

The indices of samples to extract.

Returns

(TInput Features, TOutput Labels)

A tuple containing the features and labels for the batch.

Remarks

Derived classes must implement this to extract data based on their specific TInput and TOutput types.
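
Extracting a batch by index amounts to gathering the selected rows and their labels in the order the indices are given. The sketch below shows that idea with plain arrays. BatchGather is a hypothetical name, and the real override works on TInput and TOutput in ways this reference does not describe.

```csharp
using System;

// Hypothetical illustration, not the AiDotNet implementation.
public static class BatchGather
{
    // Copies the feature rows and labels at the requested positions,
    // preserving the order of `indices` (repeated indices are allowed).
    public static (double[][] Features, double[] Labels) Extract(
        double[][] features, double[] labels, int[] indices)
    {
        var batchFeatures = new double[indices.Length][];
        var batchLabels = new double[indices.Length];

        for (int i = 0; i < indices.Length; i++)
        {
            int source = indices[i];
            // Clone the row so the batch does not alias the source data.
            batchFeatures[i] = (double[])features[source].Clone();
            batchLabels[i] = labels[source];
        }

        return (batchFeatures, batchLabels);
    }
}
```

Because the batch follows the order of the index array, a shuffled index array yields a shuffled batch without moving the underlying data.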

LoadDataCoreAsync(CancellationToken)

Core data loading implementation to be provided by derived classes.

protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)

Parameters

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task

A task that completes when loading is finished.

Remarks

Derived classes must implement this to perform actual data loading:

  • Load from files, databases, or remote sources
  • Parse and validate data format
  • Store in appropriate internal structures

Split(double, double, int?)

Creates a train/validation/test split of the data.

public override (IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test) Split(double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)

Parameters

trainRatio double

Fraction of data for training (0.0 to 1.0).

validationRatio double

Fraction of data for validation (0.0 to 1.0).

seed int?

Optional random seed for reproducible splits.

Returns

(IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test)

A tuple containing three data loaders: (train, validation, test).

Remarks

The test ratio is implicitly 1 - trainRatio - validationRatio.

For Beginners: Splitting data is crucial for evaluating your model:

  • Training set: Data the model learns from
  • Validation set: Data used to tune hyperparameters and prevent overfitting
  • Test set: Data used only once at the end to get an unbiased performance estimate

Common splits are 60/20/20 or 70/15/15 (train/validation/test).
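
To make the ratio arithmetic concrete, here is a minimal sketch of one way a seeded three-way split could be computed over sample indices. It is an assumption-laden illustration, not the library's implementation: SplitSketch is a hypothetical name, and the Fisher-Yates shuffle and truncating the counts toward zero are choices made for this example only.

```csharp
using System;
using System.Linq;

// Hypothetical illustration, not the AiDotNet implementation.
public static class SplitSketch
{
    public static (int[] Train, int[] Validation, int[] Test) SplitIndices(
        int totalCount, double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)
    {
        if (trainRatio < 0 || validationRatio < 0 || trainRatio + validationRatio > 1.0)
            throw new ArgumentException("Ratios must be non-negative and sum to at most 1.0.");

        // A fixed seed makes the shuffle, and therefore the split, repeatable.
        var random = seed.HasValue ? new Random(seed.Value) : new Random();
        int[] indices = Enumerable.Range(0, totalCount).ToArray();

        // Fisher-Yates shuffle of the sample indices.
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        // Assumption: counts are truncated; the test set takes whatever remains,
        // which matches the implicit ratio 1 - trainRatio - validationRatio.
        int trainCount = (int)(totalCount * trainRatio);
        int validationCount = (int)(totalCount * validationRatio);

        int[] train = indices.Take(trainCount).ToArray();
        int[] validation = indices.Skip(trainCount).Take(validationCount).ToArray();
        int[] test = indices.Skip(trainCount + validationCount).ToArray();
        return (train, validation, test);
    }
}
```

With the loader itself, the equivalent call based on the signature above would be loader.Split(trainRatio: 0.7, validationRatio: 0.15, seed: 42), which returns the three loaders as a tuple. Passing the same seed again should reproduce the same partition.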

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:

  • Clear internal data structures
  • Release file handles or connections
  • Allow garbage collection of loaded data