Class DataLoaders

Namespace
AiDotNet.Data.Loaders
Assembly
AiDotNet.dll

Static factory class for creating data loaders with beginner-friendly methods.

public static class DataLoaders
Inheritance
DataLoaders
Remarks

DataLoaders provides the easiest way to create data loaders for common scenarios. It follows a factory pattern with static methods that handle type inference and common configurations automatically.

For Beginners: This is your starting point for loading data into AiDotNet! Choose the method that matches your data format:

Common Patterns:

// From arrays (simplest for small datasets)
var loader = DataLoaders.FromArrays(features, labels);

// From Matrix and Vector (most common for ML)
var loader = DataLoaders.FromMatrixVector(featureMatrix, labelVector);

// From Tensors (for deep learning)
var loader = DataLoaders.FromTensors(inputTensor, outputTensor);

All loaders support:

  • Batching: loader.BatchSize = 32;
  • Shuffling: loader.Shuffle();
  • Splitting: var (train, val, test) = loader.Split();

Methods

Empty<T>()

Creates an empty data loader placeholder (useful for meta-learning or custom scenarios).

public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> Empty<T>()

Returns

InMemoryDataLoader<T, Matrix<T>, Vector<T>>

A data loader with empty data.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: You typically won't need this method. It's used for advanced scenarios where data is loaded dynamically or for meta-learning tasks that don't use traditional supervised learning data.

FromArrays<T>(T[,], T[])

Creates a data loader from a 2D feature array and a 1D label array.

public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> FromArrays<T>(T[,] features, T[] labels)

Parameters

features T[,]

2D array where rows are samples and columns are features.

labels T[]

1D array of labels, one per sample.

Returns

InMemoryDataLoader<T, Matrix<T>, Vector<T>>

A configured InMemoryDataLoader ready for training.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: This is the simplest way to load tabular data.

Example - Predicting House Prices:

// Features: [sqft, bedrooms, bathrooms]
double[,] features = new double[,] {
    { 1500, 3, 2 },
    { 2000, 4, 3 },
    { 1200, 2, 1 }
};

// Labels: price
double[] labels = { 300000, 450000, 250000 };

var loader = DataLoaders.FromArrays(features, labels);

Exceptions

ArgumentNullException

Thrown when features or labels is null.

ArgumentException

Thrown when dimensions don't match.
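
The dimension rule behind the ArgumentException can be checked before calling FromArrays. The following is a minimal sketch in plain C# that does not call AiDotNet; RowsMatchLabels is a hypothetical helper name, and the checks the library itself performs may differ.

```csharp
using System;

// Hypothetical helper (not part of AiDotNet): true when every sample row has a label.
static bool RowsMatchLabels<T>(T[,] features, T[] labels)
{
    if (features is null) throw new ArgumentNullException(nameof(features));
    if (labels is null) throw new ArgumentNullException(nameof(labels));

    // GetLength(0) is the number of rows (samples); GetLength(1) would be the feature count.
    return features.GetLength(0) == labels.Length;
}

double[,] houseFeatures = {
    { 1500, 3, 2 },
    { 2000, 4, 3 },
    { 1200, 2, 1 }
};
double[] housePrices = { 300000, 450000, 250000 };

Console.WriteLine(RowsMatchLabels(houseFeatures, housePrices));            // True: 3 rows, 3 labels
Console.WriteLine(RowsMatchLabels(houseFeatures, new double[] { 1, 2 })); // False: 3 rows, 2 labels
```

A check like this is mainly useful when the arrays come from separate sources and a clearer error message is wanted than a generic dimension mismatch.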

FromArrays<T>(T[], T[])

Creates a data loader from a 1D feature array (a single feature per sample) and a 1D label array.

public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> FromArrays<T>(T[] features, T[] labels)

Parameters

features T[]

1D array of single feature values.

labels T[]

1D array of labels, one per sample.

Returns

InMemoryDataLoader<T, Matrix<T>, Vector<T>>

A configured InMemoryDataLoader ready for training.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: Use this for simple regression with one input variable.

Example - Simple Linear Regression:

// X: study hours
double[] features = { 1, 2, 3, 4, 5 };
// Y: test scores
double[] labels = { 50, 60, 70, 80, 90 };

var loader = DataLoaders.FromArrays(features, labels);

Exceptions

ArgumentNullException

Thrown when features or labels is null.

ArgumentException

Thrown when lengths don't match.

FromArrays<T>(T[][], T[])

Creates a data loader from a jagged feature array and a 1D label array.

public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> FromArrays<T>(T[][] features, T[] labels)

Parameters

features T[][]

Jagged array where each inner array is a sample's features.

labels T[]

1D array of labels, one per sample.

Returns

InMemoryDataLoader<T, Matrix<T>, Vector<T>>

A configured InMemoryDataLoader ready for training.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: Use this when your data is in jagged array format.

Example:

double[][] features = {
    new[] { 1.0, 2.0, 3.0 },
    new[] { 4.0, 5.0, 6.0 },
    new[] { 7.0, 8.0, 9.0 }
};
double[] labels = { 0, 1, 0 };

var loader = DataLoaders.FromArrays(features, labels);

Exceptions

ArgumentNullException

Thrown when features or labels is null.

ArgumentException

Thrown when dimensions don't match or arrays are inconsistent.
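
To illustrate what "inconsistent" can mean for a jagged array, here is a sketch that copies a jagged array into a rectangular one and rejects rows of differing length. It is plain C# and does not call AiDotNet; ToRectangular is a hypothetical helper, and this is an assumption about the kind of check involved, not the library's actual code.

```csharp
using System;

// Hypothetical helper (not part of AiDotNet): copies a jagged array into a 2D array,
// rejecting null rows and rows whose length differs from the first row.
static T[,] ToRectangular<T>(T[][] rows)
{
    if (rows is null) throw new ArgumentNullException(nameof(rows));
    if (rows.Length == 0) return new T[0, 0];
    if (rows[0] is null) throw new ArgumentException("Row 0 is null.", nameof(rows));

    int columns = rows[0].Length;
    var result = new T[rows.Length, columns];

    for (int r = 0; r < rows.Length; r++)
    {
        if (rows[r] is null || rows[r].Length != columns)
            throw new ArgumentException($"Row {r} does not have {columns} values.", nameof(rows));

        for (int c = 0; c < columns; c++)
            result[r, c] = rows[r][c];
    }

    return result;
}

double[][] jagged = {
    new[] { 1.0, 2.0, 3.0 },
    new[] { 4.0, 5.0, 6.0 }
};

double[,] grid = ToRectangular(jagged);
Console.WriteLine($"{grid.GetLength(0)} x {grid.GetLength(1)}"); // 2 x 3
```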

FromCsv<T, TInput, TOutput>(string, Func<string, int, (TInput, TOutput)>, int, bool, int)

Creates a streaming data loader from a CSV file.

public static CsvStreamingDataLoader<T, TInput, TOutput> FromCsv<T, TInput, TOutput>(string filePath, Func<string, int, (TInput, TOutput)> lineParser, int batchSize, bool hasHeader = true, int prefetchCount = 2)

Parameters

filePath string

Path to the CSV file.

lineParser Func<string, int, (TInput, TOutput)>

Function that parses a CSV line into (input, output).

batchSize int

Number of samples per batch.

hasHeader bool

Whether the CSV has a header row to skip. Default is true.

prefetchCount int

Number of batches to prefetch. Default is 2.

Returns

CsvStreamingDataLoader<T, TInput, TOutput>

A CSV streaming data loader.

Type Parameters

T

The numeric type (float, double, etc.).

TInput

The input data type for each row.

TOutput

The output/label data type for each row.

Remarks

For Beginners: Use this for large CSV files that don't fit in memory. The file is read line by line during training.

Example - Large Tabular Dataset:

var loader = DataLoaders.FromCsv<double, double[], double>(
    filePath: "data/huge_dataset.csv",
    lineParser: (line, lineNumber) =>
    {
        var parts = line.Split(',');
        var features = parts.Take(10).Select(double.Parse).ToArray();
        var label = double.Parse(parts[10]);
        return (features, label);
    },
    batchSize: 256,
    hasHeader: true
);
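
The line parser can also be written as a named function, which makes it easier to test on its own. The sketch below is plain C# and does not call AiDotNet. ParseLine is a hypothetical name, and it assumes the last column is the label, every other column is a feature, and no field contains a quoted comma.

```csharp
using System;
using System.Globalization;

// Hypothetical parser with the shape Func<string, int, (double[], double)>.
// Assumes: last column is the label, all other columns are features.
static (double[] Features, double Label) ParseLine(string line, int lineNumber)
{
    string[] parts = line.Split(',');
    if (parts.Length < 2)
        throw new FormatException($"Line {lineNumber}: expected at least 2 columns, found {parts.Length}.");

    var features = new double[parts.Length - 1];
    for (int i = 0; i < features.Length; i++)
        features[i] = double.Parse(parts[i], CultureInfo.InvariantCulture);

    // InvariantCulture keeps "1.5" meaning one and a half regardless of the machine's locale.
    double label = double.Parse(parts[parts.Length - 1], CultureInfo.InvariantCulture);
    return (features, label);
}

var (rowFeatures, rowLabel) = ParseLine("1500,3,2,300000", lineNumber: 2);
Console.WriteLine($"{rowFeatures.Length} features, label {rowLabel.ToString(CultureInfo.InvariantCulture)}");
// 3 features, label 300000
```

Because the function matches the lineParser delegate shape, it should be possible to pass it as lineParser: ParseLine. Parsing with the invariant culture avoids misreading decimal points on machines whose locale uses a decimal comma.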

FromDirectory<T, TInput, TOutput>(string, string, Func<string, CancellationToken, Task<(TInput, TOutput)>>, int, SearchOption, int, int)

Creates a streaming data loader from a directory of files.

public static FileStreamingDataLoader<T, TInput, TOutput> FromDirectory<T, TInput, TOutput>(string directory, string filePattern, Func<string, CancellationToken, Task<(TInput, TOutput)>> fileProcessor, int batchSize, SearchOption searchOption = SearchOption.TopDirectoryOnly, int prefetchCount = 2, int numWorkers = 4)

Parameters

directory string

The directory containing data files.

filePattern string

The file pattern to match (e.g., "*.png", "*.csv").

fileProcessor Func<string, CancellationToken, Task<(TInput, TOutput)>>

Function that processes a file and returns (input, output).

batchSize int

Number of samples per batch.

searchOption SearchOption

Whether to search subdirectories. Default is TopDirectoryOnly.

prefetchCount int

Number of batches to prefetch. Default is 2.

numWorkers int

Number of parallel workers. Default is 4.

Returns

FileStreamingDataLoader<T, TInput, TOutput>

A file streaming data loader.

Type Parameters

T

The numeric type (float, double, etc.).

TInput

The input data type for each sample.

TOutput

The output/label data type for each sample.

Remarks

For Beginners: Use this when you have a folder of data files (images, audio, etc.) that you want to stream during training.

Example - Image Dataset:

var loader = DataLoaders.FromDirectory<float, float[], int>(
    directory: "data/images",
    filePattern: "*.png",
    fileProcessor: async (filePath, ct) =>
    {
        var pixels = await LoadImagePixelsAsync(filePath, ct);
        var label = ParseLabelFromFilename(filePath);
        return (pixels, label);
    },
    batchSize: 64
);
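
ParseLabelFromFilename in the example above is left to the caller. One possible sketch is shown below, assuming a hypothetical naming convention in which the class index is the part of the file name before the first underscore (for example "7_00042.png"). It is plain C# and does not call AiDotNet.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical naming convention: "<classIndex>_<anything>.png", e.g. "7_00042.png".
static int ParseLabelFromFilename(string filePath)
{
    string name = Path.GetFileNameWithoutExtension(filePath);
    int separator = name.IndexOf('_');
    string prefix = separator >= 0 ? name.Substring(0, separator) : name;

    if (!int.TryParse(prefix, NumberStyles.Integer, CultureInfo.InvariantCulture, out int label))
        throw new FormatException($"Cannot read a class index from '{filePath}'.");

    return label;
}

Console.WriteLine(ParseLabelFromFilename("data/images/7_00042.png")); // 7
```

If labels are encoded in the parent folder name instead, the same idea applies with Path.GetDirectoryName in place of the file name.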

FromLeafFederatedJsonFiles<T>(string, string?, LeafFederatedDatasetLoadOptions?)

Creates a LEAF federated data loader from LEAF benchmark JSON files.

public static LeafFederatedDataLoader<T> FromLeafFederatedJsonFiles<T>(string trainFilePath, string? testFilePath = null, LeafFederatedDatasetLoadOptions? options = null)

Parameters

trainFilePath string

Path to the LEAF train split JSON file.

testFilePath string

Optional path to the LEAF test split JSON file.

options LeafFederatedDatasetLoadOptions

Optional LEAF load options (subset, validation).

Returns

LeafFederatedDataLoader<T>

A configured LEAF data loader ready for federated learning.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: LEAF is a standard federated learning benchmark suite where each "user" is treated as one client. This loader keeps that per-client split intact so federated learning simulations match the benchmark.
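
As a rough illustration of the per-client layout, the sketch below counts samples per user in a LEAF-style JSON document using System.Text.Json. It assumes the usual LEAF keys ("users", "num_samples", and "user_data" with "x" and "y" arrays per user). The sample document is made up, CountSamplesPerClient is a hypothetical helper, and none of this is the loader's own parsing code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper (not part of AiDotNet): maps each user (client) to its sample count.
static Dictionary<string, int> CountSamplesPerClient(string leafJson)
{
    using JsonDocument doc = JsonDocument.Parse(leafJson);
    var counts = new Dictionary<string, int>();

    // Each property under "user_data" is one client; "y" holds one label per sample.
    foreach (JsonProperty user in doc.RootElement.GetProperty("user_data").EnumerateObject())
        counts[user.Name] = user.Value.GetProperty("y").GetArrayLength();

    return counts;
}

// Made-up two-client document in the assumed LEAF layout.
string sampleJson = @"{
  ""users"": [""u1"", ""u2""],
  ""num_samples"": [2, 1],
  ""user_data"": {
    ""u1"": { ""x"": [[0.1, 0.2], [0.3, 0.4]], ""y"": [0, 1] },
    ""u2"": { ""x"": [[0.5, 0.6]], ""y"": [1] }
  }
}";

Dictionary<string, int> clientCounts = CountSamplesPerClient(sampleJson);
Console.WriteLine(clientCounts["u1"]); // 2
Console.WriteLine(clientCounts["u2"]); // 1
```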

FromMatrices<T>(Matrix<T>, Matrix<T>)

Creates a data loader from a feature Matrix and label Matrix (for multi-output regression).

public static InMemoryDataLoader<T, Matrix<T>, Matrix<T>> FromMatrices<T>(Matrix<T> features, Matrix<T> labels)

Parameters

features Matrix<T>

Matrix where rows are samples and columns are features.

labels Matrix<T>

Matrix where rows are samples and columns are output dimensions.

Returns

InMemoryDataLoader<T, Matrix<T>, Matrix<T>>

A configured InMemoryDataLoader ready for training.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: Use this when predicting multiple outputs simultaneously.

Example - Predicting Multiple Properties:

// Input: molecule features
var features = new Matrix<double>(100, 10);

// Output: multiple properties (e.g., toxicity, solubility, binding affinity)
var labels = new Matrix<double>(100, 3);

var loader = DataLoaders.FromMatrices(features, labels);

Exceptions

ArgumentNullException

Thrown when features or labels is null.

ArgumentException

Thrown when row counts don't match.

FromMatrixVector<T>(Matrix<T>, Vector<T>)

Creates a data loader from a feature Matrix and label Vector.

public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> FromMatrixVector<T>(Matrix<T> features, Vector<T> labels)

Parameters

features Matrix<T>

Matrix where rows are samples and columns are features.

labels Vector<T>

Vector of labels, one per sample.

Returns

InMemoryDataLoader<T, Matrix<T>, Vector<T>>

A configured InMemoryDataLoader ready for training.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: This is the most common format for machine learning. Use this when you already have Matrix and Vector objects.

Example:

var features = new Matrix<double>(100, 5);  // 100 samples, 5 features
var labels = new Vector<double>(100);       // 100 labels

// Fill your data...

var loader = DataLoaders.FromMatrixVector(features, labels);

// Use with AiModelBuilder
var result = await builder
    .ConfigureDataLoader(loader)
    .ConfigureModel(model)
    .BuildAsync();

Exceptions

ArgumentNullException

Thrown when features or labels is null.

ArgumentException

Thrown when row count doesn't match label count.

FromTensorVector<T>(Tensor<T>, Vector<T>)

Creates a data loader from a Tensor of features and a Vector of labels.

public static InMemoryDataLoader<T, Tensor<T>, Vector<T>> FromTensorVector<T>(Tensor<T> features, Vector<T> labels)

Parameters

features Tensor<T>

Input tensor where first dimension is batch/samples.

labels Vector<T>

Vector of labels, one per sample.

Returns

InMemoryDataLoader<T, Tensor<T>, Vector<T>>

A configured InMemoryDataLoader ready for training.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: This is a common pattern for classification with multi-dimensional inputs such as images, where each sample has a single class label.

Example - Image Classification with Class Labels:

// Input: images as tensor
var features = new Tensor<float>([1000, 28, 28, 1]);

// Output: class indices (0-9)
var labels = new Vector<float>(1000);  // Contains values 0-9

var loader = DataLoaders.FromTensorVector(features, labels);

Exceptions

ArgumentNullException

Thrown when features or labels is null.

ArgumentException

Thrown when sample counts don't match.

FromTensors<T>(Tensor<T>, Tensor<T>)

Creates a data loader from input and output Tensors.

public static InMemoryDataLoader<T, Tensor<T>, Tensor<T>> FromTensors<T>(Tensor<T> features, Tensor<T> labels)

Parameters

features Tensor<T>

Input tensor where first dimension is batch/samples.

labels Tensor<T>

Output tensor where first dimension is batch/samples.

Returns

InMemoryDataLoader<T, Tensor<T>, Tensor<T>>

A configured InMemoryDataLoader ready for training.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: Use tensors for deep learning with multi-dimensional data.

Example - Image Classification:

// Input: 1000 images, 28x28 pixels, 1 channel (grayscale)
var features = new Tensor<float>([1000, 28, 28, 1]);

// Output: 1000 labels, 10 classes (one-hot encoded)
var labels = new Tensor<float>([1000, 10]);

var loader = DataLoaders.FromTensors(features, labels);
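
The one-hot label layout mentioned above (one row per sample, one column per class) can be sketched in plain C# as follows. OneHot is a hypothetical helper that is not part of AiDotNet, and copying the result into a Tensor<T> is left out because that API is not described on this page.

```csharp
using System;

// Hypothetical helper (not part of AiDotNet): turns class indices into one-hot rows.
static float[,] OneHot(int[] classIndices, int classCount)
{
    var encoded = new float[classIndices.Length, classCount]; // zero-initialized by the runtime

    for (int i = 0; i < classIndices.Length; i++)
    {
        int c = classIndices[i];
        if (c < 0 || c >= classCount)
            throw new ArgumentOutOfRangeException(nameof(classIndices), $"Sample {i} has class {c}.");

        encoded[i, c] = 1f; // mark the single active class for this sample
    }

    return encoded;
}

float[,] oneHot = OneHot(new[] { 2, 0, 9 }, classCount: 10);
Console.WriteLine($"{oneHot.GetLength(0)} x {oneHot.GetLength(1)}"); // 3 x 10
Console.WriteLine(oneHot[0, 2]); // 1
```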

Example - Sequence Data:

// Input: 500 sequences, 100 time steps, 32 features per step
var features = new Tensor<double>([500, 100, 32]);

// Output: 500 predictions
var labels = new Tensor<double>([500, 1]);

var loader = DataLoaders.FromTensors(features, labels);

Exceptions

ArgumentNullException

Thrown when features or labels is null.

ArgumentException

Thrown when sample counts don't match.

ModelNet40Classification<T>(ModelNet40ClassificationDataLoaderOptions?)

Creates a ModelNet40 classification data loader.

public static ModelNet40ClassificationDataLoader<T> ModelNet40Classification<T>(ModelNet40ClassificationDataLoaderOptions? options = null)

Parameters

options ModelNet40ClassificationDataLoaderOptions

Returns

ModelNet40ClassificationDataLoader<T>

Type Parameters

T

ScanNetSemanticSegmentation<T>(ScanNetSemanticSegmentationDataLoaderOptions?)

Creates a ScanNet semantic segmentation data loader.

public static ScanNetSemanticSegmentationDataLoader<T> ScanNetSemanticSegmentation<T>(ScanNetSemanticSegmentationDataLoaderOptions? options = null)

Parameters

options ScanNetSemanticSegmentationDataLoaderOptions

Returns

ScanNetSemanticSegmentationDataLoader<T>

Type Parameters

T

ShapeNetCorePartSegmentation<T>(ShapeNetCorePartSegmentationDataLoaderOptions?)

Creates a ShapeNetCore part segmentation data loader.

public static ShapeNetCorePartSegmentationDataLoader<T> ShapeNetCorePartSegmentation<T>(ShapeNetCorePartSegmentationDataLoaderOptions? options = null)

Parameters

options ShapeNetCorePartSegmentationDataLoaderOptions

Returns

ShapeNetCorePartSegmentationDataLoader<T>

Type Parameters

T

Streaming<T, TInput, TOutput>(int, Func<int, CancellationToken, Task<(TInput, TOutput)>>, int, int, int)

Creates a streaming data loader that reads samples on-demand.

public static StreamingDataLoader<T, TInput, TOutput> Streaming<T, TInput, TOutput>(int sampleCount, Func<int, CancellationToken, Task<(TInput, TOutput)>> sampleReader, int batchSize, int prefetchCount = 2, int numWorkers = 4)

Parameters

sampleCount int

Total number of samples in the dataset.

sampleReader Func<int, CancellationToken, Task<(TInput, TOutput)>>

Async function that reads a single sample by index.

batchSize int

Number of samples per batch.

prefetchCount int

Number of batches to prefetch. Default is 2.

numWorkers int

Number of parallel workers. Default is 4.

Returns

StreamingDataLoader<T, TInput, TOutput>

A streaming data loader.

Type Parameters

T

The numeric type (float, double, etc.).

TInput

The input data type for each sample.

TOutput

The output/label data type for each sample.

Remarks

For Beginners: Use this when your dataset is too large to fit in memory. The sampleReader function is called on-demand to load individual samples.

Example - Loading Images:

var loader = DataLoaders.Streaming<float, float[], int>(
    sampleCount: 1000000,
    sampleReader: async (index, ct) =>
    {
        var image = await LoadImageAsync($"images/{index}.png", ct);
        var label = GetLabel(index);
        return (image, label);
    },
    batchSize: 32
);

await foreach (var batch in loader.GetBatchesAsync())
{
    await model.TrainOnBatchAsync(batch.Inputs, batch.Outputs);
}
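
For experimenting without real files, a sample reader with the same shape as sampleReader can fabricate data from the index. The sketch below is plain C# and does not call AiDotNet; ReadSampleAsync is a hypothetical name and the generated values are arbitrary.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical reader with the shape Func<int, CancellationToken, Task<(float[], int)>>.
// It fabricates a sample from the index instead of touching disk.
static async Task<(float[] Input, int Label)> ReadSampleAsync(int index, CancellationToken ct)
{
    ct.ThrowIfCancellationRequested(); // stop promptly if the caller has cancelled
    await Task.Yield();                // stands in for real asynchronous I/O

    var values = new float[] { index, index * 0.5f };
    int parity = index % 2;
    return (values, parity);
}

var (sampleInput, sampleLabel) = await ReadSampleAsync(4, CancellationToken.None);
Console.WriteLine($"{sampleInput[0]}, {sampleInput[1]}, label {sampleLabel}"); // 4, 2, label 0
```

Checking the token at the start of the reader lets a cancelled run stop before doing any work for the sample.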

WithBatchSize<T>(Matrix<T>, Vector<T>, int)

Creates a data loader with pre-configured batch size.

public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> WithBatchSize<T>(Matrix<T> features, Vector<T> labels, int batchSize)

Parameters

features Matrix<T>

Matrix where rows are samples and columns are features.

labels Vector<T>

Vector of labels, one per sample.

batchSize int

The batch size for iteration.

Returns

InMemoryDataLoader<T, Matrix<T>, Vector<T>>

A configured InMemoryDataLoader with the specified batch size.

Type Parameters

T

The numeric type (float, double, etc.).

Remarks

For Beginners: Batch size determines how many samples are processed together. Common values:

  • 32: Good default for most cases
  • 16-64: Standard range for GPU training
  • 1: Stochastic gradient descent (slowest but most updates)
  • Full dataset: Batch gradient descent (fewer updates but more stable)

Example:

var loader = DataLoaders.WithBatchSize(features, labels, batchSize: 64);
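
The trade-off between batch size and number of updates is simple arithmetic. The sketch below computes batches per epoch in plain C#, assuming the final partial batch is kept rather than dropped (this page does not say which the loader does). BatchesPerEpoch is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: batches needed to cover sampleCount samples,
// assuming a smaller final batch is kept.
static int BatchesPerEpoch(int sampleCount, int batchSize)
{
    if (sampleCount < 0) throw new ArgumentOutOfRangeException(nameof(sampleCount));
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

    return (sampleCount + batchSize - 1) / batchSize; // integer ceiling division
}

Console.WriteLine(BatchesPerEpoch(1000, 32));   // 32 (31 full batches plus one of 8)
Console.WriteLine(BatchesPerEpoch(1000, 1));    // 1000
Console.WriteLine(BatchesPerEpoch(1000, 1000)); // 1
```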