Class DataLoaders
Static factory class for creating data loaders with beginner-friendly methods.
public static class DataLoaders
Inheritance
- object
- DataLoaders
Remarks
DataLoaders provides the easiest way to create data loaders for common scenarios. It follows a factory pattern with static methods that handle type inference and common configurations automatically.
For Beginners: This is your starting point for loading data into AiDotNet! Choose the method that matches your data format:
Common Patterns (each snippet below is an alternative; pick the one that matches your data):
// From arrays (simplest for small datasets)
var loader = DataLoaders.FromArrays(features, labels);
// From Matrix and Vector (most common for ML)
var loader = DataLoaders.FromMatrixVector(featureMatrix, labelVector);
// From Tensors (for deep learning)
var loader = DataLoaders.FromTensors(inputTensor, outputTensor);
All loaders support:
- Batching: loader.BatchSize = 32;
- Shuffling: loader.Shuffle();
- Splitting: var (train, val, test) = loader.Split();
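Conceptually, shuffling and splitting both operate on sample indices. The sketch below illustrates that idea in plain C# and is not AiDotNet's implementation: it uses a seeded Fisher-Yates shuffle, and the helper names ShuffledIndices and SplitIndices as well as the 70/15/15 fractions are hypothetical, since this page does not document the default ratios used by Split().

```csharp
using System;

// Fisher-Yates shuffle of the sample indices 0..count-1, seeded for reproducibility.
int[] ShuffledIndices(int count, int seed)
{
    var indices = new int[count];
    for (int i = 0; i < count; i++) indices[i] = i;
    var rng = new Random(seed);
    for (int i = count - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1); // uniform in [0, i]
        (indices[i], indices[j]) = (indices[j], indices[i]);
    }
    return indices;
}

// Cuts an index array into train/validation/test slices by fraction;
// whatever remains after train and validation becomes the test set.
(int[] Train, int[] Val, int[] Test) SplitIndices(int[] indices, double trainFraction, double valFraction)
{
    int trainCount = (int)(indices.Length * trainFraction);
    int valCount = (int)(indices.Length * valFraction);
    return (indices[..trainCount],
            indices[trainCount..(trainCount + valCount)],
            indices[(trainCount + valCount)..]);
}

// Hypothetical 70/15/15 split of 100 samples (not Split()'s documented defaults).
int[] order = ShuffledIndices(100, seed: 42);
var (train, val, test) = SplitIndices(order, 0.7, 0.15);
Console.WriteLine($"train={train.Length}, val={val.Length}, test={test.Length}");
```

Shuffling indices rather than the data itself keeps each feature row paired with its label, because both are looked up through the same index.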
Methods
Empty<T>()
Creates an empty data loader placeholder (useful for meta-learning or custom scenarios).
public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> Empty<T>()
Returns
- InMemoryDataLoader<T, Matrix<T>, Vector<T>>
A data loader with empty data.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: You typically won't need this method. It's used for advanced scenarios where data is loaded dynamically or for meta-learning tasks that don't use traditional supervised learning data.
FromArrays<T>(T[,], T[])
Creates a data loader from a 2D feature array and a 1D label array.
public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> FromArrays<T>(T[,] features, T[] labels)
Parameters
features (T[,]): 2D array where rows are samples and columns are features.
labels (T[]): 1D array of labels, one per sample.
Returns
- InMemoryDataLoader<T, Matrix<T>, Vector<T>>
A configured InMemoryDataLoader ready for training.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: This is the simplest way to load tabular data.
Example - Predicting House Prices:
// Features: [sqft, bedrooms, bathrooms]
double[,] features = new double[,] {
    { 1500, 3, 2 },
    { 2000, 4, 3 },
    { 1200, 2, 1 }
};
// Labels: price
double[] labels = { 300000, 450000, 250000 };
var loader = DataLoaders.FromArrays(features, labels);
Exceptions
- ArgumentNullException
Thrown when features or labels is null.
- ArgumentException
Thrown when the number of rows in features doesn't match the number of labels.
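The checks in the Exceptions list can be pictured with plain 2D arrays. This is a minimal sketch, assuming the dimension check compares the number of feature rows with the number of labels; ValidateTabular and GetRow are hypothetical helpers, not AiDotNet methods.

```csharp
using System;

// Mirrors the documented exceptions for a 2D feature array (sketch, not the library's code).
void ValidateTabular<T>(T[,] features, T[] labels)
{
    if (features is null) throw new ArgumentNullException(nameof(features));
    if (labels is null) throw new ArgumentNullException(nameof(labels));
    int rows = features.GetLength(0); // dimension 0 = samples
    if (rows != labels.Length)
        throw new ArgumentException($"features has {rows} rows but labels has {labels.Length} entries.");
}

// Copies sample i's feature row out of the 2D array.
T[] GetRow<T>(T[,] features, int i)
{
    int cols = features.GetLength(1); // dimension 1 = features per sample
    var row = new T[cols];
    for (int j = 0; j < cols; j++) row[j] = features[i, j];
    return row;
}

double[,] houses = { { 1500, 3, 2 }, { 2000, 4, 3 }, { 1200, 2, 1 } };
double[] prices = { 300000, 450000, 250000 };
ValidateTabular(houses, prices);     // passes: 3 rows, 3 labels
double[] second = GetRow(houses, 1); // { 2000, 4, 3 }
```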
FromArrays<T>(T[], T[])
Creates a data loader from a 1D feature array (a single feature) and a 1D label array.
public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> FromArrays<T>(T[] features, T[] labels)
Parameters
features (T[]): 1D array of single feature values.
labels (T[]): 1D array of labels, one per sample.
Returns
- InMemoryDataLoader<T, Matrix<T>, Vector<T>>
A configured InMemoryDataLoader ready for training.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: Use this for simple regression with one input variable.
Example - Simple Linear Regression:
// X: study hours
double[] features = { 1, 2, 3, 4, 5 };
// Y: test scores
double[] labels = { 50, 60, 70, 80, 90 };
var loader = DataLoaders.FromArrays(features, labels);
Exceptions
- ArgumentNullException
Thrown when features or labels is null.
- ArgumentException
Thrown when lengths don't match.
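A single feature per sample corresponds to a feature matrix with n rows and one column. The sketch below shows that reshaping with a plain 2D array; it is an assumption that this overload builds an n x 1 matrix internally (its Matrix<T> return type suggests so), and ToColumn is a hypothetical helper.

```csharp
using System;

// Turns n single-feature values into an n x 1 array: one row per sample, one column.
T[,] ToColumn<T>(T[] values)
{
    if (values is null) throw new ArgumentNullException(nameof(values));
    var column = new T[values.Length, 1];
    for (int i = 0; i < values.Length; i++) column[i, 0] = values[i];
    return column;
}

double[] studyHours = { 1, 2, 3, 4, 5 };
double[,] x = ToColumn(studyHours);
Console.WriteLine($"{x.GetLength(0)} x {x.GetLength(1)}"); // 5 x 1
```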
FromArrays<T>(T[][], T[])
Creates a data loader from a jagged feature array and a 1D label array.
public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> FromArrays<T>(T[][] features, T[] labels)
Parameters
features (T[][]): Jagged array where each inner array is a sample's features.
labels (T[]): 1D array of labels, one per sample.
Returns
- InMemoryDataLoader<T, Matrix<T>, Vector<T>>
A configured InMemoryDataLoader ready for training.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: Use this when your data is in jagged array format.
Example:
double[][] features = {
    new[] { 1.0, 2.0, 3.0 },
    new[] { 4.0, 5.0, 6.0 },
    new[] { 7.0, 8.0, 9.0 }
};
double[] labels = { 0, 1, 0 };
var loader = DataLoaders.FromArrays(features, labels);
Exceptions
- ArgumentNullException
Thrown when features or labels is null.
- ArgumentException
Thrown when the number of samples doesn't match the number of labels, or when the inner feature arrays have inconsistent lengths.
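Converting a jagged array to a rectangular layout requires every inner array to have the same length. A minimal sketch of that conversion and check, assuming ragged rows are the kind of inconsistency the ArgumentException covers; ToRectangular is a hypothetical helper, not part of AiDotNet.

```csharp
using System;

// Copies a jagged array into a rectangular one, rejecting null or ragged rows.
T[,] ToRectangular<T>(T[][] rows)
{
    if (rows is null) throw new ArgumentNullException(nameof(rows));
    int cols = rows.Length == 0 ? 0 : (rows[0]?.Length ?? 0); // row 0 sets the width
    var result = new T[rows.Length, cols];
    for (int i = 0; i < rows.Length; i++)
    {
        if (rows[i] is null || rows[i].Length != cols)
            throw new ArgumentException($"Row {i} is null or its length differs from row 0.");
        for (int j = 0; j < cols; j++) result[i, j] = rows[i][j];
    }
    return result;
}

double[][] jagged = { new[] { 1.0, 2.0, 3.0 }, new[] { 4.0, 5.0, 6.0 }, new[] { 7.0, 8.0, 9.0 } };
double[,] grid = ToRectangular(jagged);
Console.WriteLine(grid[2, 1]); // 8
```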
FromCsv<T, TInput, TOutput>(string, Func<string, int, (TInput, TOutput)>, int, bool, int)
Creates a streaming data loader from a CSV file.
public static CsvStreamingDataLoader<T, TInput, TOutput> FromCsv<T, TInput, TOutput>(string filePath, Func<string, int, (TInput, TOutput)> lineParser, int batchSize, bool hasHeader = true, int prefetchCount = 2)
Parameters
filePath (string): Path to the CSV file.
lineParser (Func<string, int, (TInput, TOutput)>): Function that parses a CSV line into (input, output).
batchSize (int): Number of samples per batch.
hasHeader (bool): Whether the CSV has a header row to skip. Default is true.
prefetchCount (int): Number of batches to prefetch. Default is 2.
Returns
- CsvStreamingDataLoader<T, TInput, TOutput>
A CSV streaming data loader.
Type Parameters
T: The numeric type (float, double, etc.).
TInput: The input data type for each row.
TOutput: The output/label data type for each row.
Remarks
For Beginners: Use this for large CSV files that don't fit in memory. The file is read line by line during training.
Example - Large Tabular Dataset:
var loader = DataLoaders.FromCsv<double, double[], double>(
    filePath: "data/huge_dataset.csv",
    lineParser: (line, lineNumber) =>
    {
        // Columns 0-9 are features, column 10 is the label (requires using System.Linq;)
        var parts = line.Split(',');
        var features = parts.Take(10).Select(double.Parse).ToArray();
        var label = double.Parse(parts[10]);
        return (features, label);
    },
    batchSize: 256,
    hasHeader: true
);
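A lineParser is just a delegate of type Func<string, int, (TInput, TOutput)>, so it can be built and tested without a file. The sketch below uses a hypothetical helper, MakeLineParser, and parses with CultureInfo.InvariantCulture so that "1.5" is read correctly on machines whose locale uses a comma as the decimal separator (double.Parse without a format provider uses the current culture). Skipping the first line imitates hasHeader: true; the page does not say whether FromCsv passes a 0- or 1-based lineNumber, so the numbering here is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Builds a parser for lines shaped "f1,f2,...,fN,label".
Func<string, int, (double[], double)> MakeLineParser(int featureCount) =>
    (line, lineNumber) =>
    {
        string[] parts = line.Split(',');
        if (parts.Length < featureCount + 1)
            throw new FormatException($"Line {lineNumber}: expected {featureCount + 1} columns, got {parts.Length}.");
        double[] features = parts.Take(featureCount)
            .Select(p => double.Parse(p, CultureInfo.InvariantCulture))
            .ToArray();
        double label = double.Parse(parts[featureCount], CultureInfo.InvariantCulture);
        return (features, label);
    };

// In-memory stand-in for a CSV file; the first line is a header.
var lines = new List<string> { "x1,x2,y", "1.5,2.0,0", "3.25,4.0,1" };
var parser = MakeLineParser(featureCount: 2);
// Skip(1) drops the header; index + 2 gives assumed 1-based file line numbers.
var rows = lines.Skip(1).Select((text, index) => parser(text, index + 2)).ToList();
Console.WriteLine(rows.Count); // 2
```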
FromDirectory<T, TInput, TOutput>(string, string, Func<string, CancellationToken, Task<(TInput, TOutput)>>, int, SearchOption, int, int)
Creates a streaming data loader from a directory of files.
public static FileStreamingDataLoader<T, TInput, TOutput> FromDirectory<T, TInput, TOutput>(string directory, string filePattern, Func<string, CancellationToken, Task<(TInput, TOutput)>> fileProcessor, int batchSize, SearchOption searchOption = SearchOption.TopDirectoryOnly, int prefetchCount = 2, int numWorkers = 4)
Parameters
directory (string): The directory containing data files.
filePattern (string): The file pattern to match (e.g., "*.png", "*.csv").
fileProcessor (Func<string, CancellationToken, Task<(TInput, TOutput)>>): Function that processes a file and returns (input, output).
batchSize (int): Number of samples per batch.
searchOption (SearchOption): Whether to search subdirectories. Default is TopDirectoryOnly.
prefetchCount (int): Number of batches to prefetch. Default is 2.
numWorkers (int): Number of parallel workers. Default is 4.
Returns
- FileStreamingDataLoader<T, TInput, TOutput>
A file streaming data loader.
Type Parameters
T: The numeric type (float, double, etc.).
TInput: The input data type for each sample.
TOutput: The output/label data type for each sample.
Remarks
For Beginners: Use this when you have a folder of data files (images, audio, etc.) that you want to stream during training.
Example - Image Dataset:
var loader = DataLoaders.FromDirectory<float, float[], int>(
    directory: "data/images",
    filePattern: "*.png",
    fileProcessor: async (filePath, ct) =>
    {
        // LoadImagePixelsAsync and ParseLabelFromFilename are your own helpers
        var pixels = await LoadImagePixelsAsync(filePath, ct);
        var label = ParseLabelFromFilename(filePath);
        return (pixels, label);
    },
    batchSize: 64
);
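The fileProcessor delegate receives a file path and a cancellation token and returns one (input, output) pair. A minimal sketch of both user-supplied pieces, assuming a hypothetical "<label>_<id>.png" naming convention for ParseLabelFromFilename; the pixel step is a stub that scales raw file bytes to [0, 1] and does not decode PNG, so a real image decoder is needed for actual images.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical convention: "data/images/7_00042.png" has label 7.
int ParseLabelFromFilename(string filePath)
{
    string stem = Path.GetFileNameWithoutExtension(filePath); // "7_00042"
    return int.Parse(stem.Split('_')[0]);                    // 7
}

// Same delegate shape as the fileProcessor parameter.
// Stub: raw bytes become intensities in [0, 1]; no PNG decoding happens here.
Func<string, CancellationToken, Task<(float[], int)>> fileProcessor = async (filePath, ct) =>
{
    byte[] bytes = await File.ReadAllBytesAsync(filePath, ct);
    var pixels = new float[bytes.Length];
    for (int i = 0; i < bytes.Length; i++) pixels[i] = bytes[i] / 255f;
    return (pixels, ParseLabelFromFilename(filePath));
};
```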
FromLeafFederatedJsonFiles<T>(string, string?, LeafFederatedDatasetLoadOptions?)
Creates a LEAF federated data loader from LEAF benchmark JSON files.
public static LeafFederatedDataLoader<T> FromLeafFederatedJsonFiles<T>(string trainFilePath, string? testFilePath = null, LeafFederatedDatasetLoadOptions? options = null)
Parameters
trainFilePath (string): Path to the LEAF train split JSON file.
testFilePath (string?): Optional path to the LEAF test split JSON file.
options (LeafFederatedDatasetLoadOptions?): Optional LEAF load options (subset, validation).
Returns
- LeafFederatedDataLoader<T>
A configured LEAF data loader ready for federated learning.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: LEAF is a standard federated learning benchmark suite where each "user" is treated as one client. This loader keeps that per-client split intact so federated learning simulations match the benchmark.
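LEAF stores each split as a JSON file that lists its users and keeps each user's samples separately, which is what allows every user to become one client. The sketch below reads that layout with System.Text.Json and counts samples per client. The field names users, num_samples and user_data (with x and y per user) follow the LEAF format as commonly published and are an assumption here; CountSamplesPerClient is a hypothetical helper, not part of LeafFederatedDataLoader.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Assumed layout:
// { "users": [...], "num_samples": [...], "user_data": { "<user>": { "x": [...], "y": [...] } } }
Dictionary<string, int> CountSamplesPerClient(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    JsonElement root = doc.RootElement;
    JsonElement users = root.GetProperty("users");
    JsonElement numSamples = root.GetProperty("num_samples");
    JsonElement userData = root.GetProperty("user_data");

    var counts = new Dictionary<string, int>();
    for (int i = 0; i < users.GetArrayLength(); i++)
    {
        string userId = users[i].GetString()!;
        JsonElement data = userData.GetProperty(userId);
        int xCount = data.GetProperty("x").GetArrayLength();
        int yCount = data.GetProperty("y").GetArrayLength();
        // One label per input, and the total must match the declared num_samples entry.
        if (xCount != yCount || xCount != numSamples[i].GetInt32())
            throw new FormatException($"User '{userId}' has inconsistent sample counts.");
        counts[userId] = xCount; // each user is one federated client
    }
    return counts;
}
```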
FromMatrices<T>(Matrix<T>, Matrix<T>)
Creates a data loader from a feature Matrix and label Matrix (for multi-output regression).
public static InMemoryDataLoader<T, Matrix<T>, Matrix<T>> FromMatrices<T>(Matrix<T> features, Matrix<T> labels)
Parameters
features (Matrix<T>): Matrix where rows are samples and columns are features.
labels (Matrix<T>): Matrix where rows are samples and columns are output dimensions.
Returns
- InMemoryDataLoader<T, Matrix<T>, Matrix<T>>
A configured InMemoryDataLoader ready for training.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: Use this when predicting multiple outputs simultaneously.
Example - Predicting Multiple Properties:
// Input: molecule features
var features = new Matrix<double>(100, 10);
// Output: multiple properties (e.g., toxicity, solubility, binding affinity)
var labels = new Matrix<double>(100, 3);
var loader = DataLoaders.FromMatrices(features, labels);
Exceptions
- ArgumentNullException
Thrown when features or labels is null.
- ArgumentException
Thrown when row counts don't match.
FromMatrixVector<T>(Matrix<T>, Vector<T>)
Creates a data loader from a feature Matrix and label Vector.
public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> FromMatrixVector<T>(Matrix<T> features, Vector<T> labels)
Parameters
features (Matrix<T>): Matrix where rows are samples and columns are features.
labels (Vector<T>): Vector of labels, one per sample.
Returns
- InMemoryDataLoader<T, Matrix<T>, Vector<T>>
A configured InMemoryDataLoader ready for training.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: This is the most common format for machine learning. Use this when you already have Matrix and Vector objects.
Example:
var features = new Matrix<double>(100, 5); // 100 samples, 5 features
var labels = new Vector<double>(100); // 100 labels
// Fill your data...
var loader = DataLoaders.FromMatrixVector(features, labels);
// Use with AiModelBuilder
var result = await builder
    .ConfigureDataLoader(loader)
    .ConfigureModel(model)
    .BuildAsync();
Exceptions
- ArgumentNullException
Thrown when features or labels is null.
- ArgumentException
Thrown when row count doesn't match label count.
FromTensorVector<T>(Tensor<T>, Vector<T>)
Creates a data loader from a Tensor of features and a Vector of labels.
public static InMemoryDataLoader<T, Tensor<T>, Vector<T>> FromTensorVector<T>(Tensor<T> features, Vector<T> labels)
Parameters
features (Tensor<T>): Input tensor where the first dimension is batch/samples.
labels (Vector<T>): Vector of labels, one per sample.
Returns
- InMemoryDataLoader<T, Tensor<T>, Vector<T>>
A configured InMemoryDataLoader ready for training.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: A common pattern for classification with multi-dimensional inputs such as images.
Example - Image Classification with Class Labels:
// Input: images as tensor
var features = new Tensor<float>([1000, 28, 28, 1]);
// Output: class indices (0-9)
var labels = new Vector<float>(1000); // Contains values 0-9
var loader = DataLoaders.FromTensorVector(features, labels);
Exceptions
- ArgumentNullException
Thrown when features or labels is null.
- ArgumentException
Thrown when sample counts don't match.
FromTensors<T>(Tensor<T>, Tensor<T>)
Creates a data loader from input and output Tensors.
public static InMemoryDataLoader<T, Tensor<T>, Tensor<T>> FromTensors<T>(Tensor<T> features, Tensor<T> labels)
Parameters
features (Tensor<T>): Input tensor where the first dimension is batch/samples.
labels (Tensor<T>): Output tensor where the first dimension is batch/samples.
Returns
- InMemoryDataLoader<T, Tensor<T>, Tensor<T>>
A configured InMemoryDataLoader ready for training.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: Use tensors for deep learning with multi-dimensional data.
Example - Image Classification:
// Input: 1000 images, 28x28 pixels, 1 channel (grayscale)
var features = new Tensor<float>([1000, 28, 28, 1]);
// Output: 1000 labels, 10 classes (one-hot encoded)
var labels = new Tensor<float>([1000, 10]);
var loader = DataLoaders.FromTensors(features, labels);
Example - Sequence Data:
// Input: 500 sequences, 100 time steps, 32 features per step
var features = new Tensor<double>([500, 100, 32]);
// Output: 500 predictions
var labels = new Tensor<double>([500, 1]);
var loader = DataLoaders.FromTensors(features, labels);
Exceptions
- ArgumentNullException
Thrown when features or labels is null.
- ArgumentException
Thrown when sample counts don't match.
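The two image examples use different label layouts: FromTensorVector takes one class index per sample, while FromTensors takes a one-hot [samples, classes] tensor. A minimal sketch of converting the former into the latter, using a plain float[,] in place of Tensor<T> so it runs without AiDotNet; OneHot is a hypothetical helper.

```csharp
using System;

// Row i gets a 1 in column classIndices[i] and 0 everywhere else.
float[,] OneHot(int[] classIndices, int numClasses)
{
    var encoded = new float[classIndices.Length, numClasses];
    for (int i = 0; i < classIndices.Length; i++)
    {
        int c = classIndices[i];
        if (c < 0 || c >= numClasses)
            throw new ArgumentOutOfRangeException(nameof(classIndices), $"Class {c} at sample {i} is outside [0, {numClasses}).");
        encoded[i, c] = 1f;
    }
    return encoded;
}

float[,] oneHot = OneHot(new[] { 3, 0, 9 }, numClasses: 10);
Console.WriteLine(oneHot[0, 3]); // 1
```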
ModelNet40Classification<T>(ModelNet40ClassificationDataLoaderOptions?)
Creates a ModelNet40 classification data loader.
public static ModelNet40ClassificationDataLoader<T> ModelNet40Classification<T>(ModelNet40ClassificationDataLoaderOptions? options = null)
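Parameters
options (ModelNet40ClassificationDataLoaderOptions?): Optional loader options. Defaults to null.
Returns
- ModelNet40ClassificationDataLoader<T>
A ModelNet40 classification data loader.
Type Parameters
T: The numeric type (float, double, etc.).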
ScanNetSemanticSegmentation<T>(ScanNetSemanticSegmentationDataLoaderOptions?)
Creates a ScanNet semantic segmentation data loader.
public static ScanNetSemanticSegmentationDataLoader<T> ScanNetSemanticSegmentation<T>(ScanNetSemanticSegmentationDataLoaderOptions? options = null)
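Parameters
options (ScanNetSemanticSegmentationDataLoaderOptions?): Optional loader options. Defaults to null.
Returns
- ScanNetSemanticSegmentationDataLoader<T>
A ScanNet semantic segmentation data loader.
Type Parameters
T: The numeric type (float, double, etc.).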
ShapeNetCorePartSegmentation<T>(ShapeNetCorePartSegmentationDataLoaderOptions?)
Creates a ShapeNetCore part segmentation data loader.
public static ShapeNetCorePartSegmentationDataLoader<T> ShapeNetCorePartSegmentation<T>(ShapeNetCorePartSegmentationDataLoaderOptions? options = null)
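Parameters
options (ShapeNetCorePartSegmentationDataLoaderOptions?): Optional loader options. Defaults to null.
Returns
- ShapeNetCorePartSegmentationDataLoader<T>
A ShapeNetCore part segmentation data loader.
Type Parameters
T: The numeric type (float, double, etc.).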
Streaming<T, TInput, TOutput>(int, Func<int, CancellationToken, Task<(TInput, TOutput)>>, int, int, int)
Creates a streaming data loader that reads samples on-demand.
public static StreamingDataLoader<T, TInput, TOutput> Streaming<T, TInput, TOutput>(int sampleCount, Func<int, CancellationToken, Task<(TInput, TOutput)>> sampleReader, int batchSize, int prefetchCount = 2, int numWorkers = 4)
Parameters
sampleCount (int): Total number of samples in the dataset.
sampleReader (Func<int, CancellationToken, Task<(TInput, TOutput)>>): Async function that reads a single sample by index.
batchSize (int): Number of samples per batch.
prefetchCount (int): Number of batches to prefetch. Default is 2.
numWorkers (int): Number of parallel workers. Default is 4.
Returns
- StreamingDataLoader<T, TInput, TOutput>
A streaming data loader.
Type Parameters
T: The numeric type (float, double, etc.).
TInput: The input data type for each sample.
TOutput: The output/label data type for each sample.
Remarks
For Beginners: Use this when your dataset is too large to fit in memory. The sampleReader function is called on-demand to load individual samples.
Example - Loading Images:
var loader = DataLoaders.Streaming<float, float[], int>(
    sampleCount: 1000000,
    sampleReader: async (index, ct) =>
    {
        // LoadImageAsync and GetLabel are your own helpers
        var image = await LoadImageAsync($"images/{index}.png", ct);
        var label = GetLabel(index);
        return (image, label);
    },
    batchSize: 32
);
await foreach (var batch in loader.GetBatchesAsync())
{
    await model.TrainOnBatchAsync(batch.Inputs, batch.Outputs);
}
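The on-demand idea can be sketched as an async iterator that calls a sampleReader of the same delegate shape only when a batch is requested. This simplified stand-in, named StreamBatches for illustration, reads samples sequentially and omits the prefetching and parallel workers that prefetchCount and numWorkers control; it is not StreamingDataLoader's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Yields one batch at a time; samples are read by index only when their batch is requested.
async IAsyncEnumerable<List<(TIn, TOut)>> StreamBatches<TIn, TOut>(
    int sampleCount,
    Func<int, CancellationToken, Task<(TIn, TOut)>> sampleReader,
    int batchSize,
    [EnumeratorCancellation] CancellationToken ct = default)
{
    for (int start = 0; start < sampleCount; start += batchSize)
    {
        int end = Math.Min(start + batchSize, sampleCount);
        var batch = new List<(TIn, TOut)>(end - start);
        for (int i = start; i < end; i++)
        {
            batch.Add(await sampleReader(i, ct));
        }
        yield return batch; // the last batch may be smaller than batchSize
    }
}

// Usage with a stand-in reader that fabricates samples instead of touching disk.
await foreach (var chunk in StreamBatches<string, int>(
    sampleCount: 5,
    sampleReader: (index, ct) => Task.FromResult(($"sample-{index}", index % 2)),
    batchSize: 2))
{
    Console.WriteLine($"batch of {chunk.Count}");
}
```

Because nothing is read until the consumer asks for the next batch, memory use stays proportional to one batch rather than to sampleCount.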
WithBatchSize<T>(Matrix<T>, Vector<T>, int)
Creates a data loader with pre-configured batch size.
public static InMemoryDataLoader<T, Matrix<T>, Vector<T>> WithBatchSize<T>(Matrix<T> features, Vector<T> labels, int batchSize)
Parameters
features (Matrix<T>): Matrix where rows are samples and columns are features.
labels (Vector<T>): Vector of labels, one per sample.
batchSize (int): The batch size for iteration.
Returns
- InMemoryDataLoader<T, Matrix<T>, Vector<T>>
A configured InMemoryDataLoader with the specified batch size.
Type Parameters
T: The numeric type (float, double, etc.).
Remarks
For Beginners: Batch size determines how many samples are processed together. Common values:
- 32: Good default for most cases
- 16-64: Standard range for GPU training
- 1: Stochastic gradient descent (the most updates per epoch, but the slowest and noisiest)
- Full dataset: Batch gradient descent (one update per epoch, but the most stable gradients)
Example:
var loader = DataLoaders.WithBatchSize(features, labels, batchSize: 64);
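Batch size also sets how many parameter updates happen per epoch: ceil(sampleCount / batchSize), assuming the final partial batch is kept rather than dropped. A small sketch of that arithmetic for 1,000 samples; UpdatesPerEpoch is a hypothetical helper.

```csharp
using System;

// Updates per epoch = ceil(sampleCount / batchSize), via integer ceiling division.
int UpdatesPerEpoch(int sampleCount, int batchSize)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
    return (sampleCount + batchSize - 1) / batchSize;
}

int n = 1000;
foreach (int b in new[] { 1, 32, 64, n })
{
    Console.WriteLine($"batchSize {b}: {UpdatesPerEpoch(n, b)} updates per epoch");
}
```

With 1,000 samples this gives 1,000 updates at batch size 1, 32 at 32, 16 at 64, and a single update when the whole dataset is one batch.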