Class InMemoryDataLoader<T, TInput, TOutput>
A simple in-memory data loader for supervised learning data.
public class InMemoryDataLoader<T, TInput, TOutput> : InputOutputDataLoaderBase<T, TInput, TOutput>, IInputOutputDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<(TInput Features, TOutput Labels)>, IShuffleable
Type Parameters
- T
The numeric type used for calculations, typically float or double.
- TInput
The input data type (e.g., Matrix<T>, Tensor<T>).
- TOutput
The output data type (e.g., Vector<T>, Tensor<T>).
- Inheritance
InputOutputDataLoaderBase<T, TInput, TOutput>
InMemoryDataLoader<T, TInput, TOutput>
- Implements
IInputOutputDataLoader<T, TInput, TOutput>
IDataLoader<T>
Remarks
InMemoryDataLoader is the simplest way to create a data loader from existing data. It's ideal for:
- Small to medium datasets that fit in memory
- Quick prototyping and testing
- Converting raw arrays or matrices to the IDataLoader interface
For Beginners: This is the easiest data loader to use. Simply pass your feature data (X) and label data (Y) to the constructor, and you're ready to train!
Example:
// Create feature matrix and label vector
var features = new Matrix<double>(100, 5); // 100 samples, 5 features
var labels = new Vector<double>(100); // 100 labels
// Create the loader
var loader = new InMemoryDataLoader<double, Matrix<double>, Vector<double>>(features, labels);
// Use with AiModelBuilder (builder and model are assumed to be created elsewhere)
var result = await builder
    .ConfigureDataLoader(loader)
    .ConfigureModel(model)
    .BuildAsync();
Constructors
InMemoryDataLoader(TInput, TOutput)
Creates a new in-memory data loader with the specified features and labels.
public InMemoryDataLoader(TInput features, TOutput labels)
Parameters
- features (TInput)
The input features (X data).
- labels (TOutput)
The output labels (Y data).
Remarks
For Beginners: The features are your input data - the information you use to make predictions. The labels are the correct answers you're trying to predict.
For example, if you're predicting house prices:
- Features: Square footage, number of bedrooms, location (as numbers)
- Labels: The actual house prices
Both must have the same number of samples (rows).
Exceptions
- ArgumentNullException
Thrown when features or labels is null.
- ArgumentException
Thrown when the sample counts don't match.
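The two documented failure modes can be illustrated with a minimal sketch. It assumes plain arrays in place of TInput and TOutput, and the helper name ValidateSampleCounts is hypothetical; it is not the library's implementation, only one way such checks might look.

```csharp
using System;

// Hypothetical helper, not part of the library: mirrors the documented
// constructor checks using plain arrays instead of TInput/TOutput.
public static class LoaderArgumentChecks
{
    public static int ValidateSampleCounts(double[][] features, double[] labels)
    {
        // Null inputs are rejected first, matching the ArgumentNullException case.
        if (features is null) throw new ArgumentNullException(nameof(features));
        if (labels is null) throw new ArgumentNullException(nameof(labels));

        // Every feature row needs exactly one label.
        if (features.Length != labels.Length)
        {
            throw new ArgumentException(
                $"Sample counts differ: {features.Length} feature rows vs {labels.Length} labels.");
        }

        return features.Length; // the shared sample count
    }
}
```

Checking for null before comparing lengths keeps the more specific exception from being masked by a NullReferenceException.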
Properties
Description
Gets a description of the dataset and its intended use.
public override string Description { get; }
Property Value
- string
FeatureCount
Gets the number of features per sample.
public override int FeatureCount { get; }
Property Value
- int
Name
Gets the human-readable name of this data loader.
public override string Name { get; }
Property Value
- string
Remarks
Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"
OutputDimension
Gets the number of output dimensions (1 for regression/binary classification, N for multi-class with N classes).
public override int OutputDimension { get; }
Property Value
- int
TotalCount
Gets the total number of samples in the dataset.
public override int TotalCount { get; }
Property Value
- int
Methods
ExtractBatch(int[])
Extracts a batch of features and labels at the specified indices.
protected override (TInput Features, TOutput Labels) ExtractBatch(int[] indices)
Parameters
- indices (int[])
The indices of samples to extract.
Returns
- (TInput Features, TOutput Labels)
The features and labels at the specified indices.
Remarks
Derived classes must implement this to extract data based on their specific TInput and TOutput types.
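As an illustration of index-based batch extraction, the sketch below gathers rows from plain arrays. The class name BatchSketch and the use of double[][] and double[] in place of TInput and TOutput are assumptions made for the example; the library's own override works on its Matrix, Vector, and Tensor types and may differ.

```csharp
using System;

// Hypothetical stand-in for ExtractBatch, using jagged arrays
// instead of the library's Matrix<T>/Vector<T> types.
public static class BatchSketch
{
    public static (double[][] Features, double[] Labels) ExtractBatch(
        double[][] features, double[] labels, int[] indices)
    {
        var batchFeatures = new double[indices.Length][];
        var batchLabels = new double[indices.Length];

        for (int i = 0; i < indices.Length; i++)
        {
            int row = indices[i];             // position in the full dataset
            batchFeatures[i] = features[row]; // shares the row; clone it if the batch may be mutated
            batchLabels[i] = labels[row];
        }

        return (batchFeatures, batchLabels);
    }
}
```

The batch preserves the order of indices, so a shuffled index array yields a shuffled batch without touching the underlying data.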
LoadDataCoreAsync(CancellationToken)
Core data loading implementation to be provided by derived classes.
protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)
Parameters
- cancellationToken (CancellationToken)
Cancellation token for async operation.
Returns
- Task
A task that completes when loading is finished.
Remarks
Derived classes must implement this to perform actual data loading:
- Load from files, databases, or remote sources
- Parse and validate data format
- Store in appropriate internal structures
Split(double, double, int?)
Creates a train/validation/test split of the data.
public override (IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test) Split(double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)
Parameters
- trainRatio (double)
Fraction of data for training (0.0 to 1.0).
- validationRatio (double)
Fraction of data for validation (0.0 to 1.0).
- seed (int?)
Optional random seed for reproducible splits.
Returns
- (IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test)
A tuple containing three data loaders: (train, validation, test).
Remarks
The test ratio is implicitly 1 - trainRatio - validationRatio.
For Beginners: Splitting data is crucial for evaluating your model:
- Training set: Data the model learns from
- Validation set: Data used to tune hyperparameters and prevent overfitting
- Test set: Data used only once at the end to get an unbiased performance estimate
Common splits are 60/20/20 or 70/15/15 (train/validation/test).
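To make the ratio arithmetic concrete, here is a minimal sketch of a seeded three-way split over sample indices. The SplitSketch class, the Fisher-Yates shuffle, and the rounding of subset sizes are all assumptions for illustration; the library's Split may shuffle or round differently.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of a train/validation/test split over indices.
public static class SplitSketch
{
    public static (int[] Train, int[] Validation, int[] Test) SplitIndices(
        int totalCount, double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)
    {
        if (trainRatio < 0 || validationRatio < 0 || trainRatio + validationRatio > 1.0)
            throw new ArgumentException("Ratios must be non-negative and sum to at most 1.");

        // A fixed seed gives the same shuffle, and so the same split, on every run.
        var random = seed.HasValue ? new Random(seed.Value) : new Random();

        // Fisher-Yates shuffle of the indices 0..totalCount-1.
        int[] indices = Enumerable.Range(0, totalCount).ToArray();
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = random.Next(i + 1); // 0 <= j <= i
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        // Assumed rounding policy: round each size, never exceed the total.
        int trainCount = Math.Min((int)Math.Round(totalCount * trainRatio), totalCount);
        int validationCount = Math.Min(
            (int)Math.Round(totalCount * validationRatio), totalCount - trainCount);

        // The test set takes whatever remains: 1 - trainRatio - validationRatio.
        int[] train = indices.Take(trainCount).ToArray();
        int[] validation = indices.Skip(trainCount).Take(validationCount).ToArray();
        int[] test = indices.Skip(trainCount + validationCount).ToArray();

        return (train, validation, test);
    }
}
```

With 100 samples and the default ratios, this yields subsets of 70, 15, and 15 indices, and calling it twice with the same seed returns the same partition.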
UnloadDataCore()
Core data unloading implementation to be provided by derived classes.
protected override void UnloadDataCore()
Remarks
Derived classes should implement this to release resources:
- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data