Interface IInputOutputDataLoader<T, TInput, TOutput>

Namespace: AiDotNet.Interfaces

Assembly: AiDotNet.dll

Interface for data loaders that provide standard input-output (X, Y) data for supervised learning.

public interface IInputOutputDataLoader<T, TInput, TOutput> : IDataLoader<T>, IResettable, ICountable, IBatchIterable<(TInput Features, TOutput Labels)>, IShuffleable

Type Parameters

T: The numeric type used for calculations, typically float or double.
TInput: The input data type (e.g., Matrix<T>, Tensor<T>).
TOutput: The output data type (e.g., Vector<T>, Tensor<T>).

Inherited Members

IDataLoader<T>.Name

IDataLoader<T>.Description

IDataLoader<T>.IsLoaded

IDataLoader<T>.LoadAsync(CancellationToken)

IDataLoader<T>.Unload()

IResettable.Reset()

ICountable.TotalCount

ICountable.CurrentIndex

ICountable.BatchCount

ICountable.CurrentBatchIndex

ICountable.Progress

IBatchIterable<(TInput Features, TOutput Labels)>.BatchSize

IBatchIterable<(TInput Features, TOutput Labels)>.HasNext

IBatchIterable<(TInput Features, TOutput Labels)>.GetNextBatch()

IBatchIterable<(TInput Features, TOutput Labels)>.TryGetNextBatch(out (TInput Features, TOutput Labels))

IBatchIterable<(TInput Features, TOutput Labels)>.GetBatches(int?, bool, bool, int?)

IBatchIterable<(TInput Features, TOutput Labels)>.GetBatchesAsync(int?, bool, bool, int?, int, CancellationToken)

IShuffleable.IsShuffled

IShuffleable.Shuffle(int?)

IShuffleable.Unshuffle()

Extension Methods

ParallelBatchLoaderExtensions.WithParallelLoading<TBatch>(IBatchIterable<TBatch>, Func<int[], TBatch>, int, int?, int?)

DataPipelineExtensions.ToPipeline<TBatch>(IBatchIterable<TBatch>)

DataLoaderExtensions.CreateBatchesAsync<TBatch>(IBatchIterable<TBatch>, int?, int)

DataLoaderExtensions.CreateBatches<TBatch>(IBatchIterable<TBatch>, int?)

Remarks

This interface is for standard supervised learning scenarios where you have:

- Input features (X): The data used to make predictions
- Output labels (Y): The correct answers the model should learn to predict

For Beginners: Many machine learning tasks follow this pattern:

Example: House Price Prediction

X (inputs): Square footage, number of bedrooms, location, age
Y (outputs): The actual house price

Example: Email Spam Detection

X (inputs): Email text features (word counts, sender info, etc.)
Y (outputs): Label (spam=1, not spam=0)

The data loader loads this data from files, databases, or other sources and provides it in the format your model needs for training.
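To make the X/Y pairing concrete, here is a minimal sketch of the house price example. It uses plain C# arrays as stand-ins for TInput and TOutput so that it runs without the library; the class name and all values are made up for illustration and are not part of AiDotNet.

```csharp
using System;

// Hypothetical house-price data: each row of X is one house, each entry of Y is its price.
// Plain arrays stand in for TInput/TOutput (e.g., Matrix<T>/Vector<T>) to keep the sketch self-contained.
public static class SupervisedDataExample
{
    // Columns: square footage, bedrooms, age in years (made-up values).
    public static readonly double[,] X =
    {
        { 1400, 3, 20 },
        { 2100, 4, 5 },
        { 850, 2, 45 },
    };

    // One label per row of X (made-up prices).
    public static readonly double[] Y = { 240_000, 410_000, 150_000 };

    // Number of samples = rows of X; must match the number of labels.
    public static int SampleCount => X.GetLength(0);

    // Number of features per sample = columns of X.
    public static int FeatureCount => X.GetLength(1);

    public static void Main()
    {
        if (SampleCount != Y.Length)
            throw new InvalidOperationException("Each sample needs exactly one label.");
        Console.WriteLine($"{SampleCount} samples, {FeatureCount} features each");
    }
}
```

In this sketch the column count plays the role that FeatureCount plays on the interface, and every row of X is matched with exactly one entry of Y.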

Properties

FeatureCount

Gets the number of features per sample.

int FeatureCount { get; }

Property Value

int

Features

Gets all input features as a single data structure.

TInput Features { get; }

Property Value

TInput

Remarks

This provides access to the complete feature set. For large datasets, prefer using batch iteration methods instead of loading everything at once.

Labels

Gets all output labels as a single data structure.

TOutput Labels { get; }

Property Value

TOutput

Remarks

This provides access to all labels. For large datasets, prefer using batch iteration methods instead of loading everything at once.
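The batch-iteration advice for Features and Labels can be sketched as follows. To stay runnable without the library, the example defines a hypothetical stand-in interface, IMiniBatchSource, that mirrors only three of the inherited members listed above (HasNext, GetNextBatch(), and Reset()), plus a small in-memory implementation. Both types are assumptions for illustration; the real loader's behavior may differ.

```csharp
using System;

// Hypothetical stand-in that mirrors three inherited members listed for this interface:
// HasNext, GetNextBatch(), and Reset(). It is NOT the AiDotNet type.
public interface IMiniBatchSource<TInput, TOutput>
{
    bool HasNext { get; }
    (TInput Features, TOutput Labels) GetNextBatch();
    void Reset();
}

// Minimal in-memory implementation used only to make the sketch runnable.
public sealed class ArrayBatchSource : IMiniBatchSource<double[][], double[]>
{
    private readonly double[][] _features;
    private readonly double[] _labels;
    private readonly int _batchSize;
    private int _position;

    public ArrayBatchSource(double[][] features, double[] labels, int batchSize)
    {
        if (features.Length != labels.Length)
            throw new ArgumentException("Features and labels must have the same length.");
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        _features = features;
        _labels = labels;
        _batchSize = batchSize;
    }

    public bool HasNext => _position < _labels.Length;

    public (double[][] Features, double[] Labels) GetNextBatch()
    {
        if (!HasNext)
            throw new InvalidOperationException("No more batches; call Reset() first.");
        // The final batch may be smaller than the batch size.
        int count = Math.Min(_batchSize, _labels.Length - _position);
        var x = new double[count][];
        var y = new double[count];
        Array.Copy(_features, _position, x, 0, count);
        Array.Copy(_labels, _position, y, 0, count);
        _position += count;
        return (x, y);
    }

    public void Reset() => _position = 0;
}

public static class BatchIterationExample
{
    // Sums all labels one batch at a time instead of reading the full label set at once.
    public static double SumLabels(IMiniBatchSource<double[][], double[]> source)
    {
        source.Reset();
        double total = 0;
        while (source.HasNext)
        {
            var (_, labels) = source.GetNextBatch();
            foreach (double label in labels) total += label;
        }
        return total;
    }

    public static void Main()
    {
        var features = new[] { new[] { 1.0 }, new[] { 2.0 }, new[] { 3.0 }, new[] { 4.0 }, new[] { 5.0 } };
        var labels = new[] { 10.0, 20.0, 30.0, 40.0, 50.0 };
        var source = new ArrayBatchSource(features, labels, batchSize: 2);
        Console.WriteLine(SumLabels(source));
    }
}
```

The loop touches only one batch per step, which is the reason to prefer iteration over the full Features and Labels properties when the dataset is large.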

OutputDimension

Gets the number of output dimensions (1 for regression/binary classification, N for multi-class with N classes).

int OutputDimension { get; }

Property Value

int

Methods

Split(double, double, int?)

Creates a train/validation/test split of the data.

(IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test) Split(double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)

Parameters

trainRatio double: Fraction of data for training (0.0 to 1.0).
validationRatio double: Fraction of data for validation (0.0 to 1.0).
seed int?: Optional random seed for reproducible splits.

Returns

(IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test): A tuple containing three data loaders: (train, validation, test).

Remarks

The test ratio is implicit: 1 - trainRatio - validationRatio. With the default values (0.7 and 0.15), this leaves 0.15 of the data for the test set.

For Beginners: Splitting data is crucial for evaluating your model:

- Training set: Data the model learns from
- Validation set: Data used to tune hyperparameters and prevent overfitting
- Test set: Data used only once at the end to get an unbiased performance estimate

Common splits are 60/20/20 or 70/15/15 (train/validation/test).
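The ratio arithmetic behind a three-way split can be sketched on sample indices alone. The helper below, SplitIndices, is hypothetical and not the library's implementation: it assumes a seeded shuffle followed by flooring the train and validation counts, with the test set taking whatever remains. The actual rounding and shuffling rules of Split are not stated in this page.

```csharp
using System;
using System.Linq;

public static class SplitSketch
{
    // Returns shuffled sample indices partitioned into train/validation/test.
    // Illustrative only: the library's actual rounding and shuffling rules are not documented here.
    public static (int[] Train, int[] Validation, int[] Test) SplitIndices(
        int totalCount, double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)
    {
        if (totalCount < 0)
            throw new ArgumentOutOfRangeException(nameof(totalCount));
        if (trainRatio < 0 || validationRatio < 0 || trainRatio + validationRatio > 1.0)
            throw new ArgumentException("Ratios must be non-negative and sum to at most 1.0.");

        // Fisher-Yates shuffle; a fixed seed makes the permutation reproducible.
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        int[] indices = Enumerable.Range(0, totalCount).ToArray();
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        // Assumption: truncate the train and validation counts; the test set takes the remainder.
        int trainCount = (int)(totalCount * trainRatio);
        int validationCount = (int)(totalCount * validationRatio);

        int[] train = indices.Take(trainCount).ToArray();
        int[] validation = indices.Skip(trainCount).Take(validationCount).ToArray();
        int[] test = indices.Skip(trainCount + validationCount).ToArray();
        return (train, validation, test);
    }

    public static void Main()
    {
        var (train, validation, test) = SplitIndices(100, seed: 42);
        Console.WriteLine($"{train.Length}/{validation.Length}/{test.Length}");
    }
}
```

Because the three index sets partition one shuffled permutation, no sample appears in more than one set, and passing the same seed yields the same partition on every call.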
