Interface IInputOutputDataLoader<T, TInput, TOutput>
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Interface for data loaders that provide standard input-output (X, Y) data for supervised learning.
public interface IInputOutputDataLoader<T, TInput, TOutput> : IDataLoader<T>, IResettable, ICountable, IBatchIterable<(TInput Features, TOutput Labels)>, IShuffleable
Type Parameters
T: The numeric type used for calculations, typically float or double.
TInput: The input data type (e.g., Matrix<T>, Tensor<T>).
TOutput: The output data type (e.g., Vector<T>, Tensor<T>).
Remarks
This interface is for standard supervised learning scenarios where you have:
- Input features (X): The data used to make predictions
- Output labels (Y): The correct answers the model should learn to predict
For Beginners: Most machine learning tasks fall into this pattern:
Example: House Price Prediction
- X (inputs): Square footage, number of bedrooms, location, age
- Y (outputs): The actual house price
Example: Email Spam Detection
- X (inputs): Email text features (word counts, sender info, etc.)
- Y (outputs): Label (spam=1, not spam=0)
The data loader loads this data from files, databases, or other sources and provides it in the format your model needs for training.
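To make the X/Y pairing concrete, here is a minimal sketch of the house price example using plain arrays rather than any AiDotNet type. The class name `SupervisedDataExample` and all numeric values are made up for illustration; the only point is that each row of X lines up with exactly one entry of Y.

```csharp
using System;

public static class SupervisedDataExample
{
    // Illustrative values only. Each row of X is one house:
    // square footage, number of bedrooms, age in years.
    public static readonly double[][] X =
    {
        new[] { 1400.0, 3.0, 20.0 },
        new[] { 2100.0, 4.0,  5.0 },
        new[] {  850.0, 2.0, 42.0 },
    };

    // Y holds the matching target (sale price) for each row of X.
    public static readonly double[] Y = { 240000.0, 410000.0, 150000.0 };

    public static void Main()
    {
        // Features and labels must line up row for row.
        if (X.Length != Y.Length)
            throw new InvalidOperationException("X and Y must have the same number of samples.");

        Console.WriteLine($"Samples: {X.Length}, features per sample: {X[0].Length}");
    }
}
```

In this toy dataset there are three samples with three features each and a single output value per sample.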
Properties
FeatureCount
Gets the number of features per sample.
int FeatureCount { get; }
Property Value
- int
Features
Gets all input features as a single data structure.
TInput Features { get; }
Property Value
- TInput
Remarks
This provides access to the complete feature set. For large datasets, prefer using batch iteration methods instead of loading everything at once.
Labels
Gets all output labels as a single data structure.
TOutput Labels { get; }
Property Value
- TOutput
Remarks
This provides access to all labels. For large datasets, prefer using batch iteration methods instead of loading everything at once.
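The batch-wise access pattern recommended above can be sketched as follows. This is not the library's batch iteration API, whose members are not shown on this page; `Batches` is a hypothetical helper over plain arrays that yields (Features, Labels) tuples of at most `batchSize` samples, mirroring the tuple shape in the interface declaration.

```csharp
using System;
using System.Collections.Generic;

public static class BatchSketch
{
    // Hypothetical helper (not part of AiDotNet): yields consecutive
    // (Features, Labels) slices containing at most batchSize samples.
    public static IEnumerable<(double[][] Features, double[] Labels)> Batches(
        double[][] x, double[] y, int batchSize)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("x and y must have the same length.");
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < x.Length; start += batchSize)
        {
            // The last batch may be smaller than batchSize.
            int count = Math.Min(batchSize, x.Length - start);
            var xb = new double[count][];
            var yb = new double[count];
            Array.Copy(x, start, xb, 0, count);
            Array.Copy(y, start, yb, 0, count);
            yield return (xb, yb);
        }
    }

    public static void Main()
    {
        // Five toy samples with one feature each.
        var x = new double[5][];
        var y = new double[5];
        for (int i = 0; i < 5; i++)
        {
            x[i] = new[] { (double)i };
            y[i] = 2.0 * i;
        }

        // With batchSize 2 this yields batches of 2, 2, and 1 samples.
        foreach (var (features, labels) in Batches(x, y, batchSize: 2))
            Console.WriteLine($"batch of {features.Length} samples");
    }
}
```

The sketch slices arrays that are already in memory, so it only shows the consumption pattern. A loader backed by files or a database would presumably read each batch from its source on demand instead.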
OutputDimension
Gets the number of output dimensions (1 for regression/binary classification, N for multi-class with N classes).
int OutputDimension { get; }
Property Value
- int
Methods
Split(double, double, int?)
Creates a train/validation/test split of the data.
(IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test) Split(double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)
Parameters
trainRatio (double): Fraction of data for training (0.0 to 1.0).
validationRatio (double): Fraction of data for validation (0.0 to 1.0).
seed (int?): Optional random seed for reproducible splits.
Returns
- (IInputOutputDataLoader<T, TInput, TOutput> Train, IInputOutputDataLoader<T, TInput, TOutput> Validation, IInputOutputDataLoader<T, TInput, TOutput> Test)
A tuple containing three data loaders: (train, validation, test).
Remarks
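The test ratio is implicit: 1 - trainRatio - validationRatio. With the default values (trainRatio = 0.7, validationRatio = 0.15), the remaining 0.15 of the data goes to the test set.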
For Beginners: Splitting data is crucial for evaluating your model:
- Training set: Data the model learns from
- Validation set: Data used to tune hyperparameters and prevent overfitting
- Test set: Data used only once at the end to get an unbiased performance estimate
Common splits are 60/20/20 or 70/15/15 (train/validation/test).
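The sketch below shows one way a ratio-based split with an optional seed could work. It is an assumption-laden illustration, not the implementation behind Split: `SplitIndices` is a hypothetical helper that shuffles sample indices with a Fisher-Yates shuffle, truncates the train and validation counts, and gives every remaining sample to the test set. The actual library may round or shuffle differently.

```csharp
using System;
using System.Linq;

public static class SplitSketch
{
    // Hypothetical helper (not part of AiDotNet): returns shuffled sample
    // indices partitioned into train, validation, and test groups.
    public static (int[] Train, int[] Validation, int[] Test) SplitIndices(
        int sampleCount, double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)
    {
        if (trainRatio < 0 || validationRatio < 0 || trainRatio + validationRatio > 1.0)
            throw new ArgumentException("Ratios must be non-negative and sum to at most 1.");

        // A fixed seed makes the shuffle, and therefore the split, repeatable.
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        int[] indices = Enumerable.Range(0, sampleCount).ToArray();

        // Fisher-Yates shuffle.
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        // Truncate train and validation counts; the test set takes the rest,
        // which corresponds to the implicit 1 - trainRatio - validationRatio share.
        int trainCount = (int)(sampleCount * trainRatio);
        int validationCount = (int)(sampleCount * validationRatio);

        int[] train = indices.Take(trainCount).ToArray();
        int[] validation = indices.Skip(trainCount).Take(validationCount).ToArray();
        int[] test = indices.Skip(trainCount + validationCount).ToArray();
        return (train, validation, test);
    }

    public static void Main()
    {
        var (train, validation, test) = SplitIndices(100, seed: 42);
        Console.WriteLine($"{train.Length} / {validation.Length} / {test.Length}");
    }
}
```

Because the three groups are disjoint slices of one shuffled index list, no sample can appear in more than one set, which is the property that keeps the test estimate unbiased.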