Interface IDataset<T, TInput, TOutput>
- Namespace: AiDotNet.Interfaces
- Assembly: AiDotNet.dll
Interface for datasets used in active learning scenarios.
public interface IDataset<T, TInput, TOutput>
Type Parameters
- T: The numeric type used for calculations.
- TInput: The type of input features.
- TOutput: The type of output labels.
Remarks
For Beginners: A dataset in machine learning is a collection of samples, where each sample has input features (X) and optionally output labels (Y). This interface provides a unified way to work with datasets in active learning.
Key Concepts:
- Inputs: The feature vectors (X) used for prediction
- Outputs: The labels or targets (Y) we want to predict
- Indexing: Access individual samples by their position
- Subsetting: Create new datasets from selected indices
Active Learning Usage:
- Labeled pool: Samples where outputs are known
- Unlabeled pool: Samples where only inputs are known
- Subsets are created when selecting samples for labeling
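The pool bookkeeping described above can be sketched with plain index collections. This is a minimal illustration, not AiDotNet code: the `inputs` array, the `Oracle` function, and the "take the first two indices" selection rule are all hypothetical stand-ins. With a real IDataset, the same steps would presumably map onto Subset, Except, and AddSamples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: six 1-D inputs; no labels are known at the start.
double[] inputs = { 0.1, 0.4, 0.5, 0.7, 0.8, 0.9 };

// Hypothetical oracle (for example a human annotator): label is 1 when x > 0.5.
static int Oracle(double x) => x > 0.5 ? 1 : 0;

// The labeled pool starts empty; the unlabeled pool holds every index.
var labeled = new Dictionary<int, int>();            // sample index -> label
var unlabeled = Enumerable.Range(0, inputs.Length).ToList();

// One active learning round: select a batch, query labels, move the samples.
// The selection rule is a placeholder (take the first two indices).
int[] selected = unlabeled.Take(2).ToArray();
foreach (int i in selected)
{
    labeled[i] = Oracle(inputs[i]);                  // labeling step
}
unlabeled = unlabeled.Except(selected).ToList();     // drop from the unlabeled pool

Console.WriteLine($"Labeled: {labeled.Count}, unlabeled: {unlabeled.Count}");
// prints "Labeled: 2, unlabeled: 4"
```

In practice the placeholder selection rule would be replaced by a query strategy such as uncertainty sampling.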
Properties
Count
Gets the number of samples in the dataset.
int Count { get; }
Property Value
- int
HasLabels
Gets whether this dataset has labels for all samples.
bool HasLabels { get; }
Property Value
- bool
Inputs
Gets the input features for all samples.
IReadOnlyList<TInput> Inputs { get; }
Property Value
- IReadOnlyList<TInput>
Outputs
Gets the output labels for all samples.
IReadOnlyList<TOutput> Outputs { get; }
Property Value
- IReadOnlyList<TOutput>
Remarks
For unlabeled datasets, this may contain default values or be empty. Use HasLabels to check if labels are available.
Methods
AddSamples(TInput[], TOutput[])
Adds samples with labels to the dataset.
IDataset<T, TInput, TOutput> AddSamples(TInput[] inputs, TOutput[] outputs)
Parameters
- inputs (TInput[]): The input features to add.
- outputs (TOutput[]): The output labels to add.
Returns
- IDataset<T, TInput, TOutput>
A new dataset with the added samples.
Clone()
Creates a shallow copy of this dataset.
IDataset<T, TInput, TOutput> Clone()
Returns
- IDataset<T, TInput, TOutput>
A new dataset with the same samples.
Except(int[])
Creates a subset of the dataset excluding the specified indices.
IDataset<T, TInput, TOutput> Except(int[] indices)
Parameters
- indices (int[]): The indices to exclude from the subset.
Returns
- IDataset<T, TInput, TOutput>
A new dataset without the specified samples.
GetIndices()
Gets the indices of all samples in this dataset.
int[] GetIndices()
Returns
- int[]
An array of indices from 0 to Count-1.
GetInput(int)
Gets the input features for a specific sample.
TInput GetInput(int index)
Parameters
- index (int): The index of the sample.
Returns
- TInput
The input features at the specified index.
GetOutput(int)
Gets the output label for a specific sample.
TOutput GetOutput(int index)
Parameters
- index (int): The index of the sample.
Returns
- TOutput
The output label at the specified index.
GetSample(int)
Gets both input and output for a specific sample.
(TInput Input, TOutput Output) GetSample(int index)
Parameters
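- index (int): The index of the sample.
Returns
- (TInput Input, TOutput Output)
A tuple containing the input features and the output label at the specified index.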
Merge(IDataset<T, TInput, TOutput>)
Merges another dataset with this one and returns the combined result.
IDataset<T, TInput, TOutput> Merge(IDataset<T, TInput, TOutput> other)
Parameters
- other (IDataset<T, TInput, TOutput>): The dataset to merge.
Returns
- IDataset<T, TInput, TOutput>
A new dataset containing samples from both datasets.
RemoveSamples(int[])
Removes samples at the specified indices from the dataset.
IDataset<T, TInput, TOutput> RemoveSamples(int[] indices)
Parameters
- indices (int[]): The indices to remove.
Returns
- IDataset<T, TInput, TOutput>
A new dataset without the specified samples.
Shuffle(Random?)
Shuffles the dataset and returns a new shuffled dataset.
IDataset<T, TInput, TOutput> Shuffle(Random? random = null)
Parameters
- random (Random): Optional random generator for reproducibility.
Returns
- IDataset<T, TInput, TOutput>
A new shuffled dataset.
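Passing a seeded Random is what makes a shuffle reproducible. The sketch below shows this on a plain index array using a Fisher-Yates shuffle. `ShuffledIndices` is a hypothetical helper, and the documentation does not state which algorithm the AiDotNet implementations use.

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical helper: returns a shuffled copy of 0..count-1 (Fisher-Yates).
static int[] ShuffledIndices(int count, Random? random = null)
{
    Random rng = random ?? new Random();     // unseeded when none is supplied
    int[] order = Enumerable.Range(0, count).ToArray();
    for (int i = count - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);             // 0 <= j <= i
        (order[i], order[j]) = (order[j], order[i]);
    }
    return order;
}

// Two generators created with the same seed yield the same permutation.
int[] first = ShuffledIndices(10, new Random(42));
int[] second = ShuffledIndices(10, new Random(42));

Console.WriteLine(first.SequenceEqual(second)); // prints "True"
```

Omitting the generator gives a different order on each run, which is usually undesirable when comparing experiments.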
Split(double, Random?)
Splits the dataset into training and test sets.
(IDataset<T, TInput, TOutput> Train, IDataset<T, TInput, TOutput> Test) Split(double trainRatio = 0.8, Random? random = null)
Parameters
- trainRatio (double): The fraction of data for training (0.0 to 1.0).
- random (Random): Optional random generator for reproducibility.
Returns
- (IDataset<T, TInput, TOutput> Train, IDataset<T, TInput, TOutput> Test)
A tuple containing training and test datasets.
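A ratio-based split can be sketched on index arrays as follows. `SplitIndices` is a hypothetical helper, and two details are assumptions rather than documented behavior: the samples are shuffled before cutting, and the training size is the floor of `count * trainRatio`.

```csharp
#nullable enable
using System;
using System.Linq;

// Hypothetical helper: splits 0..count-1 into train and test index arrays.
// Assumption: the training size is floor(count * trainRatio).
static (int[] Train, int[] Test) SplitIndices(
    int count, double trainRatio = 0.8, Random? random = null)
{
    if (trainRatio < 0.0 || trainRatio > 1.0)
        throw new ArgumentOutOfRangeException(nameof(trainRatio));

    Random rng = random ?? new Random();
    // Sorting by random keys shuffles the indices before the cut.
    int[] order = Enumerable.Range(0, count)
                            .OrderBy(_ => rng.Next())
                            .ToArray();
    int trainCount = (int)(count * trainRatio);
    return (order.Take(trainCount).ToArray(), order.Skip(trainCount).ToArray());
}

var (train, test) = SplitIndices(10, 0.8, new Random(7));
Console.WriteLine($"{train.Length} train, {test.Length} test");
// prints "8 train, 2 test"
```

Every index lands in exactly one of the two arrays, so the training and test sets never overlap.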
Subset(int[])
Creates a subset of the dataset containing only the specified indices.
IDataset<T, TInput, TOutput> Subset(int[] indices)
Parameters
- indices (int[]): The indices to include in the subset.
Returns
- IDataset<T, TInput, TOutput>
A new dataset containing only the specified samples.
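Subset and Except are complementary: for the same index array, one keeps what the other drops. The sketch below shows that relationship on a plain array. `SubsetOf` and `ExceptOf` are hypothetical helpers, not AiDotNet methods, and the result ordering (Subset follows the order of the given indices, Except keeps the original order) is an assumption the interface documentation does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] samples = { "a", "b", "c", "d", "e" };
int[] indices = { 3, 0 };

// Hypothetical helper: keep only the samples at the given indices.
static T[] SubsetOf<T>(T[] data, int[] idx) =>
    idx.Select(i => data[i]).ToArray();

// Hypothetical helper: keep every sample whose index is NOT listed.
static T[] ExceptOf<T>(T[] data, int[] idx)
{
    var excluded = new HashSet<int>(idx);
    return Enumerable.Range(0, data.Length)
                     .Where(i => !excluded.Contains(i))
                     .Select(i => data[i])
                     .ToArray();
}

string[] picked = SubsetOf(samples, indices);   // d, a
string[] rest = ExceptOf(samples, indices);     // b, c, e

Console.WriteLine(string.Join(", ", picked));
Console.WriteLine(string.Join(", ", rest));
```

Together, `picked` and `rest` contain every original sample exactly once, which is the property an active learning loop relies on when moving samples between pools.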
UpdateLabels(int[], TOutput[])
Updates the labels for specific samples.
IDataset<T, TInput, TOutput> UpdateLabels(int[] indices, TOutput[] labels)
Parameters
- indices (int[]): The indices to update.
- labels (TOutput[]): The new labels.
Returns
- IDataset<T, TInput, TOutput>
A new dataset with updated labels.
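The signature pairs `indices[k]` with `labels[k]` and returns a dataset rather than void. A minimal sketch of such a non-mutating update on a plain array follows. `WithLabels` is a hypothetical helper, and both the length check and the guarantee that the source array stays untouched are assumptions, not documented AiDotNet behavior.

```csharp
using System;

// Hypothetical helper: returns a copy of `outputs` in which
// outputs[indices[k]] is replaced by labels[k]. The input array is not modified.
static TOutput[] WithLabels<TOutput>(TOutput[] outputs, int[] indices, TOutput[] labels)
{
    if (indices.Length != labels.Length)
        throw new ArgumentException("indices and labels must have the same length.");

    var updated = (TOutput[])outputs.Clone();
    for (int k = 0; k < indices.Length; k++)
    {
        updated[indices[k]] = labels[k];
    }
    return updated;
}

int[] before = { 0, 0, 0, 0 };
int[] after = WithLabels(before, new[] { 1, 3 }, new[] { 7, 9 });

Console.WriteLine(string.Join(", ", before)); // prints "0, 0, 0, 0"
Console.WriteLine(string.Join(", ", after));  // prints "0, 7, 0, 9"
```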