Interface IDataset<T, TInput, TOutput>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Interface for datasets used in active learning scenarios.

public interface IDataset<T, TInput, TOutput>

Type Parameters

T

The numeric type used for calculations.

TInput

The type of input features.

TOutput

The type of output labels.

Remarks

For Beginners: A dataset in machine learning is a collection of samples, where each sample has input features (X) and optionally output labels (Y). This interface provides a unified way to work with datasets in active learning.

Key Concepts:

  • Inputs: The feature vectors (X) used for prediction
  • Outputs: The labels or targets (Y) we want to predict
  • Indexing: Access individual samples by their position
  • Subsetting: Create new datasets from selected indices

Active Learning Usage:

  • Labeled pool: Samples where outputs are known
  • Unlabeled pool: Samples where only inputs are known
  • Subsets are created when selecting samples for labeling
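
The pool bookkeeping above comes down to choosing positions in the unlabeled pool and then splitting the pool around them. The following is a minimal sketch, assuming a hypothetical per-sample uncertainty score. The PoolSketch class, the SelectForLabeling helper, and the scoring itself are illustrative and are not part of AiDotNet.

```csharp
using System;
using System.Linq;

public static class PoolSketch
{
    // Hypothetical helper: returns the positions of the `batchSize` samples
    // with the highest uncertainty score, most uncertain first.
    public static int[] SelectForLabeling(double[] uncertainty, int batchSize)
    {
        if (batchSize < 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        return Enumerable.Range(0, uncertainty.Length)
            .OrderByDescending(i => uncertainty[i]) // stable sort: ties keep index order
            .Take(batchSize)
            .ToArray();
    }
}
```

The returned array is the kind of value one would pass to Subset (the samples sent for labeling) and to Except (the pool that stays unlabeled).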

Properties

Count

Gets the number of samples in the dataset.

int Count { get; }

Property Value

int

HasLabels

Gets a value indicating whether every sample in this dataset has a label.

bool HasLabels { get; }

Property Value

bool

Inputs

Gets the input features for all samples.

IReadOnlyList<TInput> Inputs { get; }

Property Value

IReadOnlyList<TInput>

Outputs

Gets the output labels for all samples.

IReadOnlyList<TOutput> Outputs { get; }

Property Value

IReadOnlyList<TOutput>

Remarks

For unlabeled datasets, this list may contain default values or be empty. Check HasLabels before reading labels.

Methods

AddSamples(TInput[], TOutput[])

Returns a new dataset that contains the existing samples plus the given labeled samples.

IDataset<T, TInput, TOutput> AddSamples(TInput[] inputs, TOutput[] outputs)

Parameters

inputs TInput[]

The input features to add.

outputs TOutput[]

The output labels to add.

Returns

IDataset<T, TInput, TOutput>

A new dataset with the added samples.

Clone()

Creates a shallow copy of this dataset.

IDataset<T, TInput, TOutput> Clone()

Returns

IDataset<T, TInput, TOutput>

A new dataset with the same samples.

Except(int[])

Creates a subset of the dataset excluding the specified indices.

IDataset<T, TInput, TOutput> Except(int[] indices)

Parameters

indices int[]

The indices to exclude from the subset.

Returns

IDataset<T, TInput, TOutput>

A new dataset without the specified samples.
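
Excluding indices is the complement of selecting them. The sketch below shows one way to compute which positions remain. The ExceptSketch class and KeptIndices helper are hypothetical names. The sketch also assumes that the remaining samples keep their original relative order and that duplicate indices are ignored, neither of which this page confirms.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptSketch
{
    // Hypothetical helper: positions 0..count-1 that are NOT in `excluded`,
    // in ascending order. Duplicates in `excluded` have no extra effect.
    public static int[] KeptIndices(int count, int[] excluded)
    {
        var skip = new HashSet<int>(excluded);
        return Enumerable.Range(0, count)
            .Where(i => !skip.Contains(i))
            .ToArray();
    }
}
```

Under those assumptions, passing the result of KeptIndices to Subset would select the same samples as calling Except with the excluded indices.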

GetIndices()

Gets the indices of all samples in this dataset.

int[] GetIndices()

Returns

int[]

An array of indices from 0 to Count-1.

GetInput(int)

Gets the input features for a specific sample.

TInput GetInput(int index)

Parameters

index int

The index of the sample.

Returns

TInput

The input features at the specified index.

GetOutput(int)

Gets the output label for a specific sample.

TOutput GetOutput(int index)

Parameters

index int

The index of the sample.

Returns

TOutput

The output label at the specified index.

GetSample(int)

Gets both input and output for a specific sample.

(TInput Input, TOutput Output) GetSample(int index)

Parameters

index int

The index of the sample.

Returns

(TInput Input, TOutput Output)

A tuple containing the input and output.

Merge(IDataset<T, TInput, TOutput>)

Combines this dataset with another and returns the result as a new dataset.

IDataset<T, TInput, TOutput> Merge(IDataset<T, TInput, TOutput> other)

Parameters

other IDataset<T, TInput, TOutput>

The dataset to merge.

Returns

IDataset<T, TInput, TOutput>

A new dataset containing samples from both datasets.

RemoveSamples(int[])

Returns a new dataset with the samples at the specified indices removed.

IDataset<T, TInput, TOutput> RemoveSamples(int[] indices)

Parameters

indices int[]

The indices to remove.

Returns

IDataset<T, TInput, TOutput>

A new dataset without the specified samples.

Shuffle(Random?)

Returns a new dataset containing the same samples in random order.

IDataset<T, TInput, TOutput> Shuffle(Random? random = null)

Parameters

random Random

Optional random number generator. Pass a seeded instance to get a reproducible order.

Returns

IDataset<T, TInput, TOutput>

A new shuffled dataset.
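
To illustrate why passing a Random instance gives reproducibility, here is a minimal sketch of a Fisher-Yates shuffle over sample positions. The ShuffleSketch class and ShuffledIndices helper are hypothetical. Whether AiDotNet implementations use this algorithm is an assumption.

```csharp
#nullable enable
using System;
using System.Linq;

public static class ShuffleSketch
{
    // Hypothetical helper: returns a random permutation of 0..count-1.
    // Two Random instances created with the same seed give the same permutation.
    public static int[] ShuffledIndices(int count, Random? random = null)
    {
        random ??= new Random(); // unseeded: order differs between runs
        int[] order = Enumerable.Range(0, count).ToArray();

        // Fisher-Yates: swap each position with a random earlier (or same) one.
        for (int i = count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1); // 0 <= j <= i
            (order[i], order[j]) = (order[j], order[i]);
        }
        return order;
    }
}
```

Calling ShuffledIndices(n, new Random(42)) twice yields the same order both times, which is the property the optional random parameter is meant to provide.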

Split(double, Random?)

Splits the dataset into training and test sets.

(IDataset<T, TInput, TOutput> Train, IDataset<T, TInput, TOutput> Test) Split(double trainRatio = 0.8, Random? random = null)

Parameters

trainRatio double

The fraction of samples assigned to the training set, between 0.0 and 1.0. Defaults to 0.8.

random Random

Optional random number generator. Pass a seeded instance to get a reproducible split.

Returns

(IDataset<T, TInput, TOutput> Train, IDataset<T, TInput, TOutput> Test)

A tuple containing training and test datasets.
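
The arithmetic behind a ratio-based split can be sketched at the index level. The SplitSketch class and SplitIndices helper are hypothetical. Shuffling before the cut and rounding the training count to the nearest integer are both assumptions, since this page does not say how implementations handle either.

```csharp
#nullable enable
using System;
using System.Linq;

public static class SplitSketch
{
    // Hypothetical helper: shuffles positions 0..count-1, then assigns the
    // first round(count * trainRatio) of them to Train and the rest to Test.
    public static (int[] Train, int[] Test) SplitIndices(
        int count, double trainRatio = 0.8, Random? random = null)
    {
        if (double.IsNaN(trainRatio) || trainRatio < 0.0 || trainRatio > 1.0)
            throw new ArgumentOutOfRangeException(nameof(trainRatio));

        random ??= new Random();

        // OrderBy evaluates each random key once, so this is a valid shuffle.
        int[] order = Enumerable.Range(0, count)
            .OrderBy(_ => random.Next())
            .ToArray();

        int trainCount = (int)Math.Round(count * trainRatio); // assumed rounding rule
        return (order.Take(trainCount).ToArray(), order.Skip(trainCount).ToArray());
    }
}
```

With 10 samples and the default ratio of 0.8, this sketch puts 8 positions in Train and 2 in Test, and every position appears in exactly one of the two arrays.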

Subset(int[])

Creates a subset of the dataset containing only the specified indices.

IDataset<T, TInput, TOutput> Subset(int[] indices)

Parameters

indices int[]

The indices to include in the subset.

Returns

IDataset<T, TInput, TOutput>

A new dataset containing only the specified samples.

UpdateLabels(int[], TOutput[])

Updates the labels for specific samples.

IDataset<T, TInput, TOutput> UpdateLabels(int[] indices, TOutput[] labels)

Parameters

indices int[]

The indices to update.

labels TOutput[]

The new labels.

Returns

IDataset<T, TInput, TOutput>

A new dataset with updated labels.
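
Because UpdateLabels returns a new dataset, the update can be pictured as copy-then-overwrite on the label array. The sketch below illustrates that idea on a plain array. The UpdateLabelsSketch class and WithUpdatedLabels helper are hypothetical, and pairing indices with labels by position is an assumption drawn from the parameter descriptions.

```csharp
using System;

public static class UpdateLabelsSketch
{
    // Hypothetical helper: returns a copy of `current` in which
    // position indices[k] holds labels[k]. The input array is not modified.
    public static TOutput[] WithUpdatedLabels<TOutput>(
        TOutput[] current, int[] indices, TOutput[] labels)
    {
        if (indices.Length != labels.Length)
            throw new ArgumentException("indices and labels must have the same length.");

        var updated = (TOutput[])current.Clone(); // leave the original untouched
        for (int k = 0; k < indices.Length; k++)
        {
            updated[indices[k]] = labels[k];
        }
        return updated;
    }
}
```

In an active learning loop, this corresponds to the step where newly obtained labels are attached to the samples that were selected for labeling.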