Interface IActiveLearningStrategy<T>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll

Defines a strategy for active learning that selects the most informative samples for labeling from a pool of unlabeled data.

public interface IActiveLearningStrategy<T>

Type Parameters

T

The numeric type for calculations (e.g., double, float).

Remarks

For Beginners: Active learning helps when labeling data is expensive or time-consuming. Instead of randomly selecting samples to label, active learning intelligently picks the samples that would be most helpful for training the model. This can dramatically reduce the number of labels needed while achieving similar or better performance.

Common strategies include:

  • Uncertainty Sampling: Select samples where the model is most uncertain. The idea is that uncertain samples are near the decision boundary and most informative.
  • Query-by-Committee: Use multiple models and select samples where they disagree the most. Disagreement suggests the sample is in an unclear region.
  • Expected Model Change: Select samples that would cause the largest change to model parameters. These samples have high learning potential.
  • Diversity Sampling: Select samples that are representative of different regions of the input space, ensuring good coverage.
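To make the second strategy concrete, here is a minimal sketch of query-by-committee scoring using vote entropy. Plain arrays stand in for the library's Tensor<T> and Vector<T> types; the class and method names are illustrative, not part of AiDotNet.

```csharp
using System;
using System.Linq;

class QueryByCommitteeDemo
{
    // Vote-entropy disagreement: for each sample, look at how the committee
    // members' predicted class labels are distributed and compute the entropy
    // of that distribution. Higher entropy = more disagreement = more informative.
    public static double VoteEntropy(int[] committeeVotes, int numClasses)
    {
        double entropy = 0.0;
        for (int c = 0; c < numClasses; c++)
        {
            double p = committeeVotes.Count(v => v == c) / (double)committeeVotes.Length;
            if (p > 0) entropy -= p * Math.Log(p);
        }
        return entropy;
    }

    static void Main()
    {
        // Three committee members vote on two samples (binary task).
        int[] unanimous = { 0, 0, 0 };   // all agree
        int[] split     = { 0, 1, 0 };   // disagreement

        Console.WriteLine(VoteEntropy(unanimous, 2)); // prints 0 (no disagreement)
        Console.WriteLine(VoteEntropy(split, 2));     // ~0.637 (contested sample)
    }
}
```

A unanimous committee scores zero, so samples the models already agree on are never prioritized for labeling.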

Typical Usage Flow:

// Initial training with a small labeled set
model.Train(labeledData);

// Active learning loop
while (labelingBudget > 0)
{
    // Select the indices of the most informative unlabeled samples
    int[] selectedIndices = strategy.SelectSamples(model, unlabeledPool, batchSize);

    // Get labels for the selected samples (from a human annotator or oracle);
    // GetLabels and RemoveSamples are placeholders for your own labeling
    // and bookkeeping code
    var newlyLabeled = GetLabels(unlabeledPool, selectedIndices);

    // Move the newly labeled examples into the training set,
    // remove them from the unlabeled pool, and retrain
    labeledData.AddRange(newlyLabeled);
    unlabeledPool = RemoveSamples(unlabeledPool, selectedIndices);
    model.Train(labeledData);

    labelingBudget -= batchSize;
}

Properties

Name

Gets the name of this active learning strategy.

string Name { get; }

Property Value

string

Remarks

Used for logging, debugging, and identifying which strategy is being used.

UseBatchDiversity

Gets or sets whether to use batch-mode selection that considers diversity among selected samples.

bool UseBatchDiversity { get; set; }

Property Value

bool

Remarks

For Beginners: When selecting multiple samples at once (batch mode), simply picking the top-scoring samples can lead to redundancy: the top scorers are often very similar to one another. Enabling batch diversity ensures the selected samples are not only informative but also different from each other, providing better coverage of the uncertain regions.
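One common way to implement batch diversity is a greedy "informativeness minus redundancy" rule: repeatedly pick the candidate with the best trade-off between its score and its distance to samples already in the batch. The sketch below uses plain double[] feature vectors in place of the library's tensor types; the method and parameter names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class BatchDiversityDemo
{
    // Greedy batch selection: at each step, take the candidate maximizing
    // score[i] + diversityWeight * (distance to its nearest selected sample).
    // With diversityWeight = 0 this reduces to plain top-k selection.
    public static int[] SelectDiverseBatch(
        double[][] features, double[] scores, int batchSize, double diversityWeight)
    {
        var selected = new List<int>();
        var remaining = Enumerable.Range(0, scores.Length).ToList();

        while (selected.Count < batchSize && remaining.Count > 0)
        {
            int best = -1;
            double bestValue = double.NegativeInfinity;
            foreach (int i in remaining)
            {
                // Distance to the nearest already-selected sample; a large
                // distance means the candidate adds diversity to the batch.
                double minDist = selected.Count == 0
                    ? 0.0
                    : selected.Min(j => SquaredDistance(features[i], features[j]));
                double value = scores[i] + diversityWeight * minDist;
                if (value > bestValue) { bestValue = value; best = i; }
            }
            selected.Add(best);
            remaining.Remove(best);
        }
        return selected.ToArray();
    }

    static double SquaredDistance(double[] a, double[] b) =>
        a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();

    static void Main()
    {
        var features = new[] { new[] { 0.0 }, new[] { 0.1 }, new[] { 5.0 } };
        var scores = new[] { 1.0, 0.9, 0.5 };

        // Without diversity, the two near-duplicate top scorers are chosen;
        // with diversity, the distant (but lower-scoring) sample wins a slot.
        Console.WriteLine(string.Join(",", SelectDiverseBatch(features, scores, 2, 0.0))); // prints 0,1
        Console.WriteLine(string.Join(",", SelectDiverseBatch(features, scores, 2, 1.0))); // prints 0,2
    }
}
```

The diversity weight controls how much raw informativeness is traded away for coverage of the input space.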

Methods

ComputeInformativenessScores(IFullModel<T, Tensor<T>, Tensor<T>>, Tensor<T>)

Computes informativeness scores for all samples in the unlabeled pool.

Vector<T> ComputeInformativenessScores(IFullModel<T, Tensor<T>, Tensor<T>> model, Tensor<T> unlabeledPool)

Parameters

model IFullModel<T, Tensor<T>, Tensor<T>>

The current trained model used to evaluate samples.

unlabeledPool Tensor<T>

Pool of unlabeled samples to score.

Returns

Vector<T>

A vector of scores where higher values indicate more informative samples.

Remarks

For Beginners: This method assigns a score to each unlabeled sample indicating how "informative" or "valuable" it would be to label. Higher scores mean the model would benefit more from having that sample labeled.

Different strategies compute informativeness differently:

  • Uncertainty: How unsure the model is about the prediction
  • Diversity: How different the sample is from already labeled data
  • Expected change: How much the model would change if trained on this sample
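As one example of the first criterion, an uncertainty-based strategy might score each sample by the entropy of the model's predicted class distribution. This sketch uses plain arrays of probabilities in place of the model and tensor types; it illustrates the scoring idea only, not AiDotNet's actual implementation.

```csharp
using System;

class UncertaintyScoresDemo
{
    // Entropy of the predicted class distribution for each sample.
    // A confident prediction (e.g. [1.0, 0.0]) scores 0; a maximally
    // uncertain one (e.g. [0.5, 0.5]) scores highest (ln 2 ≈ 0.693).
    public static double[] EntropyScores(double[][] probabilities)
    {
        var scores = new double[probabilities.Length];
        for (int i = 0; i < probabilities.Length; i++)
        {
            double entropy = 0.0;
            foreach (double p in probabilities[i])
                if (p > 0) entropy -= p * Math.Log(p);
            scores[i] = entropy;
        }
        return scores;
    }

    static void Main()
    {
        var probs = new[] { new[] { 1.0, 0.0 }, new[] { 0.5, 0.5 } };
        var scores = EntropyScores(probs);
        Console.WriteLine(scores[0]); // prints 0 (confident)
        Console.WriteLine(scores[1]); // ~0.693 (maximally uncertain)
    }
}
```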

GetSelectionStatistics()

Gets statistics about the most recent sample selection.

Dictionary<string, T> GetSelectionStatistics()

Returns

Dictionary<string, T>

Dictionary containing selection statistics (e.g., score distribution, diversity metrics).

Remarks

For Beginners: This method returns information about the last selection, which is useful for understanding how the strategy is performing and debugging. Statistics might include average score of selected samples, score variance, etc.
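A sketch of the kind of dictionary an implementation might return, built from the scores of the last selected batch. The statistic key names here are hypothetical, not keys guaranteed by AiDotNet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SelectionStatsDemo
{
    // Summary statistics over the scores of the samples chosen in the most
    // recent selection round, keyed by name as the interface's dictionary
    // return type suggests.
    public static Dictionary<string, double> Summarize(double[] selectedScores)
    {
        double mean = selectedScores.Average();
        double variance = selectedScores.Select(s => (s - mean) * (s - mean)).Average();
        return new Dictionary<string, double>
        {
            ["MeanScore"] = mean,
            ["ScoreVariance"] = variance,
            ["MinScore"] = selectedScores.Min(),
            ["MaxScore"] = selectedScores.Max()
        };
    }

    static void Main()
    {
        var stats = Summarize(new[] { 1.0, 2.0, 3.0 });
        Console.WriteLine(stats["MeanScore"]); // prints 2
    }
}
```

Tracking these values across rounds can reveal, for example, a shrinking mean score as the model becomes confident over most of the pool.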

SelectSamples(IFullModel<T, Tensor<T>, Tensor<T>>, Tensor<T>, int)

Selects the most informative samples from the unlabeled pool for labeling.

int[] SelectSamples(IFullModel<T, Tensor<T>, Tensor<T>> model, Tensor<T> unlabeledPool, int batchSize)

Parameters

model IFullModel<T, Tensor<T>, Tensor<T>>

The current trained model used to evaluate samples.

unlabeledPool Tensor<T>

Pool of unlabeled samples to select from.

batchSize int

Number of samples to select for labeling.

Returns

int[]

Indices of the selected samples in the unlabeled pool.

Remarks

For Beginners: This is the main method of active learning. It looks at all the unlabeled samples and picks the ones that would be most valuable to have labeled. The returned indices tell you which samples from the unlabeled pool to label next.

The selection is based on the strategy's informativeness criterion (uncertainty, diversity, expected change, etc.).
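Setting batch diversity aside, the simplest form of this selection is to rank the pool by informativeness score and return the indices of the top batchSize entries. A minimal sketch, using double[] in place of the library's Vector<T>:

```csharp
using System;
using System.Linq;

class TopKSelectionDemo
{
    // Rank every sample by its informativeness score (descending) and
    // return the indices of the batchSize highest-scoring ones.
    public static int[] SelectTopK(double[] scores, int batchSize)
    {
        return Enumerable.Range(0, scores.Length)
            .OrderByDescending(i => scores[i])
            .Take(batchSize)
            .ToArray();
    }

    static void Main()
    {
        var scores = new[] { 0.2, 0.9, 0.5, 0.7 };
        Console.WriteLine(string.Join(",", SelectTopK(scores, 2))); // prints 1,3
    }
}
```

The returned values are indices into the unlabeled pool, matching the int[] return type documented above.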