Interface IActiveLearningStrategy<T>
Namespace: AiDotNet.Interfaces
Assembly: AiDotNet.dll
Defines a strategy for active learning that selects the most informative samples for labeling from a pool of unlabeled data.
public interface IActiveLearningStrategy<T>
Type Parameters
- T: The numeric type for calculations (e.g., double, float).
Remarks
For Beginners: Active learning helps when labeling data is expensive or time-consuming. Instead of randomly selecting samples to label, active learning intelligently picks the samples that would be most helpful for training the model. This can dramatically reduce the number of labels needed while achieving similar or better performance.
Common strategies include:
- Uncertainty Sampling: Select samples where the model is most uncertain. The idea is that uncertain samples are near the decision boundary and most informative.
- Query-by-Committee: Use multiple models and select samples where they disagree the most. Disagreement suggests the sample is in an unclear region (a minimal sketch of this idea appears after this list).
- Expected Model Change: Select samples that would cause the largest change to model parameters. These samples have high learning potential.
- Diversity Sampling: Select samples that are representative of different regions of the input space, ensuring good coverage.
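As one concrete illustration, the query-by-committee idea can be scored with "vote entropy": the more evenly a committee's predicted classes are split for a sample, the higher its score. The sketch below is a simplified, self-contained example over plain arrays; it does not use AiDotNet's model or Tensor<T> types, and the committee-vote layout is an assumption made for illustration.
using System;

public static class QueryByCommitteeExample
{
    // Vote entropy: committeeVotes[m][i] is the class predicted by committee
    // member m for unlabeled sample i. Samples where the committee disagrees
    // most receive the highest scores.
    public static double[] VoteEntropyScores(int[][] committeeVotes, int numClasses)
    {
        int numModels = committeeVotes.Length;
        int numSamples = committeeVotes[0].Length;
        var scores = new double[numSamples];

        for (int i = 0; i < numSamples; i++)
        {
            // Count how many committee members voted for each class.
            var counts = new int[numClasses];
            for (int m = 0; m < numModels; m++)
                counts[committeeVotes[m][i]]++;

            // Entropy of the vote distribution: 0 when all members agree,
            // maximal when the votes are split evenly.
            double entropy = 0.0;
            foreach (int c in counts)
            {
                if (c == 0) continue;
                double p = (double)c / numModels;
                entropy -= p * Math.Log(p);
            }
            scores[i] = entropy;
        }
        return scores;
    }
}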
Typical Usage Flow:
// Initial training with a small labeled set
model.Train(labeledData);

// Active learning loop
while (labelingBudget > 0)
{
    // Select the indices of the most informative samples
    var selectedIndices = strategy.SelectSamples(model, unlabeledPool, batchSize);

    // Get labels for the selected samples (from a human annotator or oracle)
    var newlyLabeled = GetLabels(selectedIndices);

    // Update the training data and retrain
    labeledData.AddRange(newlyLabeled);
    model.Train(labeledData);

    labelingBudget -= batchSize;
}
Properties
Name
Gets the name of this active learning strategy.
string Name { get; }
Property Value
- string
Remarks
Used for logging, debugging, and identifying which strategy is being used.
UseBatchDiversity
Gets or sets whether to use batch-mode selection that considers diversity among selected samples.
bool UseBatchDiversity { get; set; }
Property Value
- bool
Remarks
For Beginners: When selecting multiple samples at once (batch mode), simply picking the top-scoring samples might lead to redundancy - they might all be similar. Enabling batch diversity ensures the selected samples are not only informative but also different from each other, providing better coverage of the uncertain regions.
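One common way to implement this is a greedy selection that balances each candidate's informativeness score against its distance to the samples already chosen in the batch. The sketch below illustrates that idea over plain double[] feature vectors; the trade-off weight lambda and the Euclidean distance are illustrative assumptions, not part of this interface.
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchDiversityExample
{
    // Greedy batch selection: repeatedly pick the candidate that is both
    // informative (high score) and far from the samples already selected.
    public static int[] SelectDiverseBatch(double[][] features, double[] scores, int batchSize, double lambda = 0.5)
    {
        var selected = new List<int>();
        var candidates = Enumerable.Range(0, features.Length).ToList();

        while (selected.Count < batchSize && candidates.Count > 0)
        {
            int best = -1;
            double bestValue = double.NegativeInfinity;

            foreach (int i in candidates)
            {
                // Diversity term: distance to the nearest already-selected sample
                // (treated as fully diverse when nothing has been selected yet).
                double diversity = selected.Count == 0
                    ? 1.0
                    : selected.Min(j => Euclidean(features[i], features[j]));

                double value = (1 - lambda) * scores[i] + lambda * diversity;
                if (value > bestValue)
                {
                    bestValue = value;
                    best = i;
                }
            }

            selected.Add(best);
            candidates.Remove(best);
        }
        return selected.ToArray();
    }

    private static double Euclidean(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int k = 0; k < a.Length; k++)
        {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }
}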
Methods
ComputeInformativenessScores(IFullModel<T, Tensor<T>, Tensor<T>>, Tensor<T>)
Computes informativeness scores for all samples in the unlabeled pool.
Vector<T> ComputeInformativenessScores(IFullModel<T, Tensor<T>, Tensor<T>> model, Tensor<T> unlabeledPool)
Parameters
- model (IFullModel<T, Tensor<T>, Tensor<T>>): The current trained model used to evaluate samples.
- unlabeledPool (Tensor<T>): Pool of unlabeled samples to score.
Returns
- Vector<T>
A vector of scores where higher values indicate more informative samples.
Remarks
For Beginners: This method assigns a score to each unlabeled sample indicating how "informative" or "valuable" it would be to label. Higher scores mean the model would benefit more from having that sample labeled.
Different strategies compute informativeness differently:
- Uncertainty: How unsure the model is about the prediction (see the sketch after this list)
- Diversity: How different the sample is from already labeled data
- Expected change: How much the model would change if trained on this sample
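As a concrete sketch of the uncertainty criterion, the example below scores each sample by the entropy of its predicted class probabilities. It is a simplified illustration over plain arrays rather than the IFullModel and Tensor<T> types in this method's signature; producing the probabilities from an actual model is assumed to happen elsewhere.
using System;

public static class UncertaintyScoringExample
{
    // Entropy-based informativeness: probabilities[i][c] is the model's predicted
    // probability of class c for unlabeled sample i. Higher entropy = more uncertain,
    // so higher scores mark the samples the model is least sure about.
    public static double[] EntropyScores(double[][] probabilities)
    {
        var scores = new double[probabilities.Length];
        for (int i = 0; i < probabilities.Length; i++)
        {
            double entropy = 0.0;
            foreach (double p in probabilities[i])
            {
                if (p > 0)
                    entropy -= p * Math.Log(p);
            }
            scores[i] = entropy;
        }
        return scores;
    }
}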
GetSelectionStatistics()
Gets statistics about the most recent sample selection.
Dictionary<string, T> GetSelectionStatistics()
Returns
- Dictionary<string, T>
Dictionary containing selection statistics (e.g., score distribution, diversity metrics).
Remarks
For Beginners: This method returns information about the last selection, which is useful for understanding how the strategy is performing and debugging. Statistics might include average score of selected samples, score variance, etc.
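A typical pattern is to log these statistics after each selection round. The snippet below is a minimal sketch in the style of the usage flow above; the exact keys in the dictionary depend on the concrete strategy, so none of the names are guaranteed by this interface.
// After a selection round:
var stats = strategy.GetSelectionStatistics();
foreach (var entry in stats)
{
    // Keys such as "MeanScore" or "ScoreVariance" are hypothetical examples;
    // consult the concrete strategy for the keys it actually reports.
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}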
SelectSamples(IFullModel<T, Tensor<T>, Tensor<T>>, Tensor<T>, int)
Selects the most informative samples from the unlabeled pool for labeling.
int[] SelectSamples(IFullModel<T, Tensor<T>, Tensor<T>> model, Tensor<T> unlabeledPool, int batchSize)
Parameters
- model (IFullModel<T, Tensor<T>, Tensor<T>>): The current trained model used to evaluate samples.
- unlabeledPool (Tensor<T>): Pool of unlabeled samples to select from.
- batchSize (int): Number of samples to select for labeling.
Returns
- int[]
Indices of the selected samples in the unlabeled pool.
Remarks
For Beginners: This is the main method of active learning. It looks at all the unlabeled samples and picks the ones that would be most valuable to have labeled. The returned indices tell you which samples from the unlabeled pool to label next.
The selection is based on the strategy's informativeness criterion (uncertainty, diversity, expected change, etc.).
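In the simplest case (with UseBatchDiversity disabled), a strategy could implement this as a top-k selection over the scores from ComputeInformativenessScores. The sketch below shows that idea over a plain double[] of scores; it is an illustrative outline, not the library's implementation.
using System.Linq;

public static class TopKSelectionExample
{
    // Return the indices of the batchSize highest-scoring samples,
    // i.e. the samples most worth sending to an annotator next.
    public static int[] SelectTopK(double[] scores, int batchSize)
    {
        return scores
            .Select((score, index) => (score, index))
            .OrderByDescending(x => x.score)
            .Take(batchSize)
            .Select(x => x.index)
            .ToArray();
    }
}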