Class EntropyBasedAdaptiveStrategy<T>
- Namespace: AiDotNet.KnowledgeDistillation.Strategies
- Assembly: AiDotNet.dll
Adaptive distillation strategy that adjusts temperature based on prediction entropy.
public class EntropyBasedAdaptiveStrategy<T> : AdaptiveDistillationStrategyBase<T>, IDistillationStrategy<T>, IAdaptiveDistillationStrategy<T>
Type Parameters
T: The numeric type for calculations (e.g., double, float).
- Inheritance: object → AdaptiveDistillationStrategyBase<T> → EntropyBasedAdaptiveStrategy<T>
- Implements: IDistillationStrategy<T>, IAdaptiveDistillationStrategy<T>
- Inherited Members
Remarks
For Beginners: Entropy measures how uncertain or "spread out" a probability distribution is. High entropy means the student is uncertain (probabilities are similar across classes). Low entropy means the student is certain (one class has high probability).
Entropy Examples:
- **Low Entropy** [0.95, 0.03, 0.02]: Student is certain → Class 0 dominates
- **High Entropy** [0.35, 0.33, 0.32]: Student is uncertain → All classes similar
Intuition:
- **High Entropy** (uncertain) → Student is struggling → Lower temperature (focus learning)
- **Low Entropy** (certain) → Student is confident → Higher temperature (explore more)
Why Lower Temperature for High Entropy? When the student is uncertain, we provide sharper (lower-temperature) targets that focus learning on the most important features, rather than soft targets that might reinforce the uncertainty.
Best For:
- Detecting student uncertainty
- Calibrating overconfident students
- Datasets where uncertainty patterns are meaningful
Entropy Range:
- Minimum: 0.0 (completely certain, one class = 1.0)
- Maximum: 1.0 (completely uncertain, uniform distribution)
- Normalized by log(num_classes) to give the [0, 1] range
Temperature Mapping: high entropy → high difficulty → lower temperature (sharpen); low entropy → low difficulty → higher temperature (soften). See the sketch below.
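For concreteness, here is a minimal sketch of the normalized-entropy computation and the inverted temperature mapping. It uses plain double[] helpers rather than the library's Vector<T>, and assumes a simple linear interpolation between minTemperature and maxTemperature; the exact mapping inside the class may differ.

using System;
using System.Linq;

static class EntropyTemperatureSketch
{
    // Normalized entropy in [0, 1]: H = -Σ p * ln(p) / ln(n).
    public static double NormalizedEntropy(double[] probs)
    {
        double h = -probs.Where(p => p > 0).Sum(p => p * Math.Log(p));
        return h / Math.Log(probs.Length);
    }

    // Inverted mapping (assumed linear): high entropy → minTemperature, low entropy → maxTemperature.
    public static double MapToTemperature(double entropy, double minT, double maxT)
        => maxT - entropy * (maxT - minT);

    static void Main()
    {
        double[] certain = { 0.95, 0.03, 0.02 };   // normalized entropy ≈ 0.21
        double[] uncertain = { 0.35, 0.33, 0.32 }; // normalized entropy ≈ 1.00

        Console.WriteLine(MapToTemperature(NormalizedEntropy(certain), 1.0, 5.0));   // ≈ 4.2 → soften
        Console.WriteLine(MapToTemperature(NormalizedEntropy(uncertain), 1.0, 5.0)); // ≈ 1.0 → sharpen
    }
}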
Constructors
EntropyBasedAdaptiveStrategy(double, double, double, double, double)
Initializes a new instance of the EntropyBasedAdaptiveStrategy class.
public EntropyBasedAdaptiveStrategy(double baseTemperature = 3, double alpha = 0.3, double minTemperature = 1, double maxTemperature = 5, double adaptationRate = 0.1)
Parameters
baseTemperature (double): Base temperature for distillation (default: 3.0).
alpha (double): Balance between hard and soft loss (default: 0.3).
minTemperature (double): Minimum temperature (for high entropy/uncertain, default: 1.0).
maxTemperature (double): Maximum temperature (for low entropy/certain, default: 5.0).
adaptationRate (double): EMA rate for performance tracking (default: 0.1).
Remarks
For Beginners: This strategy automatically adapts based on how uncertain the student's predictions are. No labels required!
Example:
var strategy = new EntropyBasedAdaptiveStrategy<double>(
    minTemperature: 1.5,   // For uncertain predictions (high entropy)
    maxTemperature: 4.0,   // For confident predictions (low entropy)
    adaptationRate: 0.15   // Moderate adaptation speed
);

for (int i = 0; i < samples.Length; i++)
{
    var teacherLogits = teacher.GetLogits(samples[i]);
    var studentLogits = student.Predict(samples[i]);

    // Automatically adapts based on entropy
    var loss = strategy.ComputeLoss(studentLogits, teacherLogits);
    strategy.UpdatePerformance(i, studentLogits);
}
Comparison with Confidence-Based:
- **Confidence**: max(probabilities); focuses only on the highest class
- **Entropy**: considers the full distribution; a more holistic uncertainty measure (see the sketch below)
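To illustrate the difference, compare two hypothetical distributions with the same top probability but different spread, using the NormalizedEntropy helper sketched in the class remarks above:

double[] twoWay = { 0.50, 0.50, 0.00 };
double[] spread = { 0.50, 0.25, 0.25 };

// Both have confidence = max(p) = 0.50, so a confidence-based measure cannot tell them apart.
Console.WriteLine(EntropyTemperatureSketch.NormalizedEntropy(twoWay)); // ≈ 0.63
Console.WriteLine(EntropyTemperatureSketch.NormalizedEntropy(spread)); // ≈ 0.95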
Methods
ComputeAdaptiveTemperature(Vector<T>, Vector<T>)
Computes adaptive temperature based on prediction entropy.
public override double ComputeAdaptiveTemperature(Vector<T> studentOutput, Vector<T> teacherOutput)
Parameters
studentOutput (Vector<T>): Student's output logits.
teacherOutput (Vector<T>): Teacher's output logits (not used in entropy-based).
Returns
- double
Adapted temperature based on student entropy.
Remarks
Algorithm:
1. Convert logits to probabilities (softmax).
2. Compute normalized entropy H = -Σ(p * log(p)) / log(n).
3. Map entropy to temperature (inverted):
- High entropy → Low temperature (sharpen to reduce uncertainty)
- Low entropy → High temperature (soften to explore more)
Examples with 3 classes: - [0.95, 0.03, 0.02]: Entropy ≈ 0.2 (low) → Higher temperature - [0.35, 0.33, 0.32]: Entropy ≈ 1.0 (high) → Lower temperature
Why Invert? High uncertainty needs focused (sharp) targets to learn clear boundaries. Low uncertainty can benefit from softer targets to learn class relationships.
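A sketch of those three steps on raw logits follows. It uses plain double[] values rather than Vector<T>, reuses the NormalizedEntropy and MapToTemperature helpers from the class remarks (same System and System.Linq usings), and assumes a numerically stable softmax; the library's internal implementation may differ.

// 1. Softmax: logits → probabilities (subtract the max logit for numerical stability).
static double[] Softmax(double[] logits)
{
    double max = logits.Max();
    double[] exps = logits.Select(z => Math.Exp(z - max)).ToArray();
    double sum = exps.Sum();
    return exps.Select(e => e / sum).ToArray();
}

// 2–3. Normalized entropy of the student's distribution, then the inverted mapping.
static double AdaptiveTemperature(double[] studentLogits, double minT, double maxT)
{
    double[] probs = Softmax(studentLogits);
    double entropy = EntropyTemperatureSketch.NormalizedEntropy(probs);
    return EntropyTemperatureSketch.MapToTemperature(entropy, minT, maxT);
}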
ComputePerformance(Vector<T>, Vector<T>?)
Computes performance based on entropy (inverse relationship).
protected override double ComputePerformance(Vector<T> studentOutput, Vector<T>? trueLabel)
Parameters
studentOutput (Vector<T>)
trueLabel (Vector<T>?)
Returns
- double
Remarks
Returns 1 - entropy. Low entropy (certain) = high performance score. High entropy (uncertain) = low performance score.
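A one-line sketch of that relationship, reusing the NormalizedEntropy helper from the class remarks (how the base class feeds this score into its EMA performance tracking is assumed, not shown):

// Performance in [0, 1]: low entropy (certain) → high score, high entropy (uncertain) → low score.
static double EntropyPerformance(double[] studentProbs)
    => 1.0 - EntropyTemperatureSketch.NormalizedEntropy(studentProbs);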