
Class EntropyBasedAdaptiveStrategy<T>

Namespace: AiDotNet.KnowledgeDistillation.Strategies
Assembly: AiDotNet.dll

Adaptive distillation strategy that adjusts temperature based on prediction entropy.

public class EntropyBasedAdaptiveStrategy<T> : AdaptiveDistillationStrategyBase<T>, IDistillationStrategy<T>, IAdaptiveDistillationStrategy<T>

Type Parameters

T

The numeric type for calculations (e.g., double, float).

Inheritance
AdaptiveDistillationStrategyBase<T> → EntropyBasedAdaptiveStrategy<T>

Implements
IDistillationStrategy<T>, IAdaptiveDistillationStrategy<T>

Remarks

For Beginners: Entropy measures how uncertain or "spread out" a probability distribution is. High entropy means the student is uncertain (probabilities are similar across classes). Low entropy means the student is certain (one class has high probability).

Entropy Examples:
- **Low Entropy** [0.95, 0.03, 0.02]: Student is certain → Class 0 dominates
- **High Entropy** [0.35, 0.33, 0.32]: Student is uncertain → All classes similar

Intuition:
- **High Entropy** (uncertain) → Student struggling → Lower temperature (focus learning)
- **Low Entropy** (certain) → Student confident → Higher temperature (explore more)

Why Lower Temp for High Entropy? When the student is uncertain, we want to provide sharper (lower-temperature) targets that focus learning on the most important features, rather than soft targets that might reinforce the uncertainty.

Best For:
- Detecting student uncertainty
- Calibrating overconfident students
- Datasets where uncertainty patterns are meaningful

Entropy Range:
- Minimum: 0.0 (completely certain, one class = 1.0)
- Maximum: 1.0 (completely uncertain, uniform distribution)
- Normalized by log(num_classes) so raw entropy maps to the [0, 1] range
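Worked example with the two distributions above (natural logarithms are used here; after dividing by log(num_classes), the choice of log base does not matter):

H([0.95, 0.03, 0.02]) = -(0.95·log 0.95 + 0.03·log 0.03 + 0.02·log 0.02) / log 3 ≈ 0.232 / 1.099 ≈ 0.21 (nearly certain)
H([0.35, 0.33, 0.32]) = -(0.35·log 0.35 + 0.33·log 0.33 + 0.32·log 0.32) / log 3 ≈ 1.098 / 1.099 ≈ 1.00 (nearly uniform)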

Temperature Mapping:
- High entropy → high difficulty → lower temperature (sharpen)
- Low entropy → low difficulty → higher temperature (soften)
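One way such an inverted mapping can be realized is a linear interpolation between the temperature bounds driven by normalized entropy. The snippet below is an illustrative sketch only; the strategy's internal formula may differ (for example, by smoothing with the adaptation rate):

using System;

// Illustrative sketch: linear inverse mapping from normalized entropy to temperature.
double MapEntropyToTemperature(double normalizedEntropy, double minTemperature, double maxTemperature)
    => maxTemperature - normalizedEntropy * (maxTemperature - minTemperature);

Console.WriteLine(MapEntropyToTemperature(0.21, 1.0, 5.0)); // confident student  -> ~4.2 (soften)
Console.WriteLine(MapEntropyToTemperature(1.00, 1.0, 5.0)); // uncertain student  -> 1.0 (sharpen)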

Constructors

EntropyBasedAdaptiveStrategy(double, double, double, double, double)

Initializes a new instance of the EntropyBasedAdaptiveStrategy class.

public EntropyBasedAdaptiveStrategy(double baseTemperature = 3, double alpha = 0.3, double minTemperature = 1, double maxTemperature = 5, double adaptationRate = 0.1)

Parameters

baseTemperature double

Base temperature for distillation (default: 3.0).

alpha double

Balance between hard and soft loss (default: 0.3).

minTemperature double

Minimum temperature (for high entropy/uncertain, default: 1.0).

maxTemperature double

Maximum temperature (for low entropy/certain, default: 5.0).

adaptationRate double

EMA rate for performance tracking (default: 0.1).

Remarks

For Beginners: This strategy automatically adapts based on how uncertain the student's predictions are. No labels required!

Example:

var strategy = new EntropyBasedAdaptiveStrategy<double>(
    minTemperature: 1.5,  // For uncertain predictions (high entropy)
    maxTemperature: 4.0,  // For confident predictions (low entropy)
    adaptationRate: 0.15  // Moderate adaptation speed
);

for (int i = 0; i < samples.Length; i++)
{
    var teacherLogits = teacher.GetLogits(samples[i]);
    var studentLogits = student.Predict(samples[i]);

    // Automatically adapts based on entropy
    var loss = strategy.ComputeLoss(studentLogits, teacherLogits);
    strategy.UpdatePerformance(i, studentLogits);
}

Comparison with Confidence-Based:
- **Confidence**: max(probabilities); focuses on the highest class only
- **Entropy**: considers the full distribution; a more holistic uncertainty measure
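To make the distinction concrete, here is a small self-contained sketch (the NormalizedEntropy helper is illustrative and not part of the AiDotNet API): both distributions below share the same confidence, but entropy separates them.

using System;
using System.Linq;

double[] peaked = { 0.50, 0.49, 0.01 }; // mass concentrated on two classes
double[] spread = { 0.50, 0.25, 0.25 }; // mass spread across all classes

// Illustrative helper: normalized entropy in [0, 1].
double NormalizedEntropy(double[] p) =>
    -p.Where(x => x > 0).Sum(x => x * Math.Log(x)) / Math.Log(p.Length);

Console.WriteLine(peaked.Max());               // 0.5  (same confidence for both)
Console.WriteLine(spread.Max());               // 0.5
Console.WriteLine(NormalizedEntropy(peaked));  // ~0.68 (less uncertain)
Console.WriteLine(NormalizedEntropy(spread));  // ~0.95 (more uncertain)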

Methods

ComputeAdaptiveTemperature(Vector<T>, Vector<T>)

Computes adaptive temperature based on prediction entropy.

public override double ComputeAdaptiveTemperature(Vector<T> studentOutput, Vector<T> teacherOutput)

Parameters

studentOutput Vector<T>

Student's output logits.

teacherOutput Vector<T>

Teacher's output logits (not used by the entropy-based strategy).

Returns

double

Adapted temperature based on student entropy.

Remarks

Algorithm:
1. Convert logits to probabilities (softmax)
2. Compute normalized entropy H = -Σ(p * log(p)) / log(n)
3. Map entropy to temperature (inverted):
   - High entropy → Low temperature (sharpen to reduce uncertainty)
   - Low entropy → High temperature (soften to explore more)
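A minimal sketch of these three steps for a plain double[] of logits, assuming natural logarithms and a linear inverse mapping onto [minTemperature, maxTemperature]; the actual Vector<T> implementation may differ in details such as EMA smoothing:

using System;
using System.Linq;

double AdaptiveTemperature(double[] studentLogits, double minTemperature, double maxTemperature)
{
    // 1. Softmax: logits -> probabilities (shift by max for numerical stability)
    double shift = studentLogits.Max();
    double[] exps = studentLogits.Select(z => Math.Exp(z - shift)).ToArray();
    double sum = exps.Sum();
    double[] probs = exps.Select(e => e / sum).ToArray();

    // 2. Normalized entropy: H = -Σ(p * log(p)) / log(n), in [0, 1]
    double entropy = -probs.Where(p => p > 0).Sum(p => p * Math.Log(p)) / Math.Log(probs.Length);

    // 3. Inverted mapping: high entropy -> minTemperature, low entropy -> maxTemperature
    return maxTemperature - entropy * (maxTemperature - minTemperature);
}

Console.WriteLine(AdaptiveTemperature(new[] { 4.0, 0.5, 0.1 }, 1.0, 5.0)); // confident -> ~4.2
Console.WriteLine(AdaptiveTemperature(new[] { 0.2, 0.1, 0.0 }, 1.0, 5.0)); // uncertain -> ~1.0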

Examples with 3 classes:
- [0.95, 0.03, 0.02]: Entropy ≈ 0.2 (low) → Higher temperature
- [0.35, 0.33, 0.32]: Entropy ≈ 1.0 (high) → Lower temperature

Why Invert? High uncertainty needs focused (sharp) targets to learn clear boundaries. Low uncertainty can benefit from softer targets to learn class relationships.

ComputePerformance(Vector<T>, Vector<T>?)

Computes performance based on entropy (inverse relationship).

protected override double ComputePerformance(Vector<T> studentOutput, Vector<T>? trueLabel)

Parameters

studentOutput Vector<T>

Student's output predictions.

trueLabel Vector<T>?

Optional true label (not required; the entropy measure needs no labels).

Returns

double

Remarks

Returns 1 - entropy. Low entropy (certain) = high performance score. High entropy (uncertain) = low performance score.
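In sketch form, using a plain double[] of probabilities for illustration (the real method works on Vector<T> and does not need the label):

using System;
using System.Linq;

// Performance is the complement of normalized entropy: a certain student
// scores high, an uncertain one scores low. No true label is needed.
double[] probs = { 0.95, 0.03, 0.02 };   // probabilities from the student's output
double entropy = -probs.Where(p => p > 0).Sum(p => p * Math.Log(p)) / Math.Log(probs.Length);
Console.WriteLine(1.0 - entropy);        // ~0.79 -> high performance (confident prediction)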