
Class EntropyBasedAdaptiveStrategy<T>

Namespace: AiDotNet.KnowledgeDistillation.Strategies
Assembly: AiDotNet.dll

Adaptive distillation strategy that adjusts temperature based on prediction entropy.

public class EntropyBasedAdaptiveStrategy<T> : AdaptiveDistillationStrategyBase<T>, IDistillationStrategy<T>, IAdaptiveDistillationStrategy<T>

Type Parameters

T

The numeric type for calculations (e.g., double, float).

Inheritance
AdaptiveDistillationStrategyBase<T> → EntropyBasedAdaptiveStrategy<T>

Implements
IDistillationStrategy<T>, IAdaptiveDistillationStrategy<T>

Remarks

For Beginners: Entropy measures how uncertain or "spread out" a probability distribution is. High entropy means the student is uncertain (probabilities are similar across classes). Low entropy means the student is certain (one class has high probability).

Entropy Examples:
- **Low Entropy** [0.95, 0.03, 0.02]: Student is certain → Class 0 dominates
- **High Entropy** [0.35, 0.33, 0.32]: Student is uncertain → All classes similar

Intuition:
- **High Entropy** (uncertain) → Student struggling → Lower temperature (focus learning)
- **Low Entropy** (certain) → Student confident → Higher temperature (explore more)

Why Lower Temp for High Entropy? When the student is uncertain, we want to provide sharper (lower-temperature) targets that focus learning on the most important features, rather than soft targets that might reinforce the uncertainty.

Best For:
- Detecting student uncertainty
- Calibrating overconfident students
- Datasets where uncertainty patterns are meaningful

Entropy Range:
- Minimum: 0.0 (completely certain, one class = 1.0)
- Maximum: 1.0 (completely uncertain, uniform distribution)
- Normalized by log(num_classes) so raw entropy maps to the [0, 1] range
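Worked example with the two distributions above (natural logarithms are used here; after dividing by log(num_classes), the choice of log base does not matter):

H([0.95, 0.03, 0.02]) = -(0.95·log 0.95 + 0.03·log 0.03 + 0.02·log 0.02) / log 3 ≈ 0.232 / 1.099 ≈ 0.21 (nearly certain)
H([0.35, 0.33, 0.32]) = -(0.35·log 0.35 + 0.33·log 0.33 + 0.32·log 0.32) / log 3 ≈ 1.098 / 1.099 ≈ 1.00 (nearly uniform)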

Temperature Mapping:
- High entropy → high difficulty → lower temperature (sharpen)
- Low entropy → low difficulty → higher temperature (soften)
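One way such an inverted mapping can be realized is a linear interpolation between the temperature bounds driven by normalized entropy. The snippet below is an illustrative sketch only; the strategy's internal formula may differ (for example, by smoothing with the adaptation rate):

using System;

// Illustrative sketch: linear inverse mapping from normalized entropy to temperature.
double MapEntropyToTemperature(double normalizedEntropy, double minTemperature, double maxTemperature)
    => maxTemperature - normalizedEntropy * (maxTemperature - minTemperature);

Console.WriteLine(MapEntropyToTemperature(0.21, 1.0, 5.0)); // confident student  -> ~4.2 (soften)
Console.WriteLine(MapEntropyToTemperature(1.00, 1.0, 5.0)); // uncertain student  -> 1.0 (sharpen)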

Constructors

EntropyBasedAdaptiveStrategy(double, double, double, double, double)

Initializes a new instance of the EntropyBasedAdaptiveStrategy class.

public EntropyBasedAdaptiveStrategy(double baseTemperature = 3, double alpha = 0.3, double minTemperature = 1, double maxTemperature = 5, double adaptationRate = 0.1)

Parameters

baseTemperature double

Base temperature for distillation (default: 3.0).

alpha double

Balance between hard and soft loss (default: 0.3).

minTemperature double

Minimum temperature (for high entropy/uncertain, default: 1.0).

maxTemperature double

Maximum temperature (for low entropy/certain, default: 5.0).

adaptationRate double

EMA rate for performance tracking (default: 0.1).

Remarks

For Beginners: This strategy automatically adapts based on how uncertain the student's predictions are. No labels required!

Example:

var strategy = new EntropyBasedAdaptiveStrategy<double>(
    minTemperature: 1.5,  // For uncertain predictions (high entropy)
    maxTemperature: 4.0,  // For confident predictions (low entropy)
    adaptationRate: 0.15  // Moderate adaptation speed
);

for (int i = 0; i < samples.Length; i++)
{
    var teacherLogits = teacher.GetLogits(samples[i]);
    var studentLogits = student.Predict(samples[i]);

    // Automatically adapts based on entropy
    var loss = strategy.ComputeLoss(studentLogits, teacherLogits);
    strategy.UpdatePerformance(i, studentLogits);
}

Comparison with Confidence-Based:
- **Confidence**: max(probabilities); focuses on the highest class only
- **Entropy**: considers the full distribution; a more holistic uncertainty measure
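To make the distinction concrete, here is a small self-contained sketch (the NormalizedEntropy helper is illustrative and not part of the AiDotNet API): both distributions below share the same confidence, but entropy separates them.

using System;
using System.Linq;

double[] peaked = { 0.50, 0.49, 0.01 }; // mass concentrated on two classes
double[] spread = { 0.50, 0.25, 0.25 }; // mass spread across all classes

// Illustrative helper: normalized entropy in [0, 1].
double NormalizedEntropy(double[] p) =>
    -p.Where(x => x > 0).Sum(x => x * Math.Log(x)) / Math.Log(p.Length);

Console.WriteLine(peaked.Max());               // 0.5  (same confidence for both)
Console.WriteLine(spread.Max());               // 0.5
Console.WriteLine(NormalizedEntropy(peaked));  // ~0.68 (less uncertain)
Console.WriteLine(NormalizedEntropy(spread));  // ~0.95 (more uncertain)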

Methods

ComputeAdaptiveTemperature(Vector<T>, Vector<T>)

Computes adaptive temperature based on prediction entropy.

public override double ComputeAdaptiveTemperature(Vector<T> studentOutput, Vector<T> teacherOutput)

Parameters

studentOutput Vector<T>

Student's output logits.

teacherOutput Vector<T>

Teacher's output logits (not used by the entropy-based strategy).

Returns

double

Adapted temperature based on student entropy.

Remarks

Algorithm:
1. Convert logits to probabilities (softmax)
2. Compute normalized entropy H = -Σ(p * log(p)) / log(n)
3. Map entropy to temperature (inverted):
   - High entropy → Low temperature (sharpen to reduce uncertainty)
   - Low entropy → High temperature (soften to explore more)
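A minimal sketch of these three steps for a plain double[] of logits, assuming natural logarithms and a linear inverse mapping onto [minTemperature, maxTemperature]; the actual Vector<T> implementation may differ in details such as EMA smoothing:

using System;
using System.Linq;

double AdaptiveTemperature(double[] studentLogits, double minTemperature, double maxTemperature)
{
    // 1. Softmax: logits -> probabilities (shift by max for numerical stability)
    double shift = studentLogits.Max();
    double[] exps = studentLogits.Select(z => Math.Exp(z - shift)).ToArray();
    double sum = exps.Sum();
    double[] probs = exps.Select(e => e / sum).ToArray();

    // 2. Normalized entropy: H = -Σ(p * log(p)) / log(n), in [0, 1]
    double entropy = -probs.Where(p => p > 0).Sum(p => p * Math.Log(p)) / Math.Log(probs.Length);

    // 3. Inverted mapping: high entropy -> minTemperature, low entropy -> maxTemperature
    return maxTemperature - entropy * (maxTemperature - minTemperature);
}

Console.WriteLine(AdaptiveTemperature(new[] { 4.0, 0.5, 0.1 }, 1.0, 5.0)); // confident -> ~4.2
Console.WriteLine(AdaptiveTemperature(new[] { 0.2, 0.1, 0.0 }, 1.0, 5.0)); // uncertain -> ~1.0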

Examples with 3 classes:
- [0.95, 0.03, 0.02]: Entropy ≈ 0.2 (low) → Higher temperature
- [0.35, 0.33, 0.32]: Entropy ≈ 1.0 (high) → Lower temperature

Why Invert? High uncertainty needs focused (sharp) targets to learn clear boundaries. Low uncertainty can benefit from softer targets to learn class relationships.

ComputePerformance(Vector<T>, Vector<T>?)

Computes performance based on entropy (inverse relationship).

protected override double ComputePerformance(Vector<T> studentOutput, Vector<T>? trueLabel)

Parameters

studentOutput Vector<T>

Student's output predictions.

trueLabel Vector<T>?

Optional true label (not required; the entropy measure needs no labels).

Returns

double

Remarks

Returns 1 - entropy. Low entropy (certain) = high performance score. High entropy (uncertain) = low performance score.
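In sketch form, using a plain double[] of probabilities for illustration (the real method works on Vector<T> and does not need the label):

using System;
using System.Linq;

// Performance is the complement of normalized entropy: a certain student
// scores high, an uncertain one scores low. No true label is needed.
double[] probs = { 0.95, 0.03, 0.02 };   // probabilities from the student's output
double entropy = -probs.Where(p => p > 0).Sum(p => p * Math.Log(p)) / Math.Log(probs.Length);
Console.WriteLine(1.0 - entropy);        // ~0.79 -> high performance (confident prediction)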