Class HardToEasyCurriculumStrategy<T>

Namespace
AiDotNet.KnowledgeDistillation.Strategies
Assembly
AiDotNet.dll

Curriculum distillation strategy that progresses from hard to easy samples.

public class HardToEasyCurriculumStrategy<T> : CurriculumDistillationStrategyBase<T>, IDistillationStrategy<T>, ICurriculumDistillationStrategy<T>

Type Parameters

T

The numeric type for calculations (e.g., double, float).

Inheritance
object
CurriculumDistillationStrategyBase<T>
HardToEasyCurriculumStrategy<T>

Implements
IDistillationStrategy<T>
ICurriculumDistillationStrategy<T>

Remarks

For Beginners: Hard-to-easy curriculum is the opposite of traditional curriculum learning. It starts with challenging samples and gradually includes easier ones. This approach is useful for fine-tuning already-trained models or when the student has prior knowledge.

How It Works:
1. **Early Training** (progress 0.0-0.3):
   - Include only hard samples (difficulty ≥ 0.7)
   - Use low temperature (sharp targets, challenging)
2. **Mid Training** (progress 0.3-0.7):
   - Include hard and medium samples (difficulty ≥ 0.3)
   - Gradually increase temperature
3. **Late Training** (progress 0.7-1.0):
   - Include all samples (even easy ones)
   - Use high temperature (soft targets, exploratory)

Temperature Progression: Starts at MinTemperature (e.g., 2.0) and linearly increases to MaxTemperature (e.g., 5.0) as training progresses. This makes distillation progressively easier.

Sample Filtering: At progress P, only include samples with difficulty ≥ (1 - P). Example: At 50% progress, only samples with difficulty ≥ 0.5 are included.
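The filtering rule above reduces to a one-line predicate. This is an illustrative sketch; `CurriculumFilter` is a hypothetical name, not part of the AiDotNet API:

```csharp
// Illustrative sketch of the hard-to-easy inclusion rule.
// "CurriculumFilter" is a hypothetical name, not an AiDotNet type.
public static class CurriculumFilter
{
    // At curriculum progress p in [0, 1], a sample is included
    // when its difficulty is at least (1 - p).
    public static bool Include(double difficulty, double progress) =>
        difficulty >= 1.0 - progress;
}
```

At 50% progress this admits a difficulty-0.5 sample but rejects one at 0.49, matching the example above; late in training (progress near 1.0) even easy samples pass.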

Real-World Analogy: Training an advanced student: Start with challenging problems to identify gaps in knowledge, then fill in easier concepts they might have missed. Like a PhD student reviewing undergraduate material to strengthen foundations.

When to Use Hard-to-Easy:
- **Fine-tuning**: Student already has base knowledge
- **Transfer Learning**: Adapting a pre-trained model to a new domain
- **Anti-forgetting**: Prevent the model from forgetting hard concepts
- **Expert Refinement**: Polish an already-good student model
- **Debugging**: Identify which hard samples the student struggles with

Advantages:
- Forces the student to tackle challenges early
- Identifies weaknesses quickly
- Can improve performance on difficult edge cases
- Prevents overfitting to easy samples

Disadvantages:
- Can be unstable if the student has no prior knowledge
- May not converge well from random initialization
- Harder to tune than easy-to-hard

References:
- Krueger & Dayan (2009). Flexible shaping: How learning in small steps helps.
- Kumar et al. (2010). Self-Paced Learning for Latent Variable Models (discusses anti-curriculum).

Constructors

HardToEasyCurriculumStrategy(double, double, double, double, int, Dictionary<int, double>?)

Initializes a new instance of the HardToEasyCurriculumStrategy class.

public HardToEasyCurriculumStrategy(double baseTemperature = 3, double alpha = 0.3, double minTemperature = 2, double maxTemperature = 5, int totalSteps = 100, Dictionary<int, double>? sampleDifficulties = null)

Parameters

baseTemperature double

Base temperature for distillation (default: 3.0).

alpha double

Balance between hard and soft loss (default: 0.3).

minTemperature double

Starting temperature for hard samples (default: 2.0).

maxTemperature double

Ending temperature for easy samples (default: 5.0).

totalSteps int

Total training steps/epochs (default: 100).

sampleDifficulties Dictionary<int, double>

Optional pre-defined difficulty scores (0.0=easy, 1.0=hard).

Remarks

For Beginners: Use this strategy when fine-tuning an already-trained student model. Not recommended for training from scratch.

Example:

// Fine-tuning a pre-trained student model
var difficulties = ComputeSampleDifficulties(validationSet);

var strategy = new HardToEasyCurriculumStrategy<double>(
    minTemperature: 1.5,      // Initial temperature (hard phase) - challenging!
    maxTemperature: 4.0,      // Final temperature (easy phase) - gentle
    totalSteps: 50,           // Shorter curriculum for fine-tuning
    sampleDifficulties: difficulties);

// Fine-tuning loop
for (int epoch = 0; epoch < 50; epoch++)
{
    strategy.UpdateProgress(epoch);

    foreach (var (sample, index) in trainingSamples.WithIndex())
    {
        // Filter: Only hard samples early, all samples later
        if (!strategy.ShouldIncludeSample(index))
            continue; // Too easy for current stage

        // Fine-tune on this sample...
    }
}

Difficulty Scoring for Fine-Tuning:
- Student's current loss on the sample (higher = harder)
- Disagreement with the teacher (higher = harder)
- Sample rarity in the training set (rarer = harder)
- Domain distance from the pre-training data
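One possible way to turn the first two signals above into a score usable as a `sampleDifficulties` value is sketched below. All names here are hypothetical (the `ComputeSampleDifficulties` helper in the example above is likewise user-supplied, not an AiDotNet API):

```csharp
using System;

// Hypothetical difficulty scorer combining two of the signals listed above:
// the student's loss on a sample and its disagreement with the teacher.
// Not part of AiDotNet; shown only to make the heuristics concrete.
public static class DifficultyScoring
{
    // Squashes a non-negative raw signal into [0, 1) so signals are comparable.
    private static double Squash(double x) => 1.0 - Math.Exp(-x);

    // Returns a difficulty in [0, 1): higher loss or higher teacher
    // disagreement produces a harder (larger) score.
    public static double Score(double studentLoss, double teacherDisagreement,
                               double lossWeight = 0.5) =>
        lossWeight * Squash(studentLoss) +
        (1.0 - lossWeight) * Squash(teacherDisagreement);
}
```

Because the output stays in [0, 1), it plugs directly into the constructor's `sampleDifficulties` dictionary (0.0 = easy, 1.0 = hard).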

Methods

ComputeCurriculumTemperature()

Computes curriculum temperature that increases over time (hard to easy).

public override double ComputeCurriculumTemperature()

Returns

double

Current temperature based on curriculum progress.

Remarks

Algorithm: Temperature increases linearly from MinTemperature to MaxTemperature:

temp = MinTemp + progress * (MaxTemp - MinTemp)

Example with defaults (min=2.0, max=5.0):
- Progress 0.0 (start): Temp = 2.0 (sharp, hard, challenging)
- Progress 0.25: Temp = 2.75 (still fairly sharp)
- Progress 0.50: Temp = 3.5 (medium)
- Progress 0.75: Temp = 4.25 (softer)
- Progress 1.0 (end): Temp = 5.0 (very soft, easy, exploratory)

Intuition:
- **Low temp early**: Sharp targets force the student to learn hard patterns precisely
- **High temp late**: Soft targets help the student generalize to easier patterns

Use Case: When fine-tuning, start with sharp targets to fix errors on hard samples, then soften to improve overall generalization.
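As a standalone sketch, the schedule above is a single linear interpolation. This is illustrative code reproducing the documented formula, not AiDotNet's implementation (the real entry point is `ComputeCurriculumTemperature()` on the strategy instance):

```csharp
// Illustrative sketch of the documented linear temperature schedule:
// temp = MinTemp + progress * (MaxTemp - MinTemp).
// "CurriculumSchedule" is a hypothetical name, not an AiDotNet type.
public static class CurriculumSchedule
{
    public static double LinearTemperature(double progress,
                                           double minTemp = 2.0,
                                           double maxTemp = 5.0) =>
        minTemp + progress * (maxTemp - minTemp);
}
```

With the defaults (min=2.0, max=5.0) this reproduces the table above: 2.0 at the start, 2.75 at 25%, 3.5 at the midpoint, and 5.0 at the end.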

ShouldIncludeSample(int)

Determines if a sample should be included based on curriculum progress (inverted).

public override bool ShouldIncludeSample(int sampleIndex)

Parameters

sampleIndex int

Index of the sample.

Returns

bool

True if sample difficulty is within current curriculum stage.

Remarks

Algorithm:
- Get the sample's difficulty (samples without a score are always included; see the note below)
- Include if difficulty ≥ (1 - current progress)
- Example: At 60% progress, include samples with difficulty ≥ 0.4

Progression Example (100 epochs):
- Epoch 0 (0% progress): Only difficulty ≥ 1.0 (the hardest samples only)
- Epoch 25 (25%): difficulty ≥ 0.75 (hard samples)
- Epoch 50 (50%): difficulty ≥ 0.50 (hard + medium)
- Epoch 75 (75%): difficulty ≥ 0.25 (hard + medium + some easy)
- Epoch 99 (99%): difficulty ≥ 0.01 (all samples)

Note: Samples without difficulty scores are always included (treated as appropriate for all stages).
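Combining the threshold with the note about unscored samples gives a predicate like the sketch below. The names are illustrative, not AiDotNet's internals; only the documented behavior (unscored samples always pass, scored samples must satisfy difficulty ≥ 1 - progress) is assumed:

```csharp
using System.Collections.Generic;

// Illustrative sketch of the documented ShouldIncludeSample behavior:
// samples with no difficulty score are always included; scored samples
// pass only when difficulty >= 1 - progress.
// "SampleGate" is a hypothetical name, not AiDotNet's implementation.
public static class SampleGate
{
    public static bool ShouldInclude(
        IReadOnlyDictionary<int, double> difficulties,
        int sampleIndex,
        double progress) =>
        !difficulties.TryGetValue(sampleIndex, out var d) ||
        d >= 1.0 - progress;
}
```

At 25% progress, for example, a sample scored 0.9 passes, a sample scored 0.2 is filtered out, and a sample with no score is always admitted.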