Class HardToEasyCurriculumStrategy<T>
- Namespace
- AiDotNet.KnowledgeDistillation.Strategies
- Assembly
- AiDotNet.dll
Curriculum distillation strategy that progresses from hard to easy samples.
public class HardToEasyCurriculumStrategy<T> : CurriculumDistillationStrategyBase<T>, IDistillationStrategy<T>, ICurriculumDistillationStrategy<T>
Type Parameters
T: The numeric type for calculations (e.g., double, float).
- Inheritance
- CurriculumDistillationStrategyBase<T> → HardToEasyCurriculumStrategy<T>
- Implements
- IDistillationStrategy<T>, ICurriculumDistillationStrategy<T>
Remarks
For Beginners: Hard-to-easy curriculum is the opposite of traditional curriculum learning. It starts with challenging samples and gradually includes easier ones. This approach is useful for fine-tuning already-trained models or when the student has prior knowledge.
How It Works:
1. **Early Training** (progress 0.0-0.3):
   - Include only hard samples (difficulty ≥ 0.7)
   - Use low temperature (sharp targets, challenging)
2. **Mid Training** (progress 0.3-0.7):
   - Include hard and medium samples (difficulty ≥ 0.3)
   - Gradually increase temperature
3. **Late Training** (progress 0.7-1.0):
   - Include all samples (even easy ones)
   - Use high temperature (soft targets, exploratory)
Temperature Progression: Starts at MinTemperature (e.g., 2.0) and linearly increases to MaxTemperature (e.g., 5.0) as training progresses. This makes distillation progressively easier.
Sample Filtering: At progress P, only include samples with difficulty ≥ (1 - P). Example: At 50% progress, only samples with difficulty ≥ 0.5 are included.
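This filtering rule is simple enough to sketch directly. The helper name below is illustrative, not part of the library's API:

```csharp
static class FilteringSketch
{
    // At progress P, keep only samples whose difficulty is at least (1 - P).
    // At P = 0 only difficulty 1.0 passes; at P = 1 every sample passes.
    public static bool IncludeAtProgress(double difficulty, double progress)
        => difficulty >= 1.0 - progress;
}
```

At 50% progress, a sample with difficulty 0.5 is kept while one with difficulty 0.4 is filtered out, matching the example above.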
Real-World Analogy: Training an advanced student: Start with challenging problems to identify gaps in knowledge, then fill in easier concepts they might have missed. Like a PhD student reviewing undergraduate material to strengthen foundations.
When to Use Hard-to-Easy:
- **Fine-tuning**: Student already has base knowledge
- **Transfer Learning**: Adapting a pre-trained model to a new domain
- **Anti-forgetting**: Prevent the model from forgetting hard concepts
- **Expert Refinement**: Polish an already-good student model
- **Debugging**: Identify which hard samples the student struggles with
Advantages:
- Forces the student to tackle challenges early
- Identifies weaknesses quickly
- Can improve performance on difficult edge cases
- Prevents overfitting to easy samples
Disadvantages:
- Can be unstable if the student has no prior knowledge
- May not converge well from random initialization
- Harder to tune than easy-to-hard
References:
- Krueger & Dayan (2009). Flexible shaping: How learning in small steps helps.
- Kumar et al. (2010). Self-paced curriculum learning (discusses anti-curriculum).
Constructors
HardToEasyCurriculumStrategy(double, double, double, double, int, Dictionary<int, double>?)
Initializes a new instance of the HardToEasyCurriculumStrategy class.
public HardToEasyCurriculumStrategy(double baseTemperature = 3, double alpha = 0.3, double minTemperature = 2, double maxTemperature = 5, int totalSteps = 100, Dictionary<int, double>? sampleDifficulties = null)
Parameters
baseTemperature (double): Base temperature for distillation (default: 3.0).
alpha (double): Balance between hard and soft loss (default: 0.3).
minTemperature (double): Starting temperature for hard samples (default: 2.0).
maxTemperature (double): Ending temperature for easy samples (default: 5.0).
totalSteps (int): Total training steps/epochs (default: 100).
sampleDifficulties (Dictionary<int, double>?): Optional pre-defined difficulty scores (0.0 = easy, 1.0 = hard).
Remarks
For Beginners: Use this strategy when fine-tuning an already-trained student model. Not recommended for training from scratch.
Example:
// Fine-tuning a pre-trained student model
var difficulties = ComputeSampleDifficulties(validationSet);
var strategy = new HardToEasyCurriculumStrategy<double>(
    minTemperature: 1.5,   // Initial temperature (hard phase) - challenging!
    maxTemperature: 4.0,   // Final temperature (easy phase) - gentle
    totalSteps: 50,        // Shorter curriculum for fine-tuning
    sampleDifficulties: difficulties
);

// Fine-tuning loop
for (int epoch = 0; epoch < 50; epoch++)
{
    strategy.UpdateProgress(epoch);

    foreach (var (sample, index) in trainingSamples.WithIndex())
    {
        // Filter: only hard samples early, all samples later
        if (!strategy.ShouldIncludeSample(index))
            continue; // Too easy for the current stage

        // Fine-tune on this sample...
    }
}
Difficulty Scoring for Fine-Tuning:
- Student's current loss on the sample (higher = harder)
- Disagreement with the teacher (higher = harder)
- Sample rarity in the training set (rarer = harder)
- Domain distance from the pre-training data
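As a sketch of the first signal, per-sample student losses can be turned into difficulty scores. The helper below and its min-max normalization are assumptions for illustration, not part of AiDotNet:

```csharp
using System.Collections.Generic;
using System.Linq;

static class DifficultyScoring
{
    // Map per-sample student losses to difficulty scores in [0, 1]
    // by min-max normalization: the highest-loss sample scores 1.0 (hardest).
    public static Dictionary<int, double> FromLosses(IReadOnlyList<double> losses)
    {
        double min = losses.Min(), max = losses.Max(), range = max - min;
        var scores = new Dictionary<int, double>();
        for (int i = 0; i < losses.Count; i++)
            scores[i] = range > 0 ? (losses[i] - min) / range : 0.5; // flat losses: medium
        return scores;
    }
}
```

The resulting dictionary has the shape expected by the sampleDifficulties constructor parameter.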
Methods
ComputeCurriculumTemperature()
Computes curriculum temperature that increases over time (hard to easy).
public override double ComputeCurriculumTemperature()
Returns
- double
Current temperature based on curriculum progress.
Remarks
Algorithm: Temperature increases linearly from MinTemperature to MaxTemperature. temp = MinTemp + progress * (MaxTemp - MinTemp)
Example with defaults (min=2.0, max=5.0):
- Progress 0.0 (start): Temp = 2.0 (sharp, hard, challenging)
- Progress 0.25: Temp = 2.75 (still fairly sharp)
- Progress 0.50: Temp = 3.5 (medium)
- Progress 0.75: Temp = 4.25 (softer)
- Progress 1.0 (end): Temp = 5.0 (very soft, easy, exploratory)
Intuition:
- **Low temp early**: Sharp targets force the student to learn hard patterns precisely
- **High temp late**: Soft targets help the student generalize to easier patterns
Use Case: When fine-tuning, start with sharp targets to fix errors on hard samples, then soften to improve overall generalization.
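The schedule above can be sketched as a standalone helper, assuming progress is derived as currentStep / totalSteps (the helper name and signature are illustrative, not the library source):

```csharp
using System;

static class TemperatureSchedule
{
    // Linear interpolation from minTemp (start: sharp targets)
    // to maxTemp (end: soft targets): temp = min + progress * (max - min).
    public static double CurriculumTemperature(
        int currentStep, int totalSteps,
        double minTemp = 2.0, double maxTemp = 5.0)
    {
        double progress = Math.Clamp((double)currentStep / totalSteps, 0.0, 1.0);
        return minTemp + progress * (maxTemp - minTemp);
    }
}
```

With the defaults this reproduces the table above: 2.0 at the start, 3.5 at the midpoint, 5.0 at the end.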
ShouldIncludeSample(int)
Determines if a sample should be included based on curriculum progress (inverted).
public override bool ShouldIncludeSample(int sampleIndex)
Parameters
sampleIndexintIndex of the sample.
Returns
- bool
True if sample difficulty is within current curriculum stage.
Remarks
Algorithm:
- Get the sample's difficulty score (samples without a score are always included; see the note below)
- Include if difficulty ≥ (1 - current progress)
- Example: At 60% progress, include samples with difficulty ≥ 0.4
Progression Example (100 epochs):
- Epoch 0 (0% progress): Only difficulty ≥ 1.0 (hardest samples only)
- Epoch 25 (25%): difficulty ≥ 0.75 (hard samples)
- Epoch 50 (50%): difficulty ≥ 0.50 (hard + medium)
- Epoch 75 (75%): difficulty ≥ 0.25 (hard + medium + some easy)
- Epoch 99 (99%): difficulty ≥ 0.01 (all samples)
Note: Samples without difficulty scores are always included (treated as appropriate for all stages).
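Putting the threshold rule and the note together, a minimal sketch of the inclusion check (the class and member names are illustrative, not the library source):

```csharp
using System.Collections.Generic;

class InclusionSketch
{
    private readonly Dictionary<int, double> _difficulties;

    // 0.0 at the start of training, 1.0 at the end.
    public double Progress { get; set; }

    public InclusionSketch(Dictionary<int, double> difficulties)
        => _difficulties = difficulties;

    public bool ShouldIncludeSample(int sampleIndex)
    {
        // Unscored samples are always included (see the note above).
        if (!_difficulties.TryGetValue(sampleIndex, out double difficulty))
            return true;

        // Hard-first rule: include when difficulty >= (1 - progress).
        return difficulty >= 1.0 - Progress;
    }
}
```

At 25% progress, only samples scored ≥ 0.75 (plus any unscored samples) survive the filter, matching the progression table above.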