Class KnowledgeDistillationOptions<T, TInput, TOutput>
Configuration options for knowledge distillation training.
public class KnowledgeDistillationOptions<T, TInput, TOutput>
Type Parameters
T: The numeric type for calculations.
TInput: The input data type.
TOutput: The output data type.
Inheritance
- object
- KnowledgeDistillationOptions<T, TInput, TOutput>
Remarks
For Beginners: This class configures how knowledge distillation works. Think of it as the "settings" for transferring knowledge from a large teacher model to a smaller student model.
Quick Start Example:
var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    TeacherModelType = TeacherModelType.NeuralNetwork,
    StrategyType = DistillationStrategyType.ResponseBased,
    Temperature = 3.0, // Soft predictions
    Alpha = 0.3,       // 30% hard labels, 70% teacher
    Epochs = 20,
    BatchSize = 32
};
Properties
Alpha
Gets or sets the alpha parameter balancing hard and soft loss.
public double Alpha { get; set; }
Property Value
- double
Remarks
For Beginners: Alpha controls the balance:
- 0.0: Only learn from teacher
- 0.3-0.5: Balanced (recommended)
- 1.0: Only learn from labels (no distillation)
Use lower alpha when labels are noisy or you want to rely more on the teacher.
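The endpoints listed above are consistent with a convex combination of the two losses. As a sketch, assuming the conventional formulation (the exact loss is not spelled out on this page, and the symbols for the hard-label loss and the teacher-matching loss are introduced here for illustration):

```latex
\mathcal{L}_{\text{total}} = \alpha \, \mathcal{L}_{\text{hard}} + (1 - \alpha) \, \mathcal{L}_{\text{soft}}
```

With Alpha = 0.3, as in the Quick Start example, the hard-label loss contributes 30% and the teacher-matching loss 70%.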
AttentionLayers
Gets or sets attention layer names (if using attention-based distillation).
public Vector<string>? AttentionLayers { get; set; }
Property Value
- Vector<string>
Remarks
For Attention-Based Distillation: Specify attention layers to match. Example: ["attention1", "attention2"]
AttentionWeight
Gets or sets weight for attention loss (if using attention-based distillation).
public double AttentionWeight { get; set; }
Property Value
- double
Remarks
Controls how much weight the attention-matching loss receives. Typical values: 0.2-0.4.
BatchSize
Gets or sets the batch size for training.
public int BatchSize { get; set; }
Property Value
- int
Remarks
For Beginners: Batch size is how many samples to process at once:
- Small (16-32): Less memory, noisier gradients
- Medium (64-128): Balanced
- Large (256+): More memory, smoother gradients
CheckpointDirectory
Gets or sets checkpoint directory path (if checkpoints are enabled).
public string? CheckpointDirectory { get; set; }
Property Value
- string
Remarks
If null, uses "./checkpoints" by default.
CheckpointFrequency
Gets or sets checkpoint frequency (save every N epochs).
public int CheckpointFrequency { get; set; }
Property Value
- int
Remarks
Set to 1 to save after every epoch. Higher values save less frequently.
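The checkpoint properties are usually configured together. A minimal sketch using only properties documented on this page; the path and frequency are illustrative values, not recommendations from the library:

```csharp
var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    SaveCheckpoints = true,
    CheckpointDirectory = "./checkpoints/distillation", // illustrative path
    CheckpointFrequency = 5,        // save every 5 epochs
    SaveOnlyBestCheckpoint = false  // keep every saved checkpoint
};
```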
EMADecay
Gets or sets the EMA decay rate (if using EMA).
public double EMADecay { get; set; }
Property Value
- double
Remarks
Typical values: 0.99-0.999. Higher values give more weight to history.
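An exponential moving average is usually updated with the recurrence below. This is a sketch of the standard form, with β standing for EMADecay and p_t for the value being averaged at step t; which quantity the library averages is an assumption here.

```latex
\bar{p}_t = \beta \, \bar{p}_{t-1} + (1 - \beta) \, p_t
```

With β = 0.99, each new value contributes 1% to the average; with β = 0.999, it contributes 0.1%.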
EarlyStoppingMinDelta
Gets or sets minimum improvement delta for early stopping.
public double EarlyStoppingMinDelta { get; set; }
Property Value
- double
Remarks
The loss must improve by at least this amount to count as an improvement. Typical values: 0.001-0.01.
EarlyStoppingPatience
Gets or sets patience for early stopping (epochs without improvement).
public int EarlyStoppingPatience { get; set; }
Property Value
- int
Remarks
Typical values: 3-10. Higher patience allows more time for improvement.
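The early-stopping properties are typically set together with the validation data. A minimal sketch using only properties documented on this page; the values are illustrative, validationInputs and validationLabels are hypothetical arrays prepared by the caller, and whether ValidateAfterEpoch must also be enabled is an assumption:

```csharp
// Sketch only: validationInputs and validationLabels are hypothetical
// Vector<double>[] arrays supplied by the caller.
var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    ValidateAfterEpoch = true,
    ValidationInputs = validationInputs,
    ValidationLabels = validationLabels,
    UseEarlyStopping = true,
    EarlyStoppingPatience = 5,     // stop after 5 epochs without improvement
    EarlyStoppingMinDelta = 0.001  // smaller improvements do not count
};
```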
EnsembleWeights
Gets or sets ensemble weights (if using multiple teachers).
public Vector<double>? EnsembleWeights { get; set; }
Property Value
- Vector<double>
Remarks
Optional weights for each teacher. Must sum to 1.0. If null, uniform weights are used.
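One natural reading of these weights is a weighted average of the teachers' outputs. A sketch under that assumption, with w_k the weight and p^{(k)} the output of teacher k among M teachers (the notation is introduced here; this page does not specify how the ensemble is combined):

```latex
p_{\text{ensemble}} = \sum_{k=1}^{M} w_k \, p^{(k)}, \qquad \sum_{k=1}^{M} w_k = 1
```

With uniform weights, w_k = 1/M for every teacher.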
Epochs
Gets or sets the number of training epochs.
public int Epochs { get; set; }
Property Value
- int
Remarks
For Beginners: An epoch is one complete pass through the training data. Typical values: 10-50 epochs depending on dataset size and complexity.
FeatureLayerPairs
Gets or sets layer pairs for feature-based distillation. Format: "teacher_layer:student_layer"
public Vector<string>? FeatureLayerPairs { get; set; }
Property Value
- Vector<string>
Remarks
For Feature-Based Distillation: Specify which layers to match. Example: ["conv3:conv2", "conv4:conv3"]
FeatureWeight
Gets or sets weight for feature loss (if using feature-based distillation).
public double FeatureWeight { get; set; }
Property Value
- double
Remarks
Controls how much weight feature matching receives relative to output matching. Typical values: 0.3-0.7.
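A feature-based configuration might combine the strategy type, the layer pairs, and the feature weight as below. This is a sketch: the layer names come from the example on this page, the weight is illustrative, and constructing Vector<string> from an array is an assumption about the Vector<T> type.

```csharp
var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    StrategyType = DistillationStrategyType.FeatureBased,
    // Assumption: Vector<string> can be built from an array of strings.
    FeatureLayerPairs = new Vector<string>(new[] { "conv3:conv2", "conv4:conv3" }),
    FeatureWeight = 0.5
};
```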
FreezeTeacher
Gets or sets whether to freeze teacher model during training.
public bool FreezeTeacher { get; set; }
Property Value
- bool
Remarks
For Beginners: Usually true, since the teacher should remain fixed. Set to false for online distillation, where the teacher updates along with the student.
LabelSmoothingFactor
Gets or sets the label smoothing factor (if enabled).
public double LabelSmoothingFactor { get; set; }
Property Value
- double
Remarks
Typical values: 0.1-0.2. Higher values smooth labels more.
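Label smoothing is commonly defined as below. A sketch assuming the standard uniform formulation, where ε is LabelSmoothingFactor, K is the number of classes, and y_k is the one-hot label for class k (the library's exact formula is not stated on this page):

```latex
y_k^{\text{smooth}} = (1 - \varepsilon) \, y_k + \frac{\varepsilon}{K}
```

For K = 4 and ε = 0.1, the true class receives 0.925 and each other class receives 0.025.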
LearningRate
Gets or sets the learning rate for student training.
public double LearningRate { get; set; }
Property Value
- double
Remarks
For Beginners: Learning rate controls how fast the student learns:
- Too low: Slow training
- Too high: Unstable training
- Typical: 0.001-0.01
OnEpochComplete
Gets or sets callback function invoked after each epoch.
public Action<int, T>? OnEpochComplete { get; set; }
Property Value
- Action<int, T>
Remarks
For Advanced Users: Use this to log progress, save checkpoints, or implement custom logic during training.
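A logging callback could look like the following. This is a sketch: the signature only says the callback receives an int and a T, so treating them as the epoch number and that epoch's loss is an assumption.

```csharp
// Assumption: the int is the epoch number and the T value is that epoch's loss.
OnEpochComplete = (epoch, loss) => Console.WriteLine($"Epoch {epoch}: loss = {loss}")
```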
OutputDimension
Gets or sets output dimension for models (if not inferrable from teacher).
public int? OutputDimension { get; set; }
Property Value
- int?
Remarks
Usually inferred automatically. Set manually if needed.
RandomSeed
Gets or sets the random seed for reproducibility.
public int? RandomSeed { get; set; }
Property Value
- int?
Remarks
For Beginners: Set a seed to get reproducible results. Useful for debugging and comparing experiments.
SaveCheckpoints
Gets or sets whether to save checkpoints during training.
public bool SaveCheckpoints { get; set; }
Property Value
- bool
Remarks
For Production: Saves best model automatically. Essential for long-running training and recovery from failures.
SaveOnlyBestCheckpoint
Gets or sets whether to only save the best model checkpoint.
public bool SaveOnlyBestCheckpoint { get; set; }
Property Value
- bool
Remarks
If true, keeps only the checkpoint with the best validation loss. If false, keeps all checkpoints.
SelfDistillationGenerations
Gets or sets the number of self-distillation generations (if using self-distillation).
public int SelfDistillationGenerations { get; set; }
Property Value
- int
Remarks
For Self-Distillation: How many times the model re-teaches itself. Typical values: 1-3 generations.
Strategy
Gets or sets the distillation strategy instance (if using custom strategy).
public IDistillationStrategy<T>? Strategy { get; set; }
Property Value
- IDistillationStrategy<T>
Remarks
For Advanced Users: Provide a custom distillation strategy. If null, one will be created based on StrategyType.
StrategyType
Gets or sets the distillation strategy type.
public DistillationStrategyType StrategyType { get; set; }
Property Value
- DistillationStrategyType
Remarks
For Beginners: The strategy determines what knowledge to transfer:
- ResponseBased: Match final outputs (most common)
- FeatureBased: Match intermediate layers
- AttentionBased: Match attention patterns (for transformers)
Teacher
Gets or sets the teacher model instance (if using pre-instantiated teacher).
public ITeacherModel<TInput, TOutput>? Teacher { get; set; }
Property Value
- ITeacherModel<TInput, TOutput>
Remarks
For Advanced Users: Provide a custom teacher model instance. If null, one will be created based on TeacherModelType.
TeacherForward
Gets or sets the teacher model forward function (alternative approach).
public Func<TInput, TOutput>? TeacherForward { get; set; }
Property Value
- Func<TInput, TOutput>
Remarks
For Advanced Users: If you have a trained model with a forward function, provide it and it will be automatically wrapped as a teacher.
Example:
TeacherForward = input => myTrainedModel.Predict(input)
TeacherModel
Gets or sets the teacher IFullModel (recommended approach).
public IFullModel<T, TInput, TOutput>? TeacherModel { get; set; }
Property Value
- IFullModel<T, TInput, TOutput>
Remarks
For Beginners (Recommended): Pass your trained IFullModel directly. This is the standard way to provide a teacher model in the AiDotNet architecture.
Example:
// After training
var trainedModel = await builder.ConfigureModel(model).BuildAsync();
// Use as teacher
TeacherModel = trainedModel.Model
TeacherModelType
Gets or sets the type of teacher model to use.
public TeacherModelType TeacherModelType { get; set; }
Property Value
- TeacherModelType
Remarks
For Beginners: The teacher is the "expert" model. Choose:
- NeuralNetwork: Standard pre-trained model
- Ensemble: Multiple teachers for better knowledge
- Self: Model teaches itself (no separate teacher needed)
Teachers
Gets or sets multiple teacher models (for ensemble distillation).
public Vector<ITeacherModel<TInput, TOutput>>? Teachers { get; set; }
Property Value
- Vector<ITeacherModel<TInput, TOutput>>
Remarks
For Ensemble Distillation: Provide multiple teacher models. They will be automatically combined into an ensemble.
Temperature
Gets or sets the temperature for softmax scaling.
public double Temperature { get; set; }
Property Value
- double
Remarks
For Beginners: Temperature controls how "soft" predictions are:
- Low (1-2): Sharp predictions
- Medium (3-5): Balanced (recommended)
- High (6-10): Very soft predictions
Higher temperature reveals more about class relationships but may be harder to optimize.
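Temperature scaling is conventionally applied by dividing the logits by the temperature before the softmax. A sketch under that assumption, where z_i denotes the logit for class i and T is Temperature (the notation is illustrative, not taken from the library):

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}
```

For logits (2, 1, 0), T = 1 gives approximately (0.665, 0.245, 0.090), while T = 3 gives approximately (0.448, 0.321, 0.230), so the less likely classes receive visibly more probability mass.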
UseEMA
Gets or sets whether to use exponential moving average for teacher predictions (self-distillation).
public bool UseEMA { get; set; }
Property Value
- bool
Remarks
For Self-Distillation: EMA smooths teacher predictions over time, improving stability.
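A self-distillation setup might combine the related properties as below. This is a sketch using only properties and enum values documented on this page; the specific values are illustrative.

```csharp
var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    TeacherModelType = TeacherModelType.Self,
    SelfDistillationGenerations = 2, // the model re-teaches itself twice
    UseEMA = true,
    EMADecay = 0.99
};
```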
UseEarlyStopping
Gets or sets whether to enable early stopping based on validation loss.
public bool UseEarlyStopping { get; set; }
Property Value
- bool
Remarks
For Production: Stops training when validation loss stops improving. Prevents overfitting and saves compute time.
UseLabelSmoothing
Gets or sets whether to use label smoothing.
public bool UseLabelSmoothing { get; set; }
Property Value
- bool
Remarks
For Beginners: Label smoothing softens hard labels slightly, which can improve generalization. Usually not needed with distillation.
ValidateAfterEpoch
Gets or sets whether to validate model after each epoch.
public bool ValidateAfterEpoch { get; set; }
Property Value
- bool
Remarks
For Beginners: If true, evaluates the student on the validation set after each epoch to monitor progress.
ValidationInputs
Gets or sets validation data inputs (if validation is enabled).
public TInput[]? ValidationInputs { get; set; }
Property Value
- TInput[]
ValidationLabels
Gets or sets validation data labels (if validation is enabled).
public TOutput[]? ValidationLabels { get; set; }
Property Value
- TOutput[]
Methods
Validate()
Validates the options and throws if any are invalid.
public void Validate()
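Calling Validate() before training surfaces configuration mistakes early. A minimal sketch, assuming an existing options instance; the concrete exception type is not documented on this page, so the example catches Exception broadly.

```csharp
try
{
    options.Validate();
}
catch (Exception ex)
{
    // The concrete exception type is not documented here, so catch broadly.
    Console.WriteLine($"Invalid distillation options: {ex.Message}");
}
```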