Class KnowledgeDistillationOptions<T, TInput, TOutput>

Namespace
AiDotNet.Models.Options
Assembly
AiDotNet.dll

Configuration options for knowledge distillation training.

public class KnowledgeDistillationOptions<T, TInput, TOutput>

Type Parameters

T

The numeric type for calculations.

TInput

The input data type.

TOutput

The output data type.

Inheritance
object
KnowledgeDistillationOptions<T, TInput, TOutput>

Remarks

For Beginners: This class configures how knowledge distillation works. Think of it as the "settings" for transferring knowledge from a large teacher model to a smaller student model.

Quick Start Example:

var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    TeacherModelType = TeacherModelType.NeuralNetwork,
    StrategyType = DistillationStrategyType.ResponseBased,
    Temperature = 3.0,  // Soft predictions
    Alpha = 0.3,        // 30% hard labels, 70% teacher
    Epochs = 20,
    BatchSize = 32
};

Properties

Alpha

Gets or sets the alpha parameter balancing hard and soft loss.

public double Alpha { get; set; }

Property Value

double

Remarks

For Beginners: Alpha controls the balance between the hard-label loss and the teacher (soft) loss:
- 0.0: Only learn from the teacher
- 0.3-0.5: Balanced (recommended)
- 1.0: Only learn from the labels (no distillation)

Use lower alpha when labels are noisy or you want to rely more on the teacher.
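
To make this concrete, here is a minimal sketch of how an alpha-weighted distillation loss is typically combined. The names hardLoss, softLoss, and CombineLosses are illustrative and not part of the library.

// Illustrative only: how alpha typically weights the two loss terms.
// hardLoss = loss against the ground-truth labels
// softLoss = loss against the teacher's temperature-softened predictions
double CombineLosses(double hardLoss, double softLoss, double alpha)
{
    // alpha = 1.0 -> only hard labels; alpha = 0.0 -> only the teacher
    return alpha * hardLoss + (1.0 - alpha) * softLoss;
}

// With Alpha = 0.3, about 30% of the training signal comes from labels and 70% from the teacher.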

AttentionLayers

Gets or sets attention layer names (if using attention-based distillation).

public Vector<string>? AttentionLayers { get; set; }

Property Value

Vector<string>

Remarks

For Attention-Based Distillation: Specify attention layers to match. Example: ["attention1", "attention2"]

AttentionWeight

Gets or sets weight for attention loss (if using attention-based distillation).

public double AttentionWeight { get; set; }

Property Value

double

Remarks

Controls how much to weight attention matching. Typical values: 0.2-0.4

BatchSize

Gets or sets the batch size for training.

public int BatchSize { get; set; }

Property Value

int

Remarks

For Beginners: Batch size is how many samples to process at once:
- Small (16-32): Less memory, noisier gradients
- Medium (64-128): Balanced
- Large (256+): More memory, smoother gradients

CheckpointDirectory

Gets or sets checkpoint directory path (if checkpoints are enabled).

public string? CheckpointDirectory { get; set; }

Property Value

string

Remarks

If null, uses "./checkpoints" by default.

CheckpointFrequency

Gets or sets checkpoint frequency (save every N epochs).

public int CheckpointFrequency { get; set; }

Property Value

int

Remarks

Set to 1 to save after every epoch. Higher values save less frequently.
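
A short configuration sketch combining the checkpoint-related options (SaveCheckpoints, SaveOnlyBestCheckpoint, CheckpointDirectory, CheckpointFrequency); the values shown are illustrative.

var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    SaveCheckpoints = true,
    SaveOnlyBestCheckpoint = true,          // keep only the best-validation-loss checkpoint
    CheckpointDirectory = "./checkpoints",  // the default used when left null
    CheckpointFrequency = 5                 // save every 5 epochs
};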

EMADecay

Gets or sets the EMA decay rate (if using EMA).

public double EMADecay { get; set; }

Property Value

double

Remarks

Typical values: 0.99-0.999. Higher values give more weight to history.
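
A worked sketch of the exponential moving average update that this decay rate controls, assuming the standard EMA formula; the variable names are illustrative.

// Standard EMA update: a higher decay keeps more of the history.
double emaDecay = 0.999;
double emaPrediction = 0.80;   // smoothed teacher prediction so far
double newPrediction = 0.60;   // teacher prediction for the current step

emaPrediction = emaDecay * emaPrediction + (1.0 - emaDecay) * newPrediction;
// 0.999 * 0.80 + 0.001 * 0.60 = 0.7998; a single new value shifts the average only slightly.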

EarlyStoppingMinDelta

Gets or sets minimum improvement delta for early stopping.

public double EarlyStoppingMinDelta { get; set; }

Property Value

double

Remarks

Loss must improve by at least this amount to count as improvement. Typical values: 0.001-0.01

EarlyStoppingPatience

Gets or sets patience for early stopping (epochs without improvement).

public int EarlyStoppingPatience { get; set; }

Property Value

int

Remarks

Typical values: 3-10. Higher patience allows more time for improvement.
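
To show how patience and the minimum delta work together, here is a minimal early-stopping sketch. It illustrates the usual logic, not the library's internal implementation.

// Illustrative early-stopping bookkeeping.
double bestLoss = double.MaxValue;
int epochsWithoutImprovement = 0;
int patience = 5;
double minDelta = 0.001;

bool ShouldStop(double validationLoss)
{
    if (bestLoss - validationLoss >= minDelta)
    {
        bestLoss = validationLoss;    // improved by at least minDelta
        epochsWithoutImprovement = 0;
    }
    else
    {
        epochsWithoutImprovement++;   // no meaningful improvement this epoch
    }
    return epochsWithoutImprovement >= patience;
}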

EnsembleWeights

Gets or sets ensemble weights (if using multiple teachers).

public Vector<double>? EnsembleWeights { get; set; }

Property Value

Vector<double>

Remarks

Optional weights for each teacher. Must sum to 1.0. If null, uniform weights are used.

Epochs

Gets or sets the number of training epochs.

public int Epochs { get; set; }

Property Value

int

Remarks

For Beginners: An epoch is one complete pass through the training data. Typical values: 10-50 epochs depending on dataset size and complexity.

FeatureLayerPairs

Gets or sets layer pairs for feature-based distillation. Format: "teacher_layer:student_layer"

public Vector<string>? FeatureLayerPairs { get; set; }

Property Value

Vector<string>

Remarks

For Feature-Based Distillation: Specify which layers to match. Example: ["conv3:conv2", "conv4:conv3"]

FeatureWeight

Gets or sets weight for feature loss (if using feature-based distillation).

public double FeatureWeight { get; set; }

Property Value

double

Remarks

Controls how much to weight feature matching vs output matching. Typical values: 0.3-0.7
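
A configuration sketch for feature-based distillation using FeatureLayerPairs and FeatureWeight. It assumes Vector<string> can be constructed from a string array, and the layer names are placeholders.

var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    StrategyType = DistillationStrategyType.FeatureBased,
    // "teacher_layer:student_layer" pairs; the layer names are placeholders.
    FeatureLayerPairs = new Vector<string>(new[] { "conv3:conv2", "conv4:conv3" }),
    FeatureWeight = 0.5   // balanced emphasis between feature matching and output matching
};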

FreezeTeacher

Gets or sets whether to freeze teacher model during training.

public bool FreezeTeacher { get; set; }

Property Value

bool

Remarks

For Beginners: Usually true - teacher should remain fixed. Set to false for online distillation where teacher updates with student.

LabelSmoothingFactor

Gets or sets the label smoothing factor (if enabled).

public double LabelSmoothingFactor { get; set; }

Property Value

double

Remarks

Typical values: 0.1-0.2. Higher values smooth labels more.
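
A worked sketch of standard label smoothing, assuming the common formulation smoothed = (1 - factor) * oneHot + factor / numClasses.

// Label smoothing for a 4-class one-hot label with factor = 0.1.
double factor = 0.1;
int numClasses = 4;
double[] oneHot = { 0.0, 1.0, 0.0, 0.0 };

double[] smoothed = new double[numClasses];
for (int i = 0; i < numClasses; i++)
{
    smoothed[i] = (1.0 - factor) * oneHot[i] + factor / numClasses;
}
// Result: [0.025, 0.925, 0.025, 0.025]; the hard label is softened slightly.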

LearningRate

Gets or sets the learning rate for student training.

public double LearningRate { get; set; }

Property Value

double

Remarks

For Beginners: Learning rate controls how fast the student learns:
- Too low: Slow training
- Too high: Unstable training
- Typical: 0.001-0.01

OnEpochComplete

Gets or sets callback function invoked after each epoch.

public Action<int, T>? OnEpochComplete { get; set; }

Property Value

Action<int, T>

Remarks

For Advanced Users: Use this to log progress, save checkpoints, or implement custom logic during training.
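
For example, a simple logging callback. This sketch assumes the second argument passed to the callback is the epoch's loss value.

// Log progress after each epoch (epoch index and a value of type T).
OnEpochComplete = (epoch, loss) => Console.WriteLine($"Epoch {epoch}: loss = {loss}");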

OutputDimension

Gets or sets the output dimension for models (if it cannot be inferred from the teacher).

public int? OutputDimension { get; set; }

Property Value

int?

Remarks

Usually inferred automatically. Set manually if needed.

RandomSeed

Gets or sets the random seed for reproducibility.

public int? RandomSeed { get; set; }

Property Value

int?

Remarks

For Beginners: Set a seed to get reproducible results. Useful for debugging and comparing experiments.

SaveCheckpoints

Gets or sets whether to save checkpoints during training.

public bool SaveCheckpoints { get; set; }

Property Value

bool

Remarks

For Production: Saves best model automatically. Essential for long-running training and recovery from failures.

SaveOnlyBestCheckpoint

Gets or sets whether to only save the best model checkpoint.

public bool SaveOnlyBestCheckpoint { get; set; }

Property Value

bool

Remarks

If true, only keeps the checkpoint with best validation loss. If false, keeps all checkpoints.

SelfDistillationGenerations

Gets or sets the number of self-distillation generations (if using self-distillation).

public int SelfDistillationGenerations { get; set; }

Property Value

int

Remarks

For Self-Distillation: How many times the model re-teaches itself. Typical values: 1-3 generations.
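
A configuration sketch for self-distillation that combines this setting with the EMA options documented elsewhere in this class; the values shown are illustrative.

var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    TeacherModelType = TeacherModelType.Self,  // the model re-teaches itself
    SelfDistillationGenerations = 2,           // two rounds of re-teaching
    UseEMA = true,                             // smooth the teacher predictions over time
    EMADecay = 0.999
};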

Strategy

Gets or sets the distillation strategy instance (if using custom strategy).

public IDistillationStrategy<T>? Strategy { get; set; }

Property Value

IDistillationStrategy<T>

Remarks

For Advanced Users: Provide a custom distillation strategy. If null, one will be created based on StrategyType.

StrategyType

Gets or sets the distillation strategy type.

public DistillationStrategyType StrategyType { get; set; }

Property Value

DistillationStrategyType

Remarks

For Beginners: The strategy determines what knowledge to transfer:
- ResponseBased: Match final outputs (most common)
- FeatureBased: Match intermediate layers
- AttentionBased: Match attention patterns (for transformers)

Teacher

Gets or sets the teacher model instance (if using pre-instantiated teacher).

public ITeacherModel<TInput, TOutput>? Teacher { get; set; }

Property Value

ITeacherModel<TInput, TOutput>

Remarks

For Advanced Users: Provide a custom teacher model instance. If null, one will be created based on TeacherModelType.

TeacherForward

Gets or sets the teacher model forward function (alternative approach).

public Func<TInput, TOutput>? TeacherForward { get; set; }

Property Value

Func<TInput, TOutput>

Remarks

For Advanced Users: If you have a trained model with a forward function, provide it and it will be automatically wrapped as a teacher.

Example:

TeacherForward = input => myTrainedModel.Predict(input)

TeacherModel

Gets or sets the teacher IFullModel (recommended approach).

public IFullModel<T, TInput, TOutput>? TeacherModel { get; set; }

Property Value

IFullModel<T, TInput, TOutput>

Remarks

For Beginners (Recommended): Pass your trained IFullModel directly. This is the standard way to provide a teacher model in the AiDotNet architecture.

Example:

// After training
var trainedModel = await builder.ConfigureModel(model).BuildAsync();

// Use as teacher
TeacherModel = trainedModel.Model

TeacherModelType

Gets or sets the type of teacher model to use.

public TeacherModelType TeacherModelType { get; set; }

Property Value

TeacherModelType

Remarks

For Beginners: The teacher is the "expert" model. Choose:
- NeuralNetwork: Standard pre-trained model
- Ensemble: Multiple teachers for better knowledge
- Self: Model teaches itself (no separate teacher needed)

Teachers

Gets or sets multiple teacher models (for ensemble distillation).

public Vector<ITeacherModel<TInput, TOutput>>? Teachers { get; set; }

Property Value

Vector<ITeacherModel<TInput, TOutput>>

Remarks

For Ensemble Distillation: Provide multiple teacher models. They will be automatically combined into an ensemble.
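
A configuration sketch for ensemble distillation. It assumes Vector<T> can be constructed from an array, and teacherA and teacherB are hypothetical pre-trained ITeacherModel instances.

var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    TeacherModelType = TeacherModelType.Ensemble,
    // teacherA and teacherB are placeholders for pre-trained teachers.
    Teachers = new Vector<ITeacherModel<Vector<double>, Vector<double>>>(new[] { teacherA, teacherB }),
    // Optional per-teacher weights; must sum to 1.0. Leave null for uniform weighting.
    EnsembleWeights = new Vector<double>(new[] { 0.6, 0.4 })
};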

Temperature

Gets or sets the temperature for softmax scaling.

public double Temperature { get; set; }

Property Value

double

Remarks

For Beginners: Temperature controls how "soft" predictions are:
- Low (1-2): Sharp predictions
- Medium (3-5): Balanced (recommended)
- High (6-10): Very soft predictions

Higher temperature reveals more about class relationships but may be harder to optimize.
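
To make the effect concrete, here is a small worked sketch of temperature-scaled softmax; the logits are illustrative.

// Temperature-scaled softmax: probability of class i is proportional to exp(logit[i] / T).
double[] Softmax(double[] logits, double temperature)
{
    double[] probs = new double[logits.Length];
    double sum = 0.0;
    for (int i = 0; i < logits.Length; i++)
    {
        probs[i] = Math.Exp(logits[i] / temperature);
        sum += probs[i];
    }
    for (int i = 0; i < probs.Length; i++)
    {
        probs[i] /= sum;
    }
    return probs;
}

// Example logits [4.0, 1.0, 0.5]:
//   T = 1 gives roughly [0.93, 0.05, 0.03] (sharp)
//   T = 4 gives roughly [0.53, 0.25, 0.22] (soft: relationships between classes become visible)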

UseEMA

Gets or sets whether to use exponential moving average for teacher predictions (self-distillation).

public bool UseEMA { get; set; }

Property Value

bool

Remarks

For Self-Distillation: EMA smooths teacher predictions over time, improving stability.

UseEarlyStopping

Gets or sets whether to enable early stopping based on validation loss.

public bool UseEarlyStopping { get; set; }

Property Value

bool

Remarks

For Production: Stops training when validation loss stops improving. Prevents overfitting and saves compute time.

UseLabelSmoothing

Gets or sets whether to use label smoothing.

public bool UseLabelSmoothing { get; set; }

Property Value

bool

Remarks

For Beginners: Label smoothing softens hard labels slightly, which can improve generalization. Usually not needed with distillation.

ValidateAfterEpoch

Gets or sets whether to validate model after each epoch.

public bool ValidateAfterEpoch { get; set; }

Property Value

bool

Remarks

For Beginners: If true, evaluates student on validation set after each epoch to monitor progress.

ValidationInputs

Gets or sets validation data inputs (if validation is enabled).

public TInput[]? ValidationInputs { get; set; }

Property Value

TInput[]

ValidationLabels

Gets or sets validation data labels (if validation is enabled).

public TOutput[]? ValidationLabels { get; set; }

Property Value

TOutput[]
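
A configuration sketch that enables per-epoch validation together with early stopping; valInputs and valLabels are placeholder arrays holding your validation data.

var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    ValidateAfterEpoch = true,
    ValidationInputs = valInputs,   // TInput[]: placeholder validation inputs
    ValidationLabels = valLabels,   // TOutput[]: placeholder validation labels
    UseEarlyStopping = true,
    EarlyStoppingPatience = 5,
    EarlyStoppingMinDelta = 0.001
};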

Methods

Validate()

Validates the options and throws if any are invalid.

public void Validate()
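
A typical usage sketch: call Validate() after configuring the options and before training. The documentation does not specify which exception type is thrown, so the catch clause below is illustrative.

var options = new KnowledgeDistillationOptions<double, Vector<double>, Vector<double>>
{
    Temperature = 3.0,
    Alpha = 0.3
};

try
{
    options.Validate();   // throws if any option is invalid
}
catch (Exception ex)      // illustrative: catch the specific exception your code expects
{
    Console.WriteLine($"Invalid distillation options: {ex.Message}");
}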