Enum TeacherModelType

Namespace: AiDotNet.Enums
Assembly: AiDotNet.dll

Specifies the type of teacher model to use for knowledge distillation.

public enum TeacherModelType

Fields

Adaptive = 5

Adaptive teacher that adjusts teaching based on student performance. Modulates difficulty or focus areas based on how well the student is learning.

Best for: Curriculum learning, progressive training.

Requirements: Teacher model + adaptation logic.

Pros: Optimizes teaching strategy, faster convergence.

Cons: More complex, requires performance monitoring.
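One common adaptation strategy is to tie the distillation temperature to the student's current loss: soft, easy targets early in training, sharper targets as the student improves. A minimal Python sketch of that idea (the function name, linear schedule, and constants are illustrative assumptions, not AiDotNet API):

```python
def adaptive_temperature(student_loss, t_max=8.0, t_min=1.0, loss_scale=2.0):
    """Map the student's current loss to a distillation temperature.

    High loss -> high temperature (softer, easier targets);
    low loss  -> low temperature (sharper, harder targets).
    """
    ratio = min(student_loss / loss_scale, 1.0)  # clamp to [0, 1]
    return t_min + (t_max - t_min) * ratio

# Early in training (high loss) the teacher stays soft; later it sharpens.
early = adaptive_temperature(4.0)   # -> t_max
late = adaptive_temperature(0.5)    # partway toward t_min
```

The same pattern generalizes to other knobs, such as shifting the weight between the hard-label loss and the distillation loss.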

Curriculum = 7

Curriculum teacher that provides progressive difficulty. Starts with easy samples and gradually increases difficulty.

Best for: Complex tasks, improving convergence.

Requirements: Teacher model + curriculum strategy.

Pros: Better convergence, handles complex tasks.

Cons: Requires curriculum design, longer training.
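A typical curriculum implementation orders samples easiest-first and grows the visible fraction over training. This Python sketch illustrates the scheduling logic only (a real difficulty score would come from teacher confidence or loss; here it is supplied directly):

```python
def curriculum_batches(samples, difficulties, epoch, total_epochs):
    """Return the subset of samples visible at the given epoch.

    Samples are ordered easiest-first; the visible fraction grows
    linearly from 1/total_epochs to 1.0 over the course of training.
    """
    ordered = [s for _, s in sorted(zip(difficulties, samples))]
    frac = (epoch + 1) / total_epochs
    cutoff = max(1, int(len(ordered) * frac))
    return ordered[:cutoff]

samples = ["a", "b", "c", "d"]
difficulty = [0.9, 0.1, 0.5, 0.3]   # lower = easier

first = curriculum_batches(samples, difficulty, epoch=0, total_epochs=4)
last = curriculum_batches(samples, difficulty, epoch=3, total_epochs=4)
```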

Distributed = 10

Distributed teacher split across multiple devices/nodes. Large teacher model is distributed for efficient inference.

Best for: Very large teachers, distributed training.

Requirements: Multi-device setup, large teacher.

Pros: Handles very large models, parallel processing.

Cons: Complex setup, communication overhead.

Ensemble = 1

Ensemble of multiple teacher models. Combines predictions from multiple teachers (averaging, voting, or weighted combination).

Best for: High-accuracy requirements, combining diverse models.

Requirements: Multiple pre-trained teacher models.

Pros: More robust, captures diverse knowledge.

Cons: Slower (multiple forward passes), requires more memory.
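The simplest combination rule is a (weighted) average of each teacher's class distribution. A self-contained Python sketch, assuming each teacher already outputs a probability vector:

```python
def ensemble_soft_targets(teacher_probs, weights=None):
    """Combine per-teacher class distributions by weighted averaging.

    teacher_probs: list of probability vectors, one per teacher.
    weights: optional per-teacher weights; defaults to uniform.
    """
    n = len(teacher_probs)
    if weights is None:
        weights = [1.0 / n] * n
    num_classes = len(teacher_probs[0])
    return [sum(w * p[i] for w, p in zip(weights, teacher_probs))
            for i in range(num_classes)]

# Two teachers disagree about how much mass class 0 deserves;
# the ensemble target splits the difference.
t1 = [0.7, 0.2, 0.1]
t2 = [0.5, 0.3, 0.2]
combined = ensemble_soft_targets([t1, t2])
```

Voting and learned weighting follow the same shape; only the combination rule changes.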

MultiModal = 4

Multi-modal teacher (e.g., CLIP, vision-language models). Handles multiple input modalities (text, images, audio, etc.).

Best for: Cross-modal learning, vision-language tasks.

Requirements: Multi-modal pre-trained model.

Pros: Handles multiple modalities, rich representations.

Cons: Complex, requires multi-modal data.

NeuralNetwork = 0

Standard neural network teacher. Uses a single, pre-trained neural network as the teacher model.

Best for: Standard distillation scenarios, single teacher.

Requirements: Pre-trained teacher model.

Pros: Simple, straightforward, fast.

Cons: Limited to a single model's knowledge.
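Whatever the teacher type, the core mechanism is temperature-softened targets: the teacher's logits are divided by a temperature before the softmax, so the distribution also reveals how the teacher ranks the wrong classes. A framework-agnostic Python sketch of that step:

```python
import math

def soft_targets(logits, temperature=4.0):
    """Convert teacher logits into a softened probability distribution.

    Higher temperature spreads probability mass over more classes,
    exposing the teacher's "dark knowledge" about class similarity.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [8.0, 2.0, 0.0]
sharp = soft_targets(logits, temperature=1.0)  # nearly one-hot
soft = soft_targets(logits, temperature=4.0)   # flatter, more informative
```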

Online = 6

Online teacher that updates during student training. Teacher weights are updated alongside the student's (co-training).

Best for: Continuous learning, evolving data distributions.

Requirements: Updateable teacher model.

Pros: Adapts to new data, maintains relevance.

Cons: Risk of teacher degradation, complex optimization.
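One widely used form of online teaching is the mean-teacher update, in which the teacher's weights track an exponential moving average (EMA) of the student's. This is a sketch of that idea in Python, not AiDotNet's implementation:

```python
def ema_update(teacher_w, student_w, decay=0.99):
    """Mean-teacher style co-training step: the teacher's weights follow
    an exponential moving average of the student's, smoothing out
    training noise while still tracking new data."""
    return [decay * t + (1 - decay) * s
            for t, s in zip(teacher_w, student_w)]

teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(100):   # repeated updates drift the teacher toward the student
    teacher = ema_update(teacher, student, decay=0.9)
```

The decay controls the degradation risk mentioned above: closer to 1.0 keeps the teacher stable, closer to 0 makes it chase the student.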

Pretrained = 2

Pretrained model loaded from a checkpoint or ONNX. Loads a teacher from a saved checkpoint, an ONNX model, or another serialized format.

Best for: Using external models, cross-framework distillation.

Requirements: Model checkpoint or ONNX file.

Pros: Reuse existing models, framework-agnostic.

Cons: May require format conversions.

Quantized = 9

Quantized teacher with reduced precision (INT8, INT4). Uses quantized version of teacher for faster inference during distillation.

Best for: Fast distillation, resource-constrained environments.

Requirements: Quantized teacher model.

Pros: Faster, less memory, still effective.

Cons: Slight accuracy loss, quantization overhead.
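The "slight accuracy loss" comes from rounding each weight to the nearest representable level. A minimal Python sketch of symmetric per-tensor INT8 quantization (the scheme real frameworks refine with per-channel scales and calibration):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q,
    with q an integer in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and the scale."""
    return [scale * v for v in q]

w = [0.5, -1.27, 0.01]
q, scale = quantize_int8(w)
approx = dequantize(q, scale)   # each value within half a step of the original
```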

Self = 8

Self-teacher where the model teaches itself (Born-Again Networks). The model acts as its own teacher to improve calibration and generalization.

Best for: Improving calibration, no separate teacher available.

Requirements: Initial trained model.

Pros: No separate teacher needed, improves calibration.

Cons: Requires multiple training generations.
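The generational loop can be illustrated with a toy Python sketch. Here "training" is stood in for by a simple blend of hard labels and the previous generation's soft predictions; a real implementation would minimize a distillation loss each generation:

```python
def train_generation(teacher_preds, labels, alpha=0.5):
    """Toy stand-in for one generation of training: the new student's
    predictions blend the hard labels with the previous generation's
    soft predictions (real training would minimize a KD loss)."""
    return [[alpha * y + (1 - alpha) * t for y, t in zip(yrow, trow)]
            for yrow, trow in zip(labels, teacher_preds)]

def born_again(initial_preds, labels, generations=3, alpha=0.5):
    """Each generation is distilled from the previous one's outputs."""
    model = initial_preds
    for _ in range(generations):
        model = train_generation(model, labels, alpha)
    return model

labels = [[1.0, 0.0]]    # one sample, one-hot label
preds = [[0.6, 0.4]]     # generation-0 model's output
final = born_again(preds, labels, generations=3)
```

Each generation moves halfway toward the labels while retaining some of the previous generation's softness, which is the calibration benefit the description refers to.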

Transformer = 3

Transformer-based teacher (BERT, GPT, ViT, etc.). Specialized for transformer architectures with attention mechanisms.

Best for: Language models, vision transformers, attention-based models.

Requirements: Transformer teacher model.

Pros: Supports attention distillation, handles sequences.

Cons: Specific to transformer architecture.
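Attention distillation typically adds a loss term matching the student's attention maps to the teacher's, as in TinyBERT-style attention transfer. A minimal Python sketch over flattened attention rows (illustrative only):

```python
def attention_distillation_loss(student_attn, teacher_attn):
    """Mean squared error between flattened attention maps, the typical
    extra loss term in transformer-to-transformer distillation."""
    n = len(student_attn)
    return sum((s - t) ** 2 for s, t in zip(student_attn, teacher_attn)) / n

teacher_map = [0.6, 0.3, 0.1]   # one attention row, flattened
student_map = [0.5, 0.4, 0.1]
loss = attention_distillation_loss(student_map, teacher_map)
```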

Remarks

For Beginners: The teacher model is the "expert" that guides the student model's learning. Different teacher types are suited for different scenarios and distillation goals.

Choosing a Teacher:

- Use **NeuralNetwork** for standard NN-to-NN distillation
- Use **Ensemble** to combine knowledge from multiple models
- Use **Pretrained** to load from checkpoints or ONNX
- Use **Adaptive** for curriculum learning (progressive difficulty)
- Use **Online** when the teacher should update during training