Class SEALOptions<T, TInput, TOutput>
- Namespace
- AiDotNet.MetaLearning.Options
- Assembly
- AiDotNet.dll
Configuration options for the SEAL (Sample-Efficient Adaptive Learning) meta-learning algorithm.
public class SEALOptions<T, TInput, TOutput> : IMetaLearnerOptions<T>
Type Parameters
- T
The numeric data type used for calculations (e.g., float, double).
- TInput
The input data type (e.g., Matrix<T>, Tensor<T>).
- TOutput
The output data type (e.g., Vector<T>, Tensor<T>).
- Inheritance
- object
- SEALOptions<T, TInput, TOutput>
- Implements
- IMetaLearnerOptions<T>
Remarks
SEAL is a gradient-based meta-learning algorithm that combines ideas from MAML with sample-efficiency improvements. It learns initial parameters that can be quickly adapted to new tasks with just a few examples, incorporating temperature scaling, entropy regularization, and optional adaptive learning rates.
For Beginners: SEAL learns the best starting point for a model so that it can quickly adapt to new tasks with minimal data. Think of it as learning how to learn: after seeing many tasks, the model knows how to pick up new skills quickly.
Imagine learning to play musical instruments:
- Learning your first instrument (piano) is hard
- Learning your second instrument (guitar) is easier
- By your 5th instrument, you've learned principles that help you pick up any new instrument much faster
SEAL does the same with machine learning models!
Key features of SEAL:
- Temperature scaling: Controls confidence in predictions during meta-training
- Entropy regularization: Encourages diverse predictions to prevent overconfident models
- Adaptive learning rates: Per-parameter learning rate adaptation based on gradient norms
- Weight decay: Prevents overfitting to meta-training tasks
Constructors
SEALOptions(IFullModel<T, TInput, TOutput>)
Initializes a new instance of the SEALOptions class with the required meta-model.
public SEALOptions(IFullModel<T, TInput, TOutput> metaModel)
Parameters
- metaModel IFullModel<T, TInput, TOutput>
The meta-model to be trained (required).
Examples
// Create SEAL options with minimal configuration (uses all defaults)
var options = new SEALOptions<double, Matrix<double>, Vector<double>>(myNeuralNetwork);
var seal = new SEALAlgorithm<double, Matrix<double>, Vector<double>>(options);

// Create SEAL with entropy regularization for better generalization
var options = new SEALOptions<double, Matrix<double>, Vector<double>>(myNeuralNetwork)
{
    EntropyCoefficient = 0.01,
    Temperature = 1.5,
    UseAdaptiveInnerLR = true
};

// Create SEAL with weight decay and gradient clipping
var options = new SEALOptions<double, Matrix<double>, Vector<double>>(myNeuralNetwork)
{
    WeightDecay = 0.001,
    GradientClipThreshold = 5.0,
    AdaptationSteps = 10
};
Exceptions
- ArgumentNullException
Thrown when metaModel is null.
Properties
AdaptationSteps
Gets or sets the number of gradient steps to take during inner loop adaptation.
public int AdaptationSteps { get; set; }
Property Value
- int
Default: 5 (standard for few-shot learning scenarios).
Remarks
More adaptation steps allow better task-specific fitting but increase computational cost and memory usage (especially for second-order methods).
For Beginners: This is how many practice rounds you get on each task before being tested. More practice usually helps, but takes more time.
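To make the role of AdaptationSteps concrete, here is a minimal sketch of an inner loop on a toy one-parameter problem. It assumes plain SGD and the loss 0.5 * (theta - target)^2; the class and method names are hypothetical and this is not the library's implementation.

```csharp
using System;

public static class InnerLoopSketch
{
    // Runs `adaptationSteps` plain SGD steps on the toy loss
    // 0.5 * (theta - target)^2, whose gradient is (theta - target).
    public static double Adapt(double theta, double target,
                               double innerLearningRate, int adaptationSteps)
    {
        for (int step = 0; step < adaptationSteps; step++)
        {
            double gradient = theta - target;      // d/dtheta of the toy loss
            theta -= innerLearningRate * gradient; // one SGD update
        }
        return theta;
    }
}
```

With the defaults documented here (5 steps, learning rate 0.01), each step shrinks the remaining error by a factor of 0.99, so a handful of steps moves the parameter only slightly; more steps or a larger rate adapt further at higher cost.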
AdaptiveLearningRateDecay
Gets or sets the decay rate for running mean in adaptive learning rates.
public double AdaptiveLearningRateDecay { get; set; }
Property Value
- double
Default: 0.99 (slow decay for stable estimates).
Remarks
Only used when AdaptiveLearningRateMode is RunningMean. Higher values give more weight to historical gradients (more stable but slower to adapt).
AdaptiveLearningRateEpsilon
Gets or sets the epsilon value for numerical stability in adaptive learning rates.
public double AdaptiveLearningRateEpsilon { get; set; }
Property Value
- double
Default: 1e-8 (small value to prevent division by zero).
Remarks
Used in the denominator when computing adaptive learning rates to prevent division by zero: lr = base_lr / (sqrt(squared_grad) + epsilon)
AdaptiveLearningRateMode
Gets or sets the mode for adaptive learning rate computation.
public SEALAdaptiveLearningRateMode AdaptiveLearningRateMode { get; set; }
Property Value
- SEALAdaptiveLearningRateMode
Default: GradientNorm (scale by inverse gradient norm).
Remarks
Different modes for computing adaptive learning rates:
- GradientNorm: lr = base_lr / (sqrt(grad^2) + epsilon) [AdaGrad-like]
- RunningMean: Maintains exponential moving average of squared gradients [RMSprop-like]
- PerLayer: Applies same adaptive rate to all parameters in each layer
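The GradientNorm and RunningMean formulas can be sketched for a single parameter as follows. This is an illustration of the documented formulas only; the helper names are hypothetical, and the exact update SEAL performs internally may differ.

```csharp
using System;

public static class AdaptiveLearningRateSketch
{
    // GradientNorm-style rate: base_lr / (sqrt(grad^2) + epsilon).
    public static double GradientNormRate(double baseLr, double grad, double epsilon)
    {
        return baseLr / (Math.Sqrt(grad * grad) + epsilon);
    }

    // RunningMean-style rate: update an exponential moving average of the
    // squared gradient (RMSprop-like), then divide the base rate by its root.
    // `decay` plays the role of AdaptiveLearningRateDecay.
    public static double RunningMeanRate(double baseLr, double grad, double decay,
                                         double epsilon, ref double runningSquaredGrad)
    {
        runningSquaredGrad = decay * runningSquaredGrad + (1.0 - decay) * grad * grad;
        return baseLr / (Math.Sqrt(runningSquaredGrad) + epsilon);
    }
}
```

In both cases epsilon only matters when the gradient term is close to zero, where it keeps the division finite.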
CheckpointFrequency
Gets or sets how often to save checkpoints.
public int CheckpointFrequency { get; set; }
Property Value
- int
Default: 500.
DataLoader
Gets or sets the episodic data loader for sampling tasks.
public IEpisodicDataLoader<T, TInput, TOutput>? DataLoader { get; set; }
Property Value
- IEpisodicDataLoader<T, TInput, TOutput>
Default: null (tasks must be provided manually to MetaTrain).
EnableCheckpointing
Gets or sets whether to save checkpoints during training.
public bool EnableCheckpointing { get; set; }
Property Value
- bool
Default: false.
EntropyCoefficient
Gets or sets the entropy regularization coefficient.
public double EntropyCoefficient { get; set; }
Property Value
- double
Default: 0.0 (no entropy regularization).
Remarks
Entropy regularization adds a bonus for diverse predictions: Loss = Original_Loss - EntropyCoefficient * Entropy(predictions)
Higher values encourage the model to be less confident and more exploratory, which can help prevent overfitting on meta-training tasks.
For Beginners: Entropy measures how "spread out" the model's predictions are. By adding entropy to the objective, we encourage the model to not be too confident, which helps it generalize better to new tasks.
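As a rough illustration of the formula in the remarks, the sketch below computes Shannon entropy for a probability vector and subtracts the weighted entropy from a loss value. It assumes predictions are already normalized probabilities and uses the natural logarithm; the names are hypothetical and not part of the library.

```csharp
using System;

public static class EntropyRegularizationSketch
{
    // Shannon entropy (natural log) of a probability vector.
    // Zero-probability entries contribute nothing.
    public static double Entropy(double[] probabilities)
    {
        double entropy = 0.0;
        foreach (double p in probabilities)
        {
            if (p > 0.0) entropy -= p * Math.Log(p);
        }
        return entropy;
    }

    // Loss = Original_Loss - EntropyCoefficient * Entropy(predictions)
    public static double RegularizedLoss(double originalLoss,
                                         double entropyCoefficient,
                                         double[] probabilities)
    {
        return originalLoss - entropyCoefficient * Entropy(probabilities);
    }
}
```

A uniform prediction has the highest entropy and therefore receives the largest bonus, while a one-hot prediction has zero entropy and leaves the loss unchanged.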
EntropyOnlyDuringMetaTrain
Gets or sets whether to apply entropy regularization only during meta-training.
public bool EntropyOnlyDuringMetaTrain { get; set; }
Property Value
- bool
Default: true (entropy regularization disabled during adaptation).
Remarks
When true, entropy regularization is only applied during meta-training (outer loop) and not during task adaptation (inner loop). This is often desirable as we want focused adaptation during the inner loop.
EvaluationFrequency
Gets or sets how often to evaluate during meta-training.
public int EvaluationFrequency { get; set; }
Property Value
- int
Default: 100 (evaluate every 100 iterations).
EvaluationTasks
Gets or sets the number of tasks to use for evaluation.
public int EvaluationTasks { get; set; }
Property Value
- int
Default: 100.
GradientClipThreshold
Gets or sets the maximum gradient norm for gradient clipping.
public double? GradientClipThreshold { get; set; }
Property Value
- double?
Default: 10.0 (prevents exploding gradients).
Remarks
Gradient clipping limits the magnitude of gradients during training, preventing numerical instability from exploding gradients.
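One common way to clip is to rescale the whole gradient vector when its L2 norm exceeds the threshold. The sketch below assumes that variant and assumes a null threshold means no clipping; neither detail is confirmed by this page, and the names are hypothetical.

```csharp
using System;

public static class GradientClippingSketch
{
    // Returns a copy of `gradient`, rescaled so its L2 norm is at most `threshold`.
    public static double[] ClipByNorm(double[] gradient, double? threshold)
    {
        double[] result = (double[])gradient.Clone();
        if (threshold is null) return result; // assumption: null disables clipping

        double sumSquares = 0.0;
        foreach (double g in gradient) sumSquares += g * g;
        double norm = Math.Sqrt(sumSquares);

        // Leave the gradient untouched if it is already within the limit.
        if (norm == 0.0 || norm <= threshold.Value) return result;

        double scale = threshold.Value / norm;
        for (int i = 0; i < result.Length; i++) result[i] *= scale;
        return result;
    }
}
```

Rescaling preserves the gradient's direction and only limits its length, which is why it is usually preferred over clamping each component independently.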
InnerLearningRate
Gets or sets the learning rate for the inner loop (task adaptation).
public double InnerLearningRate { get; set; }
Property Value
- double
Default: 0.01 (standard for meta-learning inner loops).
Remarks
This controls how quickly the model adapts to each task during the inner loop. A smaller value leads to more stable but slower adaptation; larger values can cause unstable training.
For Beginners: This is like how big of steps you take when learning a specific task. Smaller steps are safer but slower; bigger steps are faster but might overshoot the optimal solution.
InnerOptimizer
Gets or sets the optimizer for inner-loop adaptation.
public IGradientBasedOptimizer<T, TInput, TOutput>? InnerOptimizer { get; set; }
Property Value
- IGradientBasedOptimizer<T, TInput, TOutput>
Default: null (uses built-in SGD optimizer with InnerLearningRate).
Remarks
The inner optimizer performs task-specific adaptation on the support set. SGD is typically used for simplicity and computational efficiency.
LossFunction
Gets or sets the loss function for training.
public ILossFunction<T>? LossFunction { get; set; }
Property Value
- ILossFunction<T>
Default: null (uses model's default loss function if available).
MetaBatchSize
Gets or sets the number of tasks to sample per meta-training iteration.
public int MetaBatchSize { get; set; }
Property Value
- int
Default: 4 (typical meta-batch size).
Remarks
Larger batch sizes provide more stable gradient estimates but require more memory and computation per iteration.
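The stability benefit comes from averaging: the sketch below assumes the meta-gradient is the element-wise mean of the per-task gradients in the meta-batch, which is the usual choice but is not confirmed by this page. The names are hypothetical.

```csharp
using System;

public static class MetaBatchSketch
{
    // Element-wise mean of per-task gradients; one row per task in the meta-batch.
    public static double[] AverageGradients(double[][] taskGradients)
    {
        if (taskGradients.Length == 0)
            throw new ArgumentException("At least one task gradient is required.");

        int length = taskGradients[0].Length;
        var mean = new double[length];
        foreach (double[] gradient in taskGradients)
        {
            if (gradient.Length != length)
                throw new ArgumentException("All task gradients must have the same length.");
            for (int i = 0; i < length; i++) mean[i] += gradient[i];
        }
        for (int i = 0; i < length; i++) mean[i] /= taskGradients.Length;
        return mean;
    }
}
```

Averaging over more tasks reduces the variance of the estimate, at the cost of adapting to and evaluating every task in the batch each iteration.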
MetaModel
Gets or sets the meta-model to be trained. This is the only required property.
public IFullModel<T, TInput, TOutput> MetaModel { get; set; }
Property Value
- IFullModel<T, TInput, TOutput>
Remarks
The model must implement IFullModel to support parameter getting/setting and gradient computation required for SEAL's meta-learning process.
MetaOptimizer
Gets or sets the optimizer for meta-parameter updates (outer loop).
public IGradientBasedOptimizer<T, TInput, TOutput>? MetaOptimizer { get; set; }
Property Value
- IGradientBasedOptimizer<T, TInput, TOutput>
Default: null (uses built-in Adam optimizer with OuterLearningRate).
Remarks
The meta-optimizer updates the initial model parameters based on performance across all tasks in the meta-batch. Adam is recommended for stable convergence.
MinTemperature
Gets or sets the minimum temperature for temperature annealing.
public double MinTemperature { get; set; }
Property Value
- double
Default: 1.0 (no annealing, constant temperature).
Remarks
When MinTemperature is less than Temperature, the temperature will linearly decrease from Temperature to MinTemperature over the course of training. This allows the model to be more exploratory early and more confident later.
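A linear schedule of this kind could look like the sketch below. It assumes the temperature reaches MinTemperature on the final iteration and that iterations are counted from zero; the helper name and those details are assumptions, not the library's code.

```csharp
using System;

public static class TemperatureAnnealingSketch
{
    // Linearly interpolates from `temperature` (iteration 0) down to
    // `minTemperature` (last iteration). Returns `temperature` unchanged
    // when no annealing applies.
    public static double AnnealedTemperature(double temperature, double minTemperature,
                                             int iteration, int numMetaIterations)
    {
        if (minTemperature >= temperature || numMetaIterations <= 1)
            return temperature;

        double progress = (double)iteration / (numMetaIterations - 1);
        progress = Math.Min(1.0, Math.Max(0.0, progress)); // keep within [0, 1]
        return temperature + (minTemperature - temperature) * progress;
    }
}
```

With the default Temperature and MinTemperature both at 1.0, the first branch applies and the temperature stays constant.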
NumMetaIterations
Gets or sets the total number of meta-training iterations to perform.
public int NumMetaIterations { get; set; }
Property Value
- int
Default: 1000 (typical meta-training length).
OuterLearningRate
Gets or sets the learning rate for the outer loop (meta-optimization).
public double OuterLearningRate { get; set; }
Property Value
- double
Default: 0.001 (typically 10x smaller than inner rate).
Remarks
This controls how the meta-parameters (initial model weights) are updated based on performance across all tasks. Typically smaller than inner learning rate for stable meta-learning.
RandomSeed
Gets or sets the random seed for reproducibility.
public int? RandomSeed { get; set; }
Property Value
- int?
Default: null (non-deterministic).
Temperature
Gets or sets the temperature scaling factor for the loss function.
public double Temperature { get; set; }
Property Value
- double
Default: 1.0 (no temperature scaling).
Remarks
Temperature scaling divides the loss by the temperature value:
- Temperature > 1.0: Softens predictions, reduces confidence
- Temperature < 1.0: Sharpens predictions, increases confidence
- Temperature = 1.0: No effect (standard loss computation)
For Beginners: Temperature is like adjusting how "certain" the model should be about its predictions:
- High temperature: Model becomes more humble, spreads probability across options
- Low temperature: Model becomes more confident, concentrates probability on top choice
UseAdaptiveInnerLR
Gets or sets whether to use adaptive inner learning rates.
public bool UseAdaptiveInnerLR { get; set; }
Property Value
- bool
Default: false (use fixed inner learning rate).
Remarks
When enabled, SEAL computes per-parameter learning rates based on gradient norms during inner loop adaptation. Parameters with larger gradients get smaller learning rates (similar to AdaGrad's approach).
For Beginners: Instead of using the same step size for all parameters, adaptive learning rates adjust the step size for each parameter based on how much it has been changing. Parameters that change a lot get smaller steps to prevent overshooting.
UseFirstOrder
Gets or sets whether to use first-order approximation (FOMAML-style).
public bool UseFirstOrder { get; set; }
Property Value
- bool
Default: true (recommended for computational efficiency).
Remarks
First-order approximation ignores second-order gradient terms, so the meta-gradient no longer requires backpropagating through the inner-loop updates. This substantially reduces computation and memory, usually with only a small loss in accuracy. Many practical implementations use first-order methods.
For Beginners: When true, SEAL uses a faster but slightly less accurate way to compute gradients. In practice, this works almost as well as the exact method but is much faster.
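The difference between the two modes can be seen on a toy one-parameter problem with a single inner step. The sketch assumes a support loss 0.5 * a * (theta - c)^2 and a query loss 0.5 * b * (theta' - d)^2; all names are hypothetical and chosen for illustration only.

```csharp
using System;

public static class FirstOrderSketch
{
    // One inner SGD step: theta' = theta - alpha * a * (theta - c).
    // The meta-gradient is the derivative of the query loss with respect to theta.
    public static double MetaGradient(double theta, double a, double c,
                                      double b, double d, double alpha,
                                      bool useFirstOrder)
    {
        double adapted = theta - alpha * a * (theta - c);
        double queryGradAtAdapted = b * (adapted - d);

        // First-order: treat d(theta')/d(theta) as 1 and stop here.
        if (useFirstOrder) return queryGradAtAdapted;

        // Exact: chain rule through the inner step,
        // d(theta')/d(theta) = 1 - alpha * a (the second-order term).
        return queryGradAtAdapted * (1.0 - alpha * a);
    }
}
```

In this scalar case the two results differ only by the factor (1 - alpha * a), which is close to 1 for small inner learning rates; for real models that factor becomes a matrix involving the Hessian of the support loss.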
WeightDecay
Gets or sets the weight decay (L2 regularization) coefficient.
public double WeightDecay { get; set; }
Property Value
- double
Default: 0.0 (no weight decay).
Remarks
Weight decay adds a penalty proportional to the squared magnitude of the weights, which contributes an extra term to the gradient: Gradient = Original_Gradient + WeightDecay * Parameters
This helps prevent overfitting by keeping weights small.
For Beginners: Weight decay is like a "simplicity penalty" that discourages the model from having very large weights. Large weights often indicate overfitting, so keeping them small helps the model generalize.
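The gradient formula in the remarks can be sketched as a small helper. This only illustrates the documented formula on plain arrays; the helper name is hypothetical and the library operates on its own vector types.

```csharp
using System;

public static class WeightDecaySketch
{
    // Gradient = Original_Gradient + WeightDecay * Parameters, element by element.
    public static double[] ApplyWeightDecay(double[] gradient, double[] parameters,
                                            double weightDecay)
    {
        if (gradient.Length != parameters.Length)
            throw new ArgumentException("Gradient and parameters must have the same length.");

        var result = new double[gradient.Length];
        for (int i = 0; i < gradient.Length; i++)
        {
            result[i] = gradient[i] + weightDecay * parameters[i];
        }
        return result;
    }
}
```

Because the extra term points in the direction of the parameter itself, each update pulls large weights back toward zero.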
Methods
Clone()
Creates a deep copy of the SEAL options.
public IMetaLearnerOptions<T> Clone()
Returns
- IMetaLearnerOptions<T>
A new SEALOptions instance with the same configuration values.
IsValid()
Validates that all SEAL configuration options are properly set.
public bool IsValid()
Returns
- bool
True if the configuration is valid for SEAL training; otherwise, false.
Remarks
Checks all required hyperparameters including SEAL-specific ones:
- Standard meta-learning parameters (learning rates, steps, etc.)
- Temperature must be positive
- Entropy coefficient must be non-negative
- Weight decay must be non-negative
- Adaptive learning rate parameters must be valid
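The three SEAL-specific range checks that are stated explicitly above can be sketched as a standalone function. It covers only those three rules; the other checks are not specified on this page, and the class and parameter names are hypothetical.

```csharp
using System;

public static class SealValidationSketch
{
    // Mirrors only the explicitly documented rules:
    // temperature > 0, entropy coefficient >= 0, weight decay >= 0.
    // NaN fails every comparison, so it is rejected as well.
    public static bool HasValidSealRanges(double temperature,
                                          double entropyCoefficient,
                                          double weightDecay)
    {
        return temperature > 0.0
            && entropyCoefficient >= 0.0
            && weightDecay >= 0.0;
    }
}
```

The documented defaults (Temperature 1.0, EntropyCoefficient 0.0, WeightDecay 0.0) pass all three checks.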