
Class MetaSGDAlgorithm<T, TInput, TOutput>

Namespace
AiDotNet.MetaLearning.Algorithms
Assembly
AiDotNet.dll

Implementation of the Meta-SGD (Meta Stochastic Gradient Descent) algorithm.

public class MetaSGDAlgorithm<T, TInput, TOutput> : MetaLearnerBase<T, TInput, TOutput>, IMetaLearner<T, TInput, TOutput>

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

TInput

The input data type (e.g., Matrix<T>, Tensor<T>).

TOutput

The output data type (e.g., Vector<T>, Tensor<T>).

Inheritance
MetaLearnerBase<T, TInput, TOutput>
MetaSGDAlgorithm<T, TInput, TOutput>
Implements
IMetaLearner<T, TInput, TOutput>

Remarks

Meta-SGD learns per-parameter learning rates for meta-learning. Instead of learning just initialization parameters like MAML, it learns the learning rate, momentum, and direction for each parameter individually, which can be seen as learning a custom optimizer for each parameter.

For Beginners: Meta-SGD learns how to update each parameter individually:

In regular training, you use one learning rate for all weights. But different parts of a neural network benefit from different learning rates. Meta-SGD figures this out automatically by learning:

- α_i: The optimal learning rate for parameter i
- β_i: The optimal momentum for parameter i (optional)
- d_i: The optimal update direction/sign for parameter i (optional)

Algorithm - Meta-SGD:

# Learn per-parameter optimizers
for each parameter θ_i:
    learning_rate_i = learnable_parameter
    momentum_i = learnable_parameter (optional)
    direction_i = learnable_parameter (optional)

# Meta-training episode
for each task in task_batch:
    # Inner loop: adapt to task
    adapted_params = initial_params.copy()
    for step = 1 to K_inner:
        gradients = compute_gradients(adapted_params, support_set)
        for i in range(num_params):
            # Per-parameter update rule
            adapted_params[i] = update_rule_i(
                adapted_params[i],
                gradients[i],
                learning_rate_i,
                momentum_i,
                direction_i
            )

    # Evaluate on query set
    query_loss = evaluate(adapted_params, query_set)

    # Meta-update: optimize per-parameter coefficients
    meta_gradients = compute_meta_gradients(query_loss)
    update_per_parameter_optimizers(meta_gradients)
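
To make the per-parameter update rule concrete, the following is a minimal C# sketch of a single inner-loop step. It is illustrative only and does not reflect AiDotNet's internal implementation: the arrays and the velocity-update convention shown here are assumptions, and the exact rule used in practice depends on the configured UpdateRuleType.

// Illustrative sketch of one Meta-SGD inner-loop step (not AiDotNet's actual implementation).
// Each parameter i has its own learned learning rate alpha_i, momentum beta_i, and direction d_i.
static void InnerStep(
    double[] parameters,     // adapted parameters theta_i
    double[] gradients,      // dL_support / d theta_i
    double[] learningRates,  // learned alpha_i
    double[] momenta,        // learned beta_i (zeros disable momentum)
    double[] directions,     // learned d_i (ones disable direction scaling)
    double[] velocity)       // running velocity buffer, one entry per parameter
{
    for (int i = 0; i < parameters.Length; i++)
    {
        // Per-parameter update: alpha_i * d_i * grad_i plus a momentum term.
        double update = learningRates[i] * directions[i] * gradients[i]
                      + momenta[i] * velocity[i];
        velocity[i] = update;      // one common convention; the real rule may differ
        parameters[i] -= update;   // gradient-descent step on this parameter only
    }
}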

Key Insights:

1. Per-Parameter Optimization: Each parameter gets its own learned optimizer configuration, allowing heterogeneous learning rates across layers.
2. First-Order Method: No Hessian computation needed, much faster than second-order MAML while maintaining strong performance.
3. Interpretable: Learned per-parameter learning rates reveal which parameters are most important for quick adaptation.
4. Flexible Update Rules: Can combine with various base optimizers (SGD, Adam, RMSprop) for different adaptation characteristics.

Reference: Li, Z., Zhou, F., Chen, F., & Li, H. (2017). Meta-SGD: Learning to Learn Quickly for Few-Shot Learning.

Constructors

MetaSGDAlgorithm(MetaSGDOptions<T, TInput, TOutput>)

Initializes a new instance of the MetaSGDAlgorithm class.

public MetaSGDAlgorithm(MetaSGDOptions<T, TInput, TOutput> options)

Parameters

options MetaSGDOptions<T, TInput, TOutput>

Meta-SGD configuration options containing the model and all hyperparameters.

Examples

// Create Meta-SGD with minimal configuration
var options = new MetaSGDOptions<double, Tensor<double>, Tensor<double>>(myNeuralNetwork);
var metaSGD = new MetaSGDAlgorithm<double, Tensor<double>, Tensor<double>>(options);

// Create Meta-SGD with full per-parameter optimization
var fullOptions = new MetaSGDOptions<double, Tensor<double>, Tensor<double>>(myNeuralNetwork)
{
    UpdateRuleType = MetaSGDUpdateRuleType.Adam,
    LearnLearningRate = true,
    LearnMomentum = true,
    LearnDirection = true,
    LearnAdamBetas = true
};
var fullMetaSGD = new MetaSGDAlgorithm<double, Tensor<double>, Tensor<double>>(fullOptions);

Remarks

For Beginners: This creates a Meta-SGD model that learns per-parameter optimizers:

What Meta-SGD needs:

- MetaModel: Neural network to be meta-trained (required)
- UpdateRuleType: Type of update rule to learn (SGD, Adam, etc.)
- LearnLearningRate: Whether to learn per-parameter learning rates (default: true)
- LearnMomentum: Whether to learn per-parameter momentum (default: false)
- LearnDirection: Whether to learn update direction sign (default: true)

What makes it different from MAML:

- MAML: Same learning rate for all parameters
- Meta-SGD: Different learning rate per parameter
- Meta-SGD learns optimizers, MAML learns initialization
- Meta-SGD is first-order (faster), MAML is second-order (more accurate)

Exceptions

ArgumentNullException

Thrown when options is null.

InvalidOperationException

Thrown when required components are not set in options.

ArgumentException

Thrown when configuration validation fails.

Properties

AlgorithmType

Gets the algorithm type identifier for this meta-learner.

public override MetaLearningAlgorithmType AlgorithmType { get; }

Property Value

MetaLearningAlgorithmType

Returns MetaSGD.

Remarks

This property identifies the algorithm as Meta-SGD, a first-order meta-learning algorithm that learns per-parameter learning rates, momentum terms, and update directions for fast task adaptation.

For Beginners: This tells the framework which meta-learning algorithm is being used. Meta-SGD is characterized by its per-parameter optimization approach, which is simpler and faster than MAML while achieving competitive performance on few-shot learning tasks.
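
As a small usage note, the property can be checked at runtime to confirm which algorithm an instance represents; the enum member name below follows the "Returns MetaSGD" value documented above, and metaSGD refers to the instance created in the constructor examples.

// Confirm the algorithm type of a meta-learner instance.
if (metaSGD.AlgorithmType == MetaLearningAlgorithmType.MetaSGD)
{
    Console.WriteLine("This meta-learner uses Meta-SGD per-parameter optimization.");
}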

Methods

Adapt(IMetaLearningTask<T, TInput, TOutput>)

Adapts the meta-learned model to a new task using the learned per-parameter optimizers.

public override IModel<TInput, TOutput, ModelMetadata<T>> Adapt(IMetaLearningTask<T, TInput, TOutput> task)

Parameters

task IMetaLearningTask<T, TInput, TOutput>

The new task containing support set examples for adaptation.

Returns

IModel<TInput, TOutput, ModelMetadata<T>>

A new model instance that has been adapted to the given task using learned optimizers.

Remarks

Meta-SGD adaptation uses the learned per-parameter learning rates, momentum, and directions to perform highly optimized gradient descent on the support set.

For Beginners: When adapting to a new task, Meta-SGD uses the learned per-parameter optimizers to update the model. Each weight gets updated at its own optimal rate, making adaptation much faster than using a single learning rate for all weights.

Adaptation Process:

for each adaptation step:
    gradients = compute_gradients(model, support_set)
    for each parameter i:
        update_i = α_i × d_i × gradients[i] + β_i × velocity[i]
        params[i] = params[i] - update_i

Where α_i, d_i, β_i are the learned per-parameter coefficients.

Advantages over MAML adaptation:

- Uses optimized per-parameter learning rates (not one rate for all)
- Can include learned momentum for faster convergence
- Direction coefficients can flip/scale gradients as needed
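
A minimal usage sketch follows; newTask is a hypothetical placeholder for an object implementing IMetaLearningTask<double, Tensor<double>, Tensor<double>>, and metaSGD is the instance created in the constructor examples.

// Adapt the meta-learned model to a new task using that task's support set.
// newTask is a placeholder; it must carry the support examples used for adaptation.
IModel<Tensor<double>, Tensor<double>, ModelMetadata<double>> adaptedModel = metaSGD.Adapt(newTask);

// Adapt returns a new, task-specific model instance; use it to evaluate
// on the task's query examples.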

Exceptions

ArgumentNullException

Thrown when task is null.

MetaTrain(TaskBatch<T, TInput, TOutput>)

Performs one meta-training step using Meta-SGD's per-parameter optimization approach.

public override T MetaTrain(TaskBatch<T, TInput, TOutput> taskBatch)

Parameters

taskBatch TaskBatch<T, TInput, TOutput>

A batch of tasks to meta-train on, each containing support and query sets.

Returns

T

The average loss across all tasks in the batch (evaluated on query sets).

Remarks

Meta-SGD meta-training optimizes per-parameter learning coefficients:

For each task:

1. Clone the meta-model with current meta-parameters
2. Perform K gradient descent steps using learned per-parameter optimizers
3. Evaluate the adapted model on the query set

Meta-Update:

1. Compute gradients of the query loss w.r.t. the per-parameter coefficients
2. Update learning rates: α_i = α_i - η × ∂L_query/∂α_i
3. Update momentum (if enabled): β_i = β_i - η × ∂L_query/∂β_i
4. Update direction (if enabled): d_i = d_i - η × ∂L_query/∂d_i

For Beginners: Meta-SGD learns how fast each weight should change. After seeing many tasks, it discovers that some weights need big updates (high learning rate) while others need small updates (low learning rate). This makes adaptation to new tasks much faster and more effective.

Key Difference from MAML: While MAML computes how initialization affects final loss (requires second-order gradients), Meta-SGD directly learns the optimal update magnitude for each parameter (first-order only).
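
For context, here is a sketch of how MetaTrain might sit inside an outer meta-training loop; taskSampler, NextBatch, and numMetaIterations are hypothetical placeholders, and only MetaTrain itself is part of this class's documented API.

// Hypothetical outer loop; only metaSGD.MetaTrain(taskBatch) is documented API.
for (int iteration = 0; iteration < numMetaIterations; iteration++)
{
    // Sample a batch of tasks, each with its own support and query set (placeholder helper).
    TaskBatch<double, Tensor<double>, Tensor<double>> taskBatch = taskSampler.NextBatch();

    // One meta-training step: adapt per task, then update the per-parameter optimizers.
    double averageQueryLoss = metaSGD.MetaTrain(taskBatch);

    if (iteration % 100 == 0)
    {
        Console.WriteLine($"Iteration {iteration}: average query loss = {averageQueryLoss}");
    }
}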

Exceptions

ArgumentException

Thrown when the task batch is null or empty.