Class MomentumOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Implements the Momentum optimization algorithm for gradient-based optimization.

public class MomentumOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of the input data (for example, a matrix of input features).

TOutput

The type of the output data (for example, a vector of target values).
Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
MomentumOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

The Momentum optimizer is an extension of gradient descent that helps accelerate the optimization process in relevant directions and dampens oscillations. It does this by adding a fraction of the update vector of the past time step to the current update vector.

For Beginners: Imagine you're rolling a ball down a hill to find the lowest point. The Momentum optimizer is like giving that ball some "memory" of its previous movements. This helps it move faster in consistent directions and resist getting stuck in small bumps or divots along the way.
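The update rule described here is simple enough to sketch directly. The snippet below is a language-agnostic illustration of the Momentum math in plain Python, not a call into the AiDotNet API; the momentum and learning-rate values are illustrative defaults:

```python
# Minimal sketch of the Momentum update rule (not the AiDotNet implementation).
# The velocity accumulates an exponentially weighted sum of past gradients.

def momentum_step(params, gradient, velocity, momentum=0.9, learning_rate=0.01):
    """One Momentum update: returns (new_params, new_velocity)."""
    new_velocity = [momentum * v + learning_rate * g
                    for v, g in zip(velocity, gradient)]
    new_params = [p - nv for p, nv in zip(params, new_velocity)]
    return new_params, new_velocity

# Two steps down the same slope: the second step is larger because the
# velocity "remembers" the first step's direction.
params, velocity = [1.0], [0.0]
params, velocity = momentum_step(params, [2.0], velocity)  # step of 0.02
params, velocity = momentum_step(params, [2.0], velocity)  # step of 0.038
```

Because consecutive gradients point the same way, the second step (0.038) is nearly twice the first (0.02): this acceleration in consistent directions is exactly what the fraction of the past update vector contributes.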

Constructors

MomentumOptimizer(IFullModel<T, TInput, TOutput>, MomentumOptimizerOptions<T, TInput, TOutput>?)

Initializes a new instance of the MomentumOptimizer class.

public MomentumOptimizer(IFullModel<T, TInput, TOutput> model, MomentumOptimizerOptions<T, TInput, TOutput>? options = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model to optimize.

options MomentumOptimizerOptions<T, TInput, TOutput>

The options for configuring the Momentum optimizer.

Remarks

This constructor sets up the optimizer with the provided options and dependencies. If no options are provided, it uses default settings.

For Beginners: This is like setting up your ball-rolling experiment. You're deciding on the properties of the ball (like its size and bounciness) and the hill (like its steepness and texture).

Properties

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

public override bool SupportsGpuUpdate { get; }

Property Value

bool

Methods

Deserialize(byte[])

Deserializes a byte array to restore the optimizer's state.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer state.

Remarks

This method takes a byte array (previously created by Serialize) and uses it to restore the optimizer's state, including its base class state and options.

For Beginners: This is like using a detailed blueprint to recreate your ball-rolling experiment exactly as it was at a certain point. It allows you to set up the experiment to match a previous state, with all the same rules and conditions.

Exceptions

InvalidOperationException

Thrown when the optimizer options cannot be deserialized.

DisposeGpuState()

Disposes GPU-allocated optimizer state.

public override void DisposeGpuState()

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients based on the model and input data.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The symbolic model being optimized.

X TInput

The input data matrix.

y TOutput

The target output vector.

Returns

string

A string representing the unique gradient cache key.

Remarks

This method creates a unique identifier for caching gradients. It combines the base gradient cache key with specific parameters of the Momentum algorithm.

For Beginners: Imagine you're leaving markers along your ball-rolling path. This method creates a unique label for each marker, combining information about the hill (the model and data) with specifics about how you're rolling the ball (initial momentum and learning rate). This helps you quickly recognize and use information from similar situations you've encountered before.

GetOptions()

Gets the current optimization algorithm options.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

The current MomentumOptimizerOptions object.

Remarks

This method returns the current options used by the Momentum optimizer.

For Beginners: This is like checking your current ball-rolling rules. It lets you see all the settings and strategies you're currently using in your experiment.

InitializeAdaptiveParameters()

Initializes adaptive parameters for the optimization process.

protected override void InitializeAdaptiveParameters()

Remarks

This method sets up the initial learning rate and momentum for the optimization process based on the options provided.

For Beginners: This is like setting the initial speed of your ball (learning rate) and how much it remembers its previous movements (momentum) before you start rolling it down the hill.

InitializeGpuState(int, IDirectGpuBackend)

Initializes Momentum optimizer state on the GPU.

public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int
backend IDirectGpuBackend

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process using the Momentum algorithm.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data for the optimization process.

Returns

OptimizationResult<T, TInput, TOutput>

The result of the optimization process.

Remarks

This method implements the main optimization loop. It iterates through the data, calculating gradients, updating the velocity (momentum), and adjusting the model parameters accordingly.

For Beginners: This is the actual process of rolling the ball down the hill. In each step, you're calculating which way the ball should roll (gradient), how fast it's moving (velocity), and where it ends up (new solution). You keep doing this until the ball finds the lowest point or you've rolled it enough times.

DataLoader Integration: This optimizer uses the DataLoader batching infrastructure, which supports:
- Custom samplers (weighted, stratified, curriculum, importance, active learning)
- Reproducible shuffling via RandomSeed
- An option to drop incomplete final batches
Set these options via GradientBasedOptimizerOptions.DataSampler, ShuffleData, DropLastBatch, and RandomSeed.
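The main loop described above can be sketched as follows. This is a self-contained illustration of the control flow only: a 1-D quadratic loss stands in for the model's real gradient computation, and the hyperparameter values are made up for the example, so this is not the library's Optimize implementation:

```python
# Sketch of the Momentum optimization loop on the 1-D loss f(x) = (x - 3)^2,
# whose gradient is 2 * (x - 3). The minimum is at x = 3.

def optimize(x0, iterations=200, momentum=0.9, learning_rate=0.05):
    x, velocity = x0, 0.0
    for _ in range(iterations):
        gradient = 2.0 * (x - 3.0)                        # which way to roll
        velocity = momentum * velocity + learning_rate * gradient
        x = x - velocity                                  # move against the gradient
    return x

solution = optimize(x0=0.0)  # converges toward the minimum at x = 3
```

The real Optimize method additionally batches the data via the DataLoader, caches gradients, and tracks the best solution seen, but the inner velocity-then-parameter update is the same shape as the two lines inside this loop.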

ReverseUpdate(Vector<T>, Vector<T>)

Reverses a momentum-based gradient update to recover original parameters.

public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

Parameters after gradient application

appliedGradients Vector<T>

The gradients that were applied (not used directly for momentum reversal)

Returns

Vector<T>

Original parameters before the gradient update

Remarks

For the Momentum optimizer, the forward update is:
1. velocity_new = momentum * velocity_old + learning_rate * gradient
2. params_new = params_old - velocity_new

To reverse: params_old = params_new + velocity_new

This requires access to the current velocity state, which is maintained by the optimizer.

For Beginners: This is like rewinding your ball-rolling experiment. Given where the ball ended up (updated parameters) and how fast it was moving (velocity), we can figure out where it started from.
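The reversal identity above is easy to verify numerically. The sketch below (plain Python, not the library API) applies one forward momentum step and then undoes it by adding the stored velocity back:

```python
# Round-trip check for the momentum reversal identity:
#   forward: velocity = momentum * velocity + lr * gradient; params -= velocity
#   reverse: params += velocity (requires the velocity used in the forward step)

momentum, lr = 0.9, 0.01
params = [1.0, -2.0]
velocity = [0.5, 0.1]
gradient = [3.0, -1.0]

# Forward update.
velocity = [momentum * v + lr * g for v, g in zip(velocity, gradient)]
updated = [p - v for p, v in zip(params, velocity)]

# Reverse update: add the velocity back to recover the original parameters.
recovered = [u + v for u, v in zip(updated, velocity)]
```

Note that the recovery uses the post-update velocity, which is why the method depends on the optimizer's internal velocity state rather than on the appliedGradients argument.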

Exceptions

ArgumentNullException

Thrown when updatedParameters or appliedGradients is null.

ArgumentException

Thrown when the parameter and gradient vector sizes do not match.

Serialize()

Serializes the optimizer's state into a byte array.

public override byte[] Serialize()

Returns

byte[]

A byte array representing the serialized state of the optimizer.

Remarks

This method converts the current state of the optimizer, including its base class state and options, into a byte array. This is useful for saving the optimizer's state or transferring it between systems.

For Beginners: Think of this as taking a snapshot of your entire ball-rolling experiment. It captures all the details of your current setup, including the ball's position, speed, and all your rules. This snapshot can be used to recreate the exact same experiment later or share it with others.

UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)

Updates the adaptive parameters of the optimizer based on the current and previous optimization steps.

protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)

Parameters

currentStepData OptimizationStepData<T, TInput, TOutput>

Data from the current optimization step.

previousStepData OptimizationStepData<T, TInput, TOutput>

Data from the previous optimization step.

Remarks

This method adjusts the learning rate and momentum based on the performance of the current step compared to the previous step. If improvement is seen, the learning rate and momentum may be increased, otherwise they may be decreased.

For Beginners: This is like adjusting how you roll the ball based on how well you're doing. If you're getting closer to the bottom of the hill, you might roll the ball a bit faster or give it more momentum. If you're not improving, you might slow down or reduce the momentum to be more careful.
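The adjustment logic can be sketched as below. The multipliers (1.05, 0.95, and so on) and the clamping bounds are illustrative placeholders, not the values AiDotNet actually uses, and the loss comparison stands in for whatever fitness measure the step data carries:

```python
# Illustrative sketch of adaptive learning-rate and momentum adjustment.
# The 1.05 / 0.95 / 1.02 / 0.98 factors and the bounds are made-up placeholders.

def update_adaptive_parameters(state, current_loss, previous_loss):
    if current_loss < previous_loss:          # improving: be bolder
        state["learning_rate"] = min(state["learning_rate"] * 1.05, 1.0)
        state["momentum"] = min(state["momentum"] * 1.02, 0.999)
    else:                                     # not improving: be more careful
        state["learning_rate"] = max(state["learning_rate"] * 0.95, 1e-6)
        state["momentum"] = max(state["momentum"] * 0.98, 0.5)
    return state

state = update_adaptive_parameters(
    {"learning_rate": 0.1, "momentum": 0.9},
    current_loss=0.8, previous_loss=1.0)      # improvement, so both increase
```

The clamping keeps the parameters in a sane range so a long streak of improvements cannot drive the learning rate or momentum to unstable values.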

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the optimizer's options with new settings.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

The new options to be applied to the optimizer.

Remarks

This method allows updating the optimizer's settings during runtime. It ensures that only compatible option types are used with this optimizer.

For Beginners: This is like changing the rules of how you're rolling the ball mid-experiment. It makes sure you're only using rules that work for this specific type of ball-rolling (Momentum optimization).

Exceptions

ArgumentException

Thrown when the provided options are not of the correct type.

UpdateParameters(Vector<T>, Vector<T>)

Updates a vector of parameters using the Momentum optimization algorithm.

public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The current parameter vector to be updated.

gradient Vector<T>

The gradient vector corresponding to the parameters.

Returns

Vector<T>

The updated parameter vector.

Remarks

This method implements the Momentum update rule by maintaining a velocity vector that accumulates a weighted average of past gradients. The velocity combines the previous velocity (scaled by momentum) with the current gradient (scaled by learning rate).

For Beginners: This method applies Momentum to adjust parameters. Like a ball rolling down a hill, it remembers its previous direction and speed (velocity) and combines it with the current slope (gradient) to determine where to go next. This helps the optimizer move faster in consistent directions and resist getting stuck in small bumps.

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using the SGD with momentum kernel.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer
gradients IGpuBuffer
parameterCount int
backend IDirectGpuBackend

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution based on the calculated velocity.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> velocity)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current model solution.

velocity Vector<T>

The current velocity vector.

Returns

IFullModel<T, TInput, TOutput>

An updated symbolic model with improved coefficients.

Remarks

This method applies the velocity to the current solution, adjusting each coefficient accordingly.

For Beginners: This is like determining the ball's new position after it has rolled. You're using the ball's speed and direction (velocity) to figure out where it ends up.