Class MomentumOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Implements the Momentum optimization algorithm for gradient-based optimization.

public class MomentumOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of the input data (for example, a matrix of input features).

TOutput

The type of the output data (for example, a vector of target values).
Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
MomentumOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

The Momentum optimizer is an extension of gradient descent that helps accelerate the optimization process in relevant directions and dampens oscillations. It does this by adding a fraction of the update vector of the past time step to the current update vector.

For Beginners: Imagine you're rolling a ball down a hill to find the lowest point. The Momentum optimizer is like giving that ball some "memory" of its previous movements. This helps it move faster in consistent directions and resist getting stuck in small bumps or divots along the way.
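The update rule described here is simple enough to sketch directly. The snippet below is a language-agnostic illustration of the Momentum math in plain Python, not a call into the AiDotNet API; the momentum and learning-rate values are illustrative defaults:

```python
# Minimal sketch of the Momentum update rule (not the AiDotNet implementation).
# The velocity accumulates an exponentially weighted sum of past gradients.

def momentum_step(params, gradient, velocity, momentum=0.9, learning_rate=0.01):
    """One Momentum update: returns (new_params, new_velocity)."""
    new_velocity = [momentum * v + learning_rate * g
                    for v, g in zip(velocity, gradient)]
    new_params = [p - nv for p, nv in zip(params, new_velocity)]
    return new_params, new_velocity

# Two steps down the same slope: the second step is larger because the
# velocity "remembers" the first step's direction.
params, velocity = [1.0], [0.0]
params, velocity = momentum_step(params, [2.0], velocity)  # step of 0.02
params, velocity = momentum_step(params, [2.0], velocity)  # step of 0.038
```

Because consecutive gradients point the same way, the second step (0.038) is nearly twice the first (0.02): this acceleration in consistent directions is exactly what the fraction of the past update vector contributes.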

Constructors

MomentumOptimizer(IFullModel<T, TInput, TOutput>, MomentumOptimizerOptions<T, TInput, TOutput>?)

Initializes a new instance of the MomentumOptimizer class.

public MomentumOptimizer(IFullModel<T, TInput, TOutput> model, MomentumOptimizerOptions<T, TInput, TOutput>? options = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model to optimize.

options MomentumOptimizerOptions<T, TInput, TOutput>

The options for configuring the Momentum optimizer.

Remarks

This constructor sets up the optimizer with the provided options and dependencies. If no options are provided, it uses default settings.

For Beginners: This is like setting up your ball-rolling experiment. You're deciding on the properties of the ball (like its size and bounciness) and the hill (like its steepness and texture).

Properties

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

public override bool SupportsGpuUpdate { get; }

Property Value

bool

Methods

Deserialize(byte[])

Deserializes a byte array to restore the optimizer's state.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer state.

Remarks

This method takes a byte array (previously created by Serialize) and uses it to restore the optimizer's state, including its base class state and options.

For Beginners: This is like using a detailed blueprint to recreate your ball-rolling experiment exactly as it was at a certain point. It allows you to set up the experiment to match a previous state, with all the same rules and conditions.

Exceptions

InvalidOperationException

Thrown when the optimizer options cannot be deserialized.

DisposeGpuState()

Disposes GPU-allocated optimizer state.

public override void DisposeGpuState()

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients based on the model and input data.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The symbolic model being optimized.

X TInput

The input data matrix.

y TOutput

The target output vector.

Returns

string

A string representing the unique gradient cache key.

Remarks

This method creates a unique identifier for caching gradients. It combines the base gradient cache key with specific parameters of the Momentum algorithm.

For Beginners: Imagine you're leaving markers along your ball-rolling path. This method creates a unique label for each marker, combining information about the hill (the model and data) with specifics about how you're rolling the ball (initial momentum and learning rate). This helps you quickly recognize and use information from similar situations you've encountered before.

GetOptions()

Gets the current optimization algorithm options.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

The current MomentumOptimizerOptions object.

Remarks

This method returns the current options used by the Momentum optimizer.

For Beginners: This is like checking your current ball-rolling rules. It lets you see all the settings and strategies you're currently using in your experiment.

InitializeAdaptiveParameters()

Initializes adaptive parameters for the optimization process.

protected override void InitializeAdaptiveParameters()

Remarks

This method sets up the initial learning rate and momentum for the optimization process based on the options provided.

For Beginners: This is like setting the initial speed of your ball (learning rate) and how much it remembers its previous movements (momentum) before you start rolling it down the hill.

InitializeGpuState(int, IDirectGpuBackend)

Initializes Momentum optimizer state on the GPU.

public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int
backend IDirectGpuBackend

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process using the Momentum algorithm.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data for the optimization process.

Returns

OptimizationResult<T, TInput, TOutput>

The result of the optimization process.

Remarks

This method implements the main optimization loop. It iterates through the data, calculating gradients, updating the velocity (momentum), and adjusting the model parameters accordingly.

For Beginners: This is the actual process of rolling the ball down the hill. In each step, you're calculating which way the ball should roll (gradient), how fast it's moving (velocity), and where it ends up (new solution). You keep doing this until the ball finds the lowest point or you've rolled it enough times.

DataLoader Integration: This optimizer uses the DataLoader batching infrastructure, which supports:
- Custom samplers (weighted, stratified, curriculum, importance, active learning)
- Reproducible shuffling via RandomSeed
- An option to drop incomplete final batches
Set these options via GradientBasedOptimizerOptions.DataSampler, ShuffleData, DropLastBatch, and RandomSeed.
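The main loop described above can be sketched as follows. This is a self-contained illustration of the control flow only: a 1-D quadratic loss stands in for the model's real gradient computation, and the hyperparameter values are made up for the example, so this is not the library's Optimize implementation:

```python
# Sketch of the Momentum optimization loop on the 1-D loss f(x) = (x - 3)^2,
# whose gradient is 2 * (x - 3). The minimum is at x = 3.

def optimize(x0, iterations=200, momentum=0.9, learning_rate=0.05):
    x, velocity = x0, 0.0
    for _ in range(iterations):
        gradient = 2.0 * (x - 3.0)                        # which way to roll
        velocity = momentum * velocity + learning_rate * gradient
        x = x - velocity                                  # move against the gradient
    return x

solution = optimize(x0=0.0)  # converges toward the minimum at x = 3
```

The real Optimize method additionally batches the data via the DataLoader, caches gradients, and tracks the best solution seen, but the inner velocity-then-parameter update is the same shape as the two lines inside this loop.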

ReverseUpdate(Vector<T>, Vector<T>)

Reverses a momentum-based gradient update to recover original parameters.

public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

Parameters after gradient application

appliedGradients Vector<T>

The gradients that were applied (not used directly for momentum reversal)

Returns

Vector<T>

Original parameters before the gradient update

Remarks

For the Momentum optimizer, the forward update is:
1. velocity_new = momentum * velocity_old + learning_rate * gradient
2. params_new = params_old - velocity_new

To reverse: params_old = params_new + velocity_new

This requires access to the current velocity state, which is maintained by the optimizer.

For Beginners: This is like rewinding your ball-rolling experiment. Given where the ball ended up (updated parameters) and how fast it was moving (velocity), we can figure out where it started from.
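The reversal identity above is easy to verify numerically. The sketch below (plain Python, not the library API) applies one forward momentum step and then undoes it by adding the stored velocity back:

```python
# Round-trip check for the momentum reversal identity:
#   forward: velocity = momentum * velocity + lr * gradient; params -= velocity
#   reverse: params += velocity (requires the velocity used in the forward step)

momentum, lr = 0.9, 0.01
params = [1.0, -2.0]
velocity = [0.5, 0.1]
gradient = [3.0, -1.0]

# Forward update.
velocity = [momentum * v + lr * g for v, g in zip(velocity, gradient)]
updated = [p - v for p, v in zip(params, velocity)]

# Reverse update: add the velocity back to recover the original parameters.
recovered = [u + v for u, v in zip(updated, velocity)]
```

Note that the recovery uses the post-update velocity, which is why the method depends on the optimizer's internal velocity state rather than on the appliedGradients argument.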

Exceptions

ArgumentNullException

Thrown when updatedParameters or appliedGradients is null.

ArgumentException

Thrown when the parameter and gradient vector sizes do not match.

Serialize()

Serializes the optimizer's state into a byte array.

public override byte[] Serialize()

Returns

byte[]

A byte array representing the serialized state of the optimizer.

Remarks

This method converts the current state of the optimizer, including its base class state and options, into a byte array. This is useful for saving the optimizer's state or transferring it between systems.

For Beginners: Think of this as taking a snapshot of your entire ball-rolling experiment. It captures all the details of your current setup, including the ball's position, speed, and all your rules. This snapshot can be used to recreate the exact same experiment later or share it with others.

UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)

Updates the adaptive parameters of the optimizer based on the current and previous optimization steps.

protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)

Parameters

currentStepData OptimizationStepData<T, TInput, TOutput>

Data from the current optimization step.

previousStepData OptimizationStepData<T, TInput, TOutput>

Data from the previous optimization step.

Remarks

This method adjusts the learning rate and momentum based on the performance of the current step compared to the previous step. If improvement is seen, the learning rate and momentum may be increased, otherwise they may be decreased.

For Beginners: This is like adjusting how you roll the ball based on how well you're doing. If you're getting closer to the bottom of the hill, you might roll the ball a bit faster or give it more momentum. If you're not improving, you might slow down or reduce the momentum to be more careful.
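The adjustment logic can be sketched as below. The multipliers (1.05, 0.95, and so on) and the clamping bounds are illustrative placeholders, not the values AiDotNet actually uses, and the loss comparison stands in for whatever fitness measure the step data carries:

```python
# Illustrative sketch of adaptive learning-rate and momentum adjustment.
# The 1.05 / 0.95 / 1.02 / 0.98 factors and the bounds are made-up placeholders.

def update_adaptive_parameters(state, current_loss, previous_loss):
    if current_loss < previous_loss:          # improving: be bolder
        state["learning_rate"] = min(state["learning_rate"] * 1.05, 1.0)
        state["momentum"] = min(state["momentum"] * 1.02, 0.999)
    else:                                     # not improving: be more careful
        state["learning_rate"] = max(state["learning_rate"] * 0.95, 1e-6)
        state["momentum"] = max(state["momentum"] * 0.98, 0.5)
    return state

state = update_adaptive_parameters(
    {"learning_rate": 0.1, "momentum": 0.9},
    current_loss=0.8, previous_loss=1.0)      # improvement, so both increase
```

The clamping keeps the parameters in a sane range so a long streak of improvements cannot drive the learning rate or momentum to unstable values.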

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the optimizer's options with new settings.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

The new options to be applied to the optimizer.

Remarks

This method allows updating the optimizer's settings during runtime. It ensures that only compatible option types are used with this optimizer.

For Beginners: This is like changing the rules of how you're rolling the ball mid-experiment. It makes sure you're only using rules that work for this specific type of ball-rolling (Momentum optimization).

Exceptions

ArgumentException

Thrown when the provided options are not of the correct type.

UpdateParameters(Vector<T>, Vector<T>)

Updates a vector of parameters using the Momentum optimization algorithm.

public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The current parameter vector to be updated.

gradient Vector<T>

The gradient vector corresponding to the parameters.

Returns

Vector<T>

The updated parameter vector.

Remarks

This method implements the Momentum update rule by maintaining a velocity vector that accumulates a weighted average of past gradients. The velocity combines the previous velocity (scaled by momentum) with the current gradient (scaled by learning rate).

For Beginners: This method applies Momentum to adjust parameters. Like a ball rolling down a hill, it remembers its previous direction and speed (velocity) and combines it with the current slope (gradient) to determine where to go next. This helps the optimizer move faster in consistent directions and resist getting stuck in small bumps.

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using the SGD with momentum kernel.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer
gradients IGpuBuffer
parameterCount int
backend IDirectGpuBackend

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution based on the calculated velocity.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> velocity)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current model solution.

velocity Vector<T>

The current velocity vector.

Returns

IFullModel<T, TInput, TOutput>

An updated symbolic model with improved coefficients.

Remarks

This method applies the velocity to the current solution, adjusting each coefficient accordingly.

For Beginners: This is like determining the ball's new position after it has rolled. You're using the ball's speed and direction (velocity) to figure out where it ends up.