Class RootMeanSquarePropagationOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Implements the Root Mean Square Propagation (RMSProp) optimization algorithm, an adaptive learning rate method.
public class RootMeanSquarePropagationOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations, typically float or double.
TInput
TOutput
Inheritance
OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → RootMeanSquarePropagationOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Remarks
RMSProp is an adaptive learning rate optimization algorithm designed to handle non-stationary objectives and accelerate convergence. It maintains a moving average of the squared gradients for each parameter and divides the learning rate by the square root of this average. This approach allows the algorithm to use a larger learning rate for parameters with small gradients and a smaller learning rate for parameters with large gradients, leading to more efficient optimization.
For Beginners: RMSProp is like a hiker who adjusts their step size differently for each direction.
Imagine a hiker exploring mountains with different terrains:
- On steep slopes (large gradients), the hiker takes small, careful steps
- On gentle slopes (small gradients), the hiker takes larger, confident steps
- The hiker remembers how steep each direction has been recently (using a moving average)
- This memory helps the hiker adjust their steps even as the terrain changes
This adaptive approach helps the algorithm find good solutions more quickly by:
- Preventing wild overshooting on steep slopes
- Making faster progress on gentle terrain
- Adjusting automatically to different parts of the solution space
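In compact form, the update the remarks describe is, for each parameter (decay corresponds to the decay-rate option and epsilon to the small stability constant):
squaredGradient = decay * squaredGradient + (1 - decay) * gradient^2
parameter = parameter - learningRate * gradient / (sqrt(squaredGradient) + epsilon)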
Constructors
RootMeanSquarePropagationOptimizer(IFullModel<T, TInput, TOutput>, RootMeanSquarePropagationOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the RootMeanSquarePropagationOptimizer<T, TInput, TOutput> class with the specified options and components.
public RootMeanSquarePropagationOptimizer(IFullModel<T, TInput, TOutput> model, RootMeanSquarePropagationOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize.
options (RootMeanSquarePropagationOptimizerOptions<T, TInput, TOutput>): The RMSProp optimization options, or null to use default options.
engine (IEngine): The execution engine, or null to use a default implementation.
Remarks
This constructor creates a new RMSProp optimizer with the specified options and components. If any parameter is null, a default implementation is used. The constructor initializes the iteration counter, squared gradient vector, and options.
For Beginners: This is the starting point for creating a new optimizer.
Think of it like preparing for a hiking expedition:
- You can provide custom settings (options) or use the default ones
- You can provide specialized components (such as a custom execution engine) or use the defaults
- It initializes everything the optimizer needs to start working
- The squared gradient starts empty because there's no history yet
- The step counter starts at zero because no steps have been taken
This constructor gets everything ready so you can start the optimization process.
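A minimal usage sketch, assuming a double-based model with Matrix<double> inputs and Vector<double> outputs (the CreateModel helper below is a placeholder, not part of the library):

```csharp
// Sketch: create an RMSProp optimizer for an existing model.
// CreateModel() stands in for however you obtain your IFullModel instance.
IFullModel<double, Matrix<double>, Vector<double>> model = CreateModel();

// Passing null options (or omitting the argument) falls back to the defaults.
var optimizer = new RootMeanSquarePropagationOptimizer<double, Matrix<double>, Vector<double>>(model);

// Alternatively, supply explicit options and/or a custom engine:
// var optimizer = new RootMeanSquarePropagationOptimizer<double, Matrix<double>, Vector<double>>(
//     model, customOptions, customEngine);
```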
Properties
SupportsGpuUpdate
Gets whether this optimizer supports GPU-accelerated parameter updates.
public override bool SupportsGpuUpdate { get; }
Property Value
- bool
Methods
Deserialize(byte[])
Reconstructs the RMSProp optimizer from a serialized byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer.
Remarks
This method overrides the base implementation to handle RMSProp-specific information during deserialization. It first deserializes the base class data, then reconstructs the iteration count, squared gradient vector, and options.
For Beginners: This method restores the optimizer from a previously saved state.
It's like restoring from a snapshot:
- First, it loads all the general optimizer information
- Then, it loads the RMSProp-specific state and settings
- It reconstructs the optimizer to the exact state it was in when saved
This allows you to:
- Continue working with an optimizer you previously saved
- Use an optimizer that someone else created and shared
- Revert to a backup if needed
Exceptions
- InvalidOperationException
Thrown when the data cannot be deserialized.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients based on the model, input data, and optimizer state.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>): The model for which the gradient is calculated.
X (TInput): The input features matrix.
y (TOutput): The target values vector.
Returns
- string
A string key that uniquely identifies this gradient calculation.
Remarks
This method overrides the base implementation to include RMSProp-specific information in the cache key. It extends the base key with information about the current learning rate, decay rate, epsilon value, and iteration count. This ensures that gradients are properly cached and retrieved even as the optimizer's state changes.
For Beginners: This method creates a unique identification tag for each gradient calculation.
Think of it like a file naming system:
- It includes information about the model and data being used
- It adds details specific to the RMSProp optimizer's current state
- This unique tag helps the optimizer avoid redundant calculations
- If the same gradient is needed again, it can be retrieved from cache instead of recalculated
This caching mechanism improves efficiency by avoiding duplicate work.
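A hypothetical sketch of how such a key could be built; the field names and key format here are illustrative, not the library's actual implementation:

```csharp
// Illustrative only: extend a base key with RMSProp-specific state so that a
// cached gradient is not reused after the optimizer's state has changed.
static string BuildRmsPropCacheKey(
    string baseKey, double learningRate, double decayRate, double epsilon, int iteration)
{
    return $"{baseKey}_RMSProp_lr:{learningRate}_decay:{decayRate}_eps:{epsilon}_iter:{iteration}";
}
```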
GetOptions()
Gets the current options for this optimizer.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current RMSProp optimization options.
Remarks
This method overrides the base implementation to return the RMSProp-specific options.
For Beginners: This method returns the current settings of the optimizer.
It's like checking what settings are currently active:
- You can see the current decay rate
- You can see the current epsilon value
- You can see all the other parameters that control the optimizer
This is useful for understanding how the optimizer is currently configured or for making a copy of the settings to modify and apply later.
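A small usage sketch; GetOptions returns the base options type, so the cast back to the RMSProp-specific options class shown here is an assumption for illustration:

```csharp
// Sketch: read back the optimizer's active configuration.
OptimizationAlgorithmOptions<double, Matrix<double>, Vector<double>> options = optimizer.GetOptions();

// Cast if you need the RMSProp-specific settings (decay rate, epsilon, ...).
var rmsPropOptions = options as RootMeanSquarePropagationOptimizerOptions<double, Matrix<double>, Vector<double>>;
```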
InitializeGpuState(int, IDirectGpuBackend)
Initializes RMSprop optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int)
backend (IDirectGpuBackend)
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the RMSProp optimization to find the best solution for the given input data.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The input data to optimize against.
Returns
- OptimizationResult<T, TInput, TOutput>
An optimization result containing the best solution found and associated metrics.
Remarks
This method implements the main RMSProp algorithm. It starts from a random solution and iteratively improves it by calculating the gradient, applying momentum, updating the solution based on the adaptive learning rates, and evaluating the new solution. The process continues until either the maximum number of iterations is reached, early stopping criteria are met, or the improvement falls below the specified tolerance.
For Beginners: This is the main search process where the algorithm looks for the best solution.
The process works like this:
- Start at a random position on the "landscape"
- Initialize the squared gradient history and step counter
- For each iteration:
- Figure out which direction is most uphill (calculate gradient)
- Apply momentum to smooth the movement
- Take a step using adaptive step sizes for each direction
- Check if the new position is better than the best found so far
- Update the adaptive parameters based on progress
- Stop when enough iterations are done, when no more improvement is happening, or when the improvement is very small
This approach efficiently finds good solutions by adapting its behavior based on the shape of the optimization landscape.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
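A usage sketch, assuming inputData has been populated with your training features and targets (the exact property names are documented on OptimizationInputData and are not shown here):

```csharp
// Sketch: run the full RMSProp search and inspect the outcome.
var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    // Populate with training features and targets; see OptimizationInputData for the property names.
};

OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);
// The result contains the best solution found and its associated metrics.
```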
Reset()
Resets the optimizer to its initial state.
public override void Reset()
Remarks
This method overrides the base implementation to reset RMSProp-specific state variables in addition to the base state. It resets the iteration counter and clears the squared gradient history, preparing the optimizer for a fresh start.
For Beginners: This method prepares the optimizer to start fresh.
It's like a hiker:
- Returning to the starting point
- Resetting their step counter to zero
- Clearing their memory of previous terrain steepness
This allows the optimizer to begin a new optimization process without being influenced by previous runs.
ReverseUpdate(Vector<T>, Vector<T>)
Reverses an RMSprop gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>): The parameters after the gradient was applied.
appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
Original parameters before the gradient update
Remarks
For RMSProp, the forward update is:
1. _squaredGradient[i] = decay * _squaredGradient[i] + (1 - decay) * gradient[i]^2
2. update = learning_rate * gradient[i] / (sqrt(_squaredGradient[i]) + epsilon)
3. params_new = params_old - update
To reverse: params_old = params_new + update
This requires access to the current squared gradient state and the applied gradients to recalculate the adaptive update that was applied.
For Beginners: This is like retracing the hiker's steps. Given where the hiker ended up (updated parameters) and the terrain steepness history (squared gradients), we can calculate the exact step size that was used and determine where the hiker started from.
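A standalone sketch of that reversal, written with plain double arrays to keep the generic numeric plumbing out of the way; the squared-gradient values passed in must be the same state the forward update used:

```csharp
// Reverse one RMSProp step: recompute the adaptive update that was applied,
// then add it back to recover the pre-update parameters.
static double[] ReverseRmsPropUpdate(
    double[] updatedParameters, double[] appliedGradients,
    double[] squaredGradient, double learningRate, double epsilon)
{
    var original = new double[updatedParameters.Length];
    for (int i = 0; i < updatedParameters.Length; i++)
    {
        double step = learningRate * appliedGradients[i]
                      / (Math.Sqrt(squaredGradient[i]) + epsilon);
        original[i] = updatedParameters[i] + step; // params_old = params_new + update
    }
    return original;
}
```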
Exceptions
- ArgumentNullException
If parameters or gradients are null
- ArgumentException
If parameter and gradient sizes do not match
Serialize()
Serializes the RMSProp optimizer to a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized optimizer.
Remarks
This method overrides the base implementation to include RMSProp-specific information in the serialization. It first serializes the base class data, then adds the iteration count, squared gradient vector, and options.
For Beginners: This method saves the current state of the optimizer so it can be restored later.
It's like taking a snapshot of the optimizer:
- First, it saves all the general optimizer information
- Then, it saves the RMSProp-specific state and settings
- It packages everything into a format that can be saved to a file or sent over a network
This allows you to:
- Save a trained optimizer to use later
- Share an optimizer with others
- Create a backup before making changes
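A round-trip sketch combining Serialize and Deserialize (the file path is arbitrary):

```csharp
// Sketch: snapshot the optimizer, then restore it later or on another machine.
byte[] snapshot = optimizer.Serialize();
File.WriteAllBytes("rmsprop-optimizer.bin", snapshot);

// ... later: restore the saved state onto an optimizer instance.
byte[] saved = File.ReadAllBytes("rmsprop-optimizer.bin");
optimizer.Deserialize(saved); // iteration count, squared-gradient history, and options are restored
```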
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the RMSProp algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector<T>): The parameters to update.
gradient (Vector<T>): The gradient vector for the parameters.
Returns
- Vector<T>
The updated parameters.
Remarks
This method implements the core RMSProp update rule. For each parameter, it:
1. Updates the running average of squared gradients.
2. Calculates an adaptive learning rate by dividing the base learning rate by the square root of the running average (plus epsilon for numerical stability).
3. Updates the parameter by subtracting the product of the adaptive learning rate and the gradient.
For Beginners: This method adjusts each parameter based on its gradient history.
For each parameter:
- It updates the memory of how steep this direction has been (squared gradient)
- It calculates a custom step size based on the steepness history
- Parameters with consistently large gradients get smaller steps
- Parameters with consistently small gradients get larger steps
- It then updates the parameter value using this custom step size
This adaptive approach helps the algorithm converge faster by giving each parameter exactly the step size it needs.
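The three steps above, sketched with plain double arrays (the actual method operates on Vector<T> using the library's numeric operations):

```csharp
// Sketch of one RMSProp parameter update.
static void RmsPropStep(
    double[] parameters, double[] gradient, double[] squaredGradient,
    double learningRate, double decay, double epsilon)
{
    for (int i = 0; i < parameters.Length; i++)
    {
        // 1. Update the running average of squared gradients.
        squaredGradient[i] = decay * squaredGradient[i]
                             + (1 - decay) * gradient[i] * gradient[i];

        // 2. Adaptive learning rate: base rate over sqrt of the average, plus epsilon for stability.
        double adaptiveRate = learningRate / (Math.Sqrt(squaredGradient[i]) + epsilon);

        // 3. Step against the gradient by the adaptive amount.
        parameters[i] -= adaptiveRate * gradient[i];
    }
}
```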
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using the RMSprop kernel.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer)
gradients (IGpuBuffer)
parameterCount (int)
backend (IDirectGpuBackend)
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>)
gradient (Vector<T>)
Returns
- IFullModel<T, TInput, TOutput>