Class RootMeanSquarePropagationOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Implements the Root Mean Square Propagation (RMSProp) optimization algorithm, an adaptive learning rate method.

public class RootMeanSquarePropagationOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of the input data used to train and evaluate the model.

TOutput

The type of the output (target) data produced by the model.
Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
RootMeanSquarePropagationOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Remarks

RMSProp is an adaptive learning rate optimization algorithm designed to handle non-stationary objectives and accelerate convergence. It maintains a moving average of the squared gradients for each parameter and divides the learning rate by the square root of this average. This approach allows the algorithm to use a larger learning rate for parameters with small gradients and a smaller learning rate for parameters with large gradients, leading to more efficient optimization.

For Beginners: RMSProp is like a hiker who adjusts their step size differently for each direction.

Imagine a hiker exploring mountains with different terrains:

  • On steep slopes (large gradients), the hiker takes small, careful steps
  • On gentle slopes (small gradients), the hiker takes larger, confident steps
  • The hiker remembers how steep each direction has been recently (using a moving average)
  • This memory helps the hiker adjust their steps even as the terrain changes

This adaptive approach helps the algorithm find good solutions more quickly by:

  • Preventing wild overshooting on steep slopes
  • Making faster progress on gentle terrain
  • Adjusting automatically to different parts of the solution space
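
To make the update rule described above concrete, here is a minimal sketch of one RMSProp step in plain C#. The decay rate, epsilon, and learning rate values are illustrative defaults, not the library's internal settings.

double decayRate = 0.9;      // fraction of the old squared-gradient history to keep
double epsilon = 1e-8;       // small constant for numerical stability
double learningRate = 0.001; // base step size

double[] parameters = { 0.5, -1.2, 3.0 };
double[] gradient = { 0.1, -0.4, 0.02 };
double[] squaredGradient = new double[parameters.Length]; // moving average of squared gradients

for (int i = 0; i < parameters.Length; i++)
{
    // Update the moving average of squared gradients for this parameter.
    squaredGradient[i] = decayRate * squaredGradient[i] + (1 - decayRate) * gradient[i] * gradient[i];

    // Steep directions (large recent gradients) get smaller steps; gentle directions get larger ones.
    double adaptiveStep = learningRate / (Math.Sqrt(squaredGradient[i]) + epsilon);

    parameters[i] -= adaptiveStep * gradient[i];
}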

Constructors

RootMeanSquarePropagationOptimizer(IFullModel<T, TInput, TOutput>, RootMeanSquarePropagationOptimizerOptions<T, TInput, TOutput>?, IEngine?)

Initializes a new instance of the RootMeanSquarePropagationOptimizer<T, TInput, TOutput> class with the specified options and components.

public RootMeanSquarePropagationOptimizer(IFullModel<T, TInput, TOutput> model, RootMeanSquarePropagationOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model whose parameters will be optimized.

options RootMeanSquarePropagationOptimizerOptions<T, TInput, TOutput>

The RMSProp optimization options, or null to use default options.

engine IEngine

The computation engine to use, or null to use a default implementation.

Remarks

This constructor creates a new RMSProp optimizer with the specified options and components. If the options or engine is null, a default implementation is used. The constructor initializes the iteration counter, squared gradient vector, and options.

For Beginners: This is the starting point for creating a new optimizer.

Think of it like preparing for a hiking expedition:

  • You can provide custom settings (options) or use the default ones
  • You can provide a specialized computation engine or use the default one
  • It initializes everything the optimizer needs to start working
  • The squared gradient starts empty because there's no history yet
  • The step counter starts at zero because no steps have been taken

This constructor gets everything ready so you can start the optimization process.
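
As a usage sketch: the concrete type arguments and the way the model is obtained below are assumptions for illustration, not requirements of the API.

// 'CreateModel()' is a hypothetical helper standing in for however you obtain an
// IFullModel<double, Matrix<double>, Vector<double>> in your own code.
IFullModel<double, Matrix<double>, Vector<double>> model = CreateModel();

// Omitting (or passing null for) the options and engine falls back to the defaults described above.
var optimizer = new RootMeanSquarePropagationOptimizer<double, Matrix<double>, Vector<double>>(model);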

Properties

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

public override bool SupportsGpuUpdate { get; }

Property Value

bool

Methods

Deserialize(byte[])

Reconstructs the RMSProp optimizer from a serialized byte array.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer.

Remarks

This method overrides the base implementation to handle RMSProp-specific information during deserialization. It first deserializes the base class data, then reconstructs the iteration count, squared gradient vector, and options.

For Beginners: This method restores the optimizer from a previously saved state.

It's like restoring from a snapshot:

  • First, it loads all the general optimizer information
  • Then, it loads the RMSProp-specific state and settings
  • It reconstructs the optimizer to the exact state it was in when saved

This allows you to:

  • Continue working with an optimizer you previously saved
  • Use an optimizer that someone else created and shared
  • Revert to a backup if needed

Exceptions

InvalidOperationException

Thrown when the data cannot be deserialized.

DisposeGpuState()

Disposes GPU-allocated optimizer state.

public override void DisposeGpuState()

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients based on the model, input data, and optimizer state.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The model for which the gradient is calculated.

X TInput

The input features matrix.

y TOutput

The target values vector.

Returns

string

A string key that uniquely identifies this gradient calculation.

Remarks

This method overrides the base implementation to include RMSProp-specific information in the cache key. It extends the base key with information about the current learning rate, decay rate, epsilon value, and iteration count. This ensures that gradients are properly cached and retrieved even as the optimizer's state changes.

For Beginners: This method creates a unique identification tag for each gradient calculation.

Think of it like a file naming system:

  • It includes information about the model and data being used
  • It adds details specific to the RMSProp optimizer's current state
  • This unique tag helps the optimizer avoid redundant calculations
  • If the same gradient is needed again, it can be retrieved from cache instead of recalculated

This caching mechanism improves efficiency by avoiding duplicate work.
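
The snippet below is purely illustrative of the idea of extending a base key with optimizer state; neither the names nor the key format come from the library.

// Hypothetical example of composing a state-dependent cache key.
string baseKey = "model42_batch7"; // stands in for the key produced by the base class
string rmsPropKey = $"{baseKey}_lr0.001_decay0.9_eps1e-08_iter128";

// A gradient cached under this key is only reused while the optimizer is in the same state.
Console.WriteLine(rmsPropKey); // model42_batch7_lr0.001_decay0.9_eps1e-08_iter128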

GetOptions()

Gets the current options for this optimizer.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

The current RMSProp optimization options.

Remarks

This method overrides the base implementation to return the RMSProp-specific options.

For Beginners: This method returns the current settings of the optimizer.

It's like checking what settings are currently active:

  • You can see the current decay rate
  • You can see the current epsilon value
  • You can see all the other parameters that control the optimizer

This is useful for understanding how the optimizer is currently configured or for making a copy of the settings to modify and apply later.
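
A small sketch of inspecting the configuration. Casting to the RMSProp-specific options type is an assumption about how you would reach its settings; the individual property names are not documented on this page.

// 'optimizer' is an existing RootMeanSquarePropagationOptimizer<double, Matrix<double>, Vector<double>>.
OptimizationAlgorithmOptions<double, Matrix<double>, Vector<double>> options = optimizer.GetOptions();

if (options is RootMeanSquarePropagationOptimizerOptions<double, Matrix<double>, Vector<double>> rmsOptions)
{
    // Inspect or copy the RMSProp-specific settings (decay rate, epsilon, ...) here.
}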

InitializeGpuState(int, IDirectGpuBackend)

Initializes RMSprop optimizer state on the GPU.

public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int

The number of parameters for which GPU state is allocated.

backend IDirectGpuBackend

The GPU backend used to allocate the optimizer state.

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the RMSProp optimization to find the best solution for the given input data.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data to optimize against.

Returns

OptimizationResult<T, TInput, TOutput>

An optimization result containing the best solution found and associated metrics.

Remarks

This method implements the main RMSProp algorithm. It starts from a random solution and iteratively improves it by calculating the gradient, applying momentum, updating the solution based on the adaptive learning rates, and evaluating the new solution. The process continues until the maximum number of iterations is reached, early stopping criteria are met, or the improvement falls below the specified tolerance.

For Beginners: This is the main search process where the algorithm looks for the best solution.

The process works like this:

  1. Start at a random position on the "landscape"
  2. Initialize the squared gradient history and step counter
  3. For each iteration:
    • Figure out which direction is most uphill (calculate gradient)
    • Apply momentum to smooth the movement
    • Take a step using adaptive step sizes for each direction
    • Check if the new position is better than the best found so far
    • Update the adaptive parameters based on progress
  4. Stop when enough iterations are done, when no more improvement is happening, or when the improvement is very small

This approach efficiently finds good solutions by adapting its behavior based on the shape of the optimization landscape.

DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
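
A hedged usage sketch follows; the exact members of OptimizationInputData<T, TInput, TOutput> are not shown on this page, so how the data is attached is left schematic.

// 'BuildInputData()' is a hypothetical helper that packages your training (and validation) data.
OptimizationInputData<double, Matrix<double>, Vector<double>> inputData = BuildInputData();

OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);
// 'result' holds the best solution found plus its associated metrics.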

Reset()

Resets the optimizer to its initial state.

public override void Reset()

Remarks

This method overrides the base implementation to reset RMSProp-specific state variables in addition to the base state. It resets the iteration counter and clears the squared gradient history, preparing the optimizer for a fresh start.

For Beginners: This method prepares the optimizer to start fresh.

It's like a hiker:

  • Returning to the starting point
  • Resetting their step counter to zero
  • Clearing their memory of previous terrain steepness

This allows the optimizer to begin a new optimization process without being influenced by previous runs.

ReverseUpdate(Vector<T>, Vector<T>)

Reverses an RMSprop gradient update to recover original parameters.

public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

Parameters after gradient application

appliedGradients Vector<T>

The gradients that were applied

Returns

Vector<T>

Original parameters before the gradient update

Remarks

For RMSProp, the forward update for each parameter is:

  1. _squaredGradient[i] = decay * _squaredGradient[i] + (1 - decay) * gradient[i]^2
  2. update = learning_rate * gradient[i] / (sqrt(_squaredGradient[i]) + epsilon)
  3. params_new = params_old - update

To reverse: params_old = params_new + update

This requires access to the current squared gradient state and the applied gradients to recalculate the adaptive update that was applied.

For Beginners: This is like retracing the hiker's steps. Given where the hiker ended up (updated parameters) and the terrain steepness history (squared gradients), we can calculate the exact step size that was used and determine where the hiker started from.
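
The round trip can be sketched numerically as follows; the values are illustrative scalars, whereas the real method operates on whole vectors and on the optimizer's stored squared-gradient state.

double learningRate = 0.001, decay = 0.9, epsilon = 1e-8;

double squaredGradient = 0.0;
double gradient = 0.2;
double paramOld = 1.5;

// Forward update, as listed above.
squaredGradient = decay * squaredGradient + (1 - decay) * gradient * gradient;
double update = learningRate * gradient / (Math.Sqrt(squaredGradient) + epsilon);
double paramNew = paramOld - update;

// Reverse: with the same squared-gradient state and gradient, recompute the step and add it back.
double recovered = paramNew + learningRate * gradient / (Math.Sqrt(squaredGradient) + epsilon);
// 'recovered' equals 'paramOld' (up to floating-point rounding).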

Exceptions

ArgumentNullException

If parameters or gradients are null

ArgumentException

If parameter and gradient sizes do not match

Serialize()

Serializes the RMSProp optimizer to a byte array for storage or transmission.

public override byte[] Serialize()

Returns

byte[]

A byte array containing the serialized optimizer.

Remarks

This method overrides the base implementation to include RMSProp-specific information in the serialization. It first serializes the base class data, then adds the iteration count, squared gradient vector, and options.

For Beginners: This method saves the current state of the optimizer so it can be restored later.

It's like taking a snapshot of the optimizer:

  • First, it saves all the general optimizer information
  • Then, it saves the RMSProp-specific state and settings
  • It packages everything into a format that can be saved to a file or sent over a network

This allows you to:

  • Save a trained optimizer to use later
  • Share an optimizer with others
  • Create a backup before making changes
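
A save/restore round trip might look like this; the file name is arbitrary and standard .NET file I/O is used.

// Save the optimizer's current state.
byte[] saved = optimizer.Serialize();
System.IO.File.WriteAllBytes("rmsprop-optimizer.bin", saved);

// Later, or on another machine: restore the state into a compatible optimizer instance.
byte[] loaded = System.IO.File.ReadAllBytes("rmsprop-optimizer.bin");
optimizer.Deserialize(loaded);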

UpdateParameters(Vector<T>, Vector<T>)

Updates a vector of parameters using the RMSProp algorithm.

public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The parameters to update.

gradient Vector<T>

The gradient vector for the parameters.

Returns

Vector<T>

The updated parameters.

Remarks

This method implements the core RMSProp update rule. For each parameter, it:

  1. Updates the running average of squared gradients
  2. Calculates an adaptive learning rate by dividing the base learning rate by the square root of the running average (plus epsilon for numerical stability)
  3. Updates the parameter by subtracting the product of the adaptive learning rate and the gradient

For Beginners: This method adjusts each parameter based on its gradient history.

For each parameter:

  • It updates the memory of how steep this direction has been (squared gradient)
  • It calculates a custom step size based on the steepness history
  • Parameters with consistently large gradients get smaller steps
  • Parameters with consistently small gradients get larger steps
  • It then updates the parameter value using this custom step size

This adaptive approach helps the algorithm converge faster by giving each parameter exactly the step size it needs.
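
A direct-call sketch; constructing a Vector<T> from an array is an assumption about the vector type's API.

// One RMSProp step applied to an explicit parameter vector.
Vector<double> parameters = new Vector<double>(new[] { 0.5, -1.2, 3.0 });
Vector<double> gradient = new Vector<double>(new[] { 0.1, -0.4, 0.02 });

Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);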

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using the RMSprop kernel.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

The GPU buffer containing the parameters to update.

gradients IGpuBuffer

The GPU buffer containing the gradients to apply.

parameterCount int

The number of parameters to update.

backend IDirectGpuBackend

The GPU backend used to execute the RMSProp update kernel.

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Creates an updated solution by applying the RMSProp parameter update to the current solution using the supplied gradient.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current candidate solution to update.

gradient Vector<T>

The gradient used to update the solution's parameters.

Returns

IFullModel<T, TInput, TOutput>

The updated solution.