Class RootMeanSquarePropagationOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Implements the Root Mean Square Propagation (RMSProp) optimization algorithm, an adaptive learning rate method.
public class RootMeanSquarePropagationOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations, typically float or double.
TInput
TOutput
Inheritance
OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → RootMeanSquarePropagationOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Remarks
RMSProp is an adaptive learning rate optimization algorithm designed to handle non-stationary objectives and accelerate convergence. It maintains a moving average of the squared gradients for each parameter and divides the learning rate by the square root of this average. This approach allows the algorithm to use a larger learning rate for parameters with small gradients and a smaller learning rate for parameters with large gradients, leading to more efficient optimization.
For Beginners: RMSProp is like a hiker who adjusts their step size differently for each direction.
Imagine a hiker exploring mountains with different terrains:
- On steep slopes (large gradients), the hiker takes small, careful steps
- On gentle slopes (small gradients), the hiker takes larger, confident steps
- The hiker remembers how steep each direction has been recently (using a moving average)
- This memory helps the hiker adjust their steps even as the terrain changes
This adaptive approach helps the algorithm find good solutions more quickly by:
- Preventing wild overshooting on steep slopes
- Making faster progress on gentle terrain
- Adjusting automatically to different parts of the solution space
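In compact form, the update the remarks describe is, for each parameter (decay corresponds to the decay-rate option and epsilon to the small stability constant):
squaredGradient = decay * squaredGradient + (1 - decay) * gradient^2
parameter = parameter - learningRate * gradient / (sqrt(squaredGradient) + epsilon)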
Constructors
RootMeanSquarePropagationOptimizer(IFullModel<T, TInput, TOutput>, RootMeanSquarePropagationOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the RootMeanSquarePropagationOptimizer<T, TInput, TOutput> class with the specified options and components.
public RootMeanSquarePropagationOptimizer(IFullModel<T, TInput, TOutput> model, RootMeanSquarePropagationOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize.
options (RootMeanSquarePropagationOptimizerOptions<T, TInput, TOutput>): The RMSProp optimization options, or null to use default options.
engine (IEngine): The execution engine, or null to use a default implementation.
Remarks
This constructor creates a new RMSProp optimizer with the specified options and components. If any parameter is null, a default implementation is used. The constructor initializes the iteration counter, squared gradient vector, and options.
For Beginners: This is the starting point for creating a new optimizer.
Think of it like preparing for a hiking expedition:
- You can provide custom settings (options) or use the default ones
- You can provide specialized components (such as a custom execution engine) or use the defaults
- It initializes everything the optimizer needs to start working
- The squared gradient starts empty because there's no history yet
- The step counter starts at zero because no steps have been taken
This constructor gets everything ready so you can start the optimization process.
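A minimal usage sketch, assuming a double-based model with Matrix<double> inputs and Vector<double> outputs (the CreateModel helper below is a placeholder, not part of the library):

```csharp
// Sketch: create an RMSProp optimizer for an existing model.
// CreateModel() stands in for however you obtain your IFullModel instance.
IFullModel<double, Matrix<double>, Vector<double>> model = CreateModel();

// Passing null options (or omitting the argument) falls back to the defaults.
var optimizer = new RootMeanSquarePropagationOptimizer<double, Matrix<double>, Vector<double>>(model);

// Alternatively, supply explicit options and/or a custom engine:
// var optimizer = new RootMeanSquarePropagationOptimizer<double, Matrix<double>, Vector<double>>(
//     model, customOptions, customEngine);
```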
Properties
SupportsGpuUpdate
Gets whether this optimizer supports GPU-accelerated parameter updates.
public override bool SupportsGpuUpdate { get; }
Property Value
- bool
Methods
Deserialize(byte[])
Reconstructs the RMSProp optimizer from a serialized byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer.
Remarks
This method overrides the base implementation to handle RMSProp-specific information during deserialization. It first deserializes the base class data, then reconstructs the iteration count, squared gradient vector, and options.
For Beginners: This method restores the optimizer from a previously saved state.
It's like restoring from a snapshot:
- First, it loads all the general optimizer information
- Then, it loads the RMSProp-specific state and settings
- It reconstructs the optimizer to the exact state it was in when saved
This allows you to:
- Continue working with an optimizer you previously saved
- Use an optimizer that someone else created and shared
- Revert to a backup if needed
Exceptions
- InvalidOperationException
Thrown when the data cannot be deserialized.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients based on the model, input data, and optimizer state.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>): The model for which the gradient is calculated.
X (TInput): The input features matrix.
y (TOutput): The target values vector.
Returns
- string
A string key that uniquely identifies this gradient calculation.
Remarks
This method overrides the base implementation to include RMSProp-specific information in the cache key. It extends the base key with information about the current learning rate, decay rate, epsilon value, and iteration count. This ensures that gradients are properly cached and retrieved even as the optimizer's state changes.
For Beginners: This method creates a unique identification tag for each gradient calculation.
Think of it like a file naming system:
- It includes information about the model and data being used
- It adds details specific to the RMSProp optimizer's current state
- This unique tag helps the optimizer avoid redundant calculations
- If the same gradient is needed again, it can be retrieved from cache instead of recalculated
This caching mechanism improves efficiency by avoiding duplicate work.
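A hypothetical sketch of how such a key could be built; the field names and key format here are illustrative, not the library's actual implementation:

```csharp
// Illustrative only: extend a base key with RMSProp-specific state so that a
// cached gradient is not reused after the optimizer's state has changed.
static string BuildRmsPropCacheKey(
    string baseKey, double learningRate, double decayRate, double epsilon, int iteration)
{
    return $"{baseKey}_RMSProp_lr:{learningRate}_decay:{decayRate}_eps:{epsilon}_iter:{iteration}";
}
```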
GetOptions()
Gets the current options for this optimizer.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current RMSProp optimization options.
Remarks
This method overrides the base implementation to return the RMSProp-specific options.
For Beginners: This method returns the current settings of the optimizer.
It's like checking what settings are currently active:
- You can see the current decay rate
- You can see the current epsilon value
- You can see all the other parameters that control the optimizer
This is useful for understanding how the optimizer is currently configured or for making a copy of the settings to modify and apply later.
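A small usage sketch; GetOptions returns the base options type, so the cast back to the RMSProp-specific options class shown here is an assumption for illustration:

```csharp
// Sketch: read back the optimizer's active configuration.
OptimizationAlgorithmOptions<double, Matrix<double>, Vector<double>> options = optimizer.GetOptions();

// Cast if you need the RMSProp-specific settings (decay rate, epsilon, ...).
var rmsPropOptions = options as RootMeanSquarePropagationOptimizerOptions<double, Matrix<double>, Vector<double>>;
```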
InitializeGpuState(int, IDirectGpuBackend)
Initializes RMSprop optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int)
backend (IDirectGpuBackend)
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the RMSProp optimization to find the best solution for the given input data.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The input data to optimize against.
Returns
- OptimizationResult<T, TInput, TOutput>
An optimization result containing the best solution found and associated metrics.
Remarks
This method implements the main RMSProp algorithm. It starts from a random solution and iteratively improves it by calculating the gradient, applying momentum, updating the solution based on the adaptive learning rates, and evaluating the new solution. The process continues until either the maximum number of iterations is reached, early stopping criteria are met, or the improvement falls below the specified tolerance.
For Beginners: This is the main search process where the algorithm looks for the best solution.
The process works like this:
- Start at a random position on the "landscape"
- Initialize the squared gradient history and step counter
- For each iteration:
- Figure out which direction is most uphill (calculate gradient)
- Apply momentum to smooth the movement
- Take a step using adaptive step sizes for each direction
- Check if the new position is better than the best found so far
- Update the adaptive parameters based on progress
- Stop when enough iterations are done, when no more improvement is happening, or when the improvement is very small
This approach efficiently finds good solutions by adapting its behavior based on the shape of the optimization landscape.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
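A usage sketch, assuming inputData has been populated with your training features and targets (the exact property names are documented on OptimizationInputData and are not shown here):

```csharp
// Sketch: run the full RMSProp search and inspect the outcome.
var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    // Populate with training features and targets; see OptimizationInputData for the property names.
};

OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);
// The result contains the best solution found and its associated metrics.
```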
Reset()
Resets the optimizer to its initial state.
public override void Reset()
Remarks
This method overrides the base implementation to reset RMSProp-specific state variables in addition to the base state. It resets the iteration counter and clears the squared gradient history, preparing the optimizer for a fresh start.
For Beginners: This method prepares the optimizer to start fresh.
It's like a hiker:
- Returning to the starting point
- Resetting their step counter to zero
- Clearing their memory of previous terrain steepness
This allows the optimizer to begin a new optimization process without being influenced by previous runs.
ReverseUpdate(Vector<T>, Vector<T>)
Reverses an RMSprop gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>): The parameters after the gradient was applied.
appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
Original parameters before the gradient update
Remarks
For RMSProp, the forward update is:
1. _squaredGradient[i] = decay * _squaredGradient[i] + (1 - decay) * gradient[i]^2
2. update = learning_rate * gradient[i] / (sqrt(_squaredGradient[i]) + epsilon)
3. params_new = params_old - update
To reverse: params_old = params_new + update
This requires access to the current squared gradient state and the applied gradients to recalculate the adaptive update that was applied.
For Beginners: This is like retracing the hiker's steps. Given where the hiker ended up (updated parameters) and the terrain steepness history (squared gradients), we can calculate the exact step size that was used and determine where the hiker started from.
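A standalone sketch of that reversal, written with plain double arrays to keep the generic numeric plumbing out of the way; the squared-gradient values passed in must be the same state the forward update used:

```csharp
// Reverse one RMSProp step: recompute the adaptive update that was applied,
// then add it back to recover the pre-update parameters.
static double[] ReverseRmsPropUpdate(
    double[] updatedParameters, double[] appliedGradients,
    double[] squaredGradient, double learningRate, double epsilon)
{
    var original = new double[updatedParameters.Length];
    for (int i = 0; i < updatedParameters.Length; i++)
    {
        double step = learningRate * appliedGradients[i]
                      / (Math.Sqrt(squaredGradient[i]) + epsilon);
        original[i] = updatedParameters[i] + step; // params_old = params_new + update
    }
    return original;
}
```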
Exceptions
- ArgumentNullException
If parameters or gradients are null
- ArgumentException
If parameter and gradient sizes do not match
Serialize()
Serializes the RMSProp optimizer to a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized optimizer.
Remarks
This method overrides the base implementation to include RMSProp-specific information in the serialization. It first serializes the base class data, then adds the iteration count, squared gradient vector, and options.
For Beginners: This method saves the current state of the optimizer so it can be restored later.
It's like taking a snapshot of the optimizer:
- First, it saves all the general optimizer information
- Then, it saves the RMSProp-specific state and settings
- It packages everything into a format that can be saved to a file or sent over a network
This allows you to:
- Save a trained optimizer to use later
- Share an optimizer with others
- Create a backup before making changes
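A round-trip sketch combining Serialize and Deserialize (the file path is arbitrary):

```csharp
// Sketch: snapshot the optimizer, then restore it later or on another machine.
byte[] snapshot = optimizer.Serialize();
File.WriteAllBytes("rmsprop-optimizer.bin", snapshot);

// ... later: restore the saved state onto an optimizer instance.
byte[] saved = File.ReadAllBytes("rmsprop-optimizer.bin");
optimizer.Deserialize(saved); // iteration count, squared-gradient history, and options are restored
```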
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the RMSProp algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector<T>): The parameters to update.
gradient (Vector<T>): The gradient vector for the parameters.
Returns
- Vector<T>
The updated parameters.
Remarks
This method implements the core RMSProp update rule. For each parameter, it:
1. Updates the running average of squared gradients.
2. Calculates an adaptive learning rate by dividing the base learning rate by the square root of the running average (plus epsilon for numerical stability).
3. Updates the parameter by subtracting the product of the adaptive learning rate and the gradient.
For Beginners: This method adjusts each parameter based on its gradient history.
For each parameter:
- It updates the memory of how steep this direction has been (squared gradient)
- It calculates a custom step size based on the steepness history
- Parameters with consistently large gradients get smaller steps
- Parameters with consistently small gradients get larger steps
- It then updates the parameter value using this custom step size
This adaptive approach helps the algorithm converge faster by giving each parameter exactly the step size it needs.
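The three steps above, sketched with plain double arrays (the actual method operates on Vector<T> using the library's numeric operations):

```csharp
// Sketch of one RMSProp parameter update.
static void RmsPropStep(
    double[] parameters, double[] gradient, double[] squaredGradient,
    double learningRate, double decay, double epsilon)
{
    for (int i = 0; i < parameters.Length; i++)
    {
        // 1. Update the running average of squared gradients.
        squaredGradient[i] = decay * squaredGradient[i]
                             + (1 - decay) * gradient[i] * gradient[i];

        // 2. Adaptive learning rate: base rate over sqrt of the average, plus epsilon for stability.
        double adaptiveRate = learningRate / (Math.Sqrt(squaredGradient[i]) + epsilon);

        // 3. Step against the gradient by the adaptive amount.
        parameters[i] -= adaptiveRate * gradient[i];
    }
}
```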
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using the RMSprop kernel.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer)
gradients (IGpuBuffer)
parameterCount (int)
backend (IDirectGpuBackend)
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>)
gradient (Vector<T>)
Returns
- IFullModel<T, TInput, TOutput>