Class AMSGradOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Implements the AMSGrad optimization algorithm, an improved version of the Adam optimizer.
public class AMSGradOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
- T: The numeric type used for calculations (e.g., float, double).
- TInput: The type of the input data.
- TOutput: The type of the output data.
Inheritance
- OptimizerBase<T, TInput, TOutput>
- GradientBasedOptimizerBase<T, TInput, TOutput>
- AMSGradOptimizer<T, TInput, TOutput>
Implements
- IGradientBasedOptimizer<T, TInput, TOutput>
- IOptimizer<T, TInput, TOutput>
- IModelSerializer
Remarks
AMSGrad is an adaptive learning rate optimization algorithm that addresses some of the convergence issues in Adam. It maintains the maximum of past squared gradients, which keeps the effective step sizes from increasing between iterations.
For Beginners: AMSGrad is like a smart assistant that helps adjust the learning process. It remembers past information to make better decisions about how quickly to learn in different parts of the problem.
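In the standard formulation (Reddi et al., 2018) this corresponds to the per-parameter update below; whether the library additionally applies bias correction is not specified on this page, so treat this as the reference form rather than the exact internal computation:

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{v}_t &= \max(\hat{v}_{t-1},\, v_t) \\
\theta_{t+1} &= \theta_t - \frac{\alpha\, m_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```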
Constructors
AMSGradOptimizer(IFullModel<T, TInput, TOutput>, AMSGradOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the AMSGradOptimizer class.
public AMSGradOptimizer(IFullModel<T, TInput, TOutput> model, AMSGradOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
- model (IFullModel<T, TInput, TOutput>)
- options (AMSGradOptimizerOptions<T, TInput, TOutput>?): The options for configuring the AMSGrad optimizer.
- engine (IEngine?)
Remarks
For Beginners: This sets up the AMSGrad optimizer with its initial configuration. You can customize various aspects of how it learns, or use default settings.
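A minimal construction sketch is shown below. Only the constructor signature above is taken from this page; the concrete generic arguments and the `model` variable are illustrative assumptions.

```csharp
using AiDotNet.Optimizers;

// `model` is assumed to be an existing IFullModel<double, Matrix<double>, Vector<double>>;
// the concrete generic arguments chosen here are purely illustrative.
var optimizer = new AMSGradOptimizer<double, Matrix<double>, Vector<double>>(model);

// To customize behaviour, pass an AMSGradOptimizerOptions instance as the second
// argument (its configurable properties are documented on its own page); omitting
// it, as above, uses the default settings.
```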
Methods
Deserialize(byte[])
Restores the optimizer's state from a byte array previously created by the Serialize method.
public override void Deserialize(byte[] data)
Parameters
- data (byte[]): The byte array containing the serialized optimizer state.
Remarks
For Beginners: This method rebuilds the AMSGrad optimizer's state from a saved snapshot. It's like restoring a machine to a previous configuration using a backup. This allows you to continue optimization from where you left off or use a shared optimizer state.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients based on the current state of the optimizer and input data.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
- model (IFullModel<T, TInput, TOutput>): The symbolic model being optimized.
- X (TInput): The input matrix.
- y (TOutput): The target vector.
Returns
- string
A string that uniquely identifies the current optimization state for gradient caching.
Remarks
For Beginners: This method creates a unique label for the current state of the AMSGrad optimization. It's used to efficiently store and retrieve calculated gradients, which helps speed up the optimization process. The key includes specific AMSGrad parameters to ensure it's unique to this optimizer's current state.
GetOptions()
Retrieves the current options of the optimizer.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current optimization algorithm options.
Remarks
For Beginners: This method lets you check what settings the AMSGrad optimizer is currently using. It's like looking at the current settings on a machine.
InitializeAdaptiveParameters()
Initializes the adaptive parameters used by the AMSGrad optimizer.
protected override void InitializeAdaptiveParameters()
Remarks
For Beginners: This resets the learning rate and time step to their starting values, preparing the optimizer for a new optimization run.
InitializeGpuState(int, IDirectGpuBackend)
Initializes optimizer state on the GPU for a given parameter count.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
- parameterCount (int): Number of parameters to initialize state for.
- backend (IDirectGpuBackend): The GPU backend to use for memory allocation.
Remarks
For Beginners: The base implementation does nothing; optimizers that maintain per-parameter state (like AMSGrad's momentum and variance accumulators) override this to allocate that state on the GPU.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the AMSGrad algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
- inputData (OptimizationInputData<T, TInput, TOutput>): The input data for optimization, including training data and targets.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process, including the best solution found.
Remarks
For Beginners: This is the main optimization process. It repeatedly updates the solution using the AMSGrad steps until it reaches the best possible solution or hits a stopping condition.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
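A hedged sketch of a typical call follows. How OptimizationInputData is populated and which members OptimizationResult exposes are not documented on this page, so those parts are only indicated in comments.

```csharp
// `inputData` is an existing OptimizationInputData<double, Matrix<double>, Vector<double>>
// carrying the training data and targets (its construction is documented elsewhere).
OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);

// `result` describes the outcome, including the best solution found;
// see OptimizationResult for its actual members.
```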
ReverseUpdate(Vector<T>, Vector<T>)
Reverses an AMSGrad gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
- updatedParameters (Vector<T>): Parameters after the AMSGrad update.
- appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
Original parameters before the update
Remarks
AMSGrad's reverse update requires the optimizer's internal state (_m, _v, _vHat, _t) from the forward pass. This method must be called immediately after UpdateParameters while the state is fresh. It uses the maximum of past second moments to recalculate the update.
For Beginners: This calculates where parameters were before an AMSGrad update. AMSGrad remembers the largest variance seen for each parameter, which prevents taking too-large steps. To reverse the update, we need this maximum variance history (_vHat) along with momentum (_m).
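A sketch of the intended call order, assuming `parameters` and `gradient` are existing Vector<double> instances; the reverse call must come right after the forward update so the internal _m, _v, _vHat and _t state still matches.

```csharp
// Forward step: applies the AMSGrad update and refreshes the internal state.
Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);

// Reverse step: must run immediately, while that state is still fresh.
Vector<double> recovered = optimizer.ReverseUpdate(updated, gradient);

// `recovered` should match the original `parameters` up to floating-point error.
```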
Serialize()
Converts the current state of the optimizer into a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized state of the optimizer.
Remarks
For Beginners: This method saves all the important information about the AMSGrad optimizer's current state. It's like taking a snapshot of the optimizer that can be used to recreate its exact state later. This is useful for saving progress or sharing the optimizer's state with others.
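A minimal save/restore round trip; only Serialize and Deserialize from this page are used, and `restoredOptimizer` is assumed to be an AMSGradOptimizer constructed the same way as the original.

```csharp
// Take a snapshot of the optimizer's current state.
byte[] snapshot = optimizer.Serialize();

// Later (or in another process), restore that state into a compatible optimizer
// and continue optimizing from where you left off.
restoredOptimizer.Deserialize(snapshot);
```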
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates the adaptive parameters of the optimizer based on the current and previous optimization steps.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
- currentStepData (OptimizationStepData<T, TInput, TOutput>): Data from the current optimization step.
- previousStepData (OptimizationStepData<T, TInput, TOutput>): Data from the previous optimization step.
Remarks
For Beginners: This method adjusts the learning rate based on how well the optimization is progressing. If the solution is improving, it might increase the learning rate to learn faster. If not, it might decrease the rate to be more careful.
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options with new settings.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
- options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The new options to be applied to the optimizer.
Remarks
For Beginners: This method allows you to change the settings of the AMSGrad optimizer while it's running. It's like adjusting the controls on a machine that's already operating. If you provide the wrong type of settings, it will stop and let you know there's an error.
Exceptions
- ArgumentException
Thrown when the provided options are not of the correct type.
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the AMSGrad optimization algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
- parameters (Vector<T>): The current parameter vector to be updated.
- gradient (Vector<T>): The gradient vector corresponding to the parameters.
Returns
- Vector<T>
The updated parameter vector.
Remarks
AMSGrad is a variant of Adam that uses the maximum of past second moments to ensure convergence. This prevents the learning rate from becoming too large and helps with non-convex optimization.
For Beginners: AMSGrad is like Adam but keeps track of the largest variance it has seen so far, preventing the optimizer from taking overly aggressive steps.
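For reference, an illustrative per-element AMSGrad step over plain arrays is sketched below. It mirrors the standard formulation (bias correction omitted); the library itself operates on Vector<T> with generic numerics, so this is not its internal code, and the hyperparameter defaults shown are only the conventional ones.

```csharp
using System;

// Illustrative AMSGrad step; m, v and vHat are the optimizer's running state
// (all zeros before the first step) and must persist between calls.
static void AmsGradStep(
    double[] parameters, double[] gradient,
    double[] m, double[] v, double[] vHat,
    double learningRate = 0.001, double beta1 = 0.9, double beta2 = 0.999, double epsilon = 1e-8)
{
    for (int i = 0; i < parameters.Length; i++)
    {
        m[i] = beta1 * m[i] + (1 - beta1) * gradient[i];               // first moment (momentum)
        v[i] = beta2 * v[i] + (1 - beta2) * gradient[i] * gradient[i]; // second moment (variance)
        vHat[i] = Math.Max(vHat[i], v[i]);                             // AMSGrad: the denominator never shrinks
        parameters[i] -= learningRate * m[i] / (Math.Sqrt(vHat[i]) + epsilon);
    }
}
```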
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
GPU-accelerated parameter update for AMSGrad optimizer.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
- parameters (IGpuBuffer)
- gradients (IGpuBuffer)
- parameterCount (int)
- backend (IDirectGpuBackend)
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution using the AMSGrad update rule.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
- currentSolution (IFullModel<T, TInput, TOutput>): The current solution being optimized.
- gradient (Vector<T>): The gradient of the current solution.
Returns
- IFullModel<T, TInput, TOutput>
A new solution with updated coefficients.
Remarks
For Beginners: This method applies the AMSGrad formula to update each parameter of the solution. It uses the current and past gradients to determine how much to change each parameter.