Class AdamOptimizer<T, TInput, TOutput>
- Namespace: AiDotNet.Optimizers
- Assembly: AiDotNet.dll
Implements the Adam (Adaptive Moment Estimation) optimization algorithm for gradient-based optimization.
public class AdamOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations (e.g., float, double).
TInput: The type of input data the model consumes.
TOutput: The type of output data the model produces.
- Inheritance
- OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → AdamOptimizer<T, TInput, TOutput>
- Implements
- IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Remarks
Adam is an advanced optimization algorithm that combines ideas from RMSprop and Momentum optimization methods. It adapts the learning rates for each parameter individually and is well-suited for problems with noisy or sparse gradients.
For Beginners: Adam is like a smart personal trainer for your machine learning model. It helps your model learn efficiently by adjusting how it learns based on past experiences.
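To make this concrete, here is a minimal single-parameter sketch of the standard Adam update rule (a conceptual illustration, not the library's internal implementation):

// Conceptual sketch of one Adam step for a single parameter; not the library source.
double m = 0.0, v = 0.0;   // first and second moment estimates
int t = 0;                 // timestep used for bias correction

double AdamStep(double param, double grad,
                double lr = 0.001, double beta1 = 0.9, double beta2 = 0.999, double eps = 1e-8)
{
    t++;
    m = beta1 * m + (1 - beta1) * grad;           // momentum-style moving average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad;    // RMSprop-style moving average of squared gradients
    double mHat = m / (1 - Math.Pow(beta1, t));   // bias correction for early steps
    double vHat = v / (1 - Math.Pow(beta2, t));
    return param - lr * mHat / (Math.Sqrt(vHat) + eps);
}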
Constructors
AdamOptimizer(IFullModel<T, TInput, TOutput>?, AdamOptimizerOptions<T, TInput, TOutput>?)
Initializes a new instance of the AdamOptimizer class.
public AdamOptimizer(IFullModel<T, TInput, TOutput>? model, AdamOptimizerOptions<T, TInput, TOutput>? options = null)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize.
options (AdamOptimizerOptions<T, TInput, TOutput>): The options for configuring the Adam optimizer.
Remarks
For Beginners: This sets up the Adam optimizer with its initial configuration. You can customize various aspects of how it learns, or use default settings that work well for many problems.
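A hedged usage sketch, assuming an existing IFullModel instance named model, concrete type arguments, and option property names such as LearningRate, Beta1, and Beta2 (the exact property names on AdamOptimizerOptions are assumptions):

// Hypothetical usage: the option property names below are assumptions.
var options = new AdamOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    LearningRate = 0.001, // assumed property name
    Beta1 = 0.9,          // assumed property name
    Beta2 = 0.999         // assumed property name
};
var optimizer = new AdamOptimizer<double, Matrix<double>, Vector<double>>(model, options);

// Or accept the defaults, which work well for many problems:
var defaultOptimizer = new AdamOptimizer<double, Matrix<double>, Vector<double>>(model);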
Properties
SupportsGpuUpdate
Gets whether this optimizer supports GPU-accelerated parameter updates.
public override bool SupportsGpuUpdate { get; }
Property Value
- bool
Methods
Deserialize(byte[])
Deserializes the optimizer's state from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer state.
Remarks
For Beginners: This method rebuilds the optimizer's state from a saved snapshot. It's like restoring the optimizer's memory and settings from a backup, allowing you to continue from where you left off.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>): The symbolic model.
X (TInput): The input matrix.
y (TOutput): The target vector.
Returns
- string
A string key for gradient caching.
Remarks
For Beginners: This method creates a unique identifier for a specific optimization scenario. It's like creating a label for a particular training session, which helps in efficiently storing and retrieving calculated gradients.
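Purely illustrative: a gradient cache key typically combines the model identity, the data identity, and any settings that affect the gradient. The helper below is hypothetical and the real key format may differ.

// Hypothetical helper showing the general idea; not the library's actual key format.
string MakeGradientCacheKey(string modelId, string dataId, double learningRate)
{
    return $"adam_{modelId}_{dataId}_{learningRate}";
}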
GetOptions()
Gets the current optimizer options.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current AdamOptimizerOptions.
Remarks
For Beginners: This method lets you check what settings the optimizer is currently using. It's like asking your personal trainer about their current training plan for you.
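Because GetOptions returns the base options type, a cast is needed to reach Adam-specific settings; a brief sketch with assumed type arguments:

// Retrieve the current settings; cast to the concrete options type for Adam-specific values.
var current = optimizer.GetOptions();
if (current is AdamOptimizerOptions<double, Matrix<double>, Vector<double>> adamOptions)
{
    // Inspect or log adamOptions here.
}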
InitializeAdaptiveParameters()
Initializes the adaptive parameters used by the Adam optimizer.
protected override void InitializeAdaptiveParameters()
Remarks
For Beginners: This sets up the initial learning rate and momentum factors. These values will be adjusted as the optimizer learns more about the problem.
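Conceptually, this resets the moment estimates and timestep that Adam maintains (the field names _m, _v, and _t are referenced elsewhere on this page; the Vector<T> constructor and parameterCount below are assumptions for illustration):

// Illustrative sketch of what this initialization conceptually covers; not the library source.
_m = new Vector<T>(parameterCount);  // first moment estimates start at zero
_v = new Vector<T>(parameterCount);  // second moment estimates start at zero
_t = 0;                              // timestep used for bias correction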
InitializeGpuState(int, IDirectGpuBackend)
Initializes Adam optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int): Number of parameters.
backend (IDirectGpuBackend): GPU backend for memory allocation.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the Adam algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for optimization, including training data and targets.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process, including the best solution found.
Remarks
For Beginners: This is the main learning process. It repeatedly tries to improve the model's parameters, using the Adam algorithm to decide how to change them.
DataLoader Integration: This optimizer now uses the DataLoader batching infrastructure, which supports:
- Custom samplers (weighted, stratified, curriculum, importance, active learning)
- Reproducible shuffling via RandomSeed
- The option to drop incomplete final batches
Set these options via GradientBasedOptimizerOptions.DataSampler, ShuffleData, DropLastBatch, and RandomSeed.
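A hedged end-to-end sketch, assuming that OptimizationInputData exposes training inputs and targets via properties such as XTrain and YTrain and that the result exposes BestSolution (all of these member names are assumptions):

// Hypothetical usage: property names on the input and result objects are assumptions.
var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    XTrain = trainingInputs, // assumed property name
    YTrain = trainingTargets // assumed property name
};
var result = optimizer.Optimize(inputData);
var bestModel = result.BestSolution; // assumed property name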
Reset()
Resets the optimizer's internal state.
public override void Reset()
Remarks
For Beginners: This is like resetting the optimizer's memory. It forgets all past adjustments and starts fresh, which can be useful when you want to reuse the optimizer for a new problem.
ReverseUpdate(Vector<T>, Vector<T>)
Reverses an Adam gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>): The parameter vector after the Adam update was applied.
appliedGradients (Vector<T>): The gradient vector that was used in that update.
Returns
- Vector<T>
Remarks
This override provides accurate reversal for Adam's adaptive update rule: params_old = params_new + lr * m_hat / (sqrt(v_hat) + epsilon)
Uses the current moment estimates (_m, _v, _t) to reconstruct the exact update that was applied, accounting for bias correction and adaptive learning rates.
For Beginners: This accurately undoes an Adam update by accounting for all of Adam's special features (momentum, adaptive learning rate, bias correction).
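Per element, the reversal mirrors the forward update using the stored moments and timestep; a minimal numeric sketch, given per-element values m, v, t, lr, eps, and updatedParam:

// Per-element reversal (conceptual), using the stored moment estimates and timestep.
double mHat = m / (1 - Math.Pow(beta1, t));
double vHat = v / (1 - Math.Pow(beta2, t));
double originalParam = updatedParam + lr * mHat / (Math.Sqrt(vHat) + eps);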
Serialize()
Serializes the optimizer's state into a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized state of the optimizer.
Remarks
For Beginners: This method saves the optimizer's current state into a compact form. It's like taking a snapshot of the optimizer's memory and settings, which can be used later to recreate its exact state.
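A brief round-trip sketch showing how Serialize and Deserialize pair up (the file path and surrounding code are illustrative):

// Save the optimizer's state (moment estimates, timestep, settings) to disk.
byte[] state = optimizer.Serialize();
System.IO.File.WriteAllBytes("adam_state.bin", state);

// Later: restore the saved state and continue where training left off.
byte[] saved = System.IO.File.ReadAllBytes("adam_state.bin");
optimizer.Deserialize(saved);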
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates the adaptive parameters of the optimizer based on the current and previous optimization steps.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
currentStepData (OptimizationStepData<T, TInput, TOutput>): Data from the current optimization step.
previousStepData (OptimizationStepData<T, TInput, TOutput>): Data from the previous optimization step.
Remarks
For Beginners: This method adjusts how the optimizer learns based on its recent performance. It can change the learning rate and momentum factors to help the optimizer learn more effectively.
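A conceptual sketch of the kind of adjustment this can perform; the specific rules, member names, and factors below are assumptions, not the library's actual behavior:

// Conceptual only: the members and factors used here are assumptions.
bool improved = currentStepData.FitnessScore > previousStepData.FitnessScore; // assumed member
_currentLearningRate = improved
    ? _currentLearningRate * 1.05   // reward progress with a slightly larger step (assumed factor)
    : _currentLearningRate * 0.95;  // back off when the last step did not help (assumed factor)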
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The new options to be set.
Remarks
For Beginners: This method allows you to change the optimizer's settings mid-way. It's like adjusting the personal trainer's approach based on new instructions.
Exceptions
- ArgumentException
Thrown when the provided options are not of type AdamOptimizerOptions.
UpdateParameters(Matrix<T>, Matrix<T>)
Updates a matrix of parameters using the Adam optimization algorithm.
public override Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)
Parameters
parameters (Matrix<T>): The current parameter matrix to be updated.
gradient (Matrix<T>): The gradient matrix corresponding to the parameters.
Returns
- Matrix<T>
The updated parameter matrix.
Remarks
For Beginners: This method is similar to the UpdateParameters(Vector<T>, Vector<T>) overload, but it works on a 2D grid of parameters instead of a 1D list. It's like adjusting a whole panel of knobs, where each knob is positioned in a grid.
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the Adam optimization algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector<T>): The current parameter vector to be updated.
gradient (Vector<T>): The gradient vector corresponding to the parameters.
Returns
- Vector<T>
The updated parameter vector.
Remarks
For Beginners: This method applies the Adam algorithm to a vector of parameters. It's like adjusting multiple knobs on a machine all at once, where each knob represents a parameter. The method decides how much to turn each knob based on past adjustments and the current gradient.
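A small hedged example applying a single Adam step to a parameter vector (the values are illustrative, and the Vector<T> constructor shown is an assumption about its API):

// Apply one Adam step to a small parameter vector (illustrative values).
var parameters = new Vector<double>(new[] { 0.5, -1.2, 3.0 });
var gradient   = new Vector<double>(new[] { 0.1,  0.4, -0.2 });
Vector<double> updated = adamOptimizer.UpdateParameters(parameters, gradient);
// Each element moves against its gradient, scaled by Adam's adaptive per-parameter step size.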
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using the Adam kernel.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer): GPU buffer containing parameters to update (modified in-place).
gradients (IGpuBuffer): GPU buffer containing gradients.
parameterCount (int): Number of parameters.
backend (IDirectGpuBackend): The GPU backend to use for execution.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution using the Adam update rule.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>): The current solution being optimized.
gradient (Vector<T>): The calculated gradient for the current solution.
Returns
- IFullModel<T, TInput, TOutput>
A new solution with updated parameters.
Remarks
For Beginners: This method applies the Adam algorithm to adjust the model's parameters. It uses the current gradient and past information to decide how to change each parameter.