Class NadamOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Implements the Nesterov-accelerated Adaptive Moment Estimation (Nadam) optimization algorithm.
public class NadamOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T — The numeric type used for calculations, typically float or double.
TInput — The type of the input data (for example, a feature matrix).
TOutput — The type of the output data (for example, a target vector).
Inheritance
OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → NadamOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Remarks
Nadam combines the ideas of Adam (adaptive learning rates) and Nesterov accelerated gradient (NAG). It adapts the learning rate of each parameter individually and incorporates momentum using Nesterov's lookahead method.
For Beginners: Imagine you're rolling a smart ball down a hill. This ball can adjust its speed for different parts of the hill (adaptive learning rates), and it can look ahead to anticipate slopes (Nesterov's method). This combination helps it find the lowest point more efficiently.
Constructors
NadamOptimizer(IFullModel<T, TInput, TOutput>, NadamOptimizerOptions<T, TInput, TOutput>?)
Initializes a new instance of the NadamOptimizer class.
public NadamOptimizer(IFullModel<T, TInput, TOutput> model, NadamOptimizerOptions<T, TInput, TOutput>? options = null)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize.
options (NadamOptimizerOptions<T, TInput, TOutput>?): The Nadam-specific optimization options. If null, default settings are used.
Remarks
This constructor sets up the Nadam optimizer with the provided options and dependencies. If no options are provided, it uses default settings.
For Beginners: This is like preparing your smart ball for the hill-rolling experiment. You're setting up its initial properties and deciding how it will adapt during its journey.
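A minimal construction sketch. The option property names used here (LearningRate, Beta1, Beta2) are illustrative guesses at common Nadam hyperparameters, not confirmed members of NadamOptimizerOptions; check the options class for the actual API. Matrix<double> and Vector<double> stand in for whatever TInput/TOutput your model uses.

```csharp
using AiDotNet.Optimizers;
// Matrix<double>/Vector<double> are the library's types; their namespace is assumed here.

// `model` is assumed to implement IFullModel<double, Matrix<double>, Vector<double>>.
var options = new NadamOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    LearningRate = 0.002, // illustrative property name and value
    Beta1 = 0.9,          // assumed: decay rate of the first moment (momentum)
    Beta2 = 0.999         // assumed: decay rate of the second moment (adaptive scaling)
};

var optimizer = new NadamOptimizer<double, Matrix<double>, Vector<double>>(model, options);

// Omitting the options argument (or passing null) falls back to default settings:
var defaultOptimizer = new NadamOptimizer<double, Matrix<double>, Vector<double>>(model);
```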
Methods
Deserialize(byte[])
Deserializes a byte array to restore the optimizer's state.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer state.
Remarks
This method takes a byte array (previously created by Serialize) and uses it to restore the optimizer's state, including its base class state, options, and time step.
For Beginners: This is like using a detailed blueprint to recreate your smart ball rolling experiment exactly as it was at a certain point. It allows you to set up the experiment to match a previous state, with all the same rules and conditions.
Exceptions
- InvalidOperationException
Thrown when the optimizer options cannot be deserialized.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>): The current symbolic model.
X (TInput): The input feature matrix.
y (TOutput): The target vector.
Returns
- string
A string that uniquely identifies the current gradient calculation scenario.
Remarks
This method creates a unique identifier for caching gradients based on the current model, input data, and Nadam-specific parameters. This helps in efficiently reusing previously calculated gradients when possible.
For Beginners: This is like creating a special label for each unique situation your smart ball encounters. It helps the ball remember and quickly recall how it should move in similar situations, making the whole process more efficient.
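As an illustration only (the real key format is an implementation detail not specified by this reference), such a key is typically composed from the base key plus the optimizer-specific state, so caches from differently configured optimizers are never mixed. The field names below (_options, _timeStep) are hypothetical.

```csharp
// Hypothetical sketch of how a subclass might compose such a key.
protected override string GenerateGradientCacheKey(
    IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
{
    string baseKey = base.GenerateGradientCacheKey(model, X, y);
    // Append Nadam-specific parameters so the key is unique per configuration.
    return $"{baseKey}_Nadam_{_options.Beta1}_{_options.Beta2}_{_timeStep}";
}
```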
GetOptions()
Gets the current optimization algorithm options.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current NadamOptimizerOptions object.
Remarks
This method returns the current options used by the Nadam optimizer.
For Beginners: This is like checking your current smart ball rolling rules. It lets you see all the settings and strategies you're currently using in your experiment.
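A short usage sketch; the cast reflects the documented behavior that the returned object is the optimizer's NadamOptimizerOptions instance, even though the declared return type is the base options class.

```csharp
var options = optimizer.GetOptions();
if (options is NadamOptimizerOptions<double, Matrix<double>, Vector<double>> nadamOptions)
{
    // Inspect Nadam-specific settings here.
}
```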
InitializeAdaptiveParameters()
Initializes the adaptive parameters for the Nadam optimizer.
protected override void InitializeAdaptiveParameters()
Remarks
This method sets up the initial learning rate and resets the time step counter.
For Beginners: This is like setting the initial speed of your smart ball and resetting its internal clock before it starts rolling.
InitializeGpuState(int, IDirectGpuBackend)
Initializes Nadam optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int): The number of parameters.
backend (IDirectGpuBackend): The GPU backend used for memory allocation.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the Nadam algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for the optimization process.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process.
Remarks
This method implements the main optimization loop. It iterates through the data, calculating gradients, updating the momentum and adaptive learning rates, and adjusting the model parameters accordingly.
For Beginners: This is the actual process of rolling your smart ball down the hill. In each step, you're calculating which way the ball should roll (gradient), how fast it's moving (momentum), and how it should adapt its speed (adaptive learning rates). You keep doing this until the ball finds the lowest point or you've rolled it enough times.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
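A minimal sketch of invoking the optimization loop. The property names on OptimizationInputData (XTrain, YTrain) and on the result (BestSolution) are illustrative assumptions, not confirmed by this reference.

```csharp
// trainingFeatures / trainingTargets are your prepared data.
var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    XTrain = trainingFeatures, // assumed property name
    YTrain = trainingTargets   // assumed property name
};

OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);

var bestModel = result.BestSolution; // assumed property name
```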
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a Nadam gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>): The parameters after the Nadam update.
appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
The original parameters before the update.
Remarks
Nadam's reverse update requires the optimizer's internal state (_m, _v, _t) from the forward pass. This method must be called immediately after UpdateParameters while the state is fresh. It recalculates the Nesterov-accelerated adaptive update that was applied.
For Beginners: This calculates where parameters were before a Nadam update. Nadam combines lookahead (Nesterov) with adaptive learning (Adam), so reversing requires both the momentum history (_m) and variance history (_v) to reconstruct the lookahead step.
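A sketch of the intended call pattern. Per the remarks, ReverseUpdate is only valid immediately after UpdateParameters, while the internal state still matches that step.

```csharp
// Forward step: apply one Nadam update.
Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);

// Reverse step: recover the pre-update parameters. Must happen before the
// internal moment estimates (_m, _v) and time step (_t) are advanced again.
Vector<double> original = optimizer.ReverseUpdate(updated, gradient);
// `original` should match `parameters` up to floating-point error.
```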
Serialize()
Serializes the optimizer's state into a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized state of the optimizer.
Remarks
This method converts the current state of the optimizer, including its base class state, options, and time step, into a byte array. This is useful for saving the optimizer's state or transferring it between systems.
For Beginners: Think of this as taking a snapshot of your entire smart ball rolling experiment. It captures all the details of your current setup, including the ball's position, speed, and all your rules. This snapshot can be used to recreate the exact same experiment later or share it with others.
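A round-trip sketch using the documented Serialize/Deserialize pair:

```csharp
// Snapshot the optimizer's full state (base state, options, time step).
byte[] snapshot = optimizer.Serialize();

// Later, or on another machine: restore the state into a compatible instance.
// Deserialize throws InvalidOperationException if the embedded options
// cannot be deserialized.
var restored = new NadamOptimizer<double, Matrix<double>, Vector<double>>(model);
restored.Deserialize(snapshot);
```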
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates the adaptive parameters of the optimizer based on the current and previous optimization steps.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
currentStepData (OptimizationStepData<T, TInput, TOutput>): Data from the current optimization step.
previousStepData (OptimizationStepData<T, TInput, TOutput>): Data from the previous optimization step.
Remarks
This method adjusts the learning rate based on how the current step performed relative to the previous step: if the result improved, the learning rate may be increased; otherwise it may be decreased.
For Beginners: This is like adjusting how fast your ball rolls based on whether it's getting closer to the bottom of the hill. If it's improving, you might let it roll a bit faster. If not, you might slow it down to be more careful.
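A hedged sketch of the adjustment logic described above. The member names (FitnessScore, the _currentLearningRate field) and the factor values are illustrative assumptions, not the library's actual members.

```csharp
// Illustrative only: nudge the learning rate up after an improving step,
// down after a non-improving one.
bool improved = currentStepData.FitnessScore > previousStepData.FitnessScore; // assumed member
_currentLearningRate *= improved ? 1.05 : 0.95;                              // assumed field and factors
```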
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options with new settings.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The new options to be applied to the optimizer.
Remarks
This method ensures that only compatible option types are used with this optimizer. It updates the internal options if the provided options are of the correct type.
For Beginners: This is like changing the rules of how your smart ball rolls mid-experiment. It makes sure you're only using rules that work for this specific type of smart ball (Nadam optimization).
Exceptions
- ArgumentException
Thrown when the provided options are not of the correct type.
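The type check described above typically follows this pattern (a sketch, not the verbatim implementation; _options is a hypothetical backing field):

```csharp
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
{
    if (options is NadamOptimizerOptions<T, TInput, TOutput> nadamOptions)
    {
        _options = nadamOptions; // hypothetical backing field
    }
    else
    {
        throw new ArgumentException(
            "Options must be of type NadamOptimizerOptions.", nameof(options));
    }
}
```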
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the Nadam optimization algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector<T>): The current parameter vector to be updated.
gradient (Vector<T>): The gradient vector corresponding to the parameters.
Returns
- Vector<T>
The updated parameter vector.
Remarks
Nadam combines Adam's adaptive learning rates with Nesterov's accelerated gradient, providing the benefits of both techniques: adaptive per-parameter learning rates and lookahead momentum.
For Beginners: Nadam is like a smart ball that not only adapts its speed for different parts of the hill (Adam) but also looks ahead to anticipate slopes (Nesterov).
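For reference, the standard Nadam update rule (Dozat, 2016), the form this method is expected to compute; β₁ and β₂ are the moment decay rates, η the learning rate, g_t the gradient, and ε a small constant for numerical stability:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
\hat{m}_t = m_t / (1-\beta_1^t), \qquad \hat{v}_t = v_t / (1-\beta_2^t)
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}
               \left( \beta_1 \hat{m}_t + \frac{(1-\beta_1)\, g_t}{1-\beta_1^t} \right)
```

The parenthesized term is what distinguishes Nadam from Adam: the bias-corrected momentum is blended with the current gradient, effectively applying the momentum one step ahead (the Nesterov lookahead).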
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on GPU using Nadam optimization.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer): The GPU buffer holding the parameters to update.
gradients (IGpuBuffer): The GPU buffer holding the gradients.
parameterCount (int): The number of parameters.
backend (IDirectGpuBackend): The GPU backend used for the update.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution based on the calculated gradient using the Nadam algorithm.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>): The current model solution.
gradient (Vector<T>): The calculated gradient.
Returns
- IFullModel<T, TInput, TOutput>
An updated symbolic model with improved coefficients.
Remarks
This method applies the Nadam update rule to adjust the model parameters. It uses both momentum and adaptive learning rates, incorporating Nesterov's accelerated gradient.
For Beginners: This is like adjusting the ball's position based on its current speed, the slope it's on, and its ability to look ahead. It's a complex calculation that helps the ball move more efficiently towards the lowest point.
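A sketch of how this override typically composes the pieces documented above. The model accessor names (GetParameters, WithParameters) are illustrative assumptions about the IFullModel API.

```csharp
protected override IFullModel<T, TInput, TOutput> UpdateSolution(
    IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
{
    // Pull the current coefficients, apply one Nadam step via the public
    // UpdateParameters method, and return a model carrying the result.
    Vector<T> parameters = currentSolution.GetParameters();   // assumed accessor
    Vector<T> updated = UpdateParameters(parameters, gradient);
    return currentSolution.WithParameters(updated);           // assumed accessor
}
```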