Class NesterovAcceleratedGradientOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Implements the Nesterov Accelerated Gradient optimization algorithm.
public class NesterovAcceleratedGradientOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations, typically float or double.
TInput: The type of the input data.
TOutput: The type of the output data.
Inheritance: OptimizerBase&lt;T, TInput, TOutput&gt; → GradientBasedOptimizerBase&lt;T, TInput, TOutput&gt; → NesterovAcceleratedGradientOptimizer&lt;T, TInput, TOutput&gt;
Implements: IGradientBasedOptimizer&lt;T, TInput, TOutput&gt;, IOptimizer&lt;T, TInput, TOutput&gt;, IModelSerializer
Remarks
The Nesterov Accelerated Gradient (NAG) is an optimization algorithm that improves upon standard gradient descent with momentum. Instead of evaluating the gradient at the current parameters, it evaluates it at a look-ahead position predicted from the current velocity. This helps dampen oscillations and improves convergence, especially in scenarios with high curvature or small but consistent gradients.
For Beginners: Imagine you're skiing down a hill. Regular gradient descent is like looking at your current position to decide where to go next. NAG is like looking ahead to where you'll be after your next move, and then deciding how to adjust your path. This "look-ahead" helps you navigate the slope more efficiently, especially around tricky turns.
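A minimal sketch of one NAG step on plain double arrays, independent of the library's internal implementation (the learning rate and momentum values below are illustrative, not the library's defaults):

using System;

static double[] NagStep(
    double[] parameters, double[] velocity,
    Func<double[], double[]> gradientAt,
    double learningRate = 0.01, double momentum = 0.9)
{
    int n = parameters.Length;

    // 1. Look ahead: where would the accumulated velocity carry us?
    var lookahead = new double[n];
    for (int i = 0; i < n; i++)
        lookahead[i] = parameters[i] + momentum * velocity[i];

    // 2. Evaluate the gradient at that future position, not the current one.
    double[] grad = gradientAt(lookahead);

    // 3. Update the velocity (mutated in place) and take the step.
    var updated = new double[n];
    for (int i = 0; i < n; i++)
    {
        velocity[i] = momentum * velocity[i] - learningRate * grad[i];
        updated[i] = parameters[i] + velocity[i];
    }
    return updated;
}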
Constructors
NesterovAcceleratedGradientOptimizer(IFullModel<T, TInput, TOutput>, NesterovAcceleratedGradientOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the NesterovAcceleratedGradientOptimizer class.
public NesterovAcceleratedGradientOptimizer(IFullModel<T, TInput, TOutput> model, NesterovAcceleratedGradientOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
model (IFullModel&lt;T, TInput, TOutput&gt;): The model to optimize.
options (NesterovAcceleratedGradientOptimizerOptions&lt;T, TInput, TOutput&gt;?): The NAG-specific optimization options; default settings are used when null.
engine (IEngine?): The optional execution engine.
Remarks
This constructor sets up the NAG optimizer with the provided options and dependencies. If no options are provided, it uses default settings.
For Beginners: This is like preparing your skis and gear before you start your descent. You're setting up all the tools and rules you'll use during your optimization journey.
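A minimal construction sketch, assuming a Matrix&lt;double&gt;/Vector&lt;double&gt; data shape and an existing IFullModel implementation named model (both placeholders for illustration):

// Default options and engine.
var optimizer = new NesterovAcceleratedGradientOptimizer<double, Matrix<double>, Vector<double>>(model);

// With explicit options; any property names set on the options type would be assumptions.
var options = new NesterovAcceleratedGradientOptimizerOptions<double, Matrix<double>, Vector<double>>();
var tuned = new NesterovAcceleratedGradientOptimizer<double, Matrix<double>, Vector<double>>(model, options);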
Properties
SupportsGpuUpdate
Gets whether this optimizer supports GPU-accelerated parameter updates.
public override bool SupportsGpuUpdate { get; }
Property Value
bool
Methods
Deserialize(byte[])
Deserializes the Nesterov Accelerated Gradient optimizer from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer state.
Remarks
This method reconstructs the optimizer's state from a byte array, including its options and parameters. It's used to restore a previously saved or transmitted optimizer state.
For Beginners: This is like using a saved snapshot to set up the skiing process exactly as it was before, placing the skier back where they were on the slope and restoring the techniques they were using.
Exceptions
- InvalidOperationException
Thrown when the optimizer options cannot be deserialized.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients in the Nesterov Accelerated Gradient optimizer.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel&lt;T, TInput, TOutput&gt;): The symbolic model for which the gradient is being calculated.
X (TInput): The input data matrix.
y (TOutput): The target vector.
Returns
- string
A string key uniquely identifying the gradient calculation scenario for caching purposes.
Remarks
This method creates a unique identifier for caching gradients, incorporating the base key and NAG-specific parameters.
For Beginners: This is like creating a special label for each unique skiing situation, considering not just the slope (model and data) but also the specific NAG skiing technique being used (initial momentum and learning rate).
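As a sketch only (the real key format is internal to the library, and the option property names here are assumptions), such a key might combine the base key with the NAG hyperparameters mentioned above:

string baseKey = base.GenerateGradientCacheKey(model, X, y);
return $"{baseKey}_NAG_{_options.InitialMomentum}_{_options.InitialLearningRate}";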
GetOptions()
Gets the current options of the Nesterov Accelerated Gradient optimizer.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current optimization algorithm options.
Remarks
This method returns the current configuration options of the optimizer.
For Beginners: This is like asking to see the current set of rules the skier is following on their descent.
InitializeAdaptiveParameters()
Initializes the adaptive parameters for the NAG optimizer.
protected override void InitializeAdaptiveParameters()
Remarks
This method sets up the initial values for the learning rate and momentum.
For Beginners: This is like setting your initial speed and direction before you start skiing. You're deciding how fast to move and how much to consider your previous direction.
InitializeGpuState(int, IDirectGpuBackend)
Initializes NAG optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int): The number of parameters to allocate GPU state for.
backend (IDirectGpuBackend): The GPU backend used for the allocation.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the Nesterov Accelerated Gradient algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData&lt;T, TInput, TOutput&gt;): The input data for the optimization process.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process.
Remarks
This method implements the main optimization loop. It uses the NAG algorithm to update the solution iteratively, aiming to find the optimal set of parameters that minimize the loss function.
For Beginners: This is your actual ski run. You start at the top of the hill (your initial solution) and then repeatedly:
1. Look ahead to where you might be after your next move.
2. Check the steepness (gradient) at that future position.
3. Adjust your speed and direction based on what you see.
4. Make your move.
You keep doing this until you reach the bottom of the hill or decide you're close enough to the best spot.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
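A hypothetical end-to-end call; the property names on OptimizationInputData below are assumptions made for illustration:

var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    XTrain = trainingInputs,   // assumed property name
    YTrain = trainingTargets   // assumed property name
};
var result = optimizer.Optimize(inputData);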
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a Nesterov Accelerated Gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector&lt;T&gt;): The parameters after the NAG update.
appliedGradients (Vector&lt;T&gt;): The gradients that were applied.
Returns
- Vector<T>
Original parameters before the update
Remarks
NAG's reverse update requires the optimizer's internal velocity state from the forward pass. This method must be called immediately after UpdateParameters while the velocity is fresh. NAG evaluates gradients at a lookahead position, but the reversal only needs the final velocity.
For Beginners: This calculates where parameters were before a NAG update. NAG uses velocity (built from lookahead gradients) to update parameters. To reverse, we just need to know what velocity was used to take the step.
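Conceptually, because the forward step adds the velocity to the parameters, the reversal is a single subtraction. A self-contained sketch on plain arrays:

static double[] ReverseNagStep(double[] updatedParameters, double[] velocity)
{
    // Forward: updated[i] = original[i] + velocity[i]
    // Reverse: original[i] = updated[i] - velocity[i]
    var original = new double[updatedParameters.Length];
    for (int i = 0; i < original.Length; i++)
        original[i] = updatedParameters[i] - velocity[i];
    return original;
}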
Serialize()
Serializes the Nesterov Accelerated Gradient optimizer to a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized state of the optimizer.
Remarks
This method converts the current state of the optimizer, including its options and parameters, into a byte array. This allows the optimizer's state to be saved or transmitted.
For Beginners: This is like taking a snapshot of the entire skiing process, including where the skier is on the slope and what techniques they're using, so you can save it or send it to someone else.
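A save-and-restore roundtrip using the documented Serialize/Deserialize pair:

byte[] snapshot = optimizer.Serialize();
// ... persist the bytes, or send them to another process ...
optimizer.Deserialize(snapshot);  // throws InvalidOperationException if the options cannot be read back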
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates the adaptive parameters of the NAG optimizer.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
currentStepData (OptimizationStepData&lt;T, TInput, TOutput&gt;): The current optimization step data.
previousStepData (OptimizationStepData&lt;T, TInput, TOutput&gt;): The previous optimization step data.
Remarks
This method adjusts the learning rate and momentum based on the improvement in fitness. It's used to fine-tune the algorithm's behavior as the optimization progresses.
For Beginners: This is like adjusting your skiing technique as you go down the hill. If you're making good progress, you might decide to go a bit faster or trust your momentum more. If you're not improving, you might slow down or be more cautious about following your previous direction.
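A sketch of what such an adaptation rule can look like; the growth and decay factors and the fitness comparison below are assumptions, not the library's actual rule:

using System;

static void Adapt(ref double learningRate, ref double momentum,
                  double currentFitness, double previousFitness)
{
    if (currentFitness < previousFitness)            // improving (minimization)
    {
        learningRate *= 1.05;                        // speed up a little
        momentum = Math.Min(momentum * 1.02, 0.99);  // trust the direction more, capped
    }
    else
    {
        learningRate *= 0.95;   // slow down
        momentum *= 0.98;       // be more cautious about past direction
    }
}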
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options with new settings.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions&lt;T, TInput, TOutput&gt;): The new options to be applied to the optimizer.
Remarks
This method ensures that only compatible option types are used with this optimizer. It updates the internal options if the provided options are of the correct type.
For Beginners: This is like changing the rules for how the skier should navigate the slope. It makes sure you're only using rules that work for this specific type of skiing technique (Nesterov Accelerated Gradient method).
Exceptions
- ArgumentException
Thrown when the provided options are not of the correct type.
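The documented behavior corresponds to a type-check-and-assign pattern along these lines (a sketch; the backing field name is an assumption):

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
{
    if (options is not NesterovAcceleratedGradientOptimizerOptions<T, TInput, TOutput> nagOptions)
        throw new ArgumentException(
            "Expected NesterovAcceleratedGradientOptimizerOptions.", nameof(options));
    _options = nagOptions;  // assumed backing field
}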
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the Nesterov Accelerated Gradient algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector&lt;T&gt;): The current parameter vector to be updated.
gradient (Vector&lt;T&gt;): The gradient vector corresponding to the parameters.
Returns
- Vector<T>
The updated parameter vector.
Remarks
NAG uses a lookahead mechanism where it evaluates the gradient at a predicted future position, then uses that gradient to update velocity. This lookahead gives NAG better convergence properties than standard momentum.
For Beginners: NAG is like looking ahead while skiing - you peek at the slope ahead before making your move, which helps you make smarter adjustments to your speed and direction.
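A low-level usage sketch pairing UpdateParameters with ReverseUpdate, since (as noted in ReverseUpdate's remarks) the reversal must follow immediately while the internal velocity from this step is still current:

Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);
// If the original values are needed back, reverse right away:
Vector<double> original = optimizer.ReverseUpdate(updated, gradient);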
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using the NAG kernel.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer): The GPU buffer containing the parameters to update.
gradients (IGpuBuffer): The GPU buffer containing the gradients to apply.
parameterCount (int): The number of parameters in the buffers.
backend (IDirectGpuBackend): The GPU backend used to launch the NAG kernel.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution using the velocity vector.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> velocity)
Parameters
currentSolution (IFullModel&lt;T, TInput, TOutput&gt;): The current solution.
velocity (Vector&lt;T&gt;): The current velocity vector.
Returns
- IFullModel<T, TInput, TOutput>
The updated solution.
Remarks
This method computes the new solution by applying the velocity to the current solution.
For Beginners: This is like actually making your move down the slope. You take your current position and adjust it based on your speed and direction (velocity).