Class NesterovAcceleratedGradientOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Implements the Nesterov Accelerated Gradient optimization algorithm.
public class NesterovAcceleratedGradientOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations, typically float or double.
TInput: The type of the input data.
TOutput: The type of the output data.
Inheritance: OptimizerBase&lt;T, TInput, TOutput&gt; → GradientBasedOptimizerBase&lt;T, TInput, TOutput&gt; → NesterovAcceleratedGradientOptimizer&lt;T, TInput, TOutput&gt;
Implements: IGradientBasedOptimizer&lt;T, TInput, TOutput&gt;, IOptimizer&lt;T, TInput, TOutput&gt;, IModelSerializer
Remarks
The Nesterov Accelerated Gradient (NAG) is an optimization algorithm that improves upon standard gradient descent with momentum. Instead of evaluating the gradient at the current parameters, it evaluates it at a look-ahead position predicted from the current velocity. This helps dampen oscillations and improves convergence, especially in scenarios with high curvature or small but consistent gradients.
For Beginners: Imagine you're skiing down a hill. Regular gradient descent is like looking at your current position to decide where to go next. NAG is like looking ahead to where you'll be after your next move, and then deciding how to adjust your path. This "look-ahead" helps you navigate the slope more efficiently, especially around tricky turns.
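A minimal sketch of one NAG step on plain double arrays, independent of the library's internal implementation (the learning rate and momentum values below are illustrative, not the library's defaults):

using System;

static double[] NagStep(
    double[] parameters, double[] velocity,
    Func<double[], double[]> gradientAt,
    double learningRate = 0.01, double momentum = 0.9)
{
    int n = parameters.Length;

    // 1. Look ahead: where would the accumulated velocity carry us?
    var lookahead = new double[n];
    for (int i = 0; i < n; i++)
        lookahead[i] = parameters[i] + momentum * velocity[i];

    // 2. Evaluate the gradient at that future position, not the current one.
    double[] grad = gradientAt(lookahead);

    // 3. Update the velocity (mutated in place) and take the step.
    var updated = new double[n];
    for (int i = 0; i < n; i++)
    {
        velocity[i] = momentum * velocity[i] - learningRate * grad[i];
        updated[i] = parameters[i] + velocity[i];
    }
    return updated;
}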
Constructors
NesterovAcceleratedGradientOptimizer(IFullModel<T, TInput, TOutput>, NesterovAcceleratedGradientOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the NesterovAcceleratedGradientOptimizer class.
public NesterovAcceleratedGradientOptimizer(IFullModel<T, TInput, TOutput> model, NesterovAcceleratedGradientOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
model (IFullModel&lt;T, TInput, TOutput&gt;): The model to optimize.
options (NesterovAcceleratedGradientOptimizerOptions&lt;T, TInput, TOutput&gt;?): The NAG-specific optimization options; default settings are used when null.
engine (IEngine?): The optional execution engine.
Remarks
This constructor sets up the NAG optimizer with the provided options and dependencies. If no options are provided, it uses default settings.
For Beginners: This is like preparing your skis and gear before you start your descent. You're setting up all the tools and rules you'll use during your optimization journey.
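A minimal construction sketch, assuming a Matrix&lt;double&gt;/Vector&lt;double&gt; data shape and an existing IFullModel implementation named model (both placeholders for illustration):

// Default options and engine.
var optimizer = new NesterovAcceleratedGradientOptimizer<double, Matrix<double>, Vector<double>>(model);

// With explicit options; any property names set on the options type would be assumptions.
var options = new NesterovAcceleratedGradientOptimizerOptions<double, Matrix<double>, Vector<double>>();
var tuned = new NesterovAcceleratedGradientOptimizer<double, Matrix<double>, Vector<double>>(model, options);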
Properties
SupportsGpuUpdate
Gets whether this optimizer supports GPU-accelerated parameter updates.
public override bool SupportsGpuUpdate { get; }
Property Value
bool
Methods
Deserialize(byte[])
Deserializes the Nesterov Accelerated Gradient optimizer from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer state.
Remarks
This method reconstructs the optimizer's state from a byte array, including its options and parameters. It's used to restore a previously saved or transmitted optimizer state.
For Beginners: This is like using a saved snapshot to set up the skiing process exactly as it was before, placing the skier back where they were on the slope and restoring the techniques they were using.
Exceptions
- InvalidOperationException
Thrown when the optimizer options cannot be deserialized.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients in the Nesterov Accelerated Gradient optimizer.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel&lt;T, TInput, TOutput&gt;): The symbolic model for which the gradient is being calculated.
X (TInput): The input data matrix.
y (TOutput): The target vector.
Returns
- string
A string key uniquely identifying the gradient calculation scenario for caching purposes.
Remarks
This method creates a unique identifier for caching gradients, incorporating the base key and NAG-specific parameters.
For Beginners: This is like creating a special label for each unique skiing situation, considering not just the slope (model and data) but also the specific NAG skiing technique being used (initial momentum and learning rate).
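As a sketch only (the real key format is internal to the library, and the option property names here are assumptions), such a key might combine the base key with the NAG hyperparameters mentioned above:

string baseKey = base.GenerateGradientCacheKey(model, X, y);
return $"{baseKey}_NAG_{_options.InitialMomentum}_{_options.InitialLearningRate}";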
GetOptions()
Gets the current options of the Nesterov Accelerated Gradient optimizer.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current optimization algorithm options.
Remarks
This method returns the current configuration options of the optimizer.
For Beginners: This is like asking to see the current set of rules the skier is following on their descent.
InitializeAdaptiveParameters()
Initializes the adaptive parameters for the NAG optimizer.
protected override void InitializeAdaptiveParameters()
Remarks
This method sets up the initial values for the learning rate and momentum.
For Beginners: This is like setting your initial speed and direction before you start skiing. You're deciding how fast to move and how much to consider your previous direction.
InitializeGpuState(int, IDirectGpuBackend)
Initializes NAG optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int): The number of parameters to allocate GPU state for.
backend (IDirectGpuBackend): The GPU backend used for the allocation.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the Nesterov Accelerated Gradient algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData&lt;T, TInput, TOutput&gt;): The input data for the optimization process.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process.
Remarks
This method implements the main optimization loop. It uses the NAG algorithm to update the solution iteratively, aiming to find the optimal set of parameters that minimize the loss function.
For Beginners: This is your actual ski run. You start at the top of the hill (your initial solution) and then repeatedly:
1. Look ahead to where you might be after your next move.
2. Check the steepness (gradient) at that future position.
3. Adjust your speed and direction based on what you see.
4. Make your move.
You keep doing this until you reach the bottom of the hill or decide you're close enough to the best spot.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
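A hypothetical end-to-end call; the property names on OptimizationInputData below are assumptions made for illustration:

var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    XTrain = trainingInputs,   // assumed property name
    YTrain = trainingTargets   // assumed property name
};
var result = optimizer.Optimize(inputData);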
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a Nesterov Accelerated Gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector&lt;T&gt;): The parameters after the NAG update.
appliedGradients (Vector&lt;T&gt;): The gradients that were applied.
Returns
- Vector<T>
Original parameters before the update
Remarks
NAG's reverse update requires the optimizer's internal velocity state from the forward pass. This method must be called immediately after UpdateParameters while the velocity is fresh. NAG evaluates gradients at a lookahead position, but the reversal only needs the final velocity.
For Beginners: This calculates where parameters were before a NAG update. NAG uses velocity (built from lookahead gradients) to update parameters. To reverse, we just need to know what velocity was used to take the step.
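Conceptually, because the forward step adds the velocity to the parameters, the reversal is a single subtraction. A self-contained sketch on plain arrays:

static double[] ReverseNagStep(double[] updatedParameters, double[] velocity)
{
    // Forward: updated[i] = original[i] + velocity[i]
    // Reverse: original[i] = updated[i] - velocity[i]
    var original = new double[updatedParameters.Length];
    for (int i = 0; i < original.Length; i++)
        original[i] = updatedParameters[i] - velocity[i];
    return original;
}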
Serialize()
Serializes the Nesterov Accelerated Gradient optimizer to a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized state of the optimizer.
Remarks
This method converts the current state of the optimizer, including its options and parameters, into a byte array. This allows the optimizer's state to be saved or transmitted.
For Beginners: This is like taking a snapshot of the entire skiing process, including where the skier is on the slope and what techniques they're using, so you can save it or send it to someone else.
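A save-and-restore roundtrip using the documented Serialize/Deserialize pair:

byte[] snapshot = optimizer.Serialize();
// ... persist the bytes, or send them to another process ...
optimizer.Deserialize(snapshot);  // throws InvalidOperationException if the options cannot be read back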
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates the adaptive parameters of the NAG optimizer.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
currentStepData (OptimizationStepData&lt;T, TInput, TOutput&gt;): The current optimization step data.
previousStepData (OptimizationStepData&lt;T, TInput, TOutput&gt;): The previous optimization step data.
Remarks
This method adjusts the learning rate and momentum based on the improvement in fitness. It's used to fine-tune the algorithm's behavior as the optimization progresses.
For Beginners: This is like adjusting your skiing technique as you go down the hill. If you're making good progress, you might decide to go a bit faster or trust your momentum more. If you're not improving, you might slow down or be more cautious about following your previous direction.
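A sketch of what such an adaptation rule can look like; the growth and decay factors and the fitness comparison below are assumptions, not the library's actual rule:

using System;

static void Adapt(ref double learningRate, ref double momentum,
                  double currentFitness, double previousFitness)
{
    if (currentFitness < previousFitness)            // improving (minimization)
    {
        learningRate *= 1.05;                        // speed up a little
        momentum = Math.Min(momentum * 1.02, 0.99);  // trust the direction more, capped
    }
    else
    {
        learningRate *= 0.95;   // slow down
        momentum *= 0.98;       // be more cautious about past direction
    }
}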
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options with new settings.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions&lt;T, TInput, TOutput&gt;): The new options to be applied to the optimizer.
Remarks
This method ensures that only compatible option types are used with this optimizer. It updates the internal options if the provided options are of the correct type.
For Beginners: This is like changing the rules for how the skier should navigate the slope. It makes sure you're only using rules that work for this specific type of skiing technique (Nesterov Accelerated Gradient method).
Exceptions
- ArgumentException
Thrown when the provided options are not of the correct type.
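The documented behavior corresponds to a type-check-and-assign pattern along these lines (a sketch; the backing field name is an assumption):

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
{
    if (options is not NesterovAcceleratedGradientOptimizerOptions<T, TInput, TOutput> nagOptions)
        throw new ArgumentException(
            "Expected NesterovAcceleratedGradientOptimizerOptions.", nameof(options));
    _options = nagOptions;  // assumed backing field
}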
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the Nesterov Accelerated Gradient algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector&lt;T&gt;): The current parameter vector to be updated.
gradient (Vector&lt;T&gt;): The gradient vector corresponding to the parameters.
Returns
- Vector<T>
The updated parameter vector.
Remarks
NAG uses a lookahead mechanism where it evaluates the gradient at a predicted future position, then uses that gradient to update velocity. This lookahead gives NAG better convergence properties than standard momentum.
For Beginners: NAG is like looking ahead while skiing - you peek at the slope ahead before making your move, which helps you make smarter adjustments to your speed and direction.
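A low-level usage sketch pairing UpdateParameters with ReverseUpdate, since (as noted in ReverseUpdate's remarks) the reversal must follow immediately while the internal velocity from this step is still current:

Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);
// If the original values are needed back, reverse right away:
Vector<double> original = optimizer.ReverseUpdate(updated, gradient);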
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using the NAG kernel.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer): The GPU buffer containing the parameters to update.
gradients (IGpuBuffer): The GPU buffer containing the gradients to apply.
parameterCount (int): The number of parameters in the buffers.
backend (IDirectGpuBackend): The GPU backend used to launch the NAG kernel.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution using the velocity vector.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> velocity)
Parameters
currentSolution (IFullModel&lt;T, TInput, TOutput&gt;): The current solution.
velocity (Vector&lt;T&gt;): The current velocity vector.
Returns
- IFullModel<T, TInput, TOutput>
The updated solution.
Remarks
This method computes the new solution by applying the velocity to the current solution.
For Beginners: This is like actually making your move down the slope. You take your current position and adjust it based on your speed and direction (velocity).