Class NadamOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Implements the Nesterov-accelerated Adaptive Moment Estimation (Nadam) optimization algorithm.
public class NadamOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T — The numeric type used for calculations, typically float or double.
TInput — The type of the input data (for example, a feature matrix).
TOutput — The type of the output data (for example, a target vector).
Inheritance
OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → NadamOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Remarks
Nadam combines the ideas of Adam (adaptive learning rates) and Nesterov accelerated gradient (NAG). It adapts the learning rate of each parameter individually and incorporates momentum using Nesterov's lookahead method.
For Beginners: Imagine you're rolling a smart ball down a hill. This ball can adjust its speed for different parts of the hill (adaptive learning rates), and it can look ahead to anticipate slopes (Nesterov's method). This combination helps it find the lowest point more efficiently.
Constructors
NadamOptimizer(IFullModel<T, TInput, TOutput>, NadamOptimizerOptions<T, TInput, TOutput>?)
Initializes a new instance of the NadamOptimizer class.
public NadamOptimizer(IFullModel<T, TInput, TOutput> model, NadamOptimizerOptions<T, TInput, TOutput>? options = null)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize.
options (NadamOptimizerOptions<T, TInput, TOutput>?): The Nadam-specific optimization options. If null, default settings are used.
Remarks
This constructor sets up the Nadam optimizer with the provided options and dependencies. If no options are provided, it uses default settings.
For Beginners: This is like preparing your smart ball for the hill-rolling experiment. You're setting up its initial properties and deciding how it will adapt during its journey.
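A minimal construction sketch. The option property names used here (LearningRate, Beta1, Beta2) are illustrative guesses at common Nadam hyperparameters, not confirmed members of NadamOptimizerOptions; check the options class for the actual API. Matrix<double> and Vector<double> stand in for whatever TInput/TOutput your model uses.

```csharp
using AiDotNet.Optimizers;
// Matrix<double>/Vector<double> are the library's types; their namespace is assumed here.

// `model` is assumed to implement IFullModel<double, Matrix<double>, Vector<double>>.
var options = new NadamOptimizerOptions<double, Matrix<double>, Vector<double>>
{
    LearningRate = 0.002, // illustrative property name and value
    Beta1 = 0.9,          // assumed: decay rate of the first moment (momentum)
    Beta2 = 0.999         // assumed: decay rate of the second moment (adaptive scaling)
};

var optimizer = new NadamOptimizer<double, Matrix<double>, Vector<double>>(model, options);

// Omitting the options argument (or passing null) falls back to default settings:
var defaultOptimizer = new NadamOptimizer<double, Matrix<double>, Vector<double>>(model);
```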
Methods
Deserialize(byte[])
Deserializes a byte array to restore the optimizer's state.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer state.
Remarks
This method takes a byte array (previously created by Serialize) and uses it to restore the optimizer's state, including its base class state, options, and time step.
For Beginners: This is like using a detailed blueprint to recreate your smart ball rolling experiment exactly as it was at a certain point. It allows you to set up the experiment to match a previous state, with all the same rules and conditions.
Exceptions
- InvalidOperationException
Thrown when the optimizer options cannot be deserialized.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>): The current symbolic model.
X (TInput): The input feature matrix.
y (TOutput): The target vector.
Returns
- string
A string that uniquely identifies the current gradient calculation scenario.
Remarks
This method creates a unique identifier for caching gradients based on the current model, input data, and Nadam-specific parameters. This helps in efficiently reusing previously calculated gradients when possible.
For Beginners: This is like creating a special label for each unique situation your smart ball encounters. It helps the ball remember and quickly recall how it should move in similar situations, making the whole process more efficient.
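As an illustration only (the real key format is an implementation detail not specified by this reference), such a key is typically composed from the base key plus the optimizer-specific state, so caches from differently configured optimizers are never mixed. The field names below (_options, _timeStep) are hypothetical.

```csharp
// Hypothetical sketch of how a subclass might compose such a key.
protected override string GenerateGradientCacheKey(
    IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
{
    string baseKey = base.GenerateGradientCacheKey(model, X, y);
    // Append Nadam-specific parameters so the key is unique per configuration.
    return $"{baseKey}_Nadam_{_options.Beta1}_{_options.Beta2}_{_timeStep}";
}
```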
GetOptions()
Gets the current optimization algorithm options.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current NadamOptimizerOptions object.
Remarks
This method returns the current options used by the Nadam optimizer.
For Beginners: This is like checking your current smart ball rolling rules. It lets you see all the settings and strategies you're currently using in your experiment.
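A short usage sketch; the cast reflects the documented behavior that the returned object is the optimizer's NadamOptimizerOptions instance, even though the declared return type is the base options class.

```csharp
var options = optimizer.GetOptions();
if (options is NadamOptimizerOptions<double, Matrix<double>, Vector<double>> nadamOptions)
{
    // Inspect Nadam-specific settings here.
}
```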
InitializeAdaptiveParameters()
Initializes the adaptive parameters for the Nadam optimizer.
protected override void InitializeAdaptiveParameters()
Remarks
This method sets up the initial learning rate and resets the time step counter.
For Beginners: This is like setting the initial speed of your smart ball and resetting its internal clock before it starts rolling.
InitializeGpuState(int, IDirectGpuBackend)
Initializes Nadam optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int): The number of parameters.
backend (IDirectGpuBackend): The GPU backend used for memory allocation.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the Nadam algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for the optimization process.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process.
Remarks
This method implements the main optimization loop. It iterates through the data, calculating gradients, updating the momentum and adaptive learning rates, and adjusting the model parameters accordingly.
For Beginners: This is the actual process of rolling your smart ball down the hill. In each step, you're calculating which way the ball should roll (gradient), how fast it's moving (momentum), and how it should adapt its speed (adaptive learning rates). You keep doing this until the ball finds the lowest point or you've rolled it enough times.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
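A minimal sketch of invoking the optimization loop. The property names on OptimizationInputData (XTrain, YTrain) and on the result (BestSolution) are illustrative assumptions, not confirmed by this reference.

```csharp
// trainingFeatures / trainingTargets are your prepared data.
var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>
{
    XTrain = trainingFeatures, // assumed property name
    YTrain = trainingTargets   // assumed property name
};

OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);

var bestModel = result.BestSolution; // assumed property name
```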
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a Nadam gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>): The parameters after the Nadam update.
appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
The original parameters before the update.
Remarks
Nadam's reverse update requires the optimizer's internal state (_m, _v, _t) from the forward pass. This method must be called immediately after UpdateParameters while the state is fresh. It recalculates the Nesterov-accelerated adaptive update that was applied.
For Beginners: This calculates where parameters were before a Nadam update. Nadam combines lookahead (Nesterov) with adaptive learning (Adam), so reversing requires both the momentum history (_m) and variance history (_v) to reconstruct the lookahead step.
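A sketch of the intended call pattern. Per the remarks, ReverseUpdate is only valid immediately after UpdateParameters, while the internal state still matches that step.

```csharp
// Forward step: apply one Nadam update.
Vector<double> updated = optimizer.UpdateParameters(parameters, gradient);

// Reverse step: recover the pre-update parameters. Must happen before the
// internal moment estimates (_m, _v) and time step (_t) are advanced again.
Vector<double> original = optimizer.ReverseUpdate(updated, gradient);
// `original` should match `parameters` up to floating-point error.
```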
Serialize()
Serializes the optimizer's state into a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized state of the optimizer.
Remarks
This method converts the current state of the optimizer, including its base class state, options, and time step, into a byte array. This is useful for saving the optimizer's state or transferring it between systems.
For Beginners: Think of this as taking a snapshot of your entire smart ball rolling experiment. It captures all the details of your current setup, including the ball's position, speed, and all your rules. This snapshot can be used to recreate the exact same experiment later or share it with others.
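A round-trip sketch using the documented Serialize/Deserialize pair:

```csharp
// Snapshot the optimizer's full state (base state, options, time step).
byte[] snapshot = optimizer.Serialize();

// Later, or on another machine: restore the state into a compatible instance.
// Deserialize throws InvalidOperationException if the embedded options
// cannot be deserialized.
var restored = new NadamOptimizer<double, Matrix<double>, Vector<double>>(model);
restored.Deserialize(snapshot);
```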
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates the adaptive parameters of the optimizer based on the current and previous optimization steps.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
currentStepData (OptimizationStepData<T, TInput, TOutput>): Data from the current optimization step.
previousStepData (OptimizationStepData<T, TInput, TOutput>): Data from the previous optimization step.
Remarks
This method adjusts the learning rate based on how the current step performed relative to the previous step: if the result improved, the learning rate may be increased; otherwise it may be decreased.
For Beginners: This is like adjusting how fast your ball rolls based on whether it's getting closer to the bottom of the hill. If it's improving, you might let it roll a bit faster. If not, you might slow it down to be more careful.
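A hedged sketch of the adjustment logic described above. The member names (FitnessScore, the _currentLearningRate field) and the factor values are illustrative assumptions, not the library's actual members.

```csharp
// Illustrative only: nudge the learning rate up after an improving step,
// down after a non-improving one.
bool improved = currentStepData.FitnessScore > previousStepData.FitnessScore; // assumed member
_currentLearningRate *= improved ? 1.05 : 0.95;                              // assumed field and factors
```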
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options with new settings.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The new options to be applied to the optimizer.
Remarks
This method ensures that only compatible option types are used with this optimizer. It updates the internal options if the provided options are of the correct type.
For Beginners: This is like changing the rules of how your smart ball rolls mid-experiment. It makes sure you're only using rules that work for this specific type of smart ball (Nadam optimization).
Exceptions
- ArgumentException
Thrown when the provided options are not of the correct type.
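The type check described above typically follows this pattern (a sketch, not the verbatim implementation; _options is a hypothetical backing field):

```csharp
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
{
    if (options is NadamOptimizerOptions<T, TInput, TOutput> nadamOptions)
    {
        _options = nadamOptions; // hypothetical backing field
    }
    else
    {
        throw new ArgumentException(
            "Options must be of type NadamOptimizerOptions.", nameof(options));
    }
}
```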
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the Nadam optimization algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector<T>): The current parameter vector to be updated.
gradient (Vector<T>): The gradient vector corresponding to the parameters.
Returns
- Vector<T>
The updated parameter vector.
Remarks
Nadam combines Adam's adaptive learning rates with Nesterov's accelerated gradient, providing the benefits of both techniques: adaptive per-parameter learning rates and lookahead momentum.
For Beginners: Nadam is like a smart ball that not only adapts its speed for different parts of the hill (Adam) but also looks ahead to anticipate slopes (Nesterov).
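For reference, the standard Nadam update rule (Dozat, 2016), the form this method is expected to compute; β₁ and β₂ are the moment decay rates, η the learning rate, g_t the gradient, and ε a small constant for numerical stability:

```latex
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
\hat{m}_t = m_t / (1-\beta_1^t), \qquad \hat{v}_t = v_t / (1-\beta_2^t)
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}
               \left( \beta_1 \hat{m}_t + \frac{(1-\beta_1)\, g_t}{1-\beta_1^t} \right)
```

The parenthesized term is what distinguishes Nadam from Adam: the bias-corrected momentum is blended with the current gradient, effectively applying the momentum one step ahead (the Nesterov lookahead).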
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on GPU using Nadam optimization.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer): The GPU buffer holding the parameters to update.
gradients (IGpuBuffer): The GPU buffer holding the gradients.
parameterCount (int): The number of parameters.
backend (IDirectGpuBackend): The GPU backend used for the update.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution based on the calculated gradient using the Nadam algorithm.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>): The current model solution.
gradient (Vector<T>): The calculated gradient.
Returns
- IFullModel<T, TInput, TOutput>
An updated symbolic model with improved coefficients.
Remarks
This method applies the Nadam update rule to adjust the model parameters. It uses both momentum and adaptive learning rates, incorporating Nesterov's accelerated gradient.
For Beginners: This is like adjusting the ball's position based on its current speed, the slope it's on, and its ability to look ahead. It's a complex calculation that helps the ball move more efficiently towards the lowest point.
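A sketch of how this override typically composes the pieces documented above. The model accessor names (GetParameters, WithParameters) are illustrative assumptions about the IFullModel API.

```csharp
protected override IFullModel<T, TInput, TOutput> UpdateSolution(
    IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
{
    // Pull the current coefficients, apply one Nadam step via the public
    // UpdateParameters method, and return a model carrying the result.
    Vector<T> parameters = currentSolution.GetParameters();   // assumed accessor
    Vector<T> updated = UpdateParameters(parameters, gradient);
    return currentSolution.WithParameters(updated);           // assumed accessor
}
```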