Class GradientDescentOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Represents a Gradient Descent optimizer for machine learning models.
public class GradientDescentOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations, typically float or double.
TInput: The type of the input data consumed by the model.
TOutput: The type of the output data produced by the model.
Inheritance
OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → GradientDescentOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Remarks
Gradient Descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. It takes steps proportional to the negative of the gradient of the function at the current point.
For Beginners: Imagine you're trying to find the lowest point in a valley:
- You start at a random point (initial model parameters)
- You look around to see which way is steepest downhill (calculate the gradient)
- You take a step in that direction (update the parameters)
- You repeat this process until you reach the bottom of the valley (optimize the model)
This optimizer helps the model learn by gradually adjusting its parameters to minimize errors.
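To make the update rule concrete, here is a minimal, library-independent sketch in C# that minimizes f(x) = x² with plain gradient descent (the learning rate and iteration count are arbitrary illustrative values, not AiDotNet defaults):

double x = 5.0;              // starting point (initial "parameter")
double learningRate = 0.1;   // step size
for (int i = 0; i < 100; i++)
{
    double gradient = 2.0 * x;      // derivative of f(x) = x² is 2x
    x -= learningRate * gradient;   // step against the gradient
}
Console.WriteLine(x);               // approaches 0.0, the minimum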
Constructors
GradientDescentOptimizer(IFullModel<T, TInput, TOutput>, GradientDescentOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the GradientDescentOptimizer class.
public GradientDescentOptimizer(IFullModel<T, TInput, TOutput> model, GradientDescentOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize.
options (GradientDescentOptimizerOptions<T, TInput, TOutput>?): Options for the Gradient Descent optimizer.
engine (IEngine?)
Remarks
For Beginners: This sets up the Gradient Descent optimizer with its initial settings. It's like preparing for your hike by choosing your starting point, deciding how big your steps will be, and how you'll adjust your path to avoid getting stuck in small dips.
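A construction sketch, assuming an existing model; the Matrix<double>/Vector<double> type arguments are illustrative choices, not requirements:

// 'model' is assumed to be an existing IFullModel<double, Matrix<double>, Vector<double>>.
// Omitting options and engine (or passing null) uses the defaults.
var optimizer = new GradientDescentOptimizer<double, Matrix<double>, Vector<double>>(model);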
Methods
Deserialize(byte[])
Restores the state of the Gradient Descent optimizer from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer state.
Remarks
This method deserializes both the base class data and the Gradient Descent-specific options from a byte array, typically created by the Serialize method. It reconstructs the optimizer's state, including all settings and progress information.
For Beginners: This is like unpacking your hiking gear and reading your saved plan:
- It takes the saved information (byte array) and uses it to set up the optimizer
- This allows you to continue optimizing from where you left off, or use someone else's setup
- It's the reverse process of Serialize, turning the saved data back into a working optimizer
Imagine you're starting a hike using a very detailed guide someone else wrote. This method helps you set everything up exactly as described in that guide.
Exceptions
- InvalidOperationException
Thrown when deserialization of optimizer options fails.
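A restore sketch, assuming the state was previously written to a file by Serialize (the file name is illustrative):

// Rebuild the optimizer's state from previously saved bytes.
byte[] saved = File.ReadAllBytes("optimizer-state.bin");
optimizer.Deserialize(saved); // throws InvalidOperationException if the options cannot be deserialized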
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients specific to the Gradient Descent optimizer.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>): The current model being optimized.
X (TInput): The input features used for gradient calculation.
y (TOutput): The target values used for gradient calculation.
Returns
- string
A string that uniquely identifies the current gradient calculation scenario.
Remarks
This method extends the base class's gradient cache key generation by adding Gradient Descent-specific parameters. The resulting key is unique to the current state of the optimizer and the input data, allowing for efficient caching and retrieval of previously calculated gradients.
For Beginners: Think of this method as creating a unique label for each gradient calculation:
- It starts with a basic label (from the base class) that describes the model and data
- Then it adds specific details about the Gradient Descent optimizer, like how big steps it's taking (learning rate) and how many times it plans to adjust the model (max iterations)
- This unique label helps the optimizer remember and quickly find previous calculations, making the whole process faster and more efficient
It's like keeping a well-organized hiking journal where you can quickly look up information about specific points in your journey.
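Conceptually, the key might be composed like the following sketch; the exact format, field names (_options, LearningRate, MaxIterations), and layout here are assumptions for illustration, not the library's actual implementation:

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
{
    string baseKey = base.GenerateGradientCacheKey(model, X, y);
    // Append GD-specific settings so differently configured optimizers never share a cache entry.
    return $"{baseKey}_GD_lr:{_options.LearningRate}_maxIter:{_options.MaxIterations}";
}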
GetOptions()
Retrieves the current options for the Gradient Descent optimizer.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current Gradient Descent optimizer options.
Remarks
This method returns the current configuration options for the Gradient Descent optimizer. These options control various aspects of the optimization process, such as learning rate, maximum iterations, and regularization settings.
For Beginners: Think of this method as checking your current hiking plan:
- It tells you things like how big your steps are (learning rate)
- How long you plan to hike (maximum iterations)
- What rules you're following to avoid getting lost (regularization settings)
This information is useful if you want to understand or adjust how the optimizer is currently set up.
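A usage sketch; the declared return type is the base options class, and the cast to the GD-specific options type is an assumption about the runtime type:

// Inspect the optimizer's current configuration.
var options = optimizer.GetOptions();
var gdOptions = options as GradientDescentOptimizerOptions<double, Matrix<double>, Vector<double>>;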
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the main optimization process using the Gradient Descent algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for the optimization process.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process.
Remarks
For Beginners: This is the heart of the Gradient Descent algorithm. It:
1. Starts with a random solution
2. Calculates how to improve the solution (the gradient)
3. Updates the solution by taking a step in the direction of improvement
4. Repeats this process many times
It's like repeatedly adjusting your path as you hike, always trying to move towards lower ground.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
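A call sketch; how OptimizationInputData is populated (training features, targets, and so on) is an assumption here, so consult that type's documentation for its actual members:

// Run the full optimization loop and inspect the result.
var inputData = new OptimizationInputData<double, Matrix<double>, Vector<double>>();
// ... populate inputData with training data ...
OptimizationResult<double, Matrix<double>, Vector<double>> result = optimizer.Optimize(inputData);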
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a Gradient Descent update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>): The parameters after the GD update.
appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
Original parameters before the update
Remarks
Gradient Descent uses the vanilla SGD update rule: params_new = params_old - lr * gradient. The reverse is therefore straightforward: params_old = params_new + lr * gradient.
For Beginners: This calculates where parameters were before a Gradient Descent update. Since GD uses simple steps (parameter minus learning_rate times gradient), reversing just means adding back that step.
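For example, with lr = 0.1, params_old = 2.0, and gradient = 5.0, the update gives params_new = 2.0 - 0.1 * 5.0 = 1.5; ReverseUpdate then recovers params_old = 1.5 + 0.1 * 5.0 = 2.0 (the method uses the optimizer's own learning rate, since lr is not a parameter).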
Serialize()
Converts the current state of the Gradient Descent optimizer into a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized state of the optimizer.
Remarks
This method serializes both the base class data and the Gradient Descent-specific options. It uses a combination of binary serialization for efficiency and JSON serialization for flexibility.
For Beginners: This is like packing up your hiking gear and writing down your plan:
- It saves all the important information about the optimizer's current state
- This saved information can be used later to recreate the optimizer exactly as it is now
- It's useful for saving your progress or sharing your optimizer setup with others
Think of it as creating a detailed snapshot of your hiking journey that you can use to continue from the same point later or allow someone else to follow your exact path.
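A save sketch (the file name is illustrative and pairs with the Deserialize example above):

// Capture the optimizer's current state and write it to disk.
byte[] state = optimizer.Serialize();
File.WriteAllBytes("optimizer-state.bin", state);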
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the options for the Gradient Descent optimizer.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The new options to apply to the optimizer.
Remarks
For Beginners: This method allows you to change the settings of the optimizer while it's running. It's like adjusting your hiking strategy mid-journey based on the terrain you encounter.
Exceptions
- ArgumentException
Thrown when the provided options are not of the correct type.
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using vanilla SGD.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer): The GPU buffer holding the model parameters to update.
gradients (IGpuBuffer): The GPU buffer holding the computed gradients.
parameterCount (int): The number of parameters to update.
backend (IDirectGpuBackend): The GPU backend that executes the update.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution based on the calculated gradient.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>): The current solution.
gradient (Vector<T>): The calculated gradient.
Returns
- IFullModel<T, TInput, TOutput>
The updated solution.
Remarks
For Beginners: This method adjusts the current solution to make it better. It's like taking a step in the direction you've determined will lead you downhill.