
Class GradientBasedOptimizerBase<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Represents a base class for gradient-based optimization algorithms.

public abstract class GradientBasedOptimizerBase<T, TInput, TOutput> : OptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations, typically float or double.

TInput

The type of the input data (for example, a matrix or tensor of features).

TOutput

The type of the output data (for example, a vector of predictions or target values).
Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>

Remarks

Gradient-based optimizers use the gradient of the loss function to update the model parameters in a direction that minimizes the loss. This base class provides common functionality for various gradient-based optimization techniques.

For Beginners: Think of gradient-based optimization like finding the bottom of a valley:

  • You start at a random point on a hilly landscape (your initial model parameters)
  • You look around to see which way is steepest downhill (calculate the gradient)
  • You take a step in that direction (update the parameters)
  • You repeat this process until you reach the bottom of the valley (optimize the model)

This approach helps the model learn by gradually adjusting its parameters to minimize errors.
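
Inside a derived optimizer, one iteration of this process typically combines the protected helpers documented below. A minimal sketch, assuming the gradient inputs and current solution have already been obtained from the training data (each derived class structures its actual loop differently):

var gradient = CalculateGradient(currentSolution, xTrain, yTrain);   // which way is downhill?
gradient = ApplyGradientClipping(gradient);                          // keep the step from exploding
gradient = ApplyMomentum(gradient);                                  // blend in the previous direction
currentSolution = UpdateSolution(currentSolution, gradient);         // take the step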

Constructors

GradientBasedOptimizerBase(IFullModel<T, TInput, TOutput>?, GradientBasedOptimizerOptions<T, TInput, TOutput>)

Initializes a new instance of the GradientBasedOptimizerBase class.

protected GradientBasedOptimizerBase(IFullModel<T, TInput, TOutput>? model, GradientBasedOptimizerOptions<T, TInput, TOutput> options)

Parameters

model IFullModel<T, TInput, TOutput>

The model to optimize (can be null if set later).

options GradientBasedOptimizerOptions<T, TInput, TOutput>

Options for the gradient-based optimizer.

Remarks

For Beginners: This sets up the gradient-based optimizer with its initial settings. It's like preparing for a hike by choosing your starting point, deciding how big your steps will be, and deciding how much weight to give your previous direction when choosing your next step.

Fields

GradientCache

A cache for storing and retrieving gradients to improve performance.

protected IGradientCache<T> GradientCache

Field Value

IGradientCache<T>

GradientOptions

Options specific to gradient-based optimization algorithms.

protected GradientBasedOptimizerOptions<T, TInput, TOutput> GradientOptions

Field Value

GradientBasedOptimizerOptions<T, TInput, TOutput>

LossFunction

The loss function used to compare the predicted values against the actual values.

protected ILossFunction<T> LossFunction

Field Value

ILossFunction<T>

Regularization

The regularization technique applied to the parameters so they don't grow uncontrollably, which helps prevent overfitting.

protected IRegularization<T, TInput, TOutput> Regularization

Field Value

IRegularization<T, TInput, TOutput>

_currentEpoch

The current epoch number for scheduler tracking.

protected int _currentEpoch

Field Value

int

_currentStep

The current step (batch) number for scheduler tracking.

protected int _currentStep

Field Value

int

_gpuState

GPU-resident optimizer state. Derived classes store their optimizer-specific state here.

protected IGpuBuffer? _gpuState

Field Value

IGpuBuffer

_gpuStateInitialized

Whether GPU state has been initialized.

protected bool _gpuStateInitialized

Field Value

bool

_lastComputedGradients

The gradients computed during the last optimization step.

protected Vector<T> _lastComputedGradients

Field Value

Vector<T>

Remarks

This field stores the gradients calculated in the most recent call to CalculateGradient(). It enables external access to gradients for features like gradient clipping, distributed training (true DDP), debugging, and visualization. The value is Vector<T>.Empty() until gradients have been computed.

_learningRateScheduler

The learning rate scheduler to use for adjusting learning rate during training.

protected ILearningRateScheduler? _learningRateScheduler

Field Value

ILearningRateScheduler

Remarks

For Beginners: A learning rate scheduler automatically adjusts how fast your model learns during training. Common strategies include starting high and decreasing over time, or using warmup to slowly increase the learning rate at the beginning.

_mixedPrecisionContext

Mixed-precision training context (null if mixed-precision is disabled).

protected MixedPrecisionContext? _mixedPrecisionContext

Field Value

MixedPrecisionContext

Remarks

For Beginners: Mixed-precision training uses both 16-bit (FP16) and 32-bit (FP32) floating-point numbers during optimization. This context manages the conversion between precisions and handles loss scaling to prevent numerical issues. When enabled, this can provide:

  • 2-3x faster training on modern GPUs (V100, A100, RTX 3000+)
  • ~50% memory reduction
  • Maintained accuracy through careful precision management

_previousGradient

The gradient from the previous optimization step, used for momentum calculations.

protected Vector<T> _previousGradient

Field Value

Vector<T>

_schedulerStepMode

Specifies when to step the learning rate scheduler.

protected SchedulerStepMode _schedulerStepMode

Field Value

SchedulerStepMode

Remarks

Controls whether the scheduler updates after each batch, each epoch, or uses warmup followed by per-epoch stepping.

Properties

CurrentEpoch

Gets the current training epoch.

public int CurrentEpoch { get; }

Property Value

int

CurrentStep

Gets the current training step (batch count).

public int CurrentStep { get; }

Property Value

int

IsMixedPrecisionEnabled

Gets whether mixed-precision training is enabled for this optimizer.

public bool IsMixedPrecisionEnabled { get; }

Property Value

bool

LastComputedGradients

Gets the gradients computed during the last optimization step.

public virtual Vector<T> LastComputedGradients { get; }

Property Value

Vector<T>

Vector of gradients for each parameter. An empty vector is returned if no optimization has been performed yet.

Remarks

This property provides access to the gradients (partial derivatives) calculated during the most recent optimization. Essential for distributed training, gradient clipping, and debugging.

For Beginners: Gradients are "directions" showing how to adjust each parameter to improve the model. This property lets you see those directions after optimization runs.

Industry Standard: PyTorch, TensorFlow, and JAX all expose gradients for features like gradient clipping, true Distributed Data Parallel (DDP), and gradient compression.
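
For example, after an optimization step the gradients and their norm can be inspected for monitoring (a sketch; the surrounding training code is assumed):

var gradients = optimizer.LastComputedGradients;   // empty until an optimization step has run
var norm = optimizer.GetGradientNorm();            // L2 norm of those gradients (0 if empty)
Console.WriteLine($"Gradient norm: {norm}");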

LearningRateScheduler

Gets the current learning rate scheduler, if one is configured.

public ILearningRateScheduler? LearningRateScheduler { get; }

Property Value

ILearningRateScheduler

SchedulerStepMode

Gets the current scheduler step mode.

public SchedulerStepMode SchedulerStepMode { get; }

Property Value

SchedulerStepMode

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

public virtual bool SupportsGpuUpdate { get; }

Property Value

bool

Remarks

For Beginners: Override this in derived classes that have GPU kernel implementations. The base class returns false since it has no specific GPU kernel.

Methods

ApplyGradientClipping(Vector<T>)

Applies gradient clipping based on the configured options.

protected virtual Vector<T> ApplyGradientClipping(Vector<T> gradient)

Parameters

gradient Vector<T>

The gradient to clip.

Returns

Vector<T>

The clipped gradient.

Remarks

For Beginners: Gradient clipping prevents training instability by limiting how large gradients can become. This is especially important for deep networks and RNNs where gradients can "explode" (become extremely large) during backpropagation.
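
Conceptually, norm-based clipping rescales a gradient whose L2 norm exceeds a threshold c to gradient * (c / norm). A standalone sketch of that idea on a plain array (illustrative only; the actual method operates on Vector<T> and is driven by the configured clipping options):

double[] gradient = { 3.0, 4.0 };                  // example gradient with L2 norm 5
double maxNorm = 1.0;                              // illustrative threshold
double norm = Math.Sqrt(gradient.Sum(g => g * g));
if (norm > maxNorm)
{
    double scale = maxNorm / norm;                 // rescale so the norm equals maxNorm
    gradient = gradient.Select(g => g * scale).ToArray();
}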

ApplyGradients(Vector<T>, IFullModel<T, TInput, TOutput>)

Applies pre-computed gradients to a model's parameters.

public virtual IFullModel<T, TInput, TOutput> ApplyGradients(Vector<T> gradients, IFullModel<T, TInput, TOutput> model)

Parameters

gradients Vector<T>

Gradients to apply (must match model parameter count)

model IFullModel<T, TInput, TOutput>

Model whose parameters should be updated

Returns

IFullModel<T, TInput, TOutput>

Model with updated parameters

Remarks

Allows applying externally-computed or modified gradients (averaged, compressed, clipped, etc.) to update model parameters. Essential for production distributed training.

For Beginners: This takes pre-calculated "directions" (gradients) and uses them to update the model. Like having a GPS tell you which way to go, this method moves you there.

Production Use Cases:

  • True DDP: average gradients across GPUs, then apply
  • Gradient Compression: compress, sync, decompress, then apply
  • Federated Learning: average gradients from clients before applying
  • Gradient Clipping: clip gradients to prevent exploding, then apply
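
For example, gradients can be computed, processed externally (clipped, compressed, or averaged), and then applied to a model that has not yet been stepped with them (a sketch; ProcessGradients and rawGradients are hypothetical placeholders for your own code):

// The model passed in must not already contain a local update from these gradients;
// see the two-vector overload below for a double-step-safe alternative.
Vector<double> processedGradients = ProcessGradients(rawGradients);
model = optimizer.ApplyGradients(processedGradients, model);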

Exceptions

ArgumentNullException

If gradients or model is null

ArgumentException

If gradient size doesn't match parameters

ApplyGradients(Vector<T>, Vector<T>, IFullModel<T, TInput, TOutput>)

Applies pre-computed gradients to explicit original parameters (double-step safe).

public virtual IFullModel<T, TInput, TOutput> ApplyGradients(Vector<T> originalParameters, Vector<T> gradients, IFullModel<T, TInput, TOutput> model)

Parameters

originalParameters Vector<T>

Pre-update parameters to start from

gradients Vector<T>

Gradients to apply

model IFullModel<T, TInput, TOutput>

Model template (only used for structure, parameters ignored)

Returns

IFullModel<T, TInput, TOutput>

New model with updated parameters

Remarks

⚠️ RECOMMENDED for Distributed Training: This overload accepts originalParameters explicitly, making it impossible to accidentally apply gradients twice. Use this in distributed optimizers where you need explicit control over which parameter state to start from.

Prevents double-stepping bug:

  • WRONG: ApplyGradients(g_avg, modelWithLocalUpdate) → double step!
  • RIGHT: ApplyGradients(originalParams, g_avg, modelTemplate) → single step!

Distributed Pattern:

  1. Save originalParams before local optimization
  2. Run local optimization → get localGradients
  3. Synchronize gradients → get avgGradients
  4. Call ApplyGradients(originalParams, avgGradients, model) → correct result!
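
A sketch of that pattern (SaveParameters and Synchronize are hypothetical placeholders for your model-access and communication code):

Vector<double> originalParams = SaveParameters(model);                   // 1. capture pre-update parameters
RunLocalOptimizationStep();                                              // 2. local step computes gradients
Vector<double> localGradients = optimizer.LastComputedGradients;
Vector<double> avgGradients = Synchronize(localGradients);               // 3. average across workers
model = optimizer.ApplyGradients(originalParams, avgGradients, model);   // 4. one correct update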

ApplyMomentum(Vector<T>)

Applies momentum to the gradient calculation.

protected virtual Vector<T> ApplyMomentum(Vector<T> gradient)

Parameters

gradient Vector<T>

The current gradient.

Returns

Vector<T>

The gradient adjusted for momentum.

Remarks

For Beginners: This method considers the direction you were moving in previously when deciding which way to go next. It's like considering your momentum when hiking - you might keep going in roughly the same direction rather than abruptly changing course.
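
Conceptually, classical momentum blends the previous direction into the current gradient. A standalone sketch of that idea on plain arrays (illustrative only; the actual blending is controlled by the momentum setting in the optimizer options):

double momentum = 0.9;                                   // illustrative momentum coefficient
double[] previous = { 0.2, -0.1 };                       // direction from the last step
double[] current  = { 0.5,  0.3 };                       // freshly computed gradient
double[] adjusted = new double[current.Length];
for (int i = 0; i < current.Length; i++)
    adjusted[i] = momentum * previous[i] + current[i];   // keep some of the old direction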

AreGradientsExploding(double)

Checks if the current gradients are exhibiting exploding gradient behavior.

public bool AreGradientsExploding(double threshold = 1000)

Parameters

threshold double

The threshold above which gradients are considered exploding. Default is 1000.

Returns

bool

True if gradients are exploding, false otherwise.

Remarks

For Beginners: This method helps detect when training is becoming unstable. If gradients become too large, it usually indicates a problem with the learning rate or model architecture that needs to be addressed.

AreGradientsVanishing(double)

Checks if the current gradients are exhibiting vanishing gradient behavior.

public bool AreGradientsVanishing(double threshold = 1E-07)

Parameters

threshold double

The threshold below which gradients are considered vanishing. Default is 1e-7.

Returns

bool

True if gradients are vanishing, false otherwise.

Remarks

For Beginners: Vanishing gradients occur when gradients become so small that learning effectively stops. This is common in deep networks and can indicate the need for techniques like residual connections, batch normalization, or different activation functions.
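
For example, both checks can be combined into a periodic stability check during training (a sketch):

if (optimizer.AreGradientsExploding())        // default threshold 1000
    Console.WriteLine("Warning: exploding gradients - consider lowering the learning rate or clipping.");
if (optimizer.AreGradientsVanishing())        // default threshold 1e-7
    Console.WriteLine("Warning: vanishing gradients - consider residual connections or different activations.");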

CalculateGradient(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Calculates the gradient for the given model and input data.

protected virtual Vector<T> CalculateGradient(IFullModel<T, TInput, TOutput> solution, TInput X, TOutput y)

Parameters

solution IFullModel<T, TInput, TOutput>

The current solution.

X TInput

The input features.

y TOutput

The target values.

Returns

Vector<T>

The calculated gradient.

Remarks

For Beginners: This method calculates how steep the hill is and in which direction. It helps determine which way the optimizer should step to improve the model.

CalculateGradient(IFullModel<T, TInput, TOutput>, TInput, TOutput, int[])

Calculates the gradient for a given solution using a batch of training data.

protected virtual Vector<T> CalculateGradient(IFullModel<T, TInput, TOutput> solution, TInput xTrain, TOutput yTrain, int[] batchIndices)

Parameters

solution IFullModel<T, TInput, TOutput>

The current solution (model).

xTrain TInput

The training input data.

yTrain TOutput

The training target data.

batchIndices int[]

The indices to use for the current batch.

Returns

Vector<T>

A vector representing the gradient of the loss function with respect to the model parameters.

Remarks

For Beginners: The gradient tells us which direction to adjust our model's parameters to improve performance. It's like a compass showing the way to a better solution.

ComputeHessianEfficiently(IFullModel<T, TInput, TOutput>, OptimizationInputData<T, TInput, TOutput>)

Computes the Hessian matrix (second derivatives) more efficiently when the model supports explicit gradient computation.

protected virtual Matrix<T> ComputeHessianEfficiently(IFullModel<T, TInput, TOutput> model, OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

model IFullModel<T, TInput, TOutput>

The model to compute Hessian for.

inputData OptimizationInputData<T, TInput, TOutput>

The input data for optimization.

Returns

Matrix<T>

The Hessian matrix.

Remarks

For Beginners: The Hessian tells us how the gradient changes - it's the "curvature" of the loss landscape. This is crucial for second-order optimization methods like Newton's method.

Production Enhancement: If the model implements IGradientComputable, this method computes the Hessian by taking gradients of the gradient (using finite differences on the gradient function), which is much more efficient than the traditional double finite differences approach. This is O(n) gradient evaluations instead of O(n²) loss evaluations.

Note: For models implementing IGradientComputable with ComputeSecondOrderGradients support, true Hessian-vector products could be computed even more efficiently. This is currently a middle ground that works with any model implementing ComputeGradients.
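
Conceptually, each Hessian column is a finite difference of the gradient along one parameter direction. A standalone numerical sketch of that idea, using a simple quadratic loss instead of the library types (illustrative only):

// For L(a, b) = a*a + 3*a*b the gradient is (2a + 3b, 3a); each Hessian column
// is obtained from one extra gradient evaluation, giving O(n) gradient calls.
Func<double[], double[]> grad = p => new[] { 2 * p[0] + 3 * p[1], 3 * p[0] };
double[] theta = { 1.0, 2.0 };
double eps = 1e-6;
int n = theta.Length;
var hessian = new double[n, n];
double[] g0 = grad(theta);
for (int i = 0; i < n; i++)
{
    double[] shifted = (double[])theta.Clone();
    shifted[i] += eps;                                // perturb one parameter
    double[] gi = grad(shifted);
    for (int j = 0; j < n; j++)
        hessian[j, i] = (gi[j] - g0[j]) / eps;        // column i of the Hessian
}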

ComputeHessianFiniteDifferences(IFullModel<T, TInput, TOutput>, OptimizationInputData<T, TInput, TOutput>)

Computes the Hessian matrix using traditional finite differences (fallback method).

protected virtual Matrix<T> ComputeHessianFiniteDifferences(IFullModel<T, TInput, TOutput> model, OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

model IFullModel<T, TInput, TOutput>
inputData OptimizationInputData<T, TInput, TOutput>

Returns

Matrix<T>

Remarks

For Beginners: This is the slower but more universally applicable method. It approximates the curvature by testing small changes in parameters.

CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int)

Creates a data batcher for the given optimization input data using configured sampling options.

protected OptimizationDataBatcher<T, TInput, TOutput> CreateBatcher(OptimizationInputData<T, TInput, TOutput> inputData, int batchSize)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The optimization input data to batch.

batchSize int

The batch size for training.

Returns

OptimizationDataBatcher<T, TInput, TOutput>

An OptimizationDataBatcher configured with the optimizer's sampling options.

Remarks

For Beginners: This method creates a helper that splits your training data into smaller batches for efficient training. The batching behavior is controlled by:

  • DataSampler (if set): advanced sampling strategies like weighted/curriculum learning
  • ShuffleData: whether to randomize the order each epoch
  • DropLastBatch: whether to discard incomplete final batches
  • RandomSeed: for reproducible randomization

Example usage:

var batcher = CreateBatcher(inputData, batchSize: 32);
foreach (var (xBatch, yBatch, indices) in batcher.GetBatches())
{
    var gradient = CalculateGradient(model, xBatch, yBatch);
    model = UpdateSolution(model, gradient);
}

CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int, IDataSampler)

Creates a data batcher with a custom sampler, overriding the configured options.

protected OptimizationDataBatcher<T, TInput, TOutput> CreateBatcher(OptimizationInputData<T, TInput, TOutput> inputData, int batchSize, IDataSampler sampler)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The optimization input data to batch.

batchSize int

The batch size for training.

sampler IDataSampler

The custom sampler to use for advanced sampling strategies.

Returns

OptimizationDataBatcher<T, TInput, TOutput>

An OptimizationDataBatcher with the custom sampler.

Remarks

For Beginners: Use this when you want to try a different sampling strategy without changing the optimizer's default configuration.

Example:

// Create a curriculum learning sampler
var curriculumSampler = Samplers.Curriculum(difficulties, totalEpochs: 100);
var curriculumBatcher = CreateBatcher(inputData, batchSize: 32, sampler: curriculumSampler);

// Use balanced sampling for class imbalance
var balancedSampler = Samplers.Balanced(labels, numClasses: 10);
var balancedBatcher = CreateBatcher(inputData, batchSize: 32, sampler: balancedSampler);

CreateRegularization(GradientDescentOptimizerOptions<T, TInput, TOutput>)

Creates a regularization technique based on the provided options.

protected IRegularization<T, TInput, TOutput> CreateRegularization(GradientDescentOptimizerOptions<T, TInput, TOutput> options)

Parameters

options GradientDescentOptimizerOptions<T, TInput, TOutput>

The options specifying the regularization technique to use.

Returns

IRegularization<T, TInput, TOutput>

An instance of the specified regularization technique.

Remarks

For Beginners: This method sets up a way to prevent the model from becoming too complex. It's like adding rules to your hiking strategy to avoid taking unnecessarily complicated paths.

DisposeGpuState()

Disposes GPU-allocated optimizer state.

public virtual void DisposeGpuState()

Remarks

For Beginners: The base implementation disposes _gpuState if set. Derived classes with multiple state buffers should override.

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients.

protected virtual string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>

The current model.

X TInput

The input features.

y TOutput

The target values.

Returns

string

A string key for caching the gradient.

Remarks

For Beginners: This method creates a unique identifier for each gradient calculation. It's like labeling each spot on the hill so you can remember what the gradient was there.

GetCurrentLearningRate()

Gets the current learning rate being used by this optimizer.

public double GetCurrentLearningRate()

Returns

double

The current learning rate.

Remarks

For Beginners: The learning rate controls how big each update step is. This value may change during training if a learning rate scheduler is configured.

GetGradientNorm()

Gets the L2 norm of the last computed gradients.

public T GetGradientNorm()

Returns

T

The gradient norm, or 0 if no gradients have been computed.

Remarks

For Beginners: The gradient norm is a measure of how "strong" the overall gradient is. Monitoring this value during training can help diagnose issues with exploding or vanishing gradients.

InitializeGpuState(int, IDirectGpuBackend)

Initializes optimizer state on the GPU for a given parameter count.

public virtual void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int

Number of parameters to initialize state for.

backend IDirectGpuBackend

The GPU backend to use for memory allocation.

Remarks

For Beginners: The base implementation does nothing. Derived classes that maintain optimizer state (like momentum or adaptive learning rates) override this.

IsInWarmupPhase()

Determines whether the scheduler is currently in the warmup phase.

protected virtual bool IsInWarmupPhase()

Returns

bool

True if in warmup phase, false otherwise.

Remarks

Warmup is a technique where the learning rate starts very low and gradually increases to the base learning rate over a specified number of steps. This helps stabilize training in the early phases.

Detection Logic: For LinearWarmupScheduler, this method uses the explicit warmup step count for accurate detection. For other schedulers, warmup detection is not supported and this method returns false. The heuristic of comparing current LR to base LR was removed because it incorrectly identifies decay phases (e.g., cosine annealing) as warmup when the learning rate drops below the base learning rate.

LineSearch(IFullModel<T, TInput, TOutput>, Vector<T>, Vector<T>, OptimizationInputData<T, TInput, TOutput>)

Performs a line search to find an appropriate step size.

protected T LineSearch(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> direction, Vector<T> gradient, OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current solution.

direction Vector<T>

The search direction.

gradient Vector<T>

The current gradient.

inputData OptimizationInputData<T, TInput, TOutput>

The input data for the optimization process.

Returns

T

The step size to use.

Remarks

For Beginners: This method determines how big of a step to take in the chosen direction. It tries to find a step size that sufficiently decreases the function value while not being too small.
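
One common way to realize this is a backtracking search that shrinks the step until a sufficient-decrease (Armijo-style) condition holds. A conceptual sketch on a one-dimensional objective (illustrative only; the library's implementation may differ):

Func<double, double> f = x => x * x;                         // toy objective
double x0 = 3.0, gradAtX0 = 2 * x0, direction = -gradAtX0;   // steepest-descent direction
double step = 1.0, c = 1e-4;                                 // initial step and sufficient-decrease constant
while (f(x0 + step * direction) > f(x0) + c * step * gradAtX0 * direction)
    step *= 0.5;                                             // shrink until the decrease is sufficient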

NotifyEpochStart(int)

Notifies the sampler that a new epoch has started (for epoch-aware samplers).

protected void NotifyEpochStart(int currentEpoch)

Parameters

currentEpoch int

The current epoch number (0-based).

Remarks

Call this at the beginning of each training epoch when using adaptive samplers like curriculum learning or self-paced learning that adjust their behavior over time.

OnBatchEnd()

Called at the end of each training batch to update scheduler state if applicable.

public virtual void OnBatchEnd()

Remarks

When to call this method: This method must be called after each batch if you are using StepPerBatch, or during the warmup phase when using WarmupThenEpoch. Failure to call this method will prevent the learning rate scheduler from advancing on a per-batch basis.

For Beginners: A batch is a small subset of your training data processed at once. Some schedulers (like warmup or cyclical learning rates) need to update after every batch for smooth, fine-grained control of the learning rate.

OnEpochEnd()

Called at the end of each training epoch to update scheduler state if applicable.

public virtual void OnEpochEnd()

Remarks

When to call this method: This method must be called at the end of each epoch if you are using StepPerEpoch or WarmupThenEpoch. Failure to call this method will prevent the learning rate scheduler from advancing, resulting in a constant learning rate throughout training.

For Beginners: An epoch is one complete pass through all your training data. Many learning rate schedules (like step decay or cosine annealing) work on an epoch basis, reducing the learning rate after each complete pass through the data.
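
For example, a training loop that drives both per-batch and per-epoch scheduler updates might look like the following sketch (the data and the batch-processing body are assumed):

for (int epoch = 0; epoch < totalEpochs; epoch++)
{
    foreach (var batch in batches)
    {
        // ... compute gradients and update parameters for this batch ...
        optimizer.OnBatchEnd();   // needed for StepPerBatch, and during warmup for WarmupThenEpoch
    }
    optimizer.OnEpochEnd();       // needed for StepPerEpoch and WarmupThenEpoch
}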

Reset()

Resets the optimizer to its initial state.

public override void Reset()

Remarks

For Beginners: This method clears all the remembered information and starts fresh. It's like wiping your map clean and starting your hike from the beginning.

ReverseUpdate(Vector<T>, Vector<T>)

Reverses a gradient update to recover original parameters.

public virtual Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

Parameters after gradient application

appliedGradients Vector<T>

The gradients that were applied

Returns

Vector<T>

Estimated original parameters

Remarks

This base implementation uses the vanilla SGD reversal formula: params_old = params_new + learning_rate * gradients

For Adaptive Optimizers (Adam, RMSprop, etc.): This method should be overridden to account for optimizer-specific state. The base implementation is only accurate for vanilla SGD.

For Beginners: This calculates where the parameters were before a gradient update was applied. Think of it like rewinding a step you took.

StepScheduler()

Steps the learning rate scheduler and updates the current learning rate.

public double StepScheduler()

Returns

double

The new learning rate after stepping.

Remarks

This method advances the scheduler by one step and synchronizes the optimizer's learning rate with the scheduler's current value.

For Beginners: Call this method to update the learning rate according to the scheduler's policy. The scheduler will automatically adjust the learning rate based on how many steps have been taken.

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the options for the gradient-based optimizer.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

The new options to apply to the optimizer.

Remarks

For Beginners: This method allows you to change the settings of the optimizer while it's running. It's like adjusting your hiking strategy mid-journey based on the terrain you encounter.

UpdateParameters(Matrix<T>, Matrix<T>)

Updates a matrix of parameters based on the calculated gradient.

public virtual Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)

Parameters

parameters Matrix<T>

The current parameters.

gradient Matrix<T>

The calculated gradient.

Returns

Matrix<T>

The updated parameters.

Remarks

For Beginners: This method adjusts the model's parameters to improve its performance. It's like taking a step in the direction you've determined will lead you downhill.

UpdateParameters(Tensor<T>, Tensor<T>)

Updates a tensor of parameters based on the calculated gradient.

public virtual Tensor<T> UpdateParameters(Tensor<T> parameters, Tensor<T> gradient)

Parameters

parameters Tensor<T>

The current tensor parameters.

gradient Tensor<T>

The calculated gradient tensor.

Returns

Tensor<T>

The updated tensor parameters.

Remarks

For Beginners: This method adjusts the model's parameters stored in tensor format to improve its performance. It's like taking a step in the direction you've determined will lead you downhill, but for more complex multi-dimensional data structures. Tensors are useful for representing parameters in deep neural networks where data has multiple dimensions (like images with width, height, and channels).

UpdateParameters(Vector<T>, Vector<T>)

Updates a vector of parameters based on the calculated gradient.

public virtual Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The current parameters.

gradient Vector<T>

The calculated gradient.

Returns

Vector<T>

The updated parameters.

Remarks

For Beginners: This method is similar to the matrix overload of UpdateParameters, but for when the parameters are stored as a vector instead of a matrix. It's another way of taking a step to improve the model.
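
For example, with the plain rule implied by the ReverseUpdate formula documented above (parameters_new = parameters_old - learningRate * gradient; derived optimizers such as Adam apply more elaborate rules), a single update on plain arrays looks like this sketch:

double learningRate = 0.01;
double[] parameters = { 0.50, -1.20 };
double[] gradient   = { 0.30,  0.40 };
for (int i = 0; i < parameters.Length; i++)
    parameters[i] -= learningRate * gradient[i];    // step downhill along the gradient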

UpdateParameters(List<ILayer<T>>)

Updates the parameters of the model based on the calculated gradients.

public virtual void UpdateParameters(List<ILayer<T>> layers)

Parameters

layers List<ILayer<T>>

The layers of the neural network containing the parameters to update.

Remarks

For Beginners: This method adjusts the model's parameters to improve its performance. It's like taking steps in the direction that will lead to better results, based on what we've learned from the data.

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using optimizer-specific GPU kernels.

public virtual void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

GPU buffer containing parameters to update (modified in-place).

gradients IGpuBuffer

GPU buffer containing gradients.

parameterCount int

Number of parameters.

backend IDirectGpuBackend

The GPU backend to use for execution.

Remarks

For Beginners: The base implementation throws since there's no generic GPU kernel. Derived classes that support GPU updates override this method.

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution based on the calculated gradient.

protected virtual IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>

The current solution being optimized.

gradient Vector<T>

The calculated gradient.

Returns

IFullModel<T, TInput, TOutput>

A new solution with updated parameters.

Remarks

For Beginners: This method moves the model's parameters in the direction indicated by the gradient, hopefully improving the model's performance.