Class GradientBasedOptimizerBase<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Represents a base class for gradient-based optimization algorithms.
public abstract class GradientBasedOptimizerBase<T, TInput, TOutput> : OptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations, typically float or double.
TInput: The type of the input data.
TOutput: The type of the output data.
- Inheritance
- OptimizerBase<T, TInput, TOutput>
- GradientBasedOptimizerBase<T, TInput, TOutput>
- Implements
- IGradientBasedOptimizer<T, TInput, TOutput>
- IOptimizer<T, TInput, TOutput>
- IModelSerializer
Remarks
Gradient-based optimizers use the gradient of the loss function to update the model parameters in a direction that minimizes the loss. This base class provides common functionality for various gradient-based optimization techniques.
For Beginners: Think of gradient-based optimization like finding the bottom of a valley:
- You start at a random point on a hilly landscape (your initial model parameters)
- You look around to see which way is steepest downhill (calculate the gradient)
- You take a step in that direction (update the parameters)
- You repeat this process until you reach the bottom of the valley (optimize the model)
This approach helps the model learn by gradually adjusting its parameters to minimize errors.
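To make the loop concrete, here is a minimal sketch of that "look downhill, take a step" cycle using the public UpdateParameters(Vector<T>, Vector<T>) overload. The ComputeGradient helper and the GetParameters/SetParameters accessors are hypothetical stand-ins for however your model exposes its parameters and gradients; this is an illustration, not the library's training loop.
for (int iteration = 0; iteration < maxIterations; iteration++)
{
    // "Which way is steepest downhill?" (hypothetical helper returning dLoss/dParameters)
    Vector<double> gradient = ComputeGradient(myModel, x, y);
    // "Take a step in that direction" - the optimizer applies its update rule
    Vector<double> updated = optimizer.UpdateParameters(myModel.GetParameters(), gradient);
    myModel.SetParameters(updated);
}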
Constructors
GradientBasedOptimizerBase(IFullModel<T, TInput, TOutput>?, GradientBasedOptimizerOptions<T, TInput, TOutput>)
Initializes a new instance of the GradientBasedOptimizerBase class.
protected GradientBasedOptimizerBase(IFullModel<T, TInput, TOutput>? model, GradientBasedOptimizerOptions<T, TInput, TOutput> options)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize (can be null if set later).
options (GradientBasedOptimizerOptions<T, TInput, TOutput>): Options for the gradient-based optimizer.
Remarks
For Beginners: This sets up the gradient-based optimizer with its initial settings. It's like preparing for your hike by choosing your starting point, deciding how big your steps will be, and how much you'll consider your previous direction when choosing your next step.
Fields
GradientCache
A cache for storing and retrieving gradients to improve performance.
protected IGradientCache<T> GradientCache
Field Value
- IGradientCache<T>
GradientOptions
Options specific to gradient-based optimization algorithms.
protected GradientBasedOptimizerOptions<T, TInput, TOutput> GradientOptions
Field Value
- GradientBasedOptimizerOptions<T, TInput, TOutput>
LossFunction
The loss function used to compare the predicted values against the actual values.
protected ILossFunction<T> LossFunction
Field Value
- ILossFunction<T>
Regularization
The regularization technique applied to the parameters to keep them from growing uncontrollably.
protected IRegularization<T, TInput, TOutput> Regularization
Field Value
- IRegularization<T, TInput, TOutput>
_currentEpoch
The current epoch number for scheduler tracking.
protected int _currentEpoch
Field Value
- int
_currentStep
The current step (batch) number for scheduler tracking.
protected int _currentStep
Field Value
- int
_gpuState
GPU-resident optimizer state. Derived classes override to store their specific state.
protected IGpuBuffer? _gpuState
Field Value
- IGpuBuffer
_gpuStateInitialized
Whether GPU state has been initialized.
protected bool _gpuStateInitialized
Field Value
- bool
_lastComputedGradients
The gradients computed during the last optimization step.
protected Vector<T> _lastComputedGradients
Field Value
- Vector<T>
Remarks
This field stores the gradients calculated in the most recent call to CalculateGradient(). It enables external access to gradients for features like gradient clipping, distributed training (true DDP), debugging, and visualization. Returns Vector<T>.Empty() if no gradients have been computed yet.
_learningRateScheduler
The learning rate scheduler to use for adjusting learning rate during training.
protected ILearningRateScheduler? _learningRateScheduler
Field Value
- ILearningRateScheduler
Remarks
For Beginners: A learning rate scheduler automatically adjusts how fast your model learns during training. Common strategies include starting high and decreasing over time, or using warmup to slowly increase the learning rate at the beginning.
_mixedPrecisionContext
Mixed-precision training context (null if mixed-precision is disabled).
protected MixedPrecisionContext? _mixedPrecisionContext
Field Value
- MixedPrecisionContext
Remarks
For Beginners: Mixed-precision training uses both 16-bit (FP16) and 32-bit (FP32) floating-point numbers during optimization. This context manages the conversion between precisions and handles loss scaling to prevent numerical issues. When enabled, this can provide:
- 2-3x faster training on modern GPUs (V100, A100, RTX 3000+)
- ~50% memory reduction
- Maintained accuracy through careful precision management
_previousGradient
The gradient from the previous optimization step, used for momentum calculations.
protected Vector<T> _previousGradient
Field Value
- Vector<T>
_schedulerStepMode
Specifies when to step the learning rate scheduler.
protected SchedulerStepMode _schedulerStepMode
Field Value
- SchedulerStepMode
Remarks
Controls whether the scheduler updates after each batch, each epoch, or uses warmup followed by per-epoch stepping.
Properties
CurrentEpoch
Gets the current training epoch.
public int CurrentEpoch { get; }
Property Value
- int
CurrentStep
Gets the current training step (batch count).
public int CurrentStep { get; }
Property Value
- int
IsMixedPrecisionEnabled
Gets whether mixed-precision training is enabled for this optimizer.
public bool IsMixedPrecisionEnabled { get; }
Property Value
- bool
LastComputedGradients
Gets the gradients computed during the last optimization step.
public virtual Vector<T> LastComputedGradients { get; }
Property Value
- Vector<T>
Vector of gradients for each parameter. Returns an empty vector if no optimization has been performed yet.
Remarks
This property provides access to the gradients (partial derivatives) calculated during the most recent optimization. Essential for distributed training, gradient clipping, and debugging.
For Beginners: Gradients are "directions" showing how to adjust each parameter to improve the model. This property lets you see those directions after optimization runs.
Industry Standard: PyTorch, TensorFlow, and JAX all expose gradients for features like gradient clipping, true Distributed Data Parallel (DDP), and gradient compression.
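As a rough usage sketch, reading the gradients and their norm after an optimization step (using only the public members documented on this class) might look like this:
// Gradients from the most recent optimization step (empty if nothing has run yet)
Vector<double> grads = optimizer.LastComputedGradients;
// The L2 norm is a quick health check for exploding or vanishing gradients
double norm = optimizer.GetGradientNorm();
Console.WriteLine($"Last gradient norm: {norm}");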
LearningRateScheduler
Gets the current learning rate scheduler, if one is configured.
public ILearningRateScheduler? LearningRateScheduler { get; }
Property Value
- ILearningRateScheduler
SchedulerStepMode
Gets the current scheduler step mode.
public SchedulerStepMode SchedulerStepMode { get; }
Property Value
- SchedulerStepMode
SupportsGpuUpdate
Gets whether this optimizer supports GPU-accelerated parameter updates.
public virtual bool SupportsGpuUpdate { get; }
Property Value
- bool
Remarks
For Beginners: Override this in derived classes that have GPU kernel implementations. The base class returns false since it has no specific GPU kernel.
Methods
ApplyGradientClipping(Vector<T>)
Applies gradient clipping based on the configured options.
protected virtual Vector<T> ApplyGradientClipping(Vector<T> gradient)
Parameters
gradient (Vector<T>): The gradient to clip.
Returns
- Vector<T>
The clipped gradient.
Remarks
For Beginners: Gradient clipping prevents training instability by limiting how large gradients can become. This is especially important for deep networks and RNNs where gradients can "explode" (become extremely large) during backpropagation.
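The exact behavior depends on the configured options, but clipping by global norm is one common strategy. The following is a generic sketch of that idea, not the library's actual implementation:
// Clip-by-norm: if ||g|| exceeds maxNorm, rescale g so its norm equals maxNorm
static double[] ClipByNorm(double[] gradient, double maxNorm)
{
    double sumSquares = 0;
    foreach (double g in gradient) sumSquares += g * g;
    double norm = Math.Sqrt(sumSquares);
    if (norm <= maxNorm) return gradient;
    double scale = maxNorm / norm;
    double[] clipped = new double[gradient.Length];
    for (int i = 0; i < gradient.Length; i++) clipped[i] = gradient[i] * scale;
    return clipped;
}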
ApplyGradients(Vector<T>, IFullModel<T, TInput, TOutput>)
Applies pre-computed gradients to a model's parameters.
public virtual IFullModel<T, TInput, TOutput> ApplyGradients(Vector<T> gradients, IFullModel<T, TInput, TOutput> model)
Parameters
gradients (Vector<T>): Gradients to apply (must match the model's parameter count).
model (IFullModel<T, TInput, TOutput>): Model whose parameters should be updated.
Returns
- IFullModel<T, TInput, TOutput>
Model with updated parameters
Remarks
Allows applying externally-computed or modified gradients (averaged, compressed, clipped, etc.) to update model parameters. Essential for production distributed training.
For Beginners: This takes pre-calculated "directions" (gradients) and uses them to update the model. Like having a GPS tell you which way to go, this method moves you there.
Production Use Cases:
- True DDP: Average gradients across GPUs, then apply
- Gradient Compression: Compress, sync, decompress, then apply
- Federated Learning: Average gradients from clients before applying
- Gradient Clipping: Clip gradients to prevent exploding, then apply
Exceptions
- ArgumentNullException
If gradients or model is null
- ArgumentException
If gradient size doesn't match parameters
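A rough usage sketch of the clip-then-apply pattern; ReceiveGradientsFromWorker and ClipGradients are hypothetical placeholders for your own gradient source and clipping helper:
// Obtain gradients from an external source, optionally modify them,
// then let the optimizer apply its update rule to the model
Vector<double> externalGradients = ReceiveGradientsFromWorker(); // hypothetical
Vector<double> clipped = ClipGradients(externalGradients, maxNorm: 5.0); // hypothetical
model = optimizer.ApplyGradients(clipped, model);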
ApplyGradients(Vector<T>, Vector<T>, IFullModel<T, TInput, TOutput>)
Applies pre-computed gradients to explicit original parameters (double-step safe).
public virtual IFullModel<T, TInput, TOutput> ApplyGradients(Vector<T> originalParameters, Vector<T> gradients, IFullModel<T, TInput, TOutput> model)
Parameters
originalParameters (Vector<T>): Pre-update parameters to start from.
gradients (Vector<T>): Gradients to apply.
model (IFullModel<T, TInput, TOutput>): Model template (used only for structure; its parameters are ignored).
Returns
- IFullModel<T, TInput, TOutput>
New model with updated parameters
Remarks
⚠️ RECOMMENDED for Distributed Training: This overload accepts originalParameters explicitly, making it impossible to accidentally apply gradients twice. Use this in distributed optimizers where you need explicit control over which parameter state to start from.
Prevents the double-stepping bug:
- WRONG: ApplyGradients(g_avg, modelWithLocalUpdate) → double step!
- RIGHT: ApplyGradients(originalParams, g_avg, modelTemplate) → single step!
Distributed pattern (sketched below):
1. Save originalParams before local optimization.
2. Run local optimization to get localGradients.
3. Synchronize gradients to get avgGradients.
4. Call ApplyGradients(originalParams, avgGradients, model) for the correct result.
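A sketch of that pattern; the GetParameters accessor and the AllReduceAverage synchronization call are hypothetical stand-ins for your model's parameter access and your distributed backend:
// 1. Save the parameters before local optimization
Vector<double> originalParams = model.GetParameters(); // hypothetical accessor
// 2. Run local optimization, then read the locally computed gradients
Vector<double> localGradients = optimizer.LastComputedGradients;
// 3. Synchronize: average gradients across workers
Vector<double> avgGradients = AllReduceAverage(localGradients); // hypothetical
// 4. Apply the averaged gradients starting from the ORIGINAL parameters (single step, no double-stepping)
model = optimizer.ApplyGradients(originalParams, avgGradients, model);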
ApplyMomentum(Vector<T>)
Applies momentum to the gradient calculation.
protected virtual Vector<T> ApplyMomentum(Vector<T> gradient)
Parameters
gradient (Vector<T>): The current gradient.
Returns
- Vector<T>
The gradient adjusted for momentum.
Remarks
For Beginners: This method considers the direction you were moving in previously when deciding which way to go next. It's like considering your momentum when hiking - you might keep going in roughly the same direction rather than abruptly changing course.
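Conceptually, classic momentum blends the previous update direction into the new one. A minimal sketch of that rule (not necessarily the exact formulation used by this class):
// velocity = momentum * previousVelocity + gradient; the velocity then replaces the raw gradient
static double[] ApplyMomentumSketch(double[] gradient, double[] previousVelocity, double momentum)
{
    double[] velocity = new double[gradient.Length];
    for (int i = 0; i < gradient.Length; i++)
        velocity[i] = momentum * previousVelocity[i] + gradient[i];
    return velocity;
}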
AreGradientsExploding(double)
Checks if the current gradients are exhibiting exploding gradient behavior.
public bool AreGradientsExploding(double threshold = 1000)
Parameters
threshold (double): The threshold above which gradients are considered exploding. Default is 1000.
Returns
- bool
True if gradients are exploding, false otherwise.
Remarks
For Beginners: This method helps detect when training is becoming unstable. If gradients become too large, it usually indicates a problem with the learning rate or model architecture that needs to be addressed.
AreGradientsVanishing(double)
Checks if the current gradients are exhibiting vanishing gradient behavior.
public bool AreGradientsVanishing(double threshold = 1E-07)
Parameters
threshold (double): The threshold below which gradients are considered vanishing. Default is 1e-7.
Returns
- bool
True if gradients are vanishing, false otherwise.
Remarks
For Beginners: Vanishing gradients occur when gradients become so small that learning effectively stops. This is common in deep networks and can indicate the need for techniques like residual connections, batch normalization, or different activation functions.
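A rough monitoring sketch that combines both checks inside a training loop, using the documented default thresholds:
if (optimizer.AreGradientsExploding(threshold: 1000))
{
    Console.WriteLine("Warning: exploding gradients - consider lowering the learning rate or enabling clipping.");
}
if (optimizer.AreGradientsVanishing(threshold: 1e-7))
{
    Console.WriteLine("Warning: vanishing gradients - consider residual connections or a different activation.");
}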
CalculateGradient(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Calculates the gradient for the given model and input data.
protected virtual Vector<T> CalculateGradient(IFullModel<T, TInput, TOutput> solution, TInput X, TOutput y)
Parameters
solution (IFullModel<T, TInput, TOutput>): The current solution.
X (TInput): The input features.
y (TOutput): The target values.
Returns
- Vector<T>
The calculated gradient.
Remarks
For Beginners: This method calculates how steep the hill is and in which direction. It helps determine which way the optimizer should step to improve the model.
CalculateGradient(IFullModel<T, TInput, TOutput>, TInput, TOutput, int[])
Calculates the gradient for a given solution using a batch of training data.
protected virtual Vector<T> CalculateGradient(IFullModel<T, TInput, TOutput> solution, TInput xTrain, TOutput yTrain, int[] batchIndices)
Parameters
solution (IFullModel<T, TInput, TOutput>): The current solution (model).
xTrain (TInput): The training input data.
yTrain (TOutput): The training target data.
batchIndices (int[]): The indices to use for the current batch.
Returns
- Vector<T>
A vector representing the gradient of the loss function with respect to the model parameters.
Remarks
For Beginners: The gradient tells us which direction to adjust our model's parameters to improve performance. It's like a compass showing the way to a better solution.
ComputeHessianEfficiently(IFullModel<T, TInput, TOutput>, OptimizationInputData<T, TInput, TOutput>)
Computes the Hessian matrix (second derivatives) more efficiently when the model supports explicit gradient computation.
protected virtual Matrix<T> ComputeHessianEfficiently(IFullModel<T, TInput, TOutput> model, OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to compute the Hessian for.
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for optimization.
Returns
- Matrix<T>
The Hessian matrix.
Remarks
For Beginners: The Hessian tells us how the gradient changes - it's the "curvature" of the loss landscape. This is crucial for second-order optimization methods like Newton's method.
Production Enhancement: If the model implements IGradientComputable, this method computes the Hessian by taking gradients of the gradient (using finite differences on the gradient function), which is much more efficient than the traditional double finite differences approach. This is O(n) gradient evaluations instead of O(n²) loss evaluations.
Note: For models implementing IGradientComputable with ComputeSecondOrderGradients support, true Hessian-vector products could be computed even more efficiently. This is currently a middle ground that works with any model implementing ComputeGradients.
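In outline, the column-by-column construction looks like the following sketch, where gradFn is a hypothetical stand-in for the model's gradient function; it uses one extra gradient evaluation per parameter, matching the O(n) cost described above:
// H[:, j] ≈ (grad(theta + eps*e_j) - grad(theta)) / eps
static double[,] HessianFromGradients(Func<double[], double[]> gradFn, double[] theta, double eps = 1e-5)
{
    int n = theta.Length;
    double[] baseGrad = gradFn(theta);
    double[,] hessian = new double[n, n];
    for (int j = 0; j < n; j++)
    {
        double[] perturbed = (double[])theta.Clone();
        perturbed[j] += eps;
        double[] shiftedGrad = gradFn(perturbed);
        for (int i = 0; i < n; i++)
            hessian[i, j] = (shiftedGrad[i] - baseGrad[i]) / eps;
    }
    return hessian;
}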
ComputeHessianFiniteDifferences(IFullModel<T, TInput, TOutput>, OptimizationInputData<T, TInput, TOutput>)
Computes the Hessian matrix using traditional finite differences (fallback method).
protected virtual Matrix<T> ComputeHessianFiniteDifferences(IFullModel<T, TInput, TOutput> model, OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to compute the Hessian for.
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for optimization.
Returns
- Matrix<T>
The Hessian matrix.
Remarks
For Beginners: This is the slower but more universally applicable method. It approximates the curvature by testing small changes in parameters.
CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int)
Creates a data batcher for the given optimization input data using configured sampling options.
protected OptimizationDataBatcher<T, TInput, TOutput> CreateBatcher(OptimizationInputData<T, TInput, TOutput> inputData, int batchSize)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The optimization input data to batch.
batchSize (int): The batch size for training.
Returns
- OptimizationDataBatcher<T, TInput, TOutput>
An OptimizationDataBatcher configured with the optimizer's sampling options.
Remarks
For Beginners: This method creates a helper that splits your training data into smaller batches for efficient training. The batching behavior is controlled by:
- DataSampler (if set): Advanced sampling strategies like weighted/curriculum learning
- ShuffleData: Whether to randomize the order each epoch
- DropLastBatch: Whether to discard incomplete final batches
- RandomSeed: For reproducible randomization
Example usage:
var batcher = CreateBatcher(inputData, batchSize: 32);
foreach (var (xBatch, yBatch, indices) in batcher.GetBatches())
{
    var gradient = CalculateGradient(model, xBatch, yBatch);
    model = UpdateSolution(model, gradient);
}
CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int, IDataSampler)
Creates a data batcher with a custom sampler, overriding the configured options.
protected OptimizationDataBatcher<T, TInput, TOutput> CreateBatcher(OptimizationInputData<T, TInput, TOutput> inputData, int batchSize, IDataSampler sampler)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The optimization input data to batch.
batchSize (int): The batch size for training.
sampler (IDataSampler): The custom sampler to use for advanced sampling strategies.
Returns
- OptimizationDataBatcher<T, TInput, TOutput>
An OptimizationDataBatcher with the custom sampler.
Remarks
For Beginners: Use this when you want to try a different sampling strategy without changing the optimizer's default configuration.
Example:
// Create a curriculum learning sampler
var sampler = Samplers.Curriculum(difficulties, totalEpochs: 100);
var batcher = CreateBatcher(inputData, batchSize: 32, sampler: sampler);
// Use balanced sampling for class imbalance
var sampler = Samplers.Balanced(labels, numClasses: 10);
var batcher = CreateBatcher(inputData, batchSize: 32, sampler: sampler);
CreateRegularization(GradientDescentOptimizerOptions<T, TInput, TOutput>)
Creates a regularization technique based on the provided options.
protected IRegularization<T, TInput, TOutput> CreateRegularization(GradientDescentOptimizerOptions<T, TInput, TOutput> options)
Parameters
options (GradientDescentOptimizerOptions<T, TInput, TOutput>): The options specifying the regularization technique to use.
Returns
- IRegularization<T, TInput, TOutput>
An instance of the specified regularization technique.
Remarks
For Beginners: This method sets up a way to prevent the model from becoming too complex. It's like adding rules to your hiking strategy to avoid taking unnecessarily complicated paths.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public virtual void DisposeGpuState()
Remarks
For Beginners: The base implementation disposes _gpuState if set. Derived classes with multiple state buffers should override.
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients.
protected virtual string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>): The current model.
X (TInput): The input features.
y (TOutput): The target values.
Returns
- string
A string key for caching the gradient.
Remarks
For Beginners: This method creates a unique identifier for each gradient calculation. It's like labeling each spot on the hill so you can remember what the gradient was there.
GetCurrentLearningRate()
Gets the current learning rate being used by this optimizer.
public double GetCurrentLearningRate()
Returns
- double
The current learning rate.
Remarks
For Beginners: The learning rate controls how big each update step is. This value may change during training if a learning rate scheduler is configured.
GetGradientNorm()
Gets the L2 norm of the last computed gradients.
public T GetGradientNorm()
Returns
- T
The gradient norm, or 0 if no gradients have been computed.
Remarks
For Beginners: The gradient norm is a measure of how "strong" the overall gradient is. Monitoring this value during training can help diagnose issues with exploding or vanishing gradients.
InitializeGpuState(int, IDirectGpuBackend)
Initializes optimizer state on the GPU for a given parameter count.
public virtual void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int): Number of parameters to initialize state for.
backend (IDirectGpuBackend): The GPU backend to use for memory allocation.
Remarks
For Beginners: The base implementation does nothing. Derived classes that maintain optimizer state (like momentum or adaptive learning rates) override this.
IsInWarmupPhase()
Determines whether the scheduler is currently in the warmup phase.
protected virtual bool IsInWarmupPhase()
Returns
- bool
True if in warmup phase, false otherwise.
Remarks
Warmup is a technique where the learning rate starts very low and gradually increases to the base learning rate over a specified number of steps. This helps stabilize training in the early phases.
Detection Logic: For LinearWarmupScheduler, this method uses the explicit warmup step count for accurate detection. For other schedulers, warmup detection is not supported and this method returns false. The heuristic of comparing current LR to base LR was removed because it incorrectly identifies decay phases (e.g., cosine annealing) as warmup when the learning rate drops below the base learning rate.
LineSearch(IFullModel<T, TInput, TOutput>, Vector<T>, Vector<T>, OptimizationInputData<T, TInput, TOutput>)
Performs a line search to find an appropriate step size.
protected T LineSearch(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> direction, Vector<T> gradient, OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>): The current solution.
direction (Vector<T>): The search direction.
gradient (Vector<T>): The current gradient.
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for the optimization process.
Returns
- T
The step size to use.
Remarks
For Beginners: This method determines how big of a step to take in the chosen direction. It tries to find a step size that sufficiently decreases the function value while not being too small.
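A common way to implement this is backtracking with the Armijo (sufficient decrease) condition. Whether this class uses exactly that rule is not stated here, so treat the following as a generic sketch:
// Shrink the step until f(theta + alpha*d) <= f(theta) + c * alpha * (g . d), with c typically around 1e-4
static double BacktrackingLineSearch(Func<double[], double> loss, double[] theta, double[] direction, double[] gradient,
    double alpha = 1.0, double shrink = 0.5, double c = 1e-4)
{
    double f0 = loss(theta);
    double slope = 0;
    for (int i = 0; i < theta.Length; i++) slope += gradient[i] * direction[i];
    while (alpha > 1e-10)
    {
        double[] candidate = new double[theta.Length];
        for (int i = 0; i < theta.Length; i++) candidate[i] = theta[i] + alpha * direction[i];
        if (loss(candidate) <= f0 + c * alpha * slope) break; // sufficient decrease reached
        alpha *= shrink;
    }
    return alpha;
}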
NotifyEpochStart(int)
Notifies the sampler that a new epoch has started (for epoch-aware samplers).
protected void NotifyEpochStart(int currentEpoch)
Parameters
currentEpoch (int): The current epoch number (0-based).
Remarks
Call this at the beginning of each training epoch when using adaptive samplers like curriculum learning or self-paced learning that adjust their behavior over time.
OnBatchEnd()
Called at the end of each training batch to update scheduler state if applicable.
public virtual void OnBatchEnd()
Remarks
When to call this method: This method must be called after each batch if you are using StepPerBatch, or during the warmup phase when using WarmupThenEpoch. Failure to call this method will prevent the learning rate scheduler from advancing on a per-batch basis.
For Beginners: A batch is a small subset of your training data processed at once. Some schedulers (like warmup or cyclical learning rates) need to update after every batch for smooth, fine-grained control of the learning rate.
OnEpochEnd()
Called at the end of each training epoch to update scheduler state if applicable.
public virtual void OnEpochEnd()
Remarks
When to call this method: This method must be called at the end of each epoch if you are using StepPerEpoch or WarmupThenEpoch. Failure to call this method will prevent the learning rate scheduler from advancing, resulting in a constant learning rate throughout training.
For Beginners: An epoch is one complete pass through all your training data. Many learning rate schedules (like step decay or cosine annealing) work on an epoch basis, reducing the learning rate after each complete pass through the data.
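Putting OnBatchEnd and OnEpochEnd together, a typical training loop calls them like this; the TrainOneBatch call is a hypothetical placeholder for your per-batch work:
for (int epoch = 0; epoch < totalEpochs; epoch++)
{
    foreach (var batch in batches)
    {
        TrainOneBatch(batch);      // hypothetical per-batch training work
        optimizer.OnBatchEnd();    // needed for StepPerBatch, or during warmup with WarmupThenEpoch
    }
    optimizer.OnEpochEnd();        // needed for StepPerEpoch or WarmupThenEpoch
    Console.WriteLine($"Epoch {epoch}: learning rate = {optimizer.GetCurrentLearningRate()}");
}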
Reset()
Resets the optimizer to its initial state.
public override void Reset()
Remarks
For Beginners: This method clears all the remembered information and starts fresh. It's like wiping your map clean and starting your hike from the beginning.
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a gradient update to recover original parameters.
public virtual Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>): Parameters after gradient application.
appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
Estimated original parameters
Remarks
This base implementation uses the vanilla SGD reversal formula: params_old = params_new + learning_rate * gradients
For Adaptive Optimizers (Adam, RMSprop, etc.): This method should be overridden to account for optimizer-specific state. The base implementation is only accurate for vanilla SGD.
For Beginners: This calculates where the parameters were before a gradient update was applied. Think of it like rewinding a step you took.
StepScheduler()
Steps the learning rate scheduler and updates the current learning rate.
public double StepScheduler()
Returns
- double
The new learning rate after stepping.
Remarks
This method advances the scheduler by one step and synchronizes the optimizer's learning rate with the scheduler's current value.
For Beginners: Call this method to update the learning rate according to the scheduler's policy. The scheduler will automatically adjust the learning rate based on how many steps have been taken.
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the options for the gradient-based optimizer.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The new options to apply to the optimizer.
Remarks
For Beginners: This method allows you to change the settings of the optimizer while it's running. It's like adjusting your hiking strategy mid-journey based on the terrain you encounter.
UpdateParameters(Matrix<T>, Matrix<T>)
Updates a matrix of parameters based on the calculated gradient.
public virtual Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)
Parameters
parameters (Matrix<T>): The current parameters.
gradient (Matrix<T>): The calculated gradient.
Returns
- Matrix<T>
The updated parameters.
Remarks
For Beginners: This method adjusts the model's parameters to improve its performance. It's like taking a step in the direction you've determined will lead you downhill.
UpdateParameters(Tensor<T>, Tensor<T>)
Updates a tensor of parameters based on the calculated gradient.
public virtual Tensor<T> UpdateParameters(Tensor<T> parameters, Tensor<T> gradient)
Parameters
parameters (Tensor<T>): The current tensor parameters.
gradient (Tensor<T>): The calculated gradient tensor.
Returns
- Tensor<T>
The updated tensor parameters.
Remarks
For Beginners: This method adjusts the model's parameters stored in tensor format to improve its performance. It's like taking a step in the direction you've determined will lead you downhill, but for more complex multi-dimensional data structures. Tensors are useful for representing parameters in deep neural networks where data has multiple dimensions (like images with width, height, and channels).
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters based on the calculated gradient.
public virtual Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector<T>): The current parameters.
gradient (Vector<T>): The calculated gradient.
Returns
- Vector<T>
The updated parameters.
Remarks
For Beginners: This method is similar to the matrix overload of UpdateParameters, but for parameters stored in a vector instead of a matrix. It's another way of taking a step to improve the model.
UpdateParameters(List<ILayer<T>>)
Updates the parameters of the model based on the calculated gradients.
public virtual void UpdateParameters(List<ILayer<T>> layers)
Parameters
layers (List<ILayer<T>>): The layers whose parameters will be updated.
Remarks
For Beginners: This method adjusts the model's parameters to improve its performance. It's like taking steps in the direction that will lead to better results, based on what we've learned from the data.
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using optimizer-specific GPU kernels.
public virtual void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer): GPU buffer containing parameters to update (modified in-place).
gradients (IGpuBuffer): GPU buffer containing gradients.
parameterCount (int): Number of parameters.
backend (IDirectGpuBackend): The GPU backend to use for execution.
Remarks
For Beginners: The base implementation throws since there's no generic GPU kernel. Derived classes that support GPU updates override this method.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution based on the calculated gradient.
protected virtual IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolutionIFullModel<T, TInput, TOutput>The current solution being optimized.
gradientVector<T>The calculated gradient.
Returns
- IFullModel<T, TInput, TOutput>
A new solution with updated parameters.
Remarks
For Beginners: This method moves the model's parameters in the direction indicated by the gradient, hopefully improving the model's performance.