
Interface IGradientBasedOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll
public interface IGradientBasedOptimizer<T, TInput, TOutput> : IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T
TInput
TOutput

Properties

LastComputedGradients

Gets the gradients computed during the last optimization step.

Vector<T> LastComputedGradients { get; }

Property Value

Vector<T>

Vector of gradients for each parameter. Returns an empty vector if no optimization step has been performed yet.

Remarks

This property provides access to the gradients (partial derivatives) computed during the most recent optimization step. It is essential for distributed training, gradient clipping, and debugging.

For Beginners: Gradients are "directions" showing how to adjust each parameter to improve the model. This property lets you see those directions after optimization runs.

Industry Standard: PyTorch, TensorFlow, and JAX all expose gradients for features like gradient clipping, true Distributed Data Parallel (DDP), and gradient compression.
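
As a rough illustration, the snippet below reads back the latest gradients after a step, e.g. to log their norm before clipping. It is a minimal sketch only: the double/Matrix/Vector type arguments are illustrative, and it assumes Vector<T> exposes a Length property and an indexer.

```csharp
// Minimal sketch: inspect the most recent gradients (e.g. to log their L2 norm).
// Assumes Vector<double> exposes Length and an indexer; adjust to the actual API.
void LogGradientNorm(IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer)
{
    Vector<double> grads = optimizer.LastComputedGradients;

    double sumSquares = 0.0;
    for (int i = 0; i < grads.Length; i++)
        sumSquares += grads[i] * grads[i];

    Console.WriteLine($"Gradient L2 norm: {Math.Sqrt(sumSquares)}");
}
```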

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

bool SupportsGpuUpdate { get; }

Property Value

bool

Remarks

For Beginners: This indicates whether the optimizer can update parameters directly on the GPU without transferring data to the CPU. GPU updates are much faster for large models.
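
A minimal sketch of branching on this flag, assuming the caller already holds GPU buffers for the GPU path and plain vectors for the CPU fallback (the type arguments are illustrative):

```csharp
// Sketch: pick the update path based on the optimizer's capability flag.
void Step(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    ref Vector<double> parameters, Vector<double> gradients,
    IGpuBuffer paramBuffer, IGpuBuffer gradBuffer, int parameterCount, IDirectGpuBackend backend)
{
    if (optimizer.SupportsGpuUpdate)
    {
        // Parameters and gradients stay resident on the GPU (see UpdateParametersGpu below).
        optimizer.UpdateParametersGpu(paramBuffer, gradBuffer, parameterCount, backend);
    }
    else
    {
        // Fall back to the CPU-side vector update.
        parameters = optimizer.UpdateParameters(parameters, gradients);
    }
}
```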

Methods

ApplyGradients(Vector<T>, IFullModel<T, TInput, TOutput>)

Applies pre-computed gradients to a model's parameters.

IFullModel<T, TInput, TOutput> ApplyGradients(Vector<T> gradients, IFullModel<T, TInput, TOutput> model)

Parameters

gradients Vector<T>

Gradients to apply (must match model parameter count)

model IFullModel<T, TInput, TOutput>

Model whose parameters should be updated

Returns

IFullModel<T, TInput, TOutput>

Model with updated parameters

Remarks

Allows externally computed or modified gradients (averaged, compressed, clipped, etc.) to be applied to the model's parameters. This is essential for production distributed training.

For Beginners: This takes pre-calculated "directions" (gradients) and uses them to update the model. Like having a GPS tell you which way to go, this method moves you there.

Production Use Cases (see the sketch below):

  • **True DDP**: Average gradients across GPUs, then apply
  • **Gradient Compression**: Compress, sync, decompress, then apply
  • **Federated Learning**: Average gradients from clients before applying
  • **Gradient Clipping**: Clip gradients to prevent exploding gradients, then apply
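
A minimal DDP-style sketch, assuming Vector<double> exposes a length-based constructor, Length, and an indexer (those details are illustrative, not part of this interface) and that the usual System.Collections.Generic using directive is in scope:

```csharp
// Sketch: average per-worker gradients, then apply them in a single optimizer step.
IFullModel<double, Matrix<double>, Vector<double>> ApplyAveragedGradients(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    IReadOnlyList<Vector<double>> workerGradients,
    IFullModel<double, Matrix<double>, Vector<double>> model)
{
    int n = workerGradients[0].Length;
    var averaged = new Vector<double>(n);  // zero-initialized; constructor assumed for illustration
    foreach (var g in workerGradients)
        for (int i = 0; i < n; i++)
            averaged[i] += g[i] / workerGradients.Count;

    // One optimizer step with the synchronized gradients.
    return optimizer.ApplyGradients(averaged, model);
}
```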

Exceptions

ArgumentNullException

If gradients or model is null

ArgumentException

If gradient size doesn't match parameters

ApplyGradients(Vector<T>, Vector<T>, IFullModel<T, TInput, TOutput>)

Applies pre-computed gradients to explicit original parameters (double-step safe).

IFullModel<T, TInput, TOutput> ApplyGradients(Vector<T> originalParameters, Vector<T> gradients, IFullModel<T, TInput, TOutput> model)

Parameters

originalParameters Vector<T>

Pre-update parameters to start from

gradients Vector<T>

Gradients to apply

model IFullModel<T, TInput, TOutput>

Model template (only used for structure, parameters ignored)

Returns

IFullModel<T, TInput, TOutput>

New model with updated parameters

Remarks

⚠️ RECOMMENDED for Distributed Training: This overload accepts originalParameters explicitly, making it impossible to accidentally apply gradients twice. Use this in distributed optimizers where you need explicit control over which parameter state to start from.

Prevents the double-stepping bug:

  • WRONG: ApplyGradients(g_avg, modelWithLocalUpdate) → double step!
  • RIGHT: ApplyGradients(originalParams, g_avg, modelTemplate) → single step!

Distributed Pattern (see the sketch below):

  1. Save originalParams before local optimization
  2. Run local optimization → get localGradients
  3. Synchronize gradients → get avgGradients
  4. Call ApplyGradients(originalParams, avgGradients, model) → correct result!
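
A sketch of that pattern, where model.GetParameters() and AllReduceAverage(...) stand in for your model API and communication layer (placeholder names, not part of this interface):

```csharp
// Sketch of the distributed pattern above.
IFullModel<double, Matrix<double>, Vector<double>> DistributedStep(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    IFullModel<double, Matrix<double>, Vector<double>> model)
{
    Vector<double> originalParams = model.GetParameters();          // 1. snapshot before the local step (placeholder call)

    // 2. run the local optimization step via the base IOptimizer API, then read the gradients back
    Vector<double> localGradients = optimizer.LastComputedGradients;

    Vector<double> avgGradients = AllReduceAverage(localGradients); // 3. synchronize across workers (placeholder)

    // 4. apply the averaged gradients to the ORIGINAL parameters; the model argument
    //    is only a structural template, so there is no double step.
    return optimizer.ApplyGradients(originalParams, avgGradients, model);
}
```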

DisposeGpuState()

Disposes GPU-allocated optimizer state.

void DisposeGpuState()

Remarks

For Beginners: Frees GPU memory used by the optimizer's internal state. Call this when you're done training or want to reclaim GPU memory.

InitializeGpuState(int, IDirectGpuBackend)

Initializes optimizer state on the GPU for a given parameter count.

void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int

Number of parameters to initialize state for.

backend IDirectGpuBackend

The GPU backend to use for memory allocation.

Remarks

For Beginners: Many optimizers maintain internal state (like momentum or adaptive learning rates). This method allocates that state on the GPU so that all updates can happen without CPU transfers.

ReverseUpdate(Vector<T>, Vector<T>)

Reverses a gradient update to recover original parameters before the update was applied.

Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

Parameters after gradient application

appliedGradients Vector<T>

The gradients that were applied to produce updated parameters

Returns

Vector<T>

Original parameters before the gradient update

Remarks

This method computes the original parameters given updated parameters and the gradients that were applied. Each optimizer implements this differently based on its update rule.

For Beginners: This is like "undo" for a gradient update. Given where you are now (updated parameters) and the directions you took (gradients), it calculates where you started.

Optimizer-Specific Behavior:

  • **SGD**: params_old = params_new + learning_rate * gradients
  • **Adam**: Requires reversing momentum and adaptive learning rate adjustments
  • **RMSprop**: Requires reversing adaptive learning rate based on gradient history

Production Use Cases:

  • **Distributed Training**: Reverse local updates before applying synchronized gradients
  • **Checkpointing**: Recover previous parameter states
  • **Debugging**: Validate gradient application correctness
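
As a sketch, a round-trip check of the kind mentioned under Debugging above, assuming Vector<T> exposes Length and an indexer and that System.Diagnostics is in scope:

```csharp
// Sketch: verify that ReverseUpdate undoes an update step (round-trip check).
// For plain SGD the reversal is paramsOld = paramsNew + learningRate * gradients, as noted above.
void CheckReversal(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    Vector<double> parameters, Vector<double> gradients)
{
    Vector<double> updated = optimizer.UpdateParameters(parameters, gradients);
    Vector<double> recovered = optimizer.ReverseUpdate(updated, gradients);

    // recovered should match the original parameters up to floating-point error.
    for (int i = 0; i < parameters.Length; i++)
        Debug.Assert(Math.Abs(recovered[i] - parameters[i]) < 1e-6);
}
```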

Exceptions

ArgumentNullException

If parameters or gradients are null

ArgumentException

If parameter and gradient sizes don't match

UpdateParameters(Matrix<T>, Matrix<T>)

Updates matrix parameters based on their gradients.

Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)

Parameters

parameters Matrix<T>

The current matrix parameter values.

gradient Matrix<T>

The gradient matrix indicating the direction of steepest increase in error.

Returns

Matrix<T>

The updated matrix parameter values.

Remarks

For Beginners: This method adjusts a grid of numbers (matrix parameters) to make your model better.

The parameters:

  • parameters: The current matrix settings of your model (like weights in a neural network layer)
  • gradient: Information about which direction to change each matrix element to reduce errors

What this method does:

  1. Takes your current matrix parameters
  2. Looks at the gradient matrix to see which direction would reduce errors for each element
  3. Decides how big of a step to take in that direction for each element
  4. Returns a new, improved matrix of parameter values

Think of it like adjusting multiple rows and columns of knobs on a complex control panel:

  • The parameters matrix represents the current positions of all these knobs
  • The gradient matrix tells you which knobs to turn up or down and by how much
  • This method returns the new positions for all the knobs, organized in the same grid

This matrix-specific version avoids the need to flatten matrices into vectors and then reshape them back, making the code more efficient and easier to understand when working with matrix parameters like neural network weights.
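
For example, a layer's weight matrix can be stepped directly, with no flatten/reshape round-trip (a minimal sketch; the type arguments and method name are illustrative):

```csharp
// Sketch: update a layer's weight matrix in one call.
Matrix<double> StepWeights(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    Matrix<double> weights,          // current layer weights
    Matrix<double> weightGradients)  // gradients from the backward pass
{
    // The optimizer decides the per-element step size (SGD, Adam, etc.).
    return optimizer.UpdateParameters(weights, weightGradients);
}
```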

UpdateParameters(Vector<T>, Vector<T>)

Updates parameters based on their gradients.

Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The current parameter values.

gradient Vector<T>

The gradient indicating the direction of steepest increase in error.

Returns

Vector<T>

The updated parameter values.

Remarks

For Beginners: This method adjusts a set of numbers (parameters) to make your model better.

The parameters:

  • parameters: The current settings of your model (like weights in a neural network)
  • gradient: Information about which direction to change each parameter to reduce errors

What this method does:

  1. Takes your current model parameters
  2. Looks at the gradient to see which direction would reduce errors
  3. Decides how big of a step to take in that direction
  4. Returns new, improved parameter values

Think of it like adjusting the volume, bass, and treble knobs on a stereo:

  • The parameters are the current knob positions
  • The gradient tells you which knobs to turn up or down
  • This method returns the new positions for all the knobs

Different optimizers (like Adam, SGD, or RMSProp) will make different decisions about how far to turn each knob based on the gradient information.

This method is flexible enough to handle different data structures (e.g., vectors, matrices, or tensors) depending on the type of model and the specific implementation of the optimizer.
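
To make the knob analogy concrete, here is roughly what a plain SGD implementation of this method boils down to. This is an illustrative sketch, not the library's implementation, and it assumes Vector<double> has a length-based constructor and an indexer:

```csharp
// Sketch of a plain SGD step: newParams[i] = parameters[i] - learningRate * gradient[i].
// Adam, RMSProp, etc. scale the step per parameter instead of using one fixed rate.
Vector<double> SgdStep(Vector<double> parameters, Vector<double> gradient, double learningRate)
{
    var updated = new Vector<double>(parameters.Length);  // constructor/indexer assumed for illustration
    for (int i = 0; i < parameters.Length; i++)
        updated[i] = parameters[i] - learningRate * gradient[i];
    return updated;
}
```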

UpdateParameters(List<ILayer<T>>)

Updates the parameters of all layers in a model based on their calculated gradients.

void UpdateParameters(List<ILayer<T>> layers)

Parameters

layers List<ILayer<T>>

A list of layers in the model whose parameters need to be updated.

Remarks

For Beginners: This method adjusts the settings (parameters) of each part (layer) of your model to make it better at its task.

What this method does:

  1. Goes through each layer of your model
  2. For each layer that can be trained:
    • Looks at how the layer's current settings are contributing to errors (the gradients)
    • Decides how much to change each setting to reduce errors
    • Updates the layer's settings with these new values

Think of it like tuning a complex machine with many knobs:

  • Each layer is a set of knobs
  • The gradients tell you which way to turn each knob
  • This method goes through and adjusts all the knobs to make the machine work better

This method is crucial in the training process because:

  • It applies the learning from the backward pass to actually improve the model
  • It handles the intricacies of updating different types of layers (e.g., convolutional, recurrent)
  • It ensures that all trainable parts of your model are updated consistently

Different optimizers may implement this method differently, using various strategies to determine the best way to update parameters based on the gradients and potentially other factors like momentum or adaptive learning rates.
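
A minimal sketch of handing a model's layers to the optimizer after the backward pass has produced their gradients (type arguments and method name are illustrative; assumes System.Collections.Generic is in scope):

```csharp
// Sketch: one optimizer step over every trainable layer in the model.
void StepAllLayers(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    List<ILayer<double>> layers)
{
    // The optimizer walks the layers and applies its update rule
    // (plain SGD, momentum, Adam, ...) to every trainable parameter.
    optimizer.UpdateParameters(layers);
}
```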

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using optimizer-specific GPU kernels.

void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

GPU buffer containing parameters to update (modified in-place).

gradients IGpuBuffer

GPU buffer containing gradients.

parameterCount int

Number of parameters.

backend IDirectGpuBackend

The GPU backend to use for execution.

Remarks

For Beginners: This method performs the same parameter update as UpdateParameters, but executes directly on the GPU for maximum performance. The parameters and gradients must already be on the GPU.

Production Use Cases (see the sketch below):

  • **Large-scale training**: Avoid CPU-GPU data transfers during training
  • **GPU-resident training**: Keep all training data on GPU for maximum throughput
  • **Mixed-precision training**: Combine with FP16 gradients for even faster training
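
A sketch of the GPU-resident lifecycle tying together SupportsGpuUpdate, InitializeGpuState, UpdateParametersGpu, and DisposeGpuState. How the parameter and gradient buffers are created is backend-specific and only hinted at here; the loop bound and type arguments are illustrative:

```csharp
// Sketch: keep the whole update loop on the GPU, with state allocated once and freed at the end.
void TrainOnGpu(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    IGpuBuffer paramBuffer,
    IGpuBuffer gradBuffer,
    int parameterCount,
    IDirectGpuBackend backend)
{
    if (!optimizer.SupportsGpuUpdate)
        throw new NotSupportedException("Optimizer has no GPU update path.");

    // Allocate optimizer state (momentum, adaptive rates, ...) on the GPU once.
    optimizer.InitializeGpuState(parameterCount, backend);
    try
    {
        for (int step = 0; step < 1000; step++)
        {
            // ... compute gradients into gradBuffer on the GPU ...
            optimizer.UpdateParametersGpu(paramBuffer, gradBuffer, parameterCount, backend);
        }
    }
    finally
    {
        // Free the GPU-side optimizer state when training is done.
        optimizer.DisposeGpuState();
    }
}
```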