
Interface IGradientBasedOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Interfaces
Assembly
AiDotNet.dll
public interface IGradientBasedOptimizer<T, TInput, TOutput> : IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T
TInput
TOutput

Properties

LastComputedGradients

Gets the gradients computed during the last optimization step.

Vector<T> LastComputedGradients { get; }

Property Value

Vector<T>

Vector of gradients for each parameter. Returns an empty vector if no optimization step has been performed yet.

Remarks

This property provides access to the gradients (partial derivatives) computed during the most recent optimization step. It is essential for distributed training, gradient clipping, and debugging.

For Beginners: Gradients are "directions" showing how to adjust each parameter to improve the model. This property lets you see those directions after optimization runs.

Industry Standard: PyTorch, TensorFlow, and JAX all expose gradients for features like gradient clipping, true Distributed Data Parallel (DDP), and gradient compression.
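
As a rough illustration, the snippet below reads back the latest gradients after a step, e.g. to log their norm before clipping. It is a minimal sketch only: the double/Matrix/Vector type arguments are illustrative, and it assumes Vector<T> exposes a Length property and an indexer.

```csharp
// Minimal sketch: inspect the most recent gradients (e.g. to log their L2 norm).
// Assumes Vector<double> exposes Length and an indexer; adjust to the actual API.
void LogGradientNorm(IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer)
{
    Vector<double> grads = optimizer.LastComputedGradients;

    double sumSquares = 0.0;
    for (int i = 0; i < grads.Length; i++)
        sumSquares += grads[i] * grads[i];

    Console.WriteLine($"Gradient L2 norm: {Math.Sqrt(sumSquares)}");
}
```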

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

bool SupportsGpuUpdate { get; }

Property Value

bool

Remarks

For Beginners: This indicates whether the optimizer can update parameters directly on the GPU without transferring data to the CPU. GPU updates are much faster for large models.
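
A minimal sketch of branching on this flag, assuming the caller already holds GPU buffers for the GPU path and plain vectors for the CPU fallback (the type arguments are illustrative):

```csharp
// Sketch: pick the update path based on the optimizer's capability flag.
void Step(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    ref Vector<double> parameters, Vector<double> gradients,
    IGpuBuffer paramBuffer, IGpuBuffer gradBuffer, int parameterCount, IDirectGpuBackend backend)
{
    if (optimizer.SupportsGpuUpdate)
    {
        // Parameters and gradients stay resident on the GPU (see UpdateParametersGpu below).
        optimizer.UpdateParametersGpu(paramBuffer, gradBuffer, parameterCount, backend);
    }
    else
    {
        // Fall back to the CPU-side vector update.
        parameters = optimizer.UpdateParameters(parameters, gradients);
    }
}
```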

Methods

ApplyGradients(Vector<T>, IFullModel<T, TInput, TOutput>)

Applies pre-computed gradients to a model's parameters.

IFullModel<T, TInput, TOutput> ApplyGradients(Vector<T> gradients, IFullModel<T, TInput, TOutput> model)

Parameters

gradients Vector<T>

Gradients to apply (must match model parameter count)

model IFullModel<T, TInput, TOutput>

Model whose parameters should be updated

Returns

IFullModel<T, TInput, TOutput>

Model with updated parameters

Remarks

Allows externally computed or modified gradients (averaged, compressed, clipped, etc.) to be applied to the model's parameters. This is essential for production distributed training.

For Beginners: This takes pre-calculated "directions" (gradients) and uses them to update the model. Like having a GPS tell you which way to go, this method moves you there.

Production Use Cases (see the sketch below):

  • **True DDP**: Average gradients across GPUs, then apply
  • **Gradient Compression**: Compress, sync, decompress, then apply
  • **Federated Learning**: Average gradients from clients before applying
  • **Gradient Clipping**: Clip gradients to prevent exploding gradients, then apply
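
A minimal DDP-style sketch, assuming Vector<double> exposes a length-based constructor, Length, and an indexer (those details are illustrative, not part of this interface) and that the usual System.Collections.Generic using directive is in scope:

```csharp
// Sketch: average per-worker gradients, then apply them in a single optimizer step.
IFullModel<double, Matrix<double>, Vector<double>> ApplyAveragedGradients(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    IReadOnlyList<Vector<double>> workerGradients,
    IFullModel<double, Matrix<double>, Vector<double>> model)
{
    int n = workerGradients[0].Length;
    var averaged = new Vector<double>(n);  // zero-initialized; constructor assumed for illustration
    foreach (var g in workerGradients)
        for (int i = 0; i < n; i++)
            averaged[i] += g[i] / workerGradients.Count;

    // One optimizer step with the synchronized gradients.
    return optimizer.ApplyGradients(averaged, model);
}
```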

Exceptions

ArgumentNullException

If gradients or model is null

ArgumentException

If gradient size doesn't match parameters

ApplyGradients(Vector<T>, Vector<T>, IFullModel<T, TInput, TOutput>)

Applies pre-computed gradients to explicit original parameters (double-step safe).

IFullModel<T, TInput, TOutput> ApplyGradients(Vector<T> originalParameters, Vector<T> gradients, IFullModel<T, TInput, TOutput> model)

Parameters

originalParameters Vector<T>

Pre-update parameters to start from

gradients Vector<T>

Gradients to apply

model IFullModel<T, TInput, TOutput>

Model template (only used for structure, parameters ignored)

Returns

IFullModel<T, TInput, TOutput>

New model with updated parameters

Remarks

⚠️ RECOMMENDED for Distributed Training: This overload accepts originalParameters explicitly, making it impossible to accidentally apply gradients twice. Use this in distributed optimizers where you need explicit control over which parameter state to start from.

Prevents the double-stepping bug:

  • WRONG: ApplyGradients(g_avg, modelWithLocalUpdate) → double step!
  • RIGHT: ApplyGradients(originalParams, g_avg, modelTemplate) → single step!

Distributed Pattern (see the sketch below):

  1. Save originalParams before local optimization
  2. Run local optimization → get localGradients
  3. Synchronize gradients → get avgGradients
  4. Call ApplyGradients(originalParams, avgGradients, model) → correct result!
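
A sketch of that pattern, where model.GetParameters() and AllReduceAverage(...) stand in for your model API and communication layer (placeholder names, not part of this interface):

```csharp
// Sketch of the distributed pattern above.
IFullModel<double, Matrix<double>, Vector<double>> DistributedStep(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    IFullModel<double, Matrix<double>, Vector<double>> model)
{
    Vector<double> originalParams = model.GetParameters();          // 1. snapshot before the local step (placeholder call)

    // 2. run the local optimization step via the base IOptimizer API, then read the gradients back
    Vector<double> localGradients = optimizer.LastComputedGradients;

    Vector<double> avgGradients = AllReduceAverage(localGradients); // 3. synchronize across workers (placeholder)

    // 4. apply the averaged gradients to the ORIGINAL parameters; the model argument
    //    is only a structural template, so there is no double step.
    return optimizer.ApplyGradients(originalParams, avgGradients, model);
}
```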

DisposeGpuState()

Disposes GPU-allocated optimizer state.

void DisposeGpuState()

Remarks

For Beginners: Frees GPU memory used by the optimizer's internal state. Call this when you're done training or want to reclaim GPU memory.

InitializeGpuState(int, IDirectGpuBackend)

Initializes optimizer state on the GPU for a given parameter count.

void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int

Number of parameters to initialize state for.

backend IDirectGpuBackend

The GPU backend to use for memory allocation.

Remarks

For Beginners: Many optimizers maintain internal state (like momentum or adaptive learning rates). This method allocates that state on the GPU so that all updates can happen without CPU transfers.

ReverseUpdate(Vector<T>, Vector<T>)

Reverses a gradient update to recover original parameters before the update was applied.

Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

Parameters after gradient application

appliedGradients Vector<T>

The gradients that were applied to produce updated parameters

Returns

Vector<T>

Original parameters before the gradient update

Remarks

This method computes the original parameters given updated parameters and the gradients that were applied. Each optimizer implements this differently based on its update rule.

For Beginners: This is like "undo" for a gradient update. Given where you are now (updated parameters) and the directions you took (gradients), it calculates where you started.

Optimizer-Specific Behavior:

  • **SGD**: params_old = params_new + learning_rate * gradients
  • **Adam**: Requires reversing momentum and adaptive learning rate adjustments
  • **RMSprop**: Requires reversing adaptive learning rate based on gradient history

Production Use Cases:

  • **Distributed Training**: Reverse local updates before applying synchronized gradients
  • **Checkpointing**: Recover previous parameter states
  • **Debugging**: Validate gradient application correctness
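
As a sketch, a round-trip check of the kind mentioned under Debugging above, assuming Vector<T> exposes Length and an indexer and that System.Diagnostics is in scope:

```csharp
// Sketch: verify that ReverseUpdate undoes an update step (round-trip check).
// For plain SGD the reversal is paramsOld = paramsNew + learningRate * gradients, as noted above.
void CheckReversal(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    Vector<double> parameters, Vector<double> gradients)
{
    Vector<double> updated = optimizer.UpdateParameters(parameters, gradients);
    Vector<double> recovered = optimizer.ReverseUpdate(updated, gradients);

    // recovered should match the original parameters up to floating-point error.
    for (int i = 0; i < parameters.Length; i++)
        Debug.Assert(Math.Abs(recovered[i] - parameters[i]) < 1e-6);
}
```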

Exceptions

ArgumentNullException

If parameters or gradients are null

ArgumentException

If parameter and gradient sizes don't match

UpdateParameters(Matrix<T>, Matrix<T>)

Updates matrix parameters based on their gradients.

Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)

Parameters

parameters Matrix<T>

The current matrix parameter values.

gradient Matrix<T>

The gradient matrix indicating the direction of steepest increase in error.

Returns

Matrix<T>

The updated matrix parameter values.

Remarks

For Beginners: This method adjusts a grid of numbers (matrix parameters) to make your model better.

The parameters:

  • parameters: The current matrix settings of your model (like weights in a neural network layer)
  • gradient: Information about which direction to change each matrix element to reduce errors

What this method does:

  1. Takes your current matrix parameters
  2. Looks at the gradient matrix to see which direction would reduce errors for each element
  3. Decides how big of a step to take in that direction for each element
  4. Returns a new, improved matrix of parameter values

Think of it like adjusting multiple rows and columns of knobs on a complex control panel:

  • The parameters matrix represents the current positions of all these knobs
  • The gradient matrix tells you which knobs to turn up or down and by how much
  • This method returns the new positions for all the knobs, organized in the same grid

This matrix-specific version avoids the need to flatten matrices into vectors and then reshape them back, making the code more efficient and easier to understand when working with matrix parameters like neural network weights.
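
For example, a layer's weight matrix can be stepped directly, with no flatten/reshape round-trip (a minimal sketch; the type arguments and method name are illustrative):

```csharp
// Sketch: update a layer's weight matrix in one call.
Matrix<double> StepWeights(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    Matrix<double> weights,          // current layer weights
    Matrix<double> weightGradients)  // gradients from the backward pass
{
    // The optimizer decides the per-element step size (SGD, Adam, etc.).
    return optimizer.UpdateParameters(weights, weightGradients);
}
```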

UpdateParameters(Vector<T>, Vector<T>)

Updates parameters based on their gradients.

Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The current parameter values.

gradient Vector<T>

The gradient indicating the direction of steepest increase in error.

Returns

Vector<T>

The updated parameter values.

Remarks

For Beginners: This method adjusts a set of numbers (parameters) to make your model better.

The parameters:

  • parameters: The current settings of your model (like weights in a neural network)
  • gradient: Information about which direction to change each parameter to reduce errors

What this method does:

  1. Takes your current model parameters
  2. Looks at the gradient to see which direction would reduce errors
  3. Decides how big of a step to take in that direction
  4. Returns new, improved parameter values

Think of it like adjusting the volume, bass, and treble knobs on a stereo:

  • The parameters are the current knob positions
  • The gradient tells you which knobs to turn up or down
  • This method returns the new positions for all the knobs

Different optimizers (like Adam, SGD, or RMSProp) will make different decisions about how far to turn each knob based on the gradient information.

This method is flexible enough to handle different data structures (e.g., vectors, matrices, or tensors) depending on the type of model and the specific implementation of the optimizer.
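
To make the knob analogy concrete, here is roughly what a plain SGD implementation of this method boils down to. This is an illustrative sketch, not the library's implementation, and it assumes Vector<double> has a length-based constructor and an indexer:

```csharp
// Sketch of a plain SGD step: newParams[i] = parameters[i] - learningRate * gradient[i].
// Adam, RMSProp, etc. scale the step per parameter instead of using one fixed rate.
Vector<double> SgdStep(Vector<double> parameters, Vector<double> gradient, double learningRate)
{
    var updated = new Vector<double>(parameters.Length);  // constructor/indexer assumed for illustration
    for (int i = 0; i < parameters.Length; i++)
        updated[i] = parameters[i] - learningRate * gradient[i];
    return updated;
}
```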

UpdateParameters(List<ILayer<T>>)

Updates the parameters of all layers in a model based on their calculated gradients.

void UpdateParameters(List<ILayer<T>> layers)

Parameters

layers List<ILayer<T>>

A list of layers in the model whose parameters need to be updated.

Remarks

For Beginners: This method adjusts the settings (parameters) of each part (layer) of your model to make it better at its task.

What this method does:

  1. Goes through each layer of your model
  2. For each layer that can be trained:
    • Looks at how the layer's current settings are contributing to errors (the gradients)
    • Decides how much to change each setting to reduce errors
    • Updates the layer's settings with these new values

Think of it like tuning a complex machine with many knobs:

  • Each layer is a set of knobs
  • The gradients tell you which way to turn each knob
  • This method goes through and adjusts all the knobs to make the machine work better

This method is crucial in the training process because:

  • It applies the learning from the backward pass to actually improve the model
  • It handles the intricacies of updating different types of layers (e.g., convolutional, recurrent)
  • It ensures that all trainable parts of your model are updated consistently

Different optimizers may implement this method differently, using various strategies to determine the best way to update parameters based on the gradients and potentially other factors like momentum or adaptive learning rates.
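
A minimal sketch of handing a model's layers to the optimizer after the backward pass has produced their gradients (type arguments and method name are illustrative; assumes System.Collections.Generic is in scope):

```csharp
// Sketch: one optimizer step over every trainable layer in the model.
void StepAllLayers(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    List<ILayer<double>> layers)
{
    // The optimizer walks the layers and applies its update rule
    // (plain SGD, momentum, Adam, ...) to every trainable parameter.
    optimizer.UpdateParameters(layers);
}
```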

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using optimizer-specific GPU kernels.

void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

GPU buffer containing parameters to update (modified in-place).

gradients IGpuBuffer

GPU buffer containing gradients.

parameterCount int

Number of parameters.

backend IDirectGpuBackend

The GPU backend to use for execution.

Remarks

For Beginners: This method performs the same parameter update as UpdateParameters, but executes directly on the GPU for maximum performance. The parameters and gradients must already be on the GPU.

Production Use Cases (see the sketch below):

  • **Large-scale training**: Avoid CPU-GPU data transfers during training
  • **GPU-resident training**: Keep all training data on GPU for maximum throughput
  • **Mixed-precision training**: Combine with FP16 gradients for even faster training
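
A sketch of the GPU-resident lifecycle tying together SupportsGpuUpdate, InitializeGpuState, UpdateParametersGpu, and DisposeGpuState. How the parameter and gradient buffers are created is backend-specific and only hinted at here; the loop bound and type arguments are illustrative:

```csharp
// Sketch: keep the whole update loop on the GPU, with state allocated once and freed at the end.
void TrainOnGpu(
    IGradientBasedOptimizer<double, Matrix<double>, Vector<double>> optimizer,
    IGpuBuffer paramBuffer,
    IGpuBuffer gradBuffer,
    int parameterCount,
    IDirectGpuBackend backend)
{
    if (!optimizer.SupportsGpuUpdate)
        throw new NotSupportedException("Optimizer has no GPU update path.");

    // Allocate optimizer state (momentum, adaptive rates, ...) on the GPU once.
    optimizer.InitializeGpuState(parameterCount, backend);
    try
    {
        for (int step = 0; step < 1000; step++)
        {
            // ... compute gradients into gradBuffer on the GPU ...
            optimizer.UpdateParametersGpu(paramBuffer, gradBuffer, parameterCount, backend);
        }
    }
    finally
    {
        // Free the GPU-side optimizer state when training is done.
        optimizer.DisposeGpuState();
    }
}
```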