Interface IGradientComputable<T, TInput, TOutput>
- Namespace
- AiDotNet.Interfaces
- Assembly
- AiDotNet.dll
Base interface for models that can compute gradients explicitly without updating parameters.
public interface IGradientComputable<T, TInput, TOutput>
Type Parameters
T: The numeric data type (e.g., float, double).
TInput: The input data type.
TOutput: The output data type.
Remarks
This interface enables models to compute gradients without immediately applying parameter updates. This is essential for:
- Distributed Training: Compute local gradients, synchronize across workers, then apply the averaged gradients.
- Meta-Learning: Compute gradients on query sets after adaptation (see ISecondOrderGradientComputable<T, TInput, TOutput>).
- Custom Optimization: Manually control when and how to apply gradients.
- Gradient Analysis: Inspect gradient values for debugging or monitoring.
For Beginners: Regular training computes gradients and immediately updates the model in one step. This interface separates those two operations:
- ComputeGradients(TInput, TOutput, ILossFunction<T>?) - Calculate which direction improves the model (WITHOUT changing it)
- ApplyGradients(Vector<T>, T) - Actually update the model using those directions
This separation is crucial when you need to process gradients before applying them, such as averaging gradients across multiple GPUs in distributed training.
Distributed Training Use Case: In Data Parallel training (DDP), each GPU:
1. Computes gradients on its local data batch.
2. Communicates gradients with the other GPUs to compute the average.
3. Applies the averaged gradients to update the parameters.
Without this interface, step 2 would be impossible because gradients would already be applied in step 1.
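A minimal sketch of such a data-parallel step built on this interface. The numeric type double, the namespace AiDotNet.LinearAlgebra for Vector<T>, and the allReduceAverage delegate (standing in for whatever cross-worker communication your setup provides, e.g., MPI or NCCL) are assumptions for illustration, not part of this API:

```csharp
using System;
using AiDotNet.Interfaces;
using AiDotNet.LinearAlgebra; // assumed namespace for Vector<T>; adjust to your project

public static class DataParallelTraining
{
    // One training step on a single worker. The allReduceAverage delegate is a
    // hypothetical helper supplied by the caller that averages a gradient vector
    // across all workers.
    public static void Step<TInput, TOutput>(
        IGradientComputable<double, TInput, TOutput> model,
        TInput localInputs,
        TOutput localTargets,
        double learningRate,
        Func<Vector<double>, Vector<double>> allReduceAverage)
    {
        // 1. Compute gradients on this worker's batch; parameters are NOT modified.
        Vector<double> localGradients = model.ComputeGradients(localInputs, localTargets);

        // 2. Synchronize: every worker ends up holding the same averaged gradient vector.
        Vector<double> averagedGradients = allReduceAverage(localGradients);

        // 3. Apply the identical averaged gradients, keeping parameters in sync across workers.
        model.ApplyGradients(averagedGradients, learningRate);
    }
}
```

Because every worker applies the same averaged vector, parameters remain identical across workers after each step.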
Methods
ApplyGradients(Vector<T>, T)
Applies pre-computed gradients to update the model parameters.
void ApplyGradients(Vector<T> gradients, T learningRate)
Parameters
gradients (Vector<T>): The gradient vector to apply.
learningRate (T): The learning rate for the update.
Remarks
Updates parameters using: θ = θ - learningRate * gradients
For Beginners: After computing gradients (seeing which direction to move), this method actually moves the model in that direction. The learning rate controls how big of a step to take.
Distributed Training: In DDP/ZeRO-2, this applies the synchronized (averaged) gradients after communication across workers. Each worker applies the same averaged gradients to keep parameters consistent.
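A short sketch of a hand-rolled training loop built from the two methods, assuming float as the numeric type and AiDotNet.LinearAlgebra as the namespace for Vector<T> (both illustrative choices):

```csharp
using System.Collections.Generic;
using AiDotNet.Interfaces;
using AiDotNet.LinearAlgebra; // assumed namespace for Vector<T>; adjust to your project

public static class ManualOptimization
{
    // Runs one epoch of plain gradient descent over the supplied batches.
    public static void TrainEpoch<TInput, TOutput>(
        IGradientComputable<float, TInput, TOutput> model,
        IEnumerable<(TInput Input, TOutput Target)> batches,
        float learningRate)
    {
        foreach (var (input, target) in batches)
        {
            // Gradients only; the model is unchanged at this point.
            Vector<float> gradients = model.ComputeGradients(input, target);

            // Equivalent to θ = θ - learningRate * gradients for every parameter θ.
            model.ApplyGradients(gradients, learningRate);
        }
    }
}
```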
ComputeGradients(TInput, TOutput, ILossFunction<T>?)
Computes gradients of the loss function with respect to model parameters for the given data, WITHOUT updating the model parameters.
Vector<T> ComputeGradients(TInput input, TOutput target, ILossFunction<T>? lossFunction = null)
Parameters
input (TInput): The input data.
target (TOutput): The target/expected output.
lossFunction (ILossFunction<T>?): The loss function to use for gradient computation. If null, uses the model's default loss function.
Returns
- Vector<T>
A vector containing gradients with respect to all model parameters.
Remarks
This method performs a forward pass, computes the loss, and back-propagates to compute gradients, but does NOT update the model's parameters. The parameters remain unchanged after this call.
Distributed Training: In DDP/ZeRO-2, each worker calls this to compute local gradients on its data batch. These gradients are then synchronized (averaged) across workers before applying updates. This ensures all workers compute the same parameter updates despite having different data.
For Meta-Learning: After adapting a model on a support set, you can use this method to compute gradients on the query set. These gradients become the meta-gradients for updating the meta-parameters.
For Beginners: Think of this as "dry run" training:
- The model sees what direction it should move (the gradients).
- But it doesn't actually move (parameters stay the same).
- You get to decide what to do with this information (average with others, inspect, modify, etc.).
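A sketch of the "dry run" idea applied to gradient monitoring. It assumes double as the numeric type, AiDotNet.LinearAlgebra as the namespace for Vector<T>, and that Vector<T> exposes Length and an indexer; adjust those details to the actual API:

```csharp
using System;
using AiDotNet.Interfaces;
using AiDotNet.LinearAlgebra; // assumed namespace for Vector<T>; adjust to your project

public static class GradientInspection
{
    // Computes the L2 norm of the gradients for monitoring, without updating the model.
    // Assumes Vector<T> exposes Length and an indexer.
    public static double GradientNorm<TInput, TOutput>(
        IGradientComputable<double, TInput, TOutput> model,
        TInput input,
        TOutput target)
    {
        // "Dry run": model parameters are untouched after this call.
        Vector<double> gradients = model.ComputeGradients(input, target);

        double sumOfSquares = 0.0;
        for (int i = 0; i < gradients.Length; i++)
        {
            sumOfSquares += gradients[i] * gradients[i];
        }

        return Math.Sqrt(sumOfSquares);
        // Call model.ApplyGradients(gradients, learningRate) afterwards only if you
        // decide an update is warranted.
    }
}
```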
Exceptions
- InvalidOperationException
If lossFunction is null and the model has no default loss function.