Interface IGradientComputable<T, TInput, TOutput>
- Namespace
- AiDotNet.Interfaces
- Assembly
- AiDotNet.dll
Base interface for models that can compute gradients explicitly without updating parameters.
public interface IGradientComputable<T, TInput, TOutput>
Type Parameters
T: The numeric data type (e.g., float, double).
TInput: The input data type.
TOutput: The output data type.
Remarks
This interface enables models to compute gradients without immediately applying parameter updates. This is essential for:
- Distributed Training: Compute local gradients, synchronize across workers, then apply the averaged gradients.
- Meta-Learning: Compute gradients on query sets after adaptation (see ISecondOrderGradientComputable<T, TInput, TOutput>).
- Custom Optimization: Manually control when and how to apply gradients.
- Gradient Analysis: Inspect gradient values for debugging or monitoring.
For Beginners: Regular training computes gradients and immediately updates the model in one step. This interface separates those two operations:
- ComputeGradients(TInput, TOutput, ILossFunction<T>?) - Calculate which direction improves the model (WITHOUT changing it)
- ApplyGradients(Vector<T>, T) - Actually update the model using those directions
This separation is crucial when you need to process gradients before applying them, such as averaging gradients across multiple GPUs in distributed training.
Distributed Training Use Case: In Data Parallel training (DDP), each GPU:
1. Computes gradients on its local data batch.
2. Communicates gradients with the other GPUs to compute the average.
3. Applies the averaged gradients to update the parameters.
Without this interface, step 2 would be impossible because gradients would already be applied in step 1.
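A minimal sketch of such a data-parallel step built on this interface. The numeric type double, the namespace AiDotNet.LinearAlgebra for Vector<T>, and the allReduceAverage delegate (standing in for whatever cross-worker communication your setup provides, e.g., MPI or NCCL) are assumptions for illustration, not part of this API:

```csharp
using System;
using AiDotNet.Interfaces;
using AiDotNet.LinearAlgebra; // assumed namespace for Vector<T>; adjust to your project

public static class DataParallelTraining
{
    // One training step on a single worker. The allReduceAverage delegate is a
    // hypothetical helper supplied by the caller that averages a gradient vector
    // across all workers.
    public static void Step<TInput, TOutput>(
        IGradientComputable<double, TInput, TOutput> model,
        TInput localInputs,
        TOutput localTargets,
        double learningRate,
        Func<Vector<double>, Vector<double>> allReduceAverage)
    {
        // 1. Compute gradients on this worker's batch; parameters are NOT modified.
        Vector<double> localGradients = model.ComputeGradients(localInputs, localTargets);

        // 2. Synchronize: every worker ends up holding the same averaged gradient vector.
        Vector<double> averagedGradients = allReduceAverage(localGradients);

        // 3. Apply the identical averaged gradients, keeping parameters in sync across workers.
        model.ApplyGradients(averagedGradients, learningRate);
    }
}
```

Because every worker applies the same averaged vector, parameters remain identical across workers after each step.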
Methods
ApplyGradients(Vector<T>, T)
Applies pre-computed gradients to update the model parameters.
void ApplyGradients(Vector<T> gradients, T learningRate)
Parameters
gradients (Vector<T>): The gradient vector to apply.
learningRate (T): The learning rate for the update.
Remarks
Updates parameters using: θ = θ - learningRate * gradients
For Beginners: After computing gradients (seeing which direction to move), this method actually moves the model in that direction. The learning rate controls how big of a step to take.
Distributed Training: In DDP/ZeRO-2, this applies the synchronized (averaged) gradients after communication across workers. Each worker applies the same averaged gradients to keep parameters consistent.
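A short sketch of a hand-rolled training loop built from the two methods, assuming float as the numeric type and AiDotNet.LinearAlgebra as the namespace for Vector<T> (both illustrative choices):

```csharp
using System.Collections.Generic;
using AiDotNet.Interfaces;
using AiDotNet.LinearAlgebra; // assumed namespace for Vector<T>; adjust to your project

public static class ManualOptimization
{
    // Runs one epoch of plain gradient descent over the supplied batches.
    public static void TrainEpoch<TInput, TOutput>(
        IGradientComputable<float, TInput, TOutput> model,
        IEnumerable<(TInput Input, TOutput Target)> batches,
        float learningRate)
    {
        foreach (var (input, target) in batches)
        {
            // Gradients only; the model is unchanged at this point.
            Vector<float> gradients = model.ComputeGradients(input, target);

            // Equivalent to θ = θ - learningRate * gradients for every parameter θ.
            model.ApplyGradients(gradients, learningRate);
        }
    }
}
```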
ComputeGradients(TInput, TOutput, ILossFunction<T>?)
Computes gradients of the loss function with respect to model parameters for the given data, WITHOUT updating the model parameters.
Vector<T> ComputeGradients(TInput input, TOutput target, ILossFunction<T>? lossFunction = null)
Parameters
input (TInput): The input data.
target (TOutput): The target/expected output.
lossFunction (ILossFunction<T>?): The loss function to use for gradient computation. If null, uses the model's default loss function.
Returns
- Vector<T>
A vector containing gradients with respect to all model parameters.
Remarks
This method performs a forward pass, computes the loss, and back-propagates to compute gradients, but does NOT update the model's parameters. The parameters remain unchanged after this call.
Distributed Training: In DDP/ZeRO-2, each worker calls this to compute local gradients on its data batch. These gradients are then synchronized (averaged) across workers before applying updates. This ensures all workers compute the same parameter updates despite having different data.
For Meta-Learning: After adapting a model on a support set, you can use this method to compute gradients on the query set. These gradients become the meta-gradients for updating the meta-parameters.
For Beginners: Think of this as "dry run" training:
- The model sees what direction it should move (the gradients).
- But it doesn't actually move (parameters stay the same).
- You get to decide what to do with this information (average with others, inspect, modify, etc.).
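A sketch of the "dry run" idea applied to gradient monitoring. It assumes double as the numeric type, AiDotNet.LinearAlgebra as the namespace for Vector<T>, and that Vector<T> exposes Length and an indexer; adjust those details to the actual API:

```csharp
using System;
using AiDotNet.Interfaces;
using AiDotNet.LinearAlgebra; // assumed namespace for Vector<T>; adjust to your project

public static class GradientInspection
{
    // Computes the L2 norm of the gradients for monitoring, without updating the model.
    // Assumes Vector<T> exposes Length and an indexer.
    public static double GradientNorm<TInput, TOutput>(
        IGradientComputable<double, TInput, TOutput> model,
        TInput input,
        TOutput target)
    {
        // "Dry run": model parameters are untouched after this call.
        Vector<double> gradients = model.ComputeGradients(input, target);

        double sumOfSquares = 0.0;
        for (int i = 0; i < gradients.Length; i++)
        {
            sumOfSquares += gradients[i] * gradients[i];
        }

        return Math.Sqrt(sumOfSquares);
        // Call model.ApplyGradients(gradients, learningRate) afterwards only if you
        // decide an update is warranted.
    }
}
```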
Exceptions
- InvalidOperationException
If lossFunction is null and the model has no default loss function.