Class GradientClippingHelper

Namespace
AiDotNet.Helpers
Assembly
AiDotNet.dll

Provides gradient clipping utilities to prevent exploding gradients during training.

public static class GradientClippingHelper
Inheritance
object → GradientClippingHelper

Remarks

For Beginners: During neural network training, gradients tell us how to adjust weights. Sometimes gradients become extremely large ("exploding gradients"), which can destabilize training. Gradient clipping limits the magnitude of gradients to keep training stable.

There are two main approaches:

  • Clip by Value: Limits each gradient element to a range (e.g., -1 to 1)
  • Clip by Norm: Scales the entire gradient vector if its norm exceeds a threshold

The "by norm" approach is generally preferred as it preserves gradient direction.
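The contrast between the two approaches can be sketched in plain Python. This illustrates the math only; the function names below are hypothetical and not part of the AiDotNet API:

```python
import math

def clip_by_value(grads, max_value=1.0):
    """Limit each element independently to [-max_value, max_value]."""
    return [max(-max_value, min(max_value, g)) for g in grads]

def clip_by_norm(grads, max_norm=1.0):
    """Scale the whole vector down only if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        return [g * (max_norm / norm) for g in grads]
    return list(grads)

g = [3.0, 4.0]            # L2 norm = 5
print(clip_by_value(g))   # [1.0, 1.0] -- direction changed
print(clip_by_norm(g))    # ~[0.6, 0.8] -- same direction, norm scaled to 1
```

Note that clipping by value turns [3, 4] into [1, 1], changing the direction of the update, while clipping by norm scales both elements by the same factor and keeps the direction intact.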

Fields

DefaultMaxNorm

Default maximum gradient norm for clipping.

public const double DefaultMaxNorm = 1

Field Value

double

DefaultMaxValue

Default maximum gradient value for value clipping.

public const double DefaultMaxValue = 1

Field Value

double

Methods

AreGradientsExploding<T>(Vector<T>, double)

Detects if gradients are exploding (have very large values).

public static bool AreGradientsExploding<T>(Vector<T> gradients, double threshold = 1000000)

Parameters

gradients Vector<T>

The gradient vector to check.

threshold double

Threshold for considering gradients as exploding.

Returns

bool

True if gradients appear to be exploding.

Type Parameters

T

The numeric type.

AreGradientsVanishing<T>(Vector<T>, double)

Detects if gradients are vanishing (have very small values).

public static bool AreGradientsVanishing<T>(Vector<T> gradients, double threshold = 1E-07)

Parameters

gradients Vector<T>

The gradient vector to check.

threshold double

Threshold for considering gradients as vanishing.

Returns

bool

True if gradients appear to be vanishing.

Type Parameters

T

The numeric type.
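One plausible implementation of these two checks, sketched in Python. The exact criterion AiDotNet uses (per-element magnitude vs. overall norm) is not documented here, so the per-element comparison below is an assumption:

```python
def are_exploding(grads, threshold=1e6):
    # Assumed criterion: some element's magnitude exceeds the threshold.
    return any(abs(g) > threshold for g in grads)

def are_vanishing(grads, threshold=1e-7):
    # Assumed criterion: every element's magnitude is below the threshold.
    return all(abs(g) < threshold for g in grads)

print(are_exploding([0.5, 2e6, -0.1]))   # True
print(are_vanishing([1e-8, -3e-9]))      # True
```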

ClipAdaptive<T>(Vector<T>?, Vector<T>?, double)

Applies adaptive gradient clipping based on parameter norm.

public static Vector<T>? ClipAdaptive<T>(Vector<T>? gradients, Vector<T>? parameters, double clipRatio = 0.01)

Parameters

gradients Vector<T>

The gradient vector.

parameters Vector<T>

The corresponding parameter vector.

clipRatio double

Ratio threshold for clipping (e.g., 0.01 means the gradient norm should not exceed 1% of the parameter norm).

Returns

Vector<T>

Clipped gradients.

Type Parameters

T

The numeric type.

Remarks

For Beginners: Adaptive gradient clipping (AGC) scales the clipping threshold based on the magnitude of the parameters themselves. This is useful because large parameters can tolerate larger gradients without destabilizing, while small parameters need tighter gradient bounds.

This technique was introduced in the NFNet paper and can help train very deep networks without batch normalization.
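The core rule can be sketched in Python as a whole-vector computation. (The NFNet paper applies the rule per output unit rather than to the whole vector; this simplified sketch is an illustration, not the AiDotNet implementation.)

```python
import math

def clip_adaptive(grads, params, clip_ratio=0.01):
    """Scale gradients so their norm is at most clip_ratio * parameter norm."""
    g_norm = math.sqrt(sum(g * g for g in grads))
    p_norm = math.sqrt(sum(p * p for p in params))
    max_norm = clip_ratio * p_norm
    if g_norm > max_norm > 0:
        return [g * (max_norm / g_norm) for g in grads]
    return list(grads)

grads  = [3.0, 4.0]     # norm 5
params = [30.0, 40.0]   # norm 50 -> allowed gradient norm 0.5
clipped = clip_adaptive(grads, params)  # norm shrinks to 0.5 (1% of 50)
```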

ClipByGlobalNorm<T>(List<Vector<T>>?, double)

Clips gradients by global norm across multiple gradient vectors.

public static List<Vector<T>>? ClipByGlobalNorm<T>(List<Vector<T>>? gradientsList, double maxNorm = 1)

Parameters

gradientsList List<Vector<T>>

List of gradient vectors to clip together.

maxNorm double

Maximum global L2 norm.

Returns

List<Vector<T>>

A list of clipped gradient vectors.

Type Parameters

T

The numeric type.

Remarks

For Beginners: When training a neural network with multiple layers, each layer has its own gradients. Global norm clipping computes the norm across ALL gradients and scales them all together. This ensures consistent clipping behavior across the entire network.
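The behavior described above can be sketched in Python: one norm is computed over all vectors, and every vector is scaled by the same factor (names here are hypothetical, not the AiDotNet API):

```python
import math

def clip_by_global_norm(grads_list, max_norm=1.0):
    """Scale every vector by the same factor if the combined norm is too large."""
    global_norm = math.sqrt(sum(g * g for grads in grads_list for g in grads))
    if global_norm > max_norm:
        scale = max_norm / global_norm
        return [[g * scale for g in grads] for grads in grads_list]
    return [list(grads) for grads in grads_list]

layers = [[3.0], [4.0]]   # global norm = 5
clipped = clip_by_global_norm(layers, max_norm=1.0)
# each vector is scaled by 1/5, so relative layer magnitudes are preserved
```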

ClipByNormInPlace<T>(Vector<T>, double)

Clips gradients by their L2 norm in place.

public static bool ClipByNormInPlace<T>(Vector<T> gradients, double maxNorm = 1)

Parameters

gradients Vector<T>

The gradient vector to clip (modified in place).

maxNorm double

Maximum L2 norm for the gradient vector.

Returns

bool

True if clipping was applied, false otherwise.

Type Parameters

T

The numeric type.
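The in-place variant mutates the vector and reports whether clipping actually occurred, which a training loop can use for logging. A Python sketch of that contract (an illustration, not the AiDotNet implementation):

```python
import math

def clip_by_norm_in_place(grads, max_norm=1.0):
    """Mutate the list in place; report whether clipping was applied."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return False
    scale = max_norm / norm
    for i in range(len(grads)):
        grads[i] *= scale
    return True

g = [3.0, 4.0]
was_clipped = clip_by_norm_in_place(g)   # True; g is now ~[0.6, 0.8]
```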

ClipByNorm<T>(Tensor<T>?, double)

Clips tensor gradients by their L2 norm.

public static Tensor<T>? ClipByNorm<T>(Tensor<T>? gradients, double maxNorm = 1)

Parameters

gradients Tensor<T>

The gradient tensor to clip.

maxNorm double

Maximum L2 norm.

Returns

Tensor<T>

A new tensor with clipped gradients.

Type Parameters

T

The numeric type.

ClipByNorm<T>(Vector<T>?, double)

Clips gradients by their L2 norm (global norm clipping).

public static Vector<T>? ClipByNorm<T>(Vector<T>? gradients, double maxNorm = 1)

Parameters

gradients Vector<T>

The gradient vector to clip.

maxNorm double

Maximum L2 norm for the gradient vector.

Returns

Vector<T>

A new vector with clipped gradients.

Type Parameters

T

The numeric type.

Remarks

For Beginners: This is the preferred gradient clipping method. Instead of clipping each value independently, we look at the total "length" (norm) of the gradient vector. If it exceeds maxNorm, we scale the entire vector down proportionally.

This preserves the direction of the gradient while limiting its magnitude, which typically leads to better training behavior.

Formula: if ||g|| > maxNorm, then g = g * (maxNorm / ||g||)

ClipByValueInPlace<T>(Vector<T>, double)

Clips gradient values to a specified range [-maxValue, maxValue] in place.

public static void ClipByValueInPlace<T>(Vector<T> gradients, double maxValue = 1)

Parameters

gradients Vector<T>

The gradient vector to clip (modified in place).

maxValue double

Maximum absolute value for any gradient element.

Type Parameters

T

The numeric type.

ClipByValue<T>(Vector<T>?, double)

Clips gradient values to a specified range [-maxValue, maxValue].

public static Vector<T>? ClipByValue<T>(Vector<T>? gradients, double maxValue = 1)

Parameters

gradients Vector<T>

The gradient vector to clip.

maxValue double

Maximum absolute value for any gradient element.

Returns

Vector<T>

A new vector with clipped gradients.

Type Parameters

T

The numeric type.

Remarks

For Beginners: This is the simplest form of gradient clipping. Each gradient value is independently limited to the range [-maxValue, maxValue]. For example, with maxValue=1.0, a gradient of 5.0 becomes 1.0, and -3.0 becomes -1.0.
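The numeric example above can be checked with a one-line Python sketch (a hypothetical helper, not the AiDotNet API):

```python
def clip_by_value(grads, max_value=1.0):
    # Clamp each element independently to [-max_value, max_value].
    return [max(-max_value, min(max_value, g)) for g in grads]

print(clip_by_value([5.0, -3.0]))  # [1.0, -1.0]
```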

ComputeGlobalNorm<T>(List<Vector<T>>)

Computes the global L2 norm across multiple gradient vectors.

public static T ComputeGlobalNorm<T>(List<Vector<T>> gradientsList)

Parameters

gradientsList List<Vector<T>>

List of gradient vectors.

Returns

T

The global L2 norm.

Type Parameters

T

The numeric type.
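The global L2 norm is equivalent to the L2 norm of all vectors concatenated: the square root of the sum of every squared element. A Python sketch of that computation (an illustration, not the AiDotNet implementation):

```python
import math

def compute_global_norm(grads_list):
    # Square root of the sum of squares across every element of every vector.
    return math.sqrt(sum(g * g for grads in grads_list for g in grads))

print(compute_global_norm([[3.0], [4.0]]))  # 5.0
```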

ComputeNorm<T>(Vector<T>)

Computes the L2 norm of a gradient vector.

public static T ComputeNorm<T>(Vector<T> gradients)

Parameters

gradients Vector<T>

The gradient vector.

Returns

T

The L2 norm.

Type Parameters

T

The numeric type.