Class GradientClippingHelper
Provides gradient clipping utilities to prevent exploding gradients during training.
public static class GradientClippingHelper
Inheritance: object → GradientClippingHelper
Remarks
For Beginners: During neural network training, gradients tell us how to adjust weights. Sometimes gradients become extremely large ("exploding gradients"), which can destabilize training. Gradient clipping limits the magnitude of gradients to keep training stable.
There are two main approaches:
- Clip by Value: Limits each gradient element to a range (e.g., -1 to 1)
- Clip by Norm: Scales the entire gradient vector if its norm exceeds a threshold
The "by norm" approach is generally preferred as it preserves gradient direction.
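A small worked example may make the difference between the two approaches concrete. The vector g = (3, 4) and both thresholds are illustrative values chosen here, not taken from the library.

```latex
\begin{aligned}
g &= (3,\ 4), \qquad \lVert g \rVert_2 = \sqrt{3^2 + 4^2} = 5 \\
\text{clip by value, range } [-1, 1]: \quad g' &= (1,\ 1) \\
\text{clip by norm, threshold } 1: \quad g' &= g \cdot \tfrac{1}{5} = (0.6,\ 0.8)
\end{aligned}
```

Value clipping flattens the 3:4 ratio between the components to 1:1, so the update points in a different direction. Norm clipping keeps the 3:4 ratio and only shortens the vector.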
Fields
DefaultMaxNorm
Default maximum gradient norm for clipping.
public const double DefaultMaxNorm = 1
Field Value
- double
DefaultMaxValue
Default maximum gradient value for value clipping.
public const double DefaultMaxValue = 1
Field Value
- double
Methods
AreGradientsExploding<T>(Vector<T>, double)
Detects if gradients are exploding (have very large values).
public static bool AreGradientsExploding<T>(Vector<T> gradients, double threshold = 1000000)
Parameters
- gradients (Vector<T>): The gradient vector to check.
- threshold (double): Threshold for considering gradients as exploding.
Returns
- bool
True if gradients appear to be exploding.
Type Parameters
- T: The numeric type.
AreGradientsVanishing<T>(Vector<T>, double)
Detects if gradients are vanishing (have very small values).
public static bool AreGradientsVanishing<T>(Vector<T> gradients, double threshold = 1E-07)
Parameters
- gradients (Vector<T>): The gradient vector to check.
- threshold (double): Threshold for considering gradients as vanishing.
Returns
- bool
True if gradients appear to be vanishing.
Type Parameters
- T: The numeric type.
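The two detection methods above do not state which quantity is compared against the threshold. The following is a minimal sketch under the assumption that the L2 norm of the vector is used; the names `L2Norm`, `LooksExploding`, and `LooksVanishing` are hypothetical, and `double[]` stands in for `Vector<T>`.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the library's checks; double[] replaces Vector<T>.
// Assumption: both checks compare the vector's L2 norm against the threshold.
double L2Norm(double[] grads) => Math.Sqrt(grads.Sum(x => x * x));

bool LooksExploding(double[] grads, double threshold = 1e6)
{
    double norm = L2Norm(grads);
    // NaN or infinity means the computation has already overflowed.
    return double.IsNaN(norm) || double.IsInfinity(norm) || norm > threshold;
}

bool LooksVanishing(double[] grads, double threshold = 1e-7) => L2Norm(grads) < threshold;

Console.WriteLine(LooksExploding(new[] { 3e6, 4e6 }));   // True: norm is 5e6
Console.WriteLine(LooksVanishing(new[] { 3e-9, 4e-9 })); // True: norm is 5e-9
```

The defaults mirror the documented thresholds (1000000 and 1E-07). The library may instead inspect individual elements, so treat the norm-based comparison as one plausible reading.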
ClipAdaptive<T>(Vector<T>?, Vector<T>?, double)
Applies adaptive gradient clipping based on parameter norm.
public static Vector<T>? ClipAdaptive<T>(Vector<T>? gradients, Vector<T>? parameters, double clipRatio = 0.01)
Parameters
- gradients (Vector<T>): The gradient vector.
- parameters (Vector<T>): The corresponding parameter vector.
- clipRatio (double): Ratio threshold for clipping (e.g., 0.01 means the gradient norm should not exceed 1% of the parameter norm).
Returns
- Vector<T>
Clipped gradients.
Type Parameters
- T: The numeric type.
Remarks
For Beginners: Adaptive gradient clipping (AGC) scales the clipping threshold based on the magnitude of the parameters themselves. This is useful because large parameters can tolerate larger gradients without destabilizing, while small parameters need tighter gradient bounds.
This technique was introduced in the NFNet paper and can help train very deep networks without batch normalization.
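As a rough illustration of the idea, the sketch below derives the clipping threshold from the parameter norm and then rescales the gradient. It assumes a single threshold for the whole vector, as the `clipRatio` description suggests; the `eps` floor and the names `L2Norm` and `ClipAdaptiveSketch` are hypothetical, and `double[]` stands in for `Vector<T>`.

```csharp
using System;
using System.Linq;

// Hypothetical sketch; double[] stands in for Vector<T>.
double L2Norm(double[] values) => Math.Sqrt(values.Sum(x => x * x));

// Assumption: one threshold, clipRatio * ||parameters||, for the whole vector.
// The eps floor (also an assumption) keeps zero-initialized parameters from
// forcing the gradient to zero.
double[] ClipAdaptiveSketch(double[] grads, double[] parameters,
                            double clipRatio = 0.01, double eps = 1e-3)
{
    double maxNorm = clipRatio * Math.Max(L2Norm(parameters), eps);
    double gradNorm = L2Norm(grads);
    if (gradNorm <= maxNorm) return (double[])grads.Clone(); // within bounds
    double scale = maxNorm / gradNorm;
    return grads.Select(g => g * scale).ToArray();
}

// Parameter norm is 100, so the threshold is 1; the gradient norm 5 is scaled down.
double[] clipped = ClipAdaptiveSketch(new[] { 3.0, 4.0 }, new[] { 60.0, 80.0 });
Console.WriteLine(L2Norm(clipped)); // approximately 1
```

The same gradient paired with parameters ten times larger would face a threshold of 10 and pass through unchanged, which is the adaptive behavior described above.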
ClipByGlobalNorm<T>(List<Vector<T>>?, double)
Clips gradients by global norm across multiple gradient vectors.
public static List<Vector<T>>? ClipByGlobalNorm<T>(List<Vector<T>>? gradientsList, double maxNorm = 1)
Parameters
- gradientsList (List<Vector<T>>): List of gradient vectors to clip together.
- maxNorm (double): Maximum global L2 norm.
Returns
- List<Vector<T>>
A list of clipped gradient vectors.
Type Parameters
- T: The numeric type.
Remarks
For Beginners: When training a neural network with multiple layers, each layer has its own gradients. Global norm clipping computes the norm across ALL gradients and scales them all together. This ensures consistent clipping behavior across the entire network.
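The sketch below shows one way global norm clipping can be written: compute a single norm over every layer, then apply one shared scale factor. `GlobalNorm` and `ClipByGlobalNormSketch` are hypothetical names, and `List<double[]>` stands in for `List<Vector<T>>`; this is not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch; List<double[]> stands in for List<Vector<T>>.
double GlobalNorm(List<double[]> layers) =>
    Math.Sqrt(layers.Sum(layer => layer.Sum(x => x * x)));

List<double[]> ClipByGlobalNormSketch(List<double[]> layers, double maxNorm = 1.0)
{
    double norm = GlobalNorm(layers);
    // One shared factor for every layer keeps their relative magnitudes intact.
    double scale = norm > maxNorm ? maxNorm / norm : 1.0;
    return layers.Select(layer => layer.Select(x => x * scale).ToArray()).ToList();
}

// Two "layers" with norms 3 and 4; the global norm is 5.
var layerGrads = new List<double[]> { new[] { 3.0, 0.0 }, new[] { 0.0, 4.0 } };
var scaled = ClipByGlobalNormSketch(layerGrads);
Console.WriteLine(GlobalNorm(scaled)); // approximately 1
```

Clipping each layer separately to a norm of 1 would make both layers equally large, whereas the shared factor keeps the second layer's gradient larger than the first in the same 3:4 proportion.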
ClipByNormInPlace<T>(Vector<T>, double)
Clips gradients by their L2 norm in place.
public static bool ClipByNormInPlace<T>(Vector<T> gradients, double maxNorm = 1)
Parameters
- gradients (Vector<T>): The gradient vector to clip (modified in place).
- maxNorm (double): Maximum L2 norm for the gradient vector.
Returns
- bool
True if clipping was applied, false otherwise.
Type Parameters
- T: The numeric type.
ClipByNorm<T>(Tensor<T>?, double)
Clips tensor gradients by their L2 norm.
public static Tensor<T>? ClipByNorm<T>(Tensor<T>? gradients, double maxNorm = 1)
Parameters
- gradients (Tensor<T>): The gradient tensor to clip.
- maxNorm (double): Maximum L2 norm.
Returns
- Tensor<T>
A new tensor with clipped gradients.
Type Parameters
- T: The numeric type.
ClipByNorm<T>(Vector<T>?, double)
Clips a gradient vector by its L2 norm, computed over the whole vector.
public static Vector<T>? ClipByNorm<T>(Vector<T>? gradients, double maxNorm = 1)
Parameters
- gradients (Vector<T>): The gradient vector to clip.
- maxNorm (double): Maximum L2 norm for the gradient vector.
Returns
- Vector<T>
A new vector with clipped gradients.
Type Parameters
- T: The numeric type.
Remarks
For Beginners: This is the preferred gradient clipping method. Instead of clipping each value independently, we look at the total "length" (norm) of the gradient vector. If it exceeds maxNorm, we scale the entire vector down proportionally.
This preserves the direction of the gradient while limiting its magnitude, which typically leads to better training behavior.
Formula: if ||g|| > maxNorm, then g = g * (maxNorm / ||g||)
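The formula above can be sketched in a few lines. `ClipByNormSketch` is a hypothetical name and `double[]` stands in for `Vector<T>`; the sketch returns a copy when no clipping is needed, which is an assumption about the "new vector" behavior.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of the formula above; double[] stands in for Vector<T>.
double[] ClipByNormSketch(double[] grads, double maxNorm = 1.0)
{
    double norm = Math.Sqrt(grads.Sum(x => x * x));
    if (norm <= maxNorm) return (double[])grads.Clone(); // already short enough
    double scale = maxNorm / norm;                        // maxNorm / ||g||
    return grads.Select(x => x * scale).ToArray();
}

double[] before = { 6.0, 8.0 };                           // ||g|| = 10
double[] after = ClipByNormSketch(before, maxNorm: 5.0);  // scale = 0.5
Console.WriteLine(string.Join(", ", after));              // prints "3, 4"
```

The ratio between the components is 3:4 before and after clipping, so the direction is unchanged and only the length drops from 10 to 5.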
ClipByValueInPlace<T>(Vector<T>, double)
Clips gradient values to a specified range [-maxValue, maxValue] in place.
public static void ClipByValueInPlace<T>(Vector<T> gradients, double maxValue = 1)
Parameters
- gradients (Vector<T>): The gradient vector to clip (modified in place).
- maxValue (double): Maximum absolute value for any gradient element.
Type Parameters
- T: The numeric type.
ClipByValue<T>(Vector<T>?, double)
Clips gradient values to a specified range [-maxValue, maxValue].
public static Vector<T>? ClipByValue<T>(Vector<T>? gradients, double maxValue = 1)
Parameters
- gradients (Vector<T>): The gradient vector to clip.
- maxValue (double): Maximum absolute value for any gradient element.
Returns
- Vector<T>
A new vector with clipped gradients.
Type Parameters
- T: The numeric type.
Remarks
For Beginners: This is the simplest form of gradient clipping. Each gradient value is independently limited to the range [-maxValue, maxValue]. For example, with maxValue=1.0, a gradient of 5.0 becomes 1.0, and -3.0 becomes -1.0.
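Element-wise clipping amounts to clamping each value, as in the sketch below. `ClipByValueSketch` is a hypothetical name, `double[]` stands in for `Vector<T>`, and the sketch assumes `maxValue` is non-negative (`Math.Clamp` throws if the lower bound exceeds the upper bound).

```csharp
using System;
using System.Linq;

// Hypothetical sketch; double[] stands in for Vector<T>.
// Assumption: maxValue >= 0, otherwise Math.Clamp throws ArgumentException.
double[] ClipByValueSketch(double[] grads, double maxValue = 1.0) =>
    grads.Select(x => Math.Clamp(x, -maxValue, maxValue)).ToArray();

// 5.0 is capped at 1.0, -3.0 at -1.0, and 0.25 is already inside the range.
double[] result = ClipByValueSketch(new[] { 5.0, -3.0, 0.25 });
Console.WriteLine(string.Join(", ", result));
```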
ComputeGlobalNorm<T>(List<Vector<T>>)
Computes the global L2 norm across multiple gradient vectors.
public static T ComputeGlobalNorm<T>(List<Vector<T>> gradientsList)
Parameters
- gradientsList (List<Vector<T>>): List of gradient vectors.
Returns
- T
The global L2 norm.
Type Parameters
- T: The numeric type.
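Assuming the conventional definition, the global L2 norm over K gradient vectors is the L2 norm of their concatenation. The symbols below (K vectors g_k, each with n_k elements) are notation introduced here for illustration.

```latex
\lVert G \rVert_2
  = \sqrt{\sum_{k=1}^{K} \sum_{i=1}^{n_k} g_{k,i}^{2}}
  = \sqrt{\sum_{k=1}^{K} \lVert g_k \rVert_2^{2}}
```

For example, two vectors with individual norms 3 and 4 have a global norm of 5, not 7, because the squared norms are added before the square root is taken.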
ComputeNorm<T>(Vector<T>)
Computes the L2 norm of a gradient vector.
public static T ComputeNorm<T>(Vector<T> gradients)
Parameters
- gradients (Vector<T>): The gradient vector.
Returns
- T
The L2 norm.
Type Parameters
- T: The numeric type.