Class LossScaler<T>
- Namespace: AiDotNet.MixedPrecision
- Assembly: AiDotNet.dll
Implements dynamic loss scaling for mixed-precision training to prevent gradient underflow.
public class LossScaler<T>
Type Parameters
T
The numeric type of the loss and gradient values.
- Inheritance: object → LossScaler<T>
Examples
// Create a loss scaler with defaults
var scaler = new LossScaler<float>(
    initialScale: 65536.0,
    dynamicScaling: true
);

// In the training loop:
float loss = lossFunction.Compute(predictions, targets);
float scaledLoss = scaler.ScaleLoss(loss);

// Backpropagate with the scaled loss...
var gradients = model.Backward(scaledLoss);

// Unscale and check for overflow
if (scaler.UnscaleGradientsAndCheck(gradients))
{
    // Safe to update parameters
    optimizer.Update(parameters, gradients);
}
else
{
    // Skip this update due to gradient overflow
    Console.WriteLine($"Gradient overflow, scale reduced to {scaler.Scale}");
}
Remarks
For Beginners: Loss scaling is a technique used in mixed-precision training to prevent very small gradient values from becoming zero (underflow) when using 16-bit precision.
The problem:
- FP16 (Half) can only represent positive magnitudes between roughly 6e-8 (smallest subnormal) and 65504 (maximum)
- During training, gradients are often very small (e.g., 1e-10)
- Small gradients underflow to zero in FP16, stopping learning
The solution:
- Scale the loss by a large factor (e.g., 2^16 = 65536) before backpropagation
- This makes gradients larger, preventing underflow
- Unscale gradients back to their original values before parameter updates
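The arithmetic behind these steps can be seen directly with .NET's Half type. A minimal, self-contained sketch (plain C#, not using the AiDotNet API):

```csharp
using System;

class ScalingDemo
{
    static void Main()
    {
        double grad = 1e-10;        // true gradient, below Half's smallest subnormal (~6e-8)
        double scale = 65536.0;     // 2^16

        Half unscaled = (Half)grad;          // underflows to zero in Half precision
        Half scaled = (Half)(grad * scale);  // ~6.55e-6, representable in Half

        Console.WriteLine((double)unscaled);        // 0
        Console.WriteLine((double)scaled / scale);  // ~1e-10: recovered after unscaling
    }
}
```

Without scaling, the gradient is lost entirely; with scaling, it survives the round-trip through 16-bit precision (up to a small rounding error).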
Dynamic scaling:
- Automatically adjusts the scale factor during training
- Increases scale when gradients are stable (no overflow)
- Decreases scale when gradients overflow (become infinity/NaN)
Technical Details: The algorithm follows NVIDIA's approach:
1. Start with a large initial scale (default: 2^16 = 65536).
2. If no overflow occurs for N steps, increase the scale by the growth factor (default: 2.0).
3. If overflow is detected, decrease the scale by the backoff factor (default: 0.5) and skip the update.
4. Track consecutive successful updates to decide when to adjust the scale.
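The scale-update rule above can be sketched in a few lines. This is an illustrative reimplementation with hypothetical names (Step, _goodSteps), not AiDotNet's internal code:

```csharp
using System;

class DynamicScaleSketch
{
    double _scale = 65536.0;   // initial scale (2^16)
    int _goodSteps = 0;        // consecutive updates without overflow

    const int GrowthInterval = 2000;
    const double GrowthFactor = 2.0, BackoffFactor = 0.5;
    const double MinScale = 1.0, MaxScale = 16777216.0;   // 2^24

    // Returns true when the parameter update may proceed.
    public bool Step(bool overflowDetected)
    {
        if (overflowDetected)
        {
            // Back off and skip this update.
            _scale = Math.Max(MinScale, _scale * BackoffFactor);
            _goodSteps = 0;
            return false;
        }
        if (++_goodSteps >= GrowthInterval)
        {
            // Stable for a full interval: try a larger scale.
            _scale = Math.Min(MaxScale, _scale * GrowthFactor);
            _goodSteps = 0;
        }
        return true;
    }

    public double Scale => _scale;
}
```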
Constructors
LossScaler(double, bool, int, double, double, double, double)
Initializes a new instance of the LossScaler class.
public LossScaler(double initialScale = 65536, bool dynamicScaling = true, int growthInterval = 2000, double growthFactor = 2, double backoffFactor = 0.5, double minScale = 1, double maxScale = 16777216)
Parameters
initialScale (double): Initial loss scale factor (default: 65536 = 2^16).
dynamicScaling (bool): Enable dynamic scale adjustment (default: true).
growthInterval (int): Number of successful updates before scaling up (default: 2000).
growthFactor (double): Factor to grow the scale by (default: 2.0).
backoffFactor (double): Factor to reduce the scale by (default: 0.5).
minScale (double): Minimum scale value (default: 1.0).
maxScale (double): Maximum scale value (default: 2^24 = 16777216).
Remarks
For Beginners: Default values follow NVIDIA's mixed-precision training recommendations:
- An initial scale of 2^16 works well for most models.
- A growth interval of 2000 prevents the scale from oscillating.
- A growth factor of 2.0 and backoff factor of 0.5 balance adjustment speed against stability.
- Min/max bounds prevent extreme scale values.
Properties
BackoffFactor
Factor by which to multiply the scale when decreasing (default: 0.5).
public double BackoffFactor { get; set; }
Property Value
- double
DynamicScaling
Whether to use dynamic loss scaling.
public bool DynamicScaling { get; set; }
Property Value
- bool
GrowthFactor
Factor by which to multiply the scale when increasing (default: 2.0).
public double GrowthFactor { get; set; }
Property Value
- double
GrowthInterval
Number of consecutive iterations without overflow before increasing scale.
public int GrowthInterval { get; set; }
Property Value
- int
MaxScale
Maximum allowed scale value to prevent excessive growth.
public double MaxScale { get; set; }
Property Value
- double
MinScale
Minimum allowed scale value to prevent excessive reduction.
public double MinScale { get; set; }
Property Value
- double
OverflowRate
Gets the overflow rate (skipped / total).
public double OverflowRate { get; }
Property Value
- double
Scale
Current loss scale factor.
public double Scale { get; }
Property Value
- double
SkippedUpdates
Gets the number of updates skipped due to overflow.
public int SkippedUpdates { get; }
Property Value
- int
TotalUpdates
Gets the total number of updates attempted.
public int TotalUpdates { get; }
Property Value
- int
Methods
DetectOverflow(Tensor<T>)
Checks if any gradient in a tensor has overflowed.
public bool DetectOverflow(Tensor<T> gradients)
Parameters
gradients (Tensor<T>): The tensor of gradients to check.
Returns
- bool
True if any gradient is NaN or infinity; otherwise, false.
DetectOverflow(Vector<T>)
Checks if any gradient in a vector has overflowed.
public bool DetectOverflow(Vector<T> gradients)
Parameters
gradients (Vector<T>): The vector of gradients to check.
Returns
- bool
True if any gradient is NaN or infinity; otherwise, false.
HasOverflow(T)
Checks if a single value has overflowed (is NaN or infinity).
public bool HasOverflow(T value)
Parameters
value (T): The value to check.
Returns
- bool
True if the value is NaN or infinity; otherwise, false.
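For a floating-point T such as double, this check presumably reduces to the standard NaN/infinity tests. An illustrative double-based version (a sketch, not the library's generic implementation):

```csharp
using System;

static class OverflowCheck
{
    // A gradient has "overflowed" when it is no longer a finite number.
    public static bool HasOverflow(double value) =>
        double.IsNaN(value) || double.IsInfinity(value);
}
```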
Reset(double?)
Resets the statistics and scale to initial values.
public void Reset(double? newInitialScale = null)
Parameters
newInitialScale (double?): Optional new initial scale value.
ScaleLoss(T)
Scales the loss value to prevent gradient underflow.
public T ScaleLoss(T loss)
Parameters
loss (T): The original loss value.
Returns
- T
The scaled loss value.
Remarks
For Beginners: This multiplies your loss by the scale factor. The scaled loss is used for backpropagation, which makes all gradients proportionally larger.
ToString()
Gets a summary of the loss scaler's current state.
public override string ToString()
Returns
- string
A string describing the current state.
UnscaleGradient(T)
Unscales a single gradient value.
public T UnscaleGradient(T gradient)
Parameters
gradient (T): The scaled gradient value.
Returns
- T
The unscaled gradient value.
UnscaleGradients(Tensor<T>)
Unscales all gradients in a tensor.
public void UnscaleGradients(Tensor<T> gradients)
Parameters
gradients (Tensor<T>): The tensor of scaled gradients.
Remarks
For Beginners: This divides all gradient values by the scale factor, returning them to their true magnitudes for parameter updates.
UnscaleGradients(Vector<T>)
Unscales all gradients in a vector.
public void UnscaleGradients(Vector<T> gradients)
Parameters
gradients (Vector<T>): The vector of scaled gradients.
UnscaleGradientsAndCheck(Tensor<T>)
Unscales gradients and checks for overflow, updating the scale factor if dynamic scaling is enabled.
public bool UnscaleGradientsAndCheck(Tensor<T> gradients)
Parameters
gradients (Tensor<T>): The tensor of scaled gradients.
Returns
- bool
True if gradients are valid and update can proceed; false if overflow detected and update should be skipped.
Remarks
For Beginners: This is the main method to use in your training loop. It performs three steps:
1. Unscales the gradients (divides by the scale factor).
2. Checks whether any gradient is NaN or infinity.
3. Adjusts the scale factor if dynamic scaling is enabled.
If overflow is detected, you should skip the parameter update for this step.
UnscaleGradientsAndCheck(Vector<T>)
Unscales gradients and checks for overflow (vector version).
public bool UnscaleGradientsAndCheck(Vector<T> gradients)
Parameters
gradients (Vector<T>): The vector of scaled gradients.
Returns
- bool
True if gradients are valid; false if overflow detected.