Class MixedPrecisionConfig
- Namespace
- AiDotNet.MixedPrecision
- Assembly
- AiDotNet.dll
Configuration settings for mixed-precision training.
public class MixedPrecisionConfig
- Inheritance
- object → MixedPrecisionConfig
Remarks
For Beginners: This class contains all the settings you can adjust for mixed-precision training. The default values work well for most models, but you can customize them based on your specific needs.
Key concepts:
- Loss Scaling: Prevents small gradients from becoming zero in FP16
- Dynamic Scaling: Automatically adjusts the loss scale during training
- Master Weights: FP32 copy of parameters for precise updates
- Working Weights: FP16 copy used for forward/backward passes
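For example, a configuration can be created with the recommended defaults and then tuned (a hedged sketch: only the members documented on this page are used, and how the config is handed to a trainer is not shown here):

```csharp
using System;
using AiDotNet.MixedPrecision;

// Start from the recommended defaults.
var config = new MixedPrecisionConfig();

// Tune for a model whose gradients overflow early in training:
// start from a smaller scale and grow it more cautiously.
config.InitialLossScale = 4096;     // 2^12 instead of the default 2^16
config.ScaleGrowthInterval = 4000;  // wait longer before increasing the scale

Console.WriteLine(config);          // ToString() prints a summary
```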
Constructors
MixedPrecisionConfig()
Creates a configuration with default recommended settings.
public MixedPrecisionConfig()
Properties
EnableDynamicScaling
Whether to enable dynamic loss scaling (default: true).
public bool EnableDynamicScaling { get; set; }
Property Value
- bool
Remarks
For Beginners: Dynamic scaling automatically adjusts the loss scale during training. This is generally recommended as it adapts to your model's gradient magnitudes. Set to false only if you want to manually control the scale.
Fp32BatchNorm
Whether to keep batch normalization layers in FP32 (default: true).
public bool Fp32BatchNorm { get; set; }
Property Value
- bool
Remarks
For Beginners: Batch normalization can be numerically unstable in FP16. Keeping it in FP32 improves training stability with minimal performance impact. This is recommended for most models.
Fp32GradientAccumulation
Whether to accumulate gradients in FP32 (default: true).
public bool Fp32GradientAccumulation { get; set; }
Property Value
- bool
Remarks
For Beginners: Accumulating gradients in FP32 prevents precision loss when adding many small gradient values. This is essential for mixed-precision training.
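The precision loss is easy to reproduce with System.Half (an illustrative sketch, independent of AiDotNet): once the running sum grows large enough, each small FP16 addend rounds away entirely.

```csharp
using System;

// Summing 10,000 increments of 0.001 should give 10.
// The FP16 accumulator is rounded back to Half after every add,
// as a true FP16 accumulator would be.
Half halfSum = (Half)0f;
float floatSum = 0f;

for (int i = 0; i < 10_000; i++)
{
    halfSum = (Half)((float)halfSum + 0.001f); // stalls once the FP16
                                               // ulp around the sum
                                               // exceeds ~0.002
    floatSum += 0.001f;                        // stays close to 10
}

Console.WriteLine($"FP16 sum: {(float)halfSum}"); // far below 10
Console.WriteLine($"FP32 sum: {floatSum}");       // ~10
```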
Fp32Loss
Whether to keep loss computation in FP32 (default: true).
public bool Fp32Loss { get; set; }
Property Value
- bool
Remarks
For Beginners: Computing the loss in FP32 improves numerical accuracy and stability. This is recommended for most models.
InitialLossScale
Initial loss scale factor (default: 65536 = 2^16).
public double InitialLossScale { get; set; }
Property Value
- double
Remarks
For Beginners: This is the starting value for loss scaling. 2^16 = 65536 works well for most models. If you see NaN early in training, try a smaller value like 2^12 = 4096.
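The effect of the scale itself can be seen directly (an illustrative sketch using System.Half, independent of AiDotNet): a gradient below FP16's smallest subnormal is flushed to zero, while the scaled version survives.

```csharp
using System;

float tinyGrad = 2e-8f;                   // below FP16's smallest subnormal (~6e-8)

Half unscaled = (Half)tinyGrad;           // flushes to zero: the gradient is lost
Half scaled = (Half)(tinyGrad * 65536f);  // 2^16 scaling keeps it representable

float recovered = (float)scaled / 65536f; // unscale in FP32 before the update

Console.WriteLine((float)unscaled);       // 0
Console.WriteLine(recovered);             // ≈ 2e-8
```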
MaxLossScale
Maximum allowed loss scale (default: 16777216 = 2^24).
public double MaxLossScale { get; set; }
Property Value
- double
Remarks
For Beginners: The scale will never go above this value. 2^24 is a safe upper bound that prevents excessive scaling.
MinLossScale
Minimum allowed loss scale (default: 1.0).
public double MinLossScale { get; set; }
Property Value
- double
Remarks
For Beginners: The scale will never go below this value. 1.0 means no scaling (equivalent to regular FP32 training).
ScaleBackoffFactor
Factor by which to multiply the scale when decreasing it after an overflow (default: 0.5).
public double ScaleBackoffFactor { get; set; }
Property Value
- double
Remarks
For Beginners: When an overflow is detected, the scale is multiplied by this factor. 0.5 means the scale is halved. Values between 0.25 and 0.5 are typical.
ScaleGrowthFactor
Factor by which to multiply the scale when increasing it (default: 2.0).
public double ScaleGrowthFactor { get; set; }
Property Value
- double
Remarks
For Beginners: When increasing the scale, multiply it by this factor. 2.0 means double the scale. Values between 1.5 and 2.0 are typical.
ScaleGrowthInterval
Number of consecutive successful updates before increasing the scale (default: 2000).
public int ScaleGrowthInterval { get; set; }
Property Value
- int
Remarks
For Beginners: After this many updates without overflow, the scale will increase. Higher values = more conservative (slower to increase scale). Lower values = more aggressive (faster to increase scale, but more likely to overflow).
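Taken together, the scaling properties drive an update rule along these lines (a hypothetical sketch of the standard dynamic loss-scaling scheme; AiDotNet's internal scaler may differ in detail, and none of the names below are part of its API):

```csharp
using System;

// Hypothetical illustration of dynamic loss scaling, with the
// default values of the MixedPrecisionConfig properties inlined.
class DynamicLossScaler
{
    double _scale = 65536;            // InitialLossScale
    int _goodSteps = 0;

    const double GrowthFactor = 2.0;  // ScaleGrowthFactor
    const double BackoffFactor = 0.5; // ScaleBackoffFactor
    const int GrowthInterval = 2000;  // ScaleGrowthInterval
    const double MinScale = 1.0;      // MinLossScale
    const double MaxScale = 16777216; // MaxLossScale

    public double Scale => _scale;

    // Called once per optimizer step with whether the scaled
    // gradients contained an Inf/NaN (overflow).
    public void Update(bool overflowed)
    {
        if (overflowed)
        {
            // Back off and restart the growth counter.
            _scale = Math.Max(MinScale, _scale * BackoffFactor);
            _goodSteps = 0;
        }
        else if (++_goodSteps >= GrowthInterval)
        {
            // Enough clean steps: try a larger scale.
            _scale = Math.Min(MaxScale, _scale * GrowthFactor);
            _goodSteps = 0;
        }
    }
}
```

On overflow the step is skipped and the scale is halved; after 2000 clean steps the scale is doubled, always clamped to [MinLossScale, MaxLossScale].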
Methods
ToString()
Gets a summary of the configuration.
public override string ToString()
Returns
- string
A string describing the configuration.