Class MixedPrecisionConfig
- Namespace
- AiDotNet.MixedPrecision
- Assembly
- AiDotNet.dll
Configuration settings for mixed-precision training.
public class MixedPrecisionConfig
- Inheritance
- object → MixedPrecisionConfig
Remarks
For Beginners: This class contains all the settings you can adjust for mixed-precision training. The default values work well for most models, but you can customize them based on your specific needs.
Key concepts:
- Loss Scaling: Prevents small gradients from becoming zero in FP16
- Dynamic Scaling: Automatically adjusts the loss scale during training
- Master Weights: FP32 copy of parameters for precise updates
- Working Weights: FP16 copy used for forward/backward passes
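For example, a configuration can be created with the recommended defaults and then tuned (a hedged sketch: only the members documented on this page are used, and how the config is handed to a trainer is not shown here):

```csharp
using System;
using AiDotNet.MixedPrecision;

// Start from the recommended defaults.
var config = new MixedPrecisionConfig();

// Tune for a model whose gradients overflow early in training:
// start from a smaller scale and grow it more cautiously.
config.InitialLossScale = 4096;     // 2^12 instead of the default 2^16
config.ScaleGrowthInterval = 4000;  // wait longer before increasing the scale

Console.WriteLine(config);          // ToString() prints a summary
```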
Constructors
MixedPrecisionConfig()
Creates a configuration with default recommended settings.
public MixedPrecisionConfig()
Properties
EnableDynamicScaling
Whether to enable dynamic loss scaling (default: true).
public bool EnableDynamicScaling { get; set; }
Property Value
- bool
Remarks
For Beginners: Dynamic scaling automatically adjusts the loss scale during training. This is generally recommended as it adapts to your model's gradient magnitudes. Set to false only if you want to manually control the scale.
Fp32BatchNorm
Whether to keep batch normalization layers in FP32 (default: true).
public bool Fp32BatchNorm { get; set; }
Property Value
- bool
Remarks
For Beginners: Batch normalization can be numerically unstable in FP16. Keeping it in FP32 improves training stability with minimal performance impact. This is recommended for most models.
Fp32GradientAccumulation
Whether to accumulate gradients in FP32 (default: true).
public bool Fp32GradientAccumulation { get; set; }
Property Value
- bool
Remarks
For Beginners: Accumulating gradients in FP32 prevents precision loss when adding many small gradient values. This is essential for mixed-precision training.
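The precision loss is easy to reproduce with System.Half (an illustrative sketch, independent of AiDotNet): once the running sum grows large enough, each small FP16 addend rounds away entirely.

```csharp
using System;

// Summing 10,000 increments of 0.001 should give 10.
// The FP16 accumulator is rounded back to Half after every add,
// as a true FP16 accumulator would be.
Half halfSum = (Half)0f;
float floatSum = 0f;

for (int i = 0; i < 10_000; i++)
{
    halfSum = (Half)((float)halfSum + 0.001f); // stalls once the FP16
                                               // ulp around the sum
                                               // exceeds ~0.002
    floatSum += 0.001f;                        // stays close to 10
}

Console.WriteLine($"FP16 sum: {(float)halfSum}"); // far below 10
Console.WriteLine($"FP32 sum: {floatSum}");       // ~10
```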
Fp32Loss
Whether to keep loss computation in FP32 (default: true).
public bool Fp32Loss { get; set; }
Property Value
- bool
Remarks
For Beginners: Computing the loss in FP32 improves numerical accuracy and stability. This is recommended for most models.
InitialLossScale
Initial loss scale factor (default: 65536 = 2^16).
public double InitialLossScale { get; set; }
Property Value
- double
Remarks
For Beginners: This is the starting value for loss scaling. 2^16 = 65536 works well for most models. If you see NaN early in training, try a smaller value like 2^12 = 4096.
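The effect of the scale itself can be seen directly (an illustrative sketch using System.Half, independent of AiDotNet): a gradient below FP16's smallest subnormal is flushed to zero, while the scaled version survives.

```csharp
using System;

float tinyGrad = 2e-8f;                   // below FP16's smallest subnormal (~6e-8)

Half unscaled = (Half)tinyGrad;           // flushes to zero: the gradient is lost
Half scaled = (Half)(tinyGrad * 65536f);  // 2^16 scaling keeps it representable

float recovered = (float)scaled / 65536f; // unscale in FP32 before the update

Console.WriteLine((float)unscaled);       // 0
Console.WriteLine(recovered);             // ≈ 2e-8
```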
MaxLossScale
Maximum allowed loss scale (default: 16777216 = 2^24).
public double MaxLossScale { get; set; }
Property Value
- double
Remarks
For Beginners: The scale will never go above this value. 2^24 is a safe upper bound that prevents excessive scaling.
MinLossScale
Minimum allowed loss scale (default: 1.0).
public double MinLossScale { get; set; }
Property Value
- double
Remarks
For Beginners: The scale will never go below this value. 1.0 means no scaling (equivalent to regular FP32 training).
ScaleBackoffFactor
Factor by which to multiply the scale when decreasing it after an overflow (default: 0.5).
public double ScaleBackoffFactor { get; set; }
Property Value
- double
Remarks
For Beginners: When an overflow is detected, the scale is multiplied by this factor. 0.5 means the scale is halved. Values between 0.25 and 0.5 are typical.
ScaleGrowthFactor
Factor by which to multiply the scale when increasing it (default: 2.0).
public double ScaleGrowthFactor { get; set; }
Property Value
- double
Remarks
For Beginners: When increasing the scale, multiply it by this factor. 2.0 means double the scale. Values between 1.5 and 2.0 are typical.
ScaleGrowthInterval
Number of consecutive successful updates before increasing the scale (default: 2000).
public int ScaleGrowthInterval { get; set; }
Property Value
- int
Remarks
For Beginners: After this many updates without overflow, the scale will increase. Higher values = more conservative (slower to increase scale). Lower values = more aggressive (faster to increase scale, but more likely to overflow).
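Taken together, the scaling properties drive an update rule along these lines (a hypothetical sketch of the standard dynamic loss-scaling scheme; AiDotNet's internal scaler may differ in detail, and none of the names below are part of its API):

```csharp
using System;

// Hypothetical illustration of dynamic loss scaling, with the
// default values of the MixedPrecisionConfig properties inlined.
class DynamicLossScaler
{
    double _scale = 65536;            // InitialLossScale
    int _goodSteps = 0;

    const double GrowthFactor = 2.0;  // ScaleGrowthFactor
    const double BackoffFactor = 0.5; // ScaleBackoffFactor
    const int GrowthInterval = 2000;  // ScaleGrowthInterval
    const double MinScale = 1.0;      // MinLossScale
    const double MaxScale = 16777216; // MaxLossScale

    public double Scale => _scale;

    // Called once per optimizer step with whether the scaled
    // gradients contained an Inf/NaN (overflow).
    public void Update(bool overflowed)
    {
        if (overflowed)
        {
            // Back off and restart the growth counter.
            _scale = Math.Max(MinScale, _scale * BackoffFactor);
            _goodSteps = 0;
        }
        else if (++_goodSteps >= GrowthInterval)
        {
            // Enough clean steps: try a larger scale.
            _scale = Math.Min(MaxScale, _scale * GrowthFactor);
            _goodSteps = 0;
        }
    }
}
```

On overflow the step is skipped and the scale is halved; after 2000 clean steps the scale is doubled, always clamped to [MinLossScale, MaxLossScale].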
Methods
ToString()
Gets a summary of the configuration.
public override string ToString()
Returns
- string
A string describing the configuration.