Class AdamWGpuConfig
- Namespace: AiDotNet.Interfaces
- Assembly: AiDotNet.dll
Configuration for AdamW optimizer on GPU.
public class AdamWGpuConfig : IGpuOptimizerConfig
- Inheritance: object → AdamWGpuConfig
- Implements: IGpuOptimizerConfig
Remarks
AdamW is Adam with decoupled weight decay. Instead of adding the weight decay term to the gradient before the Adam update (as classic L2 regularization does), it subtracts the decay directly from the weights after the update.
For Beginners: AdamW fixes a subtle issue with L2 regularization in Adam. When weight decay is folded into the gradient, Adam's adaptive learning rates rescale the decay term along with the gradient, so heavily-updated parameters end up barely regularized. AdamW applies weight decay directly to the weights, which regularizes all parameters consistently and generally works better. The sketch below illustrates the difference.
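To make the decoupled update concrete, here is a minimal scalar sketch of one AdamW step. This is illustrative CPU pseudocode using this class's parameter names; the real update runs as a GPU kernel via ApplyUpdate below, whose exact formulation may differ.
using System;

// Illustrative scalar AdamW step (not AiDotNet's actual GPU kernel).
static class AdamWSketch
{
    public static float Step(
        float param, float grad,
        ref float m, ref float v,   // first/second moment state, start at 0
        int t,                      // 1-based step count
        float lr = 1e-3f, float beta1 = 0.9f, float beta2 = 0.999f,
        float epsilon = 1e-8f, float weightDecay = 0.01f)
    {
        // Adam moment updates; weight decay is NOT folded into the gradient.
        m = beta1 * m + (1f - beta1) * grad;
        v = beta2 * v + (1f - beta2) * grad * grad;

        // Bias correction compensates for the zero-initialized moments.
        float mHat = m / (1f - MathF.Pow(beta1, t));
        float vHat = v / (1f - MathF.Pow(beta2, t));

        // Adam step plus decoupled weight decay applied directly to the weight.
        return param - lr * (mHat / (MathF.Sqrt(vHat) + epsilon) + weightDecay * param);
    }
}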
Constructors
AdamWGpuConfig(float, float, float, float, float, int)
Creates a new AdamW GPU configuration.
public AdamWGpuConfig(float learningRate, float beta1 = 0.9f, float beta2 = 0.999f, float epsilon = 1e-8f, float weightDecay = 0.01f, int step = 0)
Parameters
- learningRate (float): Learning rate for parameter updates.
- beta1 (float): Exponential decay rate for the first moment (default 0.9).
- beta2 (float): Exponential decay rate for the second moment (default 0.999).
- epsilon (float): Numerical stability constant (default 1e-8).
- weightDecay (float): Weight decay coefficient (default 0.01).
- step (int): Current optimization step (default 0).
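A minimal construction sketch based on the signature above; the values are illustrative, not recommendations:
// Only the learning rate is required; the remaining arguments use the defaults above.
var adamW = new AdamWGpuConfig(learningRate: 1e-3f);

// Override individual defaults by name, e.g. for stronger regularization.
var adamWHeavyDecay = new AdamWGpuConfig(learningRate: 3e-4f, weightDecay: 0.1f);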
Properties
Beta1
Gets the exponential decay rate for the first moment estimates (typically 0.9).
public float Beta1 { get; init; }
Property Value
float
Beta2
Gets the exponential decay rate for the second moment estimates (typically 0.999).
public float Beta2 { get; init; }
Property Value
float
Epsilon
Gets the small constant for numerical stability (typically 1e-8).
public float Epsilon { get; init; }
Property Value
float
LearningRate
Gets the learning rate for parameter updates.
public float LearningRate { get; init; }
Property Value
float
OptimizerType
Gets the type of optimizer (SGD, Adam, AdamW, etc.). For this configuration it identifies AdamW.
public GpuOptimizerType OptimizerType { get; }
Property Value
GpuOptimizerType
Step
Gets the current optimization step (used for bias correction in Adam-family optimizers; see the sketch under Remarks).
public int Step { get; init; }
Property Value
int
WeightDecay
Gets the weight decay (L2 regularization) coefficient.
public float WeightDecay { get; init; }
Property Value
float
Methods
ApplyUpdate(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, GpuOptimizerState, int)
Applies the optimizer update to the given parameter buffer using its gradient.
public void ApplyUpdate(IDirectGpuBackend backend, IGpuBuffer param, IGpuBuffer gradient, GpuOptimizerState state, int size)
Parameters
- backend (IDirectGpuBackend): The GPU backend that executes the update.
- param (IGpuBuffer): Buffer containing the parameters to update (modified in place).
- gradient (IGpuBuffer): Buffer containing the gradients.
- state (GpuOptimizerState): Optimizer state buffers (momentum, squared gradients, etc.).
- size (int): Number of parameters to update.
Remarks
For Beginners: This method applies the optimizer's update rule directly on the GPU. Each optimizer type (SGD, Adam, etc.) implements its own update logic using GPU kernels. The state parameter contains any auxiliary buffers needed (like velocity for SGD with momentum, or m/v buffers for Adam).
Design Note: Following the Open/Closed Principle, each optimizer config knows how to apply its own update, so adding new optimizers doesn't require modifying layer code.
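A hedged sketch of how calling code might delegate an update to the config. How the backend, buffers, and state are obtained is framework-specific and not shown here; this fragment only exercises the signature documented above.
// Hypothetical training-step fragment. Obtaining backend, weights,
// grads, and state is framework-specific and omitted.
void UpdateParameters(
    IGpuOptimizerConfig config,
    IDirectGpuBackend backend,
    IGpuBuffer weights,
    IGpuBuffer grads,
    GpuOptimizerState state,
    int parameterCount)
{
    // The config encapsulates the update rule, so swapping AdamWGpuConfig
    // for another IGpuOptimizerConfig changes the optimizer without
    // touching this layer code (the Open/Closed design note above).
    config.ApplyUpdate(backend, weights, grads, state, parameterCount);
}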