Class LAMBOptimizer<T, TInput, TOutput>
- Namespace
- AiDotNet.Optimizers
- Assembly
- AiDotNet.dll
Implements the LAMB (Layer-wise Adaptive Moments for Batch training) optimization algorithm.
public class LAMBOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations (e.g., float, double).
TInput: The type of the input data (e.g., Matrix<T>).
TOutput: The type of the output data (e.g., Vector<T>).
- Inheritance
- OptimizerBase<T, TInput, TOutput>
- GradientBasedOptimizerBase<T, TInput, TOutput>
- LAMBOptimizer<T, TInput, TOutput>
- Implements
- IGradientBasedOptimizer<T, TInput, TOutput>
- IOptimizer<T, TInput, TOutput>
- IModelSerializer
Examples
// For BERT-style transformer training
int batchSize = 4096;   // large-batch training; scale the learning rate by sqrt(batchSize / 256)
var options = new LAMBOptimizerOptions<float, Matrix<float>, Vector<float>>
{
    InitialLearningRate = 0.00176 * Math.Sqrt(batchSize / 256.0),
    Beta1 = 0.9,
    Beta2 = 0.999,
    WeightDecay = 0.01,
    WarmupEpochs = 1
};
// 'model' is your IFullModel<float, Matrix<float>, Vector<float>> instance
var optimizer = new LAMBOptimizer<float, Matrix<float>, Vector<float>>(model, options);
Remarks
LAMB combines Adam's adaptive learning rates with LARS's layer-wise scaling, enabling training with extremely large batch sizes (up to 32K) while maintaining accuracy.
Key Formula:
m = beta1 * m + (1 - beta1) * g
v = beta2 * v + (1 - beta2) * g^2
m_hat = m / (1 - beta1^t)
v_hat = v / (1 - beta2^t)
r = m_hat / (sqrt(v_hat) + epsilon) + weight_decay * w
trust_ratio = ||w|| / ||r||
w = w - lr * trust_ratio * r
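A minimal sketch of this update for a single layer, written with plain arrays purely for illustration (this is not the library's internal implementation; default hyperparameter values mirror those discussed below):
// Illustration only: one LAMB step for a single layer, mirroring the formula above.
// Requires: using System; using System.Linq;
static void LambStep(double[] w, double[] g, double[] m, double[] v, int t,
    double lr, double beta1 = 0.9, double beta2 = 0.999,
    double weightDecay = 0.01, double epsilon = 1e-6)
{
    var r = new double[w.Length];
    for (int i = 0; i < w.Length; i++)
    {
        m[i] = beta1 * m[i] + (1 - beta1) * g[i];          // first moment (momentum)
        v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i];   // second moment
        double mHat = m[i] / (1 - Math.Pow(beta1, t));     // bias correction
        double vHat = v[i] / (1 - Math.Pow(beta2, t));
        r[i] = mHat / (Math.Sqrt(vHat) + epsilon) + weightDecay * w[i];
    }
    double wNorm = Math.Sqrt(w.Sum(x => x * x));           // layer-wise norms
    double rNorm = Math.Sqrt(r.Sum(x => x * x));
    double trustRatio = (wNorm > 0 && rNorm > 0) ? wNorm / rNorm : 1.0;
    for (int i = 0; i < w.Length; i++)
        w[i] -= lr * trustRatio * r[i];                    // trust-ratio-scaled update
}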
For Beginners: LAMB is the optimizer of choice for training large language models like BERT with massive batch sizes. It works by:
- Computing Adam-style updates (momentum + adaptive learning rates)
- Adding weight decay to prevent overfitting
- Scaling the update per-layer based on weight/update magnitude ratios
Based on the paper "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes" by You et al. (2019).
Constructors
LAMBOptimizer(IFullModel<T, TInput, TOutput>?, LAMBOptimizerOptions<T, TInput, TOutput>?)
Initializes a new instance of the LAMBOptimizer class.
public LAMBOptimizer(IFullModel<T, TInput, TOutput>? model, LAMBOptimizerOptions<T, TInput, TOutput>? options = null)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize.
options (LAMBOptimizerOptions<T, TInput, TOutput>): The options for configuring the LAMB optimizer.
Remarks
For Beginners: This sets up the LAMB optimizer. Key parameters:
- Learning rate: Use sqrt scaling (base_lr * sqrt(batch_size / 256)); see the worked example after this list
- Beta1/Beta2: Keep at 0.9/0.999 for most cases
- Weight decay: 0.01 is typical for transformers
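For example, with a batch size of 4096 the sqrt-scaled learning rate is 0.00176 * sqrt(4096 / 256) = 0.00176 * 4 = 0.00704.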
Properties
Beta1
Gets the current beta1 value.
public double Beta1 { get; }
Property Value
- double
Beta2
Gets the current beta2 value.
public double Beta2 { get; }
Property Value
- double
SupportsGpuUpdate
Gets whether this optimizer supports GPU-accelerated parameter updates.
public override bool SupportsGpuUpdate { get; }
Property Value
- bool
WeightDecay
Gets the current weight decay coefficient.
public double WeightDecay { get; }
Property Value
- double
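These read-only properties can be inspected at runtime, for example when logging hyperparameters (optimizer here is the instance constructed in the Examples section):
// Log the current LAMB hyperparameters
Console.WriteLine($"beta1={optimizer.Beta1}, beta2={optimizer.Beta2}, weightDecay={optimizer.WeightDecay}");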
Methods
Deserialize(byte[])
Deserializes the optimizer's state from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[])
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>)
X (TInput)
y (TOutput)
Returns
- string
GetOptions()
Gets the current optimizer options.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
InitializeAdaptiveParameters()
Initializes the adaptive parameters used by the LAMB optimizer.
protected override void InitializeAdaptiveParameters()
InitializeGpuState(int, IDirectGpuBackend)
Initializes LAMB optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int)
backend (IDirectGpuBackend)
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the LAMB algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for optimization.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process.
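A minimal usage sketch, assuming an OptimizationInputData<T, TInput, TOutput> instance has already been prepared (its construction depends on your data and is not shown here):
// Runs the full LAMB optimization process over the supplied data
var result = optimizer.Optimize(inputData);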
Reset()
Resets the optimizer's internal state.
public override void Reset()
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a LAMB gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>)
appliedGradients (Vector<T>)
Returns
- Vector<T>
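A hedged sketch, assuming you retained both the updated parameter vector and the gradient vector that was applied (variable names are illustrative):
// Recovers the parameter values as they were before the update was applied
Vector<float> originalParameters = optimizer.ReverseUpdate(updatedParameters, appliedGradients);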
Serialize()
Serializes the optimizer's state into a byte array.
public override byte[] Serialize()
Returns
- byte[]
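A sketch of checkpointing optimizer state between runs (the file path is illustrative; requires using System.IO):
// Save the optimizer state, e.g. alongside a model checkpoint
byte[] state = optimizer.Serialize();
File.WriteAllBytes("lamb_state.bin", state);

// Later, restore the state into a compatible LAMB optimizer instance
optimizer.Deserialize(File.ReadAllBytes("lamb_state.bin"));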
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions<T, TInput, TOutput>)
UpdateParameters(Matrix<T>, Matrix<T>)
Updates a matrix of parameters using the LAMB optimization algorithm.
public override Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)
Parameters
parameters (Matrix<T>)
gradient (Matrix<T>)
Returns
- Matrix<T>
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the LAMB optimization algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector<T>)
gradient (Vector<T>)
Returns
- Vector<T>
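A sketch of driving updates manually, assuming parameters and per-step gradients are available as Vector<float> instances (ComputeGradients and totalSteps are hypothetical stand-ins, not part of this API):
// Apply one LAMB step per iteration; UpdateParameters returns the new parameter vector
for (int step = 0; step < totalSteps; step++)
{
    Vector<float> gradients = ComputeGradients(parameters);   // hypothetical gradient computation
    parameters = optimizer.UpdateParameters(parameters, gradients);
}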
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using the LAMB kernel.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer)
gradients (IGpuBuffer)
parameterCount (int)
backend (IDirectGpuBackend)
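A hedged sketch of the GPU path, assuming an IDirectGpuBackend instance and device-resident IGpuBuffer handles for the parameters and gradients already exist (their creation is backend-specific and not shown):
// Allocate LAMB moment buffers on the device once, before training
optimizer.InitializeGpuState(parameterCount, backend);

// Each step, apply the LAMB update directly on the GPU
optimizer.UpdateParametersGpu(parameterBuffer, gradientBuffer, parameterCount, backend);

// Release device-side optimizer state when training finishes
optimizer.DisposeGpuState();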
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution using the LAMB update rule.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>)
gradient (Vector<T>)
Returns
- IFullModel<T, TInput, TOutput>