Class LAMBOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Implements the LAMB (Layer-wise Adaptive Moments for Batch training) optimization algorithm.

public class LAMBOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

TInput

The type of input data the model consumes (e.g., Matrix<T>).

TOutput

The type of output data the model produces (e.g., Vector<T>).

Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
LAMBOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Examples

// For BERT-style transformer training with a large batch size.
// batchSize is illustrative; use whatever batch size your data pipeline produces.
int batchSize = 4096;

var options = new LAMBOptimizerOptions<float, Matrix<float>, Vector<float>>
{
    // Square-root learning-rate scaling relative to a reference batch size of 256.
    InitialLearningRate = 0.00176 * Math.Sqrt(batchSize / 256.0),
    Beta1 = 0.9,
    Beta2 = 0.999,
    WeightDecay = 0.01,
    WarmupEpochs = 1
};

// 'model' is the IFullModel<float, Matrix<float>, Vector<float>> being trained.
var optimizer = new LAMBOptimizer<float, Matrix<float>, Vector<float>>(model, options);

Remarks

LAMB combines Adam's adaptive learning rates with LARS's layer-wise scaling, enabling training with extremely large batch sizes (up to 32K) while maintaining accuracy.

Key Formula:

m = beta1 * m + (1 - beta1) * g
v = beta2 * v + (1 - beta2) * g^2
m_hat = m / (1 - beta1^t)
v_hat = v / (1 - beta2^t)
r = m_hat / (sqrt(v_hat) + epsilon) + weight_decay * w
trust_ratio = ||w|| / ||r||
w = w - lr * trust_ratio * r
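
As a concrete illustration of the formulas above, the following sketch applies one LAMB step to a single layer's weights using plain arrays. It is a minimal, self-contained reimplementation of the math for reference only, not the library's internal code; the method name LambStep and the fallback of the trust ratio to 1 when either norm is zero are conventional choices, not taken from this class.

using System;

static class LambSketch
{
    // Applies one LAMB update step, in place, to a single layer's weights w.
    // m and v are the running first/second moment estimates; t is the 1-based step count.
    public static void LambStep(
        double[] w, double[] g, double[] m, double[] v,
        int t, double lr, double beta1 = 0.9, double beta2 = 0.999,
        double weightDecay = 0.01, double epsilon = 1e-6)
    {
        var r = new double[w.Length];
        for (int i = 0; i < w.Length; i++)
        {
            // Adam-style moment updates with bias correction.
            m[i] = beta1 * m[i] + (1 - beta1) * g[i];
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] * g[i];
            double mHat = m[i] / (1 - Math.Pow(beta1, t));
            double vHat = v[i] / (1 - Math.Pow(beta2, t));

            // Adam direction plus weight decay, as in the formula for r above.
            r[i] = mHat / (Math.Sqrt(vHat) + epsilon) + weightDecay * w[i];
        }

        // Layer-wise trust ratio ||w|| / ||r||, falling back to 1 if either norm is zero.
        double wNorm = Norm(w), rNorm = Norm(r);
        double trustRatio = (wNorm > 0 && rNorm > 0) ? wNorm / rNorm : 1.0;

        for (int i = 0; i < w.Length; i++)
            w[i] -= lr * trustRatio * r[i];
    }

    private static double Norm(double[] x)
    {
        double sum = 0;
        foreach (var xi in x) sum += xi * xi;
        return Math.Sqrt(sum);
    }
}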

For Beginners: LAMB is the optimizer of choice for training large language models like BERT with massive batch sizes. It works by:

  1. Computing Adam-style updates (momentum + adaptive learning rates)
  2. Adding weight decay to prevent overfitting
  3. Scaling the update per-layer based on weight/update magnitude ratios

This combination allows training to scale to very large batch sizes, with near-linear throughput gains, while maintaining final accuracy comparable to small-batch training.

Based on the paper "Large Batch Optimization for Deep Learning: Training BERT in 76 minutes" by You et al. (2019).

Constructors

LAMBOptimizer(IFullModel<T, TInput, TOutput>?, LAMBOptimizerOptions<T, TInput, TOutput>?)

Initializes a new instance of the LAMBOptimizer class.

public LAMBOptimizer(IFullModel<T, TInput, TOutput>? model, LAMBOptimizerOptions<T, TInput, TOutput>? options = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model to optimize.

options LAMBOptimizerOptions<T, TInput, TOutput>

The options for configuring the LAMB optimizer.

Remarks

For Beginners: This sets up the LAMB optimizer. Key parameters:

  • Learning rate: Use sqrt scaling (base_lr * sqrt(batch_size / 256)); see the sketch after this list
  • Beta1/Beta2: Keep at 0.9/0.999 for most cases
  • Weight decay: 0.01 is typical for transformers
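
For example, the sqrt scaling rule from the learning-rate bullet works out as follows; the base learning rate and batch sizes are illustrative values, not library defaults:

// Square-root learning-rate scaling relative to a reference batch size of 256.
double baseLr = 0.00176;
double ScaledLr(int batchSize) => baseLr * Math.Sqrt(batchSize / 256.0);

Console.WriteLine(ScaledLr(256));    // 0.00176  (no scaling at the reference batch size)
Console.WriteLine(ScaledLr(4096));   // 0.00704  (sqrt(16) = 4x larger)
Console.WriteLine(ScaledLr(32768));  // ~0.0199  (sqrt(128) ≈ 11.3x larger)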

Properties

Beta1

Gets the current beta1 value.

public double Beta1 { get; }

Property Value

double

Beta2

Gets the current beta2 value.

public double Beta2 { get; }

Property Value

double

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

public override bool SupportsGpuUpdate { get; }

Property Value

bool

WeightDecay

Gets the current weight decay coefficient.

public double WeightDecay { get; }

Property Value

double

Methods

Deserialize(byte[])

Deserializes the optimizer's state from a byte array.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized optimizer state, as produced by Serialize().

DisposeGpuState()

Disposes GPU-allocated optimizer state.

public override void DisposeGpuState()

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>
X TInput
y TOutput

Returns

string

GetOptions()

Gets the current optimizer options.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

InitializeAdaptiveParameters()

Initializes the adaptive parameters used by the LAMB optimizer.

protected override void InitializeAdaptiveParameters()

InitializeGpuState(int, IDirectGpuBackend)

Initializes LAMB optimizer state on the GPU.

public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int

The number of parameters for which to allocate optimizer state.

backend IDirectGpuBackend

The GPU backend used to allocate and manage the state.

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process using the LAMB algorithm.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data for optimization.

Returns

OptimizationResult<T, TInput, TOutput>

The result of the optimization process.
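
A call typically looks like the sketch below. The property names on OptimizationInputData (XTrain, YTrain) are placeholders for illustration only and may not match the actual members of that type; consult its documentation for the real names.

// Hypothetical sketch: XTrain/YTrain are placeholder property names, not
// confirmed members of OptimizationInputData.
var inputData = new OptimizationInputData<float, Matrix<float>, Vector<float>>
{
    XTrain = trainingFeatures, // Matrix<float> of training inputs (placeholder name)
    YTrain = trainingTargets   // Vector<float> of training targets (placeholder name)
};

var result = optimizer.Optimize(inputData);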

Reset()

Resets the optimizer's internal state.

public override void Reset()

ReverseUpdate(Vector<T>, Vector<T>)

Reverses a LAMB gradient update to recover original parameters.

public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>

The parameter values after the LAMB update was applied.

appliedGradients Vector<T>

The gradients that were used to produce that update.

Returns

Vector<T>

Serialize()

Serializes the optimizer's state into a byte array.

public override byte[] Serialize()

Returns

byte[]
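
A typical round trip saves the serialized state to disk and restores it into a freshly constructed optimizer so training can resume; the file path is illustrative:

// Save the optimizer's serialized state to disk ("lamb_state.bin" is an example path).
byte[] state = optimizer.Serialize();
File.WriteAllBytes("lamb_state.bin", state);

// Later: recreate the optimizer with the same model and options, then restore its state.
var restored = new LAMBOptimizer<float, Matrix<float>, Vector<float>>(model, options);
restored.Deserialize(File.ReadAllBytes("lamb_state.bin"));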

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the optimizer's options.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

UpdateParameters(Matrix<T>, Matrix<T>)

Updates a matrix of parameters using the LAMB optimization algorithm.

public override Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)

Parameters

parameters Matrix<T>

The current parameter values.

gradient Matrix<T>

The gradient of the loss with respect to the parameters.

Returns

Matrix<T>

UpdateParameters(Vector<T>, Vector<T>)

Updates a vector of parameters using the LAMB optimization algorithm.

public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>

The current parameter values.

gradient Vector<T>

The gradient of the loss with respect to the parameters.

Returns

Vector<T>
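
When driving the training loop manually instead of calling Optimize, each step passes the current parameters and their gradient through this method; producing those vectors (from your model and gradient computation) is assumed to happen elsewhere:

// One manual training step. The caller supplies the current parameters and the
// gradient of the loss with respect to them; how they are computed is not shown.
Vector<float> Step(LAMBOptimizer<float, Matrix<float>, Vector<float>> optimizer,
                   Vector<float> parameters, Vector<float> gradient)
{
    // Returns the updated parameters; the optimizer advances its internal state
    // (moment estimates, step count) as a side effect.
    return optimizer.UpdateParameters(parameters, gradient);
}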

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using the LAMB kernel.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer

The GPU buffer holding the parameters to update.

gradients IGpuBuffer

The GPU buffer holding the gradients for the current step.

parameterCount int

The number of parameters in the buffers.

backend IDirectGpuBackend

The GPU backend used to execute the update kernel.
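
Putting the GPU members together, a typical lifecycle is sketched below; obtaining the IDirectGpuBackend and the parameter/gradient IGpuBuffer instances is backend-specific and assumed to happen elsewhere:

// Sketch of the GPU update lifecycle. Acquiring 'backend', 'paramBuffer' and
// 'gradBuffer' is backend-specific and not shown here.
void RunGpuUpdates(LAMBOptimizer<float, Matrix<float>, Vector<float>> optimizer,
                   IDirectGpuBackend backend, IGpuBuffer paramBuffer, IGpuBuffer gradBuffer,
                   int parameterCount, int steps)
{
    if (!optimizer.SupportsGpuUpdate)
        throw new NotSupportedException("This optimizer does not support GPU updates.");

    // Allocate the optimizer's device-side state once.
    optimizer.InitializeGpuState(parameterCount, backend);
    try
    {
        for (int step = 0; step < steps; step++)
        {
            // The training code is assumed to have written fresh gradients into
            // gradBuffer before each call.
            optimizer.UpdateParametersGpu(paramBuffer, gradBuffer, parameterCount, backend);
        }
    }
    finally
    {
        // Release the device-side optimizer state.
        optimizer.DisposeGpuState();
    }
}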

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution using the LAMB update rule.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>
gradient Vector<T>

Returns

IFullModel<T, TInput, TOutput>