Class LARSOptimizer<T, TInput, TOutput>
Namespace: AiDotNet.Optimizers
Assembly: AiDotNet.dll
Implements the LARS (Layer-wise Adaptive Rate Scaling) optimization algorithm.
public class LARSOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T
The numeric type used for calculations (e.g., float, double).
TInput
The type of the model's input data (e.g., Matrix<float>).
TOutput
The type of the model's output data (e.g., Vector<float>).
Inheritance
OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → LARSOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer
Examples
// For SimCLR training with large batches
int batchSize = 4096; // example large-batch size
var options = new LARSOptimizerOptions<float, Matrix<float>, Vector<float>>
{
    InitialLearningRate = 0.3 * batchSize / 256.0, // linear scaling rule
    Momentum = 0.9,
    WeightDecay = 1e-4,
    TrustCoefficient = 0.001,
    WarmupEpochs = 10
};
var optimizer = new LARSOptimizer<float, Matrix<float>, Vector<float>>(model, options);
Remarks
LARS is specifically designed for training with very large batch sizes (4096-32768). It automatically adapts the learning rate for each layer based on the ratio of parameter norm to gradient norm, which helps maintain stable training at scale.
Key Formula:
local_lr = trust_coeff * ||w|| / (||g|| + weight_decay * ||w|| + epsilon)
update = local_lr * (g + weight_decay * w)
w = w - lr * update (with momentum)
For Beginners: When training with very large batches (common in self-supervised learning methods such as SimCLR), regular optimizers can become unstable: averaging gradients over more samples makes them smaller relative to the weights. LARS addresses this by asking, for each layer, "how large are the weights compared to the gradients?" and scaling that layer's learning rate accordingly. This allows stable training with batch sizes of 4096 or even larger.
Based on the paper "Large Batch Training of Convolutional Networks" by You et al. (2017).
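The following standalone sketch walks through the key formula above for a single layer, using plain arrays. It is an illustration of the math only, not the library's internal implementation; all names in it are illustrative.
using System;
using System.Linq;

static void LarsStep(double[] w, double[] g, double[] velocity,
                     double lr, double momentum,
                     double trustCoeff, double weightDecay, double epsilon)
{
    double wNorm = Math.Sqrt(w.Sum(x => x * x));   // ||w||
    double gNorm = Math.Sqrt(g.Sum(x => x * x));   // ||g||

    // local_lr = trust_coeff * ||w|| / (||g|| + weight_decay * ||w|| + epsilon)
    double localLr = trustCoeff * wNorm / (gNorm + weightDecay * wNorm + epsilon);

    for (int i = 0; i < w.Length; i++)
    {
        // update = local_lr * (g + weight_decay * w)
        double update = localLr * (g[i] + weightDecay * w[i]);

        // w = w - lr * update (with momentum)
        velocity[i] = momentum * velocity[i] + update;
        w[i] -= lr * velocity[i];
    }
}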
Constructors
LARSOptimizer(IFullModel<T, TInput, TOutput>?, LARSOptimizerOptions<T, TInput, TOutput>?)
Initializes a new instance of the LARSOptimizer class.
public LARSOptimizer(IFullModel<T, TInput, TOutput>? model, LARSOptimizerOptions<T, TInput, TOutput>? options = null)
Parameters
model IFullModel<T, TInput, TOutput>
The model to optimize.
options LARSOptimizerOptions<T, TInput, TOutput>
The options for configuring the LARS optimizer.
Remarks
For Beginners: This sets up the LARS optimizer with its initial configuration. The most important parameters for SSL are (see the sketch below):
- Learning rate: use linear scaling (base_lr * batch_size / 256)
- Trust coefficient: controls layer-wise scaling (0.001 is typical)
- Warmup epochs: gradually ramp up the learning rate (10 epochs is typical)
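A minimal sketch of that arithmetic, assuming a linear warmup ramp (the library's actual warmup schedule may differ):
int batchSize = 4096;                          // large-batch SSL setting
double baseLr = 0.3;                           // reference rate at batch size 256
double scaledLr = baseLr * batchSize / 256.0;  // linear scaling rule => 4.8

int warmupEpochs = 10;
double LrAtEpoch(int epoch) =>
    epoch < warmupEpochs
        ? scaledLr * (epoch + 1) / warmupEpochs  // ramp up during warmup
        : scaledLr;                              // full rate afterwards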
Properties
Momentum
Gets the current momentum coefficient.
public double Momentum { get; }
Property Value
- double
SupportsGpuUpdate
Gets whether this optimizer supports GPU-accelerated parameter updates.
public override bool SupportsGpuUpdate { get; }
Property Value
- bool
TrustCoefficient
Gets the LARS trust coefficient.
public double TrustCoefficient { get; }
Property Value
- double
WeightDecay
Gets the current weight decay coefficient.
public double WeightDecay { get; }
Property Value
- double
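The read-only properties above can be inspected after construction. This short snippet assumes the model and options from the earlier example:
var optimizer = new LARSOptimizer<float, Matrix<float>, Vector<float>>(model, options);
Console.WriteLine($"Momentum:          {optimizer.Momentum}");
Console.WriteLine($"TrustCoefficient:  {optimizer.TrustCoefficient}");
Console.WriteLine($"WeightDecay:       {optimizer.WeightDecay}");
Console.WriteLine($"SupportsGpuUpdate: {optimizer.SupportsGpuUpdate}");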
Methods
Deserialize(byte[])
Deserializes the optimizer's state from a byte array.
public override void Deserialize(byte[] data)
Parameters
data byte[]
The byte array containing the serialized optimizer state.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model IFullModel<T, TInput, TOutput>
The model being optimized.
X TInput
The input data.
y TOutput
The target output data.
Returns
- string
GetOptions()
Gets the current optimizer options.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current optimizer options.
InitializeAdaptiveParameters()
Initializes the adaptive parameters used by the LARS optimizer.
protected override void InitializeAdaptiveParameters()
InitializeGpuState(int, IDirectGpuBackend)
Initializes LARS optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount int
The total number of model parameters.
backend IDirectGpuBackend
The GPU backend used to allocate state.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the LARS algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData OptimizationInputData<T, TInput, TOutput>
The input data for optimization.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process.
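A hedged usage sketch; the property names on OptimizationInputData (XTrain, YTrain) are assumptions for illustration and may differ from the actual type:
// trainFeatures (Matrix<float>) and trainTargets (Vector<float>) are placeholders.
var inputData = new OptimizationInputData<float, Matrix<float>, Vector<float>>
{
    XTrain = trainFeatures,  // assumed property name
    YTrain = trainTargets    // assumed property name
};
var result = optimizer.Optimize(inputData);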
Reset()
Resets the optimizer's internal state.
public override void Reset()
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a LARS gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters Vector<T>
The parameter values after the LARS update was applied.
appliedGradients Vector<T>
The gradients that were applied in that update.
Returns
- Vector<T>
The recovered original parameter values.
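A round-trip sketch using only the documented signatures; parameters and gradients are placeholder Vector<float> values:
Vector<float> updated = optimizer.UpdateParameters(parameters, gradients);
Vector<float> recovered = optimizer.ReverseUpdate(updated, gradients);
// recovered should match the original parameters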
Serialize()
Serializes the optimizer's state into a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized optimizer state.
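Serialize and Deserialize can be paired for checkpointing. A sketch, again assuming the model and options from the earlier example:
byte[] state = optimizer.Serialize();         // capture optimizer state
// ... later, on a fresh instance ...
var restored = new LARSOptimizer<float, Matrix<float>, Vector<float>>(model, options);
restored.Deserialize(state);                  // resume from the saved state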
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options OptimizationAlgorithmOptions<T, TInput, TOutput>
The new options to apply.
UpdateParameters(Matrix<T>, Matrix<T>)
Updates a matrix of parameters using the LARS optimization algorithm.
public override Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)
Parameters
parameters Matrix<T>
The current parameter values.
gradient Matrix<T>
The gradient with respect to the parameters.
Returns
- Matrix<T>
The updated parameter values.
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the LARS optimization algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters Vector<T>
The current parameter values.
gradient Vector<T>
The gradient with respect to the parameters.
Returns
- Vector<T>
The updated parameter values.
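A minimal manual training-step sketch built on this overload; ComputeGradients is a hypothetical placeholder for however gradients are obtained:
for (int step = 0; step < maxSteps; step++)
{
    Vector<float> gradient = ComputeGradients(parameters); // hypothetical helper
    parameters = optimizer.UpdateParameters(parameters, gradient);
}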
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on the GPU using the LARS kernel.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters IGpuBuffer
GPU buffer holding the parameter values.
gradients IGpuBuffer
GPU buffer holding the gradients.
parameterCount int
The number of parameters in the buffers.
backend IDirectGpuBackend
The GPU backend that executes the kernel.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution using the LARS update rule.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution IFullModel<T, TInput, TOutput>
The current candidate solution.
gradient Vector<T>
The gradient used for the update.
Returns
- IFullModel<T, TInput, TOutput>