Class LARSOptimizer<T, TInput, TOutput>

Namespace
AiDotNet.Optimizers
Assembly
AiDotNet.dll

Implements the LARS (Layer-wise Adaptive Rate Scaling) optimization algorithm.

public class LARSOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

TInput

The type of the model's input data (e.g., Matrix<T>).

TOutput

The type of the model's output data (e.g., Vector<T>).
Inheritance
OptimizerBase<T, TInput, TOutput>
GradientBasedOptimizerBase<T, TInput, TOutput>
LARSOptimizer<T, TInput, TOutput>
Implements
IGradientBasedOptimizer<T, TInput, TOutput>
IOptimizer<T, TInput, TOutput>
IModelSerializer

Examples

// For SimCLR training with large batches
int batchSize = 4096;  // large-batch regime where LARS shines
var options = new LARSOptimizerOptions<float, Matrix<float>, Vector<float>>
{
    InitialLearningRate = 0.3 * batchSize / 256.0,  // Linear scaling rule
    Momentum = 0.9,
    WeightDecay = 1e-4,
    TrustCoefficient = 0.001,
    WarmupEpochs = 10
};
var optimizer = new LARSOptimizer<float, Matrix<float>, Vector<float>>(model, options);  // 'model' is your IFullModel instance
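To run the optimizer, wrap your training data in an OptimizationInputData instance and call Optimize. The property names below (XTrain, YTrain) are illustrative; check OptimizationInputData for the exact members exposed by your version of the library.

var inputData = new OptimizationInputData<float, Matrix<float>, Vector<float>>
{
    XTrain = trainingInputs,   // Matrix<float> of training samples (illustrative name)
    YTrain = trainingTargets   // Vector<float> of training targets (illustrative name)
};
var result = optimizer.Optimize(inputData);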

Remarks

LARS is specifically designed for training with very large batch sizes (4096-32768). It automatically adapts the learning rate for each layer based on the ratio of parameter norm to gradient norm, which helps maintain stable training at scale.

Key Formula:

local_lr = trust_coeff * ||w|| / (||g|| + weight_decay * ||w|| + epsilon)
v = momentum * v + local_lr * (g + weight_decay * w)   // momentum buffer
w = w - lr * v
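The per-layer computation can be sketched as follows. This is a minimal illustration of the formula above using plain double arrays, not the library's internal implementation, which operates on its own Vector<T>/Matrix<T> types.

using System;
using System.Linq;

// One LARS step for a single layer, following the formula above.
static void LarsStep(
    double[] w, double[] g, double[] v,
    double lr, double momentum, double weightDecay,
    double trustCoeff, double epsilon = 1e-8)
{
    // Compute parameter and gradient norms for this layer.
    double wNorm = Math.Sqrt(w.Sum(x => x * x));
    double gNorm = Math.Sqrt(g.Sum(x => x * x));

    // Layer-wise learning rate: layers whose weights are large relative
    // to their gradients get a larger local rate, and vice versa.
    double localLr = trustCoeff * wNorm / (gNorm + weightDecay * wNorm + epsilon);

    for (int i = 0; i < w.Length; i++)
    {
        // Momentum accumulates the locally scaled, weight-decayed gradient.
        v[i] = momentum * v[i] + localLr * (g[i] + weightDecay * w[i]);
        w[i] -= lr * v[i];
    }
}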

For Beginners: When training with very large batches (common in self-supervised learning like SimCLR), regular optimizers can become unstable because gradients get averaged over more samples, making them smaller. LARS solves this by looking at each layer and asking "how big are the weights compared to the gradients?" and scaling the learning rate accordingly. This allows stable training with batch sizes of 4096 or even larger.

Based on the paper "Large Batch Training of Convolutional Networks" by You et al. (2017).

Constructors

LARSOptimizer(IFullModel<T, TInput, TOutput>?, LARSOptimizerOptions<T, TInput, TOutput>?)

Initializes a new instance of the LARSOptimizer class.

public LARSOptimizer(IFullModel<T, TInput, TOutput>? model, LARSOptimizerOptions<T, TInput, TOutput>? options = null)

Parameters

model IFullModel<T, TInput, TOutput>

The model to optimize.

options LARSOptimizerOptions<T, TInput, TOutput>

The options for configuring the LARS optimizer.

Remarks

For Beginners: This sets up the LARS optimizer with its initial configuration. The most important parameters for self-supervised learning (SSL) are:

- Learning rate: use linear scaling (base_lr * batch_size / 256)
- Trust coefficient: controls layer-wise scaling (0.001 is typical)
- Warmup epochs: gradually ramp up the learning rate (10 epochs is typical)
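As a concrete illustration of the linear scaling rule combined with warmup, the standalone sketch below computes the effective learning rate for a given epoch. The library applies warmup internally when WarmupEpochs is set; this function only mirrors the arithmetic.

// Illustration of linear scaling with warmup (standalone sketch).
static double EffectiveLearningRate(int epoch, int batchSize,
    double baseLr = 0.3, int warmupEpochs = 10)
{
    // Linear scaling rule: scale the base rate with batch size.
    double targetLr = baseLr * batchSize / 256.0;

    // During warmup, ramp linearly from near zero up to the target rate.
    if (epoch < warmupEpochs)
        return targetLr * (epoch + 1) / warmupEpochs;

    return targetLr;
}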

Properties

Momentum

Gets the current momentum coefficient.

public double Momentum { get; }

Property Value

double

SupportsGpuUpdate

Gets whether this optimizer supports GPU-accelerated parameter updates.

public override bool SupportsGpuUpdate { get; }

Property Value

bool

TrustCoefficient

Gets the LARS trust coefficient.

public double TrustCoefficient { get; }

Property Value

double

WeightDecay

Gets the current weight decay coefficient.

public double WeightDecay { get; }

Property Value

double

Methods

Deserialize(byte[])

Deserializes the optimizer's state from a byte array.

public override void Deserialize(byte[] data)

Parameters

data byte[]

DisposeGpuState()

Disposes GPU-allocated optimizer state.

public override void DisposeGpuState()

GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)

Generates a unique key for caching gradients.

protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)

Parameters

model IFullModel<T, TInput, TOutput>
X TInput
y TOutput

Returns

string

GetOptions()

Gets the current optimizer options.

public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()

Returns

OptimizationAlgorithmOptions<T, TInput, TOutput>

InitializeAdaptiveParameters()

Initializes the adaptive parameters used by the LARS optimizer.

protected override void InitializeAdaptiveParameters()

InitializeGpuState(int, IDirectGpuBackend)

Initializes LARS optimizer state on the GPU.

public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)

Parameters

parameterCount int
backend IDirectGpuBackend

Optimize(OptimizationInputData<T, TInput, TOutput>)

Performs the optimization process using the LARS algorithm.

public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)

Parameters

inputData OptimizationInputData<T, TInput, TOutput>

The input data for optimization.

Returns

OptimizationResult<T, TInput, TOutput>

The result of the optimization process.

Reset()

Resets the optimizer's internal state.

public override void Reset()

ReverseUpdate(Vector<T>, Vector<T>)

Reverses a LARS gradient update to recover original parameters.

public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)

Parameters

updatedParameters Vector<T>
appliedGradients Vector<T>

Returns

Vector<T>
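For example, the pre-update parameters can be recovered after a step, assuming the gradient vector that was applied is still at hand:

// Undo the last LARS update (sketch; 'updatedParameters' and
// 'appliedGradients' are the vectors from the preceding step).
Vector<float> restored = optimizer.ReverseUpdate(updatedParameters, appliedGradients);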

Serialize()

Serializes the optimizer's state into a byte array.

public override byte[] Serialize()

Returns

byte[]
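A typical use is checkpointing optimizer state alongside the model. A minimal round trip looks like this (the file path is illustrative):

// Checkpoint the optimizer state and restore it later.
byte[] state = optimizer.Serialize();
File.WriteAllBytes("optimizer-state.bin", state);  // illustrative path

// ... later, on a compatible optimizer instance ...
byte[] saved = File.ReadAllBytes("optimizer-state.bin");
optimizer.Deserialize(saved);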

UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)

Updates the optimizer's options.

protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)

Parameters

options OptimizationAlgorithmOptions<T, TInput, TOutput>

UpdateParameters(Matrix<T>, Matrix<T>)

Updates a matrix of parameters using the LARS optimization algorithm.

public override Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)

Parameters

parameters Matrix<T>
gradient Matrix<T>

Returns

Matrix<T>

UpdateParameters(Vector<T>, Vector<T>)

Updates a vector of parameters using the LARS optimization algorithm.

public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)

Parameters

parameters Vector<T>
gradient Vector<T>

Returns

Vector<T>
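For custom training loops, parameters can be stepped directly. A hedged sketch, assuming you already hold the current parameter vector and have your own gradient routine (both accessors below are hypothetical, not part of this class):

// One manual LARS step inside a custom training loop (sketch).
Vector<float> parameters = currentParameters;            // your current parameter vector
Vector<float> gradient = ComputeGradient(model, X, y);   // your gradient routine (hypothetical)
parameters = optimizer.UpdateParameters(parameters, gradient);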

UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)

Updates parameters on the GPU using the LARS kernel.

public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)

Parameters

parameters IGpuBuffer
gradients IGpuBuffer
parameterCount int
backend IDirectGpuBackend

UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)

Updates the current solution using the LARS update rule.

protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)

Parameters

currentSolution IFullModel<T, TInput, TOutput>
gradient Vector<T>

Returns

IFullModel<T, TInput, TOutput>