Class LionOptimizer<T, TInput, TOutput>
- Namespace: AiDotNet.Optimizers
- Assembly: AiDotNet.dll
Implements the Lion (Evolved Sign Momentum) optimization algorithm for gradient-based optimization.
public class LionOptimizer<T, TInput, TOutput> : GradientBasedOptimizerBase<T, TInput, TOutput>, IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Type Parameters
T: The numeric type used for calculations (e.g., float, double).
TInput: The type of the input data.
TOutput: The type of the output data.
- Inheritance
-
OptimizerBase<T, TInput, TOutput> → GradientBasedOptimizerBase<T, TInput, TOutput> → LionOptimizer<T, TInput, TOutput>
- Implements
-
IGradientBasedOptimizer<T, TInput, TOutput>, IOptimizer<T, TInput, TOutput>, IModelSerializer
Remarks
Lion is a modern optimization algorithm discovered through symbolic program search that offers significant advantages over traditional optimizers like Adam. It achieves 50% memory reduction by maintaining only a single momentum state (compared to Adam's two states) while often achieving superior performance on large transformer models and other deep learning architectures.
The algorithm uses sign-based gradient updates, which provides implicit regularization and better generalization. Unlike Adam's magnitude-based updates, Lion focuses purely on the direction of gradients, making it more robust to gradient scale variations and leading to more consistent training dynamics.
For Beginners: Lion is like a simplified but more powerful version of Adam. Instead of carefully measuring how big each step should be (like Adam does), Lion only looks at which direction to go and takes consistent-sized steps in that direction. This is like following a compass that only shows direction - it's simpler, uses less memory, and often gets you to your destination faster. Lion is particularly good for training large neural networks.
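The update rule described above can be sketched in a few lines. The following is a minimal NumPy illustration of the published Lion algorithm, not the AiDotNet implementation; the function name and default hyperparameters are illustrative.

```python
import numpy as np

def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update (illustrative sketch of the algorithm, not AiDotNet's code)."""
    # 1. Interpolate between the current gradient and the past momentum.
    direction = np.sign(beta1 * momentum + (1 - beta1) * grads)
    # 2. Apply decoupled weight decay, then step a fixed distance in that direction.
    new_params = params * (1 - lr * weight_decay) - lr * direction
    # 3. Update the single momentum state for the next iteration.
    new_momentum = beta2 * momentum + (1 - beta2) * grads
    return new_params, new_momentum
```

Note that the step size per parameter is always exactly the learning rate (up to weight decay), regardless of gradient magnitude, and only one state vector (`momentum`) is carried between steps.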
Constructors
LionOptimizer(IFullModel<T, TInput, TOutput>?, LionOptimizerOptions<T, TInput, TOutput>?, IEngine?)
Initializes a new instance of the LionOptimizer class.
public LionOptimizer(IFullModel<T, TInput, TOutput>? model, LionOptimizerOptions<T, TInput, TOutput>? options = null, IEngine? engine = null)
Parameters
model (IFullModel<T, TInput, TOutput>): The model to optimize.
options (LionOptimizerOptions<T, TInput, TOutput>): The options for configuring the Lion optimizer.
engine (IEngine): The computation engine (CPU or GPU) for vectorized operations.
Remarks
For Beginners: This sets up the Lion optimizer with its initial configuration. Lion requires minimal tuning compared to other optimizers - the default settings work well for most deep learning problems. The main parameter you might want to adjust is the learning rate, which is typically set lower than Adam (around 1e-4 instead of 1e-3).
Methods
Deserialize(byte[])
Deserializes the optimizer's state from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized optimizer state.
Remarks
For Beginners: This method rebuilds the optimizer's state from a saved snapshot. Use this to resume training from a checkpoint, restoring all momentum and configuration exactly as it was when you saved it.
DisposeGpuState()
Disposes GPU-allocated optimizer state.
public override void DisposeGpuState()
GenerateGradientCacheKey(IFullModel<T, TInput, TOutput>, TInput, TOutput)
Generates a unique key for caching gradients.
protected override string GenerateGradientCacheKey(IFullModel<T, TInput, TOutput> model, TInput X, TOutput y)
Parameters
model (IFullModel<T, TInput, TOutput>): The symbolic model.
X (TInput): The input matrix.
y (TOutput): The target vector.
Returns
- string
A string key for gradient caching.
Remarks
For Beginners: This method creates a unique identifier for a specific optimization scenario. It helps the optimizer efficiently store and retrieve previously calculated gradients, speeding up training.
GetOptions()
Gets the current optimizer options.
public override OptimizationAlgorithmOptions<T, TInput, TOutput> GetOptions()
Returns
- OptimizationAlgorithmOptions<T, TInput, TOutput>
The current LionOptimizerOptions.
Remarks
For Beginners: This method lets you check what settings the optimizer is currently using. It's useful for debugging or logging your training configuration.
InitializeAdaptiveParameters()
Initializes the adaptive parameters used by the Lion optimizer.
protected override void InitializeAdaptiveParameters()
Remarks
For Beginners: This sets up the momentum factors. Lion typically uses fixed values for these parameters, but they can be made adaptive if needed. Learning rate is handled by the base class and synced with any configured scheduler.
InitializeGpuState(int, IDirectGpuBackend)
Initializes Lion optimizer state on the GPU.
public override void InitializeGpuState(int parameterCount, IDirectGpuBackend backend)
Parameters
parameterCount (int): Number of parameters.
backend (IDirectGpuBackend): GPU backend for memory allocation.
Optimize(OptimizationInputData<T, TInput, TOutput>)
Performs the optimization process using the Lion algorithm.
public override OptimizationResult<T, TInput, TOutput> Optimize(OptimizationInputData<T, TInput, TOutput> inputData)
Parameters
inputData (OptimizationInputData<T, TInput, TOutput>): The input data for optimization, including training data and targets.
Returns
- OptimizationResult<T, TInput, TOutput>
The result of the optimization process, including the best solution found.
Remarks
For Beginners: This is the main learning process. It repeatedly improves the model's parameters using the Lion algorithm. Lion's sign-based updates make it particularly efficient for large-scale optimization problems, often converging faster than Adam while using less memory.
DataLoader Integration: This method uses the DataLoader API for efficient batch processing. It creates a batcher using CreateBatcher(OptimizationInputData<T, TInput, TOutput>, int) and notifies the sampler of epoch starts using NotifyEpochStart(int).
Reset()
Resets the optimizer's internal state.
public override void Reset()
Remarks
For Beginners: This is like resetting the optimizer's memory. It forgets all past momentum and starts fresh, which can be useful when you want to reuse the optimizer for a new problem or restart training from scratch.
ReverseUpdate(Vector<T>, Vector<T>)
Reverses a Lion gradient update to recover original parameters.
public override Vector<T> ReverseUpdate(Vector<T> updatedParameters, Vector<T> appliedGradients)
Parameters
updatedParameters (Vector<T>): The parameters after the Lion update.
appliedGradients (Vector<T>): The gradients that were applied.
Returns
- Vector<T>
Original parameters before the update
Remarks
Lion's reverse update is complex due to its sign-based updates and optional weight decay. This method must be called immediately after UpdateParameters while the momentum state (_m) is fresh. It recalculates the sign of the interpolated momentum-gradient and reverses the weight decay effect.
For Beginners: This calculates where parameters were before a Lion update. Lion uses only the direction (sign) of updates, not their magnitude. To reverse, we need to remember what direction was used (calculated from momentum and gradients) and also undo the weight decay that was applied to prevent parameters from growing too large.
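The reversal logic can be sketched as follows. This is an illustrative NumPy model of the math, not the AiDotNet implementation: it assumes the momentum state as it was when the forward step was taken is still available, matching the requirement above that the reversal happen while that state is fresh.

```python
import numpy as np

def lion_forward(params, grads, momentum, lr, beta1=0.9, weight_decay=0.0):
    # Forward Lion step: decay weights, then move by +/- lr in the sign direction.
    direction = np.sign(beta1 * momentum + (1 - beta1) * grads)
    return params * (1 - lr * weight_decay) - lr * direction

def lion_reverse(updated, grads, momentum, lr, beta1=0.9, weight_decay=0.0):
    # Recompute the same sign that was used, undo the step, then undo the decay.
    direction = np.sign(beta1 * momentum + (1 - beta1) * grads)
    return (updated + lr * direction) / (1 - lr * weight_decay)
```

Because the sign is recomputed rather than stored, the reversal is exact only while the momentum and gradients used in the forward step are unchanged.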
Serialize()
Serializes the optimizer's state into a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array representing the serialized state of the optimizer.
Remarks
For Beginners: This method saves the optimizer's current state into a compact form. You can use this to pause training, save your progress, and resume later from exactly where you left off. Lion's single momentum state makes serialization more efficient than Adam.
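The actual byte layout used by AiDotNet is not specified here; the sketch below is a hypothetical Python analogue that only illustrates why Lion checkpoints are lean: the optimizer state is a single momentum vector plus a handful of scalars, roughly half of what Adam must save.

```python
import pickle
import numpy as np

# Hypothetical optimizer state: one momentum vector plus hyperparameters.
state = {"momentum": np.array([0.1, -0.2, 0.0, 0.3]),
         "lr": 1e-4, "beta1": 0.9, "beta2": 0.99}

blob = pickle.dumps(state)       # analogous to Serialize()
restored = pickle.loads(blob)    # analogous to Deserialize(byte[])
```

After restoring, training resumes with exactly the momentum it had at checkpoint time.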
UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput>, OptimizationStepData<T, TInput, TOutput>)
Updates the adaptive parameters of the optimizer based on the current and previous optimization steps.
protected override void UpdateAdaptiveParameters(OptimizationStepData<T, TInput, TOutput> currentStepData, OptimizationStepData<T, TInput, TOutput> previousStepData)
Parameters
currentStepData (OptimizationStepData<T, TInput, TOutput>): Data from the current optimization step.
previousStepData (OptimizationStepData<T, TInput, TOutput>): Data from the previous optimization step.
Remarks
For Beginners: This method can adjust how the optimizer learns based on its recent performance. However, Lion typically works well with fixed parameters, so this is mainly useful for advanced scenarios.
UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput>)
Updates the optimizer's options.
protected override void UpdateOptions(OptimizationAlgorithmOptions<T, TInput, TOutput> options)
Parameters
options (OptimizationAlgorithmOptions<T, TInput, TOutput>): The new options to be set.
Remarks
For Beginners: This method allows you to change the optimizer's settings during training. However, Lion is designed to work well with fixed settings, so you typically won't need to change these mid-training.
Exceptions
- ArgumentException
Thrown when the provided options are not of type LionOptimizerOptions.
UpdateParameters(Matrix<T>, Matrix<T>)
Updates a matrix of parameters using the Lion optimization algorithm.
public override Matrix<T> UpdateParameters(Matrix<T> parameters, Matrix<T> gradient)
Parameters
parameters (Matrix<T>): The current parameter matrix to be updated.
gradient (Matrix<T>): The gradient matrix corresponding to the parameters.
Returns
- Matrix<T>
The updated parameter matrix.
Remarks
For Beginners: This method is similar to UpdateParameters for vectors, but it works on a 2D grid of parameters instead of a 1D list. This is commonly used for weight matrices in neural networks. Lion's sign-based updates make it particularly effective for large parameter matrices.
UpdateParameters(Vector<T>, Vector<T>)
Updates a vector of parameters using the Lion optimization algorithm.
public override Vector<T> UpdateParameters(Vector<T> parameters, Vector<T> gradient)
Parameters
parameters (Vector<T>): The current parameter vector to be updated.
gradient (Vector<T>): The gradient vector corresponding to the parameters.
Returns
- Vector<T>
The updated parameter vector.
Remarks
For Beginners: This method applies the Lion algorithm to a vector of parameters. Unlike Adam which considers both the direction and magnitude of gradients, Lion only cares about the direction (sign). This makes it simpler and often more robust to different scales of gradients.
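The scale-robustness mentioned above can be seen in a toy numeric example (momentum omitted to isolate the sign behavior; this is an illustration, not AiDotNet code):

```python
import numpy as np

# Toy gradients spanning several orders of magnitude.
g = np.array([0.001, -2.0, 30.0])
lr = 0.1

# A plain gradient step scales with gradient magnitude...
sgd_update = -lr * g
# ...while a sign-based step is always exactly +/- lr per parameter.
lion_update = -lr * np.sign(g)

# Rescaling the gradients changes the plain step but not the sign-based one.
assert np.array_equal(np.sign(1000.0 * g), np.sign(g))
```

The tiny-gradient component barely moves under the plain step but takes a full-sized step under the sign rule, which is why Lion pairs naturally with a smaller learning rate.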
UpdateParametersGpu(IGpuBuffer, IGpuBuffer, int, IDirectGpuBackend)
Updates parameters on GPU using Lion optimization.
public override void UpdateParametersGpu(IGpuBuffer parameters, IGpuBuffer gradients, int parameterCount, IDirectGpuBackend backend)
Parameters
parameters (IGpuBuffer): GPU buffer holding the current parameters.
gradients (IGpuBuffer): GPU buffer holding the gradients.
parameterCount (int): Number of parameters.
backend (IDirectGpuBackend): GPU backend used for the update.
UpdateSolution(IFullModel<T, TInput, TOutput>, Vector<T>)
Updates the current solution using the Lion update rule.
protected override IFullModel<T, TInput, TOutput> UpdateSolution(IFullModel<T, TInput, TOutput> currentSolution, Vector<T> gradient)
Parameters
currentSolution (IFullModel<T, TInput, TOutput>): The current solution being optimized.
gradient (Vector<T>): The calculated gradient for the current solution.
Returns
- IFullModel<T, TInput, TOutput>
A new solution with updated parameters.
Remarks
For Beginners: This method applies the Lion algorithm's unique approach to parameter updates. The algorithm works in three steps:
1. Interpolate between the current gradient and the past momentum.
2. Take the sign of this interpolation and use it to update the parameters.
3. Update the momentum for the next iteration.
This sign-based approach is what makes Lion both simple and powerful.