Class LoRAPlusAdapter<T>
LoRA+ adapter that uses optimized learning rates for faster convergence and better performance.
public class LoRAPlusAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → LoRAPlusAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>
- Inherited Members
Remarks
LoRA+ (February 2024) improves upon standard LoRA by using different learning rates for the A and B matrices. The key insight is that matrix B (which starts at zero) needs faster updates than matrix A (which starts random). This simple modification leads to significantly faster convergence and improved final performance.
For Beginners: LoRA+ is an enhanced version of LoRA that trains faster and better.
In standard LoRA:
- Both matrix A and B are updated with the same learning rate
- Matrix B starts at zero, so it needs time to "catch up"
- Matrix A starts random, so it's already contributing from the start
LoRA+ recognizes this asymmetry:
- Matrix A is updated with a base learning rate (e.g., 0.0001)
- Matrix B is updated with a higher learning rate (e.g., 0.0016 = 16x higher)
- This accelerates learning without instability
Key parameters:
- BaseLearningRate: Learning rate for matrix A (the "slow" matrix)
- LearningRateRatio: Multiplier for matrix B (typically 16.0)
- ScaledLearningRate: Computed as BaseLearningRate * LearningRateRatio
Research shows LoRA+ typically achieves:
- Up to 2x faster convergence
- Better final performance
- No additional parameters compared to standard LoRA
Example: If base learning rate is 0.0001 and ratio is 16.0:
- Matrix A updates with learning rate 0.0001
- Matrix B updates with learning rate 0.0016
Reference: LoRA+: Efficient Low Rank Adaptation of Large Models (February 2024)
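The learning-rate arithmetic above can be sketched in a few lines. This is a standalone illustration, not the library's implementation; the ScaledRate helper is hypothetical:

```csharp
using System;

// Hypothetical helper (not part of the library's API) showing how LoRA+
// derives matrix B's learning rate from matrix A's base rate.
double ScaledRate(double baseLearningRate, double learningRateRatio)
    => baseLearningRate * learningRateRatio;

double lrA = 0.0001;                 // base rate: matrix A updates slowly
double lrB = ScaledRate(lrA, 16.0);  // matrix B updates 16x faster
```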
Constructors
LoRAPlusAdapter(ILayer<T>, int, double, double, bool)
Initializes a new LoRA+ adapter with optimized dual learning rates.
public LoRAPlusAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, double learningRateRatio = 16, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoRA+.
rank (int): The rank of the LoRA decomposition.
alpha (double): The LoRA scaling factor (defaults to rank if negative).
learningRateRatio (double): The ratio of B's learning rate to A's learning rate (default: 16.0).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a LoRA+ adapter that will train faster than standard LoRA.
Parameters:
- baseLayer: The layer you want to efficiently fine-tune
- rank: How much compression (lower = fewer parameters)
- alpha: How strong the LoRA effect is
- learningRateRatio: How much faster B learns than A (16.0 is recommended)
- freezeBaseLayer: Whether to lock the original weights (usually true)
The learning rate ratio is the key differentiator from standard LoRA. Higher ratios mean faster convergence but require careful tuning to avoid instability.
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when learningRateRatio is less than 1.0.
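The documented exceptions imply argument checks along these lines. This is a hedged, standalone sketch of that validation logic, not the adapter's actual constructor body:

```csharp
using System;

// Hypothetical validation mirroring the documented exceptions:
// null base layer  -> ArgumentNullException
// ratio below 1.0  -> ArgumentException
void ValidateLoRAPlusArgs(object baseLayer, double learningRateRatio)
{
    if (baseLayer is null)
        throw new ArgumentNullException(nameof(baseLayer));
    if (learningRateRatio < 1.0)
        throw new ArgumentException(
            "learningRateRatio must be at least 1.0.", nameof(learningRateRatio));
}
```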
Properties
BaseLearningRate
Gets the base learning rate for matrix A.
public T BaseLearningRate { get; }
Property Value
- T
LearningRateRatio
Gets or sets the learning rate ratio between matrix B and matrix A.
public double LearningRateRatio { get; set; }
Property Value
- double
Remarks
Default value is 16.0 as recommended by the LoRA+ paper. Valid range is typically 1.0 to 32.0.
For Beginners: This is the multiplier that makes matrix B learn faster.
- 1.0 = same speed as standard LoRA (no benefit)
- 8.0 = moderate speedup
- 16.0 = recommended default
- 32.0 = aggressive speedup (may be unstable)
ScaledLearningRate
Gets the scaled learning rate for matrix B.
public T ScaledLearningRate { get; }
Property Value
- T
Methods
Backward(Tensor<T>)
Performs the backward pass through both layers with dual learning rate scaling.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass computes gradients for both matrices but applies different scaling factors to prepare for the dual learning rate update. Matrix B gradients are implicitly prepared for faster updates during the UpdateParameters call.
For Beginners: This is where LoRA+ differs from standard LoRA! During backpropagation, we compute gradients for both A and B matrices, but we'll apply different learning rates when actually updating the parameters. This prepares the gradients for the dual learning rate optimization.
Forward(Tensor<T>)
Performs the forward pass through both base and LoRA layers.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and LoRA output.
Remarks
The forward pass is identical to standard LoRA: output = base_layer(input) + lora_layer(input). The dual learning rate optimization only affects the backward pass and parameter updates.
For Beginners: This works exactly like standard LoRA during the forward pass. The magic of LoRA+ happens during training (backward pass), not inference.
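The forward computation described above can be sketched with plain dense arrays. This is an illustrative standalone version of output = base(x) + (alpha / rank) * B * (A * x), with hypothetical helper names, not the tensor-based library code:

```csharp
using System;

// (B * A) low-rank path applied to a vector, added to the base layer's output.
double[] MatVec(double[,] m, double[] v)
{
    int rows = m.GetLength(0), cols = m.GetLength(1);
    var result = new double[rows];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            result[i] += m[i, j] * v[j];
    return result;
}

double[] LoRAForward(double[] baseOutput, double[,] a, double[,] b,
                     double alpha, int rank, double[] input)
{
    double scaling = alpha / rank;
    double[] lora = MatVec(b, MatVec(a, input)); // B * (A * x)
    var output = new double[baseOutput.Length];
    for (int i = 0; i < output.Length; i++)
        output[i] = baseOutput[i] + scaling * lora[i];
    return output;
}
```

Because B starts at zero, the LoRA path contributes nothing at initialization and the adapter initially reproduces the base layer's output exactly.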
MergeToOriginalLayer()
Merges the LoRA+ adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoRA weights merged into the base layer's weights.
Remarks
For LoRA+, merging works exactly like standard LoRA - the dual learning rates only affect training, not the final merged weights.
For Beginners: After training with LoRA+, you can merge the weights just like standard LoRA. The faster training doesn't change the final result, it just gets you there quicker!
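The merge folds the low-rank delta into the base weights, W' = W + (alpha / rank) * B * A, so inference needs only one matrix. A minimal standalone sketch of that arithmetic (hypothetical helper, not the library's merge code):

```csharp
using System;

// Fold the LoRA delta into the base weight matrix:
// merged[i, j] = w[i, j] + (alpha / rank) * (B * A)[i, j]
double[,] MergeWeights(double[,] w, double[,] a, double[,] b, double alpha, int rank)
{
    double scaling = alpha / rank;
    int rows = w.GetLength(0), cols = w.GetLength(1);
    int r = a.GetLength(0); // LoRA rank
    var merged = new double[rows, cols];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
        {
            double delta = 0.0;
            for (int k = 0; k < r; k++)
                delta += b[i, k] * a[k, j]; // (B * A)[i, j]
            merged[i, j] = w[i, j] + scaling * delta;
        }
    return merged;
}
```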
SetLearningRates(T)
Sets the learning rates for this adapter.
public void SetLearningRates(T baseLearningRate)
Parameters
baseLearningRate (T): The base learning rate for matrix A.
Remarks
This method sets the base learning rate and automatically computes the scaled learning rate for matrix B using the current learning rate ratio.
For Beginners: Call this to configure how fast the adapter learns. You only need to provide the base learning rate - the higher learning rate for matrix B is calculated automatically using the ratio you specified.
Example: If you call SetLearningRates(0.0001) with ratio 16.0:
- Matrix A will use learning rate 0.0001
- Matrix B will use learning rate 0.0016 (16x faster)
UpdateParameters(T)
Updates parameters using dual learning rates (base rate for A, scaled rate for B).
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): Used as the base learning rate for matrix A.
Remarks
This method overrides the standard LoRA parameter update to apply different learning rates:
- Matrix A is updated with the base learning rate
- Matrix B is updated with the scaled learning rate (base * ratio)
- Base layer is updated with the base learning rate if not frozen
For Beginners: This is where the dual learning rate magic happens! Instead of updating both matrices at the same speed, we:
1. Update matrix A slowly (with the base learning rate)
2. Update matrix B quickly (with the scaled learning rate)
This asymmetry accelerates training because:
- Matrix A already has random values and is contributing
- Matrix B starts at zero and needs to catch up
- Giving B a higher learning rate helps it catch up faster
The result is faster convergence and better final performance!
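The dual-rate step described above amounts to ordinary SGD with two different step sizes. A minimal standalone sketch (plain SGD on flat arrays, illustrative only, not the adapter's actual update code):

```csharp
using System;

// Dual-learning-rate SGD step: A uses the base rate, B uses base * ratio.
void DualRateUpdate(double[] a, double[] gradA, double[] b, double[] gradB,
                    double baseLearningRate, double learningRateRatio)
{
    double scaledRate = baseLearningRate * learningRateRatio;
    for (int i = 0; i < a.Length; i++)
        a[i] -= baseLearningRate * gradA[i];   // slow update for A
    for (int i = 0; i < b.Length; i++)
        b[i] -= scaledRate * gradB[i];         // fast update for B (catches up from zero)
}
```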