Class LoRAPlusAdapter<T>
LoRA+ adapter that uses optimized learning rates for faster convergence and better performance.
public class LoRAPlusAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → LoRAPlusAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>
- Inherited Members
Remarks
LoRA+ (February 2024) improves upon standard LoRA by using different learning rates for the A and B matrices. The key insight is that matrix B (which starts at zero) needs faster updates than matrix A (which starts random). This simple modification leads to significantly faster convergence and improved final performance.
For Beginners: LoRA+ is an enhanced version of LoRA that trains faster and better.
In standard LoRA:
- Both matrix A and B are updated with the same learning rate
- Matrix B starts at zero, so it needs time to "catch up"
- Matrix A starts random, so it's already contributing from the start
LoRA+ recognizes this asymmetry:
- Matrix A is updated with a base learning rate (e.g., 0.0001)
- Matrix B is updated with a higher learning rate (e.g., 0.0016 = 16x higher)
- This accelerates learning without instability
Key parameters:
- BaseLearningRate: Learning rate for matrix A (the "slow" matrix)
- LearningRateRatio: Multiplier for matrix B (typically 16.0)
- ScaledLearningRate: Computed as BaseLearningRate * LearningRateRatio
Research shows LoRA+ typically achieves:
- Up to 2x faster convergence
- Better final performance
- No additional parameters compared to standard LoRA
Example: If base learning rate is 0.0001 and ratio is 16.0:
- Matrix A updates with learning rate 0.0001
- Matrix B updates with learning rate 0.0016
Reference: LoRA+: Efficient Low Rank Adaptation of Large Models (February 2024)
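The learning-rate arithmetic above can be sketched in a few lines. This is a standalone illustration, not the library's implementation; the ScaledRate helper is hypothetical:

```csharp
using System;

// Hypothetical helper (not part of the library's API) showing how LoRA+
// derives matrix B's learning rate from matrix A's base rate.
double ScaledRate(double baseLearningRate, double learningRateRatio)
    => baseLearningRate * learningRateRatio;

double lrA = 0.0001;                 // base rate: matrix A updates slowly
double lrB = ScaledRate(lrA, 16.0);  // matrix B updates 16x faster
```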
Constructors
LoRAPlusAdapter(ILayer<T>, int, double, double, bool)
Initializes a new LoRA+ adapter with optimized dual learning rates.
public LoRAPlusAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, double learningRateRatio = 16, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoRA+.
rank (int): The rank of the LoRA decomposition.
alpha (double): The LoRA scaling factor (defaults to rank if negative).
learningRateRatio (double): The ratio of B's learning rate to A's learning rate (default: 16.0).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a LoRA+ adapter that will train faster than standard LoRA.
Parameters:
- baseLayer: The layer you want to efficiently fine-tune
- rank: How much compression (lower = fewer parameters)
- alpha: How strong the LoRA effect is
- learningRateRatio: How much faster B learns than A (16.0 is recommended)
- freezeBaseLayer: Whether to lock the original weights (usually true)
The learning rate ratio is the key differentiator from standard LoRA. Higher ratios mean faster convergence but require careful tuning to avoid instability.
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when learningRateRatio is less than 1.0.
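The documented exceptions imply argument checks along these lines. This is a hedged, standalone sketch of that validation logic, not the adapter's actual constructor body:

```csharp
using System;

// Hypothetical validation mirroring the documented exceptions:
// null base layer  -> ArgumentNullException
// ratio below 1.0  -> ArgumentException
void ValidateLoRAPlusArgs(object baseLayer, double learningRateRatio)
{
    if (baseLayer is null)
        throw new ArgumentNullException(nameof(baseLayer));
    if (learningRateRatio < 1.0)
        throw new ArgumentException(
            "learningRateRatio must be at least 1.0.", nameof(learningRateRatio));
}
```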
Properties
BaseLearningRate
Gets the base learning rate for matrix A.
public T BaseLearningRate { get; }
Property Value
- T
LearningRateRatio
Gets or sets the learning rate ratio between matrix B and matrix A.
public double LearningRateRatio { get; set; }
Property Value
- double
Remarks
Default value is 16.0 as recommended by the LoRA+ paper. Valid range is typically 1.0 to 32.0.
For Beginners: This is the multiplier that makes matrix B learn faster.
- 1.0 = same speed as standard LoRA (no benefit)
- 8.0 = moderate speedup
- 16.0 = recommended default
- 32.0 = aggressive speedup (may be unstable)
ScaledLearningRate
Gets the scaled learning rate for matrix B.
public T ScaledLearningRate { get; }
Property Value
- T
Methods
Backward(Tensor<T>)
Performs the backward pass through both layers with dual learning rate scaling.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass computes gradients for both matrices but applies different scaling factors to prepare for the dual learning rate update. Matrix B gradients are implicitly prepared for faster updates during the UpdateParameters call.
For Beginners: This is where LoRA+ differs from standard LoRA! During backpropagation, we compute gradients for both A and B matrices, but we'll apply different learning rates when actually updating the parameters. This prepares the gradients for the dual learning rate optimization.
Forward(Tensor<T>)
Performs the forward pass through both base and LoRA layers.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and LoRA output.
Remarks
The forward pass is identical to standard LoRA: output = base_layer(input) + lora_layer(input). The dual learning rate optimization only affects the backward pass and parameter updates.
For Beginners: This works exactly like standard LoRA during the forward pass. The magic of LoRA+ happens during training (backward pass), not inference.
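The forward computation described above can be sketched with plain dense arrays. This is an illustrative standalone version of output = base(x) + (alpha / rank) * B * (A * x), with hypothetical helper names, not the tensor-based library code:

```csharp
using System;

// (B * A) low-rank path applied to a vector, added to the base layer's output.
double[] MatVec(double[,] m, double[] v)
{
    int rows = m.GetLength(0), cols = m.GetLength(1);
    var result = new double[rows];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            result[i] += m[i, j] * v[j];
    return result;
}

double[] LoRAForward(double[] baseOutput, double[,] a, double[,] b,
                     double alpha, int rank, double[] input)
{
    double scaling = alpha / rank;
    double[] lora = MatVec(b, MatVec(a, input)); // B * (A * x)
    var output = new double[baseOutput.Length];
    for (int i = 0; i < output.Length; i++)
        output[i] = baseOutput[i] + scaling * lora[i];
    return output;
}
```

Because B starts at zero, the LoRA path contributes nothing at initialization and the adapter initially reproduces the base layer's output exactly.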
MergeToOriginalLayer()
Merges the LoRA+ adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoRA weights merged into the base layer's weights.
Remarks
For LoRA+, merging works exactly like standard LoRA - the dual learning rates only affect training, not the final merged weights.
For Beginners: After training with LoRA+, you can merge the weights just like standard LoRA. The faster training doesn't change the final result, it just gets you there quicker!
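The merge folds the low-rank delta into the base weights, W' = W + (alpha / rank) * B * A, so inference needs only one matrix. A minimal standalone sketch of that arithmetic (hypothetical helper, not the library's merge code):

```csharp
using System;

// Fold the LoRA delta into the base weight matrix:
// merged[i, j] = w[i, j] + (alpha / rank) * (B * A)[i, j]
double[,] MergeWeights(double[,] w, double[,] a, double[,] b, double alpha, int rank)
{
    double scaling = alpha / rank;
    int rows = w.GetLength(0), cols = w.GetLength(1);
    int r = a.GetLength(0); // LoRA rank
    var merged = new double[rows, cols];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
        {
            double delta = 0.0;
            for (int k = 0; k < r; k++)
                delta += b[i, k] * a[k, j]; // (B * A)[i, j]
            merged[i, j] = w[i, j] + scaling * delta;
        }
    return merged;
}
```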
SetLearningRates(T)
Sets the learning rates for this adapter.
public void SetLearningRates(T baseLearningRate)
Parameters
baseLearningRate (T): The base learning rate for matrix A.
Remarks
This method sets the base learning rate and automatically computes the scaled learning rate for matrix B using the current learning rate ratio.
For Beginners: Call this to configure how fast the adapter learns. You only need to provide the base learning rate - the higher learning rate for matrix B is calculated automatically using the ratio you specified.
Example: If you call SetLearningRates(0.0001) with ratio 16.0:
- Matrix A will use learning rate 0.0001
- Matrix B will use learning rate 0.0016 (16x faster)
UpdateParameters(T)
Updates parameters using dual learning rates (base rate for A, scaled rate for B).
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): Used as the base learning rate for matrix A.
Remarks
This method overrides the standard LoRA parameter update to apply different learning rates:
- Matrix A is updated with the base learning rate
- Matrix B is updated with the scaled learning rate (base * ratio)
- Base layer is updated with the base learning rate if not frozen
For Beginners: This is where the dual learning rate magic happens! Instead of updating both matrices at the same speed, we:
1. Update matrix A slowly (with the base learning rate)
2. Update matrix B quickly (with the scaled learning rate)
This asymmetry accelerates training because:
- Matrix A already has random values and is contributing
- Matrix B starts at zero and needs to catch up
- Giving B a higher learning rate helps it catch up faster
The result is faster convergence and better final performance!
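The dual-rate step described above amounts to ordinary SGD with two different step sizes. A minimal standalone sketch (plain SGD on flat arrays, illustrative only, not the adapter's actual update code):

```csharp
using System;

// Dual-learning-rate SGD step: A uses the base rate, B uses base * ratio.
void DualRateUpdate(double[] a, double[] gradA, double[] b, double[] gradB,
                    double baseLearningRate, double learningRateRatio)
{
    double scaledRate = baseLearningRate * learningRateRatio;
    for (int i = 0; i < a.Length; i++)
        a[i] -= baseLearningRate * gradA[i];   // slow update for A
    for (int i = 0; i < b.Length; i++)
        b[i] -= scaledRate * gradB[i];         // fast update for B (catches up from zero)
}
```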