Class LoRAFAAdapter<T>
LoRA-FA (LoRA with Frozen A matrix) adapter for parameter-efficient fine-tuning.
public class LoRAFAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
Inheritance
LayerBase<T> → LoRAAdapterBase<T> → LoRAFAAdapter<T>
Implements
ILoRAAdapter<T>, ILayer<T>
Remarks
LoRA-FA is a variant of standard LoRA that freezes matrix A after random initialization and only trains matrix B. This cuts the number of trainable LoRA parameters roughly in half compared to standard LoRA, with minimal performance loss in most scenarios.
For Beginners: LoRA-FA makes LoRA even more efficient!
Standard LoRA uses two small matrices (A and B) that both get trained:
- Matrix A: Compresses input (trained)
- Matrix B: Expands to output (trained)
LoRA-FA optimizes this further:
- Matrix A: Compresses input (frozen - never changes after initialization)
- Matrix B: Expands to output (trained - the only thing that learns)
Why freeze matrix A?
- Research shows matrix A can be randomly initialized and frozen without much performance loss
- This cuts trainable parameters in half (only matrix B is trained)
- Training is faster and uses less memory
- Perfect when you need maximum efficiency
Example parameter counts for a 1000×1000 layer with rank=8:
- Standard LoRA: 8,000 (A) + 8,000 (B) = 16,000 trainable parameters
- LoRA-FA: 0 (A frozen) + 8,000 (B) = 8,000 trainable parameters (50% reduction!)
When to use LoRA-FA:
- Memory is very limited
- Training speed is critical
- You can tolerate a small performance trade-off
- You're working with very large models
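A minimal numeric sketch of that structure, using plain double arrays instead of the library's Tensor<T> type; the sizes, the random (Gaussian-style) initialization, and the alpha / rank scaling rule are illustrative assumptions, not the adapter's internals:
using System;
int inputSize = 4, outputSize = 3, rank = 2;
double scaling = 1.0;                 // alpha / rank, assuming the default alpha = rank
var rng = new Random(0);
// Matrix A (inputSize x rank): random values, then frozen for the whole training run.
// (The real adapter uses a Gaussian init; uniform noise is fine for this sketch.)
var A = new double[inputSize, rank];
for (int i = 0; i < inputSize; i++)
    for (int r = 0; r < rank; r++)
        A[i, r] = rng.NextDouble() - 0.5;
// Matrix B (rank x outputSize): starts at zero; the only trainable LoRA matrix.
var B = new double[rank, outputSize];
// LoRA-FA correction for one input x: delta = scaling * (x · A) · B.
var x = new double[] { 1.0, 2.0, 3.0, 4.0 };
var h = new double[rank];             // h = x · A (cached for the backward pass)
for (int r = 0; r < rank; r++)
    for (int i = 0; i < inputSize; i++)
        h[r] += x[i] * A[i, r];
var delta = new double[outputSize];
for (int o = 0; o < outputSize; o++)
    for (int r = 0; r < rank; r++)
        delta[o] += scaling * h[r] * B[r, o];
// Because B is all zeros at initialization, delta is all zeros:
// the adapter starts out as a no-op, and only B learns from here on.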
Constructors
LoRAFAAdapter(ILayer<T>, int, double, bool)
Initializes a new LoRA-FA adapter wrapping an existing layer.
public LoRAFAAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoRA-FA.
rank (int): The rank of the LoRA decomposition.
alpha (double): The LoRA scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a LoRA-FA adapter that wraps any layer.
Parameters:
- baseLayer: The layer you want to make more efficient to fine-tune
- rank: How much compression (lower = fewer parameters, less flexibility)
- alpha: How strong the LoRA adaptation is
- freezeBaseLayer: Whether to lock the original layer's weights (usually true for efficiency)
What happens during initialization:
- Matrix A gets random values (Gaussian initialization)
- Matrix A is immediately frozen (never updated during training)
- Matrix B starts at zero (so initially LoRA-FA has no effect)
- Only matrix B will be trained, reducing parameters by 50% vs standard LoRA
This is perfect when you need maximum parameter efficiency!
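A usage sketch; DenseLayer<T> is one of the supported base layer types (see MergeToOriginalLayer), but the constructor arguments shown for it are an assumption for illustration:
// Hypothetical DenseLayer<T> construction: 1000 inputs, 1000 outputs.
ILayer<double> baseLayer = new DenseLayer<double>(1000, 1000);
// rank = 8, alpha defaults to the rank (negative sentinel), base layer frozen.
var adapter = new LoRAFAAdapter<double>(baseLayer, rank: 8);
bool frozen = adapter.IsMatrixAFrozen;    // always true for LoRA-FA
int trainable = adapter.ParameterCount;   // 8 * 1000 = 8,000 with the base layer frozen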
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
Properties
IsMatrixAFrozen
Gets whether matrix A is frozen during training (always true for LoRA-FA).
public bool IsMatrixAFrozen { get; }
Property Value
- bool
Remarks
This is a key characteristic of LoRA-FA - matrix A is randomly initialized and then frozen, never updated during training.
ParameterCount
Gets the total number of trainable parameters (only matrix B).
public override int ParameterCount { get; }
Property Value
- int
Remarks
For LoRA-FA, only matrix B is trainable. Matrix A is frozen, so it doesn't count toward trainable parameters. This results in approximately 50% parameter reduction compared to standard LoRA.
For Beginners: This returns how many parameters will actually be trained. Since matrix A is frozen, we only count matrix B's parameters. If the base layer is also frozen (typical case), this is just matrix B. Otherwise, it's base layer + matrix B.
For a layer with input size 1000, output size 1000, and rank 8:
- Matrix B size: rank × outputSize = 8 × 1000 = 8,000 parameters
- Matrix A size: inputSize × rank = 1000 × 8 = 8,000 parameters (but frozen, so not counted)
- Total trainable: 8,000 (50% less than standard LoRA's 16,000)
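A quick check of that arithmetic, using the documented example sizes rather than values read from a live layer:
int inputSize = 1000, outputSize = 1000, rank = 8;
int matrixA = inputSize * rank;        // 8,000 values - frozen, not trainable in LoRA-FA
int matrixB = rank * outputSize;       // 8,000 values - trainable
int standardLoRA = matrixA + matrixB;  // 16,000 trainable parameters
int loraFA = matrixB;                  // 8,000 trainable parameters (50% reduction)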
Methods
Backward(Tensor<T>)
Performs the backward pass, computing gradients only for matrix B (matrix A is frozen).
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass differs from standard LoRA in that gradients for matrix A are not computed or stored, since matrix A is frozen. Only gradients for matrix B and (if not frozen) the base layer are computed.
For Beginners: This is where LoRA-FA saves computation and memory!
During learning, the backward pass normally computes gradients for both matrix A and B. But in LoRA-FA, we skip the gradient computation for matrix A entirely because:
- Matrix A is frozen (won't be updated anyway)
- No need to store gradients we won't use
- Less computation = faster training
- Less memory = can train larger models
We still compute:
- Gradients for matrix B (the only trainable LoRA component)
- Gradients for the base layer (if not frozen)
- Input gradients to pass to earlier layers
This is the key optimization that makes LoRA-FA more efficient than standard LoRA!
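Continuing the plain-array sketch from the class remarks, this is roughly the bookkeeping described above; the gradient formulas assume a simple linear LoRA path and are illustrative, not the library's internals:
// dY is the incoming gradient (one value per output); h = x · A was cached in Forward.
var dY = new double[] { 0.1, -0.2, 0.3 };
var gradB = new double[rank, outputSize];   // the only LoRA gradient buffer we need
var gradX = new double[inputSize];          // LoRA path's contribution to the input gradient
for (int o = 0; o < outputSize; o++)
{
    for (int r = 0; r < rank; r++)
    {
        // Gradient for the trainable matrix B.
        gradB[r, o] += scaling * h[r] * dY[o];
        // Route the gradient back through the frozen matrix A toward the input.
        for (int i = 0; i < inputSize; i++)
            gradX[i] += scaling * A[i, r] * B[r, o] * dY[o];
    }
}
// Note what is missing: no gradA buffer and no gradient math for matrix A at all.
// (The base layer's own input-gradient contribution is also omitted in this sketch.)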
Forward(Tensor<T>)
Performs the forward pass through both base and LoRA layers.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and LoRA output.
Remarks
The forward pass is identical to standard LoRA: output = base_layer(input) + lora_layer(input). The difference is that matrix A inside the LoRA layer is frozen, but this does not affect the forward computation.
For Beginners: The forward pass works exactly like standard LoRA. We compute the base layer output, compute the LoRA correction (using frozen A and trainable B), and add them together. The frozen matrix A still participates in the computation - it just doesn't get updated during training.
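Continuing the plain-array sketch, the forward result is just the element-wise sum of the two paths:
// baseOutput stands in for the wrapped layer's output; delta is the LoRA-FA
// correction computed with the frozen A and the trainable B.
var baseOutput = new double[] { 0.5, 0.6, 0.7 };
var y = new double[outputSize];
for (int o = 0; o < outputSize; o++)
    y[o] = baseOutput[o] + delta[o];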
MergeToOriginalLayer()
Merges the LoRA-FA adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoRA weights merged into the base layer's weights.
Remarks
This method merges the LoRA-FA adaptation (using frozen matrix A and trained matrix B) back into the base layer's weights. The process is identical to standard LoRA merging: both the frozen and the trained matrix contribute to the final merged weights.
For Beginners: This "bakes in" your LoRA-FA adaptation to create a regular layer.
Even though matrix A was frozen during training, it still participated in all the forward passes and contributed to the model's behavior. When merging:
- Compute the full weight matrix: W_lora = A × B × scaling
- Add these weights to the base layer's weights
- Create a new layer with the merged weights
The result is identical to what your adapted model was producing, but:
- Faster inference (single matrix multiply instead of A × B)
- Simpler deployment (one layer instead of adapter + base layer)
- No need for LoRA-aware code in production
Even though A was frozen (never trained), it still matters for the final merged weights because it was part of the random projection that B learned to work with!
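A sketch of the merge arithmetic, continuing the plain-array example; W stands in for a dense base layer's weight matrix, and bias handling plus the library's real weight layout are left out:
var W = new double[inputSize, outputSize];   // stands in for the base layer's existing weights
var merged = new double[inputSize, outputSize];
for (int i = 0; i < inputSize; i++)
{
    for (int o = 0; o < outputSize; o++)
    {
        // W_lora = A × B × scaling, built one entry at a time.
        double loraWeight = 0.0;
        for (int r = 0; r < rank; r++)
            loraWeight += A[i, r] * B[r, o];
        merged[i, o] = W[i, o] + scaling * loraWeight;
    }
}
// For any input x: x · merged == x · W + scaling * (x · A) · B,
// so the merged layer reproduces the adapter's output with a single weight matrix.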
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
UpdateParameters(T)
Updates parameters, but only for matrix B (matrix A remains frozen).
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
Remarks
This method updates only matrix B using the gradients computed during backpropagation. Matrix A is never updated, as it remains frozen at its initial random values.
For Beginners: This is where we apply what we learned during training!
The parameter update phase normally adjusts both matrix A and B based on their gradients. But in LoRA-FA, we only update matrix B:
- Get the gradients for matrix B from backpropagation
- Update matrix B: B_new = B_old - learningRate × gradient_B
- Skip matrix A entirely (it stays frozen)
- Update base layer parameters if not frozen
This is faster than standard LoRA because:
- Fewer parameters to update
- Less memory traffic
- Simpler computation
Matrix A stays exactly as it was initialized - random Gaussian values that never change!
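In the same plain-array sketch, the update step touches only matrix B; handling of the generic learning-rate type T is the library's concern and is simplified to a double here:
double learningRate = 0.01;
// gradB was accumulated during Backward; matrix A has no gradient buffer at all.
for (int r = 0; r < rank; r++)
    for (int o = 0; o < outputSize; o++)
        B[r, o] -= learningRate * gradB[r, o];
// Matrix A is untouched: it keeps the random values it was given at initialization.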
UpdateParametersFromLayers()
Updates the parameter vector from the current layer states.
protected override void UpdateParametersFromLayers()
Remarks
CRITICAL: For LoRA-FA, this packs BOTH matrix A and B to match ParameterCount. Even though matrix A is frozen, it must be included in the parameter buffer to maintain base-class invariants and prevent buffer overruns. The freeze logic is in UpdateParameters, not in buffer packing.
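A sketch of the packing described above, continuing the plain-array example; the actual ordering of A and B inside the buffer is a base-class implementation detail and is assumed here:
// Both matrices go into one flat buffer, sized for A and B together,
// even though only B is ever trained.
var packed = new double[inputSize * rank + rank * outputSize];
int idx = 0;
for (int i = 0; i < inputSize; i++)          // frozen matrix A (assumed to come first)
    for (int r = 0; r < rank; r++)
        packed[idx++] = A[i, r];
for (int r = 0; r < rank; r++)               // trainable matrix B
    for (int o = 0; o < outputSize; o++)
        packed[idx++] = B[r, o];
// Freezing is enforced in UpdateParameters (A is simply never modified),
// not by leaving A out of this buffer.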