
Class LoRAFAAdapter<T>

Namespace: AiDotNet.LoRA.Adapters
Assembly: AiDotNet.dll

LoRA-FA (LoRA with Frozen A matrix) adapter for parameter-efficient fine-tuning.

public class LoRAFAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T> → LoRAFAAdapter<T>

Implements
IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Remarks

LoRA-FA is a variant of standard LoRA that freezes matrix A after random initialization and only trains matrix B. This provides approximately 50% parameter reduction compared to standard LoRA with minimal performance loss in most scenarios.

For Beginners: LoRA-FA makes LoRA even more efficient!

Standard LoRA uses two small matrices (A and B) that both get trained:

  • Matrix A: Compresses input (trained)
  • Matrix B: Expands to output (trained)

LoRA-FA optimizes this further:

  • Matrix A: Compresses input (frozen - never changes after initialization)
  • Matrix B: Expands to output (trained - the only thing that learns)

Why freeze matrix A?

  • Research shows matrix A can be randomly initialized and frozen without much performance loss
  • This cuts trainable parameters in half (only matrix B is trained)
  • Training is faster and uses less memory
  • Perfect when you need maximum efficiency

Example parameter counts for a 1000×1000 layer with rank=8:

  • Standard LoRA: 8,000 (A) + 8,000 (B) = 16,000 trainable parameters
  • LoRA-FA: 0 (A frozen) + 8,000 (B) = 8,000 trainable parameters (50% reduction!)

When to use LoRA-FA:

  • Memory is very limited
  • Training speed is critical
  • You can tolerate a small performance trade-off
  • You're working with very large models
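
To make the split concrete, here is a toy illustration of the two matrices in plain C# (illustrative only, not the library's internal code; the sizes are arbitrary):

int inputSize = 4, rank = 2, outputSize = 3;
var rng = new Random(42);

// Matrix A (inputSize × rank): random values, then frozen forever.
// (The real adapter uses Gaussian initialization.)
var A = new double[inputSize, rank];
for (int i = 0; i < inputSize; i++)
    for (int r = 0; r < rank; r++)
        A[i, r] = rng.NextDouble() - 0.5;

// Matrix B (rank × outputSize): all zeros, the only matrix that trains.
// Starting at zero means the adapter initially contributes nothing.
var B = new double[rank, outputSize];

// Trainable LoRA parameters: rank × outputSize = 6 here (B only).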

Constructors

LoRAFAAdapter(ILayer<T>, int, double, bool)

Initializes a new LoRA-FA adapter wrapping an existing layer.

public LoRAFAAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The layer to adapt with LoRA-FA.

rank int

The rank of the LoRA decomposition.

alpha double

The LoRA scaling factor (defaults to rank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

Remarks

For Beginners: This creates a LoRA-FA adapter that wraps any layer.

Parameters:

  • baseLayer: The layer you want to make more efficient to fine-tune
  • rank: How much compression (lower = fewer parameters, less flexibility)
  • alpha: How strongly the LoRA correction is scaled (defaults to rank if negative)
  • freezeBaseLayer: Whether to lock the original layer's weights (usually true for efficiency)

What happens during initialization:

  1. Matrix A gets random values (Gaussian initialization)
  2. Matrix A is immediately frozen (never updated during training)
  3. Matrix B starts at zero (so initially LoRA-FA has no effect)
  4. Only matrix B will be trained, reducing parameters by 50% vs standard LoRA

This is perfect when you need maximum parameter efficiency!
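
For example, wrapping a 1000×1000 dense layer looks like this (a sketch; the DenseLayer constructor shape is assumed and may differ in your version of the library):

var baseLayer = new DenseLayer<float>(1000, 1000);           // assumed constructor shape
var adapter = new LoRAFAAdapter<float>(baseLayer, rank: 8);  // alpha defaults to rank

Console.WriteLine(adapter.IsMatrixAFrozen);  // true - always, for LoRA-FA
Console.WriteLine(adapter.ParameterCount);   // 8000 - only matrix B is trainable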

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

Properties

IsMatrixAFrozen

Gets whether matrix A is frozen during training (always true for LoRA-FA).

public bool IsMatrixAFrozen { get; }

Property Value

bool

Remarks

This is the defining characteristic of LoRA-FA: matrix A is randomly initialized and then frozen, never updated during training.

ParameterCount

Gets the total number of trainable parameters (only matrix B).

public override int ParameterCount { get; }

Property Value

int

Remarks

For LoRA-FA, only matrix B is trainable. Matrix A is frozen, so it doesn't count toward trainable parameters. This results in approximately 50% parameter reduction compared to standard LoRA.

For Beginners: This returns how many parameters will actually be trained. Since matrix A is frozen, we only count matrix B's parameters. If the base layer is also frozen (typical case), this is just matrix B. Otherwise, it's base layer + matrix B.

For a layer with input size 1000, output size 1000, and rank 8:

  • Matrix B size: rank × outputSize = 8 × 1000 = 8,000 parameters
  • Matrix A size: inputSize × rank = 1000 × 8 = 8,000 parameters (but frozen, so not counted)
  • Total trainable: 8,000 (50% less than standard LoRA's 16,000)
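
The arithmetic, written out (an illustrative sketch of the counting, not the property's source):

int inputSize = 1000, outputSize = 1000, rank = 8;

int matrixB = rank * outputSize;        // 8,000 - trainable
int matrixA = inputSize * rank;         // 8,000 - frozen, excluded from the count
int standardLoRA = matrixA + matrixB;   // 16,000 - both matrices train in standard LoRA
int loraFA = matrixB;                   // 8,000 - a 50% reduction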

Methods

Backward(Tensor<T>)

Performs the backward pass, computing gradients only for matrix B (matrix A is frozen).

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass differs from standard LoRA in that gradients for matrix A are not computed or stored, since matrix A is frozen. Only gradients for matrix B and (if not frozen) the base layer are computed.

For Beginners: This is where LoRA-FA saves computation and memory!

During learning, the backward pass normally computes gradients for both matrix A and B. But in LoRA-FA, we skip the gradient computation for matrix A entirely because:

  1. Matrix A is frozen (won't be updated anyway)
  2. No need to store gradients we won't use
  3. Less computation = faster training
  4. Less memory = can train larger models

We still compute:

  • Gradients for matrix B (the only trainable LoRA component)
  • Gradients for the base layer (if not frozen)
  • Input gradients to pass to earlier layers

This is the key optimization that makes LoRA-FA more efficient than standard LoRA!
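
In the A × B convention used on this page, with X the layer input, dY the incoming output gradient, and s the scaling factor, the gradients work out as follows (a sketch of the underlying math, not the method's source):

grad_B = s × (X × A)ᵀ × dY                       (computed - B is trainable)
grad_A = s × Xᵀ × dY × Bᵀ                        (skipped - A is frozen)
grad_input = grad_from_base + s × dY × Bᵀ × Aᵀ   (still computed, passed to earlier layers)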

Forward(Tensor<T>)

Performs the forward pass through both base and LoRA layers.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Sum of base layer output and LoRA output.

Remarks

The forward pass is identical to standard LoRA: output = base_layer(input) + lora_layer(input). The difference is that matrix A inside the LoRA layer is frozen, but this doesn't affect the forward computation.

For Beginners: The forward pass works exactly like standard LoRA. We compute the base layer output, compute the LoRA correction (using frozen A and trainable B), and add them together. The frozen matrix A still participates in the computation - it just doesn't get updated during training.
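
Spelled out in the same A × B convention (a sketch of the math, not the method's source):

output = base_layer(input) + ((input × A) × B) × scaling

Computing (input × A) first keeps the cost low: two skinny multiplications through the rank-sized bottleneck, rather than materializing the full inputSize × outputSize product A × B.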

MergeToOriginalLayer()

Merges the LoRA-FA adaptation into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with LoRA weights merged into the base layer's weights.

Remarks

This method merges the LoRA-FA adaptation (using frozen matrix A and trained matrix B) back into the base layer's weights. The process is identical to standard LoRA merging; the frozen matrix A and the trained matrix B enter the merged product in exactly the same way.

For Beginners: This "bakes in" your LoRA-FA adaptation to create a regular layer.

Even though matrix A was frozen during training, it still participated in all the forward passes and contributed to the model's behavior. When merging:

  1. Compute the full weight matrix: W_lora = A × B × scaling
  2. Add these weights to the base layer's weights
  3. Create a new layer with the merged weights

The result is identical to what your adapted model was producing, but:

  • Faster inference (one merged weight matrix, no extra A × B path at runtime)
  • Simpler deployment (one layer instead of adapter + base layer)
  • No need for LoRA-aware code in production

Even though A was frozen (never trained), it still matters for the final merged weights because it was part of the random projection that B learned to work with!
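
In code, merging looks like this (a sketch using the signatures above):

ILayer<float> merged = adapter.MergeToOriginalLayer();

// merged.Forward(x) now reproduces adapter.Forward(x) for any x
// (up to floating-point error), because the merged weights are
// W_merged = W_base + A × B × scaling.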

Exceptions

InvalidOperationException

Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.

UpdateParameters(T)

Updates parameters, but only for matrix B (matrix A remains frozen).

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.

Remarks

This method updates only matrix B using the gradients computed during backpropagation. Matrix A is never updated, as it remains frozen at its initial random values.

For Beginners: This is where we apply what we learned during training!

The parameter update phase normally adjusts both matrix A and B based on their gradients. But in LoRA-FA, we only update matrix B:

  1. Get the gradients for matrix B from backpropagation
  2. Update matrix B: B_new = B_old - learningRate × gradient_B
  3. Skip matrix A entirely (it stays frozen)
  4. Update base layer parameters if not frozen

This is faster than standard LoRA because:

  • Fewer parameters to update
  • Less memory traffic
  • Simpler computation

Matrix A stays exactly as it was initialized - random Gaussian values that never change!
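
A single training step, end to end, using the methods documented on this page (a sketch; ComputeLossGradient is a hypothetical helper standing in for your loss function's backward pass, and input/target are Tensor<float> values from your data):

Tensor<float> output = adapter.Forward(input);
Tensor<float> gradOut = ComputeLossGradient(output, target);  // hypothetical helper
adapter.Backward(gradOut);        // gradients for matrix B only; A's are skipped
adapter.UpdateParameters(0.01f);  // B_new = B_old - 0.01 × gradient_B; A untouched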

UpdateParametersFromLayers()

Updates the parameter vector from the current layer states.

protected override void UpdateParametersFromLayers()

Remarks

CRITICAL: For LoRA-FA, this packs BOTH matrix A and matrix B, so the packed buffer length matches the count the base class expects. Even though matrix A is frozen, it must be included in the parameter buffer to maintain base-class invariants and prevent buffer overruns. The freeze logic lives in UpdateParameters, not in buffer packing.
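
Conceptually, the packed buffer is laid out like this (a sketch; the exact ordering is an implementation detail):

[ matrix A values (frozen, still packed) | matrix B values (trainable) ]

Packing A keeps the buffer length consistent with what the base class expects; UpdateParameters later skips A's entries, which is where the freeze is actually enforced.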