Class LoRALayer<T>

Namespace
AiDotNet.LoRA
Assembly
AiDotNet.dll

Implements Low-Rank Adaptation (LoRA) layer for parameter-efficient fine-tuning of neural networks.

public class LoRALayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRALayer<T>

Remarks

LoRA works by decomposing weight updates into two low-rank matrices A (inputSize × rank) and B (rank × outputSize), where the actual update is computed as A * B, scaled by alpha / rank. This dramatically reduces the number of trainable parameters compared to fine-tuning all weights directly.

For Beginners: LoRA is a technique that makes it much cheaper to adapt large neural networks to new tasks. Instead of updating all the weights in a layer (which can be millions of parameters), LoRA adds two small matrices that work together to approximate the needed changes.

Think of it like this:

  • Traditional fine-tuning: Adjusting every single knob on a massive control panel
  • LoRA: Using just a few master controls that influence many knobs at once

The key insight is that the changes needed for fine-tuning often lie in a "low-rank" space, meaning we don't need full freedom to adjust every parameter independently.

Key parameters:

  • Rank (r): Controls how many "master controls" you have. Higher rank = more flexibility but more parameters
  • Alpha: A scaling factor that controls how much influence the LoRA adaptation has

For example, adapting a layer with 1000x1000 weights (1M parameters) using LoRA with rank=8 only requires 8x1000 + 8x1000 = 16,000 parameters (98.4% reduction!).
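
The arithmetic above can be reproduced with a few lines of C#. This is a minimal sketch using plain integers; the class and method names (LoRAParameterMath, LoRAParameterCount, DenseParameterCount, Reduction) are hypothetical helpers for illustration and are not part of AiDotNet.

```csharp
using System;

public static class LoRAParameterMath
{
    // Trainable parameters in the two LoRA matrices:
    // A is inputSize x rank, B is rank x outputSize.
    public static long LoRAParameterCount(int inputSize, int outputSize, int rank)
        => (long)rank * inputSize + (long)rank * outputSize;

    // Parameters in the full dense weight matrix being adapted.
    public static long DenseParameterCount(int inputSize, int outputSize)
        => (long)inputSize * outputSize;

    // Fraction of trainable parameters saved relative to full fine-tuning.
    public static double Reduction(int inputSize, int outputSize, int rank)
        => 1.0 - (double)LoRAParameterCount(inputSize, outputSize, rank)
                 / DenseParameterCount(inputSize, outputSize);
}
```

For the 1000x1000 layer with rank 8, LoRAParameterCount gives 16,000 and Reduction gives 0.984, matching the figures quoted above.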

Constructors

LoRALayer(int, int, int, double, IActivationFunction<T>?)

Initializes a new LoRA layer with the specified dimensions and hyperparameters.

public LoRALayer(int inputSize, int outputSize, int rank, double alpha = -1, IActivationFunction<T>? activationFunction = null)

Parameters

inputSize int

The number of input features.

outputSize int

The number of output features.

rank int

The rank of the low-rank decomposition (must be positive and less than min(inputSize, outputSize)).

alpha double

The scaling factor for LoRA contributions, typically set close to the rank value.

activationFunction IActivationFunction<T>

Optional activation function to apply after the LoRA transformation.

Remarks

The LoRA matrices are initialized as follows:

  • Matrix A: Random values from a Gaussian distribution (similar to Kaiming initialization)
  • Matrix B: Zero initialization (so LoRA starts with no effect)

For Beginners: This creates a new LoRA layer. You specify the input and output sizes (which should match the layer you're adapting), the rank (how much compression), and alpha (how strong the adaptation is).

The initialization is carefully chosen:

  • Matrix A gets random values (so training can start moving in useful directions)
  • Matrix B starts at zero (so initially, LoRA doesn't change anything)
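
The effect of this initialization can be illustrated with plain arrays. The sketch below is not the library's code: the class LoRAInitSketch and its methods are hypothetical, and the standard deviation sqrt(2 / inputSize) is an assumed Kaiming-style scale, since the source only says the distribution is "similar to Kaiming initialization".

```csharp
using System;

public static class LoRAInitSketch
{
    // Fills A (inputSize x rank) with zero-mean Gaussian samples via Box-Muller.
    // ASSUMPTION: the std below is a Kaiming-style guess, not the library's value.
    public static double[,] InitA(int inputSize, int rank, Random rng)
    {
        double std = Math.Sqrt(2.0 / inputSize);
        var a = new double[inputSize, rank];
        for (int i = 0; i < inputSize; i++)
            for (int j = 0; j < rank; j++)
            {
                double u1 = 1.0 - rng.NextDouble(); // in (0, 1], avoids log(0)
                double u2 = rng.NextDouble();
                a[i, j] = std * Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
            }
        return a;
    }

    // B (rank x outputSize) starts at zero; C# zero-initializes new arrays.
    public static double[,] InitB(int rank, int outputSize) => new double[rank, outputSize];

    // Dense product A * B, shape inputSize x outputSize.
    public static double[,] Product(double[,] a, double[,] b)
    {
        int rows = a.GetLength(0), inner = a.GetLength(1), cols = b.GetLength(1);
        var result = new double[rows, cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                for (int p = 0; p < inner; p++)
                    result[i, j] += a[i, p] * b[p, j];
        return result;
    }
}
```

Because B is all zeros, Product(InitA(...), InitB(...)) is the zero matrix, so a freshly created adapter leaves the base layer's output unchanged until training moves B away from zero.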

Exceptions

ArgumentException

Thrown when rank is invalid (see the constraint on the rank parameter above).

Properties

Alpha

Gets the alpha scaling factor.

public T Alpha { get; }

Property Value

T

ParameterCount

Gets the total number of trainable parameters (elements in A and B matrices).

public override int ParameterCount { get; }

Property Value

int

Rank

Gets the rank of this LoRA layer.

public int Rank { get; }

Property Value

int

Scaling

Gets the computed scaling factor (alpha / rank).

public T Scaling { get; }

Property Value

T

SupportsJitCompilation

Gets whether this LoRA layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the LoRA matrices are initialized.

Remarks

LoRA layers support JIT compilation when their matrices (A and B) are properly initialized. The JIT-compiled version computes output = input * A * B * scaling using optimized tensor operations.

For Beginners: JIT compilation makes the LoRA layer run faster by converting its math operations into optimized native code. This is especially beneficial for inference when you want maximum speed.

The layer can be JIT compiled as long as it has been initialized, which happens automatically when the layer is created.

SupportsTraining

Gets whether this layer supports training (always true for LoRA).

public override bool SupportsTraining { get; }

Property Value

bool

Methods

Backward(Tensor<T>)

Performs the backward pass through the LoRA layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass computes gradients for both LoRA matrices and propagates gradients back to the input. Gradients are computed as:

  • dL/dB = A^T * input^T * outputGradient * scaling
  • dL/dA = input^T * outputGradient * B^T * scaling
  • dL/dinput = outputGradient * B^T * A^T * scaling

For Beginners: This is where learning happens! The backward pass:

  1. Figures out how to adjust matrices A and B to reduce error
  2. Passes gradients back to earlier layers so they can learn too

It uses calculus (specifically, the chain rule) to figure out how each parameter contributed to the error.
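
The three gradient formulas in the remarks can be checked on small matrices. The following is a minimal sketch with plain double[,] arrays rather than the library's Tensor<T> and Matrix<T> types; the class LoRABackwardSketch and its helpers are hypothetical names, and it assumes A is inputSize × rank, B is rank × outputSize, and no activation function is applied.

```csharp
using System;

public static class LoRABackwardSketch
{
    public static double[,] MatMul(double[,] x, double[,] y)
    {
        int n = x.GetLength(0), k = x.GetLength(1), m = y.GetLength(1);
        if (y.GetLength(0) != k) throw new ArgumentException("Inner dimensions must match.");
        var result = new double[n, m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                for (int p = 0; p < k; p++)
                    result[i, j] += x[i, p] * y[p, j];
        return result;
    }

    public static double[,] Transpose(double[,] x)
    {
        var t = new double[x.GetLength(1), x.GetLength(0)];
        for (int i = 0; i < x.GetLength(0); i++)
            for (int j = 0; j < x.GetLength(1); j++)
                t[j, i] = x[i, j];
        return t;
    }

    public static double[,] Scale(double[,] x, double factor)
    {
        var s = new double[x.GetLength(0), x.GetLength(1)];
        for (int i = 0; i < x.GetLength(0); i++)
            for (int j = 0; j < x.GetLength(1); j++)
                s[i, j] = x[i, j] * factor;
        return s;
    }

    // Gradients of output = input * A * B * scaling, with scaling = alpha / rank.
    public static (double[,] GradA, double[,] GradB, double[,] GradInput) Backward(
        double[,] input, double[,] a, double[,] b, double alpha, double[,] outputGradient)
    {
        double scaling = alpha / a.GetLength(1);
        var hidden = MatMul(input, a); // batch x rank
        // dL/dB = (input * A)^T * outputGradient * scaling
        var gradB = Scale(MatMul(Transpose(hidden), outputGradient), scaling);
        // Shared term: outputGradient * B^T * scaling (batch x rank)
        var gradHidden = Scale(MatMul(outputGradient, Transpose(b)), scaling);
        // dL/dA = input^T * outputGradient * B^T * scaling
        var gradA = MatMul(Transpose(input), gradHidden);
        // dL/dinput = outputGradient * B^T * A^T * scaling
        var gradInput = MatMul(gradHidden, Transpose(a));
        return (gradA, gradB, gradInput);
    }
}
```

Computing the shared term outputGradient * B^T * scaling once and reusing it for both dL/dA and dL/dinput is a design choice of this sketch; it is mathematically equivalent to evaluating each formula separately.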

ExportComputationGraph(List<ComputationNode<T>>)

Exports the computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to which input nodes will be added.

Returns

ComputationNode<T>

The output computation node representing the LoRA transformation.

Remarks

The computation graph implements: output = input * A * B * scaling where:

  • A is the low-rank projection matrix (inputSize × rank)
  • B is the reconstruction matrix (rank × outputSize)
  • scaling = alpha / rank

For Beginners: This exports the LoRA computation as a graph of operations that can be optimized and compiled to fast native code.

The graph represents:

  1. Input → multiply by matrix A (compress to low rank)
  2. Result → multiply by matrix B (expand to output size)
  3. Result → multiply by scaling factor

The JIT compiler can then fuse these operations and apply optimizations like SIMD vectorization.

Exceptions

ArgumentNullException

Thrown when inputNodes is null.

InvalidOperationException

Thrown when matrices are not initialized.

Forward(Tensor<T>)

Performs the forward pass through the LoRA layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor of shape [batchSize, inputSize].

Returns

Tensor<T>

Output tensor of shape [batchSize, outputSize].

Remarks

The forward pass computes: output = input * A * B * scaling where scaling = alpha / rank.

For Beginners: This processes data through the LoRA layer. The input is:

  1. Multiplied by matrix A (compressing to rank dimensions)
  2. Multiplied by matrix B (expanding back to output dimensions)
  3. Scaled by alpha/rank (controlling the strength)

The result represents the adaptation that gets added to the base layer's output.
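
The three steps of the forward pass can be sketched with plain arrays. This is an illustration only, not the library's implementation: LoRAForwardSketch and its methods are hypothetical names, double[,] stands in for Tensor<T>, and no activation function is applied.

```csharp
using System;

public static class LoRAForwardSketch
{
    public static double[,] MatMul(double[,] x, double[,] y)
    {
        int n = x.GetLength(0), k = x.GetLength(1), m = y.GetLength(1);
        if (y.GetLength(0) != k) throw new ArgumentException("Inner dimensions must match.");
        var result = new double[n, m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                for (int p = 0; p < k; p++)
                    result[i, j] += x[i, p] * y[p, j];
        return result;
    }

    // output = input * A * B * scaling, where scaling = alpha / rank.
    // input: batchSize x inputSize, A: inputSize x rank, B: rank x outputSize.
    public static double[,] Forward(double[,] input, double[,] a, double[,] b, double alpha)
    {
        int rank = a.GetLength(1);
        double scaling = alpha / rank;
        var compressed = MatMul(input, a);    // step 1: batchSize x rank
        var expanded = MatMul(compressed, b); // step 2: batchSize x outputSize
        for (int i = 0; i < expanded.GetLength(0); i++)
            for (int j = 0; j < expanded.GetLength(1); j++)
                expanded[i, j] *= scaling;    // step 3: apply alpha / rank
        return expanded;
    }
}
```

Multiplying by A first keeps the intermediate result at batchSize × rank, so the full inputSize × outputSize product is never materialized during the forward pass.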

GetMatrixA()

Gets matrix A (for inspection or advanced use cases).

public Matrix<T> GetMatrixA()

Returns

Matrix<T>

GetMatrixB()

Gets matrix B (for inspection or advanced use cases).

public Matrix<T> GetMatrixB()

Returns

Matrix<T>

GetParameters()

Gets the current parameters as a vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

Vector containing all LoRA parameters (A and B matrices flattened).

MergeWeights()

Merges the LoRA weights into a dense weight matrix that can be added to a base layer.

public Matrix<T> MergeWeights()

Returns

Matrix<T>

The merged weight matrix (inputSize × outputSize) representing the full LoRA contribution.

Remarks

This computes the full weight matrix W_lora = A * B * scaling, which can then be added to the base layer's weights. This is useful for deployment when you want to merge the adaptation back into the base model for inference efficiency.

For Beginners: This "bakes in" the LoRA adaptation into a regular weight matrix. Instead of storing two small matrices (A and B) and computing them during inference, you can merge them into one larger matrix and add it to the original weights.

This is like assembling separate parts into a single finished product: once you're done training, you can simplify the model for faster inference.
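
The merge can be sketched as follows with plain arrays. The names LoRAMergeSketch, Merge, and AddInPlace are hypothetical, and the sketch assumes the base layer stores its weights as inputSize × outputSize, matching the shape returned by MergeWeights(); a base layer with a transposed layout would need the merged matrix transposed first.

```csharp
using System;

public static class LoRAMergeSketch
{
    // W_lora = A * B * scaling, shape inputSize x outputSize, scaling = alpha / rank.
    public static double[,] Merge(double[,] a, double[,] b, double alpha)
    {
        int inputSize = a.GetLength(0), rank = a.GetLength(1), outputSize = b.GetLength(1);
        if (b.GetLength(0) != rank) throw new ArgumentException("A and B ranks must match.");
        double scaling = alpha / rank;
        var merged = new double[inputSize, outputSize];
        for (int i = 0; i < inputSize; i++)
            for (int j = 0; j < outputSize; j++)
            {
                double sum = 0.0;
                for (int p = 0; p < rank; p++) sum += a[i, p] * b[p, j];
                merged[i, j] = sum * scaling;
            }
        return merged;
    }

    // Adds the merged LoRA contribution to the base weights in place.
    // ASSUMPTION: baseWeights uses the same inputSize x outputSize layout.
    public static void AddInPlace(double[,] baseWeights, double[,] merged)
    {
        if (baseWeights.GetLength(0) != merged.GetLength(0) ||
            baseWeights.GetLength(1) != merged.GetLength(1))
            throw new ArgumentException("Shapes must match.");
        for (int i = 0; i < merged.GetLength(0); i++)
            for (int j = 0; j < merged.GetLength(1); j++)
                baseWeights[i, j] += merged[i, j];
    }
}
```

After merging, a single multiplication input * (W_base + W_lora) replaces the separate base and adapter paths, which is why the merged form is attractive for deployment.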

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

For LoRA layers, this clears the stored input from the last forward pass.

For Beginners: This clears the layer's memory of the last input it processed. It's like hitting a reset button before processing a new, unrelated batch of data.

SetParameters(Vector<T>)

Sets the layer parameters from a vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Vector containing all LoRA parameters.

UpdateParameters(T)

Updates the layer's parameters using the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.