Class TiedLoRAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

Tied-LoRA adapter - LoRA with weight tying for extreme parameter efficiency across deep networks.

public class TiedLoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T>
TiedLoRAAdapter<T>
Implements
IDisposable
ILoRAAdapter<T>
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>

Remarks

Tied-LoRA achieves even greater parameter efficiency than standard LoRA by:

  • Sharing the same LoRA matrices (A and B) across multiple layers
  • Training only layer-specific scaling factors

This makes it particularly effective for very deep networks with many similar layers.

The forward computation is: output = base_layer(input) + layerScaling * (alpha/rank) * (B_shared * A_shared * input), where layerScaling is a trainable scalar unique to each layer, and A_shared and B_shared are trainable matrices shared by every adapted layer.

For Beginners: Tied-LoRA is an ultra-efficient variant of LoRA for deep networks.

Think of the difference this way:

  • Standard LoRA: Each layer has its own pair of small matrices (A and B) that are trained
  • VeRA: ALL layers share the same random matrices (A and B) which are frozen. Only tiny scaling vectors are trained per layer.
  • Tied-LoRA: ALL layers share the same matrices (A and B) which ARE trained. Only a single scaling factor is trained per layer.

Example parameter comparison for 10 layers of size 1000×1000 with rank=8 (the per-layer counts are derived just after this list):

  • Full fine-tuning: 10,000,000 parameters
  • Standard LoRA (rank=8): 160,000 parameters (10 layers × 16,000 params each)
  • Tied-LoRA (rank=8): ~16,010 parameters (shared 16,000 + 10 scaling factors)
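The per-layer figure above comes from the two low-rank matrices:

  Per-layer LoRA parameters = rank × (inputSize + outputSize) = 8 × (1000 + 1000) = 16,000
  Standard LoRA total = 10 layers × 16,000 = 160,000
  Tied-LoRA total = 16,000 (one shared A/B pair) + 10 (one scaling factor per layer) = 16,010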

Benefits of Tied-LoRA:

  • ✅ Extreme parameter efficiency for deep networks (scales with depth)
  • ✅ Shared matrices enforce consistency across layers
  • ✅ Still trainable (unlike VeRA's frozen matrices)
  • ✅ Very low memory footprint
  • ✅ Faster training (fewer parameters to update)

Trade-offs:

  • ⚠️ Less flexible than standard LoRA (shared adaptation across layers)
  • ⚠️ Assumes layers benefit from similar adaptations
  • ⚠️ May underperform standard LoRA on heterogeneous architectures

When to use Tied-LoRA:

  • Very deep networks (transformers with many similar layers)
  • Extreme memory constraints
  • When layers have similar structure and function
  • Rapid prototyping with minimal parameter overhead
  • Fine-tuning massive models (GPT, BERT-style architectures)

Research insight: Tied-LoRA works well because in deep networks, many layers learn similar transformations. By sharing the LoRA matrices and only varying the strength per layer, we capture most of the adaptation capability with minimal parameters.

Constructors

TiedLoRAAdapter(ILayer<T>, int, int, double, bool)

Initializes a new Tied-LoRA adapter wrapping an existing layer.

public TiedLoRAAdapter(ILayer<T> baseLayer, int rank, int layerIndex = 0, double alpha = -1, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The layer to adapt with Tied-LoRA.

rank int

The rank of the low-rank decomposition (shared across all Tied-LoRA layers).

layerIndex int

The index of this layer in the network (for tracking and debugging).

alpha double

The scaling factor (defaults to rank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

Remarks

Before creating any Tied-LoRA adapters, you must call InitializeSharedMatrices() once to set up the shared trainable matrices that all Tied-LoRA layers will use.

For Beginners: This creates a Tied-LoRA adapter for a layer. You must initialize the shared matrices first by calling:

TiedLoRAAdapter<T>.InitializeSharedMatrices(inputSize, outputSize, rank);

This needs to be done once before creating any Tied-LoRA adapters.

Parameters:

  • baseLayer: The layer you want to adapt
  • rank: How much compression (lower = fewer parameters)
  • layerIndex: Which layer this is (0, 1, 2, etc.) for tracking
  • alpha: How strong the Tied-LoRA adaptation is
  • freezeBaseLayer: Whether to lock the original layer's weights (usually true)

The layerIndex helps identify which layer this adapter belongs to, which is useful for debugging and understanding how different layers use the shared adaptation.
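A minimal ordering sketch, assuming a hypothetical ILayer<double> instance named denseLayer with 784 inputs and 128 outputs (the dimensions must match the values passed to InitializeSharedMatrices):

// 1. Shared matrices must exist before any adapter is constructed
TiedLoRAAdapter<double>.InitializeSharedMatrices(inputSize: 784, outputSize: 128, rank: 8);

// 2. Wrap the existing layer; the base layer stays frozen by default
var adapter = new TiedLoRAAdapter<double>(baseLayer: denseLayer, rank: 8, layerIndex: 0);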

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when rank is invalid or shared matrices are not initialized.

Properties

AreSharedMatricesInitialized

Gets whether the shared matrices have been initialized.

public static bool AreSharedMatricesInitialized { get; }

Property Value

bool

LayerIndex

Gets the layer index.

public int LayerIndex { get; }

Property Value

int

LayerScaling

Gets the layer-specific scaling factor.

public double LayerScaling { get; }

Property Value

double

ParameterCount

Gets the total number of trainable parameters.

public override int ParameterCount { get; }

Property Value

int

Remarks

Tied-LoRA only trains a single scaling factor per layer (plus the base layer if not frozen). The shared matrices contribute to the parameter count only once across all layers.

Methods

Backward(Tensor<T>)

Performs the backward pass through the Tied-LoRA adapter.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass computes gradients for:

  1. Layer-specific scaling factor (local to this layer)
  2. Shared matrices A and B (accumulated across all layers)

For Beginners: This is where Tied-LoRA learns! During backpropagation:

  1. Compute gradient for this layer's scaling factor
  2. Accumulate gradients for shared matrices A and B (these are summed across all layers)
  3. Update base layer if not frozen
  4. Pass gradients back to earlier layers

The shared matrices are updated once after all layers have computed their gradients, using the accumulated gradients from all layers.
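A hedged sketch of one training step across several adapted layers, showing the accumulate-then-update order described above (adapters, gradient, and learningRate are placeholders, and any non-adapted layers in the network are omitted):

// Backward in reverse layer order; gradients for the shared A and B accumulate internally
for (int i = adapters.Count - 1; i >= 0; i--)
{
    gradient = adapters[i].Backward(gradient);
}

// Per-layer update: each adapter's own scaling factor (and its base layer, if unfrozen)
foreach (var adapter in adapters)
{
    adapter.UpdateParameters(learningRate);
}

// Apply the accumulated gradients to the shared matrices once, then clear them
TiedLoRAAdapter<double>.UpdateSharedMatrices(learningRate);
TiedLoRAAdapter<double>.ResetSharedGradients();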

CreateLoRALayer(int, double)

Creates the LoRA layer required by the base class (not used, since Tied-LoRA does not rely on a standard LoRALayer).

protected override LoRALayer<T> CreateLoRALayer(int rank, double alpha)

Parameters

rank int
alpha double

Returns

LoRALayer<T>

Forward(Tensor<T>)

Performs the forward pass through the Tied-LoRA adapter.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Sum of base layer output and Tied-LoRA output.

Remarks

The Tied-LoRA forward pass computes: output = base_layer(input) + layerScaling * (B_shared * A_shared * input) * (alpha/rank)

For Beginners: This processes input through both the original layer and the Tied-LoRA adaptation:

  1. Base layer processes the input (original behavior)
  2. Tied-LoRA computes: input → A_shared (trainable) → B_shared (trainable) → layerScaling
  3. The outputs are added together

The key difference from standard LoRA: A and B are shared across all layers and ARE trained, but each layer only has one trainable parameter (layerScaling) to control the strength!
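To make the data flow concrete, here is a small self-contained illustration of the Tied-LoRA contribution using plain arrays rather than the library's Tensor<T> type; the returned delta is what Forward adds to the base layer's output:

// Conceptual illustration only: delta = layerScaling * (alpha / rank) * B_shared * A_shared * x
static double[] TiedLoRADelta(double[,] bShared, double[,] aShared, double[] x,
                              double layerScaling, double alpha, int rank)
{
    // aShared is rank x inputSize, bShared is outputSize x rank
    int inputSize = aShared.GetLength(1);
    int outputSize = bShared.GetLength(0);

    // ax = A_shared * x (rank-sized intermediate)
    var ax = new double[rank];
    for (int i = 0; i < rank; i++)
        for (int j = 0; j < inputSize; j++)
            ax[i] += aShared[i, j] * x[j];

    // delta = layerScaling * (alpha / rank) * (B_shared * ax)
    var delta = new double[outputSize];
    for (int i = 0; i < outputSize; i++)
    {
        for (int j = 0; j < rank; j++)
            delta[i] += bShared[i, j] * ax[j];
        delta[i] *= layerScaling * alpha / rank;
    }
    return delta; // added element-wise to the base layer's output
}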

GetParameters()

Gets the current parameters as a vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

Vector containing parameters (layer scaling factor only, or base + scaling if base not frozen).

InitializeSharedMatrices(int, int, int, int?)

Initializes the shared trainable matrices used by all Tied-LoRA adapters.

public static void InitializeSharedMatrices(int inputSize, int outputSize, int rank, int? seed = null)

Parameters

inputSize int

The input dimension for the layers.

outputSize int

The output dimension for the layers.

rank int

The rank of the low-rank decomposition.

seed int?

Optional random seed for reproducibility.

Remarks

This method must be called once before creating any Tied-LoRA adapters. It initializes the shared matrices A and B with random values that will be trained during fine-tuning.

Matrix A is initialized with Gaussian random values (similar to Kaiming initialization) and matrix B with zeros, so Tied-LoRA starts with no effect.

For Beginners: Call this once at the start before creating any Tied-LoRA layers:

// Initialize shared trainable matrices (do this once)
TiedLoRAAdapter<double>.InitializeSharedMatrices(inputSize: 784, outputSize: 128, rank: 8);

// Now create Tied-LoRA adapters (they will use the shared matrices)
var adapter1 = new TiedLoRAAdapter<double>(layer1, rank: 8, layerIndex: 0);
var adapter2 = new TiedLoRAAdapter<double>(layer2, rank: 8, layerIndex: 1);

All adapters share the same A and B matrices, but each has its own scaling factor! During training, the shared matrices learn the common adaptation pattern, while each layer's scaling factor controls how much to use that pattern.

MergeToOriginalLayer()

Merges the Tied-LoRA adaptation into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with Tied-LoRA weights merged into the base layer's weights.

Remarks

This computes the full weight contribution from Tied-LoRA: W_tied = layerScaling * (B_shared * A_shared) * (alpha/rank) and adds it to the base layer's weights.

For Beginners: This "bakes in" the Tied-LoRA adaptation for deployment. After training, you can merge the adaptation into the original weights for faster inference. The merged layer will behave identically but without the Tied-LoRA overhead.

Each layer gets a different merged result because the layer-specific scaling factor modulates how much of the shared adaptation is applied to that layer.
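A short post-training sketch, assuming adapters holds the trained Tied-LoRA adapters in layer order; each merged layer can then replace its adapter for inference:

// Bake each adapter's contribution into its base layer for deployment
var mergedLayers = new List<ILayer<double>>();
foreach (var adapter in adapters)
{
    mergedLayers.Add(adapter.MergeToOriginalLayer());
}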

ResetSharedGradients()

Resets the accumulated gradients for the shared matrices. Should be called after each optimization step.

public static void ResetSharedGradients()

ResetSharedMatrices()

Resets the shared matrices and gradients (useful for testing or reinitializing).

public static void ResetSharedMatrices()

ResetState()

Resets the internal state of the Tied-LoRA adapter.

public override void ResetState()

SetParameters(Vector<T>)

Sets the layer parameters from a vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Vector containing parameters.

UpdateParameters(T)

Updates parameters using the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.

Remarks

Tied-LoRA updates the layer-specific scaling factor locally, but shared matrices must be updated separately using UpdateSharedMatrices() after all layers have performed their backward pass.

For Beginners: This updates only the layer-specific scaling factor. The shared matrices A and B need to be updated separately after all layers finish their backward pass, because they accumulate gradients from all layers.

UpdateSharedMatrices(T)

Updates the shared matrices using accumulated gradients. Should be called once after all layers have performed backward pass.

public static void UpdateSharedMatrices(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.