Class TiedLoRAAdapter<T>
Tied-LoRA adapter - LoRA with weight tying for extreme parameter efficiency across deep networks.
public class TiedLoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
Inheritance
- LayerBase<T> → LoRAAdapterBase<T> → TiedLoRAAdapter<T>
Implements
- ILoRAAdapter<T>, ILayer<T>
Remarks
Tied-LoRA achieves even greater parameter efficiency than standard LoRA by:
- Sharing the same LoRA matrices (A and B) across multiple layers
- Training only layer-specific scaling factors
- Particularly effective for very deep networks with many similar layers
The forward computation is: output = base_layer(input) + layerScaling * (B_shared * A_shared * input) * (alpha/rank), where layerScaling is a trainable scalar unique to each layer, and A_shared and B_shared are trainable matrices shared across all layers.
For Beginners: Tied-LoRA is an ultra-efficient variant of LoRA for deep networks.
Think of the difference this way:
- Standard LoRA: Each layer has its own pair of small matrices (A and B) that are trained
- VeRA: ALL layers share the same random matrices (A and B) which are frozen. Only tiny scaling vectors are trained per layer.
- Tied-LoRA: ALL layers share the same matrices (A and B) which ARE trained. Only a single scaling factor is trained per layer.
Example parameter comparison for 10 layers of 1000x1000 with rank=8:
- Full fine-tuning: 10,000,000 parameters
- Standard LoRA (rank=8): 160,000 parameters (10 layers × 16,000 params each)
- Tied-LoRA (rank=8): ~16,010 parameters (shared 16,000 + 10 scaling factors)
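For reference, a small arithmetic sketch (plain C#, not library code) that reproduces these counts:
int layers = 10, inputSize = 1000, outputSize = 1000, rank = 8;

int fullFineTuning = layers * inputSize * outputSize;       // 10,000,000
int loraPerLayer   = rank * inputSize + outputSize * rank;  // 16,000 (A: rank x input, B: output x rank)
int standardLoRA   = layers * loraPerLayer;                 // 160,000
int tiedLoRA       = loraPerLayer + layers;                 // 16,010 (one shared pair + one scaling factor per layer)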
Benefits of Tied-LoRA:
- ✅ Extreme parameter efficiency for deep networks (scales with depth)
- ✅ Shared matrices enforce consistency across layers
- ✅ Still trainable (unlike VeRA's frozen matrices)
- ✅ Very low memory footprint
- ✅ Faster training (fewer parameters to update)
Trade-offs:
- ⚠️ Less flexible than standard LoRA (shared adaptation across layers)
- ⚠️ Assumes layers benefit from similar adaptations
- ⚠️ May underperform standard LoRA on heterogeneous architectures
When to use Tied-LoRA:
- Very deep networks (transformers with many similar layers)
- Extreme memory constraints
- When layers have similar structure and function
- Rapid prototyping with minimal parameter overhead
- Fine-tuning massive models (GPT, BERT-style architectures)
Research insight: Tied-LoRA works well because in deep networks, many layers learn similar transformations. By sharing the LoRA matrices and only varying the strength per layer, we capture most of the adaptation capability with minimal parameters.
Constructors
TiedLoRAAdapter(ILayer<T>, int, int, double, bool)
Initializes a new Tied-LoRA adapter wrapping an existing layer.
public TiedLoRAAdapter(ILayer<T> baseLayer, int rank, int layerIndex = 0, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with Tied-LoRA.
rank (int): The rank of the low-rank decomposition (shared across all Tied-LoRA layers).
layerIndex (int): The index of this layer in the network (for tracking and debugging).
alpha (double): The scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
Before creating any Tied-LoRA adapters, you must call InitializeSharedMatrices() once to set up the shared trainable matrices that all Tied-LoRA layers will use.
For Beginners: This creates a Tied-LoRA adapter for a layer. You must initialize the shared matrices first by calling:
TiedLoRAAdapter<T>.InitializeSharedMatrices(inputSize, outputSize, rank);
This needs to be done once before creating any Tied-LoRA adapters.
Parameters:
- baseLayer: The layer you want to adapt
- rank: How much compression (lower = fewer parameters)
- layerIndex: Which layer this is (0, 1, 2, etc.) for tracking
- alpha: How strong the Tied-LoRA adaptation is
- freezeBaseLayer: Whether to lock the original layer's weights (usually true)
The layerIndex helps identify which layer this adapter belongs to, which is useful for debugging and understanding how different layers use the shared adaptation.
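As an illustrative sketch (hiddenLayers is a hypothetical collection of existing ILayer<double> instances; the sizes and rank are example values), a guarded setup might look like this:
// Make sure the shared matrices exist before constructing any adapters.
if (!TiedLoRAAdapter<double>.AreSharedMatricesInitialized)
{
    TiedLoRAAdapter<double>.InitializeSharedMatrices(inputSize: 1024, outputSize: 1024, rank: 8);
}

// Wrap each existing layer; all adapters reuse the same shared A and B matrices.
var adapters = new List<TiedLoRAAdapter<double>>();
for (int i = 0; i < hiddenLayers.Count; i++)
{
    adapters.Add(new TiedLoRAAdapter<double>(hiddenLayers[i], rank: 8, layerIndex: i));
}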
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when rank is invalid or shared matrices are not initialized.
Properties
AreSharedMatricesInitialized
Gets whether the shared matrices have been initialized.
public static bool AreSharedMatricesInitialized { get; }
Property Value
- bool
LayerIndex
Gets the layer index.
public int LayerIndex { get; }
Property Value
- int
LayerScaling
Gets the layer-specific scaling factor.
public double LayerScaling { get; }
Property Value
- double
ParameterCount
Gets the total number of trainable parameters.
public override int ParameterCount { get; }
Property Value
- int
Remarks
Tied-LoRA only trains a single scaling factor per layer (plus the base layer if not frozen). The shared matrices contribute to the parameter count only once across all layers.
Methods
Backward(Tensor<T>)
Performs the backward pass through the Tied-LoRA adapter.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass computes gradients for:
1. The layer-specific scaling factor (local to this layer)
2. The shared matrices A and B (accumulated across all layers)
For Beginners: This is where Tied-LoRA learns! During backpropagation:
1. Compute the gradient for this layer's scaling factor
2. Accumulate gradients for the shared matrices A and B (these are summed across all layers)
3. Update the base layer if not frozen
4. Pass gradients back to earlier layers
The shared matrices are updated once after all layers have computed their gradients, using the accumulated gradients from all layers.
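As an illustration, a minimal sketch of the backward phase across all adapted layers; the adapters list and the lossGradient tensor are assumed to come from the surrounding training code:
// Backward phase for one training step: walk the adapted layers in reverse.
// Each Backward call computes the gradient for that layer's scaling factor and
// accumulates gradients for the shared A and B matrices; the shared matrices
// themselves are only updated later via UpdateSharedMatrices.
Tensor<double> grad = lossGradient; // assumed: gradient of the loss w.r.t. the network output
for (int i = adapters.Count - 1; i >= 0; i--)
{
    grad = adapters[i].Backward(grad);
}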
CreateLoRALayer(int, double)
Creates a Tied-LoRA-specific layer (not used since Tied-LoRA doesn't use standard LoRALayer).
protected override LoRALayer<T> CreateLoRALayer(int rank, double alpha)
Parameters
rank (int): The rank of the low-rank decomposition.
alpha (double): The scaling factor.
Returns
- LoRALayer<T>
Forward(Tensor<T>)
Performs the forward pass through the Tied-LoRA adapter.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and Tied-LoRA output.
Remarks
The Tied-LoRA forward pass computes: output = base_layer(input) + layerScaling * (B_shared * A_shared * input) * (alpha/rank)
For Beginners: This processes input through both the original layer and the Tied-LoRA adaptation:
1. The base layer processes the input (original behavior)
2. Tied-LoRA computes: input → A_shared (trainable) → B_shared (trainable) → layerScaling
3. The outputs are added together
The key difference from standard LoRA: A and B are shared across all layers and ARE trained, but each layer only has one trainable parameter (layerScaling) to control the strength!
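As a conceptual illustration of the computation above (plain arrays with hypothetical dimensions; this is not the library's internal implementation):
// Conceptual view of the tied forward pass for one layer (illustrative only).
static double[] TiedForward(
    double[] baseOutput,   // output of the base layer for this input
    double[] input,        // input vector, length = inputSize
    double[,] aShared,     // shared A: rank x inputSize (trainable)
    double[,] bShared,     // shared B: outputSize x rank (trainable)
    double layerScaling,   // this layer's single trainable scalar
    double alpha, int rank)
{
    int outputSize = bShared.GetLength(0);

    // down = A_shared * input (project down to 'rank' dimensions)
    var down = new double[rank];
    for (int r = 0; r < rank; r++)
        for (int c = 0; c < input.Length; c++)
            down[r] += aShared[r, c] * input[c];

    // up = B_shared * down (project back up to the output dimension)
    var up = new double[outputSize];
    for (int o = 0; o < outputSize; o++)
        for (int r = 0; r < rank; r++)
            up[o] += bShared[o, r] * down[r];

    // output = base_layer(input) + layerScaling * up * (alpha / rank)
    var output = new double[outputSize];
    for (int o = 0; o < outputSize; o++)
        output[o] = baseOutput[o] + layerScaling * up[o] * (alpha / rank);

    return output;
}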
GetParameters()
Gets the current parameters as a vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing parameters (layer scaling factor only, or base + scaling if base not frozen).
InitializeSharedMatrices(int, int, int, int?)
Initializes the shared trainable matrices used by all Tied-LoRA adapters.
public static void InitializeSharedMatrices(int inputSize, int outputSize, int rank, int? seed = null)
Parameters
inputSize (int): The input dimension for the layers.
outputSize (int): The output dimension for the layers.
rank (int): The rank of the low-rank decomposition.
seed (int?): Optional random seed for reproducibility.
Remarks
This method must be called once before creating any Tied-LoRA adapters. It initializes the shared matrices A and B with random values that will be trained during fine-tuning.
The shared matrices are initialized with Gaussian random values similar to Kaiming initialization for matrix A, and zeros for matrix B (so Tied-LoRA starts with no effect).
For Beginners: Call this once at the start before creating any Tied-LoRA layers:
// Initialize shared trainable matrices (do this once)
TiedLoRAAdapter<double>.InitializeSharedMatrices(inputSize: 784, outputSize: 128, rank: 8);

// Now create Tied-LoRA adapters (they will use the shared matrices)
var adapter1 = new TiedLoRAAdapter<double>(layer1, rank: 8, layerIndex: 0);
var adapter2 = new TiedLoRAAdapter<double>(layer2, rank: 8, layerIndex: 1);
All adapters share the same A and B matrices, but each has its own scaling factor! During training, the shared matrices learn the common adaptation pattern, while each layer's scaling factor controls how much to use that pattern.
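As a conceptual sketch of the initialization scheme described in the remarks above (Kaiming-style Gaussian values for A, zeros for B); the library's actual implementation may differ in its details:
// Conceptual initialization of the shared matrices (illustrative, not library internals).
int inputSize = 784, outputSize = 128, rank = 8;
var rng = new Random(42); // a fixed seed mirrors the optional 'seed' parameter

// A: rank x inputSize, Gaussian values scaled roughly like Kaiming initialization.
double std = Math.Sqrt(2.0 / inputSize);
var aShared = new double[rank, inputSize];
for (int r = 0; r < rank; r++)
    for (int c = 0; c < inputSize; c++)
    {
        // Box-Muller transform: draw a standard normal sample from two uniforms.
        double u1 = 1.0 - rng.NextDouble();
        double u2 = rng.NextDouble();
        double normal = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        aShared[r, c] = std * normal;
    }

// B: outputSize x rank, all zeros so Tied-LoRA starts with no effect.
var bShared = new double[outputSize, rank];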
MergeToOriginalLayer()
Merges the Tied-LoRA adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with Tied-LoRA weights merged into the base layer's weights.
Remarks
This computes the full weight contribution from Tied-LoRA: W_tied = layerScaling * (B_shared * A_shared) * (alpha/rank) and adds it to the base layer's weights.
For Beginners: This "bakes in" the Tied-LoRA adaptation for deployment. After training, you can merge the adaptation into the original weights for faster inference. The merged layer will behave identically but without the Tied-LoRA overhead.
Each layer gets a different merged result because the layer-specific scaling factor modulates how much of the shared adaptation is applied to that layer.
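A minimal deployment sketch, assuming an adapters list of trained Tied-LoRA adapters like the one built in the constructor example above:
// After training: bake the tied adaptation into each base layer for inference.
var mergedLayers = new List<ILayer<double>>();
foreach (var adapter in adapters)
{
    // Each merged layer differs, because its own layerScaling modulates the shared update.
    mergedLayers.Add(adapter.MergeToOriginalLayer());
}
// mergedLayers can now run inference with no Tied-LoRA overhead.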
ResetSharedGradients()
Resets the accumulated gradients for the shared matrices. Should be called after each optimization step.
public static void ResetSharedGradients()
ResetSharedMatrices()
Resets the shared matrices and gradients (useful for testing or reinitializing).
public static void ResetSharedMatrices()
ResetState()
Resets the internal state of the Tied-LoRA adapter.
public override void ResetState()
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): Vector containing parameters.
UpdateParameters(T)
Updates parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
Remarks
Tied-LoRA updates the layer-specific scaling factor locally, but shared matrices must be updated separately using UpdateSharedMatrices() after all layers have performed their backward pass.
For Beginners: This updates only the layer-specific scaling factor. The shared matrices A and B need to be updated separately after all layers finish their backward pass, because they accumulate gradients from all layers.
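A minimal sketch of the update ordering described above, assuming an adapters list and a double learningRate; it complements the backward-phase sketch shown under Backward:
// 1. Update each layer's local scaling factor.
foreach (var adapter in adapters)
{
    adapter.UpdateParameters(learningRate);
}

// 2. Apply the accumulated gradients to the shared A and B matrices (once per step).
TiedLoRAAdapter<double>.UpdateSharedMatrices(learningRate);

// 3. Clear the accumulated shared gradients before the next training step.
TiedLoRAAdapter<double>.ResetSharedGradients();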
UpdateSharedMatrices(T)
Updates the shared matrices using the accumulated gradients. Should be called once after all layers have performed their backward pass.
public static void UpdateSharedMatrices(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.