Class LoRETTAAdapter<T>
LoRETTA (Low-Rank Economic Tensor-Train Adaptation) adapter for parameter-efficient fine-tuning.
public class LoRETTAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → LoRETTAAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>
Remarks
LoRETTA extends LoRA by using tensor-train decomposition instead of simple matrix factorization. Instead of representing weight updates as W = A × B, LoRETTA uses a tensor-train decomposition that captures higher-order correlations with even fewer parameters.
Tensor-train decomposition represents a high-dimensional tensor as a sequence of lower-dimensional "cores" that are contracted together. For a weight matrix W of size (m × n), the tensor-train representation is:
W[i,j] = G1[i] × G2 × G3 × ... × Gd[j]
where each core Gk has dimensions (r_{k-1} × n_k × r_k), and r_k are the TT-ranks. The boundary ranks are r_0 = r_d = 1.
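In this simplified form, the contraction is just a chain of small matrix products. The sketch below (C#, illustrative only, not the adapter's internal representation) assumes the first core is stored as an m × r matrix, each middle core as an r × r matrix, and the last core as an r × n matrix:

```csharp
// Illustrative only: computes a single entry W[i, j] from the simplified chain
// above, assuming 'first' is m x r, each element of 'middle' is r x r, and
// 'last' is r x n. The adapter's internal core layout may differ.
static double ContractEntry(double[,] first, double[][,] middle, double[,] last, int i, int j)
{
    int r = first.GetLength(1);
    var v = new double[r];
    for (int a = 0; a < r; a++) v[a] = first[i, a];            // start with the row G1[i]

    foreach (var core in middle)                               // v <- v x Gk for each middle core
    {
        var next = new double[core.GetLength(1)];
        for (int b = 0; b < next.Length; b++)
            for (int a = 0; a < v.Length; a++)
                next[b] += v[a] * core[a, b];
        v = next;
    }

    double result = 0.0;                                       // finish with the column Gd[:, j]
    for (int a = 0; a < v.Length; a++) result += v[a] * last[a, j];
    return result;
}
```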
For Beginners: LoRETTA is an advanced version of LoRA that uses "tensor-train decomposition"!
Standard LoRA uses two matrices (A and B) to approximate weight changes:
- Matrix A: Compresses input to rank dimensions
- Matrix B: Expands back to output dimensions
- Parameters: inputSize × rank + rank × outputSize
LoRETTA uses multiple small "cores" chained together:
- Instead of 2 large matrices, use many small tensors
- Each core captures local correlations
- The cores are "contracted" (multiplied in sequence)
- Can express more complex patterns with fewer parameters
Why tensor-train decomposition?
- More expressive: Can capture higher-order correlations
- More efficient: Fewer parameters than matrix factorization
- Better compression: Exploits structure in weight updates
- Scalable: Grows logarithmically with dimensions
Example parameter counts for 1000×1000 layer:
- Full update: 1,000,000 parameters
- Standard LoRA (rank=8): 16,000 parameters (98.4% reduction)
- LoRETTA (rank=4, 3 cores): ~6,000 parameters (99.4% reduction, even better!)
Key parameters:
- ttRank: Controls compression (like LoRA's rank but more powerful)
- numCores: How many tensor cores in the chain (typically 3-5)
- alpha: Scaling factor for the adaptation strength
When to use LoRETTA:
- Maximum parameter efficiency needed
- Weight updates have higher-order structure
- You have very large layers to adapt
- Standard LoRA isn't expressive enough at low ranks
Reference: I. V. Oseledets, "Tensor-Train Decomposition," SIAM Journal on Scientific Computing, 2011.
Constructors
LoRETTAAdapter(ILayer<T>, int, int, double, bool)
Initializes a new LoRETTA adapter wrapping an existing layer.
public LoRETTAAdapter(ILayer<T> baseLayer, int ttRank, int numCores = 3, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoRETTA.
ttRank (int): The rank of the tensor-train decomposition.
numCores (int): Number of cores in the tensor-train (default: 3).
alpha (double): The LoRA scaling factor (defaults to ttRank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a LoRETTA adapter that wraps any layer.
Parameters:
- baseLayer: The layer you want to adapt efficiently
- ttRank: Controls compression (lower = fewer parameters, less flexibility)
- numCores: How many tensor cores to use (more cores = more expressive but more params)
- alpha: How strong the adaptation is
- freezeBaseLayer: Whether to lock the original layer's weights (usually true)
The cores are initialized carefully:
- First and last cores connect to input/output dimensions
- Middle cores have uniform shapes
- All cores start with small random values (Gaussian initialization)
- Designed so initial LoRETTA has minimal effect
Recommended settings:
- ttRank=4 to 8: Good balance of efficiency and expressiveness
- numCores=3: Standard choice (input core, middle core, output core)
- numCores=4-5: For very large layers or complex adaptations
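A minimal construction sketch is shown below. DenseLayer<double> is a hypothetical stand-in for whatever ILayer<double> implementation you want to adapt; only the LoRETTAAdapter<T> constructor arguments follow the signature documented above:

```csharp
// Hypothetical usage sketch; DenseLayer<double> is a placeholder for any
// ILayer<double> in your model, not necessarily a type in this library.
ILayer<double> baseLayer = new DenseLayer<double>(inputSize: 1024, outputSize: 1024);

var adapter = new LoRETTAAdapter<double>(
    baseLayer,
    ttRank: 4,               // lower = fewer parameters, less flexibility
    numCores: 3,             // input core, middle core, output core
    alpha: -1,               // negative means "default to ttRank"
    freezeBaseLayer: true);  // train only the TT cores

Console.WriteLine(adapter.ParameterCount);   // trainable TT-core parameters
```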
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when ttRank or numCores are invalid.
Properties
NumCores
Gets the number of cores in the tensor-train.
public int NumCores { get; }
Property Value
- int
ParameterCount
Gets the total number of trainable parameters in the tensor-train cores.
public override int ParameterCount { get; }
Property Value
- int
Remarks
The total parameter count is the sum of all core sizes: sum_k (ttRanks[k-1] × coreShapes[k] × ttRanks[k])
This is typically much smaller than standard LoRA for the same expressiveness.
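A rough numeric illustration of this formula, with assumed core shapes (the adapter picks its own factorization of the wrapped layer's dimensions, so the actual count for a given layer can differ):

```csharp
// Illustrative check of the parameter-count formula above. The core shapes
// below are an assumption for this example only.
int[] coreShapes = { 100, 100, 100 };    // one mode size per core
int[] ttRanks = { 1, 4, 4, 1 };          // boundary ranks are always 1

int parameterCount = 0;
for (int k = 0; k < coreShapes.Length; k++)
    parameterCount += ttRanks[k] * coreShapes[k] * ttRanks[k + 1];

// parameterCount = 1*100*4 + 4*100*4 + 4*100*1 = 2,400
```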
TTRank
Gets the tensor-train rank.
public int TTRank { get; }
Property Value
- int
Remarks
This is the maximum rank in the tensor-train decomposition. Lower rank means more compression but less expressiveness.
Methods
Backward(Tensor<T>)
Performs the backward pass through the LoRETTA adapter.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass computes gradients for all TT cores and propagates gradients back through the tensor-train contraction.
For Beginners: This is where learning happens for LoRETTA!
The backward pass:
- Backpropagate through base layer
- Backpropagate through tensor-train cores
- Compute gradients for each core
- Combine input gradients from both paths
This is more complex than standard LoRA because we need to backpropagate through multiple cores, but the principle is the same: figure out how each parameter contributed to the error.
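Put together, a single training step looks roughly like the sketch below. GetBatch and ComputeLossGradient are hypothetical placeholders for your own data pipeline and loss function; only Forward, Backward, and UpdateParameters are adapter methods:

```csharp
// Hypothetical training step (placeholders marked as such).
Tensor<double> input = GetBatch();                                 // placeholder: your data pipeline
Tensor<double> output = adapter.Forward(input);                    // base path + TT adaptation path
Tensor<double> outputGradient = ComputeLossGradient(output);       // placeholder: dLoss/dOutput
Tensor<double> inputGradient = adapter.Backward(outputGradient);   // stores per-core gradients internally
adapter.UpdateParameters(0.001);                                   // apply the stored gradients
```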
Forward(Tensor<T>)
Performs the forward pass through the LoRETTA adapter.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and LoRETTA output.
Remarks
The forward pass computes the tensor-train contraction to produce the adaptation, then adds it to the base layer output.
For Beginners: This processes input through both the original layer and the LoRETTA adaptation, then combines them.
The LoRETTA forward pass:
- Forward through base layer (original behavior)
- Contract tensor-train cores with input (compute adaptation)
- Add base output + adaptation output
The tensor contraction is done sequentially through the cores, which is efficient even though it looks complex mathematically.
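The adaptation branch can be sketched on plain vectors as below. This is not the library's internal code: it assumes a 3-core chain stored as ordinary matrices (inCore of size inputSize × r, midCore of size r × r, outCore of size r × outputSize) and shows only the branch that is added to the base layer's output:

```csharp
// Illustrative adaptation path: compress, mix, expand, then scale.
static double[] TtAdaptation(double[] x, double[,] inCore, double[,] midCore, double[,] outCore, double scaling)
{
    static double[] VecMat(double[] v, double[,] m)
    {
        var result = new double[m.GetLength(1)];
        for (int j = 0; j < result.Length; j++)
            for (int i = 0; i < v.Length; i++)
                result[j] += v[i] * m[i, j];
        return result;
    }

    double[] h = VecMat(x, inCore);     // compress input down to rank r
    h = VecMat(h, midCore);             // mix within the rank-r space
    double[] y = VecMat(h, outCore);    // expand back to the output size
    for (int j = 0; j < y.Length; j++)
        y[j] *= scaling;                // alpha / ttRank scaling
    return y;                           // the adapter returns baseOutput + this
}
```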
GetParameterEfficiencyMetrics()
Gets parameter efficiency metrics for this LoRETTA adapter.
public string GetParameterEfficiencyMetrics()
Returns
- string
A formatted string with parameter efficiency statistics.
Remarks
For Beginners: This shows how efficient LoRETTA is compared to alternatives.
The metrics include:
- Total parameters in base layer (what full fine-tuning would require)
- LoRETTA parameters (what you actually train)
- Equivalent LoRA parameters (for comparison)
- Parameter reduction percentage
- Compression ratio
These numbers help you understand the efficiency gains from using LoRETTA!
GetParameters()
Gets the current parameters as a vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing parameters.
MergeToOriginalLayer()
Merges the LoRETTA adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoRETTA weights merged into the base layer's weights.
Remarks
For Beginners: This "bakes in" your LoRETTA adaptation to create a regular layer.
After training:
- Contract all TT cores to form a full weight matrix
- Add this matrix to the base layer's weights
- Create a new layer with the merged weights
The result is a standard layer that behaves like your adapted model, with these advantages:
- Faster inference (no tensor-train contraction needed)
- Simpler deployment (single layer instead of adapter)
- Compatible with any framework
The tensor-train cores are contracted to form a full weight update matrix, which is then added to the original weights.
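A post-training usage sketch (assuming an adapter trained as in the earlier examples):

```csharp
// Bake the adaptation into a standalone layer for deployment.
ILayer<double> merged = adapter.MergeToOriginalLayer();
// 'merged' behaves like the adapted model but needs no TT contraction at inference time.
```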
Exceptions
- InvalidOperationException
Thrown when the base layer type is not supported.
ResetState()
Resets the internal state of the adapter.
public override void ResetState()
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): Vector containing parameters.
UpdateParameters(T)
Updates parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
Remarks
For Beginners: This applies the gradients to update the TT cores.
For each core:
- Get the gradient computed during backpropagation
- Update: core_new = core_old - learningRate × gradient
- Update base layer if not frozen
This is conceptually the same as standard gradient descent, but applied to the tensor-train cores instead of weight matrices.
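Conceptually, each core receives a plain gradient-descent step, as in the sketch below (illustrative only; the adapter does not expose its cores as raw arrays):

```csharp
// Conceptual per-core update: core_new = core_old - learningRate * gradient.
static void UpdateCoreSketch(double[] coreValues, double[] coreGradient, double learningRate)
{
    for (int i = 0; i < coreValues.Length; i++)
        coreValues[i] -= learningRate * coreGradient[i];
}
```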