Class LoRETTAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

LoRETTA (Low-Rank Economic Tensor-Train Adaptation) adapter for parameter-efficient fine-tuning.

public class LoRETTAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T> → LoRETTAAdapter<T>
Implements
ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

LoRETTA extends LoRA by using tensor-train decomposition instead of simple matrix factorization. Instead of representing weight updates as W = A × B, LoRETTA uses a tensor-train decomposition that captures higher-order correlations with even fewer parameters.

Tensor-train decomposition represents a high-dimensional tensor as a sequence of lower-dimensional "cores" that are contracted together. For a weight matrix W of size (m × n), the entries are reshaped into a d-dimensional tensor indexed by a multi-index (i_1, ..., i_d), and the tensor-train representation is:

W[i_1, ..., i_d] = G1[i_1] × G2[i_2] × ... × Gd[i_d]

where each core Gk has dimensions (r_{k-1} × n_k × r_k), so Gk[i_k] is an (r_{k-1} × r_k) matrix, and the r_k are the TT-ranks. The boundary ranks are r_0 = r_d = 1, which makes the chained product a single scalar.
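
For example, with d = 3 cores and TT-ranks (r_0, r_1, r_2, r_3) = (1, 4, 4, 1) (illustrative numbers, not values taken from the library), a single entry W[i_1, i_2, i_3] is the product of matrices with shapes (1 × 4) × (4 × 4) × (4 × 1), which collapses to a 1 × 1 scalar.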

For Beginners: LoRETTA is an advanced version of LoRA that uses "tensor-train decomposition"!

Standard LoRA uses two matrices (A and B) to approximate weight changes:

  • Matrix A: Compresses input to rank dimensions
  • Matrix B: Expands back to output dimensions
  • Parameters: inputSize × rank + rank × outputSize

LoRETTA uses multiple small "cores" chained together:

  • Instead of 2 large matrices, use many small tensors
  • Each core captures local correlations
  • The cores are "contracted" (multiplied in sequence)
  • Can express more complex patterns with fewer parameters

Why tensor-train decomposition?

  1. More expressive: Can capture higher-order correlations
  2. More efficient: Fewer parameters than matrix factorization
  3. Better compression: Exploits structure in weight updates
  4. Scalable: Parameter count grows only logarithmically with the layer's total size (linearly in the number of cores)

Example parameter counts for 1000×1000 layer:

  • Full update: 1,000,000 parameters
  • Standard LoRA (rank=8): 16,000 parameters (98.4% reduction)
  • LoRETTA (rank=4, 3 cores): ~6,000 parameters (99.4% reduction, even better!)

Key parameters:

  • ttRank: Controls compression (like LoRA's rank but more powerful)
  • numCores: How many tensor cores in the chain (typically 3-5)
  • alpha: Scaling factor for the adaptation strength

When to use LoRETTA:

  • Maximum parameter efficiency needed
  • Weight updates have higher-order structure
  • You have very large layers to adapt
  • Standard LoRA isn't expressive enough at low ranks

Reference: I. V. Oseledets, "Tensor-train decomposition," SIAM Journal on Scientific Computing, 2011.

Constructors

LoRETTAAdapter(ILayer<T>, int, int, double, bool)

Initializes a new LoRETTA adapter wrapping an existing layer.

public LoRETTAAdapter(ILayer<T> baseLayer, int ttRank, int numCores = 3, double alpha = -1, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The layer to adapt with LoRETTA.

ttRank int

The rank of the tensor-train decomposition.

numCores int

Number of cores in the tensor-train (default: 3).

alpha double

The LoRA scaling factor (defaults to ttRank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

Remarks

For Beginners: This creates a LoRETTA adapter that wraps any layer.

Parameters:

  • baseLayer: The layer you want to adapt efficiently
  • ttRank: Controls compression (lower = fewer parameters, less flexibility)
  • numCores: How many tensor cores to use (more cores = more expressive but more params)
  • alpha: How strong the adaptation is
  • freezeBaseLayer: Whether to lock the original layer's weights (usually true)

The cores are initialized carefully:

  • First and last cores connect to input/output dimensions
  • Middle cores have uniform shapes
  • All cores start with small random values (Gaussian initialization)
  • Designed so initial LoRETTA has minimal effect

Recommended settings:

  • ttRank=4 to 8: Good balance of efficiency and expressiveness
  • numCores=3: Standard choice (input core, middle core, output core)
  • numCores=4-5: For very large layers or complex adaptations
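
A minimal construction sketch (DenseLayer<float> and its constructor arguments are assumed here purely for illustration; any ILayer<T> implementation can be wrapped):

// Wrap an existing layer with a LoRETTA adapter.
// DenseLayer<float> is an assumed example layer type, not necessarily the library's API.
ILayer<float> baseLayer = new DenseLayer<float>(inputSize: 1024, outputSize: 1024);

var adapter = new LoRETTAAdapter<float>(
    baseLayer,
    ttRank: 4,              // compression level: lower = fewer parameters
    numCores: 3,            // input core, middle core, output core
    alpha: -1,              // negative => defaults to ttRank
    freezeBaseLayer: true); // train only the tensor-train cores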

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when ttRank or numCores is invalid.

Properties

NumCores

Gets the number of cores in the tensor-train.

public int NumCores { get; }

Property Value

int

ParameterCount

Gets the total number of trainable parameters in the tensor-train cores.

public override int ParameterCount { get; }

Property Value

int

Remarks

The total parameter count is the sum of all core sizes: sum_k (ttRanks[k-1] × coreShapes[k] × ttRanks[k])

This is typically much smaller than standard LoRA for the same expressiveness.
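
As an illustrative calculation (the mode sizes here are chosen for the example, not read from the adapter): with 3 cores, mode sizes (16, 16, 16), and TT-ranks (1, 4, 4, 1), the count is 1 × 16 × 4 + 4 × 16 × 4 + 4 × 16 × 1 = 64 + 256 + 64 = 384 trainable parameters.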

TTRank

Gets the tensor-train rank.

public int TTRank { get; }

Property Value

int

Remarks

This is the maximum rank in the tensor-train decomposition. Lower rank means more compression but less expressiveness.

Methods

Backward(Tensor<T>)

Performs the backward pass through the LoRETTA adapter.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass computes gradients for all TT cores and propagates gradients back through the tensor-train contraction.

For Beginners: This is where learning happens for LoRETTA!

The backward pass:

  1. Backpropagate through base layer
  2. Backpropagate through tensor-train cores
  3. Compute gradients for each core
  4. Combine input gradients from both paths

This is more complex than standard LoRA because we need to backpropagate through multiple cores, but the principle is the same: figure out how each parameter contributed to the error.

Forward(Tensor<T>)

Performs the forward pass through the LoRETTA adapter.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Sum of base layer output and LoRETTA output.

Remarks

The forward pass computes the tensor-train contraction to produce the adaptation, then adds it to the base layer output.

For Beginners: This processes input through both the original layer and the LoRETTA adaptation, then combines them.

The LoRETTA forward pass:

  1. Forward through base layer (original behavior)
  2. Contract tensor-train cores with input (compute adaptation)
  3. Add base output + adaptation output

The tensor contraction is done sequentially through the cores, which is efficient even though it looks complex mathematically.
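
The sketch below illustrates the sequential contraction idea with plain arrays; it is a conceptual example, not the adapter's internal implementation, and it does not use the library's Tensor<T> type:

// Conceptual sketch of sequential tensor-train contraction using plain arrays.
// Core k is stored as double[r_prev, n_k, r_next]; fixing the mode index i_k
// slices out an (r_prev x r_next) matrix, and the slices are multiplied in order.
static double TtEntry(double[][,,] cores, int[] modeIndices)
{
    // First core has r_0 = 1, so its slice is a 1 x r_1 row vector.
    double[] acc = new double[cores[0].GetLength(2)];
    for (int c = 0; c < acc.Length; c++)
        acc[c] = cores[0][0, modeIndices[0], c];

    // Multiply the running row vector by each remaining core's matrix slice.
    for (int k = 1; k < cores.Length; k++)
    {
        int rPrev = cores[k].GetLength(0);
        int rNext = cores[k].GetLength(2);
        var next = new double[rNext];
        for (int c = 0; c < rNext; c++)
            for (int r = 0; r < rPrev; r++)
                next[c] += acc[r] * cores[k][r, modeIndices[k], c];
        acc = next;
    }

    // Last core has r_d = 1, so the result is a single scalar: one entry of the update.
    return acc[0];
}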

GetParameterEfficiencyMetrics()

Gets parameter efficiency metrics for this LoRETTA adapter.

public string GetParameterEfficiencyMetrics()

Returns

string

A formatted string with parameter efficiency statistics.

Remarks

For Beginners: This shows how efficient LoRETTA is compared to alternatives.

The metrics include:

  • Total parameters in base layer (what full fine-tuning would require)
  • LoRETTA parameters (what you actually train)
  • Equivalent LoRA parameters (for comparison)
  • Parameter reduction percentage
  • Compression ratio

These numbers help you understand the efficiency gains from using LoRETTA!
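
For example (reusing an adapter variable constructed as in the constructor example above):

Console.WriteLine(adapter.GetParameterEfficiencyMetrics());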

GetParameters()

Gets the current parameters as a vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

Vector containing parameters.

MergeToOriginalLayer()

Merges the LoRETTA adaptation into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with LoRETTA weights merged into the base layer's weights.

Remarks

For Beginners: This "bakes in" your LoRETTA adaptation to create a regular layer.

After training:

  1. Contract all TT cores to form a full weight matrix
  2. Add this matrix to the base layer's weights
  3. Create a new layer with the merged weights

The result is a standard layer that behaves like your adapted model, with these benefits:

  • Faster inference (no tensor-train contraction needed)
  • Simpler deployment (a single layer instead of an adapter)
  • Compatibility with any framework

The tensor-train cores are contracted to form a full weight update matrix, which is then added to the original weights.
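
A short usage sketch (assuming adapter has already been trained):

// Bake the adaptation into a plain layer for deployment.
ILayer<float> mergedLayer = adapter.MergeToOriginalLayer();
// mergedLayer now behaves like the adapted model, with no tensor-train contraction at inference time.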

Exceptions

InvalidOperationException

Thrown when the base layer type is not supported.

ResetState()

Resets the internal state of the adapter.

public override void ResetState()

SetParameters(Vector<T>)

Sets the layer parameters from a vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Vector containing parameters.
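
For example, GetParameters() and SetParameters(Vector<T>) can be paired to snapshot and restore the adapter's trainable state (a minimal sketch):

// Save the current TT-core parameters...
Vector<float> snapshot = adapter.GetParameters();
// ...and restore them later, e.g. to roll back an experiment.
adapter.SetParameters(snapshot);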

UpdateParameters(T)

Updates parameters using the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.

Remarks

For Beginners: This applies the gradients to update the TT cores.

For each core:

  1. Get the gradient computed during backpropagation
  2. Update: core_new = core_old - learningRate × gradient
  3. Update base layer if not frozen

This is conceptually the same as standard gradient descent, but applied to the tensor-train cores instead of weight matrices.
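
A minimal training-step sketch using the methods documented on this page (GetBatch and ComputeLossGradient are hypothetical helpers standing in for your data pipeline and loss function; the Tensor<T> construction API is not shown here):

// One training step with the LoRETTA adapter.
Tensor<float> input = GetBatch();                            // hypothetical helper: fetch an input batch
Tensor<float> output = adapter.Forward(input);               // base layer output + TT adaptation
Tensor<float> outputGradient = ComputeLossGradient(output);  // hypothetical helper: dLoss/dOutput
adapter.Backward(outputGradient);                            // computes gradients for every TT core
adapter.UpdateParameters(0.01f);                             // core_new = core_old - learningRate × gradient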