Class LoRETTAAdapter<T>
LoRETTA (Low-Rank Economic Tensor-Train Adaptation) adapter for parameter-efficient fine-tuning.
public class LoRETTAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → LoRETTAAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>
Remarks
LoRETTA extends LoRA by using tensor-train decomposition instead of simple matrix factorization. Instead of representing weight updates as W = A × B, LoRETTA uses a tensor-train decomposition that captures higher-order correlations with even fewer parameters.
Tensor-train decomposition represents a high-dimensional tensor as a sequence of lower-dimensional "cores" that are contracted together. For a weight matrix W of size (m × n), the tensor-train representation is:
W[i,j] = G1[i] × G2 × G3 × ... × Gd[j]
where each core Gk has dimensions (r_{k-1} × n_k × r_k), and r_k are the TT-ranks. The boundary ranks are r_0 = r_d = 1.
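In this simplified form, the contraction is just a chain of small matrix products. The sketch below (C#, illustrative only, not the adapter's internal representation) assumes the first core is stored as an m × r matrix, each middle core as an r × r matrix, and the last core as an r × n matrix:

```csharp
// Illustrative only: computes a single entry W[i, j] from the simplified chain
// above, assuming 'first' is m x r, each element of 'middle' is r x r, and
// 'last' is r x n. The adapter's internal core layout may differ.
static double ContractEntry(double[,] first, double[][,] middle, double[,] last, int i, int j)
{
    int r = first.GetLength(1);
    var v = new double[r];
    for (int a = 0; a < r; a++) v[a] = first[i, a];            // start with the row G1[i]

    foreach (var core in middle)                               // v <- v x Gk for each middle core
    {
        var next = new double[core.GetLength(1)];
        for (int b = 0; b < next.Length; b++)
            for (int a = 0; a < v.Length; a++)
                next[b] += v[a] * core[a, b];
        v = next;
    }

    double result = 0.0;                                       // finish with the column Gd[:, j]
    for (int a = 0; a < v.Length; a++) result += v[a] * last[a, j];
    return result;
}
```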
For Beginners: LoRETTA is an advanced version of LoRA that uses "tensor-train decomposition"!
Standard LoRA uses two matrices (A and B) to approximate weight changes:
- Matrix A: Compresses input to rank dimensions
- Matrix B: Expands back to output dimensions
- Parameters: inputSize × rank + rank × outputSize
LoRETTA uses multiple small "cores" chained together:
- Instead of 2 large matrices, use many small tensors
- Each core captures local correlations
- The cores are "contracted" (multiplied in sequence)
- Can express more complex patterns with fewer parameters
Why tensor-train decomposition?
- More expressive: Can capture higher-order correlations
- More efficient: Fewer parameters than matrix factorization
- Better compression: Exploits structure in weight updates
- Scalable: Grows logarithmically with dimensions
Example parameter counts for 1000×1000 layer:
- Full update: 1,000,000 parameters
- Standard LoRA (rank=8): 16,000 parameters (98.4% reduction)
- LoRETTA (rank=4, 3 cores): ~6,000 parameters (99.4% reduction, even better!)
Key parameters:
- ttRank: Controls compression (like LoRA's rank but more powerful)
- numCores: How many tensor cores in the chain (typically 3-5)
- alpha: Scaling factor for the adaptation strength
When to use LoRETTA:
- Maximum parameter efficiency needed
- Weight updates have higher-order structure
- You have very large layers to adapt
- Standard LoRA isn't expressive enough at low ranks
Reference: I. V. Oseledets, "Tensor-Train Decomposition," SIAM Journal on Scientific Computing, 2011.
Constructors
LoRETTAAdapter(ILayer<T>, int, int, double, bool)
Initializes a new LoRETTA adapter wrapping an existing layer.
public LoRETTAAdapter(ILayer<T> baseLayer, int ttRank, int numCores = 3, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoRETTA.
ttRank (int): The rank of the tensor-train decomposition.
numCores (int): Number of cores in the tensor-train (default: 3).
alpha (double): The LoRA scaling factor (defaults to ttRank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a LoRETTA adapter that wraps any layer.
Parameters:
- baseLayer: The layer you want to adapt efficiently
- ttRank: Controls compression (lower = fewer parameters, less flexibility)
- numCores: How many tensor cores to use (more cores = more expressive but more params)
- alpha: How strong the adaptation is
- freezeBaseLayer: Whether to lock the original layer's weights (usually true)
The cores are initialized carefully:
- First and last cores connect to input/output dimensions
- Middle cores have uniform shapes
- All cores start with small random values (Gaussian initialization)
- Designed so initial LoRETTA has minimal effect
Recommended settings:
- ttRank=4 to 8: Good balance of efficiency and expressiveness
- numCores=3: Standard choice (input core, middle core, output core)
- numCores=4-5: For very large layers or complex adaptations
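A minimal construction sketch is shown below. DenseLayer<double> is a hypothetical stand-in for whatever ILayer<double> implementation you want to adapt; only the LoRETTAAdapter<T> constructor arguments follow the signature documented above:

```csharp
// Hypothetical usage sketch; DenseLayer<double> is a placeholder for any
// ILayer<double> in your model, not necessarily a type in this library.
ILayer<double> baseLayer = new DenseLayer<double>(inputSize: 1024, outputSize: 1024);

var adapter = new LoRETTAAdapter<double>(
    baseLayer,
    ttRank: 4,               // lower = fewer parameters, less flexibility
    numCores: 3,             // input core, middle core, output core
    alpha: -1,               // negative means "default to ttRank"
    freezeBaseLayer: true);  // train only the TT cores

Console.WriteLine(adapter.ParameterCount);   // trainable TT-core parameters
```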
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when ttRank or numCores are invalid.
Properties
NumCores
Gets the number of cores in the tensor-train.
public int NumCores { get; }
Property Value
- int
ParameterCount
Gets the total number of trainable parameters in the tensor-train cores.
public override int ParameterCount { get; }
Property Value
- int
Remarks
The total parameter count is the sum of all core sizes: sum_k (ttRanks[k-1] × coreShapes[k] × ttRanks[k])
This is typically much smaller than standard LoRA for the same expressiveness.
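A rough numeric illustration of this formula, with assumed core shapes (the adapter picks its own factorization of the wrapped layer's dimensions, so the actual count for a given layer can differ):

```csharp
// Illustrative check of the parameter-count formula above. The core shapes
// below are an assumption for this example only.
int[] coreShapes = { 100, 100, 100 };    // one mode size per core
int[] ttRanks = { 1, 4, 4, 1 };          // boundary ranks are always 1

int parameterCount = 0;
for (int k = 0; k < coreShapes.Length; k++)
    parameterCount += ttRanks[k] * coreShapes[k] * ttRanks[k + 1];

// parameterCount = 1*100*4 + 4*100*4 + 4*100*1 = 2,400
```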
TTRank
Gets the tensor-train rank.
public int TTRank { get; }
Property Value
- int
Remarks
This is the maximum rank in the tensor-train decomposition. Lower rank means more compression but less expressiveness.
Methods
Backward(Tensor<T>)
Performs the backward pass through the LoRETTA adapter.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass computes gradients for all TT cores and propagates gradients back through the tensor-train contraction.
For Beginners: This is where learning happens for LoRETTA!
The backward pass:
- Backpropagate through base layer
- Backpropagate through tensor-train cores
- Compute gradients for each core
- Combine input gradients from both paths
This is more complex than standard LoRA because we need to backpropagate through multiple cores, but the principle is the same: figure out how each parameter contributed to the error.
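Put together, a single training step looks roughly like the sketch below. GetBatch and ComputeLossGradient are hypothetical placeholders for your own data pipeline and loss function; only Forward, Backward, and UpdateParameters are adapter methods:

```csharp
// Hypothetical training step (placeholders marked as such).
Tensor<double> input = GetBatch();                                 // placeholder: your data pipeline
Tensor<double> output = adapter.Forward(input);                    // base path + TT adaptation path
Tensor<double> outputGradient = ComputeLossGradient(output);       // placeholder: dLoss/dOutput
Tensor<double> inputGradient = adapter.Backward(outputGradient);   // stores per-core gradients internally
adapter.UpdateParameters(0.001);                                   // apply the stored gradients
```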
Forward(Tensor<T>)
Performs the forward pass through the LoRETTA adapter.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and LoRETTA output.
Remarks
The forward pass computes the tensor-train contraction to produce the adaptation, then adds it to the base layer output.
For Beginners: This processes input through both the original layer and the LoRETTA adaptation, then combines them.
The LoRETTA forward pass:
- Forward through base layer (original behavior)
- Contract tensor-train cores with input (compute adaptation)
- Add base output + adaptation output
The tensor contraction is done sequentially through the cores, which is efficient even though it looks complex mathematically.
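The adaptation branch can be sketched on plain vectors as below. This is not the library's internal code: it assumes a 3-core chain stored as ordinary matrices (inCore of size inputSize × r, midCore of size r × r, outCore of size r × outputSize) and shows only the branch that is added to the base layer's output:

```csharp
// Illustrative adaptation path: compress, mix, expand, then scale.
static double[] TtAdaptation(double[] x, double[,] inCore, double[,] midCore, double[,] outCore, double scaling)
{
    static double[] VecMat(double[] v, double[,] m)
    {
        var result = new double[m.GetLength(1)];
        for (int j = 0; j < result.Length; j++)
            for (int i = 0; i < v.Length; i++)
                result[j] += v[i] * m[i, j];
        return result;
    }

    double[] h = VecMat(x, inCore);     // compress input down to rank r
    h = VecMat(h, midCore);             // mix within the rank-r space
    double[] y = VecMat(h, outCore);    // expand back to the output size
    for (int j = 0; j < y.Length; j++)
        y[j] *= scaling;                // alpha / ttRank scaling
    return y;                           // the adapter returns baseOutput + this
}
```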
GetParameterEfficiencyMetrics()
Gets parameter efficiency metrics for this LoRETTA adapter.
public string GetParameterEfficiencyMetrics()
Returns
- string
A formatted string with parameter efficiency statistics.
Remarks
For Beginners: This shows how efficient LoRETTA is compared to alternatives.
The metrics include:
- Total parameters in base layer (what full fine-tuning would require)
- LoRETTA parameters (what you actually train)
- Equivalent LoRA parameters (for comparison)
- Parameter reduction percentage
- Compression ratio
These numbers help you understand the efficiency gains from using LoRETTA!
GetParameters()
Gets the current parameters as a vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing parameters.
MergeToOriginalLayer()
Merges the LoRETTA adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoRETTA weights merged into the base layer's weights.
Remarks
For Beginners: This "bakes in" your LoRETTA adaptation to create a regular layer.
After training:
- Contract all TT cores to form a full weight matrix
- Add this matrix to the base layer's weights
- Create a new layer with the merged weights
The result is a standard layer that behaves like your adapted model, with these advantages:
- Faster inference (no tensor-train contraction needed)
- Simpler deployment (single layer instead of adapter)
- Compatible with any framework
The tensor-train cores are contracted to form a full weight update matrix, which is then added to the original weights.
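A post-training usage sketch (assuming an adapter trained as in the earlier examples):

```csharp
// Bake the adaptation into a standalone layer for deployment.
ILayer<double> merged = adapter.MergeToOriginalLayer();
// 'merged' behaves like the adapted model but needs no TT contraction at inference time.
```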
Exceptions
- InvalidOperationException
Thrown when the base layer type is not supported.
ResetState()
Resets the internal state of the adapter.
public override void ResetState()
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): Vector containing parameters.
UpdateParameters(T)
Updates parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
Remarks
For Beginners: This applies the gradients to update the TT cores.
For each core:
- Get the gradient computed during backpropagation
- Update: core_new = core_old - learningRate × gradient
- Update base layer if not frozen
This is conceptually the same as standard gradient descent, but applied to the tensor-train cores instead of weight matrices.
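Conceptually, each core receives a plain gradient-descent step, as in the sketch below (illustrative only; the adapter does not expose its cores as raw arrays):

```csharp
// Conceptual per-core update: core_new = core_old - learningRate * gradient.
static void UpdateCoreSketch(double[] coreValues, double[] coreGradient, double learningRate)
{
    for (int i = 0; i < coreValues.Length; i++)
        coreValues[i] -= learningRate * coreGradient[i];
}
```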