Class VeRAAdapter<T>
VeRA (Vector-based Random Matrix Adaptation) adapter - an extreme parameter-efficient variant of LoRA.
public class VeRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → VeRAAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
VeRA achieves 10x fewer trainable parameters than standard LoRA by:
- Using a single pair of random low-rank matrices (A and B) shared across ALL layers
- Freezing these shared matrices (they are never trained)
- Training only small scaling vectors (d and b) that are specific to each layer
The forward computation is: output = base_layer(input) + d * (B * A * input) * b where d and b are trainable vectors, and A and B are frozen shared matrices.
For Beginners: VeRA is an ultra-efficient version of LoRA for extreme memory constraints.
Think of the difference this way:
- Standard LoRA: Each layer has its own pair of small matrices (A and B) that are trained
- VeRA: ALL layers share the same random matrices (A and B) which are frozen. Only tiny scaling vectors are trained per layer.
Example parameter comparison for a 1000x1000 layer with rank=8:
- Full fine-tuning: 1,000,000 parameters
- Standard LoRA (rank=8): 16,000 parameters (98.4% reduction)
- VeRA (rank=8): ~1,000 parameters (99.9% reduction), more than 10x fewer than LoRA (worked through in the sketch below)!
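The arithmetic behind this comparison is easy to verify. The sketch below is purely illustrative and uses the per-layer trainable count documented under ParameterCount (outputSize + rank):

// Illustrative parameter-count arithmetic for a 1000x1000 layer at rank = 8
int inputSize = 1000, outputSize = 1000, rank = 8;
int fullFineTuning = inputSize * outputSize;              // 1,000,000 weights updated
int loraParams = rank * inputSize + outputSize * rank;    // A (8x1000) + B (1000x8) = 16,000
int veraParams = outputSize + rank;                       // two scaling vectors = 1,008
Console.WriteLine($"Full: {fullFineTuning}, LoRA: {loraParams}, VeRA: {veraParams}");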
Trade-offs:
- ✅ Extreme parameter efficiency (10x fewer than LoRA)
- ✅ Very low memory footprint
- ✅ Shared matrices reduce storage when adapting many layers
- ⚠️ Slightly less flexible than standard LoRA (shared random projection)
- ⚠️ Performance may be marginally lower than LoRA in some cases
When to use VeRA:
- Extreme memory constraints (mobile, edge devices)
- Fine-tuning many layers with limited resources
- Rapid prototyping with minimal parameter overhead
- When LoRA is still too expensive
Constructors
VeRAAdapter(ILayer<T>, int, double, bool)
Initializes a new VeRA adapter wrapping an existing layer.
public VeRAAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with VeRA.
rank (int): The rank of the low-rank decomposition (shared across all VeRA layers).
alpha (double): The scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
Before creating any VeRA adapters, you must call InitializeSharedMatrices() once to set up the shared random matrices that all VeRA layers will use.
For Beginners: This creates a VeRA adapter for a layer. Unlike standard LoRA, you must initialize the shared random matrices first by calling:
VeRAAdapter<T>.InitializeSharedMatrices(inputSize, outputSize, rank);
This needs to be done once before creating any VeRA adapters.
Parameters:
- baseLayer: The layer you want to adapt
- rank: How much compression (lower = fewer parameters)
- alpha: How strong the VeRA adaptation is
- freezeBaseLayer: Whether to lock the original layer's weights (usually true)
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when rank is invalid or shared matrices are not initialized.
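A minimal construction sketch, assuming denseLayer is an existing ILayer<double> with matching dimensions (the 784/128 sizes and alpha value here are arbitrary placeholders):

// 1. Initialize the shared, frozen random matrices once for the target dimensions.
VeRAAdapter<double>.InitializeSharedMatrices(inputSize: 784, outputSize: 128, rank: 8);

// 2. Wrap an existing layer; only the per-layer scaling vectors remain trainable.
var adapter = new VeRAAdapter<double>(denseLayer, rank: 8, alpha: 16, freezeBaseLayer: true);

// Skipping step 1, or passing an invalid rank, throws ArgumentException as described above.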
Properties
AreSharedMatricesInitialized
Gets whether the shared matrices have been initialized.
public static bool AreSharedMatricesInitialized { get; }
Property Value
- bool
ParameterCount
Gets the total number of trainable parameters (only the scaling vectors d and b).
public override int ParameterCount { get; }
Property Value
- int
Remarks
VeRA only trains the scaling vectors, not the shared matrices. For a layer with outputSize and rank r, this is: outputSize + rank. This is typically 10x fewer parameters than standard LoRA.
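As a hedged example of the formula, assume adapter wraps a layer with outputSize = 128 at rank = 8:

// outputSize + rank = 128 + 8 = 136 trainable values
// (one scaling vector sized by the output dimension, one sized by the rank)
Console.WriteLine(adapter.ParameterCount);   // 136 for such a layer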
Methods
Backward(Tensor<T>)
Performs the backward pass through the VeRA adapter.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass computes gradients ONLY for the scaling vectors d and b. The shared matrices A and B remain frozen and are never updated.
For Beginners: This is where VeRA learns! During backpropagation:
1. Compute gradients for scaling vectors d and b (these are trained)
2. Shared matrices A and B are NOT updated (they stay frozen)
3. Pass gradients back to earlier layers
This is why VeRA is so efficient - we only train tiny scaling vectors!
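A hedged single-training-step sketch using only members documented on this page; adapter, inputBatch, and the ComputeLossGradient helper are hypothetical placeholders:

// One illustrative training step: only the scaling vectors d and b receive updates.
Tensor<double> output = adapter.Forward(inputBatch);            // base layer + VeRA path
Tensor<double> lossGradient = ComputeLossGradient(output);      // hypothetical loss-gradient helper
Tensor<double> inputGradient = adapter.Backward(lossGradient);  // gradients for d and b only; A and B stay frozen
adapter.UpdateParameters(0.01);                                 // apply the update with a learning rate of 0.01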
CreateLoRALayer(int, double)
Creates the LoRA layer required by the base class (not actually used, since VeRA doesn't use LoRALayer).
protected override LoRALayer<T> CreateLoRALayer(int rank, double alpha)
Parameters
rank (int): The rank of the low-rank decomposition.
alpha (double): The scaling factor.
Returns
- LoRALayer<T>
Remarks
VeRA doesn't use the standard LoRALayer, so this creates a dummy layer. The actual VeRA computation is handled in Forward() and Backward() methods.
Forward(Tensor<T>)
Performs the forward pass through the VeRA adapter.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and VeRA output.
Remarks
The VeRA forward pass computes: output = base_layer(input) + d * (B * A * input) * b * scaling where d and b are trainable scaling vectors, A and B are frozen shared matrices, and scaling = alpha/rank.
For Beginners: This processes input through both the original layer and the VeRA adaptation:
1. Base layer processes the input (original behavior)
2. VeRA computes: input → A (shared) → b (scale) → B (shared) → d (scale)
3. The outputs are added together
The key difference from standard LoRA: A and B are shared and frozen, only d and b are trained!
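To make the ordering concrete, here is a plain-array sketch of just the VeRA path. It is not the library's implementation, only the math from the remark above with scaling = alpha / rank, and it assumes b scales the rank-sized intermediate while d scales the output, matching the sequence listed above:

// Conceptual VeRA path for a single input vector x (illustration only).
// A: rank x inputSize (frozen, shared)    B: outputSize x rank (frozen, shared)
// b: rank entries (trainable)             d: outputSize entries (trainable)
static double[] VeraPath(double[,] A, double[,] B, double[] b, double[] d,
                         double[] x, double alpha, int rank)
{
    double scaling = alpha / rank;
    var intermediate = new double[rank];              // A * x, then scaled elementwise by b
    for (int r = 0; r < rank; r++)
    {
        double sum = 0;
        for (int j = 0; j < x.Length; j++) sum += A[r, j] * x[j];
        intermediate[r] = b[r] * sum;
    }
    var result = new double[d.Length];                // B * intermediate, then scaled by d and alpha/rank
    for (int i = 0; i < d.Length; i++)
    {
        double sum = 0;
        for (int r = 0; r < rank; r++) sum += B[i, r] * intermediate[r];
        result[i] = d[i] * sum * scaling;
    }
    return result;                                    // added to the base layer's output
}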
GetParameters()
Gets the current parameters as a vector (scaling vectors only).
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing VeRA parameters (d and b vectors).
InitializeSharedMatrices(int, int, int, int?)
Initializes the shared random matrices used by all VeRA adapters.
public static void InitializeSharedMatrices(int inputSize, int outputSize, int rank, int? seed = null)
Parameters
inputSize (int): The input dimension for the layers.
outputSize (int): The output dimension for the layers.
rank (int): The rank of the low-rank decomposition.
seed (int?): Optional random seed for reproducibility.
Remarks
This method must be called once before creating any VeRA adapters. It initializes the shared matrices A and B with random values that are frozen (never trained).
The shared matrices are initialized with Gaussian random values similar to Kaiming initialization. Once initialized, they remain frozen and are shared across all VeRA adapters with matching dimensions.
For Beginners: Call this once at the start before creating any VeRA layers:
// Initialize shared random matrices (do this once)
VeRAAdapter<double>.InitializeSharedMatrices(inputSize: 784, outputSize: 128, rank: 8);

// Now create VeRA adapters (they will use the shared matrices)
var adapter1 = new VeRAAdapter<double>(layer1, rank: 8);
var adapter2 = new VeRAAdapter<double>(layer2, rank: 8);
All adapters share the same random A and B matrices, saving memory!
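If the frozen matrices need to be reproducible across runs (for example, so saved scaling vectors can be reloaded against the same A and B later), the optional seed parameter can be supplied; the value 42 below is arbitrary:

// Fix the random seed so the shared A and B matrices are identical on every run.
VeRAAdapter<double>.InitializeSharedMatrices(inputSize: 784, outputSize: 128, rank: 8, seed: 42);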
MergeToOriginalLayer()
Merges the VeRA adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with VeRA weights merged into the base layer's weights.
Remarks
This computes the full weight contribution from VeRA: W_vera = d * B * A * b * scaling, and adds it to the base layer's weights.
For Beginners: This "bakes in" the VeRA adaptation for deployment. After training, you can merge the adaptation into the original weights for faster inference. The merged layer will behave identically but without the VeRA overhead.
ResetSharedMatrices()
Resets the shared matrices (useful for testing or reinitializing).
public static void ResetSharedMatrices()
ResetState()
Resets the internal state of the VeRA adapter.
public override void ResetState()
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): Vector containing VeRA parameters.
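Because only the two small scaling vectors are trainable, per-layer checkpoints are tiny; a hedged round-trip sketch using GetParameters and SetParameters:

// Save the trained scaling vectors (d and b) for this adapter ...
Vector<double> veraCheckpoint = adapter.GetParameters();

// ... and later restore them into an adapter with the same dimensions and rank.
adapter.SetParameters(veraCheckpoint);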
UpdateParameters(T)
Updates parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
Remarks
VeRA only updates the scaling vectors d and b. The shared matrices A and B remain frozen.
UpdateParametersFromLayers()
Updates the parameter vector from the current layer states.
protected override void UpdateParametersFromLayers()
Remarks
VeRA overrides this to only copy scaling vectors (d and b), not the full LoRA layer parameters. This is called from the base constructor before scaling vectors are initialized, so we check for null and skip if not ready yet.