Class DVoRAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

DVoRA (DoRA + VeRA) adapter - combines DoRA's magnitude-direction decomposition with VeRA's extreme parameter efficiency.

public class DVoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance

LoRAAdapterBase<T> → DVoRAAdapter<T>

Implements

IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Remarks

DVoRA achieves the best of both worlds by:

  • Applying DoRA's magnitude-direction decomposition for training stability
  • Using VeRA's shared frozen matrices and scaling vectors for extreme parameter efficiency
  • Applying the VeRA adaptation only to the direction component (not the magnitude)

Mathematical Formulation: Given pre-trained weights W, DVoRA (a simplified code sketch follows the definitions below):

  1. Decomposes: W = m * d (magnitude and direction)
  2. Applies VeRA to the direction: d' = d + d_scale * (B * A * input) * b_scale
  3. Normalizes the direction: d_norm = d' / ||d'||
  4. Recomposes: W' = m * d_norm

Where:

  • m: magnitude vector (trainable)
  • d: direction matrix (normalized weight vectors)
  • A, B: shared frozen random matrices (VeRA style)
  • d_scale, b_scale: per-layer trainable scaling vectors (VeRA style)
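
To make this concrete, here is a minimal C# sketch of steps 1-4 for a single weight row. It is purely illustrative: it uses plain arrays rather than the library's tensor types, the names are hypothetical, and the VeRA contribution is assumed to be precomputed.

using System;
using System.Linq;

internal static class DVoRAMathSketch
{
    // Steps 1-4 for a single weight row w. The VeRA contribution
    // (d_scale * (B * A * input) * b_scale) is passed in precomputed,
    // and `magnitude` is the trainable m for this row.
    public static double[] RecomposeRow(double[] w, double[] veraContribution, double magnitude)
    {
        // 1. Decompose: direction d = w / ||w||
        double wNorm = Math.Sqrt(w.Sum(x => x * x));
        double[] d = w.Select(x => x / wNorm).ToArray();

        // 2. Apply VeRA to the direction: d' = d + contribution
        double[] dPrime = d.Zip(veraContribution, (a, b) => a + b).ToArray();

        // 3. Normalize: d_norm = d' / ||d'||
        double dPrimeNorm = Math.Sqrt(dPrime.Sum(x => x * x));

        // 4. Recompose: w' = m * d_norm
        return dPrime.Select(x => magnitude * x / dPrimeNorm).ToArray();
    }
}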

Research Context: DVoRA scores 5.0 vs VeRA's 4.3 (improvement of 16%) while maintaining ultra-low parameter counts. It combines DoRA's superior training stability with VeRA's extreme parameter efficiency.

For Beginners: DVoRA is the ultimate parameter-efficient adapter.

Think of it as a hybrid technique:

  • From DoRA: Separate magnitude (strength) from direction for stability
  • From VeRA: Use shared random matrices and tiny scaling vectors for efficiency
  • The magic: Apply VeRA's adaptation only to the direction, not the magnitude

Parameter comparison for a 1000x1000 layer with rank = 8 (a small arithmetic sketch follows the list):

  • Full fine-tuning: 1,000,000 parameters
  • Standard LoRA: 16,000 parameters (98.4% reduction)
  • DoRA: 17,000 parameters (LoRA + magnitude vector)
  • VeRA: 1,600 parameters (99.84% reduction)
  • DVoRA: ~1,600 parameters (same as VeRA!) but with better performance (5.0 vs 4.3)
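
The LoRA and DoRA counts above follow directly from the matrix shapes; a small arithmetic sketch with the same illustrative dimensions is shown below. VeRA and DVoRA are left out of the code because they train only tiny per-layer vectors while sharing the frozen A and B matrices across layers.

// Illustrative parameter arithmetic for a 1000x1000 layer at rank 8.
int inputSize = 1000, outputSize = 1000, rank = 8;

int fullFineTuning = inputSize * outputSize;          // 1,000,000 trainable weights
int standardLoRA   = rank * (inputSize + outputSize); // 16,000 (trainable A: rank x in, B: out x rank)
int doRA           = standardLoRA + outputSize;       // 17,000 (adds a per-output magnitude vector)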

Benefits:

  • ✅ Extremely parameter-efficient (10x fewer than standard LoRA, same as VeRA)
  • ✅ Better performance than VeRA alone (5.0 vs 4.3 score)
  • ✅ Training stability from DoRA's magnitude-direction decomposition
  • ✅ Shared matrices reduce storage when adapting many layers
  • ✅ Best choice for extreme memory constraints with quality requirements

Trade-offs:

  • ⚠️ Requires shared matrix initialization before use
  • ⚠️ Slightly more computation than VeRA (due to normalization)
  • ⚠️ More complex than standard adapters (combines two techniques)

When to use DVoRA:

  • Extreme memory constraints but need better quality than VeRA
  • Mobile/edge deployment with limited resources
  • Fine-tuning many layers efficiently
  • When you want the absolute best parameter efficiency + quality balance

References:

  • DoRA: "Weight-Decomposed Low-Rank Adaptation" (ICML 2024 Oral)
  • VeRA: "Vector-based Random Matrix Adaptation"
  • DVoRA: combines both techniques for optimal efficiency and performance

Constructors

DVoRAAdapter(ILayer<T>, int, double, bool)

public DVoRAAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The layer to wrap with the DVoRA adaptation.

rank int

The rank of the low-rank adaptation.

alpha double

The LoRA scaling factor applied to the adaptation.

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.
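
A minimal construction sketch, assuming baseLayer is an existing ILayer<float> whose dimensions match the shared matrices; the sizes, rank, and seed below are illustrative, and any additional using directives (for ILayer<T>) depend on your project.

using AiDotNet.LoRA.Adapters;

internal static class DVoRAConstructionSketch
{
    public static DVoRAAdapter<float> Wrap(ILayer<float> baseLayer)
    {
        // The shared frozen matrices must exist before the adapter is used.
        if (!DVoRAAdapter<float>.AreSharedMatricesInitialized)
        {
            DVoRAAdapter<float>.InitializeSharedMatrices(
                inputSize: 1000, outputSize: 1000, rank: 8, seed: 42);
        }

        // alpha: -1 is the declared default; freezeBaseLayer: true keeps the base weights fixed.
        return new DVoRAAdapter<float>(baseLayer, rank: 8, alpha: -1, freezeBaseLayer: true);
    }
}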

Properties

AreSharedMatricesInitialized

Gets whether the shared matrices have been initialized.

public static bool AreSharedMatricesInitialized { get; }

Property Value

bool

ParameterCount

Gets the total number of trainable parameters.

public override int ParameterCount { get; }

Property Value

int

Remarks

DVoRA parameters = base (if unfrozen) + LoRA layer + magnitude (outputSize) + d_scale (outputSize) + b_scale (rank). This is only slightly more than VeRA (adds magnitude vector) but much fewer than DoRA (no full LoRA matrices). Handles pre-initialization state by using fallback values when fields are null.
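
For example, applying that formula to a hypothetical frozen-base layer with 512 outputs at rank 4:

int outputSize = 512, rank = 4;

// magnitude (outputSize) + d_scale (outputSize) + b_scale (rank)
int trainableParameters = outputSize + outputSize + rank; // 1,028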

Methods

Backward(Tensor<T>)

Performs the backward pass through the DVoRA adapter.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass computes gradients for:

  1. Magnitude parameters (DoRA component, one per output neuron)
  2. Scaling vectors d and b (VeRA component, per-layer)
  3. Base layer weights (if not frozen)

The shared matrices A and B remain frozen and are never updated.

For Beginners: This is where DVoRA learns! During backpropagation:

  1. Compute gradients for the magnitude (DoRA learning)
  2. Compute gradients for the scaling vectors d and b (VeRA learning)
  3. Shared matrices A and B stay frozen (VeRA efficiency)
  4. Pass gradients back to earlier layers

We only train: magnitude + d + b = very few parameters!
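
A single-step training sketch follows. It is a hypothetical helper: the input tensor and the loss gradient (the gradient of your loss with respect to the adapter's output) come from the rest of your training code.

internal static class DVoRATrainingSketch
{
    public static Tensor<float> TrainStep(
        DVoRAAdapter<float> adapter,
        Tensor<float> input,
        Tensor<float> lossGradient,
        float learningRate)
    {
        // Forward through the base layer plus the DVoRA adaptation;
        // in real code, `output` feeds the loss that produces `lossGradient`.
        Tensor<float> output = adapter.Forward(input);

        // Backward computes gradients for magnitude, d_scale, and b_scale only;
        // the shared A and B matrices stay frozen.
        Tensor<float> inputGradient = adapter.Backward(lossGradient);

        // Apply the accumulated gradients to the few trainable parameters.
        adapter.UpdateParameters(learningRate);

        // Passed to the previous layer in a full network.
        return inputGradient;
    }
}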

CreateLoRALayer(int, double)

Creates a placeholder LoRA layer (unused, since DVoRA implements its own adaptation logic).

protected override LoRALayer<T> CreateLoRALayer(int rank, double alpha)

Parameters

rank int
alpha double

Returns

LoRALayer<T>

Forward(Tensor<T>)

Performs the forward pass through the DVoRA adapter.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Output combining base layer with DVoRA-adapted weights.

Remarks

The DVoRA forward pass combines DoRA and VeRA:

  1. Gets the base layer weights W
  2. Computes the direction: d = W / ||W|| (DoRA)
  3. Applies VeRA to the direction: d' = d + d_scale * (B * A * input) * b_scale (VeRA)
  4. Normalizes the adapted direction: d_norm = d' / ||d'|| (DoRA)
  5. Recomposes the weights: W' = m * d_norm (DoRA)
  6. Computes the output: y = input @ W'^T

For Beginners: This is where DVoRA combines both techniques:

DoRA part:

  • Split weights into magnitude (strength) and direction
  • Keep magnitude separate, work only with direction

VeRA part:

  • Apply shared random matrices + tiny scaling vectors to the direction

Final step:

  • Normalize the adjusted direction
  • Multiply magnitude back in
  • Use these hybrid-adapted weights for prediction

Result: Stability of DoRA + efficiency of VeRA = best of both worlds!

GetParameters()

Gets the current parameters as a vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

Vector containing all DVoRA parameters (magnitude, d, b).

InitializeSharedMatrices(int, int, int, int?)

Initializes the shared frozen random matrices A and B that all DVoRA adapters reuse (VeRA style).

public static void InitializeSharedMatrices(int inputSize, int outputSize, int rank, int? seed = null)

Parameters

inputSize int

The input dimension of the layers that will share the matrices.

outputSize int

The output dimension of the layers that will share the matrices.

rank int

The rank of the shared random matrices.

seed int?

Optional random seed for reproducible initialization.
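
A sketch of adapting several layers that share one set of frozen matrices; the dimensions, rank, and seed are illustrative, and the layers are assumed to have matching input/output sizes.

using System.Collections.Generic;
using System.Linq;
using AiDotNet.LoRA.Adapters;

internal static class SharedMatricesSketch
{
    public static List<DVoRAAdapter<float>> AdaptAll(IEnumerable<ILayer<float>> layers)
    {
        // One initialization serves every adapter: A and B are frozen and shared,
        // so each additional layer only adds its small trainable vectors.
        DVoRAAdapter<float>.InitializeSharedMatrices(
            inputSize: 768, outputSize: 768, rank: 8, seed: 1234);

        return layers
            .Select(layer => new DVoRAAdapter<float>(layer, rank: 8))
            .ToList();
    }
}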

MergeToOriginalLayer()

Merges the DVoRA adaptation into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with DVoRA weights merged into the base layer's weights.

Remarks

This method creates a final layer with the DVoRA adaptations baked in. The merged weights combine DoRA's magnitude-direction decomposition with VeRA's adaptation: W' = m * normalize(d + VeRA_contribution)

For Beginners: This "bakes in" your DVoRA adaptation for deployment.

After training with DVoRA, you probably want to deploy a simpler model without all the DVoRA machinery. This method creates that simpler model by:

  1. Computing the VeRA contribution to direction
  2. Adding it to the base direction
  3. Normalizing the result (DoRA)
  4. Multiplying by magnitude (DoRA)
  5. Creating a new layer with these merged weights

The result is a standard layer that behaves like your DVoRA-adapted model but is faster to run because it doesn't need the DVoRA computation at runtime.

Exceptions

InvalidOperationException

Thrown when the base layer type is not supported for merging.
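
A deployment sketch; the concrete type of the merged layer depends on the base layer.

internal static class DVoRAMergeSketch
{
    public static ILayer<float> BakeForDeployment(DVoRAAdapter<float> trainedAdapter)
    {
        // Returns a plain layer with W' = m * normalize(d + VeRA contribution)
        // folded into its weights; no DVoRA computation remains at inference time.
        return trainedAdapter.MergeToOriginalLayer();
    }
}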

ResetSharedMatrices()

Resets the shared matrices (useful for testing or reinitializing).

public static void ResetSharedMatrices()
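
A typical test-setup sketch (the dimensions and seed are illustrative):

// Start each test from a clean state, then reinitialize with test-sized matrices.
DVoRAAdapter<float>.ResetSharedMatrices();
DVoRAAdapter<float>.InitializeSharedMatrices(inputSize: 64, outputSize: 64, rank: 4, seed: 0);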

ResetState()

Resets the internal state of the DVoRA adapter.

public override void ResetState()

SetParameters(Vector<T>)

Sets the layer parameters from a vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Vector containing all parameters.
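
A checkpoint-style round trip using GetParameters and SetParameters, sketched with hypothetical helper names:

internal static class DVoRACheckpointSketch
{
    // Snapshot the trainable DVoRA parameters (magnitude, d_scale, b_scale)...
    public static Vector<float> Save(DVoRAAdapter<float> adapter) => adapter.GetParameters();

    // ...and restore them into an adapter with the same shape and rank.
    public static void Load(DVoRAAdapter<float> adapter, Vector<float> saved) =>
        adapter.SetParameters(saved);
}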

UpdateParameters(T)

Updates parameters using the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.