Class DeltaLoRAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

Delta-LoRA adapter that focuses on parameter-efficient delta updates with momentum.

public class DeltaLoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T> → DeltaLoRAAdapter<T>

Implements
ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

Delta-LoRA is a variant of LoRA that explicitly models the change (delta) in parameters rather than the absolute values. This approach can achieve better convergence in certain scenarios by focusing on the parameter update dynamics with momentum-based accumulation.

For Beginners: Think of Delta-LoRA as "change-focused" LoRA.

Regular LoRA learns: "What should the weights be?" Delta-LoRA learns: "How should the weights change?"

This difference matters because:

  1. Changes (deltas) often have simpler patterns than absolute values
  2. Momentum helps smooth out noisy updates
  3. Training can converge faster when the optimal adaptation is a smooth transformation

Key concepts:

  • Delta weights: Accumulated changes to parameters (not the parameters themselves)
  • Delta scaling: Controls how strongly deltas affect the output
  • Momentum: Smooths updates by remembering previous changes

When Delta-LoRA works better than standard LoRA:

  • Tasks requiring smooth, gradual adaptations
  • Fine-tuning where the base model is already close to optimal
  • Scenarios with noisy gradients that benefit from momentum
  • Transfer learning where you want to preserve more of the original model's behavior

Example: If you're adapting a language model to a new domain, Delta-LoRA can make smaller, more conservative changes that preserve the model's general knowledge while adapting to domain-specific patterns.
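
A minimal usage sketch follows. The DenseLayer<T> constructor arguments and sizes are assumptions for illustration; check your version of AiDotNet for the exact signatures.

using AiDotNet.LoRA.Adapters;

// Sketch: wrap an existing layer with Delta-LoRA for domain adaptation.
// DenseLayer's constructor arguments are assumed here for illustration.
ILayer<float> baseLayer = new DenseLayer<float>(inputSize: 768, outputSize: 768);

var adapter = new DeltaLoRAAdapter<float>(
    baseLayer,
    rank: 8,                 // low-rank compression level
    deltaScaling: 0.1,       // conservative delta contribution
    momentumFactor: 0.9,     // heavy smoothing of delta updates
    freezeBaseLayer: true);  // preserve the pre-trained weights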

Constructors

DeltaLoRAAdapter(ILayer<T>, int, double, double, double, bool)

Initializes a new Delta-LoRA adapter wrapping an existing layer.

public DeltaLoRAAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, double deltaScaling = 0.1, double momentumFactor = 0.9, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The layer to adapt with Delta-LoRA.

rank int

The rank of the LoRA decomposition.

alpha double

The LoRA scaling factor (defaults to rank if negative).

deltaScaling double

Scaling factor for delta updates (default: 0.1).

momentumFactor double

Momentum factor for delta accumulation (default: 0.9).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

Remarks

For Beginners: This creates a Delta-LoRA adapter with momentum-based updates.

Parameters:

  • baseLayer: The layer you want to adapt
  • rank: Compression level (lower = fewer parameters)
  • alpha: LoRA strength
  • deltaScaling: How strongly deltas affect output (0.01 to 1.0, default 0.1)
  • momentumFactor: How much to smooth updates (0.0 to 1.0, default 0.9)
  • freezeBaseLayer: Whether to lock the original layer (usually true)

Recommended settings:

  • For stable tasks: deltaScaling=0.1, momentumFactor=0.9
  • For aggressive adaptation: deltaScaling=0.5, momentumFactor=0.5
  • For conservative adaptation: deltaScaling=0.01, momentumFactor=0.95
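
The recommended settings map directly onto constructor arguments; a sketch (baseLayer and rank as in your own setup, alpha left at its default):

// Stable tasks: smooth, steady adaptation.
var stable = new DeltaLoRAAdapter<float>(baseLayer, rank: 8,
    deltaScaling: 0.1, momentumFactor: 0.9);

// Aggressive adaptation: larger deltas, less smoothing.
var aggressive = new DeltaLoRAAdapter<float>(baseLayer, rank: 8,
    deltaScaling: 0.5, momentumFactor: 0.5);

// Conservative adaptation: tiny deltas, heavy smoothing.
var conservative = new DeltaLoRAAdapter<float>(baseLayer, rank: 8,
    deltaScaling: 0.01, momentumFactor: 0.95);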

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when deltaScaling or momentumFactor is outside its valid range.

Properties

DeltaScaling

Gets the scaling factor for delta updates.

public double DeltaScaling { get; }

Property Value

double

MomentumFactor

Gets the momentum factor for delta accumulation.

public double MomentumFactor { get; }

Property Value

double

ParameterCount

Gets the total number of trainable parameters including delta weights.

public override int ParameterCount { get; }

Property Value

int

Remarks

Includes base layer (if not frozen), LoRA layer, and delta weights matrix parameters.

Methods

Backward(Tensor<T>)

Performs the backward pass, computing gradients for delta weights with momentum.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass:

  1. Propagates gradients through the base and LoRA layers (from the base class)
  2. Computes gradients for the delta weights
  3. Updates the velocity using momentum
  4. Accumulates all input gradients

For Beginners: This figures out how to improve all components:

  • The LoRA matrices (via the base class)
  • The delta weights (computed here)
  • Applies momentum to smooth out the delta updates

Momentum helps by:

  • Accelerating convergence when gradients are consistent
  • Dampening oscillations when gradients are noisy
  • Creating smoother, more stable training dynamics
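
A sketch of how Backward fits into one training step. ComputeLossGradient is a hypothetical helper and tensor construction is elided; the momentum velocity is internal adapter state that carries over between iterations.

// One training iteration. Forward must run before Backward so the
// adapter has cached activations for gradient computation.
Tensor<float> output = adapter.Forward(input);
Tensor<float> lossGradient = ComputeLossGradient(output, target); // hypothetical helper
Tensor<float> inputGradient = adapter.Backward(lossGradient);
adapter.UpdateParameters(0.001f);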

Forward(Tensor<T>)

Performs the forward pass: output = base_layer(input) + LoRA(input) + delta_weights @ input * delta_scaling.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Combined output from base layer, LoRA layer, and delta weights.

Remarks

The forward pass computes three components:

  1. Base layer output (original layer behavior)
  2. LoRA output (low-rank adaptation)
  3. Delta output (accumulated parameter changes scaled by deltaScaling)

For Beginners: This combines three sources of information:

  • The original layer's predictions (base)
  • The LoRA adaptation (learned low-rank changes)
  • The accumulated deltas (momentum-smoothed changes)

The delta component is what makes this different from standard LoRA - it explicitly applies the accumulated changes with scaling, allowing for more controlled adaptation.
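
For intuition, here is a self-contained numeric sketch of the three-term combination for a single dense layer. The matrices w (base), b and a (LoRA factors), and d (deltas) stand in for the adapter's internal state; none of these names come from AiDotNet.

// Illustrative only: the adapter performs this combination internally.
static float[] DeltaLoRAForward(
    float[,] w, float[,] b, float[,] a, float[,] d,
    float[] x, float alpha, int rank, float deltaScaling)
{
    float[] baseOut  = MatVec(w, x);             // original layer behavior
    float[] loraOut  = MatVec(b, MatVec(a, x));  // low-rank path
    float[] deltaOut = MatVec(d, x);             // accumulated deltas
    var y = new float[baseOut.Length];
    for (int i = 0; i < y.Length; i++)
        y[i] = baseOut[i]
             + (alpha / rank) * loraOut[i]
             + deltaScaling * deltaOut[i];
    return y;
}

static float[] MatVec(float[,] m, float[] v)
{
    var r = new float[m.GetLength(0)];
    for (int i = 0; i < r.Length; i++)
        for (int j = 0; j < v.Length; j++)
            r[i] += m[i, j] * v[j];
    return r;
}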

GetCurrentDelta()

Gets the current delta weights matrix.

public Matrix<T> GetCurrentDelta()

Returns

Matrix<T>

A copy of the current delta weights.

Remarks

For Beginners: This shows you the accumulated changes that Delta-LoRA has learned. You can use this to:

  • Visualize how the model is adapting
  • Compare different checkpoints during training
  • Understand which connections are changing the most
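
For example, you can track adaptation magnitude across checkpoints via the Frobenius norm of the delta matrix. A sketch (Matrix<T>'s Rows, Columns, and indexer are assumed here for illustration):

Matrix<float> delta = adapter.GetCurrentDelta();

double sumSquares = 0;
for (int i = 0; i < delta.Rows; i++)
    for (int j = 0; j < delta.Columns; j++)
        sumSquares += (double)delta[i, j] * delta[i, j];

Console.WriteLine($"Delta Frobenius norm: {Math.Sqrt(sumSquares):F4}");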

GetParameterGradients()

Gets all parameter gradients including base layer, LoRA layer, and delta weight gradients.

public override Vector<T> GetParameterGradients()

Returns

Vector<T>

Vector containing all gradients.

Remarks

Gradient packing order matches GetParameters: [base layer gradients (if not frozen)], [LoRA gradients], [delta weight gradients].

For Beginners: This packs all the gradients computed during backpropagation so optimizers can update all parameters consistently. Without this override, optimizers would miss the delta weight gradients, and the delta weights would never update correctly.

GetParameters()

Gets the current parameters including base layer, LoRA layer, and delta weights.

public override Vector<T> GetParameters()

Returns

Vector<T>

Vector containing all parameters (base + LoRA + delta weights flattened).

Remarks

Parameters are packed in order: [base layer params (if not frozen)], [LoRA params], [delta weights].
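
Because GetParameters and SetParameters (below) share this packing order, a snapshot/restore round trip is safe. A minimal sketch:

// Checkpoint all trainable state, including the delta weights.
Vector<float> snapshot = adapter.GetParameters();

// ... continue training, evaluate, decide to roll back ...

adapter.SetParameters(snapshot); // length must equal ParameterCount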

MergeToOriginalLayer()

Merges the LoRA adaptation and delta weights into the base layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with LoRA and delta weights merged into the base layer's weights.

Remarks

This method merges three components:

  1. Base layer weights (original)
  2. LoRA weights (low-rank adaptation)
  3. Delta weights (momentum-accumulated changes, scaled by deltaScaling)

For Beginners: This "bakes in" all the adaptations to create a single efficient layer.

The final weights include:

  • Original pre-trained weights
  • LoRA adaptations (B × A matrices)
  • Delta weights (accumulated changes × scaling factor)

After merging:

  • Faster inference (single layer instead of three components)
  • Simpler deployment (no need for special LoRA code)
  • Preserves all the learned adaptations

This is typically done after training is complete and you want to deploy the model.
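
A typical end-of-training sketch:

// Bake base + LoRA + delta weights into one plain layer. Only
// DenseLayer/FullyConnectedLayer base layers are supported (see
// Exceptions below).
ILayer<float> merged = adapter.MergeToOriginalLayer();

// 'merged' now carries all adaptations in its own weights, so inference
// needs no adapter-specific code.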

Exceptions

InvalidOperationException

Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.

ResetState()

Resets the internal state including delta weights, velocity, and cached inputs.

public override void ResetState()

Remarks

For Beginners: This clears all temporary state but preserves learned parameters. Use this when starting to process a completely new, unrelated batch of data.

SetParameters(Vector<T>)

Sets the layer parameters including base layer, LoRA layer, and delta weights.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Vector containing all parameters.

Remarks

Parameters must be packed in order: [base layer params (if not frozen)], [LoRA params], [delta weights].

Exceptions

ArgumentException

Thrown when parameter count doesn't match expected count.

UpdateParameters(T)

Updates parameters using momentum-based delta updates.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.

Remarks

The update process:

  1. Update base and LoRA parameters (via the base class)
  2. Update the velocity with momentum: velocity = momentum * velocity + (1 - momentum) * gradient
  3. Update the delta weights: delta_weights -= learning_rate * velocity

For Beginners: This is where the momentum magic happens!

Without momentum:

  • Updates can be jerky and unstable
  • Training might oscillate around the optimum

With momentum:

  • Velocity builds up in consistent gradient directions (speeds up convergence)
  • Velocity dampens in inconsistent directions (reduces oscillation)
  • Results in smoother, faster convergence

Think of it like pushing a shopping cart: if you keep pushing in the same direction, it picks up speed (momentum). If you change direction, it slows down first.
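
A standalone numeric sketch of this rule (the adapter applies it per delta-weight element internally):

float momentum = 0.9f, learningRate = 0.01f;
float velocity = 0f, deltaWeight = 0f;

// Three steps with a consistent gradient: velocity builds up.
foreach (float gradient in new[] { 1.0f, 1.0f, 1.0f })
{
    velocity = momentum * velocity + (1 - momentum) * gradient;
    deltaWeight -= learningRate * velocity;
    Console.WriteLine($"velocity={velocity:F3}, deltaWeight={deltaWeight:F5}");
}
// velocity grows: 0.100, 0.190, 0.271 — accelerating in a consistent direction.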