Class DeltaLoRAAdapter<T>
Delta-LoRA adapter that focuses on parameter-efficient delta updates with momentum.
public class DeltaLoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T
The numeric type used for calculations, typically float or double.
Inheritance
LayerBase<T> → LoRAAdapterBase<T> → DeltaLoRAAdapter<T>
Implements
ILoRAAdapter<T>, ILayer<T>, IDisposable, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Remarks
Delta-LoRA is a variant of LoRA that explicitly models the change (delta) in parameters rather than the absolute values. This approach can achieve better convergence in certain scenarios by focusing on the parameter update dynamics with momentum-based accumulation.
For Beginners: Think of Delta-LoRA as "change-focused" LoRA.
Regular LoRA learns: "What should the weights be?" Delta-LoRA learns: "How should the weights change?"
This difference matters because:
- Changes (deltas) often have simpler patterns than absolute values
- Momentum helps smooth out noisy updates
- Training can converge faster when the optimal adaptation is a smooth transformation
Key concepts:
- Delta weights: Accumulated changes to parameters (not the parameters themselves)
- Delta scaling: Controls how strongly deltas affect the output
- Momentum: Smooths updates by remembering previous changes
When Delta-LoRA works better than standard LoRA:
- Tasks requiring smooth, gradual adaptations
- Fine-tuning where the base model is already close to optimal
- Scenarios with noisy gradients that benefit from momentum
- Transfer learning where you want to preserve more of the original model's behavior
Example: If you're adapting a language model to a new domain, Delta-LoRA can make smaller, more conservative changes that preserve the model's general knowledge while adapting to domain-specific patterns.
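To make the contrast concrete, here is a minimal numeric sketch (plain C# doubles rather than the library's generic T; the values are illustrative only):

double w = 1.00;                // pre-trained weight (frozen)
double loraTerm = 0.02;         // standard LoRA's learned additive term
double delta = 0.20;            // Delta-LoRA's accumulated change
double deltaScaling = 0.1;      // keeps the applied change conservative
double effective = w + loraTerm + deltaScaling * delta; // 1.00 + 0.02 + 0.02 = 1.04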
Constructors
DeltaLoRAAdapter(ILayer<T>, int, double, double, double, bool)
Initializes a new Delta-LoRA adapter wrapping an existing layer.
public DeltaLoRAAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, double deltaScaling = 0.1, double momentumFactor = 0.9, bool freezeBaseLayer = true)
Parameters
baseLayer ILayer<T>
The layer to adapt with Delta-LoRA.
rank int
The rank of the LoRA decomposition.
alpha double
The LoRA scaling factor (defaults to rank if negative).
deltaScaling double
Scaling factor for delta updates (default: 0.1).
momentumFactor double
Momentum factor for delta accumulation (default: 0.9).
freezeBaseLayer bool
Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a Delta-LoRA adapter with momentum-based updates.
Parameters:
- baseLayer: The layer you want to adapt
- rank: Compression level (lower = fewer parameters)
- alpha: LoRA strength
- deltaScaling: How strongly deltas affect output (0.01 to 1.0, default 0.1)
- momentumFactor: How much to smooth updates (0.0 to 1.0, default 0.9)
- freezeBaseLayer: Whether to lock the original layer (usually true)
Recommended settings:
- For stable tasks: deltaScaling=0.1, momentumFactor=0.9
- For aggressive adaptation: deltaScaling=0.5, momentumFactor=0.5
- For conservative adaptation: deltaScaling=0.01, momentumFactor=0.95
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when deltaScaling or momentumFactor is outside its valid range.
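A minimal construction sketch following the signature above. DenseLayer<float> and its constructor arguments are placeholders for illustration; substitute whatever ILayer<T> implementation you are adapting.

// Hypothetical base layer; any ILayer<float> works here.
ILayer<float> baseLayer = new DenseLayer<float>(inputSize: 768, outputSize: 768);

// Conservative preset from the recommended settings above.
var adapter = new DeltaLoRAAdapter<float>(
    baseLayer,
    rank: 8,
    alpha: 16,                  // LoRA strength; a negative value defaults to rank
    deltaScaling: 0.01,
    momentumFactor: 0.95,
    freezeBaseLayer: true);     // keep the pre-trained weights locked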
Properties
DeltaScaling
Gets the scaling factor for delta updates.
public double DeltaScaling { get; }
Property Value
double
MomentumFactor
Gets the momentum factor for delta accumulation.
public double MomentumFactor { get; }
Property Value
double
ParameterCount
Gets the total number of trainable parameters including delta weights.
public override int ParameterCount { get; }
Property Value
int
Remarks
Includes base layer (if not frozen), LoRA layer, and delta weights matrix parameters.
Methods
Backward(Tensor<T>)
Performs the backward pass, computing gradients for delta weights with momentum.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient Tensor<T>
Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass:
1. Propagates gradients through the base and LoRA layers (via the base class)
2. Computes gradients for the delta weights
3. Updates the velocity using momentum
4. Accumulates all input gradients
For Beginners: This figures out how to improve all components:
- The LoRA matrices (via the base class)
- The delta weights (computed here)
- Momentum is applied to smooth out the delta updates
Momentum helps by:
- Accelerating convergence when gradients are consistent
- Dampening oscillations when gradients are noisy
- Creating smoother, more stable training dynamics
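As a rough sketch of steps 2 and 3: for a linear delta term output += deltaScaling * (delta_weights @ input), the gradient with respect to the delta weights is the scaled outer product of the output gradient and the cached input. The snippet below uses plain arrays; the actual implementation operates on Tensor<T> and may differ in detail.

// Assumed shapes: output gradient g[out], cached input x[in], result [out, in].
static double[,] DeltaWeightGradient(double[] g, double[] x, double deltaScaling)
{
    var grad = new double[g.Length, x.Length];
    for (int i = 0; i < g.Length; i++)
        for (int j = 0; j < x.Length; j++)
            grad[i, j] = deltaScaling * g[i] * x[j]; // scaled outer product
    return grad;
}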
Forward(Tensor<T>)
Performs the forward pass: output = base_layer(input) + LoRA(input) + delta_weights @ input * delta_scaling.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input Tensor<T>
Input tensor.
Returns
- Tensor<T>
Combined output from base layer, LoRA layer, and delta weights.
Remarks
The forward pass computes three components:
1. Base layer output (original layer behavior)
2. LoRA output (low-rank adaptation)
3. Delta output (accumulated parameter changes scaled by deltaScaling)
For Beginners: This combines three sources of information:
- The original layer's predictions (base)
- The LoRA adaptation (learned low-rank changes)
- The accumulated deltas (momentum-smoothed changes)
The delta component is what makes this different from standard LoRA - it explicitly applies the accumulated changes with scaling, allowing for more controlled adaptation.
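A plain-array sketch of the three-term combination (the real method works on Tensor<T> batches; shapes and broadcasting are simplified here):

static double[] CombineForward(
    double[] baseOut,      // base_layer(input)
    double[] loraOut,      // LoRA(input)
    double[,] delta,       // accumulated delta weights [out, in]
    double[] x,            // input vector
    double deltaScaling)
{
    var y = new double[baseOut.Length];
    for (int i = 0; i < y.Length; i++)
    {
        double deltaTerm = 0.0;
        for (int j = 0; j < x.Length; j++)
            deltaTerm += delta[i, j] * x[j];              // delta_weights @ input
        y[i] = baseOut[i] + loraOut[i] + deltaScaling * deltaTerm;
    }
    return y;
}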
GetCurrentDelta()
Gets the current delta weights matrix.
public Matrix<T> GetCurrentDelta()
Returns
- Matrix<T>
A copy of the current delta weights.
Remarks
For Beginners: This shows you the accumulated changes that Delta-LoRA has learned. You can use this to:
- Visualize how the model is adapting
- Compare different checkpoints during training
- Understand which connections are changing the most
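For example, one way to compare checkpoints is to measure how far the deltas have drifted between two snapshots. This sketch assumes Matrix<T> exposes Rows, Columns, and a two-index indexer; adjust to the library's actual members.

Matrix<float> before = adapter.GetCurrentDelta();
// ... train for a few epochs ...
Matrix<float> after = adapter.GetCurrentDelta();

double drift = 0.0;
for (int i = 0; i < after.Rows; i++)          // Rows/Columns/indexer assumed
    for (int j = 0; j < after.Columns; j++)
        drift += Math.Pow(after[i, j] - before[i, j], 2);
Console.WriteLine($"Squared Frobenius drift: {drift}");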
GetParameterGradients()
Gets all parameter gradients including base layer, LoRA layer, and delta weight gradients.
public override Vector<T> GetParameterGradients()
Returns
- Vector<T>
Vector containing all gradients.
Remarks
Gradient packing order matches GetParameters: [base layer gradients (if not frozen)], [LoRA gradients], [delta weight gradients].
For Beginners: This packs all the gradients computed during backpropagation so optimizers can update every parameter consistently. Without this override, optimizers would miss the delta weight gradients, and the delta weights would never be updated correctly.
GetParameters()
Gets the current parameters including base layer, LoRA layer, and delta weights.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing all parameters (base + LoRA + delta weights flattened).
Remarks
Parameters are packed in order: [base layer params (if not frozen)], [LoRA params], [delta weights].
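The packing order can be pictured with plain arrays (the library returns a Vector<T>; this is only a conceptual sketch):

using System.Collections.Generic;

static double[] Pack(double[]? baseParams, double[] loraParams, double[] deltaFlat)
{
    var packed = new List<double>();
    if (baseParams != null) packed.AddRange(baseParams); // omitted when base is frozen
    packed.AddRange(loraParams);
    packed.AddRange(deltaFlat);                          // delta weights, flattened
    return packed.ToArray();
}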
MergeToOriginalLayer()
Merges the LoRA adaptation and delta weights into the base layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoRA and delta weights merged into the base layer's weights.
Remarks
This method merges three components:
1. Base layer weights (original)
2. LoRA weights (low-rank adaptation)
3. Delta weights (momentum-accumulated changes, scaled by deltaScaling)
For Beginners: This "bakes in" all the adaptations to create a single efficient layer.
The final weights include:
- Original pre-trained weights
- LoRA adaptations (B × A matrices)
- Delta weights (accumulated changes × scaling factor)
After merging:
- Faster inference (single layer instead of three components)
- Simpler deployment (no need for special LoRA code)
- Preserves all the learned adaptations
This is typically done after training is complete and you want to deploy the model.
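The merged weights can be sketched element-wise as W_merged = W_base + loraScale * (B @ A) + deltaScaling * Delta, where loraScale stands in for the adapter's alpha-derived LoRA scaling (the exact factor comes from the base class). Plain arrays are used for illustration:

static double[,] MergeWeights(double[,] w, double[,] bTimesA, double[,] delta,
                              double loraScale, double deltaScaling)
{
    int rows = w.GetLength(0), cols = w.GetLength(1);
    var merged = new double[rows, cols];
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            merged[i, j] = w[i, j]
                         + loraScale * bTimesA[i, j]     // low-rank adaptation
                         + deltaScaling * delta[i, j];   // accumulated deltas
    return merged;
}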
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
ResetState()
Resets the internal state including delta weights, velocity, and cached inputs.
public override void ResetState()
Remarks
For Beginners: This clears all temporary state but preserves learned parameters. Use this when starting to process a completely new, unrelated batch of data.
SetParameters(Vector<T>)
Sets the layer parameters including base layer, LoRA layer, and delta weights.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters Vector<T>
Vector containing all parameters.
Remarks
Parameters must be packed in order: [base layer params (if not frozen)], [LoRA params], [delta weights].
Exceptions
- ArgumentException
Thrown when the parameter count doesn't match the expected count.
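Together with GetParameters(), this enables simple snapshot-and-restore checkpointing (adapter here is a DeltaLoRAAdapter<float> as in the constructor example above):

Vector<float> snapshot = adapter.GetParameters();
// ... continue training, possibly in a risky direction ...
adapter.SetParameters(snapshot); // restore; length and packing order must match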
UpdateParameters(T)
Updates parameters using momentum-based delta updates.
public override void UpdateParameters(T learningRate)
Parameters
learningRate T
The learning rate for parameter updates.
Remarks
The update process:
1. Update base and LoRA parameters (via the base class)
2. Update the velocity with momentum: velocity = momentum * velocity + (1 - momentum) * gradient
3. Update the delta weights: delta_weights -= learning_rate * velocity
For Beginners: This is where the momentum magic happens!
Without momentum:
- Updates can be jerky and unstable
- Training might oscillate around the optimum
With momentum:
- Velocity builds up in consistent gradient directions (speeds up convergence)
- Velocity dampens in inconsistent directions (reduces oscillation)
- Results in smoother, faster convergence
Think of it like pushing a shopping cart: if you keep pushing in the same direction, it picks up speed (momentum). If you change direction, it slows down first.
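A numeric sketch of steps 2 and 3 for a single delta weight shows both effects (plain doubles, illustrative values):

double velocity = 0.0, delta = 0.0;
double momentum = 0.9, lr = 0.1;
double[] grads = { 1.0, 1.0, 1.0, -1.0 };   // consistent, then a sign flip
foreach (double g in grads)
{
    velocity = momentum * velocity + (1 - momentum) * g; // step 2: smooth
    delta -= lr * velocity;                              // step 3: apply
    Console.WriteLine($"velocity={velocity:F3} delta={delta:F3}");
}
// Velocity builds while gradients agree (0.100, 0.190, 0.271),
// then shrinks after the flip (0.144) instead of reversing abruptly.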