Class LoRAAdapterBase<T>
Abstract base class for LoRA (Low-Rank Adaptation) adapters that wrap existing layers.
public abstract class LoRAAdapterBase<T> : LayerBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance: LayerBase<T> → LoRAAdapterBase<T>
- Implements: IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Remarks
This base class provides common functionality for all LoRA adapter implementations. It manages the base layer, LoRA layer, and parameter synchronization, while allowing derived classes to implement layer-type-specific logic such as merging and validation.
For Beginners: This is the foundation for all LoRA adapters in the library.
A LoRA adapter wraps an existing layer (like a dense or convolutional layer) and adds a small "correction layer" that learns what adjustments are needed. This base class:
- Manages both the original layer and the LoRA correction layer
- Handles parameter synchronization between them
- Provides common forward/backward pass logic (original + correction)
- Lets specialized adapters handle layer-specific details
This design allows you to create LoRA adapters for any layer type by:
- Inheriting from this base class
- Implementing layer-specific validation
- Implementing how to merge the LoRA weights back into the original layer
The result is parameter-efficient fine-tuning that works across different layer architectures!
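As a minimal sketch of this pattern, the hypothetical adapter below wraps a dense-style layer; the class name is invented for illustration, and it assumes MergeToOriginalLayer() is the only abstract member that needs to be supplied. All other members used come from the API documented on this page.
// Hypothetical adapter for dense-style layers; only the class name is new here.
public class MyDenseLoRAAdapter<T> : LoRAAdapterBase<T>
{
    public MyDenseLoRAAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)
        : base(baseLayer, rank, alpha, freezeBaseLayer)
    {
        // Layer-specific validation would go here, for example checking that
        // baseLayer is a dense/fully connected layer before adapting it.
    }

    // Bake the learned LoRA weights back into the wrapped layer.
    public override ILayer<T> MergeToOriginalLayer()
    {
        // The base class already provides merge logic for Dense/FullyConnected layers.
        return MergeToDenseOrFullyConnected();
    }
}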
Constructors
LoRAAdapterBase(ILayer<T>, int, double, bool)
Initializes a new LoRA adapter base with the specified parameters.
protected LoRAAdapterBase(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoRA.
rank (int): The rank of the LoRA decomposition.
alpha (double): The LoRA scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates the foundation for a LoRA adapter.
Parameters:
- baseLayer: The layer you want to make more efficient to fine-tune
- rank: How much compression (lower = fewer parameters, less flexibility)
- alpha: How strong the LoRA adaptation is
- freezeBaseLayer: Whether to lock the original layer's weights (usually true for efficiency)
Derived classes will call this constructor and then add their own layer-specific logic.
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
Fields
_baseLayer
The base layer being adapted.
protected readonly ILayer<T> _baseLayer
Field Value
- ILayer<T>
_freezeBaseLayer
Whether the base layer's parameters are frozen (not trainable).
protected readonly bool _freezeBaseLayer
Field Value
- bool
_loraLayer
The LoRA layer that provides the adaptation.
protected readonly LoRALayer<T> _loraLayer
Field Value
- LoRALayer<T>
Properties
Alpha
Gets the scaling factor (alpha) for the LoRA adaptation.
public double Alpha { get; }
Property Value
- double
Remarks
Alpha controls how strongly the LoRA adaptation affects the output. The actual LoRA contribution is scaled by alpha / rank. Common practice is to set alpha = rank, which gives a scaling factor of 1.0.
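To see the scaling concretely, a small sketch (the numbers are illustrative, not library defaults):
double alpha = 16;
int rank = 8;

// The LoRA output is scaled by alpha / rank before being added to the base output.
double scaling = alpha / rank;   // 2.0 here; alpha = rank would give exactly 1.0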
BaseLayer
Gets the base layer being adapted with LoRA.
public ILayer<T> BaseLayer { get; }
Property Value
- ILayer<T>
Remarks
This is the original layer that's being enhanced with LoRA adaptations. It may be frozen (non-trainable) during fine-tuning for maximum efficiency.
IsBaseLayerFrozen
Gets whether the base layer's parameters are frozen during training.
public bool IsBaseLayerFrozen { get; }
Property Value
- bool
Remarks
When true, only the LoRA parameters are trained, dramatically reducing memory requirements and training time. This is the typical use case for LoRA.
LoRALayer
Gets the LoRA layer providing the low-rank adaptation.
public LoRALayer<T> LoRALayer { get; }
Property Value
- LoRALayer<T>
Remarks
This layer implements the low-rank decomposition (A and B matrices) that provides the adaptation to the base layer's behavior.
ParameterCount
Gets the total number of trainable parameters.
public override int ParameterCount { get; }
Property Value
- int
Remarks
If the base layer is frozen, this returns only the LoRA parameter count. Otherwise, it returns the sum of base and LoRA parameters.
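To get a feel for the savings, here is a back-of-the-envelope sketch for a hypothetical 1024x1024 dense layer adapted at rank 8 (the dimensions are illustrative only):
int inputSize = 1024, outputSize = 1024, rank = 8;

// Full fine-tuning would train every base weight (biases ignored for simplicity).
int baseWeights = inputSize * outputSize;               // 1,048,576

// LoRA trains only the low-rank factors A (inputSize x rank) and B (rank x outputSize).
int loraWeights = inputSize * rank + rank * outputSize; // 16,384

// With the base layer frozen, ParameterCount reports only the LoRA portion,
// roughly 1.6% of the base weight count in this example.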
Rank
Gets the rank of the low-rank decomposition.
public int Rank { get; }
Property Value
- int
Remarks
The rank determines how many parameters the LoRA adaptation uses. Lower rank = fewer parameters = more efficient but less flexible.
Typical values:
- rank=1-4: Very efficient, minimal parameters
- rank=8: Good balance (default for many applications)
- rank=16-32: More flexibility, more parameters
- rank=64+: Diminishing returns, approaching full fine-tuning
SupportsJitCompilation
Gets whether this LoRA adapter supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if both the base layer and LoRA layer support JIT compilation.
Remarks
LoRA adapters support JIT compilation when both their component layers (the base layer and the LoRA layer) support JIT compilation. The computation graph combines both layers: output = base_layer(input) + lora_layer(input)
For Beginners: JIT compilation makes layers run faster by converting their math operations into optimized native code.
A LoRA adapter can be JIT compiled when:
- The base layer supports JIT compilation (has its weights initialized)
- The LoRA layer supports JIT compilation (has its A and B matrices initialized)
The JIT-compiled version computes both the base layer's output and the LoRA adaptation in parallel, then adds them together. This can provide significant speedup (5-10x).
Alternatively, you can merge the LoRA weights into the base layer using MergeToOriginalLayer() for an even simpler and potentially faster deployment.
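A minimal deployment-time sketch using only the members documented on this page; adapter stands for an already-initialized LoRA adapter, and the hand-off to the JIT pipeline itself is omitted:
if (adapter.SupportsJitCompilation)
{
    // Export the combined base + LoRA computation graph for the JIT compiler.
    var inputNodes = new List<ComputationNode<float>>();
    ComputationNode<float> outputNode = adapter.ExportComputationGraph(inputNodes);
    // ... hand outputNode and inputNodes to the JIT compilation pipeline ...
}
else
{
    // Fall back to baking the adaptation into a single ordinary layer.
    ILayer<float> merged = adapter.MergeToOriginalLayer();
}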
SupportsTraining
Gets whether this adapter supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Methods
Backward(Tensor<T>)
Performs the backward pass through both layers.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradientTensor<T>Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass propagates gradients through both the LoRA layer and (if not frozen) the base layer. The input gradients from both paths are summed.
For Beginners: During learning, this figures out how to improve both layers:
- Always updates the LoRA layer (that's what we're training)
- Only updates the base layer if it's not frozen
- Combines the gradients from both paths to tell earlier layers how to improve
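A single training step sketched with the documented Forward, Backward, and UpdateParameters members; input, outputGradient, and learningRate are assumed to come from the surrounding training loop and loss function:
// Forward: original behaviour plus the learned LoRA correction.
Tensor<float> prediction = adapter.Forward(input);

// Backward: outputGradient is the loss gradient with respect to the prediction.
Tensor<float> inputGradient = adapter.Backward(outputGradient);

// Apply the gradients; with a frozen base layer, only the LoRA parameters change.
adapter.UpdateParameters(learningRate);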
CreateLoRALayer(int, double)
Creates the LoRA layer for this adapter.
protected virtual LoRALayer<T> CreateLoRALayer(int rank, double alpha)
Parameters
rank (int): The rank of the LoRA decomposition.
alpha (double): The LoRA scaling factor.
Returns
- LoRALayer<T>
A LoRA layer configured for this adapter.
Remarks
This method can be overridden by derived classes to customize LoRA layer creation. By default, it creates a standard LoRA layer with the adapter's input and output dimensions.
For Beginners: This creates the "correction layer" that learns adaptations.
Different adapter types might need different LoRA layer configurations:
- Dense layers: Standard 1D LoRA
- Convolutional layers: LoRA with spatial dimensions
- Attention layers: LoRA for query/key/value projections
This method lets each adapter type create the right kind of LoRA layer.
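As a hedged sketch, a derived adapter could override this hook as shown below; the customization itself is only indicated in comments:
protected override LoRALayer<T> CreateLoRALayer(int rank, double alpha)
{
    // A derived adapter might clamp the rank, adjust alpha, or size the
    // LoRALayer<T> for its specific weight shapes before construction.
    // Here the default behaviour is kept as a placeholder.
    return base.CreateLoRALayer(rank, alpha);
}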
CreateMergedLayerWithClone(Vector<T>)
Helper method to create a merged layer by cloning the base layer and updating its parameters.
protected ILayer<T> CreateMergedLayerWithClone(Vector<T> mergedParams)
Parameters
mergedParamsVector<T>The merged parameters to set on the cloned layer.
Returns
- ILayer<T>
A cloned layer with merged parameters and preserved activation function.
Remarks
This helper method preserves the activation function and other settings from the base layer by using Clone() instead of creating a new layer. This ensures the merged layer behaves identically to the original adapted layer.
For Beginners: This is a utility method that derived classes can use to create a properly merged layer without duplicating the Clone() pattern everywhere.
Exceptions
- InvalidOperationException
Thrown when base layer is not DenseLayer or FullyConnectedLayer.
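A hedged sketch of how a derived adapter's merge implementation might use this helper; BuildMergedParameters is a hypothetical, layer-specific method and not part of this API:
public override ILayer<T> MergeToOriginalLayer()
{
    // Hypothetical helper: adds the scaled LoRA contribution to a copy of
    // the base layer's parameter vector in whatever layout the layer uses.
    Vector<T> mergedParams = BuildMergedParameters();

    // Clone the base layer and overwrite its parameters so the activation
    // function and other settings are preserved.
    return CreateMergedLayerWithClone(mergedParams);
}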
ExportComputationGraph(List<ComputationNode<T>>)
Exports the computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodesList<ComputationNode<T>>List to which input nodes will be added.
Returns
- ComputationNode<T>
The output computation node representing the combined base + LoRA transformation.
Remarks
The computation graph implements: output = base_layer(input) + lora_layer(input)
This mirrors the Forward() method logic where:
- The input is passed through the base layer
- The same input is passed through the LoRA layer
- The two outputs are added element-wise
For Beginners: This exports the LoRA adapter's computation as a graph of operations that can be optimized and compiled to fast native code.
The graph represents:
- Input → base layer computation → base output
- Input → LoRA layer computation → LoRA output
- base output + LoRA output → final output
The JIT compiler can then fuse operations, apply SIMD vectorization, and perform other optimizations to make inference faster.
Exceptions
- ArgumentNullException
Thrown when inputNodes is null.
- InvalidOperationException
Thrown when component layers are not initialized.
Forward(Tensor<T>)
Performs the forward pass through both base and LoRA layers.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
inputTensor<T>Input tensor.
Returns
- Tensor<T>
Sum of base layer output and LoRA output.
Remarks
The forward pass computes: output = base_layer(input) + lora_layer(input)
For Beginners: This runs the input through both the original layer and the LoRA correction layer, then adds their outputs together. The result is the original behavior plus the learned adaptation.
GetParameters()
Gets the current parameters as a vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing parameters (LoRA only if base is frozen, otherwise both).
MergeToDenseOrFullyConnected()
Merges LoRA weights into the base layer for DenseLayer or FullyConnectedLayer.
protected ILayer<T> MergeToDenseOrFullyConnected()
Returns
- ILayer<T>
A new layer with merged weights.
Remarks
This helper method implements the standard LoRA merge logic for Dense and FullyConnected layers:
1. Get the LoRA weight contribution from the low-rank matrices
2. Add it to the base layer weights element-wise
3. Preserve biases unchanged
4. Create a new layer with the merged parameters
For Beginners: This combines the base weights with the LoRA adaptation, creating a single layer that doesn't need the adapter anymore. Useful for deployment!
Exceptions
- InvalidOperationException
Thrown when base layer is not DenseLayer or FullyConnectedLayer.
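Conceptually, for a dense weight matrix W with LoRA factors A and B, the merge computes W += (alpha / rank) * (B x A). The sketch below uses plain 2D arrays; the shapes follow the usual LoRA-paper convention, and the library's internal storage order may differ:
static void MergeLoRAIntoWeights(double[,] W, double[,] B, double[,] A, double alpha, int rank)
{
    int outputSize = W.GetLength(0);   // W: [outputSize, inputSize]
    int inputSize = W.GetLength(1);    // B: [outputSize, rank], A: [rank, inputSize]

    for (int o = 0; o < outputSize; o++)
    {
        for (int i = 0; i < inputSize; i++)
        {
            double delta = 0;
            for (int r = 0; r < rank; r++)
                delta += B[o, r] * A[r, i];

            // Scale by alpha/rank and add to the frozen base weight; biases stay unchanged.
            W[o, i] += (alpha / rank) * delta;
        }
    }
}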
MergeToOriginalLayer()
Merges the LoRA adaptation into the base layer and returns the merged layer.
public abstract ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoRA weights merged into the base layer's weights.
Remarks
This method must be implemented by derived classes to handle layer-type-specific merging logic. Each type of adapter (Dense, Convolutional, etc.) needs to know how to combine its LoRA weights with the base layer's weights.
For Beginners: This "bakes in" your LoRA adaptation to create a regular layer. After training with LoRA, you can merge the adaptation into the original weights for:
- Faster inference (no need to compute LoRA separately)
- Simpler deployment (single layer instead of two)
- Compatibility with systems that don't support LoRA
Each layer type implements this differently because they have different internal structures.
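A short deployment sketch; adapter stands for an already-trained LoRA adapter:
// Bake the adaptation into a single ordinary layer.
ILayer<float> deployableLayer = adapter.MergeToOriginalLayer();

// deployableLayer now behaves like the adapted layer with the LoRA correction
// folded into its weights and can replace the adapter in the deployed model.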
ResetState()
Resets the internal state of both the base layer and LoRA layer.
public override void ResetState()
Remarks
For Beginners: This clears the memory of both the base layer and the LoRA layer. It's useful when starting to process a completely new, unrelated batch of data.
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parametersVector<T>Vector containing parameters.
UpdateParameters(T)
Updates parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRateTThe learning rate for parameter updates.
UpdateParametersFromLayers()
Updates the parameter vector from the current base and LoRA layer states.
protected virtual void UpdateParametersFromLayers()
Remarks
This helper method synchronizes the adapter's parameter vector with the current state of the base and LoRA layers after updates. It packs parameters in the standard order: base layer parameters (if not frozen) followed by LoRA parameters.
For Beginners: This ensures the adapter's parameter vector stays in sync with its component layers. Called after parameter updates.