Class AdaLoRAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

Adaptive Low-Rank Adaptation (AdaLoRA) adapter that dynamically allocates parameter budgets among weight matrices.

public class AdaLoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T> → AdaLoRAAdapter<T>

Implements
IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Remarks

AdaLoRA improves upon standard LoRA by dynamically adjusting the rank allocation based on importance scores. Instead of using a fixed rank for all weight matrices, AdaLoRA:

  • Starts with a maximum rank and adaptively reduces it during training
  • Computes importance scores for each singular value component
  • Prunes less important components to focus the parameter budget on critical adaptations
  • Allows different layers to have different effective ranks

This leads to more efficient parameter usage compared to fixed-rank LoRA, especially for large models where some layers need more adaptation capacity than others.

For Beginners: AdaLoRA is like smart LoRA that learns which parts of the adaptation matter most.

Think of standard LoRA as giving every layer the same budget (rank=8 everywhere). AdaLoRA is smarter:

  • Some layers get more budget (rank=16) because they're important for the task
  • Other layers get less budget (rank=2) because small changes are enough
  • The model learns this automatically during training

How it works:

  1. Start with a large rank (e.g., maxRank=32)
  2. During training, track how important each component is
  3. Prune components with low importance scores
  4. Focus parameters on what actually helps

Benefits:

  • More parameter-efficient than fixed-rank LoRA
  • Better performance with same parameter budget
  • Automatically finds optimal rank per layer

Reference: Zhang et al., "AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning" (ICLR 2023). https://arxiv.org/abs/2303.10512
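The sketch below shows how such an adapter is typically wired up. The DenseLayer<float> type, its constructor arguments, and the training-loop variables are illustrative placeholders; only the AdaLoRAAdapter<T> members documented on this page are taken from the API.

// Wrap an existing layer; start with a generous rank budget and let pruning trim it.
// DenseLayer<float> and its constructor are hypothetical placeholders for any ILayer<T>.
ILayer<float> baseLayer = new DenseLayer<float>(inputSize: 768, outputSize: 768);

var adapter = new AdaLoRAAdapter<float>(
    baseLayer,
    maxRank: 32,                 // initial budget; pruned down during training
    rankPruningThreshold: 0.05,  // prune components in roughly the bottom 5% of importance
    minRank: 2,                  // never prune below this
    pruningInterval: 100);       // re-evaluate importance every 100 steps

// One schematic training step (input and outputGradient come from your training loop).
Tensor<float> output = adapter.Forward(input);
Tensor<float> inputGradient = adapter.Backward(outputGradient);
Console.WriteLine($"Active rank: {adapter.CurrentRank} / {adapter.MaxRank}");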

Constructors

AdaLoRAAdapter(ILayer<T>, int, double, bool, double, int, int, double)

Initializes a new AdaLoRA adapter with adaptive rank allocation.

public AdaLoRAAdapter(ILayer<T> baseLayer, int maxRank, double alpha = -1, bool freezeBaseLayer = true, double rankPruningThreshold = 0.05, int minRank = 1, int pruningInterval = 100, double importanceScoreEMA = 0.95)

Parameters

baseLayer ILayer<T>

The layer to adapt with AdaLoRA.

maxRank int

The maximum rank for the LoRA decomposition.

alpha double

The LoRA scaling factor (defaults to maxRank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

rankPruningThreshold double

Threshold for pruning based on importance scores (default: 0.05).

minRank int

Minimum rank to maintain after pruning (default: 1).

pruningInterval int

Number of steps between pruning operations (default: 100).

importanceScoreEMA double

EMA factor for importance score updates (default: 0.95).

Remarks

For Beginners: This creates an AdaLoRA adapter with smart rank allocation.

Parameters:

  • baseLayer: The layer you want to adapt (typically Dense or FullyConnected)
  • maxRank: Start with this many components (will prune down during training)
  • alpha: How strongly the adaptation is scaled (defaults to maxRank when left negative)
  • freezeBaseLayer: Lock the original weights (usually true for efficiency)
  • rankPruningThreshold: How unimportant a component must be to get pruned (0.05 = bottom 5%)
  • minRank: Never prune below this rank (safety net)
  • pruningInterval: How often to check for pruning (in training steps)
  • importanceScoreEMA: How smooth importance tracking is (higher = more stable)

The adapter will automatically adjust its rank during training to focus parameters on the most important components.
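As a concrete illustration of the parameters above (the numeric values are examples rather than recommendations, and existingLayer stands in for any ILayer<float> you already have):

var adapter = new AdaLoRAAdapter<float>(
    baseLayer: existingLayer,          // any ILayer<float>, e.g. a dense layer (placeholder)
    maxRank: 16,
    alpha: 16.0,                       // often set equal to maxRank
    freezeBaseLayer: true,             // train only the LoRA matrices
    rankPruningThreshold: 0.05,
    minRank: 1,
    pruningInterval: 100,
    importanceScoreEMA: 0.95);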

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when rank parameters are invalid.

Properties

CurrentRank

Gets the current active rank after pruning.

public int CurrentRank { get; }

Property Value

int

MaxRank

Gets the maximum rank this adapter can use.

public int MaxRank { get; }

Property Value

int
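For example, these two properties can be logged during training to watch pruning take effect (totalSteps, batchInput, and batchGradient are placeholders from your own training loop):

for (int step = 0; step < totalSteps; step++)
{
    adapter.Forward(batchInput);
    adapter.Backward(batchGradient);

    if (step % 500 == 0)
    {
        // CurrentRank starts at MaxRank and shrinks as components are pruned.
        Console.WriteLine($"step {step}: rank {adapter.CurrentRank} of {adapter.MaxRank}");
    }
}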

Methods

Backward(Tensor<T>)

Performs the backward pass and updates importance scores based on gradients.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

During backpropagation, AdaLoRA computes importance scores based on the magnitude of gradients for each singular value component. Components with consistently large gradients are considered more important.

For Beginners: This is where we learn which components are important! As gradients flow back:

  1. We see which components have large gradients (they're actively learning)
  2. We update their importance scores (high gradients = high importance)
  3. We use an exponential moving average to smooth out noise

Components that consistently get small gradients aren't helping much, so they'll get low importance scores and eventually be pruned.
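Conceptually, the update described above is an exponential moving average over per-component gradient magnitudes, roughly like the sketch below (a simplified illustration, not the adapter's actual implementation):

// scores[i] tracks how important component i has been so far.
// ema is the importanceScoreEMA factor (e.g. 0.95): higher = smoother, slower to change.
static double[] UpdateImportance(double[] scores, double[] gradientMagnitudes, double ema)
{
    var updated = new double[scores.Length];
    for (int i = 0; i < scores.Length; i++)
    {
        // Blend the old score with the new evidence from this step's gradients.
        updated[i] = ema * scores[i] + (1 - ema) * gradientMagnitudes[i];
    }
    return updated;
}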

ExpandRank(int)

Expands the rank by adding new components (for cases where more capacity is needed).

public void ExpandRank(int additionalRank)

Parameters

additionalRank int

Number of components to add.

Remarks

This is the opposite of pruning - it adds new components when the model needs more capacity. New components are initialized with low importance and will need to prove their worth. The corresponding matrix elements are reinitialized with small random values so they can learn.

For Beginners: Sometimes the model realizes it needs more capacity. This method adds new components, giving the model more flexibility to learn.

Think of it like hiring more workers when the team is overloaded. The new components start with low importance and have to earn their keep.
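For example, if training has stalled and you suspect the adapter is capacity-limited, you might grow the budget (the plateau check is a placeholder for your own criterion):

if (validationLossPlateaued && adapter.CurrentRank < adapter.MaxRank)
{
    // Add 4 fresh components; they start with low importance and must earn their place.
    adapter.ExpandRank(4);
}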

Forward(Tensor<T>)

Performs the forward pass using only the top-k most important singular values.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Sum of base layer output and AdaLoRA output (using current rank).

Remarks

Unlike standard LoRA which uses all rank components, AdaLoRA only uses the currentRank most important components based on importance scores. This is more efficient and focuses computation on the most impactful adaptations.

For Beginners: This computes the output using only the important components. If we started with rank=32 but pruned to rank=8, we only use the top 8 most important singular values. This makes computation faster and more focused.
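In effect, the adaptation term only multiplies through the top currentRank components of the low-rank factors. The sketch below uses plain arrays rather than the library's tensor types and assumes components are kept sorted by importance; it illustrates the idea rather than the adapter's internal code:

// delta = scale * B[:, :r] * (A[:r, :] * x), using only the top r = currentRank components.
// A has shape (maxRank x inDim), B has shape (outDim x maxRank).
static double[] LowRankDelta(double[][] A, double[][] B, double[] x, int currentRank, double scale)
{
    int outDim = B.Length;
    var delta = new double[outDim];
    for (int r = 0; r < currentRank; r++)        // pruned components (r >= currentRank) are skipped
    {
        double ax = 0;
        for (int j = 0; j < x.Length; j++)
            ax += A[r][j] * x[j];                // A[r, :] · x
        for (int o = 0; o < outDim; o++)
            delta[o] += scale * B[o][r] * ax;    // accumulate B[:, r] * ax
    }
    return delta;
}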

GetImportanceScores()

Gets a copy of the current importance scores.

public Vector<T> GetImportanceScores()

Returns

Vector<T>

A copy of the importance scores, one entry per rank component.
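For example, the scores can be inspected to see which components are carrying the adaptation (indexing into Vector<T> is assumed here):

Vector<float> scores = adapter.GetImportanceScores();
for (int i = 0; i < adapter.CurrentRank; i++)
{
    // Higher scores mean the component contributes more and is less likely to be pruned.
    Console.WriteLine($"component {i}: importance {scores[i]}");
}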

MergeToOriginalLayer()

Merges the AdaLoRA adaptation into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with AdaLoRA weights merged into the base layer's weights.

Remarks

For Dense/FullyConnected layers, this merges the LoRA matrices into the base layer weights. Only the currently active components (based on currentRank) are merged.

For Beginners: This "bakes in" your adaptive LoRA to create a regular layer. Only the components that survived pruning (the important ones) are included in the merge.

This gives you a final layer that:

  • Includes only the useful adaptations
  • Is as fast as a regular layer
  • Can be deployed without AdaLoRA infrastructure
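Once training is complete, the merge step produces a standalone layer, as in this sketch (assuming ILayer<T> exposes the same Forward method the adapter overrides):

// Bake the surviving components into a plain layer for deployment.
ILayer<float> merged = adapter.MergeToOriginalLayer();

// The merged layer no longer needs any AdaLoRA bookkeeping at inference time.
Tensor<float> prediction = merged.Forward(deploymentInput);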