Table of Contents

Class HRAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

HRA (Hybrid Rank Adaptation) adapter that combines low-rank and full-rank updates for optimal parameter efficiency.

public class HRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T> → HRAAdapter<T>

Implements
IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Remarks

HRA addresses a key limitation of standard LoRA: while low-rank updates are efficient, some parameters benefit from full-rank updates. HRA uses a hybrid approach:

  • Dense low-rank updates for most parameters (efficient, like LoRA)
  • Sparse full-rank updates for critical parameters (precise, targeted)
  • Importance-based allocation between the two components

The forward computation is: output = base_layer(input) + low_rank(input) + sparse_full_rank(input), where the hybrid allocation provides the best of both worlds.
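The three-term sum above can be sketched numerically. This is an illustrative NumPy sketch of the math only, not the library's implementation; all shapes and values are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 64, 64, 8
W = rng.normal(size=(d_out, d_in))        # frozen base-layer weights
A = rng.normal(size=(rank, d_in)) * 0.01  # low-rank factor A
B = np.zeros((d_out, rank))               # low-rank factor B (zero-initialized, LoRA-style)
S = np.zeros((d_out, d_in))               # sparse full-rank update matrix
S[3, 5] = 0.2                             # a single "critical" parameter gets a full-rank update
scaling = 8 / rank                        # alpha / rank

x = rng.normal(size=(d_in,))

# output = base_layer(input) + low_rank(input) + sparse_full_rank(input)
out = W @ x + scaling * (B @ (A @ x)) + S @ x
```

Because B starts at zero, the low-rank path initially contributes nothing and the adapter reproduces the base layer plus the sparse updates, just as in standard LoRA initialization.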

For Beginners: HRA is like having two tools instead of one:

Standard LoRA problem:

  • Uses only low-rank updates (compressed, efficient)
  • Some parameters need precise full-rank updates
  • Full fine-tuning is too expensive
  • Need something in between

HRA solution:

  • Most parameters use low-rank updates (efficient, covers 95% of needs)
  • Critical parameters get full-rank updates (precise, covers remaining 5%)
  • Automatically learns which parameters are critical
  • Best quality with minimal parameter overhead

Analogy: Think of home renovation:

  • Low-rank updates: Paint the walls (cheap, covers large area, good enough)
  • Full-rank updates: Replace key structural beams (expensive, small area, critical)
  • HRA: Do both where appropriate for best results

How it works:

  1. Start with LoRA-style low-rank matrices (B * A)
  2. Add sparse full-rank updates for most important parameters
  3. Track importance scores during training
  4. Allocate parameter budget optimally between low-rank and sparse full-rank
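Step 4 above, allocating the sparse budget to the most important parameters, amounts to a top-k selection over an importance matrix. A minimal sketch, with a made-up matrix shape and random scores standing in for learned importance:

```python
import numpy as np

rng = np.random.default_rng(1)
importance = rng.random((10, 10))  # one importance score per base-layer parameter
sparsity_ratio = 0.05              # 5% of parameters get full-rank updates
budget = int(importance.size * sparsity_ratio)  # here: 5 parameters

# take the flattened indices of the `budget` highest-importance scores
flat_idx = np.argsort(importance, axis=None)[-budget:]
rows, cols = np.unravel_index(flat_idx, importance.shape)
active = set(zip(rows.tolist(), cols.tolist()))  # (row, col) positions that get sparse updates
```

Every selected position scores at least as high as every unselected one, so the sparse budget always lands on the currently most important parameters.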

Benefits:

  • Better quality than pure LoRA (full-rank updates where needed)
  • More efficient than full fine-tuning (most updates are low-rank)
  • Adaptive: learns which parameters need full-rank updates
  • Flexible: adjustable sparsity budget for full-rank component

Use cases:

  • Tasks where LoRA quality is not quite sufficient
  • Fine-tuning with specific architectural bottlenecks
  • When you have slightly more parameter budget than LoRA but much less than full fine-tuning
  • Domains where certain parameters are known to be critical

Example parameter comparison for a 1000x1000 layer:

  • Full fine-tuning: 1,000,000 parameters
  • Standard LoRA (rank=8): 16,000 parameters (98.4% reduction)
  • HRA (rank=8, 1% sparsity): 26,000 parameters (97.4% reduction, better quality)
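The counts above follow from simple arithmetic, sketched here to make the comparison reproducible:

```python
d = 1000                     # 1000x1000 layer
full = d * d                 # full fine-tuning: every weight is trainable
rank = 8
lora = rank * d + d * rank   # A (rank x d) plus B (d x rank)
sparse = int(0.01 * full)    # 1% sparse full-rank budget
hra = lora + sparse          # low-rank + sparse full-rank

def reduction(p):
    """Percent reduction relative to full fine-tuning."""
    return 100 * (1 - p / full)
```

This reproduces the figures in the list: 16,000 LoRA parameters (98.4% reduction) and 26,000 HRA parameters (97.4% reduction).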

Reference: Based on "Hybrid Rank Adaptation" research combining low-rank and sparse full-rank approaches

Constructors

HRAAdapter(ILayer<T>, int, double, double, bool, int, double, bool)

Initializes a new HRA adapter with hybrid low-rank and sparse full-rank updates.

public HRAAdapter(ILayer<T> baseLayer, int rank, double sparsityRatio = 0.01, double alpha = -1, bool freezeBaseLayer = true, int importanceUpdateInterval = 100, double importanceEMA = 0.95, bool useDynamicAllocation = true)

Parameters

baseLayer ILayer<T>

The layer to adapt with HRA.

rank int

The rank of the low-rank decomposition.

sparsityRatio double

Fraction of parameters for sparse full-rank updates (0.0 to 1.0, default: 0.01).

alpha double

The LoRA scaling factor (defaults to rank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

importanceUpdateInterval int

Steps between importance recalculation (default: 100).

importanceEMA double

EMA factor for importance smoothing (default: 0.95).

useDynamicAllocation bool

Whether to dynamically reallocate sparse parameters (default: true).

Remarks

For Beginners: This creates an HRA adapter that combines two update strategies.

Parameters:

  • baseLayer: The layer you want to adapt
  • rank: Size of the low-rank component (typical: 8-16)
  • sparsityRatio: Budget for full-rank updates (0.01 = 1% of parameters get special treatment)
  • alpha: Strength of the low-rank adaptation
  • freezeBaseLayer: Lock original weights (usually true)
  • importanceUpdateInterval: How often to reassess which parameters are important
  • importanceEMA: How stable importance scores are (higher = more stable)
  • useDynamicAllocation: Automatically move sparse budget to most important parameters

Example: new HRAAdapter<T>(layer, rank: 8, sparsityRatio: 0.01). This gives you LoRA-style updates for most parameters, plus precise updates for the top 1%.

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when parameters are invalid.

Properties

ActiveSparseParams

Gets the number of active sparse full-rank parameters.

public int ActiveSparseParams { get; }

Property Value

int

MaxSparseParams

Gets the maximum allowed sparse parameters.

public int MaxSparseParams { get; }

Property Value

int

ParameterCount

Gets the total number of trainable parameters (low-rank + sparse full-rank).

public override int ParameterCount { get; }

Property Value

int

SparsityRatio

Gets the current sparsity ratio.

public double SparsityRatio { get; }

Property Value

double

Methods

Backward(Tensor<T>)

Performs the backward pass through the HRA adapter.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass:

  1. Computes gradients for the low-rank LoRA matrices (A and B)
  2. Computes gradients for the sparse full-rank parameters
  3. Updates importance scores based on gradient magnitudes

For Beginners: This is where HRA learns which parameters are important! During backpropagation:

  1. Compute gradients for the low-rank component (standard LoRA)
  2. Compute gradients for the sparse full-rank parameters
  3. Track which parameters have large gradients (they're important!)
  4. Periodically reassign the sparse budget to the most important parameters

This adaptive approach ensures the sparse full-rank budget is always allocated to the parameters that need it most.
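Gradient-magnitude importance is typically smoothed with an exponential moving average, which is what the importanceEMA constructor parameter controls. The exact update rule inside the adapter is not documented here, so treat this as an illustrative sketch of the standard EMA scheme:

```python
import numpy as np

rng = np.random.default_rng(2)
importance = np.zeros((4, 4))
ema = 0.95  # importanceEMA: higher = more stable scores

# one smoothing step per training step, driven by gradient magnitude
for step in range(200):
    grad = rng.normal(size=(4, 4))
    grad[0, 0] *= 10.0  # parameter (0, 0) consistently receives large gradients
    importance = ema * importance + (1 - ema) * np.abs(grad)
```

After enough steps the smoothed scores reflect sustained gradient magnitude, so the consistently high-gradient parameter ranks highest and becomes a candidate for a sparse full-rank slot.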

Forward(Tensor<T>)

Performs the forward pass through the HRA adapter.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Sum of base layer output, low-rank LoRA output, and sparse full-rank output.

Remarks

The HRA forward pass computes three components:

  1. Base layer output (original behavior)
  2. Low-rank LoRA output: scaling * B * A * input
  3. Sparse full-rank output: sparse_scaling * S * input (where S is sparse)

For Beginners: This processes input through three paths and adds them:

  1. Original layer (base behavior)
  2. LoRA low-rank path (efficient updates for most parameters)
  3. Sparse full-rank path (precise updates for VIP parameters)

Think of it as a team effort:

  • Base layer: The foundation
  • Low-rank: The general workforce (handles most of the load efficiently)
  • Sparse full-rank: The specialists (handle critical details precisely)

GetParameterImportance()

Gets a copy of the current parameter importance matrix.

public Matrix<T> GetParameterImportance()

Returns

Matrix<T>

Matrix of importance scores for each parameter.

Remarks

For Beginners: This lets you see which parameters the model considers important. High values indicate parameters that are candidates for sparse full-rank updates. Useful for understanding and debugging the hybrid allocation strategy.

GetParameters()

Gets all parameters including base, LoRA, and sparse full-rank parameters.

public override Vector<T> GetParameters()

Returns

Vector<T>

Vector containing all trainable parameters.

GetSparseUpdates()

Gets the positions and values of current sparse full-rank updates.

public Dictionary<(int row, int col), T> GetSparseUpdates()

Returns

Dictionary<(int row, int col), T>

Dictionary mapping (row, col) positions to update values.

Remarks

For Beginners: This shows you exactly which parameters are receiving the VIP treatment (full-rank updates). You can inspect this to understand where the model is allocating its sparse parameter budget.

MergeToOriginalLayer()

Merges the HRA adaptation into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with both low-rank and sparse full-rank updates merged.

Remarks

This merges both the low-rank LoRA component and the sparse full-rank component into the base layer's weights, creating a single efficient layer.

For Beginners: This "bakes in" both types of updates for deployment.

The merged layer includes:

  • Original base layer weights
  • Low-rank LoRA updates (for general improvements)
  • Sparse full-rank updates (for critical parameters)

Result: A single layer with all adaptations built-in, ready for fast inference.
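The merge can be sketched as plain matrix arithmetic. An illustrative NumPy sketch, not the library's code, assuming the same scaling = alpha / rank convention used elsewhere on this page:

```python
import numpy as np

rng = np.random.default_rng(3)
d, rank, alpha = 16, 4, 4
W = rng.normal(size=(d, d))       # original base-layer weights
A = rng.normal(size=(rank, d))    # trained low-rank factor A
B = rng.normal(size=(d, rank))    # trained low-rank factor B
S = np.zeros((d, d))
S[2, 7] = 0.5                     # a trained sparse full-rank update
scaling = alpha / rank

# merged weights: base + scaled low-rank product + sparse update
W_merged = W + scaling * (B @ A) + S

x = rng.normal(size=(d,))
# the three-path adapter output the merged layer must reproduce
adapter_out = W @ x + scaling * (B @ (A @ x)) + S @ x
```

A single matrix multiply with W_merged reproduces the three-path adapter output, which is why the merged layer is faster at inference time.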

SetParameters(Vector<T>)

Sets all parameters including base, LoRA, and sparse full-rank parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Vector containing all parameters to set.

UpdateParameters(T)

Updates parameters using the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.