Table of Contents

Class HRAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

HRA (Hybrid Rank Adaptation) adapter that combines low-rank and full-rank updates for optimal parameter efficiency.

public class HRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T> → HRAAdapter<T>

Implements
IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Remarks

HRA addresses a key limitation of standard LoRA: while low-rank updates are efficient, some parameters benefit from full-rank updates. HRA uses a hybrid approach:

  • Dense low-rank updates for most parameters (efficient, like LoRA)
  • Sparse full-rank updates for critical parameters (precise, targeted)
  • Importance-based allocation between the two components

The forward computation is: output = base_layer(input) + low_rank(input) + sparse_full_rank(input), where the hybrid allocation provides the best of both worlds.
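The three-term sum above can be sketched numerically. This is an illustrative NumPy sketch of the math only, not the library's implementation; all shapes and values are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 64, 64, 8
W = rng.normal(size=(d_out, d_in))        # frozen base-layer weights
A = rng.normal(size=(rank, d_in)) * 0.01  # low-rank factor A
B = np.zeros((d_out, rank))               # low-rank factor B (zero-initialized, LoRA-style)
S = np.zeros((d_out, d_in))               # sparse full-rank update matrix
S[3, 5] = 0.2                             # a single "critical" parameter gets a full-rank update
scaling = 8 / rank                        # alpha / rank

x = rng.normal(size=(d_in,))

# output = base_layer(input) + low_rank(input) + sparse_full_rank(input)
out = W @ x + scaling * (B @ (A @ x)) + S @ x
```

Because B starts at zero, the low-rank path initially contributes nothing and the adapter reproduces the base layer plus the sparse updates, just as in standard LoRA initialization.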

For Beginners: HRA is like having two tools instead of one:

Standard LoRA problem:

  • Uses only low-rank updates (compressed, efficient)
  • Some parameters need precise full-rank updates
  • Full fine-tuning is too expensive
  • Need something in between

HRA solution:

  • Most parameters use low-rank updates (efficient, covers 95% of needs)
  • Critical parameters get full-rank updates (precise, covers remaining 5%)
  • Automatically learns which parameters are critical
  • Best quality with minimal parameter overhead

Analogy: Think of home renovation:

  • Low-rank updates: Paint the walls (cheap, covers large area, good enough)
  • Full-rank updates: Replace key structural beams (expensive, small area, critical)
  • HRA: Do both where appropriate for best results

How it works:

  1. Start with LoRA-style low-rank matrices (B * A)
  2. Add sparse full-rank updates for most important parameters
  3. Track importance scores during training
  4. Allocate parameter budget optimally between low-rank and sparse full-rank
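Step 4 above, allocating the sparse budget to the most important parameters, amounts to a top-k selection over an importance matrix. A minimal sketch, with a made-up matrix shape and random scores standing in for learned importance:

```python
import numpy as np

rng = np.random.default_rng(1)
importance = rng.random((10, 10))  # one importance score per base-layer parameter
sparsity_ratio = 0.05              # 5% of parameters get full-rank updates
budget = int(importance.size * sparsity_ratio)  # here: 5 parameters

# take the flattened indices of the `budget` highest-importance scores
flat_idx = np.argsort(importance, axis=None)[-budget:]
rows, cols = np.unravel_index(flat_idx, importance.shape)
active = set(zip(rows.tolist(), cols.tolist()))  # (row, col) positions that get sparse updates
```

Every selected position scores at least as high as every unselected one, so the sparse budget always lands on the currently most important parameters.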

Benefits:

  • Better quality than pure LoRA (full-rank updates where needed)
  • More efficient than full fine-tuning (most updates are low-rank)
  • Adaptive: learns which parameters need full-rank updates
  • Flexible: adjustable sparsity budget for full-rank component

Use cases:

  • Tasks where LoRA quality is not quite sufficient
  • Fine-tuning with specific architectural bottlenecks
  • When you have slightly more parameter budget than LoRA but much less than full fine-tuning
  • Domains where certain parameters are known to be critical

Example parameter comparison for a 1000x1000 layer:

  • Full fine-tuning: 1,000,000 parameters
  • Standard LoRA (rank=8): 16,000 parameters (98.4% reduction)
  • HRA (rank=8, 1% sparsity): 26,000 parameters (97.4% reduction, better quality)
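The counts above follow from simple arithmetic, sketched here to make the comparison reproducible:

```python
d = 1000                     # 1000x1000 layer
full = d * d                 # full fine-tuning: every weight is trainable
rank = 8
lora = rank * d + d * rank   # A (rank x d) plus B (d x rank)
sparse = int(0.01 * full)    # 1% sparse full-rank budget
hra = lora + sparse          # low-rank + sparse full-rank

def reduction(p):
    """Percent reduction relative to full fine-tuning."""
    return 100 * (1 - p / full)
```

This reproduces the figures in the list: 16,000 LoRA parameters (98.4% reduction) and 26,000 HRA parameters (97.4% reduction).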

Reference: Based on "Hybrid Rank Adaptation" research combining low-rank and sparse full-rank approaches

Constructors

HRAAdapter(ILayer<T>, int, double, double, bool, int, double, bool)

Initializes a new HRA adapter with hybrid low-rank and sparse full-rank updates.

public HRAAdapter(ILayer<T> baseLayer, int rank, double sparsityRatio = 0.01, double alpha = -1, bool freezeBaseLayer = true, int importanceUpdateInterval = 100, double importanceEMA = 0.95, bool useDynamicAllocation = true)

Parameters

baseLayer ILayer<T>

The layer to adapt with HRA.

rank int

The rank of the low-rank decomposition.

sparsityRatio double

Fraction of parameters for sparse full-rank updates (0.0 to 1.0, default: 0.01).

alpha double

The LoRA scaling factor (defaults to rank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

importanceUpdateInterval int

Steps between importance recalculation (default: 100).

importanceEMA double

EMA factor for importance smoothing (default: 0.95).

useDynamicAllocation bool

Whether to dynamically reallocate sparse parameters (default: true).

Remarks

For Beginners: This creates an HRA adapter that combines two update strategies.

Parameters:

  • baseLayer: The layer you want to adapt
  • rank: Size of the low-rank component (typical: 8-16)
  • sparsityRatio: Budget for full-rank updates (0.01 = 1% of parameters get special treatment)
  • alpha: Strength of the low-rank adaptation
  • freezeBaseLayer: Lock original weights (usually true)
  • importanceUpdateInterval: How often to reassess which parameters are important
  • importanceEMA: How stable importance scores are (higher = more stable)
  • useDynamicAllocation: Automatically move sparse budget to most important parameters

Example: new HRAAdapter<T>(layer, rank: 8, sparsityRatio: 0.01). This gives you LoRA-style updates for most parameters, plus precise updates for the top 1%.

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when parameters are invalid.

Properties

ActiveSparseParams

Gets the number of active sparse full-rank parameters.

public int ActiveSparseParams { get; }

Property Value

int

MaxSparseParams

Gets the maximum allowed sparse parameters.

public int MaxSparseParams { get; }

Property Value

int

ParameterCount

Gets the total number of trainable parameters (low-rank + sparse full-rank).

public override int ParameterCount { get; }

Property Value

int

SparsityRatio

Gets the current sparsity ratio.

public double SparsityRatio { get; }

Property Value

double

Methods

Backward(Tensor<T>)

Performs the backward pass through the HRA adapter.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass:

  1. Computes gradients for the low-rank LoRA matrices (A and B)
  2. Computes gradients for the sparse full-rank parameters
  3. Updates importance scores based on gradient magnitudes

For Beginners: This is where HRA learns which parameters are important! During backpropagation:

  1. Compute gradients for the low-rank component (standard LoRA)
  2. Compute gradients for the sparse full-rank parameters
  3. Track which parameters have large gradients (they're important!)
  4. Periodically reassign the sparse budget to the most important parameters

This adaptive approach ensures the sparse full-rank budget is always allocated to the parameters that need it most.
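Gradient-magnitude importance is typically smoothed with an exponential moving average, which is what the importanceEMA constructor parameter controls. The exact update rule inside the adapter is not documented here, so treat this as an illustrative sketch of the standard EMA scheme:

```python
import numpy as np

rng = np.random.default_rng(2)
importance = np.zeros((4, 4))
ema = 0.95  # importanceEMA: higher = more stable scores

# one smoothing step per training step, driven by gradient magnitude
for step in range(200):
    grad = rng.normal(size=(4, 4))
    grad[0, 0] *= 10.0  # parameter (0, 0) consistently receives large gradients
    importance = ema * importance + (1 - ema) * np.abs(grad)
```

After enough steps the smoothed scores reflect sustained gradient magnitude, so the consistently high-gradient parameter ranks highest and becomes a candidate for a sparse full-rank slot.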

Forward(Tensor<T>)

Performs the forward pass through the HRA adapter.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Sum of base layer output, low-rank LoRA output, and sparse full-rank output.

Remarks

The HRA forward pass computes three components:

  1. Base layer output (original behavior)
  2. Low-rank LoRA output: scaling * B * A * input
  3. Sparse full-rank output: sparse_scaling * S * input (where S is sparse)

For Beginners: This processes input through three paths and adds them:

  1. Original layer (base behavior)
  2. LoRA low-rank path (efficient updates for most parameters)
  3. Sparse full-rank path (precise updates for VIP parameters)

Think of it as a team effort:

  • Base layer: The foundation
  • Low-rank: The general workforce (handles most of the load efficiently)
  • Sparse full-rank: The specialists (handle critical details precisely)

GetParameterImportance()

Gets a copy of the current parameter importance matrix.

public Matrix<T> GetParameterImportance()

Returns

Matrix<T>

Matrix of importance scores for each parameter.

Remarks

For Beginners: This lets you see which parameters the model considers important. High values indicate parameters that are candidates for sparse full-rank updates. Useful for understanding and debugging the hybrid allocation strategy.

GetParameters()

Gets all parameters including base, LoRA, and sparse full-rank parameters.

public override Vector<T> GetParameters()

Returns

Vector<T>

Vector containing all trainable parameters.

GetSparseUpdates()

Gets the positions and values of current sparse full-rank updates.

public Dictionary<(int row, int col), T> GetSparseUpdates()

Returns

Dictionary<(int row, int col), T>

Dictionary mapping (row, col) positions to update values.

Remarks

For Beginners: This shows you exactly which parameters are receiving the VIP treatment (full-rank updates). You can inspect this to understand where the model is allocating its sparse parameter budget.

MergeToOriginalLayer()

Merges the HRA adaptation into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with both low-rank and sparse full-rank updates merged.

Remarks

This merges both the low-rank LoRA component and the sparse full-rank component into the base layer's weights, creating a single efficient layer.

For Beginners: This "bakes in" both types of updates for deployment.

The merged layer includes:

  • Original base layer weights
  • Low-rank LoRA updates (for general improvements)
  • Sparse full-rank updates (for critical parameters)

Result: A single layer with all adaptations built-in, ready for fast inference.
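The merge can be sketched as plain matrix arithmetic. An illustrative NumPy sketch, not the library's code, assuming the same scaling = alpha / rank convention used elsewhere on this page:

```python
import numpy as np

rng = np.random.default_rng(3)
d, rank, alpha = 16, 4, 4
W = rng.normal(size=(d, d))       # original base-layer weights
A = rng.normal(size=(rank, d))    # trained low-rank factor A
B = rng.normal(size=(d, rank))    # trained low-rank factor B
S = np.zeros((d, d))
S[2, 7] = 0.5                     # a trained sparse full-rank update
scaling = alpha / rank

# merged weights: base + scaled low-rank product + sparse update
W_merged = W + scaling * (B @ A) + S

x = rng.normal(size=(d,))
# the three-path adapter output the merged layer must reproduce
adapter_out = W @ x + scaling * (B @ (A @ x)) + S @ x
```

A single matrix multiply with W_merged reproduces the three-path adapter output, which is why the merged layer is faster at inference time.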

SetParameters(Vector<T>)

Sets all parameters including base, LoRA, and sparse full-rank parameters.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Vector containing all parameters to set.

UpdateParameters(T)

Updates parameters using the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.