Class NOLAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

Implements a NOLA (Compressing LoRA using Linear Combination of Random Basis) adapter for extreme parameter efficiency.

public class NOLAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Remarks

NOLA overcomes the rank-one lower bound in traditional LoRA by re-parameterizing the low-rank matrices using linear combinations of randomly generated basis matrices. Instead of optimizing the full low-rank matrices A and B, NOLA:

  1. Generates fixed random basis matrices using a deterministic seed
  2. Optimizes only scalar coefficients that linearly combine these basis matrices
  3. Regenerates basis matrices during forward/backward passes to minimize memory usage

This decouples the number of trainable parameters from both the choice of rank and the network architecture, achieving compression ratios of up to 20x over standard LoRA without accuracy degradation.
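
To make the re-parameterization concrete, here is a minimal C# sketch of the idea using plain arrays rather than the library's tensor types; the dimensions, loop structure, and uniform random basis are illustrative assumptions, not this class's internal implementation. The only trainable state is two small coefficient vectors; A and B are rebuilt on demand from seeded random basis matrices.

// Minimal sketch of the NOLA re-parameterization (illustrative only).
// Only the coefficient vectors are trainable; basis matrices are regenerated from the fixed seed.
int inputSize = 1000, rank = 8, numBasis = 100, seed = 42;
double[] coefficientsA = new double[numBasis];   // trainable, initialized to zero
double[] coefficientsB = new double[numBasis];   // trainable, initialized to zero

double[,] ReconstructA()
{
    var rng = new Random(seed);                  // same seed => identical basis matrices on every call
    var A = new double[inputSize, rank];
    for (int k = 0; k < numBasis; k++)
        for (int i = 0; i < inputSize; i++)
            for (int j = 0; j < rank; j++)
                A[i, j] += coefficientsA[k] * (rng.NextDouble() * 2 - 1);  // A += coefficient_k * basis_k
    return A;                                    // basis values are discarded, never stored
}
// B (rank x outputSize) is reconstructed the same way from coefficientsB,
// and the adapter adds DeltaW = A * B * scaling on top of the frozen base weights.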

For Beginners: NOLA is an extreme compression technique for LoRA that makes fine-tuning even more efficient. Instead of storing and training two low-rank matrices (A and B), NOLA:

  • Generates random "template" matrices on-the-fly (same random numbers every time due to fixed seed)
  • Only trains small coefficients that control how much of each template to use
  • Achieves 2-3x fewer parameters than LoRA while maintaining performance

Think of it like this:

  • Traditional LoRA: You have 100 adjustable knobs (parameters)
  • NOLA: You have 5 master controls that blend pre-defined settings

Key innovations:

  1. Memory efficiency: Random basis matrices are discarded after use and regenerated when needed
  2. Parameter efficiency: Only coefficients are trained, not full matrices
  3. Performance: Achieves similar or better results than LoRA with far fewer parameters

Example compression (1000x1000 layer, rank=8):

  • LoRA: 16,000 parameters (1000×8 + 8×1000)
  • NOLA with 100 basis matrices: 200 parameters (100 coefficients for A + 100 for B) - an 80x reduction!

On LLaMA-2 70B, NOLA achieves 20x compression over LoRA with no accuracy loss.
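
As a quick check of the parameter arithmetic in the example above, a minimal sketch (the numbers are the 1000x1000, rank=8, 100-basis case; the snippet is illustrative, not library code):

// Parameter counts for a 1000x1000 layer adapted at rank 8 (illustrative arithmetic only).
int inputSize = 1000, outputSize = 1000, rank = 8, numBasis = 100;

int loraParams = inputSize * rank + rank * outputSize;  // 8,000 + 8,000 = 16,000
int nolaParams = 2 * numBasis;                          // 100 coefficients for A + 100 for B = 200

Console.WriteLine($"LoRA: {loraParams}, NOLA: {nolaParams}, ratio: {loraParams / nolaParams}x");
// Prints: LoRA: 16000, NOLA: 200, ratio: 80x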

Reference: NOLA: Compressing LoRA using Linear Combination of Random Basis (Koohpayegani et al., ICLR 2024) - https://arxiv.org/abs/2310.02556

Constructors

NOLAAdapter(ILayer<T>, int, int, double, int, bool)

Initializes a new NOLA adapter with the specified parameters.

public NOLAAdapter(ILayer<T> baseLayer, int rank, int numBasis, double alpha = -1, int seed = 42, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The layer to adapt with NOLA.

rank int

The rank of the low-rank decomposition (determines basis matrix dimensions).

numBasis int

Number of random basis matrices to use (controls compression ratio).

alpha double

The LoRA scaling factor (defaults to rank if negative).

seed int

Random seed for reproducible basis generation (default: 42).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

Remarks

NOLA initialization:

  • Coefficients are initialized to zero (so NOLA starts with no effect, like LoRA)
  • Random basis matrices are generated on-demand during forward/backward passes
  • A fixed seed ensures reproducible basis generation across training

For Beginners: This creates a new NOLA adapter. Important parameters:

  • baseLayer: The layer you want to make ultra-efficient to fine-tune
  • rank: Controls the "bottleneck" dimension (same as in LoRA)
  • numBasis: Controls compression (fewer = more compression, less flexibility)
  • seed: Ensures you get the same random "templates" every time

Recommended values:

  • For extreme compression: numBasis = rank / 2
  • For balanced compression: numBasis = rank
  • For moderate compression: numBasis = rank * 2

The compression achieved relative to LoRA or full fine-tuning also depends on the layer's dimensions (see CompressionRatio). Example: rank=8, numBasis=4 trains only 8 coefficients per adapted layer - orders of magnitude fewer parameters than full fine-tuning would update.
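
A hedged usage sketch of this constructor (denseLayer is a placeholder for any existing ILayer<double> you want to adapt, not something provided by this class):

// Wrap an existing layer with a NOLA adapter (sketch; 'denseLayer' is your own layer).
var adapter = new NOLAAdapter<double>(
    baseLayer: denseLayer,
    rank: 8,
    numBasis: 8,            // balanced compression per the guidance above
    alpha: -1,              // negative => defaults to rank
    seed: 42,               // reproducible basis generation
    freezeBaseLayer: true); // only the NOLA coefficients will be trained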

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when rank or numBasis are invalid.

Properties

CompressionRatio

Gets the compression ratio compared to standard LoRA.

public double CompressionRatio { get; }

Property Value

double

Remarks

Compression ratio = (LoRA parameters) / (NOLA parameters). Higher values indicate more extreme compression.

For Beginners: This tells you how much more efficient NOLA is compared to regular LoRA. For example, a compression ratio of 20 means NOLA uses 20 times fewer parameters!

NumBasis

Gets the number of basis matrices used for compression.

public int NumBasis { get; }

Property Value

int

Remarks

This determines the compression ratio. Fewer basis matrices = more compression but less flexibility. Typical values range from 10 to 100 depending on the task.

For Beginners: This is the number of "template" matrices we use. More templates give more flexibility but require more coefficients to train. It's the main knob for controlling the compression-accuracy trade-off in NOLA.

ParameterCount

Gets the total number of trainable parameters.

public override int ParameterCount { get; }

Property Value

int

Remarks

For NOLA, this is just 2 * numBasis (coefficients for A and B), plus base layer parameters if not frozen. This is dramatically smaller than standard LoRA's (inputSize * rank) + (rank * outputSize).
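
Continuing the constructor example above, a small sketch of reading these efficiency numbers off a constructed adapter (adapter refers to the NOLAAdapter<double> built earlier):

// Inspect the adapter's efficiency numbers (sketch).
Console.WriteLine($"Trainable parameters: {adapter.ParameterCount}");   // 2 * numBasis with a frozen base layer
Console.WriteLine($"Basis matrices:       {adapter.NumBasis}");
Console.WriteLine($"Compression vs LoRA:  {adapter.CompressionRatio}x");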

Methods

Backward(Tensor<T>)

Performs the backward pass through both layers.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass:

  1. Propagates gradients through base layer (if not frozen)
  2. Computes coefficient gradients by regenerating basis matrices and computing inner products
  3. Propagates input gradients through NOLA path
  4. Sums input gradients from both paths

For Beginners: During learning, this figures out how to improve the coefficients:

  • For each basis matrix, we compute how much changing its coefficient would reduce error
  • We regenerate the same random templates (using the fixed seed) to compute gradients
  • We combine gradients from both the base layer and NOLA paths

The magic is that we only need to update a few coefficients, not entire matrices!
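
A minimal sketch of the coefficient-gradient step over plain arrays (illustrative only, not this class's internals; gradA is assumed to already hold dL/dA computed from the output gradient and cached forward values). Since A = sum_k coefficient_k * basis_k, each coefficient's gradient is the inner product of dL/dA with the corresponding regenerated basis matrix:

// Coefficient gradients for A (illustrative only).
int inputSize = 1000, rank = 8, numBasis = 100, seed = 42;
double[,] gradA = new double[inputSize, rank];       // dL/dA, assumed already computed
double[] coefficientGradientsA = new double[numBasis];

var rng = new Random(seed);                          // regenerate the same basis matrices as in Forward
for (int k = 0; k < numBasis; k++)
{
    double grad = 0;
    for (int i = 0; i < inputSize; i++)
        for (int j = 0; j < rank; j++)
            grad += gradA[i, j] * (rng.NextDouble() * 2 - 1);   // <dL/dA, basis_k>
    coefficientGradientsA[k] = grad;                 // basis_k is discarded immediately after use
}
// The coefficients of B are handled the same way using dL/dB.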

Forward(Tensor<T>)

Performs the forward pass through both base and NOLA layers.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Sum of base layer output and NOLA output.

Remarks

The forward pass:

  1. Reconstructs matrices A and B from coefficients and random basis
  2. Computes NOLA output: input * A * B * scaling
  3. Adds base layer output
  4. Caches A and B for use in backward pass

For Beginners: This processes the input through both the original layer and the NOLA adaptation. The NOLA part:

  1. Creates A and B matrices from the learned coefficients
  2. Runs the input through A and B (compression then expansion)
  3. Scales the result
  4. Adds it to the base layer's output

The result is the original behavior plus the ultra-compressed adaptation!
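
A minimal sketch of this composition over plain 2-D arrays (illustrative only; A and B come from the coefficient-based reconstruction shown in the Remarks, and the real adapter operates on Tensor<T>):

// output = baseOutput + (x * A) * B * scaling  (illustrative only).
double[,] NolaForward(double[,] x, double[,] A, double[,] B, double[,] baseOutput, double scaling)
{
    var nola = MatMul(MatMul(x, A), B);              // compress to rank, then expand back out
    int batch = nola.GetLength(0), outputSize = nola.GetLength(1);
    var y = new double[batch, outputSize];
    for (int i = 0; i < batch; i++)
        for (int j = 0; j < outputSize; j++)
            y[i, j] = baseOutput[i, j] + scaling * nola[i, j];   // base path + scaled NOLA path
    return y;                                        // the real adapter also caches A and B for Backward
}

double[,] MatMul(double[,] p, double[,] q)
{
    int n = p.GetLength(0), m = p.GetLength(1), r = q.GetLength(1);
    var result = new double[n, r];
    for (int i = 0; i < n; i++)
        for (int j = 0; j < r; j++)
            for (int k = 0; k < m; k++)
                result[i, j] += p[i, k] * q[k, j];
    return result;
}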

GetCoefficientsA()

Gets the current coefficient values for matrix A (for inspection).

public Vector<T> GetCoefficientsA()

Returns

Vector<T>

GetCoefficientsB()

Gets the current coefficient values for matrix B (for inspection).

public Vector<T> GetCoefficientsB()

Returns

Vector<T>
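
A small usage sketch (adapter is a trained NOLAAdapter<double>; assuming each returned vector holds the NumBasis coefficients for its matrix):

// Inspect the learned blend of basis matrices (sketch).
Vector<double> coeffA = adapter.GetCoefficientsA();
Vector<double> coeffB = adapter.GetCoefficientsB();
// Both start at zero, so non-zero values indicate the adaptation learned during training.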

GetParameters()

Gets the current parameters as a vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

MergeToOriginalLayer()

Merges the NOLA adaptation into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with NOLA weights merged into the base layer's weights.

Remarks

This reconstructs the full NOLA matrices A and B from coefficients, computes the merged weight matrix (A * B * scaling), and adds it to the base layer's weights.

For Beginners: This "bakes in" your NOLA adaptation to create a regular layer. It reconstructs the full A and B matrices from your learned coefficients and merges them into the base layer. The result is a standard layer with all adaptations built-in.

ResetState()

Resets the internal state of the adapter.

public override void ResetState()

SetParameters(Vector<T>)

Sets the layer parameters from a vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>
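
A hedged round-trip sketch for checkpointing the adapter's trainable state (assumes the vector returned by GetParameters can be passed back to SetParameters unchanged):

// Save and later restore the coefficients (sketch).
Vector<double> saved = adapter.GetParameters();
// ... persist 'saved' and reload it later ...
adapter.SetParameters(saved);                        // adapter now reproduces the saved adaptation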

UpdateParameters(T)

Updates parameters using the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.
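
Putting the pieces together, a hedged sketch of one training step; denseLayer, inputBatch, and ComputeLossGradient are placeholders for your own layer, data, and loss code, not APIs of this library:

// One training step with a NOLA adapter (sketch). Only the coefficients are updated
// because the base layer is frozen by default.
var adapter = new NOLAAdapter<double>(denseLayer, rank: 8, numBasis: 8);

Tensor<double> output = adapter.Forward(inputBatch);            // base output + NOLA path
Tensor<double> outputGradient = ComputeLossGradient(output);    // placeholder: dLoss/dOutput
adapter.Backward(outputGradient);                               // computes coefficient gradients
adapter.UpdateParameters(0.01);                                 // apply the learning rate to the coefficients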