Class NOLAAdapter<T>
Implements a NOLA (Compressing LoRA using Linear Combination of Random Basis) adapter for extreme parameter efficiency.
public class NOLAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → NOLAAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
- Inherited Members
Remarks
NOLA overcomes the rank-one lower bound of traditional LoRA by re-parameterizing the low-rank matrices as linear combinations of randomly generated basis matrices. Instead of optimizing the full low-rank matrices A and B, NOLA:
1. Generates fixed random basis matrices using a deterministic seed
2. Optimizes only scalar coefficients that linearly combine these basis matrices
3. Regenerates basis matrices during forward/backward passes to minimize memory usage
This decouples the number of trainable parameters from both the choice of rank and the network architecture, achieving compression ratios of 20x over standard LoRA without accuracy degradation.
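A minimal standalone sketch of this re-parameterization, using plain double arrays instead of the library's tensor types; the uniform sampling and the seed-plus-index convention are illustrative assumptions, not the library's actual basis generator:

using System;

static class NolaReconstructionSketch
{
    // Reconstruct a low-rank factor as a linear combination of seeded random basis matrices:
    // A = sum_k coefficients[k] * Basis_k, where each Basis_k is regenerated deterministically.
    public static double[,] ReconstructFactor(double[] coefficients, int rows, int cols, int seed)
    {
        var result = new double[rows, cols];
        for (int k = 0; k < coefficients.Length; k++)
        {
            var rng = new Random(seed + k);   // same seed + index => same "template" every call
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++)
                    result[i, j] += coefficients[k] * (rng.NextDouble() * 2.0 - 1.0);
        }
        return result;
    }
}

Because the basis matrices can be regenerated from the seed at any time, only the coefficient vectors ever need to be stored.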
For Beginners: NOLA is an extreme compression technique for LoRA that makes fine-tuning even more efficient. Instead of storing and training two low-rank matrices (A and B), NOLA:
- Generates random "template" matrices on-the-fly (same random numbers every time due to fixed seed)
- Only trains small coefficients that control how much of each template to use
- Achieves 2-3x fewer parameters than LoRA while maintaining performance
Think of it like this:
- Traditional LoRA: You have 100 adjustable knobs (parameters)
- NOLA: You have 5 master controls that blend pre-defined settings
Key innovations:
- Memory efficiency: Random basis matrices are discarded after use and regenerated when needed
- Parameter efficiency: Only coefficients are trained, not full matrices
- Performance: Achieves similar or better results than LoRA with far fewer parameters
Example compression (1000x1000 layer, rank=8):
- LoRA: 16,000 parameters (1000×8 + 8×1000)
- NOLA with 100 basis matrices: 200 parameters (100 coefficients for A + 100 for B), an 80x reduction!
On LLaMA-2 70B, NOLA achieves 20x compression over LoRA with no accuracy loss.
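The arithmetic behind the example above, written out as a quick check (the sizes are the example's values, not defaults):

using System;

int inputSize = 1000, outputSize = 1000, rank = 8, numBasis = 100;
int loraParams = inputSize * rank + rank * outputSize;   // 16,000
int nolaParams = 2 * numBasis;                           // 200 (100 coefficients for A + 100 for B)
Console.WriteLine($"LoRA: {loraParams}, NOLA: {nolaParams}, ratio: {loraParams / nolaParams}x");   // 80x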
Reference: NOLA: Compressing LoRA using Linear Combination of Random Basis (Koohpayegani et al., ICLR 2024) - https://arxiv.org/abs/2310.02556
Constructors
NOLAAdapter(ILayer<T>, int, int, double, int, bool)
Initializes a new NOLA adapter with the specified parameters.
public NOLAAdapter(ILayer<T> baseLayer, int rank, int numBasis, double alpha = -1, int seed = 42, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with NOLA.
rank (int): The rank of the low-rank decomposition (determines basis matrix dimensions).
numBasis (int): Number of random basis matrices to use (controls the compression ratio).
alpha (double): The LoRA scaling factor (defaults to rank if negative).
seed (int): Random seed for reproducible basis generation (default: 42).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
NOLA initialization:
- Coefficients are initialized to zero, so NOLA starts with no effect (like LoRA)
- Random basis matrices are generated on demand during forward/backward passes
- A fixed seed ensures reproducible basis generation across training
For Beginners: This creates a new NOLA adapter. Important parameters:
- baseLayer: The layer you want to make ultra-efficient to fine-tune
- rank: Controls the "bottleneck" dimension (same as in LoRA)
- numBasis: Controls compression (fewer = more compression, less flexibility)
- seed: Ensures you get the same random "templates" every time
Recommended values:
- For extreme compression (20x): numBasis = rank / 2
- For balanced compression (10x): numBasis = rank
- For moderate compression (5x): numBasis = rank * 2
Example: rank=8, numBasis=4 gives ~40x compression over full fine-tuning!
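A hedged usage sketch; DenseLayer<float> stands in for whatever concrete ILayer<float> you are adapting and may not match the library's actual layer type names:

// Wrap an existing layer so that only 2 * numBasis coefficients are trained.
ILayer<float> baseLayer = new DenseLayer<float>(1000, 1000);   // hypothetical concrete layer
var adapter = new NOLAAdapter<float>(baseLayer, rank: 8, numBasis: 4, seed: 42);

Console.WriteLine(adapter.ParameterCount);     // 8 trainable coefficients (base layer frozen by default)
Console.WriteLine(adapter.CompressionRatio);   // compression relative to standard LoRA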
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when rank or numBasis is invalid.
Properties
CompressionRatio
Gets the compression ratio compared to standard LoRA.
public double CompressionRatio { get; }
Property Value
- double
Remarks
Compression ratio = (LoRA parameters) / (NOLA parameters). Higher values indicate more extreme compression.
For Beginners: This tells you how much more efficient NOLA is compared to regular LoRA. For example, a compression ratio of 20 means NOLA uses 20 times fewer parameters!
NumBasis
Gets the number of basis matrices used for compression.
public int NumBasis { get; }
Property Value
- int
Remarks
This determines the compression ratio. Fewer basis matrices = more compression but less flexibility. Typical values range from 10 to 100 depending on the task.
For Beginners: This is the number of "template" matrices we use. More templates give more flexibility but require more coefficients to train. It's the main knob for controlling the compression-accuracy trade-off in NOLA.
ParameterCount
Gets the total number of trainable parameters.
public override int ParameterCount { get; }
Property Value
- int
Remarks
For NOLA, this is just 2 * numBasis (coefficients for A and B), plus base layer parameters if not frozen. This is dramatically smaller than standard LoRA's (inputSize * rank) + (rank * outputSize).
Methods
Backward(Tensor<T>)
Performs the backward pass through both layers.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass:
1. Propagates gradients through the base layer (if not frozen)
2. Computes coefficient gradients by regenerating the basis matrices and computing inner products
3. Propagates input gradients through the NOLA path
4. Sums the input gradients from both paths
For Beginners: During learning, this figures out how to improve the coefficients:
- For each basis matrix, we compute how much changing its coefficient would reduce the error
- We regenerate the same random templates (using the fixed seed) to compute gradients
- We combine gradients from both the base layer and NOLA paths
The magic is that we only need to update a few coefficients, not entire matrices!
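A simplified standalone sketch of the coefficient-gradient idea: each gradient is the Frobenius inner product between the gradient with respect to the reconstructed factor and the corresponding regenerated basis matrix. The sampling convention mirrors the reconstruction sketch in the Remarks and is an assumption, not the library's exact implementation:

using System;

static class NolaBackwardSketch
{
    // dL/dcoefficients[k] = <dL/dA, Basis_k>_F, with Basis_k regenerated from the same seed as the forward pass.
    public static double[] CoefficientGradients(double[,] gradWrtFactor, int numBasis, int seed)
    {
        int rows = gradWrtFactor.GetLength(0), cols = gradWrtFactor.GetLength(1);
        var grads = new double[numBasis];
        for (int k = 0; k < numBasis; k++)
        {
            var rng = new Random(seed + k);   // regenerate the same template used in the forward pass
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++)
                    grads[k] += gradWrtFactor[i, j] * (rng.NextDouble() * 2.0 - 1.0);
        }
        return grads;
    }
}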
Forward(Tensor<T>)
Performs the forward pass through both base and NOLA layers.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and NOLA output.
Remarks
The forward pass:
1. Reconstructs matrices A and B from the coefficients and the random basis
2. Computes the NOLA output: input * A * B * scaling
3. Adds the base layer output
4. Caches A and B for use in the backward pass
For Beginners: This processes the input through both the original layer and the NOLA adaptation. The NOLA part:
1. Creates the A and B matrices from the learned coefficients
2. Runs the input through A and B (compression, then expansion)
3. Scales the result
4. Adds it to the base layer's output
The result is the original behavior plus the ultra-compressed adaptation!
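A schematic sketch of the adapted forward computation for a single dense layer, using plain arrays; the base-layer output is taken as a given input here, and the library's actual tensor handling and batching are omitted:

static class NolaForwardSketch
{
    // output = baseOutput + (input × A × B) * scaling,
    // where A (inputSize×rank) and B (rank×outputSize) are rebuilt from the coefficients each call.
    public static double[] AdaptedForward(double[] input, double[] baseOutput, double[,] A, double[,] B, double scaling)
    {
        int rank = A.GetLength(1), outputSize = B.GetLength(1);

        var hidden = new double[rank];                    // compression step: input × A
        for (int r = 0; r < rank; r++)
            for (int i = 0; i < input.Length; i++)
                hidden[r] += input[i] * A[i, r];

        var output = new double[outputSize];              // expansion, scaling, and residual add
        for (int o = 0; o < outputSize; o++)
        {
            for (int r = 0; r < rank; r++)
                output[o] += hidden[r] * B[r, o];
            output[o] = baseOutput[o] + output[o] * scaling;
        }
        return output;
    }
}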
GetCoefficientsA()
Gets the current coefficient values for matrix A (for inspection).
public Vector<T> GetCoefficientsA()
Returns
- Vector<T>
GetCoefficientsB()
Gets the current coefficient values for matrix B (for inspection).
public Vector<T> GetCoefficientsB()
Returns
- Vector<T>
GetParameters()
Gets the current parameters as a vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
MergeToOriginalLayer()
Merges the NOLA adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with NOLA weights merged into the base layer's weights.
Remarks
This reconstructs the full NOLA matrices A and B from coefficients, computes the merged weight matrix (A * B * scaling), and adds it to the base layer's weights.
For Beginners: This "bakes in" your NOLA adaptation to create a regular layer. It reconstructs the full A and B matrices from your learned coefficients and merges them into the base layer. The result is a standard layer with all adaptations built-in.
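Conceptually, merging adds the reconstructed low-rank update to the base weights; a minimal sketch assuming a plain weight matrix rather than the library's own tensor types:

static class NolaMergeSketch
{
    // mergedWeights[i, j] = baseWeights[i, j] + scaling * (A × B)[i, j]
    public static double[,] MergeWeights(double[,] baseWeights, double[,] A, double[,] B, double scaling)
    {
        int rows = baseWeights.GetLength(0), cols = baseWeights.GetLength(1), rank = A.GetLength(1);
        var merged = (double[,])baseWeights.Clone();
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                for (int r = 0; r < rank; r++)
                    merged[i, j] += scaling * A[i, r] * B[r, j];
        return merged;
    }
}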
ResetState()
Resets the internal state of the adapter.
public override void ResetState()
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): The parameter values to set.
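A brief usage sketch pairing SetParameters(Vector<T>) with GetParameters(), for example to drive an external optimizer or restore a checkpoint; the in-between step is left schematic:

Vector<float> coefficients = adapter.GetParameters();   // length equals adapter.ParameterCount
// ... adjust the coefficient values externally (optimizer step, loading a checkpoint, etc.) ...
adapter.SetParameters(coefficients);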
UpdateParameters(T)
Updates parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
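Presumably this applies a standard gradient step to the coefficient vector using the gradients accumulated in Backward(Tensor<T>); a schematic training-step sketch, where inputTensor and lossGradient are placeholders for tensors produced elsewhere in your training loop:

var output = adapter.Forward(inputTensor);            // base output + NOLA adaptation
var inputGradient = adapter.Backward(lossGradient);   // computes coefficient gradients internally
adapter.UpdateParameters(0.01f);                      // e.g. coefficient -= learningRate * gradient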