Class PiSSAAdapter<T>
Principal Singular Values and Singular Vectors Adaptation (PiSSA) adapter for parameter-efficient fine-tuning.
public class PiSSAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → LoRAAdapterBase<T> → PiSSAAdapter<T>
- Implements
-
ILoRAAdapter<T>, ILayer<T>
- Inherited Members
Remarks
PiSSA (NeurIPS 2024 Spotlight) improves upon standard LoRA by initializing adapter matrices with principal components from Singular Value Decomposition (SVD) of pretrained weights, rather than random initialization. This results in more effective use of the rank budget and faster convergence.
Key Differences from Standard LoRA:
- Initialization: standard LoRA initializes A randomly and B to zero; PiSSA initializes A and B from the top-r singular vectors of the pretrained weights.
- Trainable parameters: standard LoRA keeps the full pretrained matrix as a frozen base and trains the adapter on top; PiSSA splits the pretrained weights into trainable top-r components (A and B) and a frozen residual.
How PiSSA Works:
1. Perform SVD on the pretrained weights: W = U Σ V^T
2. Initialize the adapter matrices from the top-r components:
   - A = V_r (top-r right singular vectors, dimensions: inputSize × rank)
   - B = Σ_r * U_r^T (top-r left singular vectors scaled by singular values, dimensions: rank × outputSize)
3. Freeze the residual matrix: W_residual = W - (A*B)^T
4. During training: output = W_residual * input + LoRA(input)
5. Only A and B are updated; W_residual stays frozen
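The numbered steps above can be sketched in a few lines of NumPy. This is a language-neutral illustration of the math only, not this library's C# API; all names are local to the sketch:

```python
import numpy as np

# Hypothetical pretrained weight matrix W (outputSize x inputSize).
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
r = 2  # adapter rank

# Step 1: SVD of the pretrained weights, W = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Step 2: adapter matrices from the top-r components.
A = Vt[:r].T                      # inputSize x rank  (top-r right singular vectors)
B = np.diag(S[:r]) @ U[:, :r].T   # rank x outputSize (scaled left singular vectors)

# Step 3: the frozen residual holds everything the top-r components miss.
W_residual = W - (A @ B).T

# At initialization, residual + adapter reproduces W exactly,
# so the adapted layer starts identical to the pretrained one.
assert np.allclose(W_residual + (A @ B).T, W)
```

Because the decomposition is exact, the adapter changes nothing at step 0; training then moves only A and B away from the pretrained solution.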
Performance Benefits: PiSSA achieves superior performance compared to standard LoRA:
- GSM8K benchmark: 72.86% (PiSSA) vs 67.7% (LoRA)
- Better initialization captures important pretrained knowledge
- More effective gradient updates from the start
- Faster convergence with fewer training steps
For Beginners: Think of PiSSA as "smart LoRA initialization".
Standard LoRA starts from random:
- Random A matrix (like throwing darts blindfolded)
- Zero B matrix (starts with no effect)
- Learns everything from scratch
PiSSA starts from the most important parts of pretrained weights:
- A and B capture the top-r "principal directions" of the pretrained model
- Starts closer to the optimal solution
- Like starting a puzzle with the border pieces already connected
Example: If you have a pretrained language model with a 4096x4096 weight matrix, PiSSA with rank=8 will:
- Find the top 8 most important patterns in those weights via SVD
- Put those patterns into A and B (making them trainable)
- Freeze the remaining "less important" patterns
- Train only the top 8 patterns to adapt to your task
This is much more efficient than starting from random and achieves better results!
References:
- Paper: "PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models"
- Venue: NeurIPS 2024 (Spotlight)
- Key insight: SVD-based initialization outperforms random initialization for low-rank adaptation
Constructors
PiSSAAdapter(ILayer<T>, int, double, bool)
Initializes a new PiSSA adapter wrapping an existing layer.
public PiSSAAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with PiSSA.
rank (int): The rank of the low-rank decomposition.
alpha (double): The LoRA scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
This constructor creates a PiSSA adapter. After construction, you should call InitializeFromSVD to properly initialize the adapter matrices from pretrained weights. Without SVD initialization, the adapter behaves like standard LoRA (not recommended).
For Beginners: This creates a PiSSA adapter for any layer type.
Parameters:
- baseLayer: The layer you want to adapt (Dense, Convolutional, etc.)
- rank: How many principal components to use (typically 4-32)
- alpha: Scaling factor for the adaptation strength
- freezeBaseLayer: Usually true to freeze original weights
Important: After creating the adapter, call InitializeFromSVD with the pretrained weights to get PiSSA's performance benefits. Otherwise, it's just regular LoRA.
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
Properties
InitializedFromSVD
Gets whether this adapter was initialized from SVD.
public bool InitializedFromSVD { get; }
Property Value
- bool
Remarks
Returns true if InitializeFromSVD was called successfully, false otherwise.
ResidualWeights
Gets the frozen residual weights matrix.
public Matrix<T>? ResidualWeights { get; }
Property Value
- Matrix<T>
Remarks
This matrix is computed during SVD initialization and remains frozen during training. Returns null if SVD initialization was not performed.
Methods
Backward(Tensor<T>)
Performs the backward pass, updating only the trainable adapter matrices (B and A).
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass propagates gradients through both the frozen residual path and the trainable LoRA path. However, only the LoRA parameters (A and B) are updated; the residual weights remain frozen.
For Beginners: This is where learning happens in PiSSA.
During backpropagation:
- Gradients flow through both the residual path and the LoRA path
- But only the LoRA matrices (A and B) get updated
- The residual weights stay frozen (no learning)
This is the key to PiSSA's efficiency:
- We only train the top-r most important components
- The rest of the weights stay fixed from pretraining
- Fewer parameters to update = faster training and less overfitting
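As a concrete sketch of what the backward pass computes (NumPy, using the W_residual * input + LoRA(input) formulation from the class remarks; names are illustrative, not this library's API):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, r = 4, 3, 2
x = rng.standard_normal(n_in)               # layer input (cached from forward)
g = rng.standard_normal(n_out)              # outputGradient from the next layer
W_res = rng.standard_normal((n_out, n_in))  # frozen residual
A = rng.standard_normal((n_in, r))
B = rng.standard_normal((r, n_out))

# The input gradient flows through BOTH paths (residual and LoRA)...
grad_input = (W_res + (A @ B).T).T @ g

# ...but parameter gradients are computed only for A and B.
u = A.T @ x                    # low-rank activation, shape (rank,)
grad_B = np.outer(u, g)        # shape (rank, n_out)
grad_A = np.outer(x, B @ g)    # shape (n_in, rank)
# W_res receives no gradient update: it stays frozen.
```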
Forward(Tensor<T>)
Performs the forward pass using residual weights plus trainable PiSSA adaptation.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Output tensor computed as: residual_output + lora_output.
Remarks
If initialized from SVD, the forward pass computes: output = W_residual * input + LoRA(input)
If not initialized from SVD (falls back to standard LoRA): output = base_layer(input) + LoRA(input)
For Beginners: This runs input through the adapter.
With proper PiSSA initialization:
- First applies frozen residual weights (the "less important" parts)
- Then adds the trainable adaptation (the "important" parts from A and B)
- Result combines both for the final output
Without SVD initialization (not recommended):
- Falls back to standard LoRA behavior
- Uses base layer output + LoRA correction
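One consequence worth seeing concretely: with SVD initialization, the forward pass initially reproduces the pretrained layer exactly, because the residual path and the LoRA path sum back to W. A NumPy sketch (illustrative names, not this library's API):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((5, 3))   # pretrained weights (outputSize x inputSize)
x = rng.standard_normal(3)
r = 2

U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = Vt[:r].T
B = np.diag(S[:r]) @ U[:, :r].T
W_res = W - (A @ B).T

# Forward pass: frozen residual path + trainable LoRA path.
output = W_res @ x + (A @ B).T @ x

# Before any training step, this equals the original layer's output.
assert np.allclose(output, W @ x)
```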
InitializeFromSVD(ILayer<T>, Matrix<T>, int, double, bool, SvdAlgorithmType)
Creates a PiSSA adapter initialized from SVD of pretrained weights.
public static PiSSAAdapter<T> InitializeFromSVD(ILayer<T> baseLayer, Matrix<T> pretrainedWeights, int rank, double alpha = -1, bool freezeBaseLayer = true, SvdAlgorithmType svdAlgorithm = SvdAlgorithmType.GolubReinsch)
Parameters
baseLayer (ILayer<T>): The layer to adapt with PiSSA.
pretrainedWeights (Matrix<T>): The pretrained weight matrix to decompose.
rank (int): The rank of the low-rank decomposition.
alpha (double): The LoRA scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
svdAlgorithm (SvdAlgorithmType): The SVD algorithm to use (default: GolubReinsch).
Returns
- PiSSAAdapter<T>
A PiSSA adapter initialized from SVD.
Remarks
This static factory method creates and fully initializes a PiSSA adapter in one step. It combines construction and SVD initialization for convenience.
For Beginners: This is the recommended way to create a PiSSA adapter.
Instead of:
- Create adapter
- Call InitializeFromSVD
You can just:
- Call this method with pretrained weights
Example: var adapter = PiSSAAdapter<float>.InitializeFromSVD(myLayer, pretrainedWeights, rank: 8); // Ready to train!
InitializeFromSVD(Matrix<T>, SvdAlgorithmType)
Initializes the adapter matrices from SVD of pretrained weights.
public void InitializeFromSVD(Matrix<T> pretrainedWeights, SvdAlgorithmType svdAlgorithm = SvdAlgorithmType.GolubReinsch)
Parameters
pretrainedWeights (Matrix<T>): The pretrained weight matrix to decompose.
svdAlgorithm (SvdAlgorithmType): The SVD algorithm to use (default: GolubReinsch).
Remarks
This method performs the core PiSSA initialization:
1. Computes the SVD: W = U Σ V^T
2. Extracts the top-r components: U_r, Σ_r, V_r
3. Initializes A = V_r (top-r right singular vectors, inputSize × rank)
4. Initializes B = Σ_r * U_r^T (left singular vectors scaled by singular values, rank × outputSize)
5. Computes the residual: W_residual = W - (A*B)^T
For Beginners: This is where the magic happens!
The method:
- Takes your pretrained weights (like from a large language model)
- Finds the most important patterns using SVD (mathematical technique)
- Puts those patterns into the adapter matrices A and B
- Saves the "leftover" patterns as frozen residual weights
Think of it like:
- Original weights = complete painting
- SVD = identifying the main strokes vs. minor details
- A and B = the main strokes (what we'll adjust)
- Residual = the minor details (kept frozen)
This initialization is what makes PiSSA better than LoRA: it starts from a smart place instead of random values.
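A small NumPy check of what the split leaves behind (using the A = V_r, B = Σ_r * U_r^T convention from the class-level remarks; illustrative only): since the adapter takes exactly the top-r singular directions, the residual's largest singular value is the (r+1)-th singular value of the original matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.standard_normal((5, 3))   # pretrained weights (outputSize x inputSize)
r = 2

# Steps 1-5 above:
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = Vt[:r].T                      # inputSize x rank
B = np.diag(S[:r]) @ U[:, :r].T   # rank x outputSize
W_residual = W - (A @ B).T

# The adapter holds the top-r singular directions, so the residual's
# spectral norm (largest singular value) drops to the (r+1)-th of W.
assert np.isclose(np.linalg.norm(W_residual, ord=2), S[r])
```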
Exceptions
- ArgumentNullException
Thrown when pretrainedWeights is null.
- ArgumentException
Thrown when weight matrix dimensions don't match layer dimensions.
MergeToOriginalLayer()
Merges the PiSSA adaptation into the original layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with PiSSA weights merged back into a single weight matrix.
Remarks
This method reconstructs the full weight matrix by combining: W_merged = W_residual + (A * B)^T
This allows you to deploy the adapted model without the PiSSA overhead.
For Beginners: This "bakes in" the PiSSA adaptation.
After training:
- You have: frozen residual weights + trained A and B matrices
- Merging combines them: residual + A*B = final weights
- Result: a single regular layer with all improvements included
Benefits:
- Faster inference (no need to compute residual + LoRA separately)
- Simpler deployment (just one layer)
- Compatible with systems that don't support LoRA/PiSSA
Example: var mergedLayer = adapter.MergeToOriginalLayer(); // Now you have a standard layer with PiSSA improvements built in!
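The merge formula can be verified with a NumPy sketch (illustrative names, not this library's API): after simulated training, a single merged matrix reproduces the two-path output exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((5, 3))
r = 2
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = Vt[:r].T
B = np.diag(S[:r]) @ U[:, :r].T
W_residual = W - (A @ B).T

# Pretend training nudged the adapter matrices.
A += 0.01 * rng.standard_normal(A.shape)
B += 0.01 * rng.standard_normal(B.shape)

# Merge: W_merged = W_residual + (A * B)^T, one plain weight matrix.
W_merged = W_residual + (A @ B).T

# The merged layer matches residual-path + LoRA-path for any input.
x = rng.standard_normal(3)
assert np.allclose(W_merged @ x, W_residual @ x + (A @ B).T @ x)
```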
Exceptions
- InvalidOperationException
Thrown when the adapter was not initialized from SVD.