Class PiSSAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

Principal Singular Values and Singular Vectors Adaptation (PiSSA) adapter for parameter-efficient fine-tuning.

public class PiSSAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object → LoRAAdapterBase<T> → PiSSAAdapter<T>

Implements
IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Remarks

PiSSA (NeurIPS 2024 Spotlight) improves upon standard LoRA by initializing adapter matrices with principal components from Singular Value Decomposition (SVD) of pretrained weights, rather than random initialization. This results in more effective use of the rank budget and faster convergence.

Key Differences from Standard LoRA:

  • Standard LoRA: A initialized randomly, B initialized to zero
  • PiSSA: A and B initialized from the top-r singular vectors of the pretrained weights
  • Standard LoRA: the full pretrained weight matrix stays frozen and the adaptation is learned on top of it
  • PiSSA: the residual weights are frozen, and the top-r principal components themselves (held in A and B) are trainable

How PiSSA Works:

  1. Perform SVD on the pretrained weights: W = U Σ V^T
  2. Initialize the adapter matrices from the top-r components:
     • A = V_r (top-r right singular vectors, dimensions: inputSize × rank)
     • B = Σ_r * U_r^T (top-r left singular vectors scaled by singular values, dimensions: rank × outputSize)
  3. Freeze the residual matrix: W_residual = W - (A*B)^T
  4. During training: output = W_residual * input + LoRA(input)
  5. Only A and B are updated; W_residual stays frozen
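To make step 4 concrete, here is a minimal sketch of the combined forward computation in plain C#. The method name, the raw-array representation, and the alpha/rank scaling convention are illustrative assumptions rather than the library's actual internals:

// Conceptual sketch of the PiSSA forward pass.
// Shapes follow the remarks above: a is inputSize x rank,
// b is rank x outputSize, wResidual is outputSize x inputSize.
static double[] PissaForward(
    double[,] wResidual, double[,] a, double[,] b,
    double[] input, double scaling) // scaling is assumed to be alpha/rank
{
    int outSize = wResidual.GetLength(0);
    int inSize = wResidual.GetLength(1);
    int rank = a.GetLength(1);

    // Residual path: frozen W_residual * input.
    var output = new double[outSize];
    for (int o = 0; o < outSize; o++)
        for (int i = 0; i < inSize; i++)
            output[o] += wResidual[o, i] * input[i];

    // LoRA path, step 1: project the input down to rank dimensions (A^T * input).
    var hidden = new double[rank];
    for (int r = 0; r < rank; r++)
        for (int i = 0; i < inSize; i++)
            hidden[r] += a[i, r] * input[i];

    // LoRA path, step 2: project back up (B^T * hidden), scale, and add.
    // This adds ((A*B)^T * input) * scaling, matching step 3's residual definition.
    for (int o = 0; o < outSize; o++)
        for (int r = 0; r < rank; r++)
            output[o] += scaling * b[r, o] * hidden[r];

    return output;
}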

Performance Benefits: PiSSA achieves superior performance compared to standard LoRA:

  • GSM8K benchmark: 72.86% (PiSSA) vs 67.7% (LoRA)
  • Better initialization captures important pretrained knowledge
  • More effective gradient updates from the start
  • Faster convergence with fewer training steps

For Beginners: Think of PiSSA as "smart LoRA initialization".

Standard LoRA starts from random:

  • Random A matrix (like throwing darts blindfolded)
  • Zero B matrix (starts with no effect)
  • Learns everything from scratch

PiSSA starts from the most important parts of pretrained weights:

  • A and B capture the top-r "principal directions" of the pretrained model
  • Starts closer to the optimal solution
  • Like starting a puzzle with the border pieces already connected

Example: If you have a pretrained language model with a 4096x4096 weight matrix, PiSSA with rank=8 will:

  1. Find the top 8 most important patterns in those weights via SVD
  2. Put those patterns into A and B (making them trainable)
  3. Freeze the remaining "less important" patterns
  4. Train only the top 8 patterns to adapt to your task

This is much more efficient than starting from random and achieves better results!
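To make the savings concrete: in that 4096x4096 example with rank=8, A is 4096 × 8 and B is 8 × 4096, so the adapter trains 2 × 8 × 4096 = 65,536 parameters instead of the 16,777,216 in the full matrix, roughly 0.4% of the layer.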

References:

  • Paper: "PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models"
  • Venue: NeurIPS 2024 (Spotlight)
  • Key Insight: SVD-based initialization > random initialization for low-rank adaptation

Constructors

PiSSAAdapter(ILayer<T>, int, double, bool)

Initializes a new PiSSA adapter wrapping an existing layer.

public PiSSAAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The layer to adapt with PiSSA.

rank int

The rank of the low-rank decomposition.

alpha double

The LoRA scaling factor (defaults to rank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

Remarks

This constructor creates a PiSSA adapter. After construction, you should call InitializeFromSVD to properly initialize the adapter matrices from pretrained weights. Without SVD initialization, the adapter behaves like standard LoRA (not recommended).

For Beginners: This creates a PiSSA adapter for any layer type.

Parameters:

  • baseLayer: The layer you want to adapt (Dense, Convolutional, etc.)
  • rank: How many principal components to use (typically 4-32)
  • alpha: Scaling factor for the adaptation strength
  • freezeBaseLayer: Usually true to freeze original weights

Important: After creating the adapter, call InitializeFromSVD with the pretrained weights to get PiSSA's performance benefits. Otherwise, it's just regular LoRA.
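A minimal sketch of that two-step flow (denseLayer and pretrainedWeights are hypothetical placeholders for an existing ILayer<T> and a Matrix<T> of matching dimensions):

// Step 1: wrap an existing layer (hypothetical `denseLayer`).
var adapter = new PiSSAAdapter<float>(denseLayer, rank: 8, alpha: 16.0);

// Step 2: initialize A, B, and the frozen residual from pretrained weights.
// Skipping this step leaves the adapter behaving like plain LoRA.
adapter.InitializeFromSVD(pretrainedWeights);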

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

Properties

InitializedFromSVD

Gets whether this adapter was initialized from SVD.

public bool InitializedFromSVD { get; }

Property Value

bool

Remarks

Returns true if InitializeFromSVD was called successfully, false otherwise.

ResidualWeights

Gets the frozen residual weights matrix.

public Matrix<T>? ResidualWeights { get; }

Property Value

Matrix<T>

Remarks

This matrix is computed during SVD initialization and remains frozen during training. Returns null if SVD initialization was not performed.
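For example, the property doubles as a quick check of the adapter's state (pretrainedWeights is a hypothetical placeholder):

// Null until InitializeFromSVD has run successfully.
if (adapter.ResidualWeights is null)
{
    adapter.InitializeFromSVD(pretrainedWeights);
}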

Methods

Backward(Tensor<T>)

Performs the backward pass, updating only the trainable adapter matrices (B and A).

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass propagates gradients through both the frozen residual path and the trainable LoRA path. However, only the LoRA parameters (A and B) are updated; the residual weights remain frozen.

For Beginners: This is where learning happens in PiSSA.

During backpropagation:

  • Gradients flow through both the residual path and the LoRA path
  • But only the LoRA matrices (A and B) get updated
  • The residual weights stay frozen (no learning)

This is the key to PiSSA's efficiency:

  • We only train the top-r most important components
  • The rest of the weights stay fixed from pretraining
  • Fewer parameters to update = faster training and less overfitting
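Sketched as one training step (ComputeLossGradient is a hypothetical helper standing in for your loss computation; Forward and Backward are the documented members):

// Forward through the residual and LoRA paths.
var output = adapter.Forward(inputTensor);

// Hypothetical helper producing dLoss/dOutput for this batch.
var outputGradient = ComputeLossGradient(output, target);

// Computes gradients; only A and B accumulate updates, W_residual stays frozen.
var inputGradient = adapter.Backward(outputGradient);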

Forward(Tensor<T>)

Performs the forward pass using residual weights plus trainable PiSSA adaptation.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Output tensor computed as: residual_output + lora_output.

Remarks

If initialized from SVD, the forward pass computes: output = W_residual * input + LoRA(input)

If not initialized from SVD (falls back to standard LoRA): output = base_layer(input) + LoRA(input)

For Beginners: This runs input through the adapter.

With proper PiSSA initialization:

  • First applies frozen residual weights (the "less important" parts)
  • Then adds the trainable adaptation (the "important" parts from A and B)
  • Result combines both for the final output

Without SVD initialization (not recommended):

  • Falls back to standard LoRA behavior
  • Uses base layer output + LoRA correction
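A small sketch of checking which path Forward will take, using the documented InitializedFromSVD flag (inputTensor is a placeholder):

if (!adapter.InitializedFromSVD)
    Console.WriteLine("Note: running as plain LoRA; call InitializeFromSVD first.");

var output = adapter.Forward(inputTensor);
// SVD-initialized: output = W_residual * input + LoRA(input)
// Otherwise:       output = base_layer(input) + LoRA(input)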

InitializeFromSVD(ILayer<T>, Matrix<T>, int, double, bool, SvdAlgorithmType)

Creates a PiSSA adapter initialized from SVD of pretrained weights.

public static PiSSAAdapter<T> InitializeFromSVD(ILayer<T> baseLayer, Matrix<T> pretrainedWeights, int rank, double alpha = -1, bool freezeBaseLayer = true, SvdAlgorithmType svdAlgorithm = SvdAlgorithmType.GolubReinsch)

Parameters

baseLayer ILayer<T>

The layer to adapt with PiSSA.

pretrainedWeights Matrix<T>

The pretrained weight matrix to decompose.

rank int

The rank of the low-rank decomposition.

alpha double

The LoRA scaling factor (defaults to rank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

svdAlgorithm SvdAlgorithmType

The SVD algorithm to use (default: GolubReinsch).

Returns

PiSSAAdapter<T>

A PiSSA adapter initialized from SVD.

Remarks

This static factory method creates and fully initializes a PiSSA adapter in one step. It combines construction and SVD initialization for convenience.

For Beginners: This is the recommended way to create a PiSSA adapter.

Instead of:

  1. Create adapter
  2. Call InitializeFromSVD

You can just:

  1. Call this method with pretrained weights

Example:

var adapter = PiSSAAdapter<float>.InitializeFromSVD(myLayer, pretrainedWeights, rank: 8);
// Ready to train!

InitializeFromSVD(Matrix<T>, SvdAlgorithmType)

Initializes the adapter matrices from SVD of pretrained weights.

public void InitializeFromSVD(Matrix<T> pretrainedWeights, SvdAlgorithmType svdAlgorithm = SvdAlgorithmType.GolubReinsch)

Parameters

pretrainedWeights Matrix<T>

The pretrained weight matrix to decompose.

svdAlgorithm SvdAlgorithmType

The SVD algorithm to use (default: GolubReinsch).

Remarks

This method performs the core PiSSA initialization:

  1. Computes the SVD: W = U Σ V^T
  2. Extracts the top-r components: U_r, Σ_r, V_r
  3. Initializes A = V_r^T (right singular vectors)
  4. Initializes B = U_r Σ_r (left singular vectors scaled by singular values)
  5. Computes the residual: W_residual = W - B*A

For Beginners: This is where the magic happens!

The method:

  1. Takes your pretrained weights (like from a large language model)
  2. Finds the most important patterns using SVD (mathematical technique)
  3. Puts those patterns into the adapter matrices A and B
  4. Saves the "leftover" patterns as frozen residual weights

Think of it like:

  • Original weights = complete painting
  • SVD = identifying the main strokes vs. minor details
  • A and B = the main strokes (what we'll adjust)
  • Residual = the minor details (kept frozen)

This initialization is what makes PiSSA better than LoRA: it starts from a smart place instead of random values.
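In code, the call itself is a single line; the observable state changes are the documented properties (pretrainedWeights is a placeholder):

adapter.InitializeFromSVD(pretrainedWeights);
// Afterwards:
//   adapter.InitializedFromSVD == true
//   adapter.ResidualWeights holds the frozen W_residual matrix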

Exceptions

ArgumentNullException

Thrown when pretrainedWeights is null.

ArgumentException

Thrown when weight matrix dimensions don't match layer dimensions.

MergeToOriginalLayer()

Merges the PiSSA adaptation into the original layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with PiSSA weights merged back into a single weight matrix.

Remarks

This method reconstructs the full weight matrix by combining: W_merged = W_residual + (A * B)^T

This allows you to deploy the adapted model without the PiSSA overhead.

For Beginners: This "bakes in" the PiSSA adaptation.

After training:

  • You have: frozen residual weights + trained A and B matrices
  • Merging combines them: residual + A*B = final weights
  • Result: a single regular layer with all improvements included

Benefits:

  • Faster inference (no need to compute residual + LoRA separately)
  • Simpler deployment (just one layer)
  • Compatible with systems that don't support LoRA/PiSSA

Example:

var mergedLayer = adapter.MergeToOriginalLayer();
// Now you have a standard layer with PiSSA improvements built in!

Exceptions

InvalidOperationException

Thrown when the adapter was not initialized from SVD.