Class LoKrAdapter<T>
LoKr (Low-Rank Kronecker Product Adaptation) adapter for parameter-efficient fine-tuning.
public class LoKrAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → LoRAAdapterBase<T> → LoKrAdapter<T>
- Implements
- ILoRAAdapter<T>, ILayer<T>
Remarks
LoKr uses Kronecker products instead of standard matrix multiplication for low-rank adaptation. Instead of computing ΔW = A × B (standard LoRA), LoKr computes ΔW = A ⊗ B where ⊗ is the Kronecker product. This is particularly efficient for very large weight matrices.
Kronecker Product Definition: For matrices A (m×n) and B (p×q), the Kronecker product A ⊗ B is an (m·p) × (n·q) matrix:
A ⊗ B =
[ a₁₁B  a₁₂B  ...  a₁ₙB ]
[ a₂₁B  a₂₂B  ...  a₂ₙB ]
[  ⋮      ⋮    ⋱     ⋮  ]
[ aₘ₁B  aₘ₂B  ...  aₘₙB ]
Each element aᵢⱼ of A is multiplied by the entire matrix B, creating a block structure.
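To make the block structure concrete, here is a minimal sketch that materializes A ⊗ B using plain double[,] arrays (the adapter itself only needs this full matrix when merging, not during normal forward or backward passes):
// Minimal sketch of the Kronecker product A ⊗ B with plain 2D arrays (illustrative only).
static double[,] Kronecker(double[,] a, double[,] b)
{
    int m = a.GetLength(0), n = a.GetLength(1);
    int p = b.GetLength(0), q = b.GetLength(1);
    var result = new double[m * p, n * q];
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < p; k++)
                for (int l = 0; l < q; l++)
                    // Element a[i, j] scales the entire block occupied by B.
                    result[i * p + k, j * q + l] = a[i, j] * b[k, l];
    return result;
}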
For Beginners: LoKr is a variant of LoRA that uses a different mathematical operation called the Kronecker product. Think of it this way:
- Standard LoRA: Multiplies two small matrices (like 1000×8 and 8×1000) to approximate changes
- LoKr: Uses Kronecker product of two even smaller matrices (like 50×4 and 20×4) to create the same size output
The Kronecker product creates a larger matrix by taking every element of the first matrix and multiplying it by the entire second matrix. This creates a block pattern that's very efficient for representing certain types of structured transformations.
When to use LoKr vs standard LoRA:
- LoKr is better for very wide or very deep layers (e.g., 10000×10000 weight matrices)
- LoKr can achieve similar expressiveness with fewer parameters than LoRA
- Standard LoRA is simpler and works well for typical layer sizes
Parameter Efficiency Example: For a 1000×1000 weight matrix with rank r=8:
- Standard LoRA: 1000×8 + 8×1000 = 16,000 parameters
- LoKr: 50×4 + 20×4 = 200 + 80 = 280 parameters, roughly 57× fewer (here 50 × 20 = 1000 factors both the input and output dimensions)
Constructors
LoKrAdapter(ILayer<T>, int, double, bool)
Initializes a new LoKr adapter wrapping an existing layer.
public LoKrAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoKr.
rank (int): The effective rank of the decomposition (used to determine factor matrix sizes).
alpha (double): The LoKr scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
The LoKr matrices are initialized as follows:
- Matrix A: Random values from a Gaussian distribution
- Matrix B: Zero initialization (so LoKr starts with no effect)
The dimensions of A and B are chosen such that A ⊗ B produces a matrix that can be applied to the layer's weights. For a layer with inputSize and outputSize, we factor these dimensions to create A (m×n) and B (p×q) where m×p = outputSize and n×q = inputSize.
For Beginners: This creates a LoKr adapter for a layer. The rank parameter determines how the weight matrix is factored into two smaller matrices. Lower rank = fewer parameters but less flexibility.
The adapter automatically figures out the best sizes for matrices A and B based on your layer's input and output sizes and the rank you specify.
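A minimal usage sketch (the DenseLayer<double> constructor shown here is assumed for illustration; substitute whatever layer you are adapting):
// Hypothetical example: wrap an existing layer with a LoKr adapter.
// The DenseLayer<double> constructor arguments are assumed for illustration.
var baseLayer = new DenseLayer<double>(inputSize: 1024, outputSize: 1024);

// rank controls the size of the factor matrices; alpha defaults to rank when negative,
// and freezeBaseLayer: true means only the LoKr factors A and B are trained.
var adapter = new LoKrAdapter<double>(baseLayer, rank: 8, alpha: 16, freezeBaseLayer: true);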
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when the base layer doesn't have 1D input/output shapes.
Properties
MatrixADimensions
Gets the dimensions of matrix A.
public (int m, int n) MatrixADimensions { get; }
Property Value
(int m, int n)
MatrixBDimensions
Gets the dimensions of matrix B.
public (int p, int q) MatrixBDimensions { get; }
Property Value
(int p, int q)
ParameterCount
Gets the total number of trainable parameters (elements in A and B matrices, plus base layer if not frozen).
public override int ParameterCount { get; }
Property Value
int
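Continuing the constructor sketch above, the chosen factorization and the resulting parameter count can be inspected directly:
// Inspect how the adapter factored the layer's dimensions.
var (m, n) = adapter.MatrixADimensions;   // A is m×n
var (p, q) = adapter.MatrixBDimensions;   // B is p×q
Console.WriteLine($"A: {m}×{n}, B: {p}×{q}, trainable parameters: {adapter.ParameterCount}");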
Methods
Backward(Tensor<T>)
Performs the backward pass through both layers.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass computes gradients through the Kronecker product using the vec-trick for efficient gradient computation. The gradients are:
- dL/dA uses the Kronecker structure to extract A-specific gradients
- dL/dB uses the Kronecker structure to extract B-specific gradients
- Input gradients flow through both paths and are summed
For Beginners: This figures out how to improve both the base layer and the LoKr matrices (A and B). It uses the special structure of the Kronecker product to efficiently compute gradients without having to work with the full Kronecker product matrix.
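The identity behind the vec-trick is the standard relation (A ⊗ B) · vec(X) = vec(B · X · Aᵀ), where vec(X) stacks the columns of X into a single vector. It means that multiplying by the full (m·p) × (n·q) Kronecker matrix can be replaced by two multiplications with the small factors A and B, and the same structure yields the gradients for A and B as small matrix products. (This is the textbook form; the exact index convention used internally may differ.)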
Forward(Tensor<T>)
Performs the forward pass through both base and LoKr layers.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and LoKr output.
Remarks
The forward pass computes: output = base_layer(input) + (A ⊗ B) * input * scaling
For Beginners: This runs the input through both the original layer and the LoKr adaptation layer (using Kronecker product), then adds their outputs together. The result is the original behavior plus the learned Kronecker-factored adaptation.
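The following is an illustrative sketch (plain double[,] arrays, not the adapter's actual internals) of how (A ⊗ B) · x can be applied via the vec-trick without ever forming the full Kronecker matrix:
// Applies y = (A ⊗ B) x by reshaping x into a q×n matrix X (column-major, so vec(X) == x),
// computing B · X · Aᵀ, and flattening the result. Illustrative only.
static double[] ApplyKronecker(double[,] a, double[,] b, double[] x)
{
    int m = a.GetLength(0), n = a.GetLength(1);
    int p = b.GetLength(0), q = b.GetLength(1);   // x has length n*q, result has length m*p

    // Reshape x into X with shape q×n (column-major).
    var X = new double[q, n];
    for (int col = 0; col < n; col++)
        for (int row = 0; row < q; row++)
            X[row, col] = x[col * q + row];

    // T1 = B · X  (p×n)
    var t1 = new double[p, n];
    for (int i = 0; i < p; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < q; k++)
                t1[i, j] += b[i, k] * X[k, j];

    // Y = T1 · Aᵀ  (p×m)
    var y = new double[p, m];
    for (int i = 0; i < p; i++)
        for (int j = 0; j < m; j++)
            for (int k = 0; k < n; k++)
                y[i, j] += t1[i, k] * a[j, k];

    // Flatten Y column-major: result == vec(B · X · Aᵀ) == (A ⊗ B) vec(X).
    var result = new double[m * p];
    for (int col = 0; col < m; col++)
        for (int row = 0; row < p; row++)
            result[col * p + row] = y[row, col];
    return result;
}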
GetMatrixA()
Gets matrix A (for inspection or advanced use cases).
public Matrix<T> GetMatrixA()
Returns
- Matrix<T>
The current A factor matrix.
GetMatrixB()
Gets matrix B (for inspection or advanced use cases).
public Matrix<T> GetMatrixB()
Returns
- Matrix<T>
The current B factor matrix.
GetParameters()
Gets the current parameters as a vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing parameters (LoKr only if base is frozen, otherwise both).
MergeToOriginalLayer()
Merges the LoKr adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoKr weights merged into the base layer's weights.
Remarks
This computes the full Kronecker product A ⊗ B and adds it to the base layer's weights.
For Beginners: This "bakes in" your LoKr adaptation to create a regular layer. It computes the full Kronecker product matrix and adds it to the original weights, creating a single merged layer that's faster for inference.
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
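A minimal sketch, continuing the hypothetical adapter from the earlier examples:
// After fine-tuning, bake the LoKr update into the base weights for faster inference.
// This only succeeds when the wrapped layer is a DenseLayer or FullyConnectedLayer.
ILayer<double> merged = adapter.MergeToOriginalLayer();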
ResetState()
Resets the internal state of the adapter.
public override void ResetState()
Remarks
For Beginners: This clears the memory of the last input and gradients. It's useful when starting to process a completely new, unrelated batch of data.
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): Vector containing parameters.
UpdateParameters(T)
Updates the layer's parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
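Putting the pieces together, a single hypothetical training step might look like this (LoadBatch and ComputeLossGradient are assumed placeholder helpers, not part of the library):
// Hypothetical training step; batch loading and loss-gradient computation are placeholders.
Tensor<double> input = LoadBatch();                          // assumed helper
Tensor<double> output = adapter.Forward(input);              // base layer output + LoKr path
Tensor<double> outputGradient = ComputeLossGradient(output); // assumed helper
adapter.Backward(outputGradient);                            // computes gradients for A and B
adapter.UpdateParameters(0.001);                             // apply the update with learning rate 0.001
adapter.ResetState();                                        // optional: clear cached state before unrelated data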