Class LoKrAdapter<T>
LoKr (Low-Rank Kronecker Product Adaptation) adapter for parameter-efficient fine-tuning.
public class LoKrAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → LoRAAdapterBase<T> → LoKrAdapter<T>
- Implements
- ILoRAAdapter<T>, ILayer<T>
Remarks
LoKr uses Kronecker products instead of standard matrix multiplication for low-rank adaptation. Instead of computing ΔW = A × B (standard LoRA), LoKr computes ΔW = A ⊗ B where ⊗ is the Kronecker product. This is particularly efficient for very large weight matrices.
Kronecker Product Definition: For matrices A (m×n) and B (p×q), the Kronecker product A ⊗ B is an (m·p) × (n·q) matrix:
A ⊗ B =
[ a₁₁B  a₁₂B  ...  a₁ₙB ]
[ a₂₁B  a₂₂B  ...  a₂ₙB ]
[  ⋮      ⋮    ⋱     ⋮  ]
[ aₘ₁B  aₘ₂B  ...  aₘₙB ]
Each element aᵢⱼ of A is multiplied by the entire matrix B, creating a block structure.
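To make the block structure concrete, here is a minimal sketch that materializes A ⊗ B using plain double[,] arrays (the adapter itself only needs this full matrix when merging, not during normal forward or backward passes):
// Minimal sketch of the Kronecker product A ⊗ B with plain 2D arrays (illustrative only).
static double[,] Kronecker(double[,] a, double[,] b)
{
    int m = a.GetLength(0), n = a.GetLength(1);
    int p = b.GetLength(0), q = b.GetLength(1);
    var result = new double[m * p, n * q];
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < p; k++)
                for (int l = 0; l < q; l++)
                    // Element a[i, j] scales the entire block occupied by B.
                    result[i * p + k, j * q + l] = a[i, j] * b[k, l];
    return result;
}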
For Beginners: LoKr is a variant of LoRA that uses a different mathematical operation called the Kronecker product. Think of it this way:
- Standard LoRA: Multiplies two small matrices (like 1000×8 and 8×1000) to approximate changes
- LoKr: Uses Kronecker product of two even smaller matrices (like 50×4 and 20×4) to create the same size output
The Kronecker product creates a larger matrix by taking every element of the first matrix and multiplying it by the entire second matrix. This creates a block pattern that's very efficient for representing certain types of structured transformations.
When to use LoKr vs standard LoRA:
- LoKr is better for very wide or very deep layers (e.g., 10000×10000 weight matrices)
- LoKr can achieve similar expressiveness with fewer parameters than LoRA
- Standard LoRA is simpler and works well for typical layer sizes
Parameter Efficiency Example: For a 1000×1000 weight matrix with rank r=8:
- Standard LoRA: 1000×8 + 8×1000 = 16,000 parameters
- LoKr: 50×4 + 20×4 = 200 + 80 = 280 parameters, roughly 57× fewer (here 50 × 20 = 1000 factors both the input and output dimensions)
Constructors
LoKrAdapter(ILayer<T>, int, double, bool)
Initializes a new LoKr adapter wrapping an existing layer.
public LoKrAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoKr.
rank (int): The effective rank of the decomposition (used to determine factor matrix sizes).
alpha (double): The LoKr scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
The LoKr matrices are initialized as follows:
- Matrix A: Random values from a Gaussian distribution
- Matrix B: Zero initialization (so LoKr starts with no effect)
The dimensions of A and B are chosen such that A ⊗ B produces a matrix that can be applied to the layer's weights. For a layer with inputSize and outputSize, we factor these dimensions to create A (m×n) and B (p×q) where m×p = outputSize and n×q = inputSize.
For Beginners: This creates a LoKr adapter for a layer. The rank parameter determines how the weight matrix is factored into two smaller matrices. Lower rank = fewer parameters but less flexibility.
The adapter automatically figures out the best sizes for matrices A and B based on your layer's input and output sizes and the rank you specify.
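A minimal usage sketch (the DenseLayer<double> constructor shown here is assumed for illustration; substitute whatever layer you are adapting):
// Hypothetical example: wrap an existing layer with a LoKr adapter.
// The DenseLayer<double> constructor arguments are assumed for illustration.
var baseLayer = new DenseLayer<double>(inputSize: 1024, outputSize: 1024);

// rank controls the size of the factor matrices; alpha defaults to rank when negative,
// and freezeBaseLayer: true means only the LoKr factors A and B are trained.
var adapter = new LoKrAdapter<double>(baseLayer, rank: 8, alpha: 16, freezeBaseLayer: true);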
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when the base layer doesn't have 1D input/output shapes.
Properties
MatrixADimensions
Gets the dimensions of matrix A.
public (int m, int n) MatrixADimensions { get; }
Property Value
(int m, int n)
MatrixBDimensions
Gets the dimensions of matrix B.
public (int p, int q) MatrixBDimensions { get; }
Property Value
(int p, int q)
ParameterCount
Gets the total number of trainable parameters (elements in A and B matrices, plus base layer if not frozen).
public override int ParameterCount { get; }
Property Value
int
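Continuing the constructor sketch above, the chosen factorization and the resulting parameter count can be inspected directly:
// Inspect how the adapter factored the layer's dimensions.
var (m, n) = adapter.MatrixADimensions;   // A is m×n
var (p, q) = adapter.MatrixBDimensions;   // B is p×q
Console.WriteLine($"A: {m}×{n}, B: {p}×{q}, trainable parameters: {adapter.ParameterCount}");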
Methods
Backward(Tensor<T>)
Performs the backward pass through both layers.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass computes gradients through the Kronecker product using the vec-trick for efficient gradient computation. The gradients are:
- dL/dA uses the Kronecker structure to extract A-specific gradients
- dL/dB uses the Kronecker structure to extract B-specific gradients
- Input gradients flow through both paths and are summed
For Beginners: This figures out how to improve both the base layer and the LoKr matrices (A and B). It uses the special structure of the Kronecker product to efficiently compute gradients without having to work with the full Kronecker product matrix.
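The identity behind the vec-trick is the standard relation (A ⊗ B) · vec(X) = vec(B · X · Aᵀ), where vec(X) stacks the columns of X into a single vector. It means that multiplying by the full (m·p) × (n·q) Kronecker matrix can be replaced by two multiplications with the small factors A and B, and the same structure yields the gradients for A and B as small matrix products. (This is the textbook form; the exact index convention used internally may differ.)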
Forward(Tensor<T>)
Performs the forward pass through both base and LoKr layers.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and LoKr output.
Remarks
The forward pass computes: output = base_layer(input) + (A ⊗ B) * input * scaling
For Beginners: This runs the input through both the original layer and the LoKr adaptation layer (using Kronecker product), then adds their outputs together. The result is the original behavior plus the learned Kronecker-factored adaptation.
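The following is an illustrative sketch (plain double[,] arrays, not the adapter's actual internals) of how (A ⊗ B) · x can be applied via the vec-trick without ever forming the full Kronecker matrix:
// Applies y = (A ⊗ B) x by reshaping x into a q×n matrix X (column-major, so vec(X) == x),
// computing B · X · Aᵀ, and flattening the result. Illustrative only.
static double[] ApplyKronecker(double[,] a, double[,] b, double[] x)
{
    int m = a.GetLength(0), n = a.GetLength(1);
    int p = b.GetLength(0), q = b.GetLength(1);   // x has length n*q, result has length m*p

    // Reshape x into X with shape q×n (column-major).
    var X = new double[q, n];
    for (int col = 0; col < n; col++)
        for (int row = 0; row < q; row++)
            X[row, col] = x[col * q + row];

    // T1 = B · X  (p×n)
    var t1 = new double[p, n];
    for (int i = 0; i < p; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < q; k++)
                t1[i, j] += b[i, k] * X[k, j];

    // Y = T1 · Aᵀ  (p×m)
    var y = new double[p, m];
    for (int i = 0; i < p; i++)
        for (int j = 0; j < m; j++)
            for (int k = 0; k < n; k++)
                y[i, j] += t1[i, k] * a[j, k];

    // Flatten Y column-major: result == vec(B · X · Aᵀ) == (A ⊗ B) vec(X).
    var result = new double[m * p];
    for (int col = 0; col < m; col++)
        for (int row = 0; row < p; row++)
            result[col * p + row] = y[row, col];
    return result;
}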
GetMatrixA()
Gets matrix A (for inspection or advanced use cases).
public Matrix<T> GetMatrixA()
Returns
- Matrix<T>
The current A factor matrix.
GetMatrixB()
Gets matrix B (for inspection or advanced use cases).
public Matrix<T> GetMatrixB()
Returns
- Matrix<T>
The current B factor matrix.
GetParameters()
Gets the current parameters as a vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing parameters (LoKr only if base is frozen, otherwise both).
MergeToOriginalLayer()
Merges the LoKr adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoKr weights merged into the base layer's weights.
Remarks
This computes the full Kronecker product A ⊗ B and adds it to the base layer's weights.
For Beginners: This "bakes in" your LoKr adaptation to create a regular layer. It computes the full Kronecker product matrix and adds it to the original weights, creating a single merged layer that's faster for inference.
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
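A minimal sketch, continuing the hypothetical adapter from the earlier examples:
// After fine-tuning, bake the LoKr update into the base weights for faster inference.
// This only succeeds when the wrapped layer is a DenseLayer or FullyConnectedLayer.
ILayer<double> merged = adapter.MergeToOriginalLayer();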
ResetState()
Resets the internal state of the adapter.
public override void ResetState()
Remarks
For Beginners: This clears the memory of the last input and gradients. It's useful when starting to process a completely new, unrelated batch of data.
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): Vector containing parameters.
UpdateParameters(T)
Updates the layer's parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
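Putting the pieces together, a single hypothetical training step might look like this (LoadBatch and ComputeLossGradient are assumed placeholder helpers, not part of the library):
// Hypothetical training step; batch loading and loss-gradient computation are placeholders.
Tensor<double> input = LoadBatch();                          // assumed helper
Tensor<double> output = adapter.Forward(input);              // base layer output + LoKr path
Tensor<double> outputGradient = ComputeLossGradient(output); // assumed helper
adapter.Backward(outputGradient);                            // computes gradients for A and B
adapter.UpdateParameters(0.001);                             // apply the update with learning rate 0.001
adapter.ResetState();                                        // optional: clear cached state before unrelated data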