Class LoftQAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

LoftQ (LoRA-Fine-Tuning-aware Quantization) adapter that combines quantization and LoRA with improved initialization.

public class LoftQAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T> → LoftQAdapter<T>

Implements
IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Remarks

LoftQ improves upon QLoRA by using an alternating optimization strategy during initialization to find better LoRA adapter parameters for quantized models. Instead of simply quantizing a pre-trained model and adding LoRA on top, LoftQ alternates between:

  1. Optimizing the quantization of the base weights
  2. Optimizing the LoRA adapter matrices to compensate for quantization error

Key Features:

  • Alternating optimization between quantization and LoRA initialization
  • Better initialization than naive quantization + LoRA
  • Supports both 4-bit INT4 and NF4 quantization
  • Reduces the gap between quantized and full-precision fine-tuning
  • Compatible with all QLoRA features (double quantization, block-wise quantization)

How LoftQ Differs from QLoRA:

QLoRA:

  1. Quantize pre-trained weights
  2. Initialize LoRA randomly
  3. Fine-tune LoRA only

LoftQ:

  1. Start with pre-trained weights
  2. Alternate K times:
    a. Fix LoRA, optimize the quantization
    b. Fix the quantization, optimize LoRA (via SVD to minimize the error)
  3. Fine-tune LoRA only

This alternating initialization creates better starting LoRA parameters that compensate for quantization error from the beginning, leading to better final performance.

Alternating Optimization Process: For K iterations (typically 3-5):

  • Quantization step: Quantize the residual W - AB to get Q, keeping A and B fixed (on the first pass A and B are zero, so this simply quantizes W)
  • LoRA step: Update A and B to minimize ||W - (Q + AB)||_F (Frobenius norm), keeping Q fixed; the optimal rank-r update is a truncated SVD of the error W - Q

This ensures the LoRA adapter specifically compensates for quantization error, rather than learning generic adaptations.
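
To make that loop concrete, here is a minimal sketch of the initialization in C#, using MathNet.Numerics for the truncated SVD. The quantize-dequantize delegate stands in for the internal NF4/INT4 quantizer, and every name and signature below is illustrative rather than the actual AiDotNet implementation.

using System;
using MathNet.Numerics.LinearAlgebra;

static class LoftQSketch
{
    // Alternating initialization: returns quantized weights Q and LoRA factors
    // A, B such that Q + A*B approximates the original weight matrix W.
    public static (Matrix<double> Q, Matrix<double> A, Matrix<double> B) Initialize(
        Matrix<double> w, int rank, int iterations,
        Func<Matrix<double>, Matrix<double>> quantizeDequantize)
    {
        int m = w.RowCount, n = w.ColumnCount;
        var a = Matrix<double>.Build.Dense(m, rank);  // LoRA factors start at zero,
        var b = Matrix<double>.Build.Dense(rank, n);  // so iteration 1 quantizes W itself
        Matrix<double> q = w;

        for (int k = 0; k < iterations; k++)
        {
            // Quantization step: with A and B fixed, quantize the residual W - AB.
            q = quantizeDequantize(w - a * b);

            // LoRA step: with Q fixed, take the best rank-r approximation of the
            // quantization error W - Q via a truncated SVD.
            var svd = (w - q).Svd(computeVectors: true);
            var sqrtS = Matrix<double>.Build.DenseOfDiagonalVector(
                svd.S.SubVector(0, rank).PointwiseSqrt());
            a = svd.U.SubMatrix(0, m, 0, rank) * sqrtS;
            b = sqrtS * svd.VT.SubMatrix(0, rank, 0, n);
        }
        return (q, a, b);
    }
}

Splitting the square root of the singular values across both factors keeps A and B balanced in scale; in the real adapter, the delegate corresponds to a round trip through the block-wise 4-bit quantizer configured in the constructor.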

Memory Efficiency: Same as QLoRA - base weights in 4-bit, LoRA in full precision:

  • 75% memory reduction on base weights (4-bit vs. an FP16 baseline)
  • Only LoRA parameters are trainable (typically 0.1-1% of model size)
  • Additional one-time cost during initialization for the alternating optimization
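
To put rough numbers on those claims, here is a back-of-the-envelope calculation for a single hypothetical 4096x4096 layer adapted at rank 16, assuming FP16 as the full-precision baseline:

using System;

// Illustrative memory math for one 4096x4096 layer adapted at rank 16.
const long d = 4096, rank = 16;
long baseParams = d * d;                  // 16,777,216 base weights
double fp16MB = baseParams * 2.0 / 1e6;   // ~33.6 MB at 16 bits per weight
double int4MB = baseParams * 0.5 / 1e6;   // ~8.4 MB at 4 bits per weight: 75% smaller
long loraParams = 2 * d * rank;           // A (4096x16) + B (16x4096) = 131,072
Console.WriteLine($"Trainable fraction: {100.0 * loraParams / baseParams:F2}%");  // 0.78%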

For Beginners: LoftQ is an improved version of QLoRA that starts training from a smarter initial guess for the correction layer.

Think of it like this:

  • QLoRA: Compress your model, then add random corrections, then train
  • LoftQ: Compress your model, figure out what corrections are needed upfront, then train

The key insight: If we're going to compress the weights anyway, let's make sure our correction layer (LoRA) is specifically designed to fix compression errors!

The process:

  1. Start with your pre-trained model
  2. Repeatedly:
    • Try different compressions
    • Adjust LoRA to compensate for compression error
    • Pick the best combination
  3. Now train LoRA (which already knows how to fix compression issues)

Benefits:

  • Better starting point for training
  • Converges faster during fine-tuning
  • Better final accuracy than QLoRA with same memory usage
  • Still only trains LoRA (same efficiency as QLoRA)

Trade-offs:

  • Longer initialization time (worth it for better results)
  • Same runtime memory and speed as QLoRA
  • More complex implementation

Research Background: LoftQ was introduced in "LoftQ: LoRA-Fine-Tuning-Aware Quantization" (Li et al., 2023). It addresses a key limitation of QLoRA: random LoRA initialization doesn't account for the specific quantization errors introduced. By using alternating optimization, LoftQ creates LoRA parameters that are "aware" of the quantization, leading to better downstream fine-tuning performance with no additional runtime cost.

When to Use LoftQ vs QLoRA:

  • Use LoftQ when: training accuracy is critical and you can spend extra time on initialization
  • Use QLoRA when: you need fast experimentation and initialization time is critical
  • Both have identical runtime memory and speed characteristics

Constructors

LoftQAdapter(ILayer<T>, int, double, int, QuantizationType, bool, int, bool)

Initializes a new LoftQ adapter with alternating optimization for improved initialization.

public LoftQAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, int numAlternatingIterations = 5, LoftQAdapter<T>.QuantizationType quantizationType = QuantizationType.NF4, bool useDoubleQuantization = true, int quantizationBlockSize = 64, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The Dense or FullyConnected layer to adapt with LoftQ.

rank int

The rank of the LoRA decomposition.

alpha double

The LoRA scaling factor (defaults to rank if negative).

numAlternatingIterations int

Number of alternating optimization iterations for initialization (default: 5).

quantizationType LoftQAdapter<T>.QuantizationType

The type of 4-bit quantization to use (default: NF4).

useDoubleQuantization bool

Whether to use double quantization for constants (default: true).

quantizationBlockSize int

The block size for quantization (default: 64).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training (default: true).

Remarks

This constructor performs LoftQ initialization using alternating optimization:

  1. Extracts base layer weights
  2. For K iterations:
    a. Quantize current weights
    b. Compute quantization error
    c. Update LoRA to minimize error (via SVD)
    d. Update weights = quantized + LoRA
  3. Store final quantized weights and LoRA parameters

For Beginners: Creating a LoftQ adapter takes longer than QLoRA because we're doing smart initialization. Here's what happens:

Parameters:

  • baseLayer: Your existing layer to compress and adapt
  • rank: LoRA adapter size (lower = more efficient)
  • alpha: LoRA strength
  • numAlternatingIterations: How many times to optimize initialization (3-5 is good)
  • quantizationType: NF4 recommended for best results
  • Other parameters: Same as QLoRA

Initialization process (this happens once):

  1. Look at your original weights
  2. Try compressing them
  3. See what errors compression creates
  4. Adjust LoRA to fix those errors
  5. Repeat steps 2-4 several times to find the best combination
  6. Save the optimized compression and LoRA

This extra work during initialization pays off with better training results!
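
A usage sketch follows. Only the LoftQAdapter arguments come from this page; the DenseLayer type and its constructor shape are assumptions, so substitute whatever dense layer your model actually uses.

using AiDotNet.LoRA.Adapters;

// Hypothetical 4096 -> 4096 dense layer; the constructor shape is assumed.
ILayer<float> baseLayer = new DenseLayer<float>(4096, 4096);

// The alternating LoftQ initialization runs inside this constructor, so
// construction is slower than QLoRA's equivalent, but it happens only once.
var adapter = new LoftQAdapter<float>(
    baseLayer,
    rank: 8,
    alpha: 16.0,                      // defaults to rank if left negative
    numAlternatingIterations: 5,      // the remarks above suggest 3-5
    quantizationType: LoftQAdapter<float>.QuantizationType.NF4,
    useDoubleQuantization: true,
    quantizationBlockSize: 64,
    freezeBaseLayer: true);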

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when the base layer doesn't have 1D input/output shapes or when parameters are invalid.

Properties

AlternatingIterations

Gets the number of alternating optimization iterations used during initialization.

public int AlternatingIterations { get; }

Property Value

int

BlockSize

Gets the quantization block size.

public int BlockSize { get; }

Property Value

int

Quantization

Gets the quantization type used for base layer weights.

public LoftQAdapter<T>.QuantizationType Quantization { get; }

Property Value

LoftQAdapter<T>.QuantizationType

UsesDoubleQuantization

Gets whether double quantization is enabled.

public bool UsesDoubleQuantization { get; }

Property Value

bool

Methods

Backward(Tensor<T>)

Performs the backward pass (only updates LoRA if base is frozen).

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient from next layer.

Returns

Tensor<T>

Gradient for previous layer.

Remarks

For Beginners: Training works exactly like QLoRA:

  • Only LoRA parameters are updated (if base is frozen)
  • Gradients flow through both paths
  • Memory efficient because base stays frozen

The benefit of LoftQ appears in faster convergence and better final accuracy, not in the training process itself.
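
Continuing the constructor example above, one training step against the documented members looks roughly like this; ComputeLossGradient is a hypothetical placeholder for your loss machinery, and the optimizer update is omitted because it is not documented on this page.

// Forward pass through the quantized base plus the LoRA adapter.
Tensor<float> output = adapter.Forward(input);

// Hypothetical placeholder: gradient of your loss w.r.t. the adapter's output.
Tensor<float> outputGradient = ComputeLossGradient(output, target);

// Accumulates gradients for the LoRA parameters only (the base is frozen)
// and returns the gradient to propagate to the previous layer.
Tensor<float> inputGradient = adapter.Backward(outputGradient);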

Forward(Tensor<T>)

Performs the forward pass through quantized base layer and LoRA.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Combined output from quantized base and LoRA layers.

Remarks

Forward pass:

  1. Dequantize base weights (cached)
  2. Compute base output with dequantized weights
  3. Compute LoRA output
  4. Return sum

For Beginners: This works exactly like QLoRA's forward pass:

  • Decompress the base weights
  • Run input through the decompressed base
  • Run input through the LoRA adapter
  • Add the results together

The difference from QLoRA is invisible here - it's all in the initialization! LoftQ's better LoRA parameters lead to better combined results.
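
A sketch of those four steps, with MathNet matrices standing in for AiDotNet tensors; the dequantize delegate is a placeholder for the cached 4-bit decode, and the alpha/rank scaling is the usual LoRA convention rather than anything documented here.

using System;
using MathNet.Numerics.LinearAlgebra;

static Matrix<double> ForwardSketch(
    Matrix<double> input,          // [batch x inFeatures]
    Matrix<double> packedBase,     // stand-in for the packed 4-bit base weights
    Matrix<double> loraA,          // [inFeatures x rank]
    Matrix<double> loraB,          // [rank x outFeatures]
    double alpha, int rank,
    Func<Matrix<double>, Matrix<double>> dequantize)
{
    var wBase = dequantize(packedBase);         // 1. decompress (cached in practice)
    var baseOut = input * wBase;                // 2. base path
    var loraOut = input * loraA * loraB;        // 3. low-rank path
    return baseOut + loraOut * (alpha / rank);  // 4. sum, with LoRA scaling
}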

MergeToOriginalLayer()

Merges LoRA adaptation into base layer and returns merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

New DenseLayer with merged and optionally quantized weights.

Remarks

Merging process:

  1. Dequantize base weights
  2. Get LoRA weight contribution
  3. Merge: W_merged = W_base + W_lora
  4. Create new layer with merged weights

For Beginners: After training, you can "bake in" the LoRA improvements:

  • Decompress the base weights
  • Add the LoRA corrections
  • Create a single layer with all improvements
  • Optionally compress again for deployment

This gives you a single efficient layer with all the benefits of LoftQ training!
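
The same merge formula in sketch form (MathNet matrices again; illustrative, not the library's code):

using System;
using MathNet.Numerics.LinearAlgebra;

// W_merged = W_base + W_lora, with the usual alpha/rank LoRA scaling.
static Matrix<double> Merge(
    Matrix<double> packedBase, Matrix<double> loraA, Matrix<double> loraB,
    double alpha, int rank,
    Func<Matrix<double>, Matrix<double>> dequantize)
{
    var wBase = dequantize(packedBase);          // 1. dequantize base weights
    var wLora = loraA * loraB * (alpha / rank);  // 2-3. LoRA contribution, merged below
    return wBase + wLora;                        // 4. weights for the new merged layer
}

Re-quantizing the merged matrix for deployment is the optional final step mentioned above.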

ResetState()

Resets the internal state of the adapter.

public override void ResetState()

Remarks

For Beginners: Clears cached data and resets both layers. Useful when starting a new batch or task.

UpdateParametersFromLayers()

Updates the parameter vector from both layers.

protected override void UpdateParametersFromLayers()