Class LoftQAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

LoftQ (LoRA-Fine-Tuning-aware Quantization) adapter that combines quantization and LoRA with improved initialization.

public class LoftQAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T> → LoftQAdapter<T>

Implements
IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Remarks

LoftQ improves upon QLoRA by using an alternating optimization strategy during initialization to find better LoRA adapter parameters for quantized models. Instead of simply quantizing a pre-trained model and adding LoRA on top, LoftQ alternates between:

  1. Optimizing the quantization of the base weights
  2. Optimizing the LoRA adapter matrices to compensate for quantization error

Key Features:

  • Alternating optimization between quantization and LoRA initialization
  • Better initialization than naive quantization + LoRA
  • Supports both 4-bit INT4 and NF4 quantization
  • Reduces the gap between quantized and full-precision fine-tuning
  • Compatible with all QLoRA features (double quantization, block-wise quantization)

How LoftQ Differs from QLoRA:

QLoRA:

  1. Quantize pre-trained weights
  2. Initialize LoRA randomly
  3. Fine-tune LoRA only

LoftQ:

  1. Start with pre-trained weights
  2. Alternate K times:
    a. Fix LoRA, optimize the quantization
    b. Fix the quantization, optimize LoRA (via SVD to minimize the error)
  3. Fine-tune LoRA only

This alternating initialization creates better starting LoRA parameters that compensate for quantization error from the beginning, leading to better final performance.

Alternating Optimization Process: For K iterations (typically 3-5):

  • Quantization step: Quantize the residual W - AB to get Q, keeping A and B fixed (on the first pass A and B are zero, so this simply quantizes W)
  • LoRA step: Update A and B to minimize ||W - (Q + AB)||_F (Frobenius norm), keeping Q fixed; the optimal rank-r update is a truncated SVD of the error W - Q

This ensures the LoRA adapter specifically compensates for quantization error, rather than learning generic adaptations.
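
To make that loop concrete, here is a minimal sketch of the initialization in C#, using MathNet.Numerics for the truncated SVD. The quantize-dequantize delegate stands in for the internal NF4/INT4 quantizer, and every name and signature below is illustrative rather than the actual AiDotNet implementation.

using System;
using MathNet.Numerics.LinearAlgebra;

static class LoftQSketch
{
    // Alternating initialization: returns quantized weights Q and LoRA factors
    // A, B such that Q + A*B approximates the original weight matrix W.
    public static (Matrix<double> Q, Matrix<double> A, Matrix<double> B) Initialize(
        Matrix<double> w, int rank, int iterations,
        Func<Matrix<double>, Matrix<double>> quantizeDequantize)
    {
        int m = w.RowCount, n = w.ColumnCount;
        var a = Matrix<double>.Build.Dense(m, rank);  // LoRA factors start at zero,
        var b = Matrix<double>.Build.Dense(rank, n);  // so iteration 1 quantizes W itself
        Matrix<double> q = w;

        for (int k = 0; k < iterations; k++)
        {
            // Quantization step: with A and B fixed, quantize the residual W - AB.
            q = quantizeDequantize(w - a * b);

            // LoRA step: with Q fixed, take the best rank-r approximation of the
            // quantization error W - Q via a truncated SVD.
            var svd = (w - q).Svd(computeVectors: true);
            var sqrtS = Matrix<double>.Build.DenseOfDiagonalVector(
                svd.S.SubVector(0, rank).PointwiseSqrt());
            a = svd.U.SubMatrix(0, m, 0, rank) * sqrtS;
            b = sqrtS * svd.VT.SubMatrix(0, rank, 0, n);
        }
        return (q, a, b);
    }
}

Splitting the square root of the singular values across both factors keeps A and B balanced in scale; in the real adapter, the delegate corresponds to a round trip through the block-wise 4-bit quantizer configured in the constructor.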

Memory Efficiency: Same as QLoRA - base weights in 4-bit, LoRA in full precision:

  • 75% memory reduction on base weights (4-bit vs. an FP16 baseline)
  • Only LoRA parameters are trainable (typically 0.1-1% of model size)
  • Additional one-time cost during initialization for the alternating optimization
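
To put rough numbers on those claims, here is a back-of-the-envelope calculation for a single hypothetical 4096x4096 layer adapted at rank 16, assuming FP16 as the full-precision baseline:

using System;

// Illustrative memory math for one 4096x4096 layer adapted at rank 16.
const long d = 4096, rank = 16;
long baseParams = d * d;                  // 16,777,216 base weights
double fp16MB = baseParams * 2.0 / 1e6;   // ~33.6 MB at 16 bits per weight
double int4MB = baseParams * 0.5 / 1e6;   // ~8.4 MB at 4 bits per weight: 75% smaller
long loraParams = 2 * d * rank;           // A (4096x16) + B (16x4096) = 131,072
Console.WriteLine($"Trainable fraction: {100.0 * loraParams / baseParams:F2}%");  // 0.78%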

For Beginners: LoftQ is an improved version of QLoRA that starts training from a smarter initial guess for the correction layer.

Think of it like this:

  • QLoRA: Compress your model, then add random corrections, then train
  • LoftQ: Compress your model, figure out what corrections are needed upfront, then train

The key insight: If we're going to compress the weights anyway, let's make sure our correction layer (LoRA) is specifically designed to fix compression errors!

The process:

  1. Start with your pre-trained model
  2. Repeatedly:
    • Try different compressions
    • Adjust LoRA to compensate for compression error
    • Pick the best combination
  3. Now train LoRA (which already knows how to fix compression issues)

Benefits:

  • Better starting point for training
  • Converges faster during fine-tuning
  • Better final accuracy than QLoRA with same memory usage
  • Still only trains LoRA (same efficiency as QLoRA)

Trade-offs:

  • Longer initialization time (worth it for better results)
  • Same runtime memory and speed as QLoRA
  • More complex implementation

Research Background: LoftQ was introduced in "LoftQ: LoRA-Fine-Tuning-Aware Quantization" (Li et al., 2023). It addresses a key limitation of QLoRA: random LoRA initialization doesn't account for the specific quantization errors introduced. By using alternating optimization, LoftQ creates LoRA parameters that are "aware" of the quantization, leading to better downstream fine-tuning performance with no additional runtime cost.

When to Use LoftQ vs QLoRA:

  • Use LoftQ when: training accuracy is critical and you can spend extra time on initialization
  • Use QLoRA when: you need fast experimentation and initialization time is critical
  • Both have identical runtime memory and speed characteristics

Constructors

LoftQAdapter(ILayer<T>, int, double, int, QuantizationType, bool, int, bool)

Initializes a new LoftQ adapter with alternating optimization for improved initialization.

public LoftQAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, int numAlternatingIterations = 5, LoftQAdapter<T>.QuantizationType quantizationType = QuantizationType.NF4, bool useDoubleQuantization = true, int quantizationBlockSize = 64, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The Dense or FullyConnected layer to adapt with LoftQ.

rank int

The rank of the LoRA decomposition.

alpha double

The LoRA scaling factor (defaults to rank if negative).

numAlternatingIterations int

Number of alternating optimization iterations for initialization (default: 5).

quantizationType LoftQAdapter<T>.QuantizationType

The type of 4-bit quantization to use (default: NF4).

useDoubleQuantization bool

Whether to use double quantization for constants (default: true).

quantizationBlockSize int

The block size for quantization (default: 64).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training (default: true).

Remarks

This constructor performs LoftQ initialization using alternating optimization:

  1. Extracts base layer weights
  2. For K iterations:
    a. Quantize current weights
    b. Compute quantization error
    c. Update LoRA to minimize error (via SVD)
    d. Update weights = quantized + LoRA
  3. Store final quantized weights and LoRA parameters

For Beginners: Creating a LoftQ adapter takes longer than QLoRA because we're doing smart initialization. Here's what happens:

Parameters:

  • baseLayer: Your existing layer to compress and adapt
  • rank: LoRA adapter size (lower = more efficient)
  • alpha: LoRA strength
  • numAlternatingIterations: How many times to optimize initialization (3-5 is good)
  • quantizationType: NF4 recommended for best results
  • Other parameters: Same as QLoRA

Initialization process (this happens once):

  1. Look at your original weights
  2. Try compressing them
  3. See what errors compression creates
  4. Adjust LoRA to fix those errors
  5. Repeat steps 2-4 several times to find the best combination
  6. Save the optimized compression and LoRA

This extra work during initialization pays off with better training results!
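
A usage sketch follows. Only the LoftQAdapter arguments come from this page; the DenseLayer type and its constructor shape are assumptions, so substitute whatever dense layer your model actually uses.

using AiDotNet.LoRA.Adapters;

// Hypothetical 4096 -> 4096 dense layer; the constructor shape is assumed.
ILayer<float> baseLayer = new DenseLayer<float>(4096, 4096);

// The alternating LoftQ initialization runs inside this constructor, so
// construction is slower than QLoRA's equivalent, but it happens only once.
var adapter = new LoftQAdapter<float>(
    baseLayer,
    rank: 8,
    alpha: 16.0,                      // defaults to rank if left negative
    numAlternatingIterations: 5,      // the remarks above suggest 3-5
    quantizationType: LoftQAdapter<float>.QuantizationType.NF4,
    useDoubleQuantization: true,
    quantizationBlockSize: 64,
    freezeBaseLayer: true);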

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when the base layer doesn't have 1D input/output shapes or when parameters are invalid.

Properties

AlternatingIterations

Gets the number of alternating optimization iterations used during initialization.

public int AlternatingIterations { get; }

Property Value

int

BlockSize

Gets the quantization block size.

public int BlockSize { get; }

Property Value

int

Quantization

Gets the quantization type used for base layer weights.

public LoftQAdapter<T>.QuantizationType Quantization { get; }

Property Value

LoftQAdapter<T>.QuantizationType

UsesDoubleQuantization

Gets whether double quantization is enabled.

public bool UsesDoubleQuantization { get; }

Property Value

bool

Methods

Backward(Tensor<T>)

Performs the backward pass (only updates LoRA if base is frozen).

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient from next layer.

Returns

Tensor<T>

Gradient for previous layer.

Remarks

For Beginners: Training works exactly like QLoRA:

  • Only LoRA parameters are updated (if base is frozen)
  • Gradients flow through both paths
  • Memory efficient because base stays frozen

The benefit of LoftQ appears in faster convergence and better final accuracy, not in the training process itself.
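
Continuing the constructor example above, one training step against the documented members looks roughly like this; ComputeLossGradient is a hypothetical placeholder for your loss machinery, and the optimizer update is omitted because it is not documented on this page.

// Forward pass through the quantized base plus the LoRA adapter.
Tensor<float> output = adapter.Forward(input);

// Hypothetical placeholder: gradient of your loss w.r.t. the adapter's output.
Tensor<float> outputGradient = ComputeLossGradient(output, target);

// Accumulates gradients for the LoRA parameters only (the base is frozen)
// and returns the gradient to propagate to the previous layer.
Tensor<float> inputGradient = adapter.Backward(outputGradient);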

Forward(Tensor<T>)

Performs the forward pass through quantized base layer and LoRA.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Combined output from quantized base and LoRA layers.

Remarks

Forward pass:

  1. Dequantize base weights (cached)
  2. Compute base output with dequantized weights
  3. Compute LoRA output
  4. Return sum

For Beginners: This works exactly like QLoRA's forward pass:

  • Decompress the base weights
  • Run input through the decompressed base
  • Run input through the LoRA adapter
  • Add the results together

The difference from QLoRA is invisible here - it's all in the initialization! LoftQ's better LoRA parameters lead to better combined results.
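
A sketch of those four steps, with MathNet matrices standing in for AiDotNet tensors; the dequantize delegate is a placeholder for the cached 4-bit decode, and the alpha/rank scaling is the usual LoRA convention rather than anything documented here.

using System;
using MathNet.Numerics.LinearAlgebra;

static Matrix<double> ForwardSketch(
    Matrix<double> input,          // [batch x inFeatures]
    Matrix<double> packedBase,     // stand-in for the packed 4-bit base weights
    Matrix<double> loraA,          // [inFeatures x rank]
    Matrix<double> loraB,          // [rank x outFeatures]
    double alpha, int rank,
    Func<Matrix<double>, Matrix<double>> dequantize)
{
    var wBase = dequantize(packedBase);         // 1. decompress (cached in practice)
    var baseOut = input * wBase;                // 2. base path
    var loraOut = input * loraA * loraB;        // 3. low-rank path
    return baseOut + loraOut * (alpha / rank);  // 4. sum, with LoRA scaling
}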

MergeToOriginalLayer()

Merges LoRA adaptation into base layer and returns merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

New DenseLayer with merged and optionally quantized weights.

Remarks

Merging process:

  1. Dequantize base weights
  2. Get LoRA weight contribution
  3. Merge: W_merged = W_base + W_lora
  4. Create new layer with merged weights

For Beginners: After training, you can "bake in" the LoRA improvements:

  • Decompress the base weights
  • Add the LoRA corrections
  • Create a single layer with all improvements
  • Optionally compress again for deployment

This gives you a single efficient layer with all the benefits of LoftQ training!
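
The same merge formula in sketch form (MathNet matrices again; illustrative, not the library's code):

using System;
using MathNet.Numerics.LinearAlgebra;

// W_merged = W_base + W_lora, with the usual alpha/rank LoRA scaling.
static Matrix<double> Merge(
    Matrix<double> packedBase, Matrix<double> loraA, Matrix<double> loraB,
    double alpha, int rank,
    Func<Matrix<double>, Matrix<double>> dequantize)
{
    var wBase = dequantize(packedBase);          // 1. dequantize base weights
    var wLora = loraA * loraB * (alpha / rank);  // 2-3. LoRA contribution, merged below
    return wBase + wLora;                        // 4. weights for the new merged layer
}

Re-quantizing the merged matrix for deployment is the optional final step mentioned above.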

ResetState()

Resets the internal state of the adapter.

public override void ResetState()

Remarks

For Beginners: Clears cached data and resets both layers. Useful when starting a new batch or task.

UpdateParametersFromLayers()

Updates the parameter vector from both layers.

protected override void UpdateParametersFromLayers()