Class LoftQAdapter<T>
LoftQ (LoRA-Fine-Tuning-Aware Quantization) adapter that combines quantization and LoRA with improved initialization.
public class LoftQAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → LoftQAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>
Remarks
LoftQ improves upon QLoRA by using an alternating optimization strategy during initialization to find better LoRA adapter parameters for quantized models. Instead of simply quantizing a pre-trained model and adding LoRA on top, LoftQ alternates between:
1. Optimizing the quantization of the base weights
2. Optimizing the LoRA adapter matrices to compensate for quantization error
Key Features:
- Alternating optimization between quantization and LoRA initialization
- Better initialization than naive quantization + LoRA
- Supports both 4-bit INT4 and NF4 quantization
- Reduces the gap between quantized and full-precision fine-tuning
- Compatible with all QLoRA features (double quantization, block-wise quantization)
How LoftQ Differs from QLoRA:
QLoRA:
1. Quantize pre-trained weights
2. Initialize LoRA randomly
3. Fine-tune LoRA only
LoftQ:
1. Start with pre-trained weights
2. Alternate K times:
   a. Fix LoRA, optimize quantization
   b. Fix quantization, optimize LoRA (via SVD to minimize error)
3. Fine-tune LoRA only
This alternating initialization creates better starting LoRA parameters that compensate for quantization error from the beginning, leading to better final performance.
Alternating Optimization Process: For K iterations (typically 3-5):
- Quantization step: Quantize W to get Q, keeping A and B fixed
- LoRA step: Update A and B to minimize ||W - (Q + AB)||, keeping Q fixed
This ensures the LoRA adapter specifically compensates for quantization error, rather than learning generic adaptations.
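The loop below is a minimal, self-contained sketch of this alternating process, not the adapter's internal code: a round-to-nearest uniform quantizer stands in for NF4/INT4, and rank-1 power iteration stands in for the truncated SVD (so the LoRA rank is 1 here). All names are illustrative.

using System;

class LoftQAlternatingSketch
{
    static void Main()
    {
        var rng = new Random(42);
        int n = 8;
        var W = new double[n, n];                 // "pre-trained" weights
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                W[i, j] = rng.NextDouble() - 0.5;

        var Q = new double[n, n];                 // quantized base
        var E = new double[n, n];                 // quantization residual
        var a = new double[n];                    // rank-1 stand-in for B (column)
        var b = new double[n];                    // rank-1 stand-in for A (row)

        for (int iter = 0; iter < 5; iter++)      // K alternating iterations
        {
            // Quantization step: fix the LoRA factors, quantize what they don't cover.
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    Q[i, j] = Quantize(W[i, j] - a[i] * b[j], step: 0.25);

            // LoRA step: fix Q. The best rank-1 a*b^T for ||W - (Q + a b^T)|| is
            // the dominant singular pair of the residual E = W - Q.
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    E[i, j] = W[i, j] - Q[i, j];

            var u = new double[n];
            var v = new double[n];
            v[0] = 1.0;
            double sigma = 0.0;
            for (int pi = 0; pi < 50; pi++)       // power iteration on E
            {
                MatVec(E, v, u, transpose: false);  // u = E v
                Normalize(u);
                MatVec(E, u, v, transpose: true);   // v = E^T u
                sigma = Normalize(v);
            }
            for (int i = 0; i < n; i++) a[i] = sigma * u[i];
            Array.Copy(v, b, n);

            Console.WriteLine($"iter {iter}: residual norm = {Error(W, Q, a, b):F4}");
        }
    }

    // Stand-in quantizer: round to the nearest point on a uniform grid.
    static double Quantize(double w, double step) => Math.Round(w / step) * step;

    static void MatVec(double[,] m, double[] x, double[] y, bool transpose)
    {
        int n = x.Length;
        for (int i = 0; i < n; i++)
        {
            double s = 0;
            for (int j = 0; j < n; j++)
                s += transpose ? m[j, i] * x[j] : m[i, j] * x[j];
            y[i] = s;
        }
    }

    static double Normalize(double[] x)
    {
        double norm = 0;
        foreach (var t in x) norm += t * t;
        norm = Math.Sqrt(norm);
        if (norm > 0) for (int i = 0; i < x.Length; i++) x[i] /= norm;
        return norm;
    }

    static double Error(double[,] W, double[,] Q, double[] a, double[] b)
    {
        double s = 0;
        int n = a.Length;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
            {
                double d = W[i, j] - (Q[i, j] + a[i] * b[j]);
                s += d * d;
            }
        return Math.Sqrt(s);
    }
}

Each iteration re-quantizes the part of W the factors don't cover and then refits the factors to the new residual, so the printed error norm shrinks in practice.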
Memory Efficiency: Same as QLoRA - base weights in 4-bit, LoRA in full precision:
- 75% memory reduction on base weights (4-bit versus 16-bit storage)
- Only LoRA parameters trainable (typically 0.1-1% of model size)
- Additional one-time cost during initialization for alternating optimization
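As a back-of-envelope illustration of these numbers, the snippet below compares a hypothetical 4096x4096 layer stored in 16 bits against 4-bit storage plus a rank-8 LoRA in 32-bit floats (the layer size and rank are illustrative):

using System;

const int inDim = 4096, outDim = 4096, rank = 8;
double fp16Base = inDim * outDim * 2.0;           // 16-bit base weights: 32 MiB
double int4Base = inDim * outDim * 0.5;           // 4-bit base weights: 8 MiB
double loraFp32 = (inDim + outDim) * rank * 4.0;  // A and B in fp32: 256 KiB
Console.WriteLine($"base reduction: {1 - int4Base / fp16Base:P0}");  // 75%
Console.WriteLine($"LoRA overhead:  {loraFp32 / fp16Base:P2}");      // under 1% of the fp16 base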
For Beginners: LoftQ is an improved version of QLoRA that starts with better settings.
Think of it like this:
- QLoRA: Compress your model, then add random corrections, then train
- LoftQ: Compress your model, figure out what corrections are needed upfront, then train
The key insight: If we're going to compress the weights anyway, let's make sure our correction layer (LoRA) is specifically designed to fix compression errors!
The process:
1. Start with your pre-trained model
2. Repeatedly:
   - Try different compressions
   - Adjust LoRA to compensate for compression error
3. Pick the best combination
4. Now train LoRA (which already knows how to fix compression issues)
Benefits:
- Better starting point for training
- Converges faster during fine-tuning
- Better final accuracy than QLoRA with same memory usage
- Still only trains LoRA (same efficiency as QLoRA)
Trade-offs:
- Longer initialization time (worth it for better results)
- Same runtime memory and speed as QLoRA
- More complex implementation
Research Background: LoftQ was introduced in "LoftQ: LoRA-Fine-Tuning-Aware Quantization" (Li et al., 2023). It addresses a key limitation of QLoRA: random LoRA initialization doesn't account for the specific quantization errors introduced. By using alternating optimization, LoftQ creates LoRA parameters that are "aware" of the quantization, leading to better downstream fine-tuning performance with no additional runtime cost.
When to Use LoftQ vs QLoRA:
- Use LoftQ when: Training accuracy is critical and you are willing to spend extra time on initialization
- Use QLoRA when: Fast experimentation is needed and initialization time is critical
- Both have identical runtime memory and speed characteristics
Constructors
LoftQAdapter(ILayer<T>, int, double, int, QuantizationType, bool, int, bool)
Initializes a new LoftQ adapter with alternating optimization for improved initialization.
public LoftQAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, int numAlternatingIterations = 5, LoftQAdapter<T>.QuantizationType quantizationType = QuantizationType.NF4, bool useDoubleQuantization = true, int quantizationBlockSize = 64, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The Dense or FullyConnected layer to adapt with LoftQ.
rank (int): The rank of the LoRA decomposition.
alpha (double): The LoRA scaling factor (defaults to rank if negative).
numAlternatingIterations (int): Number of alternating optimization iterations for initialization (default: 5).
quantizationType (LoftQAdapter<T>.QuantizationType): The type of 4-bit quantization to use (default: NF4).
useDoubleQuantization (bool): Whether to use double quantization for constants (default: true).
quantizationBlockSize (int): The block size for quantization (default: 64).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training (default: true).
Remarks
This constructor performs LoftQ initialization using alternating optimization:
1. Extracts base layer weights
2. For K iterations:
   a. Quantize current weights
   b. Compute quantization error
   c. Update LoRA to minimize error (via SVD)
   d. Update weights = quantized + LoRA
3. Store final quantized weights and LoRA parameters
For Beginners: Creating a LoftQ adapter takes longer than QLoRA because we're doing smart initialization. Here's what happens:
Parameters:
- baseLayer: Your existing layer to compress and adapt
- rank: LoRA adapter size (lower = more efficient)
- alpha: LoRA strength
- numAlternatingIterations: How many times to optimize initialization (3-5 is good)
- quantizationType: NF4 recommended for best results
- Other parameters: Same as QLoRA
Initialization process (this happens once):
1. Look at your original weights
2. Try compressing them
3. See what errors compression creates
4. Adjust LoRA to fix those errors
5. Repeat steps 2-4 several times to find the best combination
6. Save the optimized compression and LoRA
This extra work during initialization pays off with better training results!
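For example, a construction call might look like this (the DenseLayer<float> constructor arguments are assumptions for illustration; the LoftQAdapter arguments follow the signature above):

// Hypothetical base layer; its constructor arguments are assumed.
var baseLayer = new DenseLayer<float>(inputSize: 4096, outputSize: 4096);

// The alternating optimization runs once, inside this constructor call.
var adapter = new LoftQAdapter<float>(
    baseLayer,
    rank: 8,                      // LoRA rank
    alpha: 16.0,                  // pass a negative value to default to rank
    numAlternatingIterations: 5,  // K alternating iterations
    quantizationType: LoftQAdapter<float>.QuantizationType.NF4,
    useDoubleQuantization: true,
    quantizationBlockSize: 64,
    freezeBaseLayer: true);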
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when the base layer doesn't have 1D input/output shapes or when parameters are invalid.
Properties
AlternatingIterations
Gets the number of alternating optimization iterations used during initialization.
public int AlternatingIterations { get; }
Property Value
- int
BlockSize
Gets the quantization block size.
public int BlockSize { get; }
Property Value
- int
Quantization
Gets the quantization type used for base layer weights.
public LoftQAdapter<T>.QuantizationType Quantization { get; }
Property Value
- LoftQAdapter<T>.QuantizationType
UsesDoubleQuantization
Gets whether double quantization is enabled.
public bool UsesDoubleQuantization { get; }
Property Value
- bool
Methods
Backward(Tensor<T>)
Performs the backward pass (only updates LoRA if base is frozen).
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradientTensor<T>Gradient from next layer.
Returns
- Tensor<T>
Gradient for previous layer.
Remarks
For Beginners: Training works exactly like QLoRA:
- Only LoRA parameters are updated (if base is frozen)
- Gradients flow through both paths
- Memory efficient because base stays frozen
The benefit of LoftQ appears in faster convergence and better final accuracy, not in the training process itself.
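A sketch of one training step (ComputeLossGradient, inputBatch, and targets are hypothetical placeholders; only Forward and Backward come from this API):

// One training step with a LoftQAdapter<float> named 'adapter'.
Tensor<float> output = adapter.Forward(inputBatch);                // quantized base + LoRA
Tensor<float> lossGradient = ComputeLossGradient(output, targets); // hypothetical helper
Tensor<float> inputGradient = adapter.Backward(lossGradient);
// With freezeBaseLayer = true, only the LoRA parameters receive gradient updates.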
Forward(Tensor<T>)
Performs the forward pass through quantized base layer and LoRA.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
inputTensor<T>Input tensor.
Returns
- Tensor<T>
Combined output from quantized base and LoRA layers.
Remarks
Forward pass:
1. Dequantize base weights (cached)
2. Compute base output with dequantized weights
3. Compute LoRA output
4. Return sum
For Beginners: This works exactly like QLoRA's forward pass:
- Decompress the base weights
- Run input through decompressed base
- Run input through LoRA adapter
- Add results together
The difference from QLoRA is invisible here - it's all in the initialization! LoftQ's better LoRA parameters lead to better combined results.
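Conceptually, the combined output is y = W_dequant * x + (alpha / rank) * B(A(x)). Here is a tiny standalone illustration of that two-path sum with toy numbers (rank 1; this shows the math, not the library's internals):

using System;

// Toy dimensions: 3 inputs, 2 outputs, LoRA rank 1.
double[,] Wdeq = { { 0.25, -0.5, 0.0 }, { 0.5, 0.25, -0.25 } }; // dequantized base weights
double[] A = { 0.1, -0.2, 0.3 };   // LoRA A (rank x in)
double[] B = { 0.4, 0.6 };         // LoRA B (out x rank)
double scale = 16.0 / 1.0;         // alpha / rank
double[] x = { 1.0, 2.0, 3.0 };

double ax = 0;                     // LoRA path: A * x (a scalar at rank 1)
for (int j = 0; j < 3; j++) ax += A[j] * x[j];

for (int i = 0; i < 2; i++)
{
    double baseOut = 0;            // base path: W_dequant * x
    for (int j = 0; j < 3; j++) baseOut += Wdeq[i, j] * x[j];
    Console.WriteLine($"y[{i}] = {baseOut + scale * B[i] * ax}"); // sum of both paths
}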
MergeToOriginalLayer()
Merges LoRA adaptation into base layer and returns merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
New DenseLayer with merged and optionally quantized weights.
Remarks
Merging process:
1. Dequantize base weights
2. Get LoRA weight contribution
3. Merge: W_merged = W_base + W_lora
4. Create new layer with merged weights
For Beginners: After training, you can "bake in" the LoRA improvements:
- Decompress the base weights
- Add the LoRA corrections
- Create a single layer with all improvements
- Optionally compress again for deployment
This gives you a single efficient layer with all the benefits of LoftQ training!
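For example (assuming 'adapter' is a trained LoftQAdapter<float>):

// After fine-tuning, fold the LoRA correction into one standalone layer:
// W_merged = W_base + W_lora.
ILayer<float> merged = adapter.MergeToOriginalLayer();
// 'merged' is a new DenseLayer and no longer needs the adapter at inference time.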
ResetState()
Resets the internal state of the adapter.
public override void ResetState()
Remarks
For Beginners: Clears cached data and resets both layers. Useful when starting a new batch or task.
UpdateParametersFromLayers()
Updates the parameter vector from both layers.
protected override void UpdateParametersFromLayers()