# LLM Fine-tuning Tutorial
{: .no_toc }

Fine-tune large language models efficiently with LoRA and QLoRA.
{: .fs-6 .fw-300 }

## Table of contents
{: .no_toc .text-delta }

- TOC
{:toc}

## Overview
AiDotNet provides 37+ LoRA adapter variants for parameter-efficient fine-tuning:
- LoRA: Low-Rank Adaptation
- QLoRA: 4-bit Quantized LoRA
- DoRA: Weight-Decomposed LoRA
- AdaLoRA: Adaptive LoRA
- And many more!
## Why LoRA?

| Method | GPU memory (7B model) | Trainable parameters |
|---|---|---|
| Full fine-tune | 28+ GB | 100% |
| LoRA (r=8) | 8-12 GB | ~0.1% |
| QLoRA (4-bit) | 4-6 GB | ~0.1% |
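LoRA freezes the pretrained weight matrices and trains only a pair of small low-rank matrices per adapted layer, so gradients and optimizer state exist for a tiny fraction of the network. As a back-of-the-envelope check of the "~0.1%" column, the sketch below assumes Llama-2-7B shapes (hidden size 4096, 32 decoder layers) and LoRA on the query and value projections only; the exact count depends on the model.

```csharp
// Rough arithmetic only; the exact count depends on the model's layer shapes.
const int hidden = 4096;               // assumed hidden size (Llama-2-7B)
const int layers = 32;                 // assumed number of decoder layers
const int rank = 8;
const int adaptedModulesPerLayer = 2;  // q_proj and v_proj

// Each adapted hidden x hidden projection gains A (rank x hidden) and B (hidden x rank).
long perModule = 2L * rank * hidden;                            // 65,536
long trainable = perModule * adaptedModulesPerLayer * layers;   // ~4.2 million

Console.WriteLine($"{trainable:N0} trainable vs ~6.7B frozen " +
                  $"(~{100.0 * trainable / 6_700_000_000:F2}%)");
```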
## Basic LoRA

```csharp
using AiDotNet.LoRA;
using AiDotNet.ModelLoading;

// Load base model
var model = await HuggingFaceHub.LoadModelAsync<float>("microsoft/phi-2");

// Configure LoRA
var loraConfig = new LoRAConfig<float>
{
    Rank = 8,                              // Low-rank dimension
    Alpha = 16,                            // Scaling factor
    TargetModules = ["q_proj", "v_proj"],  // Layers to adapt
    Dropout = 0.05f
};

// Apply LoRA adapters
var loraModel = model.ApplyLoRA(loraConfig);

Console.WriteLine($"Trainable parameters: {loraModel.GetTrainableParameterCount():N0}");
// ~0.1% of the original parameter count
```
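Conceptually, each adapted linear layer keeps its frozen weight `W` and adds a scaled low-rank update, computing `W·x + (Alpha / Rank) · B·(A·x)`, where only `A` and `B` receive gradients. The sketch below is purely illustrative of that computation; the class is hypothetical and not part of AiDotNet's API.

```csharp
// Illustrative only: the computation a LoRA-adapted linear layer performs.
// This class is hypothetical and is not an AiDotNet type.
public sealed class LoraLinearSketch
{
    private readonly float[,] _w;   // frozen pretrained weight (dOut x dIn)
    private readonly float[,] _a;   // trainable A (rank x dIn), small random init
    private readonly float[,] _b;   // trainable B (dOut x rank), initialized to zero
    private readonly float _scale;  // Alpha / Rank

    public LoraLinearSketch(float[,] w, float[,] a, float[,] b, float alpha, int rank)
    {
        _w = w; _a = a; _b = b;
        _scale = alpha / rank;
    }

    public float[] Forward(float[] x)
    {
        var baseOut = MatVec(_w, x);              // frozen path: W x
        var loraOut = MatVec(_b, MatVec(_a, x));  // low-rank path: B (A x)
        for (var i = 0; i < baseOut.Length; i++)
            baseOut[i] += _scale * loraOut[i];    // W x + (Alpha / Rank) * B A x
        return baseOut;
    }

    private static float[] MatVec(float[,] m, float[] v)
    {
        var result = new float[m.GetLength(0)];
        for (var i = 0; i < m.GetLength(0); i++)
            for (var j = 0; j < v.Length; j++)
                result[i] += m[i, j] * v[j];
        return result;
    }
}
```

Because `B` is initialized to zero, attaching the adapters leaves the model's outputs unchanged until training begins.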
## Training with LoRA

```csharp
// Prepare training data
var trainingData = new[]
{
    new TrainingSample { Input = "Translate to French: Hello", Output = "Bonjour" },
    new TrainingSample { Input = "Translate to French: Goodbye", Output = "Au revoir" }
};

// Configure training
var trainingConfig = new LoRATrainingConfig<float>
{
    LearningRate = 1e-4f,
    BatchSize = 4,
    GradientAccumulationSteps = 4,
    Epochs = 3,
    WarmupSteps = 100,
    WeightDecay = 0.01f
};

// Train
await loraModel.TrainAsync(trainingData, trainingConfig);
```
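With `BatchSize = 4` and `GradientAccumulationSteps = 4`, gradients from four micro-batches are accumulated before each optimizer step, giving an effective batch size of 16. When GPU memory is tight, raise the accumulation steps rather than the batch size; the result is nearly equivalent but fits in less memory.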
## QLoRA (4-bit Quantized)

Train even larger models with minimal memory:

```csharp
using AiDotNet.LoRA;
using AiDotNet.Quantization;

// Configure 4-bit quantization
var quantConfig = new QuantizationConfig<float>
{
    QuantizationType = QuantizationType.NF4,  // 4-bit NormalFloat
    ComputeType = DataType.BFloat16,
    DoubleQuantization = true                 // Quantize the quantization constants too
};

// Load with quantization
var quantizedModel = await HuggingFaceHub.LoadModelAsync<float>(
    "meta-llama/Llama-2-7b-hf",
    quantConfig);

// Apply LoRA on top of the quantized model
var qloraConfig = new QLoRAConfig<float>
{
    Rank = 8,
    Alpha = 16,
    TargetModules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    Dropout = 0.05f
};

var qloraModel = quantizedModel.ApplyQLoRA(qloraConfig);

// Train (uses far less memory)
await qloraModel.TrainAsync(trainingData, trainingConfig);
```
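As a rough estimate, the 4-bit NF4 weights of a 7B-parameter model occupy about 7B × 0.5 bytes ≈ 3.5 GB, and double quantization saves a few hundred more megabytes by compressing the per-block scaling constants; activations, the BFloat16 compute buffers, and the LoRA parameters account for the rest of the 4-6 GB figure in the table above. Only the LoRA matrices carry gradients and optimizer state, so that overhead stays small (tens to a few hundred megabytes at ranks like 8).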
## Saving and Loading Adapters

```csharp
// Save only the LoRA adapters (a small file, not the full model)
await loraModel.SaveAdaptersAsync("my-lora-adapters");

// Load adapters onto the base model
var baseModel = await HuggingFaceHub.LoadModelAsync<float>("microsoft/phi-2");
var loadedLoraModel = baseModel.LoadLoRAAdapters("my-lora-adapters");

// Merge adapters into the base model (for deployment)
var mergedModel = loraModel.MergeAndUnload();
await mergedModel.SaveAsync("merged-model");
```
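Merging folds each adapter back into its frozen weight, `W_merged = W + (Alpha / Rank) · B · A`, so the merged model has the original architecture and no extra inference latency, at the cost of losing the ability to swap adapters afterwards. The adapters-only file, by contrast, stores just the A and B matrices and is typically megabytes rather than gigabytes, which makes it cheap to version and distribute.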
## LoRA Variants

### DoRA (Weight-Decomposed LoRA)

```csharp
var doraConfig = new DoRAConfig<float>
{
    Rank = 8,
    Alpha = 16,
    TargetModules = ["q_proj", "v_proj"],
    DecomposeMagnitude = true
};

var doraModel = model.ApplyDoRA(doraConfig);
```
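DoRA decomposes each pretrained weight into a magnitude component and a direction component, applies the low-rank update to the direction, and, with `DecomposeMagnitude = true`, trains the magnitude as a separate vector; the DoRA paper reports accuracy closer to full fine-tuning than plain LoRA at a similar parameter budget.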
### AdaLoRA (Adaptive Rank)

```csharp
var adaLoraConfig = new AdaLoRAConfig<float>
{
    InitialRank = 12,
    TargetRank = 8,
    Alpha = 16,
    BetaStart = 0.85f,
    BetaEnd = 0.95f
};

var adaLoraModel = model.ApplyAdaLoRA(adaLoraConfig);
```
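AdaLoRA spends its rank budget adaptively: training starts at `InitialRank`, importance scores are tracked for the adapter's singular directions, and the least important directions are pruned during training until the budget implied by `TargetRank` is reached. The `BetaStart`/`BetaEnd` values are assumed here to be the smoothing coefficients for those importance estimates; consult the `AdaLoRAConfig` reference for their exact meaning.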
### VeRA (Vector-based Random Matrix Adaptation)

```csharp
var veraConfig = new VeRAConfig<float>
{
    Rank = 256,   // Higher rank is affordable because the matrices are shared and frozen
    Alpha = 16,
    SharedAcrossLayers = true
};

var veraModel = model.ApplyVeRA(veraConfig);
```
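VeRA freezes a single pair of randomly initialized low-rank matrices shared by every adapted layer (`SharedAcrossLayers = true`) and trains only small per-layer scaling vectors. Because the shared matrices are never updated, a much higher rank such as 256 adds almost no trainable parameters compared with giving every layer its own LoRA matrices.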
## Multiple LoRA Adapters

Switch between different adapters at inference time:

```csharp
// Load the base model
var model = await HuggingFaceHub.LoadModelAsync<float>("microsoft/phi-2");

// Load multiple named adapters
model.LoadLoRAAdapters("translation-adapter", "translation");
model.LoadLoRAAdapters("summarization-adapter", "summarization");

// Use a specific adapter
model.SetActiveAdapter("translation");
var translation = model.Generate("Translate to French: Hello");

model.SetActiveAdapter("summarization");
var summary = model.Generate("Summarize: ...");
```
## Dataset Preparation

### Instruction Fine-tuning Format

```csharp
var dataset = new[]
{
    new InstructionSample
    {
        Instruction = "Summarize the following text.",
        Input = "AiDotNet is a machine learning framework...",
        Output = "AiDotNet is a .NET ML framework."
    }
};
```
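During training, each `InstructionSample` has to be rendered into a single token sequence. The Alpaca-style template below is one common convention, shown for illustration only; the helper is hypothetical rather than an AiDotNet API, so confirm the format your training pipeline actually expects.

```csharp
// Hypothetical helper: one common (Alpaca-style) way to flatten an
// instruction sample into a single training prompt.
static string RenderInstructionSample(string instruction, string input, string output) =>
    $"""
    ### Instruction:
    {instruction}

    ### Input:
    {input}

    ### Response:
    {output}
    """;
```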
### Chat Format

```csharp
var chatDataset = new[]
{
    new ChatSample
    {
        Messages = new[]
        {
            new Message { Role = "user", Content = "What is AI?" },
            new Message { Role = "assistant", Content = "AI is..." }
        }
    }
};
```
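Chat samples are normally flattened with the base model's own chat template, since role markers and turn separators differ between model families; reuse the template the model was instruction-tuned with rather than inventing a new one, or the fine-tuned model may not respond as expected at inference time.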
## Best Practices

### Rank Selection

| Task | Recommended rank |
|---|---|
| Simple tasks | 4-8 |
| Complex tasks | 16-32 |
| Multi-task | 32-64 |
### Target Modules

```csharp
// Query/value projections only (cheapest, often sufficient)
TargetModules = ["q_proj", "v_proj"]

// All attention projections
TargetModules = ["q_proj", "k_proj", "v_proj", "o_proj"]

// Attention + MLP (most expressive, most trainable parameters)
TargetModules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
```
### Training Tips

- Start small: try rank 4-8 first and increase only if quality plateaus
- Learning rate: around 1e-4 is a common starting point; LoRA typically tolerates higher learning rates than full fine-tuning
- Batch size: use gradient accumulation when GPU memory is limited
- Regularization: apply dropout of 0.05-0.1 to the adapter layers
- Early stopping: monitor validation loss and stop when it stops improving
## Memory Comparison

Approximate GPU memory required for training:

| Model | Full fine-tune | LoRA | QLoRA |
|---|---|---|---|
| 7B | 28+ GB | 10 GB | 5 GB |
| 13B | 52+ GB | 18 GB | 8 GB |
| 70B | OOM on a single GPU | 90 GB | 24 GB |