Class DefaultLoRAConfiguration<T>
Default LoRA configuration that applies LoRA to all layers with trainable weight matrices.
public class DefaultLoRAConfiguration<T> : ILoRAConfiguration<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
object → DefaultLoRAConfiguration<T>
- Implements
ILoRAConfiguration<T>
Remarks
This configuration implements an intelligent strategy: wrap all layers that have trainable weight matrices with StandardLoRAAdapter, and leave utility layers (activation, pooling, etc.) unchanged. This maximizes the benefits of LoRA across all applicable layer types.
Supported Layer Types (30+ layer types):
- Dense/Linear layers (Dense, FullyConnected, FeedForward)
- Convolutional layers (all Conv variants including depthwise, separable, dilated, etc.)
- Recurrent layers (LSTM, GRU, ConvLSTM, Bidirectional)
- Attention layers (Attention, MultiHeadAttention, SelfAttention)
- Transformer layers (Encoder, Decoder)
- Embedding layers (Embedding, PatchEmbedding)
- Specialized layers (Highway, GatedLinearUnit, SqueezeAndExcitation, Capsule, CRF, etc.)
Available LoRA Variants: AiDotNet includes 32 cutting-edge LoRA variants for different use cases:
- StandardLoRAAdapter: Generic LoRA for all layer types
- QLoRAAdapter: 4-bit quantization for 75% memory reduction
- DoRAAdapter: Weight decomposition (+3.7% on LLaMA-7B)
- AdaLoRAAdapter: Adaptive rank allocation
- VeRAAdapter: Shared matrices (10x fewer parameters)
- LoRAPlusAdapter: Dual learning rates (2x faster convergence)
- LoHaAdapter: Hadamard products for CNNs
- LoKrAdapter: Kronecker products (57x compression)
- DyLoRAAdapter: Dynamic rank training
- RoSAAdapter: Robust to distribution shifts
- DVoRAAdapter: DoRA+VeRA hybrid
- LoRAFAAdapter: Frozen A matrix (50% reduction)
- DeltaLoRAAdapter: Delta-based updates with momentum
- LoRADropAdapter: Dropout regularization
- PiSSAAdapter: SVD initialization (NeurIPS 2024)
- GLoRAAdapter: Weight + activation adaptation
- LongLoRAAdapter: Context length extension
- MultiLoRAAdapter: Multi-task learning with routing
- XLoRAAdapter: Mixture of experts
- TiedLoRAAdapter: Weight tying (90% reduction)
- ReLoRAAdapter: Restart mechanism prevents forgetting
- LoftQAdapter: Alternating quantization+LoRA
- QALoRAAdapter: Quantization-aware training
- VBLoRAAdapter: Vector banks (2024)
- SLoRAAdapter: Scalable serving (1000+ adapters)
- MoRAAdapter: High-rank updates for knowledge tasks
- LoRAXSAdapter: Extreme efficiency (100x compression)
- FloraAdapter: Gradient compression view
- ChainLoRAAdapter: Sequential task chaining
- HRAAdapter: Hybrid low-rank + sparse
- LoRETTAAdapter: Tensor-train decomposition
- NOLAAdapter: Random basis (20x compression)
To use a specific variant, pass an adapter instance via the loraAdapter parameter of the constructor (see the example configurations under the constructor remarks below).
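For example, a sketch of that pattern end to end (the model variable is assumed to expose a Layers collection, as in the usage example further below; the adapter arguments mirror the constructor examples):
// Wrap every eligible layer with QLoRA instead of the standard adapter
var qloraAdapter = new QLoRAAdapter<double>(null, 8, 8, true);
var qloraConfig = new DefaultLoRAConfiguration<double>(rank: 8, alpha: 8, loraAdapter: qloraAdapter);
var qloraLayers = model.Layers.Select(layer => qloraConfig.ApplyLoRA(layer)).ToList();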
For Beginners: This is a ready-to-use LoRA configuration for most common scenarios.
When you apply this configuration to a model:
- All layers with trainable weight matrices (Dense, FullyConnected, Convolutional, Recurrent, Attention, Transformer, etc.) get wrapped with LoRA adapters
- Utility layers (activation, pooling, dropout, etc.) pass through unchanged
This is perfect for:
- Fine-tuning pre-trained models on new tasks
- Adapting large language models with limited resources
- Training multiple task-specific adapters for the same base model
Example usage:
// Create a configuration with rank=8, alpha=8, and frozen base layers
var loraConfig = new DefaultLoRAConfiguration<double>(rank: 8, alpha: 8, freezeBaseLayer: true);
// Apply to all layers in your model
var adaptedLayers = model.Layers.Select(layer => loraConfig.ApplyLoRA(layer)).ToList();
The configuration respects these parameters:
- Rank: Controls compression (lower rank = fewer parameters)
- Alpha: Controls adaptation strength (typically same as rank)
- FreezeBaseLayer: Whether to freeze original weights (true for efficiency)
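As a rough illustration of how rank trades parameters for flexibility (back-of-the-envelope arithmetic only, not part of the library API):
// For one Dense layer of shape 1024x1024, full fine-tuning vs. LoRA at rank 8
int inputSize = 1024, outputSize = 1024, rank = 8;
int fullParams = inputSize * outputSize;          // 1,048,576 trainable weights
int loraParams = rank * (inputSize + outputSize); // 16,384 trainable weights (~1.6%)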
Constructors
DefaultLoRAConfiguration(int, double, bool, ILoRAAdapter<T>?)
Initializes a new DefaultLoRAConfiguration with the specified parameters.
public DefaultLoRAConfiguration(int rank, double alpha = -1, bool freezeBaseLayer = true, ILoRAAdapter<T>? loraAdapter = null)
Parameters
rank (int): The rank of the low-rank decomposition (must be positive).
alpha (double): The scaling factor for LoRA contributions (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze base layers during training (default: true).
loraAdapter (ILoRAAdapter<T>?): Optional LoRA adapter to use. Defaults to StandardLoRAAdapter if null.
Remarks
For Beginners: This creates a configuration that will be applied to your model's layers.
Parameters explained:
- rank: How many "compression channels" to use (8 is a good starting point)
- alpha: How strong the LoRA effect is (use -1 to auto-set to rank value)
- freezeBaseLayer: Whether to lock original weights (true = more efficient, recommended)
Example configurations:
// Standard LoRA (default)
var standard = new DefaultLoRAConfiguration<double>(rank: 8, alpha: 8);
// QLoRA for 4-bit quantization (75% memory reduction)
var qloraAdapter = new QLoRAAdapter<double>(null, 8, 8, true);
var qlora = new DefaultLoRAConfiguration<double>(rank: 8, alpha: 8, loraAdapter: qloraAdapter);
// DoRA for improved weight decomposition (+3.7% accuracy on LLaMA-7B)
var doraAdapter = new DoRAAdapter<double>(null, 8, 8, true);
var dora = new DefaultLoRAConfiguration<double>(rank: 8, alpha: 8, loraAdapter: doraAdapter);
// VeRA for extreme parameter efficiency (10x fewer parameters)
var veraAdapter = new VeRAAdapter<double>(null, 8, 8, true);
var vera = new DefaultLoRAConfiguration<double>(rank: 8, alpha: 8, loraAdapter: veraAdapter);
Exceptions
- ArgumentException
Thrown when rank is not positive.
Properties
Alpha
Gets the scaling factor (alpha) for LoRA adaptations.
public double Alpha { get; }
Property Value
double
Remarks
Alpha controls how strongly LoRA adaptations affect outputs. Common practice: alpha = rank (for a scaling factor of 1.0). Set to -1 to use rank as alpha (automatic scaling).
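In the usual LoRA formulation the effective scaling is alpha / rank; a small sketch of that relationship (illustrative of the standard convention, the adapter's internal formula may differ):
// adaptedOutput = baseOutput + (alpha / rank) * (B * A * input)
double rank = 8.0, alpha = 8.0;
double scaling = alpha / rank; // 1.0 when alpha == rank; alpha = 16 with rank 8 doubles the LoRA contribution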
FreezeBaseLayer
Gets whether base layers should be frozen during training.
public bool FreezeBaseLayer { get; }
Property Value
bool
Remarks
When true (typical), only LoRA parameters are trained while base layer weights remain frozen. This dramatically reduces memory and compute requirements.
When false, both base layer and LoRA parameters are trained. This uses more resources but may achieve better results in some scenarios.
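For example, the two modes side by side:
// Frozen base (default): only the small LoRA matrices receive gradient updates
var frozen = new DefaultLoRAConfiguration<double>(rank: 8, alpha: 8, freezeBaseLayer: true);
// Unfrozen base: base weights and LoRA matrices are both trained (more memory and compute)
var unfrozen = new DefaultLoRAConfiguration<double>(rank: 8, alpha: 8, freezeBaseLayer: false);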
Rank
Gets the rank of the low-rank decomposition to use for adapted layers.
public int Rank { get; }
Property Value
int
Remarks
The rank determines the number of parameters in the LoRA adaptation. Lower rank = fewer parameters = more efficient but less flexible.
Common values:
- 1-4: Minimal parameters, very efficient
- 8: Good default balance
- 16-32: More flexibility
- 64+: Approaching full fine-tuning
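To make the trade-off concrete, a rough calculation for a single 4096x4096 weight matrix (illustrative arithmetic, not a library call):
int dim = 4096;
int fullParams = dim * dim;           // 16,777,216 weights in the base matrix
foreach (var r in new[] { 1, 8, 32, 64 })
{
    int loraParams = r * (dim + dim); // rank 8 -> 65,536 (~0.4% of the base matrix)
    Console.WriteLine($"rank {r}: {loraParams} trainable LoRA parameters");
}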
Methods
ApplyLoRA(ILayer<T>)
Applies LoRA adaptation to layers with trainable weight matrices.
public ILayer<T> ApplyLoRA(ILayer<T> layer)
Parameters
layer (ILayer<T>): The layer to potentially adapt with LoRA.
Returns
- ILayer<T>
A StandardLoRAAdapter wrapping the layer if it has trainable weights; otherwise, the original layer unchanged.
Remarks
This method examines the layer type and wraps it with StandardLoRAAdapter if it's a layer type that benefits from LoRA adaptation (has trainable weight matrices).
Supported Layer Types:
- Dense/Linear: DenseLayer, FullyConnectedLayer, FeedForwardLayer
- Convolutional: ConvolutionalLayer, DeconvolutionalLayer, DepthwiseSeparableConvolutionalLayer, DilatedConvolutionalLayer, SeparableConvolutionalLayer, SubpixelConvolutionalLayer
- Recurrent: LSTMLayer, GRULayer, RecurrentLayer, ConvLSTMLayer, BidirectionalLayer
- Attention: AttentionLayer, MultiHeadAttentionLayer, SelfAttentionLayer
- Transformer: TransformerEncoderLayer, TransformerDecoderLayer
- Embedding: EmbeddingLayer, PatchEmbeddingLayer
- Specialized: LocallyConnectedLayer, HighwayLayer, GatedLinearUnitLayer, SqueezeAndExcitationLayer
- Advanced: CapsuleLayer, PrimaryCapsuleLayer, DigitCapsuleLayer, ConditionalRandomFieldLayer
Excluded Layer Types:
- Activation, Pooling, Dropout, Flatten, Reshape, Normalization (no trainable weights)
- GraphConvolutionalLayer (requires specialized adapter that implements IGraphConvolutionLayer)
For Beginners: This method decides whether to add LoRA to each layer.
Decision logic:
- If the layer has trainable weight matrices → Wrap it with StandardLoRAAdapter
- If the layer is just doing math operations (activation, pooling, etc.) → Return unchanged
This intelligent approach means:
- LoRA is applied to all layers that can benefit from it
- Works with Dense, Convolutional, Recurrent, Attention, and Transformer layers
- Utility layers (pooling, dropout, etc.) pass through unchanged
Example:
var config = new DefaultLoRAConfiguration<double>(rank: 8);
// Dense layer gets adapted
var denseLayer = new DenseLayer<double>(100, 50);
var adapted1 = config.ApplyLoRA(denseLayer); // Returns StandardLoRAAdapter
// Convolutional layer gets adapted
var convLayer = new ConvolutionalLayer<double>(...);
var adapted2 = config.ApplyLoRA(convLayer); // Returns StandardLoRAAdapter
// Attention layer gets adapted
var attnLayer = new MultiHeadAttentionLayer<double>(...);
var adapted3 = config.ApplyLoRA(attnLayer); // Returns StandardLoRAAdapter
// Pooling layer passes through (no weights to adapt)
var poolLayer = new MaxPoolingLayer<double>(...);
var unchanged = config.ApplyLoRA(poolLayer); // Returns original poolLayer
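Putting it together, one possible way to adapt an entire model and see how many layers were wrapped (the model variable and the counting logic are illustrative, assuming System.Linq is in scope):
var wholeModelConfig = new DefaultLoRAConfiguration<double>(rank: 8);
var allLayers = model.Layers.Select(layer => wholeModelConfig.ApplyLoRA(layer)).ToList();
int wrapped = allLayers.Count(layer => layer is StandardLoRAAdapter<double>);
Console.WriteLine($"{wrapped} of {allLayers.Count} layers received LoRA adapters");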