Class LoRADropAdapter<T>
LoRA-drop implementation: LoRA with dropout regularization.
public class LoRADropAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → LoRADropAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>
Remarks
LoRA-drop extends standard LoRA by adding dropout to the LoRA components during training. During the forward pass in training mode, a random subset of LoRA components is "dropped out" (set to zero), forcing the model to learn more robust adaptations that don't rely on any single component.
Key differences from standard LoRA:
- Applies dropout to the LoRA output during training
- Scales the LoRA output by (1 - dropout_rate) during inference
- Improves generalization and reduces overfitting
- Particularly useful when adaptation data is limited
For Beginners: LoRA-drop adds dropout regularization to LoRA adapters.
Dropout is a technique where during training, we randomly "turn off" some neurons or components. This prevents the model from becoming too dependent on specific components and forces it to learn more general patterns.
Think of it like practicing a skill with random handicaps:
- Sometimes you practice with your left hand tied behind your back
- Sometimes you practice blindfolded
- This forces you to develop multiple strategies instead of relying on one approach
LoRA-drop applies this to LoRA adaptations:
- During training: Randomly drop some LoRA components (set them to zero)
- During inference: Use all components but scale them appropriately
- Result: More robust adaptations that generalize better to new data
Recommended dropout rates:
- 0.1 (10%): Light regularization, good starting point
- 0.2 (20%): Moderate regularization, common choice
- 0.3 (30%): Strong regularization, for small adaptation datasets
- Higher rates (>0.5): Typically too aggressive, may harm performance
When to use LoRA-drop over standard LoRA:
- You have limited adaptation data (risk of overfitting)
- You need better generalization to unseen data
- You're fine-tuning on a very specific task but need to maintain general capabilities
- You've observed overfitting with standard LoRA
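For example, here is a hedged construction sketch (attentionLayer is a placeholder for any existing ILayer<float> you want to adapt; the constructor itself is documented in the next section):
// Limited adaptation data: lean toward stronger regularization (0.2-0.3).
var smallDataAdapter = new LoRADropAdapter<float>(attentionLayer, rank: 8, dropoutRate: 0.3, seed: 42);
// Plenty of adaptation data: a light rate (0.1) is a reasonable starting point.
var largeDataAdapter = new LoRADropAdapter<float>(attentionLayer, rank: 8, dropoutRate: 0.1);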
Constructors
LoRADropAdapter(ILayer<T>, int, double, double, bool, int?)
Initializes a new LoRA-drop adapter with dropout regularization.
public LoRADropAdapter(ILayer<T> baseLayer, int rank, double dropoutRate, double alpha = -1, bool freezeBaseLayer = true, int? seed = null)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoRA.
rank (int): The rank of the LoRA decomposition.
dropoutRate (double): The dropout rate (probability of dropping a component). Common values: 0.1-0.3.
alpha (double): The LoRA scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
seed (int?): Random seed for reproducible dropout masks (optional).
Remarks
For Beginners: This creates a LoRA adapter with dropout regularization.
Parameters:
- baseLayer: The layer you want to adapt
- rank: How much compression to use (same as standard LoRA)
- dropoutRate: What fraction to randomly drop during training (0.1 = 10%, 0.2 = 20%, etc.)
- alpha: How strong the LoRA adaptation is
- freezeBaseLayer: Whether to freeze the original layer (usually true)
- seed: Optional random seed for reproducible results
Example usage:
// Create a LoRA-drop adapter with 20% dropout
var adapter = new LoRADropAdapter<double>(denseLayer, rank: 8, dropoutRate: 0.2);
// Training mode (dropout active)
adapter.SetTraining(true);
var trainOutput = adapter.Forward(trainInput);
// Inference mode (dropout inactive)
adapter.SetTraining(false);
var testOutput = adapter.Forward(testInput);
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when dropoutRate is not in [0, 1) range.
Properties
DropoutRate
Gets the dropout rate used for regularization.
public double DropoutRate { get; }
Property Value
- double
IsTraining
Gets or sets whether the layer is in training mode.
public bool IsTraining { get; set; }
Property Value
- bool
Remarks
Set to true during training (dropout active), false during inference (dropout inactive).
Methods
Backward(Tensor<T>)
Performs the backward pass with dropout mask applied to gradients.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
During backpropagation, gradients are only propagated through components that were not dropped during the forward pass. This is achieved by applying the same dropout mask to the gradients and scaling appropriately.
For Beginners: This propagates gradients back through the layer.
Key insight: Gradients only flow through the components that were active during the forward pass. If a component was dropped (set to zero), its gradient is also zero - we don't update it based on this training example.
This ensures that:
- Dropped components don't get updated (they were "turned off")
- Kept components get normal gradient updates
- The scaling from the forward pass is preserved in gradients
The result is that the model learns to work with different subsets of components, making it more robust and less prone to overfitting.
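As a minimal sketch of that idea over plain arrays (illustrative only, not the library's internal Tensor<T> code; loraGradient, dropoutMask, and dropoutRate are assumed variables):
// Reuse the forward-pass mask: dropped components (mask 0) get zero gradient,
// kept components (mask 1) are scaled by 1/(1 - dropoutRate) as in the forward pass.
double keepProbability = 1.0 - dropoutRate;
for (int i = 0; i < loraGradient.Length; i++)
{
    loraGradient[i] = loraGradient[i] * dropoutMask[i] / keepProbability;
}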
Forward(Tensor<T>)
Performs the forward pass with dropout applied to LoRA output.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and dropout-regularized LoRA output.
Remarks
During training:
1. Generate a new dropout mask
2. Compute the LoRA output
3. Apply the dropout mask (zero out dropped components)
4. Scale kept components by 1/(1 - dropout_rate) to maintain the expected value
5. Add to the base layer output
During inference:
- Compute LoRA output
- Scale by (1-dropout_rate) to match training expectation
- Add to base layer output
For Beginners: This runs the input through the layer with dropout applied.
Training mode:
- Randomly drops some LoRA components
- Scales up the remaining components to compensate
- This forces the model to not rely on any single component
Inference mode:
- Uses all components
- Scales them down to match what the model learned during training
- This ensures consistent behavior between training and testing
The scaling ensures that the expected output is the same whether or not dropout is active, which is important for stable training and accurate predictions.
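A simplified sketch of the two modes over plain arrays (illustrative only; baseOutput, loraOutput, output, dropoutRate, isTraining, and rng are assumed variables):
double keepProbability = 1.0 - dropoutRate;
for (int i = 0; i < loraOutput.Length; i++)
{
    double lora = loraOutput[i];
    if (isTraining)
    {
        // Training: drop each component with probability dropoutRate and
        // scale survivors by 1/(1 - dropoutRate) to preserve the expected value.
        lora = rng.NextDouble() < dropoutRate ? 0.0 : lora / keepProbability;
    }
    else
    {
        // Inference: keep all components, scaled by (1 - dropoutRate).
        lora *= keepProbability;
    }
    output[i] = baseOutput[i] + lora;
}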
MergeToOriginalLayer()
Merges the LoRA adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoRA weights merged into the base layer's weights.
Remarks
This method merges the trained LoRA weights into the base layer to create a single layer that includes the adaptations. The dropout mechanism is not preserved in the merged layer - only the learned weights are incorporated.
For Beginners: After training with LoRA-drop, you can "bake in" the adaptations.
This creates a regular layer that:
- Contains the original weights plus the learned LoRA adaptations
- Doesn't need the LoRA machinery anymore
- Is faster for inference (no separate LoRA computation)
- Doesn't include dropout (dropout is only for training)
The merging process:
- Computes the full LoRA weight contribution (A × B matrices)
- Adds these weights to the base layer's weights
- Creates a new DenseLayer with the combined weights
Note: The merged layer is in "inference mode" - it represents what the model learned during training but doesn't include the dropout mechanism.
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
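A hedged usage sketch of the merge step (adapter and testInput are assumed from earlier training, and the merged layer is assumed to expose the same Forward method shown above):
// Switch to inference mode, then bake the learned LoRA weights into a single layer.
adapter.SetTraining(false);
var merged = adapter.MergeToOriginalLayer();
// The merged layer has no LoRA machinery and no dropout; deploy it directly.
var prediction = merged.Forward(testInput);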
ResetState()
Resets the internal state of both layers and clears the dropout mask.
public override void ResetState()
Remarks
For Beginners: This clears all cached data from both the base layer and LoRA layer, and resets the dropout mask. It's useful when starting to process a new batch or sequence.
SetTraining(bool)
Sets whether the layer is in training mode or inference mode.
public void SetTraining(bool training)
Parameters
training (bool): True for training mode (dropout active), false for inference mode (dropout inactive).
Remarks
This method should be called to switch between training and inference modes. During training, dropout is applied. During inference, dropout is disabled and outputs are scaled appropriately.
For Beginners: Call this before you start training or testing:
- Before training: `adapter.SetTraining(true)`
- Before testing/inference: `adapter.SetTraining(false)`
This ensures dropout is only used during training, not when making predictions.
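A hedged end-to-end sketch (trainingBatches, ComputeLossGradient, and validationInput are hypothetical placeholders for your own data pipeline and loss code; the optimizer/update step is omitted):
// Dropout on while fitting the adapter.
adapter.SetTraining(true);
foreach (var batch in trainingBatches)
{
    var output = adapter.Forward(batch.Input);
    var gradient = ComputeLossGradient(output, batch.Target); // hypothetical helper
    adapter.Backward(gradient);
    // (parameter update step omitted; depends on your optimizer setup)
}
// Dropout off, outputs rescaled, when evaluating or serving.
adapter.SetTraining(false);
var evalOutput = adapter.Forward(validationInput);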