
Class LoRADropAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

LoRA-drop implementation: LoRA with dropout regularization.

public class LoRADropAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.


Remarks

LoRA-drop extends standard LoRA by adding dropout to the LoRA components during training. During the forward pass in training mode, a random subset of LoRA components is "dropped out" (set to zero), forcing the model to learn more robust adaptations that don't rely on any single component.

Key differences from standard LoRA:

  • Applies dropout to the LoRA output during training
  • Scales the LoRA output by (1 - dropout_rate) during inference
  • Improves generalization and reduces overfitting
  • Particularly useful when adaptation data is limited

For Beginners: LoRA-drop adds dropout regularization to LoRA adapters.

Dropout is a technique where during training, we randomly "turn off" some neurons or components. This prevents the model from becoming too dependent on specific components and forces it to learn more general patterns.

Think of it like practicing a skill with random handicaps:

  • Sometimes you practice with your left hand tied behind your back
  • Sometimes you practice blindfolded
  • This forces you to develop multiple strategies instead of relying on one approach

LoRA-drop applies this to LoRA adaptations (see the short sketch after this list):

  • During training: Randomly drop some LoRA components (set them to zero)
  • During inference: Use all components but scale them appropriately
  • Result: More robust adaptations that generalize better to new data
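
The training-time side of this can be illustrated with a minimal, array-based sketch. The plain double[] values, the fixed seed, and the per-element mask here are illustrative only and do not reflect the library's internal Tensor<T> code:

// Illustrative sketch: drop each LoRA component with probability dropoutRate.
double dropoutRate = 0.2;
var rng = new Random(42);                          // fixed seed => reproducible mask
double[] loraComponents = { 0.5, -1.2, 0.8, 0.3 }; // made-up component outputs

double[] masked = new double[loraComponents.Length];
for (int i = 0; i < loraComponents.Length; i++)
{
    bool keep = rng.NextDouble() >= dropoutRate;             // drop with probability dropoutRate
    masked[i] = keep ? loraComponents[i] / (1 - dropoutRate) // scale survivors up during training
                     : 0.0;                                  // dropped components contribute nothing
}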

Recommended dropout rates:

  • 0.1 (10%): Light regularization, good starting point
  • 0.2 (20%): Moderate regularization, common choice
  • 0.3 (30%): Strong regularization, for small adaptation datasets
  • Higher rates (>0.5): Typically too aggressive, may harm performance

When to use LoRA-drop over standard LoRA:

  • You have limited adaptation data (risk of overfitting)
  • You need better generalization to unseen data
  • You're fine-tuning on a very specific task but need to maintain general capabilities
  • You've observed overfitting with standard LoRA

Constructors

LoRADropAdapter(ILayer<T>, int, double, double, bool, int?)

Initializes a new LoRA-drop adapter with dropout regularization.

public LoRADropAdapter(ILayer<T> baseLayer, int rank, double dropoutRate, double alpha = -1, bool freezeBaseLayer = true, int? seed = null)

Parameters

baseLayer ILayer<T>

The layer to adapt with LoRA.

rank int

The rank of the LoRA decomposition.

dropoutRate double

The dropout rate (probability of dropping a component). Common values: 0.1-0.3.

alpha double

The LoRA scaling factor (defaults to rank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

seed int?

Random seed for reproducible dropout masks (optional).

Remarks

For Beginners: This creates a LoRA adapter with dropout regularization.

Parameters:

  • baseLayer: The layer you want to adapt
  • rank: How much compression to use (same as standard LoRA)
  • dropoutRate: What fraction to randomly drop during training (0.1 = 10%, 0.2 = 20%, etc.)
  • alpha: How strong the LoRA adaptation is
  • freezeBaseLayer: Whether to freeze the original layer (usually true)
  • seed: Optional random seed for reproducible results

Example usage:

// Create a LoRA-drop adapter with 20% dropout
var adapter = new LoRADropAdapter<double>(denseLayer, rank: 8, dropoutRate: 0.2);

// Training mode (dropout active)
adapter.SetTraining(true);
var trainOutput = adapter.Forward(trainInput);

// Inference mode (dropout inactive)
adapter.SetTraining(false);
var testOutput = adapter.Forward(testInput);
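
If you need identical dropout masks across runs, you can pass the optional seed parameter; after training, the adaptation can be folded back into a plain layer with MergeToOriginalLayer() (a hedged continuation of the example above, assuming denseLayer and the training loop already exist):

// Reproducible dropout masks via a fixed seed
var seededAdapter = new LoRADropAdapter<double>(denseLayer, rank: 8, dropoutRate: 0.2, seed: 42);

// ... train as shown above ...

// Bake the learned adaptation into a single layer for faster inference
var mergedLayer = seededAdapter.MergeToOriginalLayer();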

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

ArgumentException

Thrown when dropoutRate is not in the [0, 1) range.

Properties

DropoutRate

Gets the dropout rate used for regularization.

public double DropoutRate { get; }

Property Value

double

IsTraining

Gets or sets whether the layer is in training mode.

public bool IsTraining { get; set; }

Property Value

bool

Remarks

Set to true during training (dropout active), false during inference (dropout inactive).

Methods

Backward(Tensor<T>)

Performs the backward pass with dropout mask applied to gradients.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

During backpropagation, gradients are only propagated through components that were not dropped during the forward pass. This is achieved by applying the same dropout mask to the gradients and scaling appropriately.

For Beginners: This propagates gradients back through the layer.

Key insight: Gradients only flow through the components that were active during the forward pass. If a component was dropped (set to zero), its gradient is also zero - we don't update it based on this training example.

This ensures that:

  • Dropped components don't get updated (they were "turned off")
  • Kept components get normal gradient updates
  • The scaling from the forward pass is preserved in gradients

The result is that the model learns to work with different subsets of components, making it more robust and less prone to overfitting.
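
As a minimal, array-based sketch of that idea (the mask, gradient values, and plain double[] arrays are illustrative only, not the library's internal code):

// Illustrative sketch: gradients flow only through components kept in the forward pass.
double dropoutRate = 0.2;
bool[] keepMask = { true, false, true, true };           // mask saved from the forward pass
double[] upstreamGradient = { 0.10, -0.40, 0.25, 0.05 };

double[] loraGradient = new double[upstreamGradient.Length];
for (int i = 0; i < upstreamGradient.Length; i++)
{
    // Dropped components get zero gradient; kept ones reuse the forward-pass scaling.
    loraGradient[i] = keepMask[i] ? upstreamGradient[i] / (1 - dropoutRate) : 0.0;
}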

Forward(Tensor<T>)

Performs the forward pass with dropout applied to LoRA output.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Sum of base layer output and dropout-regularized LoRA output.

Remarks

During training:

  1. Generate a new dropout mask
  2. Compute the LoRA output
  3. Apply the dropout mask (zero out dropped components)
  4. Scale kept components by 1/(1 - dropout_rate) to maintain the expected value
  5. Add to the base layer output

During inference:

  1. Compute LoRA output
  2. Scale by (1-dropout_rate) to match training expectation
  3. Add to base layer output

For Beginners: This runs the input through the layer with dropout applied.

Training mode:

  • Randomly drops some LoRA components
  • Scales up the remaining components to compensate
  • This forces the model to not rely on any single component

Inference mode:

  • Uses all components
  • Scales them down to match what the model learned during training
  • This ensures consistent behavior between training and testing

The scaling ensures that the expected output is the same whether or not dropout is active, which is important for stable training and accurate predictions.
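
Putting the two modes side by side, here is a simplified, array-based sketch of the documented steps (all values and the per-element mask are illustrative; the real implementation operates on Tensor<T> and tracks its own mask):

double dropoutRate = 0.2;
var rng = new Random(123);
bool isTraining = true;
double[] baseOutput = { 1.0, 2.0, 3.0 };   // made-up base layer output
double[] loraOutput = { 0.4, -0.2, 0.1 };  // made-up LoRA branch output

double[] output = new double[baseOutput.Length];
for (int i = 0; i < baseOutput.Length; i++)
{
    double lora;
    if (isTraining)
    {
        // Training: drop components at random, scale the survivors by 1/(1 - dropout_rate)
        bool keep = rng.NextDouble() >= dropoutRate;
        lora = keep ? loraOutput[i] / (1 - dropoutRate) : 0.0;
    }
    else
    {
        // Inference: keep every component, scaled by (1 - dropout_rate) as described above
        lora = loraOutput[i] * (1 - dropoutRate);
    }
    output[i] = baseOutput[i] + lora;      // add the LoRA contribution to the base output
}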

MergeToOriginalLayer()

Merges the LoRA adaptation into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with LoRA weights merged into the base layer's weights.

Remarks

This method merges the trained LoRA weights into the base layer to create a single layer that includes the adaptations. The dropout mechanism is not preserved in the merged layer - only the learned weights are incorporated.

For Beginners: After training with LoRA-drop, you can "bake in" the adaptations.

This creates a regular layer that:

  • Contains the original weights plus the learned LoRA adaptations
  • Doesn't need the LoRA machinery anymore
  • Is faster for inference (no separate LoRA computation)
  • Doesn't include dropout (dropout is only for training)

The merging process:

  1. Computes the full LoRA weight contribution (A × B matrices)
  2. Adds these weights to the base layer's weights
  3. Creates a new DenseLayer with the combined weights

Note: The merged layer is in "inference mode" - it represents what the model learned during training but doesn't include the dropout mechanism.
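
Conceptually, the merge adds the low-rank weight product to the frozen base weights. A rough sketch with tiny matrices follows; the dimensions, the scaling variable standing in for the alpha/rank factor, and the B-times-A ordering follow the usual LoRA convention and are assumptions, not the library's exact code:

// Illustrative sketch: mergedWeights = baseWeights + scaling * (B x A)
int outputSize = 2, inputSize = 3, rank = 1;
double scaling = 1.0;                                               // typically alpha / rank
double[,] baseWeights = { { 0.5, -0.1, 0.2 }, { 0.0, 0.3, -0.4 } }; // outputSize x inputSize
double[,] B = { { 0.10 }, { -0.05 } };                              // outputSize x rank
double[,] A = { { 0.2, 0.4, -0.1 } };                               // rank x inputSize

double[,] mergedWeights = new double[outputSize, inputSize];
for (int o = 0; o < outputSize; o++)
    for (int i = 0; i < inputSize; i++)
    {
        double delta = 0.0;
        for (int r = 0; r < rank; r++)
            delta += B[o, r] * A[r, i];                             // low-rank weight contribution
        mergedWeights[o, i] = baseWeights[o, i] + scaling * delta;  // fold into the base weights
    }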

Exceptions

InvalidOperationException

Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.

ResetState()

Resets the internal state of both layers and clears the dropout mask.

public override void ResetState()

Remarks

For Beginners: This clears all cached data from both the base layer and LoRA layer, and resets the dropout mask. It's useful when starting to process a new batch or sequence.

SetTraining(bool)

Sets whether the layer is in training mode or inference mode.

public void SetTraining(bool training)

Parameters

training bool

True for training mode (dropout active), false for inference mode (dropout inactive).

Remarks

This method should be called to switch between training and inference modes. During training, dropout is applied. During inference, dropout is disabled and outputs are scaled appropriately.

For Beginners: Call this before you start training or testing:

  • Before training: `adapter.SetTraining(true)`
  • Before testing/inference: `adapter.SetTraining(false)`

This ensures dropout is only used during training, not when making predictions.