Class LoRADropAdapter<T>
LoRA-drop implementation: LoRA with dropout regularization.
public class LoRADropAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → LoRADropAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>
Remarks
LoRA-drop extends standard LoRA by adding dropout to the LoRA components during training. During the forward pass in training mode, a random subset of LoRA components is "dropped out" (set to zero), forcing the model to learn more robust adaptations that don't rely on any single component.
Key differences from standard LoRA:
- Applies dropout to the LoRA output during training
- Scales the LoRA output by (1 - dropout_rate) during inference
- Improves generalization and reduces overfitting
- Particularly useful when adaptation data is limited
For Beginners: LoRA-drop adds dropout regularization to LoRA adapters.
Dropout is a technique where during training, we randomly "turn off" some neurons or components. This prevents the model from becoming too dependent on specific components and forces it to learn more general patterns.
Think of it like practicing a skill with random handicaps:
- Sometimes you practice with your left hand tied behind your back
- Sometimes you practice blindfolded
- This forces you to develop multiple strategies instead of relying on one approach
LoRA-drop applies this to LoRA adaptations:
- During training: Randomly drop some LoRA components (set them to zero)
- During inference: Use all components but scale them appropriately
- Result: More robust adaptations that generalize better to new data
Recommended dropout rates:
- 0.1 (10%): Light regularization, good starting point
- 0.2 (20%): Moderate regularization, common choice
- 0.3 (30%): Strong regularization, for small adaptation datasets
- Higher rates (>0.5): Typically too aggressive, may harm performance
When to use LoRA-drop over standard LoRA:
- You have limited adaptation data (risk of overfitting)
- You need better generalization to unseen data
- You're fine-tuning on a very specific task but need to maintain general capabilities
- You've observed overfitting with standard LoRA
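For example, here is a hedged construction sketch (attentionLayer is a placeholder for any existing ILayer<float> you want to adapt; the constructor itself is documented in the next section):
// Limited adaptation data: lean toward stronger regularization (0.2-0.3).
var smallDataAdapter = new LoRADropAdapter<float>(attentionLayer, rank: 8, dropoutRate: 0.3, seed: 42);
// Plenty of adaptation data: a light rate (0.1) is a reasonable starting point.
var largeDataAdapter = new LoRADropAdapter<float>(attentionLayer, rank: 8, dropoutRate: 0.1);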
Constructors
LoRADropAdapter(ILayer<T>, int, double, double, bool, int?)
Initializes a new LoRA-drop adapter with dropout regularization.
public LoRADropAdapter(ILayer<T> baseLayer, int rank, double dropoutRate, double alpha = -1, bool freezeBaseLayer = true, int? seed = null)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoRA.
rank (int): The rank of the LoRA decomposition.
dropoutRate (double): The dropout rate (probability of dropping a component). Common values: 0.1-0.3.
alpha (double): The LoRA scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
seed (int?): Random seed for reproducible dropout masks (optional).
Remarks
For Beginners: This creates a LoRA adapter with dropout regularization.
Parameters:
- baseLayer: The layer you want to adapt
- rank: How much compression to use (same as standard LoRA)
- dropoutRate: What fraction to randomly drop during training (0.1 = 10%, 0.2 = 20%, etc.)
- alpha: How strong the LoRA adaptation is
- freezeBaseLayer: Whether to freeze the original layer (usually true)
- seed: Optional random seed for reproducible results
Example usage:
// Create a LoRA-drop adapter with 20% dropout
var adapter = new LoRADropAdapter<double>(denseLayer, rank: 8, dropoutRate: 0.2);
// Training mode (dropout active)
adapter.SetTraining(true);
var trainOutput = adapter.Forward(trainInput);
// Inference mode (dropout inactive)
adapter.SetTraining(false);
var testOutput = adapter.Forward(testInput);
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when dropoutRate is not in [0, 1) range.
Properties
DropoutRate
Gets the dropout rate used for regularization.
public double DropoutRate { get; }
Property Value
- double
IsTraining
Gets or sets whether the layer is in training mode.
public bool IsTraining { get; set; }
Property Value
- bool
Remarks
Set to true during training (dropout active), false during inference (dropout inactive).
Methods
Backward(Tensor<T>)
Performs the backward pass with dropout mask applied to gradients.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
During backpropagation, gradients are only propagated through components that were not dropped during the forward pass. This is achieved by applying the same dropout mask to the gradients and scaling appropriately.
For Beginners: This propagates gradients back through the layer.
Key insight: Gradients only flow through the components that were active during the forward pass. If a component was dropped (set to zero), its gradient is also zero - we don't update it based on this training example.
This ensures that:
- Dropped components don't get updated (they were "turned off")
- Kept components get normal gradient updates
- The scaling from the forward pass is preserved in gradients
The result is that the model learns to work with different subsets of components, making it more robust and less prone to overfitting.
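As a minimal sketch of that idea over plain arrays (illustrative only, not the library's internal Tensor<T> code; loraGradient, dropoutMask, and dropoutRate are assumed variables):
// Reuse the forward-pass mask: dropped components (mask 0) get zero gradient,
// kept components (mask 1) are scaled by 1/(1 - dropoutRate) as in the forward pass.
double keepProbability = 1.0 - dropoutRate;
for (int i = 0; i < loraGradient.Length; i++)
{
    loraGradient[i] = loraGradient[i] * dropoutMask[i] / keepProbability;
}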
Forward(Tensor<T>)
Performs the forward pass with dropout applied to LoRA output.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and dropout-regularized LoRA output.
Remarks
During training:
1. Generate a new dropout mask
2. Compute the LoRA output
3. Apply the dropout mask (zero out dropped components)
4. Scale kept components by 1/(1 - dropout_rate) to maintain the expected value
5. Add to the base layer output
During inference:
- Compute LoRA output
- Scale by (1-dropout_rate) to match training expectation
- Add to base layer output
For Beginners: This runs the input through the layer with dropout applied.
Training mode:
- Randomly drops some LoRA components
- Scales up the remaining components to compensate
- This forces the model to not rely on any single component
Inference mode:
- Uses all components
- Scales them down to match what the model learned during training
- This ensures consistent behavior between training and testing
The scaling ensures that the expected output is the same whether or not dropout is active, which is important for stable training and accurate predictions.
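A simplified sketch of the two modes over plain arrays (illustrative only; baseOutput, loraOutput, output, dropoutRate, isTraining, and rng are assumed variables):
double keepProbability = 1.0 - dropoutRate;
for (int i = 0; i < loraOutput.Length; i++)
{
    double lora = loraOutput[i];
    if (isTraining)
    {
        // Training: drop each component with probability dropoutRate and
        // scale survivors by 1/(1 - dropoutRate) to preserve the expected value.
        lora = rng.NextDouble() < dropoutRate ? 0.0 : lora / keepProbability;
    }
    else
    {
        // Inference: keep all components, scaled by (1 - dropoutRate).
        lora *= keepProbability;
    }
    output[i] = baseOutput[i] + lora;
}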
MergeToOriginalLayer()
Merges the LoRA adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with LoRA weights merged into the base layer's weights.
Remarks
This method merges the trained LoRA weights into the base layer to create a single layer that includes the adaptations. The dropout mechanism is not preserved in the merged layer - only the learned weights are incorporated.
For Beginners: After training with LoRA-drop, you can "bake in" the adaptations.
This creates a regular layer that:
- Contains the original weights plus the learned LoRA adaptations
- Doesn't need the LoRA machinery anymore
- Is faster for inference (no separate LoRA computation)
- Doesn't include dropout (dropout is only for training)
The merging process:
- Computes the full LoRA weight contribution (A × B matrices)
- Adds these weights to the base layer's weights
- Creates a new DenseLayer with the combined weights
Note: The merged layer is in "inference mode" - it represents what the model learned during training but doesn't include the dropout mechanism.
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
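A hedged usage sketch of the merge step (adapter and testInput are assumed from earlier training, and the merged layer is assumed to expose the same Forward method shown above):
// Switch to inference mode, then bake the learned LoRA weights into a single layer.
adapter.SetTraining(false);
var merged = adapter.MergeToOriginalLayer();
// The merged layer has no LoRA machinery and no dropout; deploy it directly.
var prediction = merged.Forward(testInput);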
ResetState()
Resets the internal state of both layers and clears the dropout mask.
public override void ResetState()
Remarks
For Beginners: This clears all cached data from both the base layer and LoRA layer, and resets the dropout mask. It's useful when starting to process a new batch or sequence.
SetTraining(bool)
Sets whether the layer is in training mode or inference mode.
public void SetTraining(bool training)
Parameters
training (bool): True for training mode (dropout active), false for inference mode (dropout inactive).
Remarks
This method should be called to switch between training and inference modes. During training, dropout is applied. During inference, dropout is disabled and outputs are scaled appropriately.
For Beginners: Call this before you start training or testing:
- Before training: `adapter.SetTraining(true)`
- Before testing/inference: `adapter.SetTraining(false)`
This ensures dropout is only used during training, not when making predictions.
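A hedged end-to-end sketch (trainingBatches, ComputeLossGradient, and validationInput are hypothetical placeholders for your own data pipeline and loss code; the optimizer/update step is omitted):
// Dropout on while fitting the adapter.
adapter.SetTraining(true);
foreach (var batch in trainingBatches)
{
    var output = adapter.Forward(batch.Input);
    var gradient = ComputeLossGradient(output, batch.Target); // hypothetical helper
    adapter.Backward(gradient);
    // (parameter update step omitted; depends on your optimizer setup)
}
// Dropout off, outputs rescaled, when evaluating or serving.
adapter.SetTraining(false);
var evalOutput = adapter.Forward(validationInput);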