Class DyLoRAAdapter<T>
DyLoRA (Dynamic LoRA) adapter that trains with multiple ranks simultaneously.
public class DyLoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
  LayerBase<T> → LoRAAdapterBase<T> → DyLoRAAdapter<T>
- Implements
  ILoRAAdapter<T>, ILayer<T>
Remarks
DyLoRA extends the standard LoRA approach by training multiple rank configurations simultaneously using a nested dropout technique. This allows a single trained adapter to be deployed at different rank levels without retraining, providing flexibility for different hardware constraints or performance requirements.
The key innovation is nested dropout: during training, for each forward pass, a random rank r is selected from the active ranks, and only the first r components of matrices A and B are used. This ensures that smaller ranks can function independently and don't rely on higher-rank components.
For Beginners: DyLoRA is like LoRA with a superpower - flexibility!
Standard LoRA problem:
- You choose rank=8 and train
- Later realize rank=4 would work fine (save memory/speed)
- Or need rank=16 for better quality
- Must retrain from scratch with the new rank
DyLoRA solution:
- Train once with multiple ranks (e.g., [2, 4, 8, 16])
- Deploy with ANY of those ranks without retraining
- Switch between ranks at runtime based on device capabilities
How it works:
- Train with MaxRank (e.g., 16) but randomly use smaller ranks during training
- Nested dropout ensures each rank works independently
- After training, pick deployment rank based on needs (2=fastest, 16=best quality)
Use cases:
- Deploy same model to mobile (rank=2) and server (rank=16)
- Dynamic quality scaling based on battery level
- A/B testing different rank/quality trade-offs
- Training once, deploying everywhere
Example: Train with ActiveRanks=[2,4,8], deploy with:
- Rank=2 for mobile devices (98% parameter reduction, good quality)
- Rank=4 for tablets (95% parameter reduction, better quality)
- Rank=8 for desktops (90% parameter reduction, best quality)
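The sketch below ties these ideas together end to end. It is a minimal, hedged example rather than a verified program: DenseLayer<float>, the trainingInputs/trainingTargets arrays, and the mseLoss function are placeholders for whatever layer, data, and loss code your project already has; only the DyLoRAAdapter<T> members documented on this page are assumed to exist.

// Train once with several ranks, then deploy at whichever rank fits the device.
// DenseLayer<float>, trainingInputs, trainingTargets, and mseLoss are placeholders.
var adapter = new DyLoRAAdapter<float>(
    new DenseLayer<float>(512, 512),                  // hypothetical base layer
    maxRank: 16,
    activeRanks: new[] { 2, 4, 8, 16 });

adapter.Train();                                      // enable nested dropout
adapter.TrainWithNestedDropout(trainingInputs, trainingTargets,
    epochs: 10, learningRate: 0.001f, lossFunction: mseLoss);

adapter.Eval();                                       // switch to a fixed deployment rank
adapter.SetDeploymentRank(2);                         // e.g. mobile build: fastest
// ...later, on a server build, the same trained adapter could use rank 16 instead:
adapter.SetDeploymentRank(16);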
Constructors
DyLoRAAdapter(ILayer<T>, int, int[], double, bool)
Initializes a new DyLoRA adapter with the specified parameters.
public DyLoRAAdapter(ILayer<T> baseLayer, int maxRank, int[] activeRanks, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with DyLoRA.
maxRank (int): The maximum rank of the LoRA decomposition.
activeRanks (int[]): Array of ranks to train simultaneously (must be sorted ascending and all <= maxRank).
alpha (double): The LoRA scaling factor (defaults to maxRank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a DyLoRA adapter that can train and deploy with multiple ranks.
Parameters:
- baseLayer: The layer you want to make flexible and efficient
- maxRank: The maximum rank you might need (e.g., 16)
- activeRanks: Which ranks to make available (e.g., [2, 4, 8, 16])
- alpha: How strong the LoRA adaptation is (usually equals maxRank)
- freezeBaseLayer: Whether to lock the original layer (usually true)
Example: new DyLoRAAdapter<T>(denseLayer, maxRank: 16, activeRanks: new[] { 2, 4, 8, 16 }). This trains a single adapter that can be deployed with rank 2, 4, 8, or 16.
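Spelled out with every parameter, the same call looks like this (a sketch; denseLayer is assumed to be an existing ILayer<float> instance):

var adapter = new DyLoRAAdapter<float>(
    baseLayer: denseLayer,                  // the layer to adapt
    maxRank: 16,                            // largest rank that will ever be needed
    activeRanks: new[] { 2, 4, 8, 16 },     // sorted ascending, each <= maxRank
    alpha: 16,                              // scaling factor; a negative value means "use maxRank"
    freezeBaseLayer: true);                 // keep the original weights fixed during training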
Exceptions
- ArgumentNullException
Thrown when baseLayer or activeRanks is null.
- ArgumentException
Thrown when activeRanks is invalid (for example, empty, not sorted ascending, or containing a rank greater than maxRank).
Properties
ActiveRanks
Gets the array of active ranks used during training.
public int[] ActiveRanks { get; }
Property Value
- int[]
CurrentDeploymentRank
Gets or sets the current deployment rank used during inference.
public int CurrentDeploymentRank { get; set; }
Property Value
- int
Exceptions
- ArgumentException
Thrown when attempting to set a rank not in ActiveRanks.
IsTraining
Gets or sets whether the adapter is in training mode.
public bool IsTraining { get; set; }
Property Value
- bool
Remarks
When in training mode, nested dropout is applied. In eval mode, the deployment rank is used.
MaxRank
Gets the maximum rank of the DyLoRA adapter.
public int MaxRank { get; }
Property Value
- int
Methods
Backward(Tensor<T>)
Performs the backward pass with nested dropout training.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
During training, gradients are computed for all components, but the nested dropout ensures that only the active rank's components receive meaningful gradients. This trains all ranks simultaneously while ensuring each smaller rank can function independently.
For Beginners: This is where DyLoRA learning happens! During backpropagation:
- Gradients flow back through whichever rank was used in the forward pass
- Only those components get updated
- Over many iterations, all ranks get trained
- Smaller ranks learn to work without relying on larger rank components
This is why you can deploy with any trained rank - each one was trained independently!
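For orientation, here is what one manual training step could look like. This is a sketch under the assumption that you compute the loss gradient yourself: ComputeLossGradient is a hypothetical helper (not part of this API), and input/target are assumed to be existing Tensor<float> values.

adapter.Train();                                        // nested dropout active
Tensor<float> output = adapter.Forward(input);          // a random active rank is used
Tensor<float> outputGradient = ComputeLossGradient(output, target); // hypothetical helper: dLoss/dOutput
Tensor<float> inputGradient = adapter.Backward(outputGradient);     // flows back through the same rank
adapter.UpdateParameters(0.001f);                       // apply the cached gradients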
Eval()
Sets the adapter to evaluation mode (uses fixed deployment rank).
public void Eval()
Remarks
For Beginners: Call this before inference/prediction to use a consistent rank. This ensures predictable behavior in production.
Forward(Tensor<T>)
Performs the forward pass with dynamic rank selection.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and DyLoRA output.
Remarks
During training, a random rank is selected from ActiveRanks for nested dropout. During inference, the CurrentDeploymentRank is used consistently.
For Beginners: This processes input through both the base layer and DyLoRA:
Training mode:
- Randomly picks a rank from ActiveRanks each forward pass
- Uses only that many components of A and B matrices
- This trains all ranks to work independently
Inference mode:
- Always uses CurrentDeploymentRank
- Consistent behavior for production
- Can change rank without retraining
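A short sketch of the two modes (batch is assumed to be an existing Tensor<float>, and the rank numbers assume activeRanks included 8):

// Training mode: the rank can change from call to call.
adapter.Train();
var trainingOutput = adapter.Forward(batch);    // might use rank 4 this pass, rank 16 the next

// Inference mode: the deployment rank is used every time.
adapter.Eval();
adapter.SetDeploymentRank(8);
var prediction = adapter.Forward(batch);        // always rank 8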
MergeToOriginalLayer()
Merges the DyLoRA adaptation into the base layer using the current deployment rank.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with DyLoRA weights merged into the base layer's weights.
Remarks
This method merges only the components up to CurrentDeploymentRank, creating a layer that's equivalent to the DyLoRA adapter at that specific rank.
For Beginners: This "bakes in" your DyLoRA adaptation at the current rank.
After training:
- Set the deployment rank you want: adapter.SetDeploymentRank(8)
- Merge to create a standard layer: mergedLayer = adapter.MergeToOriginalLayer()
- Use the merged layer for faster inference
Benefits of merging:
- Faster inference (no separate LoRA computation)
- Simpler deployment (single layer instead of adapter + base)
- Compatible with systems that don't support LoRA
Note: You can merge at different ranks to create multiple versions:
- Mobile version: SetDeploymentRank(2), then merge
- Desktop version: SetDeploymentRank(16), then merge
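For example, one trained adapter could be merged twice to produce two standalone layers (a sketch; it assumes the base layer is a DenseLayer or FullyConnectedLayer, as required by the exception below, and that ranks 2 and 16 are in ActiveRanks):

adapter.Eval();

adapter.SetDeploymentRank(2);
ILayer<float> mobileLayer = adapter.MergeToOriginalLayer();    // rank-2 weights baked in

adapter.SetDeploymentRank(16);
ILayer<float> desktopLayer = adapter.MergeToOriginalLayer();   // rank-16 weights baked in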
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
SetDeploymentRank(int)
Sets the deployment rank for inference.
public void SetDeploymentRank(int rank)
Parameters
rank (int): The rank to use (must be in ActiveRanks).
Remarks
This allows switching between different ranks at runtime without retraining. The rank must be one of the ActiveRanks that were trained.
For Beginners: This changes the quality/speed trade-off of your model. Higher rank = better quality but slower. Lower rank = faster but slightly lower quality.
Example usage:
- Battery low? adapter.SetDeploymentRank(2) for speed
- Plugged in? adapter.SetDeploymentRank(16) for quality
- On mobile? adapter.SetDeploymentRank(4) for balance
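As a sketch of runtime switching (isMobileDevice and batteryLevel are assumed to come from your own platform code, and every rank used must be one of the trained ActiveRanks):

if (isMobileDevice)
    adapter.SetDeploymentRank(4);       // balance speed and quality
else if (batteryLevel < 0.2)
    adapter.SetDeploymentRank(2);       // prioritize speed
else
    adapter.SetDeploymentRank(16);      // prioritize quality

// A rank that was not trained throws ArgumentException:
// adapter.SetDeploymentRank(5);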
Exceptions
- ArgumentException
Thrown when rank is not in ActiveRanks.
Train()
Sets the adapter to training mode (enables nested dropout).
public void Train()
Remarks
For Beginners: Call this before training to enable random rank selection. This is what makes DyLoRA train all ranks simultaneously.
TrainWithNestedDropout(Tensor<T>[], Tensor<T>[], int, T, Func<Tensor<T>, Tensor<T>, T>)
Trains the adapter with nested dropout across all active ranks.
public void TrainWithNestedDropout(Tensor<T>[] inputs, Tensor<T>[] targets, int epochs, T learningRate, Func<Tensor<T>, Tensor<T>, T> lossFunction)
Parameters
inputs (Tensor<T>[]): Training input tensors.
targets (Tensor<T>[]): Training target tensors.
epochs (int): Number of training epochs.
learningRate (T): Learning rate for parameter updates.
lossFunction (Func<Tensor<T>, Tensor<T>, T>): Loss function to minimize.
Remarks
This training method ensures that all active ranks are trained by randomly selecting a rank for each forward pass. This implements the nested dropout technique that makes DyLoRA flexible for different deployment ranks.
For Beginners: This is a helper method for training your DyLoRA adapter.
During training:
- Each forward pass randomly uses a different rank
- This trains all ranks simultaneously
- After training, you can deploy with any of the active ranks
Think of it like training a team where each member can work alone or together. The random selection ensures everyone learns to be independent.
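A hedged example of a call (trainingInputs, trainingTargets, and ComputeMse are placeholders for your own data pipeline and loss implementation):

Func<Tensor<float>, Tensor<float>, float> loss = (predicted, target) => ComputeMse(predicted, target);

adapter.TrainWithNestedDropout(
    inputs: trainingInputs,             // Tensor<float>[]
    targets: trainingTargets,           // Tensor<float>[]
    epochs: 20,
    learningRate: 0.001f,
    lossFunction: loss);

// Afterwards, any rank in ActiveRanks is ready to deploy.
adapter.Eval();
adapter.SetDeploymentRank(8);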
UpdateParameters(T)
Updates parameters for the base layer and the LoRA layer using cached gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.