Class DyLoRAAdapter<T>
DyLoRA (Dynamic LoRA) adapter that trains with multiple ranks simultaneously.
public class DyLoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
  LayerBase<T> → LoRAAdapterBase<T> → DyLoRAAdapter<T>
- Implements
  ILoRAAdapter<T>, ILayer<T>
Remarks
DyLoRA extends the standard LoRA approach by training multiple rank configurations simultaneously using a nested dropout technique. This allows a single trained adapter to be deployed at different rank levels without retraining, providing flexibility for different hardware constraints or performance requirements.
The key innovation is nested dropout: during training, for each forward pass, a random rank r is selected from the active ranks, and only the first r components of matrices A and B are used. This ensures that smaller ranks can function independently and don't rely on higher-rank components.
For Beginners: DyLoRA is like LoRA with a superpower - flexibility!
Standard LoRA problem:
- You choose rank=8 and train
- Later realize rank=4 would work fine (save memory/speed)
- Or need rank=16 for better quality
- Must retrain from scratch with the new rank
DyLoRA solution:
- Train once with multiple ranks (e.g., [2, 4, 8, 16])
- Deploy with ANY of those ranks without retraining
- Switch between ranks at runtime based on device capabilities
How it works:
- Train with MaxRank (e.g., 16) but randomly use smaller ranks during training
- Nested dropout ensures each rank works independently
- After training, pick deployment rank based on needs (2=fastest, 16=best quality)
Use cases:
- Deploy same model to mobile (rank=2) and server (rank=16)
- Dynamic quality scaling based on battery level
- A/B testing different rank/quality trade-offs
- Training once, deploying everywhere
Example: Train with ActiveRanks=[2,4,8], deploy with:
- Rank=2 for mobile devices (98% parameter reduction, good quality)
- Rank=4 for tablets (95% parameter reduction, better quality)
- Rank=8 for desktops (90% parameter reduction, best quality)
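The sketch below ties these ideas together end to end. It is a minimal, hedged example rather than a verified program: DenseLayer<float>, the trainingInputs/trainingTargets arrays, and the mseLoss function are placeholders for whatever layer, data, and loss code your project already has; only the DyLoRAAdapter<T> members documented on this page are assumed to exist.

// Train once with several ranks, then deploy at whichever rank fits the device.
// DenseLayer<float>, trainingInputs, trainingTargets, and mseLoss are placeholders.
var adapter = new DyLoRAAdapter<float>(
    new DenseLayer<float>(512, 512),                  // hypothetical base layer
    maxRank: 16,
    activeRanks: new[] { 2, 4, 8, 16 });

adapter.Train();                                      // enable nested dropout
adapter.TrainWithNestedDropout(trainingInputs, trainingTargets,
    epochs: 10, learningRate: 0.001f, lossFunction: mseLoss);

adapter.Eval();                                       // switch to a fixed deployment rank
adapter.SetDeploymentRank(2);                         // e.g. mobile build: fastest
// ...later, on a server build, the same trained adapter could use rank 16 instead:
adapter.SetDeploymentRank(16);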
Constructors
DyLoRAAdapter(ILayer<T>, int, int[], double, bool)
Initializes a new DyLoRA adapter with the specified parameters.
public DyLoRAAdapter(ILayer<T> baseLayer, int maxRank, int[] activeRanks, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with DyLoRA.
maxRank (int): The maximum rank of the LoRA decomposition.
activeRanks (int[]): Array of ranks to train simultaneously (must be sorted ascending and all <= maxRank).
alpha (double): The LoRA scaling factor (defaults to maxRank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a DyLoRA adapter that can train and deploy with multiple ranks.
Parameters:
- baseLayer: The layer you want to make flexible and efficient
- maxRank: The maximum rank you might need (e.g., 16)
- activeRanks: Which ranks to make available (e.g., [2, 4, 8, 16])
- alpha: How strong the LoRA adaptation is (usually equals maxRank)
- freezeBaseLayer: Whether to lock the original layer (usually true)
Example: new DyLoRAAdapter<T>(denseLayer, maxRank: 16, activeRanks: new[] { 2, 4, 8, 16 }). This trains a single adapter that can be deployed with rank 2, 4, 8, or 16.
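Spelled out with every parameter, the same call looks like this (a sketch; denseLayer is assumed to be an existing ILayer<float> instance):

var adapter = new DyLoRAAdapter<float>(
    baseLayer: denseLayer,                  // the layer to adapt
    maxRank: 16,                            // largest rank that will ever be needed
    activeRanks: new[] { 2, 4, 8, 16 },     // sorted ascending, each <= maxRank
    alpha: 16,                              // scaling factor; a negative value means "use maxRank"
    freezeBaseLayer: true);                 // keep the original weights fixed during training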
Exceptions
- ArgumentNullException
Thrown when baseLayer or activeRanks is null.
- ArgumentException
Thrown when activeRanks is invalid (for example, empty, not sorted ascending, or containing a rank greater than maxRank).
Properties
ActiveRanks
Gets the array of active ranks used during training.
public int[] ActiveRanks { get; }
Property Value
- int[]
CurrentDeploymentRank
Gets or sets the current deployment rank used during inference.
public int CurrentDeploymentRank { get; set; }
Property Value
- int
Exceptions
- ArgumentException
Thrown when attempting to set a rank not in ActiveRanks.
IsTraining
Gets or sets whether the adapter is in training mode.
public bool IsTraining { get; set; }
Property Value
- bool
Remarks
When in training mode, nested dropout is applied. In eval mode, the deployment rank is used.
MaxRank
Gets the maximum rank of the DyLoRA adapter.
public int MaxRank { get; }
Property Value
- int
Methods
Backward(Tensor<T>)
Performs the backward pass with nested dropout training.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
During training, gradients are computed for all components, but the nested dropout ensures that only the active rank's components receive meaningful gradients. This trains all ranks simultaneously while ensuring each smaller rank can function independently.
For Beginners: This is where DyLoRA learning happens! During backpropagation:
- Gradients flow back through whichever rank was used in the forward pass
- Only those components get updated
- Over many iterations, all ranks get trained
- Smaller ranks learn to work without relying on larger rank components
This is why you can deploy with any trained rank - each one was trained independently!
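For orientation, here is what one manual training step could look like. This is a sketch under the assumption that you compute the loss gradient yourself: ComputeLossGradient is a hypothetical helper (not part of this API), and input/target are assumed to be existing Tensor<float> values.

adapter.Train();                                        // nested dropout active
Tensor<float> output = adapter.Forward(input);          // a random active rank is used
Tensor<float> outputGradient = ComputeLossGradient(output, target); // hypothetical helper: dLoss/dOutput
Tensor<float> inputGradient = adapter.Backward(outputGradient);     // flows back through the same rank
adapter.UpdateParameters(0.001f);                       // apply the cached gradients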
Eval()
Sets the adapter to evaluation mode (uses fixed deployment rank).
public void Eval()
Remarks
For Beginners: Call this before inference/prediction to use a consistent rank. This ensures predictable behavior in production.
Forward(Tensor<T>)
Performs the forward pass with dynamic rank selection.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and DyLoRA output.
Remarks
During training, a random rank is selected from ActiveRanks for nested dropout. During inference, the CurrentDeploymentRank is used consistently.
For Beginners: This processes input through both the base layer and DyLoRA:
Training mode:
- Randomly picks a rank from ActiveRanks each forward pass
- Uses only that many components of A and B matrices
- This trains all ranks to work independently
Inference mode:
- Always uses CurrentDeploymentRank
- Consistent behavior for production
- Can change rank without retraining
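A short sketch of the two modes (batch is assumed to be an existing Tensor<float>, and the rank numbers assume activeRanks included 8):

// Training mode: the rank can change from call to call.
adapter.Train();
var trainingOutput = adapter.Forward(batch);    // might use rank 4 this pass, rank 16 the next

// Inference mode: the deployment rank is used every time.
adapter.Eval();
adapter.SetDeploymentRank(8);
var prediction = adapter.Forward(batch);        // always rank 8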
MergeToOriginalLayer()
Merges the DyLoRA adaptation into the base layer using the current deployment rank.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with DyLoRA weights merged into the base layer's weights.
Remarks
This method merges only the components up to CurrentDeploymentRank, creating a layer that's equivalent to the DyLoRA adapter at that specific rank.
For Beginners: This "bakes in" your DyLoRA adaptation at the current rank.
After training:
- Set the deployment rank you want: adapter.SetDeploymentRank(8)
- Merge to create a standard layer: mergedLayer = adapter.MergeToOriginalLayer()
- Use the merged layer for faster inference
Benefits of merging:
- Faster inference (no separate LoRA computation)
- Simpler deployment (single layer instead of adapter + base)
- Compatible with systems that don't support LoRA
Note: You can merge at different ranks to create multiple versions:
- Mobile version: SetDeploymentRank(2), then merge
- Desktop version: SetDeploymentRank(16), then merge
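For example, one trained adapter could be merged twice to produce two standalone layers (a sketch; it assumes the base layer is a DenseLayer or FullyConnectedLayer, as required by the exception below, and that ranks 2 and 16 are in ActiveRanks):

adapter.Eval();

adapter.SetDeploymentRank(2);
ILayer<float> mobileLayer = adapter.MergeToOriginalLayer();    // rank-2 weights baked in

adapter.SetDeploymentRank(16);
ILayer<float> desktopLayer = adapter.MergeToOriginalLayer();   // rank-16 weights baked in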
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
SetDeploymentRank(int)
Sets the deployment rank for inference.
public void SetDeploymentRank(int rank)
Parameters
rank (int): The rank to use (must be in ActiveRanks).
Remarks
This allows switching between different ranks at runtime without retraining. The rank must be one of the ActiveRanks that were trained.
For Beginners: This changes the quality/speed trade-off of your model. Higher rank = better quality but slower. Lower rank = faster but slightly lower quality.
Example usage:
- Battery low? adapter.SetDeploymentRank(2) for speed
- Plugged in? adapter.SetDeploymentRank(16) for quality
- On mobile? adapter.SetDeploymentRank(4) for balance
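As a sketch of runtime switching (isMobileDevice and batteryLevel are assumed to come from your own platform code, and every rank used must be one of the trained ActiveRanks):

if (isMobileDevice)
    adapter.SetDeploymentRank(4);       // balance speed and quality
else if (batteryLevel < 0.2)
    adapter.SetDeploymentRank(2);       // prioritize speed
else
    adapter.SetDeploymentRank(16);      // prioritize quality

// A rank that was not trained throws ArgumentException:
// adapter.SetDeploymentRank(5);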
Exceptions
- ArgumentException
Thrown when rank is not in ActiveRanks.
Train()
Sets the adapter to training mode (enables nested dropout).
public void Train()
Remarks
For Beginners: Call this before training to enable random rank selection. This is what makes DyLoRA train all ranks simultaneously.
TrainWithNestedDropout(Tensor<T>[], Tensor<T>[], int, T, Func<Tensor<T>, Tensor<T>, T>)
Trains the adapter with nested dropout across all active ranks.
public void TrainWithNestedDropout(Tensor<T>[] inputs, Tensor<T>[] targets, int epochs, T learningRate, Func<Tensor<T>, Tensor<T>, T> lossFunction)
Parameters
inputs (Tensor<T>[]): Training input tensors.
targets (Tensor<T>[]): Training target tensors.
epochs (int): Number of training epochs.
learningRate (T): Learning rate for parameter updates.
lossFunction (Func<Tensor<T>, Tensor<T>, T>): Loss function to minimize.
Remarks
This training method ensures that all active ranks are trained by randomly selecting a rank for each forward pass. This implements the nested dropout technique that makes DyLoRA flexible for different deployment ranks.
For Beginners: This is a helper method for training your DyLoRA adapter.
During training:
- Each forward pass randomly uses a different rank
- This trains all ranks simultaneously
- After training, you can deploy with any of the active ranks
Think of it like training a team where each member can work alone or together. The random selection ensures everyone learns to be independent.
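A hedged example of a call (trainingInputs, trainingTargets, and ComputeMse are placeholders for your own data pipeline and loss implementation):

Func<Tensor<float>, Tensor<float>, float> loss = (predicted, target) => ComputeMse(predicted, target);

adapter.TrainWithNestedDropout(
    inputs: trainingInputs,             // Tensor<float>[]
    targets: trainingTargets,           // Tensor<float>[]
    epochs: 20,
    learningRate: 0.001f,
    lossFunction: loss);

// Afterwards, any rank in ActiveRanks is ready to deploy.
adapter.Eval();
adapter.SetDeploymentRank(8);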
UpdateParameters(T)
Updates parameters for the base layer and the LoRA layer using cached gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.