Class GLoRAAdapter<T>
Generalized LoRA (GLoRA) implementation that adapts both weights AND activations.
public class GLoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LoRAAdapterBase<T> → GLoRAAdapter<T>
- Implements
ILoRAAdapter<T>, ILayer<T>
- Inherited Members
Remarks
GLoRA extends standard LoRA by adding adaptation to both the layer's weights and its activations. This provides more flexibility for multi-task learning scenarios where different tasks may need different feature representations at each layer.
The forward pass computes:
- adapted_weights = base_weights + B_w * A_w (weight adaptation)
- base_output = input * adapted_weights
- adapted_output = base_output + B_a * A_a * input (activation adaptation)
For Beginners: While standard LoRA only adapts what the layer learns (its weights), GLoRA also adapts what the layer produces (its activations). Think of it like this:
- Standard LoRA: Adjusts the "recipe" (weights) but produces the same type of output
- GLoRA: Adjusts both the "recipe" (weights) AND transforms the output for different uses
This is especially useful when:
- Different tasks need different feature representations
- You're doing multi-task learning (e.g., the same base features used differently)
- You need more flexibility than weight-only adaptation provides
Key differences from StandardLoRA:
- WeightAdaptation: Standard LoRA component that modifies layer weights
- ActivationAdaptation: Additional LoRA component that modifies layer outputs
- ActivationRank: Can be different from weight rank for fine-tuned control
Trade-offs:
- More flexible: Can adapt representations for different tasks
- Better for multi-task: Each task can use features differently
- More parameters: Two LoRA components instead of one
- Slightly slower: Two adaptation computations per forward pass
Example: For a 1000x1000 layer with weight_rank=8 and activation_rank=4:
- Weight adaptation: 16,000 parameters (same as standard LoRA)
- Activation adaptation: 8,000 additional parameters
- Total: 24,000 parameters (still 97.6% reduction from 1M!)
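The parameter arithmetic above can be checked directly. A quick standalone Python sketch (not part of the library), assuming each LoRA pair for a d_in x d_out layer contributes rank * (d_in + d_out) parameters:

```python
# Parameter-count check for the 1000x1000 example above.
# A LoRA pair (A: rank x d_in, B: d_out x rank) holds rank * (d_in + d_out) values.
d_in = d_out = 1000
weight_rank, activation_rank = 8, 4

weight_params = weight_rank * (d_in + d_out)          # 16,000
activation_params = activation_rank * (d_in + d_out)  # 8,000
total = weight_params + activation_params             # 24,000

full_params = d_in * d_out                            # 1,000,000 in the base layer
reduction = 1 - total / full_params                   # 0.976 -> 97.6% reduction
print(weight_params, activation_params, total, round(reduction, 3))
```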
Constructors
GLoRAAdapter(ILayer<T>, int, int, double, double, bool)
Initializes a new GLoRA adapter with the specified parameters.
public GLoRAAdapter(ILayer<T> baseLayer, int weightRank, int activationRank = -1, double weightAlpha = -1, double activationAlpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with GLoRA.
weightRank (int): The rank of the weight adaptation decomposition.
activationRank (int): The rank of the activation adaptation decomposition (defaults to weightRank if negative).
weightAlpha (double): The scaling factor for weight adaptation (defaults to weightRank if negative).
activationAlpha (double): The scaling factor for activation adaptation (defaults to activationRank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a GLoRA adapter that adds two types of adaptation: one for the layer's weights and one for its activations.
Parameters:
- baseLayer: The layer you want to make more flexible
- weightRank: Compression for weight adaptation (lower = fewer parameters for weights)
- activationRank: Compression for activation adaptation (can be different!)
- weightAlpha: How strong the weight adaptation is
- activationAlpha: How strong the activation adaptation is
- freezeBaseLayer: Whether to lock the original layer's weights (usually true)
Having separate ranks and alphas for weights vs. activations gives you fine-grained control:
- Higher weight rank = more flexibility in what the layer learns
- Higher activation rank = more flexibility in how outputs are transformed
Common patterns:
- Equal ranks: Balanced adaptation (weightRank=8, activationRank=8)
- Lower activation rank: More emphasis on weight learning (weightRank=16, activationRank=4)
- Higher activation rank: More emphasis on output transformation (weightRank=4, activationRank=16)
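The negative-value defaults described in the parameter list can be sketched as follows (a hypothetical helper mirroring the documented behavior, not the library's actual code):

```python
def resolve_glora_defaults(weight_rank, activation_rank=-1,
                           weight_alpha=-1.0, activation_alpha=-1.0):
    """Resolve the documented defaults: negative values fall back as described."""
    if activation_rank < 0:
        activation_rank = weight_rank        # defaults to weightRank
    if weight_alpha < 0:
        weight_alpha = float(weight_rank)    # defaults to weightRank
    if activation_alpha < 0:
        activation_alpha = float(activation_rank)  # defaults to activationRank
    return weight_rank, activation_rank, weight_alpha, activation_alpha

print(resolve_glora_defaults(8))      # all defaults follow weight_rank
print(resolve_glora_defaults(16, 4))  # separate ranks, alphas follow each rank
```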
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
Properties
ActivationAdaptation
Gets the activation adaptation LoRA layer.
public LoRALayer<T> ActivationAdaptation { get; }
Property Value
- LoRALayer<T>
Remarks
This adapts the layer's outputs/activations using a second LoRA component (B_a * A_a).
ActivationRank
Gets the rank of the activation adaptation.
public int ActivationRank { get; }
Property Value
- int
Remarks
This can be different from the weight adaptation rank, allowing for independent control over the complexity of weight vs. activation adaptations.
ParameterCount
Gets the total number of trainable parameters (both weight and activation adaptations).
public override int ParameterCount { get; }
Property Value
- int
Remarks
If the base layer is frozen, this returns the sum of weight and activation LoRA parameters. Otherwise, it includes base layer parameters as well.
WeightAdaptation
Gets the weight adaptation LoRA layer.
public LoRALayer<T> WeightAdaptation { get; }
Property Value
- LoRALayer<T>
Remarks
This adapts the layer's weights using standard LoRA (B_w * A_w).
Methods
Backward(Tensor<T>)
Performs the backward pass through both adaptations and the base layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass propagates gradients through all three components:
- Weight adaptation LoRA (always)
- Activation adaptation LoRA (always)
- Base layer (only if not frozen)
For Beginners: During learning, this figures out how to improve all adaptations:
- Updates the weight adaptation (how should the weights change?)
- Updates the activation adaptation (how should the outputs be transformed?)
- Updates the base layer if not frozen (how should the original weights change?)
The gradients from all three paths are combined to tell earlier layers how to improve. This allows the model to learn complex adaptations that work together.
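Because the forward pass is a sum of parallel paths, the gradient passed to the previous layer is the sum of each path's input gradient. A minimal NumPy sketch for a plain linear base layer (an illustration only: alpha scaling, biases, and the frozen-base distinction are omitted, and the matrix shapes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r_w, r_a = 6, 5, 3, 2
W = rng.standard_normal((d_in, d_out))                  # base weights
A_w = rng.standard_normal((r_w, d_in))                  # weight LoRA: A_w
B_w = rng.standard_normal((d_out, r_w))                 # weight LoRA: B_w
A_a = rng.standard_normal((r_a, d_in))                  # activation LoRA: A_a
B_a = rng.standard_normal((d_out, r_a))                 # activation LoRA: B_a

dY = rng.standard_normal((4, d_out))  # gradient flowing back from the next layer

# Each parallel path contributes its own input gradient; they sum.
dX_base = dY @ W.T
dX_weight = dY @ (B_w @ A_w)          # weight-adaptation path
dX_activation = dY @ (B_a @ A_a)      # activation-adaptation path
dX = dX_base + dX_weight + dX_activation
```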
Forward(Tensor<T>)
Performs the forward pass through both base layer and both LoRA adaptations.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Output with both weight and activation adaptations applied.
Remarks
The forward pass computes:
1. base_output = base_layer(input) (original layer behavior)
2. weight_adaptation = weight_lora(input) (standard LoRA weight adaptation)
3. activation_adaptation = activation_lora(input) (additional activation transformation)
4. output = base_output + weight_adaptation + activation_adaptation
For Beginners: This runs the input through three parallel paths:
1. The base layer (original behavior)
2. Weight LoRA (learns how the weights should change)
3. Activation LoRA (learns how the outputs should be transformed)
All three outputs are added together to get the final result. This allows the model to:
- Keep the original layer's learned features (base layer)
- Refine what it learns (weight adaptation)
- Transform how it represents things (activation adaptation)
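The three parallel paths can be sketched in NumPy for a plain linear base layer (a simplification: bias terms and the alpha scaling factors are omitted, and the shapes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r_w, r_a = 6, 5, 3, 2
x = rng.standard_normal((4, d_in))                      # input batch
W = rng.standard_normal((d_in, d_out))                  # base weights
A_w = rng.standard_normal((r_w, d_in))                  # weight LoRA: A_w
B_w = rng.standard_normal((d_out, r_w))                 # weight LoRA: B_w
A_a = rng.standard_normal((r_a, d_in))                  # activation LoRA: A_a
B_a = rng.standard_normal((d_out, r_a))                 # activation LoRA: B_a

base_output = x @ W                        # 1. original layer behavior
weight_adaptation = x @ (B_w @ A_w).T      # 2. B_w * A_w applied to the input
activation_adaptation = x @ (B_a @ A_a).T  # 3. B_a * A_a applied to the input
output = base_output + weight_adaptation + activation_adaptation  # 4. sum
```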
GetParameters()
Gets the current parameters as a vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing parameters from both adaptations (and base layer if not frozen).
MergeToOriginalLayer()
Merges both LoRA adaptations into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new layer with both weight and activation adaptations merged into the base layer.
Remarks
This method merges both the weight adaptation and activation adaptation into the base layer's weights. Since activation adaptation operates on outputs, it's merged by adding it to the weight matrix as well.
For Beginners: This "bakes in" both GLoRA adaptations to create a regular layer. After training with GLoRA, you can merge both adaptations into the original weights for:
- Faster inference (no need to compute two LoRA layers separately)
- Simpler deployment (a single layer instead of three components)
- Compatibility with systems that don't support LoRA
The merging process:
- Computes weight adaptation matrix from weight LoRA (B_w * A_w)
- Computes activation adaptation matrix from activation LoRA (B_a * A_a)
- Adds both to the base layer's weights
- Copies biases unchanged
- Creates a new layer with all adaptations merged
Note: Merging currently only supports DenseLayer and FullyConnectedLayer. For other layer types, you'll need to use the adapter in production.
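Why merging works: for a linear layer, all three paths are linear in the input, so both low-rank products can be folded into a single weight matrix. A NumPy sketch (biases and scaling factors omitted; shapes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out, r_w, r_a = 6, 5, 3, 2
x = rng.standard_normal((4, d_in))
W = rng.standard_normal((d_in, d_out))                  # base weights
A_w, B_w = rng.standard_normal((r_w, d_in)), rng.standard_normal((d_out, r_w))
A_a, B_a = rng.standard_normal((r_a, d_in)), rng.standard_normal((d_out, r_a))

# Fold both low-rank products into one weight matrix.
W_merged = W + (B_w @ A_w).T + (B_a @ A_a).T

unmerged = x @ W + x @ (B_w @ A_w).T + x @ (B_a @ A_a).T  # three components
merged = x @ W_merged                                     # single merged layer
```

After merging, one matrix multiply replaces the three-path computation, which is what makes inference faster.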
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
ResetState()
Resets the internal state of the base layer and both LoRA adaptations.
public override void ResetState()
Remarks
For Beginners: This clears the memory of all three components (base layer, weight adaptation, and activation adaptation). It's useful when starting to process a completely new, unrelated batch of data.
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): Vector containing parameters for both adaptations (and the base layer if not frozen).
UpdateParameters(T)
Updates parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
Remarks
Updates both weight and activation adaptation parameters. Base layer parameters are only updated if not frozen.
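As a sketch of what one update step does, here is plain gradient descent applied to both adaptation parameter sets, with flat lists standing in for the LoRA matrices (a hypothetical update rule; the library's optimizer details may differ):

```python
def sgd_step(params, grads, lr):
    """One plain gradient-descent step: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]

weight_lora = [1.0, 2.0]        # stand-in for weight-adaptation parameters
activation_lora = [0.5]         # stand-in for activation-adaptation parameters
grads_w, grads_a = [0.1, -0.2], [0.4]

# Both adaptation sets are updated; a frozen base layer is left untouched.
weight_lora = sgd_step(weight_lora, grads_w, lr=0.1)
activation_lora = sgd_step(activation_lora, grads_a, lr=0.1)
```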
UpdateParametersFromLayers()
Updates the parameter vector from the current layer states.
protected override void UpdateParametersFromLayers()