Class GLoRAAdapter<T>

Namespace
AiDotNet.LoRA.Adapters
Assembly
AiDotNet.dll

Generalized LoRA (GLoRA) implementation that adapts both weights AND activations.

public class GLoRAAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LoRAAdapterBase<T> → GLoRAAdapter<T>

Implements
ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

GLoRA extends standard LoRA by adding adaptation to both the layer's weights and its activations. This provides more flexibility for multi-task learning scenarios where different tasks may need different feature representations at each layer.

The forward pass computes:

  • adapted_weights = base_weights + B_w * A_w (weight adaptation)
  • base_output = input * adapted_weights
  • adapted_output = base_output + B_a * A_a * input (activation adaptation)

For Beginners: While standard LoRA only adapts what the layer learns (its weights), GLoRA also adapts what the layer produces (its activations). Think of it like this:

  • Standard LoRA: Adjusts the "recipe" (weights) but produces the same type of output
  • GLoRA: Adjusts both the "recipe" (weights) AND transforms the output for different uses

This is especially useful when:

  1. Different tasks need different feature representations
  2. You're doing multi-task learning (e.g., the same base features used differently)
  3. You need more flexibility than weight-only adaptation provides

Key differences from StandardLoRA:

  • WeightAdaptation: Standard LoRA component that modifies layer weights
  • ActivationAdaptation: Additional LoRA component that modifies layer outputs
  • ActivationRank: Can be different from weight rank for fine-tuned control

Trade-offs:

  • More flexible: Can adapt representations for different tasks
  • Better for multi-task: Each task can use features differently
  • More parameters: Two LoRA components instead of one
  • Slightly slower: Two adaptation computations per forward pass

Example: For a 1000x1000 layer with weight_rank=8 and activation_rank=4:

  • Weight adaptation: 16,000 parameters (same as standard LoRA)
  • Activation adaptation: 8,000 additional parameters
  • Total: 24,000 parameters (still 97.6% reduction from 1M!)
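The arithmetic behind this example can be sketched as follows (a hypothetical helper for illustration, not part of the AiDotNet API): a rank-r LoRA pair for an n_in × n_out layer adds r·n_in + n_out·r trainable parameters.

```python
# Illustrative GLoRA parameter-count arithmetic (hypothetical helper,
# not part of the AiDotNet API).
def glora_param_count(n_in, n_out, weight_rank, activation_rank):
    """Trainable parameters with the base layer frozen."""
    weight_params = weight_rank * (n_in + n_out)      # A_w: r_w x n_in, B_w: n_out x r_w
    activation_params = activation_rank * (n_in + n_out)
    return weight_params, activation_params, weight_params + activation_params

w, a, total = glora_param_count(1000, 1000, weight_rank=8, activation_rank=4)
# w = 16_000, a = 8_000, total = 24_000 -- about 97.6% fewer than the
# 1_000_000 parameters of the full 1000x1000 weight matrix.
```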

Constructors

GLoRAAdapter(ILayer<T>, int, int, double, double, bool)

Initializes a new GLoRA adapter with the specified parameters.

public GLoRAAdapter(ILayer<T> baseLayer, int weightRank, int activationRank = -1, double weightAlpha = -1, double activationAlpha = -1, bool freezeBaseLayer = true)

Parameters

baseLayer ILayer<T>

The layer to adapt with GLoRA.

weightRank int

The rank of the weight adaptation decomposition.

activationRank int

The rank of the activation adaptation decomposition (defaults to weightRank if negative).

weightAlpha double

The scaling factor for weight adaptation (defaults to weightRank if negative).

activationAlpha double

The scaling factor for activation adaptation (defaults to activationRank if negative).

freezeBaseLayer bool

Whether to freeze the base layer's parameters during training.

Remarks

For Beginners: This creates a GLoRA adapter that adds TWO types of adaptations: one for the layer's weights and one for its activations.

Parameters:

  • baseLayer: The layer you want to make more flexible
  • weightRank: Compression for weight adaptation (lower = fewer parameters for weights)
  • activationRank: Compression for activation adaptation (can be different!)
  • weightAlpha: How strong the weight adaptation is
  • activationAlpha: How strong the activation adaptation is
  • freezeBaseLayer: Whether to lock the original layer's weights (usually true)

Having separate ranks and alphas for weights vs. activations gives you fine-grained control:

  • Higher weight rank = more flexibility in what the layer learns
  • Higher activation rank = more flexibility in how outputs are transformed

Common patterns:

  • Equal ranks: Balanced adaptation (weightRank=8, activationRank=8)
  • Lower activation rank: More emphasis on weight learning (weightRank=16, activationRank=4)
  • Higher activation rank: More emphasis on output transformation (weightRank=4, activationRank=16)

Exceptions

ArgumentNullException

Thrown when baseLayer is null.

Properties

ActivationAdaptation

Gets the activation adaptation LoRA layer.

public LoRALayer<T> ActivationAdaptation { get; }

Property Value

LoRALayer<T>

Remarks

This adapts the layer's outputs/activations using a second LoRA component (B_a * A_a).

ActivationRank

Gets the rank of the activation adaptation.

public int ActivationRank { get; }

Property Value

int

Remarks

This can be different from the weight adaptation rank, allowing for independent control over the complexity of weight vs. activation adaptations.

ParameterCount

Gets the total number of trainable parameters (both weight and activation adaptations).

public override int ParameterCount { get; }

Property Value

int

Remarks

If the base layer is frozen, this returns the sum of weight and activation LoRA parameters. Otherwise, it includes base layer parameters as well.

WeightAdaptation

Gets the weight adaptation LoRA layer.

public LoRALayer<T> WeightAdaptation { get; }

Property Value

LoRALayer<T>

Remarks

This adapts the layer's weights using standard LoRA (B_w * A_w).

Methods

Backward(Tensor<T>)

Performs the backward pass through both adaptations and the base layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer.

Returns

Tensor<T>

Gradient to pass to the previous layer.

Remarks

The backward pass propagates gradients through all three components:

  • Weight adaptation LoRA (always)
  • Activation adaptation LoRA (always)
  • Base layer (only if not frozen)

For Beginners: During learning, this figures out how to improve all adaptations:

  • Updates weight adaptation (how should weights change?)
  • Updates activation adaptation (how should outputs be transformed?)
  • Updates base layer if not frozen (how should original weights change?)

The gradients from all three paths are combined to tell earlier layers how to improve. This allows the model to learn complex adaptations that work together.
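The chain rule applied by the backward pass can be illustrated for one adaptation path with a NumPy sketch, checked against finite differences. This shows the shape of the computation only; the toy loss and the alpha/rank scaling are assumptions, and AiDotNet's implementation details may differ.

```python
import numpy as np

# Hand-derived gradients for the weight-LoRA path, verified against a
# central finite difference (illustrative sketch, not AiDotNet internals).
rng = np.random.default_rng(1)
n_in, n_out, r = 5, 4, 2
x = rng.normal(size=(3, n_in))
A = rng.normal(size=(n_in, r))   # low-rank "down" projection
B = rng.normal(size=(r, n_out))  # low-rank "up" projection
scale = 8.0 / r                  # assumed alpha / rank scaling

def loss(A, B):
    y = (x @ A) @ B * scale      # weight-LoRA contribution to the output
    return 0.5 * np.sum(y ** 2)  # toy scalar loss

# Chain rule with dL/dy = y:
#   dL/dB = scale * (xA)^T (dL/dy),   dL/dA = scale * x^T (dL/dy) B^T
y = (x @ A) @ B * scale
grad_B = scale * (x @ A).T @ y
grad_A = scale * x.T @ (y @ B.T)

# Central-difference check on one entry of A
eps = 1e-6
Ap, Am = A.copy(), A.copy()
Ap[0, 0] += eps
Am[0, 0] -= eps
fd = (loss(Ap, B) - loss(Am, B)) / (2 * eps)
assert abs(fd - grad_A[0, 0]) < 1e-4
```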

Forward(Tensor<T>)

Performs the forward pass through both base layer and both LoRA adaptations.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor.

Returns

Tensor<T>

Output with both weight and activation adaptations applied.

Remarks

The forward pass computes:

  1. base_output = base_layer(input) (original layer behavior)
  2. weight_adaptation = weight_lora(input) (standard LoRA weight adaptation)
  3. activation_adaptation = activation_lora(input) (additional activation transformation)
  4. output = base_output + weight_adaptation + activation_adaptation

For Beginners: This runs the input through three parallel paths:

  1. The base layer (original behavior)
  2. Weight LoRA (learns how weights should change)
  3. Activation LoRA (learns how outputs should be transformed)

All three outputs are added together to get the final result. This allows the model to:

  • Keep the original layer's learned features (base layer)
  • Refine what it learns (weight adaptation)
  • Transform how it represents things (activation adaptation)
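The three-path sum described above can be sketched in NumPy (shapes and the alpha/rank scaling are assumptions for illustration; AiDotNet's internal layout may differ). With the "up" projections initialized to zero, the adapter starts out as an identity wrapper around the base layer:

```python
import numpy as np

# Minimal NumPy sketch of the three-path GLoRA forward pass (illustrative).
rng = np.random.default_rng(0)
n_in, n_out, r_w, r_a = 6, 5, 2, 3
W = rng.normal(size=(n_in, n_out))                  # frozen base weights
A_w = rng.normal(size=(n_in, r_w)); B_w = np.zeros((r_w, n_out))
A_a = rng.normal(size=(n_in, r_a)); B_a = np.zeros((r_a, n_out))
scale_w, scale_a = 8.0 / r_w, 4.0 / r_a             # assumed alpha / rank

def glora_forward(x):
    base = x @ W                                    # 1. original layer behavior
    weight_adapt = (x @ A_w) @ B_w * scale_w        # 2. weight LoRA path
    act_adapt = (x @ A_a) @ B_a * scale_a           # 3. activation LoRA path
    return base + weight_adapt + act_adapt          # 4. sum of all three paths

x = rng.normal(size=(4, n_in))
# Zero-initialized B matrices leave the base layer's behavior unchanged:
assert np.allclose(glora_forward(x), x @ W)
```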

GetParameters()

Gets the current parameters as a vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

Vector containing parameters from both adaptations (and base layer if not frozen).

MergeToOriginalLayer()

Merges both LoRA adaptations into the base layer and returns the merged layer.

public override ILayer<T> MergeToOriginalLayer()

Returns

ILayer<T>

A new layer with both weight and activation adaptations merged into the base layer.

Remarks

This method merges both the weight adaptation and activation adaptation into the base layer's weights. Since activation adaptation operates on outputs, it's merged by adding it to the weight matrix as well.

For Beginners: This "bakes in" both GLoRA adaptations to create a regular layer. After training with GLoRA, you can merge both adaptations into the original weights for:

  • Faster inference (no need to compute two LoRA layers separately)
  • Simpler deployment (single layer instead of three components)
  • Compatibility with systems that don't support LoRA

The merging process:

  1. Computes weight adaptation matrix from weight LoRA (B_w * A_w)
  2. Computes activation adaptation matrix from activation LoRA (B_a * A_a)
  3. Adds both to the base layer's weights
  4. Copies biases unchanged
  5. Creates a new layer with all adaptations merged

Note: Merging currently only supports DenseLayer and FullyConnectedLayer. For other layer types, you'll need to use the adapter in production.
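Because both adaptation paths are linear in the input, their low-rank products can be folded into a single dense weight matrix, which is why the merged layer reproduces the adapter exactly. A NumPy sketch of steps 1-3 (shapes and scaling are assumptions for illustration):

```python
import numpy as np

# Sketch of the GLoRA merge: fold both low-rank products into the base
# weights and confirm the merged layer matches the adapter (illustrative).
rng = np.random.default_rng(2)
n_in, n_out, r_w, r_a = 6, 5, 2, 3
W = rng.normal(size=(n_in, n_out))
A_w = rng.normal(size=(n_in, r_w)); B_w = rng.normal(size=(r_w, n_out))
A_a = rng.normal(size=(n_in, r_a)); B_a = rng.normal(size=(r_a, n_out))
scale_w, scale_a = 8.0 / r_w, 4.0 / r_a             # assumed alpha / rank

# Steps 1-3: compute both adaptation matrices and add them to the base weights.
W_merged = W + scale_w * (A_w @ B_w) + scale_a * (A_a @ B_a)

x = rng.normal(size=(4, n_in))
adapter_out = x @ W + scale_w * (x @ A_w) @ B_w + scale_a * (x @ A_a) @ B_a
assert np.allclose(x @ W_merged, adapter_out)       # merged layer == adapter
```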

Exceptions

InvalidOperationException

Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.

ResetState()

Resets the internal state of the base layer and both LoRA adaptations.

public override void ResetState()

Remarks

For Beginners: This clears the memory of all three components (base layer, weight adaptation, and activation adaptation). It's useful when starting to process a completely new, unrelated batch of data.

SetParameters(Vector<T>)

Sets the layer parameters from a vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Vector containing parameters for both adaptations (and base layer if not frozen).

UpdateParameters(T)

Updates parameters using the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate for parameter updates.

Remarks

Updates both weight and activation adaptation parameters. Base layer parameters are only updated if not frozen.

UpdateParametersFromLayers()

Updates the parameter vector from the current layer states.

protected override void UpdateParametersFromLayers()