Class LoHaAdapter<T>
LoHa (Low-Rank Hadamard Product Adaptation) adapter for parameter-efficient fine-tuning.
public class LoHaAdapter<T> : LoRAAdapterBase<T>, IDisposable, ILoRAAdapter<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance: LayerBase<T> → LoRAAdapterBase<T> → LoHaAdapter<T>
- Implements: ILoRAAdapter<T>, ILayer<T>
Remarks
LoHa uses element-wise Hadamard products (⊙) instead of matrix multiplication for adaptation. Instead of computing ΔW = B * A like standard LoRA, LoHa computes: ΔW = sum over rank of (A[i] ⊙ B[i])
This formulation can capture element-wise patterns that matrix multiplication may miss, making it particularly effective for:
- Convolutional layers (local spatial patterns)
- Element-wise transformations
- Fine-grained weight adjustments
Mathematical Formulation:
Standard LoRA: ΔW = B * A, where B is input×rank and A is rank×output, so ΔW is input×output.
LoHa: ΔW = Σ over rank of (A[i] ⊙ B[i]), where each A[i] and B[i] is input×output.
The Hadamard product (⊙) performs element-wise multiplication, allowing each element of the weight matrix to be adjusted independently across the rank dimensions.
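To make the element-wise nature concrete, here is a tiny worked example in plain C# (rank 2, 2×2 matrices, scaling by alpha/rank omitted). It is an illustration only, not the library's Matrix<T>-based implementation:

```csharp
// Worked example: rank = 2, 2×2 matrices. ΔW[i,j] = A1[i,j]*B1[i,j] + A2[i,j]*B2[i,j].
float[,] A1 = { { 1f, 2f }, { 3f, 4f } };
float[,] B1 = { { 5f, 6f }, { 7f, 8f } };
float[,] A2 = { { 1f, 0f }, { 0f, 1f } };
float[,] B2 = { { 2f, 2f }, { 2f, 2f } };

var deltaW = new float[2, 2];
for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
        deltaW[i, j] = A1[i, j] * B1[i, j] + A2[i, j] * B2[i, j];

// deltaW == { { 1*5 + 1*2, 2*6 + 0*2 },     == { {  7, 12 },
//             { 3*7 + 0*2, 4*8 + 1*2 } }         { 21, 34 } }
```

Note that each entry of ΔW depends only on the corresponding entries of A[i] and B[i]; there is no row/column mixing as in matrix multiplication.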
For Beginners: LoHa is a variant of LoRA that uses element-wise multiplication instead of matrix multiplication. Think of it this way:
- Standard LoRA: Learns "row and column patterns" that combine via matrix multiply
- LoHa: Learns "pixel-by-pixel patterns" that combine via element-wise multiply
LoHa is especially good when:
- You need to capture local, element-wise patterns (like in images)
- The weight matrix has spatial structure (like convolutional filters)
- You want each weight to be adjusted somewhat independently
Trade-offs compared to LoRA:
- More parameters: Both A and B must be full-sized (input×output) per rank dimension
- Different expressiveness: captures element-wise patterns rather than the row/column patterns that matrix multiplication captures
- Better for CNNs: The element-wise nature matches convolutional structure better
Example: A 100×100 weight matrix with rank=8
- Standard LoRA: 8×100 + 100×8 = 1,600 parameters
- LoHa: 2 × 8 × 100 × 100 = 160,000 parameters (each rank has 2 full-sized matrices)
LoHa uses MORE parameters than LoRA but models element-wise weight interactions via Hadamard products.
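A minimal sketch of this arithmetic, using hypothetical helper methods rather than anything from the library:

```csharp
// Hypothetical helpers illustrating the parameter-count comparison above.
static int LoRAParamCount(int inputSize, int outputSize, int rank)
    => rank * (inputSize + outputSize);        // A and B are rank-sized on one dimension each

static int LoHaParamCount(int inputSize, int outputSize, int rank)
    => 2 * rank * inputSize * outputSize;      // A[i] and B[i] are both full input×output, per rank

// For a 100×100 layer with rank 8:
// LoRAParamCount(100, 100, 8) == 1_600
// LoHaParamCount(100, 100, 8) == 160_000
```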
Constructors
LoHaAdapter(ILayer<T>, int, double, bool)
Initializes a new LoHa adapter wrapping an existing layer.
public LoHaAdapter(ILayer<T> baseLayer, int rank, double alpha = -1, bool freezeBaseLayer = true)
Parameters
baseLayer (ILayer<T>): The layer to adapt with LoHa.
rank (int): The rank of the low-rank decomposition.
alpha (double): The LoHa scaling factor (defaults to rank if negative).
freezeBaseLayer (bool): Whether to freeze the base layer's parameters during training.
Remarks
For Beginners: This creates a LoHa adapter for any layer with 1D input/output.
Parameters:
- baseLayer: The layer you want to make more efficient to fine-tune
- rank: How many element-wise patterns to learn (more = more flexibility, more parameters)
- alpha: How strong the LoHa adaptation is (typically same as rank)
- freezeBaseLayer: Whether to lock the original layer's weights (usually true for efficiency)
The adapter creates 2×rank full-sized matrices (A and B for each rank dimension), which are combined using element-wise Hadamard products during forward/backward passes.
Exceptions
- ArgumentNullException
Thrown when baseLayer is null.
- ArgumentException
Thrown when the base layer doesn't have 1D input/output shapes.
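A usage sketch for this constructor follows. Only the LoHaAdapter<T> signature shown above is taken from this page; the DenseLayer<float> constructor is assumed for illustration and may differ in the actual library:

```csharp
// Sketch: wrap an existing dense layer with a LoHa adapter for fine-tuning.
// NOTE: the DenseLayer<float> constructor here is assumed for illustration.
var baseLayer = new DenseLayer<float>(inputSize: 100, outputSize: 100);

var adapter = new LoHaAdapter<float>(
    baseLayer,
    rank: 8,               // number of element-wise patterns to learn
    alpha: 8,              // scaling factor; -1 (the default) falls back to rank
    freezeBaseLayer: true  // train only the LoHa matrices, not the original weights
);

// The adapter is itself an ILayer<float>, so it can be used anywhere the base layer was.
ILayer<float> layer = adapter;
```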
Properties
ParameterCount
Gets the total number of trainable parameters.
public override int ParameterCount { get; }
Property Value
- int
Remarks
LoHa has 2 * rank * inputSize * outputSize parameters (A and B matrices for each rank). This is more than standard LoRA but still far less than full fine-tuning.
Methods
Backward(Tensor<T>)
Performs the backward pass through both layers, computing gradients for LoHa matrices.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): Gradient flowing back from the next layer.
Returns
- Tensor<T>
Gradient to pass to the previous layer.
Remarks
The backward pass computes gradients using the chain rule for Hadamard products:
dL/dA[r] = (input^T * dL/doutput) ⊙ B[r] * scaling
dL/dB[r] = (input^T * dL/doutput) ⊙ A[r] * scaling
dL/dinput = base_gradient + dL/doutput * (sum over rank of A[r] ⊙ B[r])^T * scaling
The Hadamard product gradient rule: d/dx (f ⊙ g) = df ⊙ g + f ⊙ dg
For Beginners: This is the learning phase for LoHa. It computes:
- How to adjust each A[i] matrix to reduce error
- How to adjust each B[i] matrix to reduce error
- What gradient to send to earlier layers
The math is more complex than standard LoRA because Hadamard products have different derivative rules than matrix multiplication, but the idea is the same: figure out how each parameter contributed to the error and adjust accordingly.
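The following sketch expresses these gradient rules with plain float[,] arrays for a single input vector, assuming the forward delta is input * (A[r] ⊙ B[r]) * scaling as described under Forward. It omits batching, biases, and the library's Tensor<T> types:

```csharp
// Sketch: gradients for delta = scaling * x · Σ_r (A[r] ⊙ B[r]), single sample x (length inputSize).
// gOut is dL/doutput (length outputSize). Plain arrays, not the library's actual Tensor<T>.
static void LoHaBackward(
    float[] x, float[] gOut, float[][,] A, float[][,] B, float scaling,
    out float[][,] gradA, out float[][,] gradB, out float[] gradInput)
{
    int rank = A.Length;
    int nIn = x.Length;
    int nOut = gOut.Length;

    // outer = x^T · dL/doutput (inputSize × outputSize), shared by all rank dimensions
    var outer = new float[nIn, nOut];
    for (int i = 0; i < nIn; i++)
        for (int j = 0; j < nOut; j++)
            outer[i, j] = x[i] * gOut[j];

    gradA = new float[rank][,];
    gradB = new float[rank][,];
    gradInput = new float[nIn];

    for (int r = 0; r < rank; r++)
    {
        gradA[r] = new float[nIn, nOut];
        gradB[r] = new float[nIn, nOut];
        for (int i = 0; i < nIn; i++)
            for (int j = 0; j < nOut; j++)
            {
                float had = A[r][i, j] * B[r][i, j];              // (A[r] ⊙ B[r])[i,j]
                gradA[r][i, j] = scaling * outer[i, j] * B[r][i, j];
                gradB[r][i, j] = scaling * outer[i, j] * A[r][i, j];
                gradInput[i] += scaling * gOut[j] * had;           // LoHa part of dL/dinput
            }
    }
    // The full input gradient also includes the base layer's own gradient,
    // which the adapter adds on top of this LoHa contribution.
}
```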
Forward(Tensor<T>)
Performs the forward pass through both base layer and LoHa adaptation.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): Input tensor.
Returns
- Tensor<T>
Sum of base layer output and LoHa delta (computed via Hadamard products).
Remarks
The forward pass computes:
1. base_output = base_layer(input)
2. loha_delta = input * (sum over rank of A[i] ⊙ B[i]) * scaling
3. output = base_output + loha_delta
The Hadamard product (⊙) multiplies corresponding elements, allowing element-wise adaptations.
For Beginners: This runs the input through the original layer and adds a correction.
The correction is computed by:
- Computing the Hadamard product A[i] ⊙ B[i] for each rank dimension
- Multiplying the input by each of these products (a standard matrix multiply)
- Summing all rank contributions together
- Scaling by alpha/rank
This element-wise approach lets LoHa learn fine-grained adjustments to each weight independently.
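A sketch of this computation for a single input vector, with the base layer's output passed in and plain arrays standing in for the library's Tensor<T>:

```csharp
// Sketch: output = base_output + scaling * x · Σ_r (A[r] ⊙ B[r]), single sample.
static float[] LoHaForward(float[] x, float[] baseOutput, float[][,] A, float[][,] B, float scaling)
{
    int rank = A.Length;
    int nIn = x.Length;
    int nOut = baseOutput.Length;

    var output = (float[])baseOutput.Clone();
    for (int j = 0; j < nOut; j++)
    {
        float delta = 0f;
        for (int i = 0; i < nIn; i++)
        {
            float had = 0f;
            for (int r = 0; r < rank; r++)
                had += A[r][i, j] * B[r][i, j];   // Σ_r (A[r] ⊙ B[r])[i,j]
            delta += x[i] * had;                   // matrix multiply with the input
        }
        output[j] += scaling * delta;              // scaling = alpha / rank
    }
    return output;
}
```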
GetParameters()
Gets the current parameters as a vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
Vector containing all LoHa parameters (A and B matrices for all ranks).
MergeToOriginalLayer()
Merges the LoHa adaptation into the base layer and returns the merged layer.
public override ILayer<T> MergeToOriginalLayer()
Returns
- ILayer<T>
A new DenseLayer with LoHa weights merged into the base layer's weights.
Remarks
This method computes the full LoHa weight delta by summing all Hadamard products: ΔW = scaling * sum over rank of (A[i] ⊙ B[i])
The delta is then added to the base layer's weights to create a merged layer.
For Beginners: This "bakes in" your LoHa adaptation to create a regular Dense layer.
The merging process:
- Computes the full weight delta from all A[i] and B[i] matrices using Hadamard products
- Adds this delta to the base layer's existing weights
- Copies biases unchanged (LoHa doesn't modify biases)
- Creates a new DenseLayer with the merged weights
After merging, you have a single layer that includes all the learned adaptations, making inference faster and simpler.
Exceptions
- InvalidOperationException
Thrown when the base layer type is not DenseLayer or FullyConnectedLayer.
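The weight merge itself reduces to an element-wise addition. A sketch with plain arrays (constructing the new DenseLayer is omitted, since its constructor isn't documented here):

```csharp
// Sketch: merged weights = base weights + scaling * Σ_r (A[r] ⊙ B[r]).
// Biases are left untouched, matching the behavior described above.
static float[,] MergeWeights(float[,] baseWeights, float[][,] A, float[][,] B, float scaling)
{
    int rows = baseWeights.GetLength(0);
    int cols = baseWeights.GetLength(1);
    var merged = (float[,])baseWeights.Clone();

    for (int r = 0; r < A.Length; r++)
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                merged[i, j] += scaling * A[r][i, j] * B[r][i, j];

    return merged;
}
```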
ResetState()
Resets the internal state of both the base layer and LoHa adapter.
public override void ResetState()
Remarks
For Beginners: This clears the memory of the adapter and base layer. It's useful when starting to process a completely new, unrelated batch of data.
SetParameters(Vector<T>)
Sets the layer parameters from a vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): Vector containing all LoHa parameters.
UpdateParameters(T)
Updates parameters using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
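The documentation doesn't specify the update rule, but a plain gradient-descent step over the A and B matrices would look like the sketch below; the adapter's actual implementation may store parameters differently or delegate to an optimizer:

```csharp
// Sketch: plain gradient-descent update p -= learningRate * grad for each LoHa matrix entry.
// This only illustrates the role of learningRate; it is not the adapter's actual code.
static void SgdStep(float[][,] param, float[][,] grad, float learningRate)
{
    for (int r = 0; r < param.Length; r++)
        for (int i = 0; i < param[r].GetLength(0); i++)
            for (int j = 0; j < param[r].GetLength(1); j++)
                param[r][i, j] -= learningRate * grad[r][i, j];
}

// Applied to both matrix stacks after Backward has computed their gradients:
// SgdStep(A, gradA, learningRate);
// SgdStep(B, gradB, learningRate);
```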