Class MultiplyLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a layer that performs element-wise multiplication of multiple input tensors.
public class MultiplyLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → MultiplyLayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
The MultiplyLayer performs element-wise multiplication (Hadamard product) of two or more input tensors of identical shape. This operation can be useful for implementing gating mechanisms, attention masks, or feature-wise interactions in neural networks. The layer requires that all input tensors have the same shape, and it produces an output tensor of that same shape.
For Beginners: This layer multiplies tensors together, element by element.
Think of it like multiplying numbers together in corresponding positions:
- If you have two vectors [1, 2, 3] and [4, 5, 6]
- The result would be [1×4, 2×5, 3×6] = [4, 10, 18]
This is useful for:
- Controlling information flow (like gates in LSTM or GRU cells)
- Applying masks (to selectively focus on certain values)
- Combining features in a multiplicative way
For example, in an attention mechanism, you might multiply feature values by attention weights to focus on important features and diminish the influence of less relevant ones.
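As a minimal sketch, a gating setup might look like the following. Note that the shape-based Tensor<float> constructor shown here is an assumption; the actual AiDotNet tensor-creation API may differ.

using AiDotNet.NeuralNetworks.Layers;

// Gate features by multiplying them element-wise with attention weights.
// Assumption: Tensor<float> can be constructed from a shape array.
var shape = new[] { 32, 10, 128 };                 // batch, time steps, features
var multiply = new MultiplyLayer<float>(new[] { shape, shape });

var features = new Tensor<float>(shape);           // e.g. encoder outputs
var weights = new Tensor<float>(shape);            // e.g. attention weights in [0, 1]

// output[i] = features[i] * weights[i] at every element position i
var gated = multiply.Forward(features, weights);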
Constructors
MultiplyLayer(int[][], IActivationFunction<T>?)
Initializes a new instance of the MultiplyLayer<T> class with the specified input shapes and a scalar activation function.
public MultiplyLayer(int[][] inputShapes, IActivationFunction<T>? activationFunction = null)
Parameters
inputShapes (int[][]): An array of input shapes, all of which must be identical.
activationFunction (IActivationFunction<T>): The activation function to apply after processing. Defaults to Identity if not specified.
Remarks
This constructor creates a MultiplyLayer that expects multiple input tensors with identical shapes. It validates that at least two input shapes are provided and that all shapes are identical, since element-wise multiplication requires matching dimensions.
For Beginners: This constructor sets up the layer to handle multiple inputs of the same shape.
When creating a MultiplyLayer, you need to specify:
- inputShapes: The shapes of all the inputs you'll provide (which must match)
- activationFunction: The function that processes the final output (optional)
For example, if you want to multiply three tensors with shape [32, 10, 128]:
- You would specify inputShapes as [[32, 10, 128], [32, 10, 128], [32, 10, 128]]
- The layer would validate that all these shapes match
- The output shape would also be [32, 10, 128]
The constructor throws an exception if you provide fewer than two input shapes or if the shapes don't all match exactly.
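In code, that example might look like this sketch (only the documented constructor is used):

var shape = new[] { 32, 10, 128 };

// All three input shapes must be identical, or the constructor throws.
var layer = new MultiplyLayer<float>(new[] { shape, shape, shape });

// A mismatched shape fails fast with an ArgumentException:
// new MultiplyLayer<float>(new[] { new[] { 32, 10, 128 }, new[] { 32, 10, 64 } });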
Exceptions
- ArgumentException
Thrown when fewer than two input shapes are provided or when input shapes are not identical.
MultiplyLayer(int[][], IVectorActivationFunction<T>?)
Initializes a new instance of the MultiplyLayer<T> class with the specified input shapes and a vector activation function.
public MultiplyLayer(int[][] inputShapes, IVectorActivationFunction<T>? vectorActivationFunction = null)
Parameters
inputShapes (int[][]): An array of input shapes, all of which must be identical.
vectorActivationFunction (IVectorActivationFunction<T>): The vector activation function to apply after processing. Defaults to Identity if not specified.
Remarks
This constructor creates a MultiplyLayer that expects multiple input tensors with identical shapes. It validates that at least two input shapes are provided and that all shapes are identical, since element-wise multiplication requires matching dimensions. This overload accepts a vector activation function, which operates on entire vectors rather than individual elements.
For Beginners: This constructor sets up the layer with a vector-based activation function.
A vector activation function:
- Operates on entire groups of numbers at once, rather than one at a time
- Can capture relationships between different elements in the output
- Defaults to the Identity function, which doesn't change the values
This constructor is useful when you need more complex activation patterns that consider the relationships between different values after multiplication.
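As a sketch, construction might look like this; SoftmaxActivation<float> is a hypothetical stand-in name for whichever IVectorActivationFunction<T> implementation your project provides:

var shape = new[] { 16, 64 };

// SoftmaxActivation<float> is hypothetical; substitute a real
// IVectorActivationFunction<T> implementation.
var layer = new MultiplyLayer<float>(new[] { shape, shape }, new SoftmaxActivation<float>());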
Exceptions
- ArgumentException
Thrown when fewer than two input shapes are provided or when input shapes are not identical.
Properties
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsGpuTraining
Gets whether this layer has full GPU training support (forward, backward, and parameter updates).
public override bool SupportsGpuTraining { get; }
Property Value
- bool
Remarks
This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:
- ForwardGpu is implemented
- BackwardGpu is implemented
- UpdateParametersGpu is implemented (for layers with trainable parameters)
- GPU weight/bias/gradient buffers are properly managed
For Beginners: This tells you if training can happen entirely on GPU.
GPU-resident training is much faster because:
- Data stays on GPU between forward and backward passes
- No expensive CPU-GPU transfers during each training step
- GPU kernels handle all gradient computation
Only layers that return true here can participate in fully GPU-resident training.
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Always true, because the MultiplyLayer supports backpropagation even though it has no trainable parameters.
Remarks
This property indicates whether the layer supports backpropagation during training. Although the MultiplyLayer has no trainable parameters, it still supports the backward pass to propagate gradients to previous layers.
For Beginners: This property tells you if the layer can participate in the training process.
A value of true means:
- The layer can pass gradient information backward during training
- It's part of the learning process, even though it doesn't have learnable parameters
While this layer doesn't have weights or biases that get updated during training, it still needs to properly handle gradients to ensure that layers before it can learn correctly.
Methods
Backward(Tensor<T>)
Performs the backward pass of the multiply layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
A stacked tensor containing the gradients of the loss with respect to each of the layer's inputs.
Remarks
This method implements the backward pass of the multiply layer, which is used during training to propagate error gradients back through the network. For element-wise multiplication, the gradient with respect to each input is the product of the output gradient and all other inputs. The method calculates and returns the gradients for all input tensors.
For Beginners: This method calculates how changes in each input affect the final output.
During the backward pass:
- The layer receives gradients indicating how the output should change
- It calculates how each input tensor contributed to the output
- For each input, its gradient is the product of:
  - The output gradient (after applying the activation function derivative)
  - All OTHER input tensors (not including itself)
This follows the chain rule of calculus for multiplication. If z = x * y and L is the loss, then:
- dL/dx = y * (the gradient dL/dz flowing back from later layers)
- dL/dy = x * (the gradient dL/dz flowing back from later layers)
The method returns a stacked tensor containing gradients for all inputs.
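To make the chain rule concrete, here is a small stand-alone sketch using plain arrays (deliberately independent of the library's Tensor<T> type) for three inputs, z = x * y * w:

double[] x = { 1, 2, 3 };
double[] y = { 4, 5, 6 };
double[] w = { 0.5, 0.5, 0.5 };
double[] upstream = { 1, 1, 1 };            // gradient arriving from later layers

var gradX = new double[3];
var gradY = new double[3];
var gradW = new double[3];
for (int i = 0; i < 3; i++)
{
    gradX[i] = upstream[i] * y[i] * w[i];   // product of all inputs except x
    gradY[i] = upstream[i] * x[i] * w[i];   // product of all inputs except y
    gradW[i] = upstream[i] * x[i] * y[i];   // product of all inputs except w
}
// gradX = [2, 2.5, 3], gradY = [0.5, 1, 1.5], gradW = [4, 10, 18]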
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
BackwardGpu(IGpuTensor<T>)
Computes the gradients of the loss with respect to the inputs on the GPU.
public IGpuTensor<T>[] BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>[]
Array of gradients for each input tensor.
Remarks
For element-wise multiplication z = x * y, the gradient with respect to each input is the product of the output gradient and all other inputs.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
Forward(Tensor<T>)
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>)
Returns
- Tensor<T>
Forward(params Tensor<T>[])
Performs the forward pass of the multiply layer with multiple input tensors.
public override Tensor<T> Forward(params Tensor<T>[] inputs)
Parameters
inputs (Tensor<T>[]): The array of input tensors to multiply.
Returns
- Tensor<T>
The output tensor after element-wise multiplication and activation.
Remarks
This method implements the forward pass of the multiply layer. It performs element-wise multiplication of all input tensors, then applies the activation function to the result. The input tensors and output tensor are cached for use during the backward pass.
For Beginners: This method performs the actual multiplication operation.
During the forward pass:
- The method checks that you've provided at least two input tensors
- It makes a copy of the first input tensor as the starting point
- It then multiplies this copy element-by-element with each of the other input tensors
- Finally, it applies the activation function to the result
For example, with inputs [1,2,3], [4,5,6], and [0.5,0.5,0.5]:
- Start with [1,2,3]
- Multiply by [4,5,6] to get [4,10,18]
- Multiply by [0.5,0.5,0.5] to get [2,5,9]
- Apply activation function (if any)
The method also saves all inputs and the output for later use in backpropagation.
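That walk-through might look like the following sketch. ToTensor is a hypothetical helper that wraps a float[] in a Tensor<float>; the real tensor-creation API may differ.

var shape = new[] { 3 };
var layer = new MultiplyLayer<float>(new[] { shape, shape, shape });

// ToTensor is hypothetical; substitute the actual Tensor<float> construction.
Tensor<float> a = ToTensor(new float[] { 1, 2, 3 });
Tensor<float> b = ToTensor(new float[] { 4, 5, 6 });
Tensor<float> c = ToTensor(new float[] { 0.5f, 0.5f, 0.5f });

var output = layer.Forward(a, b, c);    // element-wise product: [2, 5, 9]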
Exceptions
- ArgumentException
Thrown when fewer than two input tensors are provided.
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass on GPU using actual GPU element-wise multiplication.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU input tensors.
Returns
- IGpuTensor<T>
The GPU output tensor.
GetParameters()
Gets all trainable parameters from the multiply layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
An empty vector since MultiplyLayer has no trainable parameters.
Remarks
This method retrieves all trainable parameters from the layer as a single vector. Since MultiplyLayer has no trainable parameters, it returns an empty vector.
For Beginners: This method returns all the learnable values in the layer.
Since MultiplyLayer only performs a fixed mathematical operation (multiplication) and has no weights, biases, or other learnable parameters, this method returns an empty vector. This is different from layers such as Dense layers, which return their weights and biases.
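A quick check might look like this sketch; the Length property on Vector<T> is an assumption about the API:

var parameters = layer.GetParameters();
// Assumption: Vector<T> exposes a Length property.
Console.WriteLine(parameters.Length);   // 0, since this layer has nothing to learn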
ResetState()
Resets the internal state of the multiply layer.
public override void ResetState()
Remarks
This method resets the internal state of the multiply layer, including the cached inputs and output. This is useful when starting to process a new sequence or batch of data.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and outputs from previous processing are cleared
- The layer forgets any information from previous data batches
This is important for:
- Processing a new, unrelated batch of data
- Ensuring clean state before a new training epoch
- Preventing information from one batch affecting another
While the MultiplyLayer doesn't maintain long-term state across samples, clearing these cached values helps with memory management and ensuring a clean processing pipeline.
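Usage is a single call between unrelated batches or epochs:

// Clear cached inputs and outputs before processing unrelated data.
layer.ResetState();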
UpdateParameters(T)
Updates the parameters of the multiply layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method is part of the training process, but since MultiplyLayer has no trainable parameters, this method does nothing.
For Beginners: This method would normally update a layer's internal values during training.
However, since MultiplyLayer just performs a fixed mathematical operation (multiplication) and doesn't have any internal values that can be learned or adjusted, this method is empty.
This is unlike layers such as Dense or Convolutional layers, which have weights and biases that get updated during training.