Class MultiplyLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a layer that performs element-wise multiplication of multiple input tensors.

public class MultiplyLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object
LayerBase<T>
MultiplyLayer<T>

Implements
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>
IDisposable

Remarks

The MultiplyLayer performs element-wise multiplication (Hadamard product) of two or more input tensors of identical shape. This operation can be useful for implementing gating mechanisms, attention masks, or feature-wise interactions in neural networks. The layer requires that all input tensors have the same shape, and it produces an output tensor of that same shape.

For Beginners: This layer multiplies tensors together, element by element.

Think of it like multiplying numbers together in corresponding positions:

  • If you have two vectors [1, 2, 3] and [4, 5, 6]
  • The result would be [1×4, 2×5, 3×6] = [4, 10, 18]

This is useful for:

  • Controlling information flow (like gates in LSTM or GRU cells)
  • Applying masks (to selectively focus on certain values)
  • Combining features in a multiplicative way

For example, in an attention mechanism, you might multiply feature values by attention weights to focus on important features and diminish the influence of less relevant ones.
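
To make this concrete, here is a minimal usage sketch. The constructor and Forward signatures are taken from this page; the Tensor<float> shape constructor and flat indexer are assumptions about AiDotNet's tensor API, and the activation argument is passed by name to select the scalar-activation constructor.

    using AiDotNet.NeuralNetworks.Layers;

    // Two inputs of identical shape [1, 3]; the layer validates this.
    int[] shape = { 1, 3 };
    var multiply = new MultiplyLayer<float>(new[] { shape, shape }, activationFunction: null);

    // Assumed tensor API: shape-based constructor and flat indexer.
    var features = new Tensor<float>(shape);
    var gate = new Tensor<float>(shape);
    for (int i = 0; i < 3; i++)
    {
        features[i] = i + 1; // [1, 2, 3]
        gate[i] = i + 4;     // [4, 5, 6]
    }

    // Element-wise product: [1*4, 2*5, 3*6] = [4, 10, 18]
    Tensor<float> gated = multiply.Forward(features, gate);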

Constructors

MultiplyLayer(int[][], IActivationFunction<T>?)

Initializes a new instance of the MultiplyLayer<T> class with the specified input shapes and a scalar activation function.

public MultiplyLayer(int[][] inputShapes, IActivationFunction<T>? activationFunction = null)

Parameters

inputShapes int[][]

An array of input shapes, all of which must be identical.

activationFunction IActivationFunction<T>

The activation function to apply after processing. Defaults to Identity if not specified.

Remarks

This constructor creates a MultiplyLayer that expects multiple input tensors with identical shapes. It validates that at least two input shapes are provided and that all shapes are identical, since element-wise multiplication requires matching dimensions.

For Beginners: This constructor sets up the layer to handle multiple inputs of the same shape.

When creating a MultiplyLayer, you need to specify:

  • inputShapes: The shapes of all the inputs you'll provide (which must match)
  • activationFunction: The function that processes the final output (optional)

For example, if you want to multiply three tensors with shape [32, 10, 128]:

  • You would specify inputShapes as [[32, 10, 128], [32, 10, 128], [32, 10, 128]]
  • The layer would validate that all these shapes match
  • The output shape would also be [32, 10, 128]

The constructor throws an exception if you provide fewer than two input shapes or if the shapes don't all match exactly.
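
A sketch of that validation in code (the named activationFunction argument selects this overload; null means Identity):

    // All three shapes must be identical, or the constructor throws ArgumentException.
    var shape = new[] { 32, 10, 128 };
    var layer = new MultiplyLayer<double>(
        new[] { shape, shape, shape },
        activationFunction: null);

    // A mismatched shape such as [32, 10, 64] in the same array would fail
    // fast with ArgumentException; for a valid layer, the output shape is
    // the shared input shape [32, 10, 128].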

Exceptions

ArgumentException

Thrown when fewer than two input shapes are provided or when input shapes are not identical.

MultiplyLayer(int[][], IVectorActivationFunction<T>?)

Initializes a new instance of the MultiplyLayer<T> class with the specified input shapes and a vector activation function.

public MultiplyLayer(int[][] inputShapes, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

inputShapes int[][]

An array of input shapes, all of which must be identical.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to apply after processing. Defaults to Identity if not specified.

Remarks

This constructor creates a MultiplyLayer that expects multiple input tensors with identical shapes. It validates that at least two input shapes are provided and that all shapes are identical, since element-wise multiplication requires matching dimensions. This overload accepts a vector activation function, which operates on entire vectors rather than individual elements.

For Beginners: This constructor sets up the layer with a vector-based activation function.

A vector activation function:

  • Operates on entire groups of numbers at once, rather than one at a time
  • Can capture relationships between different elements in the output
  • Defaults to the Identity function, which doesn't change the values

This constructor is useful when you need more complex activation patterns that consider the relationships between different values after multiplication.
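
A sketch of selecting this overload. Because both constructors take an optional second parameter, passing the argument by name is an unambiguous way to pick the vector-activation variant; no concrete IVectorActivationFunction<T> implementation is named on this page, so null (Identity) is used here.

    var shape = new[] { 8, 16 };
    var layer = new MultiplyLayer<float>(
        new[] { shape, shape },
        vectorActivationFunction: null); // Identity by default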

Exceptions

ArgumentException

Thrown when fewer than two input shapes are provided or when input shapes are not identical.

Properties

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsGpuTraining

Gets whether this layer has full GPU training support (forward, backward, and parameter updates).

public override bool SupportsGpuTraining { get; }

Property Value

bool

Remarks

This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:

  • ForwardGpu is implemented
  • BackwardGpu is implemented
  • UpdateParametersGpu is implemented (for layers with trainable parameters)
  • GPU weight/bias/gradient buffers are properly managed

For Beginners: This tells you if training can happen entirely on GPU.

GPU-resident training is much faster because:

  • Data stays on GPU between forward and backward passes
  • No expensive CPU-GPU transfers during each training step
  • GPU kernels handle all gradient computation

Only layers that return true here can participate in fully GPU-resident training.

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Always true because the MultiplyLayer supports backpropagation, even though it has no parameters.

Remarks

This property indicates whether the layer supports backpropagation during training. Although the MultiplyLayer has no trainable parameters, it still supports the backward pass to propagate gradients to previous layers.

For Beginners: This property tells you if the layer can participate in the training process.

A value of true means:

  • The layer can pass gradient information backward during training
  • It's part of the learning process, even though it doesn't have learnable parameters

While this layer doesn't have weights or biases that get updated during training, it still needs to properly handle gradients to ensure that layers before it can learn correctly.

Methods

Backward(Tensor<T>)

Performs the backward pass of the multiply layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's inputs.

Remarks

This method implements the backward pass of the multiply layer, which is used during training to propagate error gradients back through the network. For element-wise multiplication, the gradient with respect to each input is the product of the output gradient and all other inputs. The method calculates and returns the gradients for all input tensors.

For Beginners: This method calculates how changes in each input affect the final output.

During the backward pass:

  • The layer receives gradients indicating how the output should change
  • It calculates how each input tensor contributed to the output
  • For each input, its gradient is the product of:
    • The output gradient (after applying the activation function derivative)
    • All OTHER input tensors (not including itself)

This follows the chain rule of calculus for multiplication. If z = x * y, then dz/dx = y and dz/dy = x, so:

  • gradient for x = y × (gradient flowing back from later layers)
  • gradient for y = x × (gradient flowing back from later layers)

The method returns a stacked tensor containing gradients for all inputs.
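
The arithmetic behind this rule can be checked by hand with plain arrays, independent of AiDotNet's tensor types:

    // z = x * y element-wise; g is the gradient flowing back from later layers.
    float[] x = { 1, 2, 3 };
    float[] y = { 4, 5, 6 };
    float[] g = { 1, 1, 1 };

    var gradX = new float[3];
    var gradY = new float[3];
    for (int i = 0; i < 3; i++)
    {
        gradX[i] = g[i] * y[i]; // gradient for x uses the OTHER input
        gradY[i] = g[i] * x[i]; // gradient for y uses the OTHER input
    }
    // gradX == [4, 5, 6], gradY == [1, 2, 3]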

Exceptions

InvalidOperationException

Thrown when backward is called before forward.

BackwardGpu(IGpuTensor<T>)

Computes the gradients of the loss with respect to the inputs on the GPU.

public IGpuTensor<T>[] BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

The gradient of the loss with respect to the layer's output.

Returns

IGpuTensor<T>[]

Array of gradients for each input tensor.

Remarks

For element-wise multiplication z = x * y, the gradient with respect to each input is the product of the output gradient and all other inputs.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.
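
A sketch of the calling convention, using only the signature shown above; the layer construction follows the earlier examples, and ComputationNode<T> is assumed to be importable from AiDotNet.

    using System.Collections.Generic;

    var shape = new[] { 4, 4 };
    var layer = new MultiplyLayer<float>(
        new[] { shape, shape },
        activationFunction: null);

    // The layer populates inputNodes with one node per expected input
    // and returns the node representing the element-wise product.
    var inputNodes = new List<ComputationNode<float>>();
    ComputationNode<float> output = layer.ExportComputationGraph(inputNodes);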

Forward(Tensor<T>)

Performs the forward pass of the layer with a single input tensor. MultiplyLayer is normally used through the multi-input overload below, since element-wise multiplication requires at least two inputs.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor.

Returns

Tensor<T>

The output tensor.

Forward(params Tensor<T>[])

Performs the forward pass of the multiply layer with multiple input tensors.

public override Tensor<T> Forward(params Tensor<T>[] inputs)

Parameters

inputs Tensor<T>[]

The array of input tensors to multiply.

Returns

Tensor<T>

The output tensor after element-wise multiplication and activation.

Remarks

This method implements the forward pass of the multiply layer. It performs element-wise multiplication of all input tensors, then applies the activation function to the result. The input tensors and output tensor are cached for use during the backward pass.

For Beginners: This method performs the actual multiplication operation.

During the forward pass:

  • The method checks that you've provided at least two input tensors
  • It makes a copy of the first input tensor as the starting point
  • It then multiplies this copy element-by-element with each of the other input tensors
  • Finally, it applies the activation function to the result

For example, with inputs [1,2,3], [4,5,6], and [0.5,0.5,0.5]:

  1. Start with [1,2,3]
  2. Multiply by [4,5,6] to get [4,10,18]
  3. Multiply by [0.5,0.5,0.5] to get [2,5,9]
  4. Apply activation function (if any)

The method also saves all inputs and the output for later use in backpropagation.
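
The numbered walk-through above, as a code sketch (tensor construction and element filling are again assumptions about the tensor API):

    int[] shape = { 1, 3 };
    var layer = new MultiplyLayer<float>(
        new[] { shape, shape, shape },
        activationFunction: null);

    var a = new Tensor<float>(shape); // filled with [1, 2, 3]
    var b = new Tensor<float>(shape); // filled with [4, 5, 6]
    var c = new Tensor<float>(shape); // filled with [0.5, 0.5, 0.5]

    // [1,2,3] * [4,5,6] * [0.5,0.5,0.5] = [2, 5, 9], then Identity activation.
    Tensor<float> result = layer.Forward(a, b, c);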

Exceptions

ArgumentException

Thrown when fewer than two input tensors are provided.

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass on GPU using actual GPU element-wise multiplication.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The GPU input tensors.

Returns

IGpuTensor<T>

The GPU output tensor.

GetParameters()

Gets all trainable parameters from the multiply layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

An empty vector since MultiplyLayer has no trainable parameters.

Remarks

This method retrieves all trainable parameters from the layer as a single vector. Since MultiplyLayer has no trainable parameters, it returns an empty vector.

For Beginners: This method returns all the learnable values in the layer.

Since MultiplyLayer only performs a fixed mathematical operation (multiplication) and has no weights, biases, or other learnable parameters, the method returns an empty vector.

This is different from layers like Dense layers, which would return their weights and biases.
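
A short sketch of that contrast:

    var shape = new[] { 4 };
    var layer = new MultiplyLayer<float>(
        new[] { shape, shape },
        activationFunction: null);

    // The returned vector contains no elements, whereas a dense layer
    // would return its flattened weights and biases here.
    Vector<float> parameters = layer.GetParameters();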

ResetState()

Resets the internal state of the multiply layer.

public override void ResetState()

Remarks

This method resets the internal state of the multiply layer, including the cached inputs and output. This is useful when starting to process a new sequence or batch of data.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • Stored inputs and outputs from previous processing are cleared
  • The layer forgets any information from previous data batches

This is important for:

  • Processing a new, unrelated batch of data
  • Ensuring clean state before a new training epoch
  • Preventing information from one batch affecting another

While the MultiplyLayer doesn't maintain long-term state across samples, clearing these cached values helps with memory management and ensuring a clean processing pipeline.

UpdateParameters(T)

Updates the parameters of the multiply layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method is part of the training process, but since MultiplyLayer has no trainable parameters, this method does nothing.

For Beginners: This method would normally update a layer's internal values during training.

However, since MultiplyLayer just performs a fixed mathematical operation (multiplication) and doesn't have any internal values that can be learned or adjusted, this method is empty.

This is unlike layers such as Dense or Convolutional layers, which have weights and biases that get updated during training.