Class AddLayer<T>

Namespace: AiDotNet.NeuralNetworks.Layers

Assembly: AiDotNet.dll

A layer that adds multiple input tensors element-wise and optionally applies an activation function.

public class AddLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T: The numeric type used for calculations (for example, float or double).

Inheritance: object

LayerBase<T>

AddLayer<T>

Implements: ILayer<T>

IJitCompilable<T>

IDiagnosticsProvider

IWeightLoadable<T>

IDisposable

Inherited Members: LayerBase<T>.Engine

LayerBase<T>.ScalarActivation

LayerBase<T>.VectorActivation

LayerBase<T>.UsingVectorActivation

LayerBase<T>.NumOps

LayerBase<T>.Random

LayerBase<T>.Parameters

LayerBase<T>.ParameterGradients

LayerBase<T>.InputShape

LayerBase<T>.InputShapes

LayerBase<T>.UpdateInputShape(int[])

LayerBase<T>.OutputShape

LayerBase<T>.IsTrainingMode

LayerBase<T>.InitializationStrategy

LayerBase<T>.IsInitialized

LayerBase<T>.InitializationLock

LayerBase<T>.EnsureInitialized()

LayerBase<T>.UseAutodiff

LayerBase<T>.SetTrainingMode(bool)

LayerBase<T>.GetParameterGradients()

LayerBase<T>.ClearGradients()

LayerBase<T>.GetInputShape()

LayerBase<T>.GetInputShapes()

LayerBase<T>.GetOutputShape()

LayerBase<T>.GetWeights()

LayerBase<T>.GetBiases()

LayerBase<T>.MapActivationToFused()

LayerBase<T>.CanExecuteOnGpu

LayerBase<T>.CanTrainOnGpu

LayerBase<T>.UpdateParametersGpu(IGpuOptimizerConfig)

LayerBase<T>.UploadWeightsToGpu()

LayerBase<T>.DownloadWeightsFromGpu()

LayerBase<T>.ZeroGradientsGpu()

LayerBase<T>.GetActivationTypes()

LayerBase<T>.ApplyActivation(Tensor<T>)

LayerBase<T>.ApplyActivation(Vector<T>)

LayerBase<T>.ActivateTensor(IActivationFunction<T>, Tensor<T>)

LayerBase<T>.ActivateTensor(IVectorActivationFunction<T>, Tensor<T>)

LayerBase<T>.CalculateInputShape(int, int, int)

LayerBase<T>.CalculateOutputShape(int, int, int)

LayerBase<T>.Clone()

LayerBase<T>.DerivativeTensor(IActivationFunction<T>, Tensor<T>)

LayerBase<T>.ApplyActivationDerivative(T, T)

LayerBase<T>.ApplyActivationDerivative(Tensor<T>, Tensor<T>)

LayerBase<T>.ComputeActivationJacobian(Vector<T>)

LayerBase<T>.ApplyActivationDerivative(Vector<T>, Vector<T>)

LayerBase<T>.UpdateParameters(Vector<T>)

LayerBase<T>.ParameterCount

LayerBase<T>.Serialize(BinaryWriter)

LayerBase<T>.Deserialize(BinaryReader)

LayerBase<T>.SetParameters(Vector<T>)

LayerBase<T>.GetDiagnostics()

LayerBase<T>.ApplyActivationToGraph(ComputationNode<T>)

LayerBase<T>.CanActivationBeJitted()

LayerBase<T>.RegisterTrainableParameter(Tensor<T>, PersistentTensorRole)

LayerBase<T>.InvalidateTrainableParameter(Tensor<T>)

LayerBase<T>.HasGpuActivation()

LayerBase<T>.ApplyActivationForwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int)

LayerBase<T>.ApplyActivationBackwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, IGpuBuffer, IGpuBuffer, int)

LayerBase<T>.GetFusedActivationType()

LayerBase<T>.ApplyGpuActivation(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int, FusedActivationType)

LayerBase<T>.ApplyGpuActivationBackward(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, IGpuBuffer, IGpuBuffer, int, FusedActivationType, float)

LayerBase<T>.Dispose()

LayerBase<T>.Dispose(bool)

LayerBase<T>.WeightParameterName

LayerBase<T>.BiasParameterName

LayerBase<T>.SetWeights(Tensor<T>)

LayerBase<T>.SetBiases(Tensor<T>)

LayerBase<T>.GetParameterNames()

LayerBase<T>.TryGetParameter(string, out Tensor<T>)

LayerBase<T>.SetParameter(string, Tensor<T>)

LayerBase<T>.GetParameterShape(string)

LayerBase<T>.NamedParameterCount

LayerBase<T>.ValidateWeights(IEnumerable<string>, Func<string, string>)

LayerBase<T>.LoadWeights(Dictionary<string, Tensor<T>>, Func<string, string>, bool)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

The AddLayer combines multiple tensors of identical shape by adding their values element-wise. This is useful for implementing residual connections, skip connections, or any architecture that requires combining information from multiple sources. After adding the inputs, an optional activation function can be applied to the result.

For Beginners: This layer adds together multiple inputs of the same shape.

Think of this layer as performing element-wise addition:

If you have two 3×3 matrices, it adds corresponding elements together
All inputs must have exactly the same dimensions
After adding, it can optionally apply an activation function

This is commonly used in:

Residual networks (ResNets) where outputs from earlier layers are added to later layers
Skip connections that help information flow more directly through deep networks
Any situation where you want to combine information from multiple sources

For example, if you have two feature maps from different parts of a network, this layer lets you combine them by adding their values together.
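As a rough illustration of the residual pattern mentioned above, the sketch below adds a block's input back onto its transformed output. It uses plain double[] arrays rather than the library's Tensor<T>, and ResidualSketch, Residual, and the halving transform are hypothetical names chosen for this example only.

```csharp
using System;

public static class ResidualSketch
{
    // Computes transform(x) + x element-wise: the core of a residual (skip) connection.
    public static double[] Residual(double[] x, Func<double[], double[]> transform)
    {
        double[] fx = transform(x);
        if (fx.Length != x.Length)
            throw new ArgumentException("The transformed output must match the input shape.");

        var y = new double[x.Length];
        for (int j = 0; j < x.Length; j++)
            y[j] = fx[j] + x[j];
        return y;
    }

    public static void Main()
    {
        // Hypothetical stand-in for the layers that a skip connection bypasses.
        Func<double[], double[]> halve = v => Array.ConvertAll(v, e => 0.5 * e);

        double[] y = Residual(new double[] { 2, 4, 6 }, halve);
        Console.WriteLine(string.Join(", ", y)); // 3, 6, 9
    }
}
```

In a real network, the addition step would be the AddLayer itself, and the transform would be whatever layers sit between the skip connection's start and end.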

Constructors

AddLayer(int[][], IActivationFunction<T>?)

public AddLayer(int[][] inputShapes, IActivationFunction<T>? activationFunction = null)

Parameters

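inputShapes int[][]: The shape of each input tensor, one int[] per input. All shapes must be identical.
activationFunction IActivationFunction<T>: Optional activation function applied to the result of the addition. Defaults to null.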

AddLayer(int[][], IVectorActivationFunction<T>?)

public AddLayer(int[][] inputShapes, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

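inputShapes int[][]: The shape of each input tensor, one int[] per input. All shapes must be identical.
vectorActivationFunction IVectorActivationFunction<T>: Optional vector activation function applied to the result of the addition. Defaults to null.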
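Because both constructors take the same first parameter and an optional second parameter, a call that omits the second argument (or passes a bare null) would most likely be reported as ambiguous by the C# compiler. The sketch below reproduces the two documented signatures with hypothetical stand-in types (AddLayerStub, IScalarActivationStub, IVectorActivationStub) so that it compiles without the library, and shows a cast as one way to select an overload.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins that mirror only the shape of the two documented constructors.
public interface IScalarActivationStub { }
public interface IVectorActivationStub { }

public sealed class AddLayerStub
{
    public string ActivationKind { get; }

    public AddLayerStub(int[][] inputShapes, IScalarActivationStub? activationFunction = null)
        => ActivationKind = "scalar";

    public AddLayerStub(int[][] inputShapes, IVectorActivationStub? vectorActivationFunction = null)
        => ActivationKind = "vector";
}

public static class ConstructorSketch
{
    public static void Main()
    {
        // Two inputs, each of shape 3x3.
        int[][] shapes = { new[] { 3, 3 }, new[] { 3, 3 } };

        // Both lines below are expected to fail with an ambiguous-call error (CS0121),
        // because neither overload is a better match than the other:
        // var a = new AddLayerStub(shapes);
        // var b = new AddLayerStub(shapes, null);

        // Casting the argument tells the compiler which overload is intended.
        var layer = new AddLayerStub(shapes, (IScalarActivationStub?)null);
        Console.WriteLine(layer.ActivationKind); // scalar
    }
}
```

With the real types, the equivalent call would cast the second argument to IActivationFunction<T>? or IVectorActivationFunction<T>?.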

Properties

SupportsGpuExecution

Gets whether this layer has a GPU execution implementation for inference.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

Remarks

Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.

For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.

SupportsGpuTraining

Gets whether this layer has full GPU training support (forward, backward, and parameter updates).

public override bool SupportsGpuTraining { get; }

Property Value

bool

Remarks

This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:

ForwardGpu is implemented
BackwardGpu is implemented
UpdateParametersGpu is implemented (for layers with trainable parameters)
GPU weight/bias/gradient buffers are properly managed

For Beginners: This tells you if training can happen entirely on GPU.

GPU-resident training is much faster because:

Data stays on GPU between forward and backward passes
No expensive CPU-GPU transfers during each training step
GPU kernels handle all gradient computation

Only layers that return true here can participate in fully GPU-resident training.

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool: True if the activation function supports JIT compilation, false otherwise.

Remarks

Addition layers support JIT compilation as long as their activation function does. The element-wise addition operation is straightforward to compile and optimize.

SupportsTraining

Indicates whether this layer has trainable parameters.

public override bool SupportsTraining { get; }

Property Value

bool: Always returns false because addition layers don't have parameters to train.

Remarks

This property overrides the base class property to specify that addition layers do not have trainable parameters. Trainable parameters are values within a layer that are adjusted during the training process to minimize the loss function. Since addition layers simply add their inputs together without any adjustable parameters, this property always returns false.

For Beginners: This tells you that addition layers don't learn or change during training.

While layers like Dense layers have weights that get updated during training, addition layers just perform a fixed mathematical operation (addition) that never changes.

This property helps the training system know that it doesn't need to update anything in this layer during the training process.

Methods

Backward(Tensor<T>)

Calculates how changes in the output affect the inputs during training.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>: How much the network's error changes with respect to this layer's output.

Returns

Tensor<T>: How much the network's error changes with respect to this layer's first input.

Remarks

This method implements the backward pass for the addition layer. It first applies the derivative of the activation function to the output gradient, then copies this gradient for each input. In an addition operation, the gradient with respect to each input is the same as the gradient with respect to the output (after accounting for the activation function). This method returns the gradient for the first input only, as required by the interface, but internally it calculates gradients for all inputs.

For Beginners: This method calculates how the error gradient flows backward through this layer.

During backpropagation, this method:

Checks that Forward() was called first
Calculates how the gradient changes due to the activation function (if any)
Creates a copy of this gradient for each input
Returns the gradient for the first input

For addition, the gradient flows equally to all inputs. This means if the output needs to change by some amount, each input contributes equally to that change.

Note: This method only returns the gradient for the first input due to interface constraints. In a real network, you would need to handle returning all gradients to their respective sources.
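The gradient rule described above can be sketched with plain arrays. For an output y = f(x1 + ... + xn), each input receives the upstream gradient multiplied element-wise by f'(sum). This is a minimal sketch under stated assumptions, not the library's code: AddBackwardSketch and its parameters are hypothetical, and evaluating the derivative at the pre-activation sum is an assumption about how the activation is handled.

```csharp
using System;

public static class AddBackwardSketch
{
    // Returns one gradient array per input. Every input receives the same values:
    // outputGradient[j] * activationDerivative(preActivationSum[j]).
    public static double[][] Backward(
        double[] outputGradient,
        double[] preActivationSum,
        Func<double, double> activationDerivative,
        int inputCount)
    {
        var shared = new double[outputGradient.Length];
        for (int j = 0; j < shared.Length; j++)
            shared[j] = outputGradient[j] * activationDerivative(preActivationSum[j]);

        // Copy the shared gradient once per input so callers can modify them independently.
        var gradients = new double[inputCount][];
        for (int i = 0; i < inputCount; i++)
            gradients[i] = (double[])shared.Clone();
        return gradients;
    }

    public static void Main()
    {
        Func<double, double> reluDerivative = s => s > 0 ? 1.0 : 0.0;
        double[] sum = { 5, -1, 9 };
        double[] upstream = { 0.1, 0.2, 0.3 };

        double[][] grads = Backward(upstream, sum, reluDerivative, 2);

        // The middle element is zeroed because the ReLU derivative is 0 where the sum is negative.
        Console.WriteLine(string.Join(", ", grads[0]));
        Console.WriteLine(string.Join(", ", grads[1]));
    }
}
```

The documented Backward method corresponds to returning only the first of these arrays.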

Exceptions

InvalidOperationException: Thrown if Backward is called before Forward.

BackwardGpu(IGpuTensor<T>)

Performs the backward pass of the layer on GPU.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>: The GPU-resident gradient of the loss with respect to the layer's output.

Returns

IGpuTensor<T>: The GPU-resident gradient of the loss with respect to the layer's input.

Remarks

This method performs the layer's backward computation entirely on GPU, including:

Computing input gradients to pass to previous layers
Computing and storing weight gradients on GPU (for layers with trainable parameters)
Computing and storing bias gradients on GPU

For Beginners: This is like Backward() but runs entirely on GPU.

During GPU training:

Output gradients come in (on GPU)
Input gradients are computed (stay on GPU)
Weight/bias gradients are computed and stored (on GPU)
Input gradients are returned for the previous layer

All data stays on GPU - no CPU round-trips needed!

Exceptions

NotSupportedException: Thrown when the layer does not support GPU training.
InvalidOperationException: Thrown if ForwardGpu was not called first.

ExportComputationGraph(List<ComputationNode<T>>)

Exports this layer's computation as a differentiable computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>: List to which input variable nodes should be added.

Returns

ComputationNode<T>: The output computation node representing this layer's operation.

Remarks

This method builds a computation graph representation of the addition operation that can be compiled and optimized for efficient execution. The graph represents element-wise addition of multiple inputs followed by optional activation.

For Beginners: This method creates a reusable, optimized version of the layer for faster inference.

For addition layers:

Creates placeholder nodes for each input
Chains addition operations together
Applies the activation function to the result
Returns a computation graph that can be executed efficiently

This is used during inference to make predictions faster by pre-compiling the operations.
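The chaining described above can be pictured with a toy expression tree. SketchNode below is a hypothetical minimal node type, not the library's ComputationNode<T>, and the operation names are placeholders; the sketch only shows how n inputs might be folded into n - 1 binary additions followed by an optional activation node.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical minimal expression node used only for this illustration.
public sealed class SketchNode
{
    public string Op { get; }
    public IReadOnlyList<SketchNode> Parents { get; }

    public SketchNode(string op, params SketchNode[] parents)
    {
        Op = op;
        Parents = parents;
    }

    public override string ToString()
    {
        if (Parents.Count == 0) return Op;
        return Op + "(" + string.Join(", ", Parents) + ")";
    }
}

public static class AddGraphSketch
{
    public static SketchNode Export(List<SketchNode> inputNodes, int inputCount, string? activationName)
    {
        if (inputNodes == null) throw new ArgumentNullException(nameof(inputNodes));
        if (inputCount < 2) throw new ArgumentOutOfRangeException(nameof(inputCount));

        // Create one placeholder per input and register it with the caller's list.
        var placeholders = new List<SketchNode>();
        for (int i = 0; i < inputCount; i++)
        {
            var placeholder = new SketchNode("input" + i);
            placeholders.Add(placeholder);
            inputNodes.Add(placeholder);
        }

        // Fold the inputs into a left-leaning chain of binary additions.
        SketchNode result = placeholders[0];
        for (int i = 1; i < inputCount; i++)
            result = new SketchNode("add", result, placeholders[i]);

        // Wrap the sum in an activation node when one is requested.
        return activationName == null ? result : new SketchNode(activationName, result);
    }

    public static void Main()
    {
        var inputs = new List<SketchNode>();
        SketchNode output = Export(inputs, 3, "relu");
        Console.WriteLine(output);       // relu(add(add(input0, input1), input2))
        Console.WriteLine(inputs.Count); // 3
    }
}
```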

Exceptions

ArgumentNullException: Thrown when inputNodes is null.
NotSupportedException: Thrown when the activation function is not supported for JIT compilation.

Forward(Tensor<T>)

This method is not supported for AddLayer, which requires multiple inputs.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>: A single input tensor.

Returns

Tensor<T>: Never returns, because this method always throws an exception.

Remarks

This method overrides the base class method but is not supported for AddLayer, which requires multiple inputs. Instead, use the Forward(params Tensor<T>[] inputs) method, which accepts multiple input tensors.

For Beginners: This method is not supported because addition requires multiple inputs.

Since addition requires at least two operands, this layer doesn't support the single-input Forward method. If you try to use it, you'll get an error.

Instead, use the version of Forward that accepts multiple inputs:

var output = addLayer.Forward(input1, input2);

This design ensures that the layer is used correctly - you can't accidentally try to add just one tensor.

Exceptions

NotSupportedException: Always thrown because AddLayer requires multiple inputs.

Forward(params Tensor<T>[])

Processes multiple input tensors by adding them element-wise and optionally applying an activation function.

public override Tensor<T> Forward(params Tensor<T>[] inputs)

Parameters

inputs Tensor<T>[]: An array of input tensors to add together.

Returns

Tensor<T>: The result of adding the input tensors and applying the activation function.

Remarks

This method implements the forward pass for the addition layer. It adds all input tensors element-wise, then applies the activation function (if any) to the result. The input tensors and the output are stored for later use in the backward pass.

For Beginners: This method adds multiple input tensors together element by element.

During the forward pass, this method:

Checks that you've provided at least two input tensors
Saves the inputs for later use in backpropagation
Creates a copy of the first input tensor
Adds each of the other input tensors to it
Applies the activation function (if any)
Saves and returns the result

For example, with inputs [1, 2, 3] and [4, 5, 6]:

The addition gives [5, 7, 9]
If using ReLU activation, the output remains [5, 7, 9] because every value is already positive
If using a different activation, it would transform these values

This operation combines information from multiple sources in your network.
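The forward-pass steps above can be sketched with plain double[] arrays in place of Tensor<T>. This is an illustrative approximation rather than the library's implementation: AddForwardSketch is a hypothetical name, the activation is modeled as a simple delegate, and throwing ArgumentException on a shape mismatch is an assumption (the documentation only states that shapes must match).

```csharp
#nullable enable
using System;

public static class AddForwardSketch
{
    // Adds same-length arrays element-wise, then applies an optional activation.
    public static double[] Forward(Func<double, double>? activation, params double[][] inputs)
    {
        if (inputs == null || inputs.Length < 2)
            throw new ArgumentException("At least two inputs are required.", nameof(inputs));

        // Start from a copy of the first input so the caller's data is not modified.
        var sum = (double[])inputs[0].Clone();

        for (int i = 1; i < inputs.Length; i++)
        {
            if (inputs[i].Length != sum.Length)
                throw new ArgumentException("All inputs must have the same shape.", nameof(inputs));

            for (int j = 0; j < sum.Length; j++)
                sum[j] += inputs[i][j];
        }

        // Apply the activation to each element of the sum, if one was supplied.
        if (activation != null)
        {
            for (int j = 0; j < sum.Length; j++)
                sum[j] = activation(sum[j]);
        }

        return sum;
    }

    public static void Main()
    {
        Func<double, double> relu = v => Math.Max(0.0, v);
        double[] result = Forward(relu, new double[] { 1, 2, 3 }, new double[] { 4, 5, 6 });
        Console.WriteLine(string.Join(", ", result)); // 5, 7, 9
    }
}
```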

Exceptions

ArgumentException: Thrown when fewer than two input tensors are provided.

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass of the layer on GPU.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]: The GPU-resident input tensor(s).

Returns

IGpuTensor<T>: The GPU-resident output tensor.

Remarks

This method performs the layer's forward computation entirely on GPU. The input and output tensors remain in GPU memory, avoiding expensive CPU-GPU transfers.

For Beginners: This is like Forward() but runs on the graphics card.

The key difference:

Forward() uses CPU tensors that may be copied to/from GPU
ForwardGpu() keeps everything on GPU the whole time

Override this in derived classes that support GPU acceleration.

Exceptions

NotSupportedException: Thrown when the layer does not support GPU execution.

GetParameters()

Gets all trainable parameters of this layer as a flat vector.

public override Vector<T> GetParameters()

Returns

Vector<T>: An empty vector since addition layers have no trainable parameters.

Remarks

This method returns all trainable parameters of the layer as a flat vector. For layers with trainable parameters, this would involve reshaping multi-dimensional parameters (like weight matrices) into a one-dimensional vector. However, since addition layers have no trainable parameters, this method returns an empty vector.

For Beginners: This method returns all the layer's trainable values as a single list, but addition layers have none.

Some operations in neural networks need to work with all parameters at once:

Saving and loading models
Applying regularization (techniques to prevent overfitting)
Using advanced optimization algorithms

This method provides those parameters as a single vector, but since addition layers don't have any trainable parameters, it returns an empty vector.

For comparison:

A Dense layer with 100 inputs and 10 outputs would return a vector with 1,010 values (1,000 weights + 10 biases)
This AddLayer returns an empty vector with 0 values
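The comparison above is simple arithmetic, sketched here for a fully connected layer with one weight per input-output pair and one bias per output. ParameterCountSketch and its method names are hypothetical helpers, not library members.

```csharp
using System;

public static class ParameterCountSketch
{
    // Weights (inputs * outputs) plus one bias per output.
    public static int DenseParameterCount(int inputs, int outputs) => inputs * outputs + outputs;

    // An addition layer has nothing to learn, so its flat parameter vector is empty.
    public static int AddParameterCount() => 0;

    public static void Main()
    {
        Console.WriteLine(DenseParameterCount(100, 10)); // 1010
        Console.WriteLine(AddParameterCount());          // 0
    }
}
```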

ResetState()

Clears the layer's memory of previous inputs and outputs.

public override void ResetState()

Remarks

This method resets the internal state of the layer by clearing the cached input tensors and output tensor. The addition layer stores the inputs and output from the most recent forward pass to use during the backward pass for calculating gradients. Resetting this state is useful when starting to process new data or when you want to ensure the layer behaves deterministically.

For Beginners: This method clears the layer's memory of previous calculations.

During training, the layer remembers the inputs and output from the last forward pass to help with backpropagation calculations. This method makes the layer "forget" those values.

You might need to reset state:

When starting a new batch of training data
Between training epochs
When switching from training to testing
When you want to ensure consistent behavior

For addition layers, this simply clears the saved input and output tensors.

This helps ensure that processing one batch doesn't accidentally affect the processing of the next batch.
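To make the caching behavior concrete, the sketch below models a layer that remembers its last inputs and output, refuses to run a backward pass without them, and forgets them on reset. CachingAddSketch and its members are hypothetical, activation is omitted for brevity, and whether the real layer throws after ResetState is an assumption based on the documented Backward exception.

```csharp
#nullable enable
using System;

public sealed class CachingAddSketch
{
    private double[][]? _lastInputs;
    private double[]? _lastOutput;

    public bool HasCachedState => _lastInputs != null && _lastOutput != null;

    public double[] Forward(params double[][] inputs)
    {
        var sum = (double[])inputs[0].Clone();
        for (int i = 1; i < inputs.Length; i++)
            for (int j = 0; j < sum.Length; j++)
                sum[j] += inputs[i][j];

        // Remember the inputs and output for the next backward pass.
        _lastInputs = inputs;
        _lastOutput = sum;
        return sum;
    }

    public double[] Backward(double[] outputGradient)
    {
        if (_lastInputs == null)
            throw new InvalidOperationException("Forward must be called before Backward.");

        // With no activation, the gradient for the first input equals the output gradient.
        return (double[])outputGradient.Clone();
    }

    // Drops the cached tensors so the next Backward call requires a fresh Forward call.
    public void ResetState()
    {
        _lastInputs = null;
        _lastOutput = null;
    }
}

public static class ResetStateDemo
{
    public static void Main()
    {
        var layer = new CachingAddSketch();
        layer.Forward(new double[] { 1, 2 }, new double[] { 3, 4 });
        Console.WriteLine(layer.HasCachedState); // True

        layer.ResetState();
        Console.WriteLine(layer.HasCachedState); // False
    }
}
```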

UpdateParameters(T)

Updates the layer's internal parameters during training.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T: How quickly the network should learn from new data.

Remarks

This method is called during the training process after the forward and backward passes have been completed. For layers with trainable parameters, this method would update those parameters based on the gradients calculated during backpropagation and the provided learning rate. However, since addition layers have no trainable parameters, this method does nothing.

For Beginners: This method would update the layer's internal values during training, but addition layers have nothing to update.

In neural networks, training involves adjusting parameters to reduce errors. This method is where those adjustments happen, but addition layers don't have any adjustable parameters, so this method is empty.

For comparison:

In a Dense layer, this would update weights and biases
In a BatchNorm layer, this would update scale and shift parameters
In this AddLayer, there's nothing to update

The learning rate parameter controls how big the updates would be if there were any parameters to update - higher values mean bigger changes.
