Class ResidualLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a residual layer that adds the identity mapping (input) to the output of an inner layer.

public class ResidualLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object
LayerBase<T>
ResidualLayer<T>

Implements
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>
IDisposable

Remarks

A residual layer implements the core concept of residual networks (ResNets), where the layer learns the residual (difference) between the identity mapping and the desired underlying mapping rather than the complete transformation. This is achieved by adding a skip connection that passes the input directly to the output, where it's added to the transformed output of an inner layer.

For Beginners: This layer helps neural networks learn more effectively, especially when they're very deep.

Think of it as a "correction mechanism":

  • The inner layer tries to learn how to improve or adjust the input
  • The original input is preserved and added back in at the end
  • This allows the network to focus on learning the changes needed, rather than recreating the entire signal

Benefits include:

  • Solves the "vanishing gradient problem" that makes deep networks hard to train
  • Enables training of much deeper networks (hundreds of layers instead of just dozens)
  • Improves learning speed and accuracy

For example, in image recognition, a residual layer might learn to emphasize important features while preserving the original image information through the skip connection.
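
As a rough sketch of the computation (innerLayer, Add, and ApplyActivation are illustrative placeholders here, not the library's actual members; the real behavior lives in Forward(Tensor<T>)):

// Conceptual sketch of what a residual layer computes, not the actual implementation.
// F(x) is learned by the inner layer; the skip connection adds x back in.
Tensor<T> ResidualForward(Tensor<T> input)
{
    if (innerLayer == null)
        return ApplyActivation(input);                  // pass-through with activation

    Tensor<T> transformed = innerLayer.Forward(input);  // F(x): the learned residual
    Tensor<T> combined = Add(transformed, input);       // skip connection: F(x) + x
    return ApplyActivation(combined);                    // activation(F(x) + x)
}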

Constructors

ResidualLayer(int[], ILayer<T>?, IActivationFunction<T>?)

Initializes a new instance of the ResidualLayer<T> class with the specified input shape, inner layer, and scalar activation function.

public ResidualLayer(int[] inputShape, ILayer<T>? innerLayer = null, IActivationFunction<T>? activationFunction = null)

Parameters

inputShape int[]

The shape of the input tensor.

innerLayer ILayer<T>

The optional inner layer that learns the residual mapping.

activationFunction IActivationFunction<T>

The scalar activation function to apply after addition. Defaults to Identity if not specified.

Remarks

This constructor creates a residual layer with the specified input shape, inner layer, and scalar activation function. The output shape is set to be the same as the input shape, as required for residual connections. If the inner layer has different input and output shapes, an exception will be thrown.

For Beginners: This creates a new residual layer with basic settings.

When you create a residual layer this way:

  • You specify the size and shape of the data it will process
  • You can provide an inner layer that will learn the transformation
  • You can specify an activation function that operates on each value individually

The residual layer requires that the inner layer produces output of the same shape as its input, because the original input and the transformed output need to be added together.

This constructor is for the more common case where you want to use a scalar activation function.
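
For example, a residual block wrapping a dense layer might look like the sketch below. DenseLayer<float> and ReLUActivation<float> are assumed type names used for illustration; substitute whatever inner layer and scalar activation your build of AiDotNet provides.

// Sketch: wrap an inner layer in a residual connection with a scalar activation.
// DenseLayer<float> and ReLUActivation<float> are assumed type names.
int[] inputShape = { 128 };
var inner = new DenseLayer<float>(128, 128);   // inner layer must map 128 -> 128
var residual = new ResidualLayer<float>(
    inputShape,
    innerLayer: inner,
    activationFunction: new ReLUActivation<float>());

// With no inner layer, the residual layer just applies the activation to its input.
var passThrough = new ResidualLayer<float>(inputShape);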

ResidualLayer(int[], ILayer<T>?, IVectorActivationFunction<T>?)

Initializes a new instance of the ResidualLayer<T> class with the specified input shape, inner layer, and vector activation function.

public ResidualLayer(int[] inputShape, ILayer<T>? innerLayer = null, IVectorActivationFunction<T>? vectorActivation = null)

Parameters

inputShape int[]

The shape of the input tensor.

innerLayer ILayer<T>

The optional inner layer that learns the residual mapping.

vectorActivation IVectorActivationFunction<T>

The vector activation function to apply after addition. Defaults to Identity if not specified.

Remarks

This constructor creates a residual layer with the specified input shape, inner layer, and vector activation function. The output shape is set to be the same as the input shape, as required for residual connections. If the inner layer has different input and output shapes, an exception will be thrown.

For Beginners: This creates a new residual layer with advanced settings.

When you create a residual layer this way:

  • You specify the size and shape of the data it will process
  • You can provide an inner layer that will learn the transformation
  • You can specify a vector activation function that operates on groups of values

The residual layer requires that the inner layer produces output of the same shape as its input, because the original input and the transformed output need to be added together.

This constructor is for advanced cases where you want to use a vector activation function that can capture relationships between different elements in the output.
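
A sketch of the vector-activation variant (SoftmaxActivation<float> is an assumed name for an IVectorActivationFunction<float> implementation, and inner stands for any compatible ILayer<float>):

// Sketch: residual layer with a vector activation applied after the addition.
var residual = new ResidualLayer<float>(
    new[] { 64 },
    innerLayer: inner,                                   // any ILayer<float> mapping 64 -> 64
    vectorActivation: new SoftmaxActivation<float>());   // assumed vector activation type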

Properties

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

Returns true if the inner layer supports GPU execution or if there is no inner layer (pass-through with activation); otherwise, false.

Remarks

The residual layer can execute on GPU when its inner layer (if present) supports GPU execution. If there is no inner layer, the residual layer acts as a pass-through with activation, which can be done efficiently on GPU.

SupportsJitCompilation

Gets a value indicating whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

true if the activation and inner layer (if present) support JIT compilation; otherwise, false.

SupportsTraining

Gets a value indicating whether this layer supports training through backpropagation.

public override bool SupportsTraining { get; }

Property Value

bool

Returns true if the inner layer exists and supports training; otherwise, false.

Remarks

This property indicates whether the residual layer can be trained using backpropagation. The layer supports training if it has an inner layer that supports training. If there is no inner layer, the residual layer is considered not to support training since it would just act as an identity function.

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer can improve through training
  • It has parameters that can be adjusted

For residual layers:

  • If there's an inner layer that can be trained, this will be true
  • If there's no inner layer or the inner layer can't be trained, this will be false

Residual layers themselves don't have trainable parameters - all learning happens in the inner layer.

Methods

Backward(Tensor<T>)

Performs the backward pass of the residual layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the residual layer, which is used during training to propagate error gradients back through the network. It computes gradients for the inner layer (if present) and returns the gradient with respect to the input.

For Beginners: This method is used during training to calculate how the layer's input should change to reduce errors.

During the backward pass:

  • The method throws an error if the forward pass hasn't been called first
  • The gradient is computed for the combined output after the addition
  • If there's an inner layer, the gradient is propagated through it
  • The original gradient and the inner layer gradient are combined
  • The combined gradient is returned for further backpropagation

This process ensures that gradient information flows both through the inner layer and directly back to earlier layers, preventing the vanishing gradient problem in deep networks.
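
Conceptually, the gradient flow looks like the sketch below (ApplyActivationDerivative and Add are illustrative placeholders, not the library's actual members):

// Conceptual sketch of the backward pass, not the actual implementation.
Tensor<T> ResidualBackward(Tensor<T> outputGradient)
{
    // Gradient through the activation that was applied after the addition.
    Tensor<T> activated = ApplyActivationDerivative(outputGradient);

    if (innerLayer == null)
        return activated;                                // no inner layer: gradient passes straight through

    // Path 1: gradient through the inner layer.
    Tensor<T> innerGradient = innerLayer.Backward(activated);

    // Path 2: gradient through the skip connection is 'activated' itself.
    // Summing the two paths is what keeps gradients from vanishing in deep networks.
    return Add(innerGradient, activated);
}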

BackwardGpu(IGpuTensor<T>)

Computes the gradient of the loss with respect to the input on the GPU.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

The gradient of the loss with respect to the layer's output.

Returns

IGpuTensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

For a residual connection, output = activation(input + innerLayer(input)). The gradient flows both directly back to the input and through the inner layer.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the residual layer's forward pass as a JIT-compilable computation graph.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the residual connection with activation.

Remarks

This method builds a computation graph for the residual connection: output = activation(input + innerLayer(input)). If there is no inner layer, the graph reduces to output = activation(input).

Forward(Tensor<T>)

Performs the forward pass of the residual layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor after processing through the residual layer.

Remarks

This method implements the forward pass of the residual layer. It passes the input through the inner layer (if present), adds the result to the original input, and then applies the activation function.

For Beginners: This method processes data through the residual layer.

During the forward pass:

  • The input is saved for later use in training
  • If there's an inner layer, the input is processed through it
  • The original input is added to the processed result
  • The activation function is applied to the combined result
  • The final output is returned

If there's no inner layer, the input is simply passed through the activation function.

The key to residual learning is the addition step, which allows information to flow directly from the input to the output, making it easier to train deep networks.

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass of the residual layer on the GPU.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The input GPU tensors to process.

Returns

IGpuTensor<T>

The output GPU tensor after processing through the residual layer.

Remarks

This method implements the GPU-accelerated forward pass: output = activation(input + innerLayer(input)). All operations remain on the GPU until an explicit download is requested.

GetParameters()

Gets all trainable parameters from the inner layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters from the inner layer, or an empty vector if there is no inner layer.

Remarks

This method retrieves all trainable parameters from the inner layer (if present) and returns them as a single vector. If there is no inner layer, it returns an empty vector since there are no parameters to retrieve.

For Beginners: This method collects all the learnable values from the layer.

For a residual layer:

  • If there's an inner layer, it returns that layer's parameters
  • If there's no inner layer, it returns an empty list (no parameters)

Residual layers themselves don't have parameters to learn - all learning happens in the inner layer (if one exists).

This method is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques
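
For example (a minimal sketch; how you store or serialize the vector is up to your own code):

// Sketch: collect the inner layer's parameters, e.g. before saving a model.
Vector<float> parameters = residual.GetParameters();
// With no inner layer, the returned vector is empty.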

ResetState()

Resets the internal state of the residual layer and its inner layer.

public override void ResetState()

Remarks

This method resets the internal state of the residual layer, including the cached input, as well as the state of the inner layer (if present). This is useful when starting to process a new batch or when implementing stateful networks that need to be reset between sequences.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • The stored input from the previous forward pass is cleared
  • If there's an inner layer, its state is also reset

This is important for:

  • Processing a new batch of unrelated data
  • Preventing information from one batch affecting another
  • Starting a new training episode

Think of it like clearing your workspace before starting a new project - it ensures that old information doesn't interfere with new processing.

SetInnerLayer(ILayer<T>)

Sets a new inner layer for the residual layer.

public void SetInnerLayer(ILayer<T> innerLayer)

Parameters

innerLayer ILayer<T>

The new inner layer to use.

Remarks

This method allows changing the inner layer of the residual layer after construction. It validates that the new inner layer has the same input and output shape before setting it.

For Beginners: This method lets you change the transformation part of the residual layer.

You can use this to:

  • Replace the current inner layer with a different one
  • Add an inner layer if there wasn't one before

The method checks that the new inner layer is compatible (has matching input and output shapes) before making the change. If the shapes don't match, it will throw an error.

This is useful for building complex networks where you might want to add or change parts of the network after initial construction.

Exceptions

ArgumentException

Thrown when the inner layer has different input and output shapes.
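
A minimal sketch (newInner stands for any ILayer<float> whose input and output shapes match the residual layer's input shape):

// Sketch: swap in a different inner layer after construction.
var residual = new ResidualLayer<float>(new[] { 64 });   // starts as a pass-through
residual.SetInnerLayer(newInner);                          // newInner must map 64 -> 64
// Throws ArgumentException if newInner's input and output shapes differ.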

UpdateParameters(T)

Updates the parameters of the inner layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the parameters of the inner layer (if present) based on the gradients calculated during the backward pass. If there is no inner layer, this method does nothing since there are no parameters to update.

For Beginners: This method updates the layer's learnable values during training.

When updating parameters:

  • If there's an inner layer, its parameters are updated using the specified learning rate
  • If there's no inner layer, nothing happens since there are no parameters to update

The learning rate controls how big each update step is:

  • Smaller learning rates: slower but more stable learning
  • Larger learning rates: faster but potentially unstable learning

Residual layers themselves don't have parameters to learn - they just pass the updates to their inner layer if one exists.
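
Putting the pieces together, one training step might look like this sketch. Here input and outputGradient are placeholders (computing the loss gradient is outside this layer's scope), and the 0.01f learning rate is just an example value.

// Sketch: a single training step through a residual layer.
Tensor<float> output = residual.Forward(input);                 // forward pass caches the input
Tensor<float> inputGradient = residual.Backward(outputGradient); // backpropagate the loss gradient
residual.UpdateParameters(0.01f);                                // apply updates to the inner layer's parameters
residual.ResetState();                                           // optional: clear cached state before the next batch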