
Class RecurrentLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a recurrent neural network layer that processes sequential data by maintaining a hidden state.

public class RecurrentLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object
LayerBase<T>
RecurrentLayer<T>

Implements
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>
IDisposable

Remarks

The RecurrentLayer implements a basic recurrent neural network (RNN) that processes sequence data by maintaining and updating a hidden state over time steps. For each element in the sequence, the layer computes a new hidden state based on the current input and the previous hidden state. This allows the network to capture temporal dependencies and patterns in sequential data.

For Beginners: This layer is designed to work with data that comes in sequences.

Think of the RecurrentLayer as having a memory that helps it understand sequences:

  • When reading a sentence word by word, it remembers previous words to understand context
  • When analyzing time series data, it remembers past values to predict future trends
  • When processing video frames, it remembers earlier frames to track movement

Unlike regular layers that process each input independently, this layer:

  • Takes both the current input and its own memory (hidden state) to make decisions
  • Updates its memory after seeing each item in the sequence
  • Passes this updated memory forward to the next time step

For example, when processing the sentence "The cat sat on the mat":

  • At the word "cat", it remembers "The" came before
  • At the word "sat", it remembers both "The" and "cat" came before
  • This context helps it understand the full meaning of the sentence

This ability to maintain information across a sequence makes recurrent layers powerful for tasks involving text, time series, audio, and other sequential data.
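
Example

The following sketch shows how this layer might be constructed and used, based on the constructor and Forward signatures documented on this page. How a Tensor<float> is created depends on the library's tensor API, which this page does not cover, so that step is shown as a placeholder.

// Tanh activation is used by default (see the constructors below).
var layer = new RecurrentLayer<float>(inputSize: 8, hiddenSize: 16);

// Forward expects shape [sequenceLength, batchSize, inputSize] and returns
// [sequenceLength, batchSize, hiddenSize].
// Tensor<float> input = ...;                    // e.g. [20, 4, 8]
// Tensor<float> output = layer.Forward(input);  // -> [20, 4, 16]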

Constructors

RecurrentLayer(int, int, IActivationFunction<T>?)

Initializes a new instance of the RecurrentLayer<T> class with a scalar activation function.

public RecurrentLayer(int inputSize, int hiddenSize, IActivationFunction<T>? activationFunction = null)

Parameters

inputSize int

The size of the input to the layer at each time step.

hiddenSize int

The size of the hidden state and output at each time step.

activationFunction IActivationFunction<T>

The activation function to apply to the hidden state. Defaults to Tanh if not specified.

Remarks

This constructor creates a new RecurrentLayer with the specified dimensions and a scalar activation function. The weights are initialized using Xavier/Glorot initialization to improve training dynamics, and the biases are initialized to zero. A scalar activation function is applied element-wise to each hidden neuron independently.

For Beginners: This creates a new recurrent layer for your neural network using a simple activation function.

When you create this layer, you specify:

  • inputSize: How many features come into the layer at each time step
  • hiddenSize: How many memory units (neurons) the layer has
  • activationFunction: How to transform the hidden state (defaults to tanh)

The hiddenSize determines the "memory capacity" of the layer:

  • Larger values can remember more information about the sequence
  • But also require more computation and might be harder to train

Tanh is commonly used as the activation function because:

  • It outputs values between -1 and 1
  • It has a nice gradient for training
  • It works well for capturing both positive and negative patterns

The layer starts with carefully initialized weights to help training proceed smoothly.
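
Example

A minimal construction sketch. Omitting the activation argument (or passing null) selects the default Tanh activation described above; any concrete IActivationFunction<T> implementation you pass instead depends on what the library provides.

// Both lines create the same layer: 10 input features, 32 hidden units, Tanh.
var layer = new RecurrentLayer<double>(inputSize: 10, hiddenSize: 32);
var same = new RecurrentLayer<double>(10, 32, activationFunction: null);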

RecurrentLayer(int, int, IVectorActivationFunction<T>?)

Initializes a new instance of the RecurrentLayer<T> class with a vector activation function.

public RecurrentLayer(int inputSize, int hiddenSize, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

inputSize int

The size of the input to the layer at each time step.

hiddenSize int

The size of the hidden state and output at each time step.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to apply to the hidden state. Defaults to Tanh if not specified.

Remarks

This constructor creates a new RecurrentLayer with the specified dimensions and a vector activation function. The weights are initialized using Xavier/Glorot initialization to improve training dynamics, and the biases are initialized to zero. A vector activation function is applied to the entire hidden state vector at once, which allows for interactions between different hidden neurons.

For Beginners: This creates a new recurrent layer for your neural network using an advanced activation function.

When you create this layer, you specify:

  • inputSize: How many features come into the layer at each time step
  • hiddenSize: How many memory units (neurons) the layer has
  • vectorActivationFunction: How to transform the entire hidden state as a group

A vector activation means all hidden neurons are calculated together, which can capture relationships between them. This is an advanced option that might be useful for specific types of sequence problems.

This constructor works the same as the scalar version, but allows for more sophisticated activation patterns across the hidden state. Most RNN implementations use the scalar version with tanh activation.
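
Example

A construction sketch for the vector-activation overload. Passing null falls back to Tanh as documented; a non-null IVectorActivationFunction<T> (for example, a softmax that normalizes across the whole hidden vector) is applied to all hidden units jointly.

// null -> default Tanh; substitute a concrete IVectorActivationFunction<double>
// from the library to activate the hidden state as a group.
IVectorActivationFunction<double>? vectorActivation = null;
var layer = new RecurrentLayer<double>(inputSize: 10, hiddenSize: 32, vectorActivation);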

Properties

ParameterCount

Gets the total number of trainable parameters in this recurrent layer.

public override int ParameterCount { get; }

Property Value

int

Remarks

The parameter count includes all weights and biases:

  • Input weights: inputSize × hiddenSize
  • Hidden weights: hiddenSize × hiddenSize
  • Biases: hiddenSize
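
For example, with inputSize = 10 and hiddenSize = 32, the layer has 10 × 32 + 32 × 32 + 32 = 320 + 1024 + 32 = 1,376 trainable parameters.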

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsGpuTraining

Gets a value indicating whether this layer supports GPU-resident training.

public override bool SupportsGpuTraining { get; }

Property Value

bool

SupportsJitCompilation

Gets whether this layer currently supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer's activation function is supported for JIT compilation. Supported activations: ReLU, Sigmoid, Tanh, Softmax.

SupportsTraining

Gets a value indicating whether this layer supports training through backpropagation.

public override bool SupportsTraining { get; }

Property Value

bool

Always true for RecurrentLayer, indicating that the layer can be trained through backpropagation.

Remarks

This property indicates that the RecurrentLayer has trainable parameters (input weights, hidden weights, and biases) that can be optimized during the training process using backpropagation through time (BPTT). The gradients of these parameters are calculated during the backward pass and used to update the parameters.

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer has values (weights and biases) that can be adjusted during training
  • It will improve its performance as it sees more data
  • It participates in the learning process of the neural network

When you train a neural network containing this layer, the weights and biases will automatically adjust to better recognize patterns in your sequence data.

Methods

Backward(Tensor<T>)

Performs the backward pass of the recurrent layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the recurrent layer, which is used during training to propagate error gradients back through the network. It implements backpropagation through time (BPTT) by starting at the end of the sequence and working backward, accumulating gradients for the weights and biases. For each time step, it calculates gradients with respect to the input, the hidden state, and the parameters.

For Beginners: This method is used during training to calculate how the layer should change to reduce errors.

During the backward pass:

  1. The layer starts from the end of the sequence and works backward
  2. At each time step:
    • It receives error gradients from two sources: the layer above and the future time step
    • It calculates how each of its weights and biases should change
    • It calculates how the error should flow back to the previous layer and to the previous time step

This is like figuring out how a mistake at the end of a sentence affects your understanding of each word that came before it. The further back in time, the more complex these relationships become.

This process, called "backpropagation through time," is what allows recurrent networks to learn from sequences.
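
Example

To make the gradient flow concrete, here is a minimal BPTT sketch for a vanilla tanh RNN cell, written with plain double arrays rather than AiDotNet's Tensor<T> (all names here are illustrative, not the layer's internals). It walks the sequence in reverse, combining the gradient from the layer above with the gradient carried back from the future time step, exactly as described above.

static void BpttSketch(
    double[][] x,      // x[t]: input at time t, length I
    double[][] h,      // h[t]: cached hidden state from the forward pass, length H
    double[][] dOut,   // dOut[t]: dLoss/dOutput at time t
    double[,] Wx, double[,] Wh,                 // [I,H] and [H,H] weights
    double[,] dWx, double[,] dWh, double[] db,  // gradient accumulators
    double[][] dX)     // filled with dLoss/dInput per time step
{
    int T = x.Length, H = db.Length, I = x[0].Length;
    var dhNext = new double[H];  // gradient arriving from time step t + 1

    for (int t = T - 1; t >= 0; t--)
    {
        var hPrev = t > 0 ? h[t - 1] : new double[H];  // initial state was zeros
        var dz = new double[H];
        for (int j = 0; j < H; j++)
        {
            double g = dOut[t][j] + dhNext[j];     // two sources: layer above + future step
            dz[j] = g * (1.0 - h[t][j] * h[t][j]); // through tanh: 1 - tanh(z)^2
            db[j] += dz[j];
        }

        dX[t] = new double[I];
        var dhPrev = new double[H];
        for (int j = 0; j < H; j++)
        {
            for (int i = 0; i < I; i++)
            {
                dWx[i, j] += dz[j] * x[t][i];   // input-weight gradient
                dX[t][i] += dz[j] * Wx[i, j];   // gradient flowing to the layer below
            }
            for (int k = 0; k < H; k++)
            {
                dWh[k, j] += dz[j] * hPrev[k];  // hidden-weight gradient
                dhPrev[k] += dz[j] * Wh[k, j];  // gradient flowing to time step t - 1
            }
        }
        dhNext = dhPrev;
    }
}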

Exceptions

InvalidOperationException

Thrown when Backward is called before Forward.

BackwardGpu(IGpuTensor<T>)

Performs the backward pass on GPU tensors.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

GPU tensor containing the gradient of the loss with respect to the output.

Returns

IGpuTensor<T>

GPU tensor containing the gradient of the loss with respect to the input.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the recurrent layer's single time-step computation as a JIT-compilable computation graph.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the hidden state at one time step.

Remarks

This method exports a single RNN cell computation for JIT compilation. The graph computes: h_t = activation(W_input @ x_t + W_hidden @ h_{t-1} + b) using the standard vanilla RNN equation.

Forward(Tensor<T>)

Performs the forward pass of the recurrent layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process, with shape [sequenceLength, batchSize, inputSize].

Returns

Tensor<T>

The output tensor after recurrent processing, with shape [sequenceLength, batchSize, hiddenSize].

Remarks

This method implements the forward pass of the recurrent layer. It processes each element in the input sequence in order, updating the hidden state at each time step based on the current input and the previous hidden state. The initial hidden state is set to zero. The method caches the input, hidden states, and outputs for use during the backward pass.

For Beginners: This method processes your sequence data through the recurrent layer.

During the forward pass:

  1. The layer starts with an empty memory (hidden state of zeros)
  2. For each item in the sequence (like each word in a sentence):
    • It takes both the current input and its current memory
    • It calculates a new memory state based on these values
    • It saves this memory for the next item in the sequence
  3. The outputs at each time step become the overall output of the layer

The formula at each step is approximately: new_memory = activation(input_weights × current_input + hidden_weights × previous_memory + bias)

This step-by-step processing allows the layer to build up an understanding of the entire sequence. The layer saves all inputs, hidden states, and outputs for later use during training.
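
Example

The recurrence above can be sketched directly. This uses plain double arrays and Math.Tanh instead of the layer's Tensor<T> machinery, so the names and shapes here are illustrative only (the batch dimension is omitted for brevity).

static double[][] ForwardSketch(double[][] x, double[,] Wx, double[,] Wh, double[] b)
{
    int T = x.Length, I = Wx.GetLength(0), H = b.Length;
    var h = new double[T][];
    var hPrev = new double[H];   // step 1: start with a zero hidden state

    for (int t = 0; t < T; t++)  // step 2: walk the sequence in order
    {
        h[t] = new double[H];
        for (int j = 0; j < H; j++)
        {
            double z = b[j];
            for (int i = 0; i < I; i++) z += x[t][i] * Wx[i, j];   // input_weights x current_input
            for (int k = 0; k < H; k++) z += hPrev[k] * Wh[k, j];  // hidden_weights x previous_memory
            h[t][j] = Math.Tanh(z);  // new_memory = activation(...)
        }
        hPrev = h[t];                // carry the memory forward
    }
    return h;                        // step 3: outputs at every time step
}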

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass on GPU tensors.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

GPU tensor inputs.

Returns

IGpuTensor<T>

GPU tensor output after RNN processing.

Exceptions

ArgumentException

Thrown when no input tensor is provided.

InvalidOperationException

Thrown when GPU backend is unavailable.

GetParameterGradients()

Gets all parameter gradients of the recurrent layer as a single vector.

public override Vector<T> GetParameterGradients()

Returns

Vector<T>

A vector containing all parameter gradients (input weight gradients, hidden weight gradients, and bias gradients).

GetParameters()

Gets all trainable parameters of the recurrent layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters (input weights, hidden weights, and biases).

Remarks

This method retrieves all trainable parameters of the recurrent layer as a single vector. The input weights are stored first, followed by the hidden weights, and finally the biases. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the recurrent layer.

The parameters:

  • Are the weights and biases that the recurrent layer learns during training
  • Control how the layer processes sequence information
  • Are returned as a single list (vector)

This is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters

The input weights are stored first in the vector, followed by the hidden weights, and finally the biases.
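
Example

A round-trip sketch using the documented layout (input weights, then hidden weights, then biases). Vector<T> API details beyond what this page shows are assumptions.

var layer = new RecurrentLayer<float>(inputSize: 10, hiddenSize: 32);

Vector<float> snapshot = layer.GetParameters();
// Expected length: 10*32 + 32*32 + 32 = 1376 (see ParameterCount).

// ... later, restore the exact same weights ...
layer.SetParameters(snapshot);  // throws ArgumentException on a wrong-length vector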

ResetState()

Resets the internal state of the recurrent layer.

public override void ResetState()

Remarks

This method resets the internal state of the recurrent layer, including the cached inputs, hidden states, and outputs from the forward pass, and the gradients from the backward pass. This is useful when starting to process a new sequence or batch of data.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • Stored inputs, hidden states, and outputs from previous calculations are cleared
  • Calculated gradients are cleared
  • The layer forgets any information from previous sequences

This is important for:

  • Processing a new, unrelated sequence of data
  • Preventing information from one sequence affecting another
  • Starting a new training episode

The weights and biases (the learned parameters) are not reset, only the temporary state information.
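
Example

A usage sketch showing why state is cleared between unrelated sequences (sequenceA and sequenceB are placeholder Tensor<float> values):

layer.Forward(sequenceA);  // caches inputs, hidden states, and outputs for A
layer.ResetState();        // clear cached state and gradients; weights are untouched
layer.Forward(sequenceB);  // B now starts from a fresh zero hidden state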

SetParameters(Vector<T>)

Sets the trainable parameters of the recurrent layer.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters (input weights, hidden weights, and biases) to set.

Remarks

This method sets the trainable parameters of the recurrent layer from a single vector. The vector should contain the input weight values first, followed by the hidden weight values, and finally the bias values. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the weights and biases in the recurrent layer.

When setting parameters:

  • The input must be a vector with the correct total length
  • The first part of the vector is used for the input weights
  • The middle part of the vector is used for the hidden weights
  • The last part of the vector is used for the biases

This is useful for:

  • Loading a previously saved model
  • Transferring parameters from another model
  • Testing different parameter values

An error is thrown if the input vector doesn't have the expected number of parameters.

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

UpdateParameters(T)

Updates the parameters of the recurrent layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the input weights, hidden weights, and biases of the recurrent layer based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. This method should be called after the backward pass to apply the calculated updates.

For Beginners: This method updates the layer's internal values during training.

When updating parameters:

  1. The input weight values are adjusted based on their gradients
  2. The hidden weight values are adjusted based on their gradients
  3. The bias values are adjusted based on their gradients
  4. The learning rate controls how big each update step is

These updates help the layer:

  • Pay more attention to important input features
  • Better remember relevant information from previous time steps
  • Adjust its baseline activation levels

Smaller learning rates mean slower but more stable learning, while larger learning rates mean faster but potentially unstable learning.
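
Example

A single training step following the call order this page documents: Forward caches state, Backward computes gradients, and UpdateParameters applies them (calling it before Backward throws, as noted below). The input and outputGradient values stand in for data supplied by the surrounding network and loss function.

Tensor<float> output = layer.Forward(input);
Tensor<float> inputGradient = layer.Backward(outputGradient);  // BPTT
layer.UpdateParameters(0.01f);  // small learning rate: slower but more stable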

Exceptions

InvalidOperationException

Thrown when UpdateParameters is called before Backward.

UpdateParametersGpu(IGpuOptimizerConfig)

Updates parameters on GPU using the configured optimizer.

public override void UpdateParametersGpu(IGpuOptimizerConfig config)

Parameters

config IGpuOptimizerConfig

The GPU optimizer configuration.