
Class RecurrentLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a recurrent neural network layer that processes sequential data by maintaining a hidden state.

public class RecurrentLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object
LayerBase<T>
RecurrentLayer<T>

Implements
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>
IDisposable

Remarks

The RecurrentLayer implements a basic recurrent neural network (RNN) that processes sequence data by maintaining and updating a hidden state over time steps. For each element in the sequence, the layer computes a new hidden state based on the current input and the previous hidden state. This allows the network to capture temporal dependencies and patterns in sequential data.

For Beginners: This layer is designed to work with data that comes in sequences.

Think of the RecurrentLayer as having a memory that helps it understand sequences:

  • When reading a sentence word by word, it remembers previous words to understand context
  • When analyzing time series data, it remembers past values to predict future trends
  • When processing video frames, it remembers earlier frames to track movement

Unlike regular layers that process each input independently, this layer:

  • Takes both the current input and its own memory (hidden state) to make decisions
  • Updates its memory after seeing each item in the sequence
  • Passes this updated memory forward to the next time step

For example, when processing the sentence "The cat sat on the mat":

  • At the word "cat", it remembers "The" came before
  • At the word "sat", it remembers both "The" and "cat" came before
  • This context helps it understand the full meaning of the sentence

This ability to maintain information across a sequence makes recurrent layers powerful for tasks involving text, time series, audio, and other sequential data.
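
Example

The following sketch shows how this layer might be constructed and used, based on the constructor and Forward signatures documented on this page. How a Tensor<float> is created depends on the library's tensor API, which this page does not cover, so that step is shown as a placeholder.

// Tanh activation is used by default (see the constructors below).
var layer = new RecurrentLayer<float>(inputSize: 8, hiddenSize: 16);

// Forward expects shape [sequenceLength, batchSize, inputSize] and returns
// [sequenceLength, batchSize, hiddenSize].
// Tensor<float> input = ...;                    // e.g. [20, 4, 8]
// Tensor<float> output = layer.Forward(input);  // -> [20, 4, 16]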

Constructors

RecurrentLayer(int, int, IActivationFunction<T>?)

Initializes a new instance of the RecurrentLayer<T> class with a scalar activation function.

public RecurrentLayer(int inputSize, int hiddenSize, IActivationFunction<T>? activationFunction = null)

Parameters

inputSize int

The size of the input to the layer at each time step.

hiddenSize int

The size of the hidden state and output at each time step.

activationFunction IActivationFunction<T>

The activation function to apply to the hidden state. Defaults to Tanh if not specified.

Remarks

This constructor creates a new RecurrentLayer with the specified dimensions and a scalar activation function. The weights are initialized using Xavier/Glorot initialization to improve training dynamics, and the biases are initialized to zero. A scalar activation function is applied element-wise to each hidden neuron independently.

For Beginners: This creates a new recurrent layer for your neural network using a simple activation function.

When you create this layer, you specify:

  • inputSize: How many features come into the layer at each time step
  • hiddenSize: How many memory units (neurons) the layer has
  • activationFunction: How to transform the hidden state (defaults to tanh)

The hiddenSize determines the "memory capacity" of the layer:

  • Larger values can remember more information about the sequence
  • But also require more computation and might be harder to train

Tanh is commonly used as the activation function because:

  • It outputs values between -1 and 1
  • It has a nice gradient for training
  • It works well for capturing both positive and negative patterns

The layer starts with carefully initialized weights to help training proceed smoothly.
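
Example

A minimal construction sketch. Omitting the activation argument (or passing null) selects the default Tanh activation described above; any concrete IActivationFunction<T> implementation you pass instead depends on what the library provides.

// Both lines create the same layer: 10 input features, 32 hidden units, Tanh.
var layer = new RecurrentLayer<double>(inputSize: 10, hiddenSize: 32);
var same = new RecurrentLayer<double>(10, 32, activationFunction: null);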

RecurrentLayer(int, int, IVectorActivationFunction<T>?)

Initializes a new instance of the RecurrentLayer<T> class with a vector activation function.

public RecurrentLayer(int inputSize, int hiddenSize, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

inputSize int

The size of the input to the layer at each time step.

hiddenSize int

The size of the hidden state and output at each time step.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to apply to the hidden state. Defaults to Tanh if not specified.

Remarks

This constructor creates a new RecurrentLayer with the specified dimensions and a vector activation function. The weights are initialized using Xavier/Glorot initialization to improve training dynamics, and the biases are initialized to zero. A vector activation function is applied to the entire hidden state vector at once, which allows for interactions between different hidden neurons.

For Beginners: This creates a new recurrent layer for your neural network using an advanced activation function.

When you create this layer, you specify:

  • inputSize: How many features come into the layer at each time step
  • hiddenSize: How many memory units (neurons) the layer has
  • vectorActivationFunction: How to transform the entire hidden state as a group

A vector activation means all hidden neurons are calculated together, which can capture relationships between them. This is an advanced option that might be useful for specific types of sequence problems.

This constructor works the same as the scalar version, but allows for more sophisticated activation patterns across the hidden state. Most RNN implementations use the scalar version with tanh activation.
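
Example

A construction sketch for the vector-activation overload. Passing null falls back to Tanh as documented; a non-null IVectorActivationFunction<T> (for example, a softmax that normalizes across the whole hidden vector) is applied to all hidden units jointly.

// null -> default Tanh; substitute a concrete IVectorActivationFunction<double>
// from the library to activate the hidden state as a group.
IVectorActivationFunction<double>? vectorActivation = null;
var layer = new RecurrentLayer<double>(inputSize: 10, hiddenSize: 32, vectorActivation);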

Properties

ParameterCount

Gets the total number of trainable parameters in this recurrent layer.

public override int ParameterCount { get; }

Property Value

int

Remarks

The parameter count includes all weights and biases:

  • Input weights: inputSize × hiddenSize
  • Hidden weights: hiddenSize × hiddenSize
  • Biases: hiddenSize
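
For example, with inputSize = 10 and hiddenSize = 32, the layer has 10 × 32 + 32 × 32 + 32 = 320 + 1024 + 32 = 1,376 trainable parameters.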

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsGpuTraining

Gets a value indicating whether this layer supports GPU-resident training.

public override bool SupportsGpuTraining { get; }

Property Value

bool

SupportsJitCompilation

Gets whether this layer currently supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer's activation function is supported for JIT compilation. Supported activations: ReLU, Sigmoid, Tanh, Softmax.

SupportsTraining

Gets a value indicating whether this layer supports training through backpropagation.

public override bool SupportsTraining { get; }

Property Value

bool

Always true for RecurrentLayer, indicating that the layer can be trained through backpropagation.

Remarks

This property indicates that the RecurrentLayer has trainable parameters (input weights, hidden weights, and biases) that can be optimized during the training process using backpropagation through time (BPTT). The gradients of these parameters are calculated during the backward pass and used to update the parameters.

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer has values (weights and biases) that can be adjusted during training
  • It will improve its performance as it sees more data
  • It participates in the learning process of the neural network

When you train a neural network containing this layer, the weights and biases will automatically adjust to better recognize patterns in your sequence data.

Methods

Backward(Tensor<T>)

Performs the backward pass of the recurrent layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the recurrent layer, which is used during training to propagate error gradients back through the network. It implements backpropagation through time (BPTT) by starting at the end of the sequence and working backward, accumulating gradients for the weights and biases. For each time step, it calculates gradients with respect to the input, the hidden state, and the parameters.

For Beginners: This method is used during training to calculate how the layer should change to reduce errors.

During the backward pass:

  1. The layer starts from the end of the sequence and works backward
  2. At each time step:
    • It receives error gradients from two sources: the layer above and the future time step
    • It calculates how each of its weights and biases should change
    • It calculates how the error should flow back to the previous layer and to the previous time step

This is like figuring out how a mistake at the end of a sentence affects your understanding of each word that came before it. The further back in time, the more complex these relationships become.

This process, called "backpropagation through time," is what allows recurrent networks to learn from sequences.
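
Example

To make the gradient flow concrete, here is a minimal BPTT sketch for a vanilla tanh RNN cell, written with plain double arrays rather than AiDotNet's Tensor<T> (all names here are illustrative, not the layer's internals). It walks the sequence in reverse, combining the gradient from the layer above with the gradient carried back from the future time step, exactly as described above.

static void BpttSketch(
    double[][] x,      // x[t]: input at time t, length I
    double[][] h,      // h[t]: cached hidden state from the forward pass, length H
    double[][] dOut,   // dOut[t]: dLoss/dOutput at time t
    double[,] Wx, double[,] Wh,                 // [I,H] and [H,H] weights
    double[,] dWx, double[,] dWh, double[] db,  // gradient accumulators
    double[][] dX)     // filled with dLoss/dInput per time step
{
    int T = x.Length, H = db.Length, I = x[0].Length;
    var dhNext = new double[H];  // gradient arriving from time step t + 1

    for (int t = T - 1; t >= 0; t--)
    {
        var hPrev = t > 0 ? h[t - 1] : new double[H];  // initial state was zeros
        var dz = new double[H];
        for (int j = 0; j < H; j++)
        {
            double g = dOut[t][j] + dhNext[j];     // two sources: layer above + future step
            dz[j] = g * (1.0 - h[t][j] * h[t][j]); // through tanh: 1 - tanh(z)^2
            db[j] += dz[j];
        }

        dX[t] = new double[I];
        var dhPrev = new double[H];
        for (int j = 0; j < H; j++)
        {
            for (int i = 0; i < I; i++)
            {
                dWx[i, j] += dz[j] * x[t][i];   // input-weight gradient
                dX[t][i] += dz[j] * Wx[i, j];   // gradient flowing to the layer below
            }
            for (int k = 0; k < H; k++)
            {
                dWh[k, j] += dz[j] * hPrev[k];  // hidden-weight gradient
                dhPrev[k] += dz[j] * Wh[k, j];  // gradient flowing to time step t - 1
            }
        }
        dhNext = dhPrev;
    }
}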

Exceptions

InvalidOperationException

Thrown when Backward is called before Forward.

BackwardGpu(IGpuTensor<T>)

Performs the backward pass on GPU tensors.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

GPU tensor containing the gradient of the loss with respect to the output.

Returns

IGpuTensor<T>

GPU tensor containing the gradient of the loss with respect to the input.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the recurrent layer's single time-step computation as a JIT-compilable computation graph.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the hidden state at one time step.

Remarks

This method exports a single RNN cell computation for JIT compilation. The graph computes: h_t = activation(W_input @ x_t + W_hidden @ h_{t-1} + b) using the standard vanilla RNN equation.

Forward(Tensor<T>)

Performs the forward pass of the recurrent layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process, with shape [sequenceLength, batchSize, inputSize].

Returns

Tensor<T>

The output tensor after recurrent processing, with shape [sequenceLength, batchSize, hiddenSize].

Remarks

This method implements the forward pass of the recurrent layer. It processes each element in the input sequence in order, updating the hidden state at each time step based on the current input and the previous hidden state. The initial hidden state is set to zero. The method caches the input, hidden states, and outputs for use during the backward pass.

For Beginners: This method processes your sequence data through the recurrent layer.

During the forward pass:

  1. The layer starts with an empty memory (hidden state of zeros)
  2. For each item in the sequence (like each word in a sentence):
    • It takes both the current input and its current memory
    • It calculates a new memory state based on these values
    • It saves this memory for the next item in the sequence
  3. The outputs at each time step become the overall output of the layer

The formula at each step is approximately: new_memory = activation(input_weights × current_input + hidden_weights × previous_memory + bias)

This step-by-step processing allows the layer to build up an understanding of the entire sequence. The layer saves all inputs, hidden states, and outputs for later use during training.
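
Example

The recurrence above can be sketched directly. This uses plain double arrays and Math.Tanh instead of the layer's Tensor<T> machinery, so the names and shapes here are illustrative only (the batch dimension is omitted for brevity).

static double[][] ForwardSketch(double[][] x, double[,] Wx, double[,] Wh, double[] b)
{
    int T = x.Length, I = Wx.GetLength(0), H = b.Length;
    var h = new double[T][];
    var hPrev = new double[H];   // step 1: start with a zero hidden state

    for (int t = 0; t < T; t++)  // step 2: walk the sequence in order
    {
        h[t] = new double[H];
        for (int j = 0; j < H; j++)
        {
            double z = b[j];
            for (int i = 0; i < I; i++) z += x[t][i] * Wx[i, j];   // input_weights x current_input
            for (int k = 0; k < H; k++) z += hPrev[k] * Wh[k, j];  // hidden_weights x previous_memory
            h[t][j] = Math.Tanh(z);  // new_memory = activation(...)
        }
        hPrev = h[t];                // carry the memory forward
    }
    return h;                        // step 3: outputs at every time step
}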

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass on GPU tensors.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

GPU tensor inputs.

Returns

IGpuTensor<T>

GPU tensor output after RNN processing.

Exceptions

ArgumentException

Thrown when no input tensor is provided.

InvalidOperationException

Thrown when GPU backend is unavailable.

GetParameterGradients()

Gets all parameter gradients of the recurrent layer as a single vector.

public override Vector<T> GetParameterGradients()

Returns

Vector<T>

A vector containing all parameter gradients (input weight gradients, hidden weight gradients, and bias gradients).

GetParameters()

Gets all trainable parameters of the recurrent layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters (input weights, hidden weights, and biases).

Remarks

This method retrieves all trainable parameters of the recurrent layer as a single vector. The input weights are stored first, followed by the hidden weights, and finally the biases. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the recurrent layer.

The parameters:

  • Are the weights and biases that the recurrent layer learns during training
  • Control how the layer processes sequence information
  • Are returned as a single list (vector)

This is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters

The input weights are stored first in the vector, followed by the hidden weights, and finally the biases.
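
Example

A round-trip sketch using the documented layout (input weights, then hidden weights, then biases). Vector<T> API details beyond what this page shows are assumptions.

var layer = new RecurrentLayer<float>(inputSize: 10, hiddenSize: 32);

Vector<float> snapshot = layer.GetParameters();
// Expected length: 10*32 + 32*32 + 32 = 1376 (see ParameterCount).

// ... later, restore the exact same weights ...
layer.SetParameters(snapshot);  // throws ArgumentException on a wrong-length vector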

ResetState()

Resets the internal state of the recurrent layer.

public override void ResetState()

Remarks

This method resets the internal state of the recurrent layer, including the cached inputs, hidden states, and outputs from the forward pass, and the gradients from the backward pass. This is useful when starting to process a new sequence or batch of data.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • Stored inputs, hidden states, and outputs from previous calculations are cleared
  • Calculated gradients are cleared
  • The layer forgets any information from previous sequences

This is important for:

  • Processing a new, unrelated sequence of data
  • Preventing information from one sequence affecting another
  • Starting a new training episode

The weights and biases (the learned parameters) are not reset, only the temporary state information.
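
Example

A usage sketch showing why state is cleared between unrelated sequences (sequenceA and sequenceB are placeholder Tensor<float> values):

layer.Forward(sequenceA);  // caches inputs, hidden states, and outputs for A
layer.ResetState();        // clear cached state and gradients; weights are untouched
layer.Forward(sequenceB);  // B now starts from a fresh zero hidden state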

SetParameters(Vector<T>)

Sets the trainable parameters of the recurrent layer.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters (input weights, hidden weights, and biases) to set.

Remarks

This method sets the trainable parameters of the recurrent layer from a single vector. The vector should contain the input weight values first, followed by the hidden weight values, and finally the bias values. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the weights and biases in the recurrent layer.

When setting parameters:

  • The input must be a vector with the correct total length
  • The first part of the vector is used for the input weights
  • The middle part of the vector is used for the hidden weights
  • The last part of the vector is used for the biases

This is useful for:

  • Loading a previously saved model
  • Transferring parameters from another model
  • Testing different parameter values

An error is thrown if the input vector doesn't have the expected number of parameters.

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

UpdateParameters(T)

Updates the parameters of the recurrent layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the input weights, hidden weights, and biases of the recurrent layer based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. This method should be called after the backward pass to apply the calculated updates.

For Beginners: This method updates the layer's internal values during training.

When updating parameters:

  1. The input weight values are adjusted based on their gradients
  2. The hidden weight values are adjusted based on their gradients
  3. The bias values are adjusted based on their gradients
  4. The learning rate controls how big each update step is

These updates help the layer:

  • Pay more attention to important input features
  • Better remember relevant information from previous time steps
  • Adjust its baseline activation levels

Smaller learning rates mean slower but more stable learning, while larger learning rates mean faster but potentially unstable learning.
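
Example

A single training step following the call order this page documents: Forward caches state, Backward computes gradients, and UpdateParameters applies them (calling it before Backward throws, as noted below). The input and outputGradient values stand in for data supplied by the surrounding network and loss function.

Tensor<float> output = layer.Forward(input);
Tensor<float> inputGradient = layer.Backward(outputGradient);  // BPTT
layer.UpdateParameters(0.01f);  // small learning rate: slower but more stable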

Exceptions

InvalidOperationException

Thrown when UpdateParameters is called before Backward.

UpdateParametersGpu(IGpuOptimizerConfig)

Updates parameters on GPU using the configured optimizer.

public override void UpdateParametersGpu(IGpuOptimizerConfig config)

Parameters

config IGpuOptimizerConfig

The GPU optimizer configuration.