Class ConvLSTMLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Implements a Convolutional Long Short-Term Memory (ConvLSTM) layer for processing sequential spatial data.

public class ConvLSTMLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for computations (e.g., float, double).

Inheritance
ConvLSTMLayer<T>

Remarks

ConvLSTM combines convolutional operations with LSTM (Long Short-Term Memory) to handle spatiotemporal data. It is particularly useful for tasks involving sequences of images or other spatial data, such as video prediction, weather forecasting, and spatiotemporal sequence prediction.

Key features of ConvLSTM:

  • Maintains spatial information throughout the processing
  • Captures both spatial and temporal dependencies
  • Uses convolutional operations instead of matrix multiplications in the LSTM cell
  • Suitable for data with both spatial and temporal structure

For Beginners: ConvLSTM is like a smart video analyzer that remembers spatial patterns over time.

Imagine you're watching a video of clouds moving across the sky:

  1. ConvLSTM looks at each frame (like a photo) in the video sequence
  2. It remembers important spatial features (like cloud shapes) from previous frames
  3. It uses this memory to predict how these features might change in future frames

This layer is particularly good at:

  • Predicting what might happen next in a video
  • Analyzing patterns in weather maps over time
  • Understanding how spatial arrangements change in a sequence

Unlike simpler layers that treat each frame independently, ConvLSTM connects the dots between frames, making it powerful for tasks involving moving images or changing spatial data.

Constructors

ConvLSTMLayer(int[], int, int, int, int, IActivationFunction<T>?)

public ConvLSTMLayer(int[] inputShape, int kernelSize, int filters, int padding = 1, int strides = 1, IActivationFunction<T>? activationFunction = null)

Parameters

inputShape int[]
kernelSize int
filters int
padding int
strides int
activationFunction IActivationFunction<T>

ConvLSTMLayer(int[], int, int, int, int, IVectorActivationFunction<T>?)

Initializes a new instance of the ConvLSTMLayer class with a vector activation function.

public ConvLSTMLayer(int[] inputShape, int kernelSize, int filters, int padding = 1, int strides = 1, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

inputShape int[]

The shape of the input tensor [batch, time, height, width, channels].

kernelSize int

The size of the convolutional kernel (filter).

filters int

The number of output filters (channels) for the layer.

padding int

The padding added to the input.

strides int

The stride of the convolution.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to use. Defaults to Tanh if not specified.

Remarks

This constructor allows using a vector activation function that can process entire tensors at once, which may be more efficient for certain operations.

For Beginners: This constructor is similar to the first one, but uses a special type of activation function.

A vector activation function:

  • Processes entire groups of numbers at once, rather than one at a time
  • Can be faster for large datasets
  • Works the same way as the regular activation function, just with different internal machinery

You would use this version if you're working with very large datasets where processing speed is important, or if you have a specific vector activation function you want to use.
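The kernelSize, padding, and strides parameters together determine the spatial size of each output frame. The sketch below applies standard convolution arithmetic to an input shape of [batch, time, height, width, channels]. It is an illustration only: ConvShapeSketch and its methods are hypothetical names, and whether the layer computes its shapes in exactly this way is an assumption.

```csharp
using System;

public static class ConvShapeSketch
{
    // Standard convolution arithmetic: floor((size + 2 * padding - kernel) / stride) + 1.
    public static int OutputSize(int inputSize, int kernelSize, int padding, int stride)
    {
        if (stride <= 0) throw new ArgumentOutOfRangeException(nameof(stride));
        int span = inputSize + 2 * padding - kernelSize;
        if (span < 0) throw new ArgumentException("Kernel is larger than the padded input.");
        return span / stride + 1; // integer division floors because span is non-negative
    }

    // Maps [batch, time, height, width, channels] to [batch, time, outHeight, outWidth, filters].
    public static int[] OutputShape(int[] inputShape, int kernelSize, int filters,
                                    int padding = 1, int strides = 1)
    {
        if (inputShape.Length != 5)
            throw new ArgumentException("Expected [batch, time, height, width, channels].");
        return new[]
        {
            inputShape[0],
            inputShape[1],
            OutputSize(inputShape[2], kernelSize, padding, strides),
            OutputSize(inputShape[3], kernelSize, padding, strides),
            filters
        };
    }
}
```

With the default padding of 1 and strides of 1, a kernelSize of 3 keeps height and width unchanged, which matches the output shapes documented for Backward.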

Properties

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU-accelerated forward pass.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

Remarks

ConvLSTM supports GPU execution when a DirectGpuTensorEngine is available. The GPU implementation uses FusedConv2DGpu for convolutions and GPU-native gate operations.

SupportsGpuTraining

Gets a value indicating whether this layer supports GPU training.

public override bool SupportsGpuTraining { get; }

Property Value

bool

SupportsJitCompilation

Gets a value indicating whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

Always true. ConvLSTMLayer exports a single-step LSTM cell computation with full Conv2D operations for all gates.

Remarks

JIT compilation for ConvLSTM exports a single timestep of the LSTM cell computation. The exported graph uses proper Conv2D operations for all gate computations, matching the behavior of the Forward method.

For processing sequences with the JIT-compiled graph:

  1. Initialize hidden and cell states to zero tensors
  2. For each timestep, call the compiled graph with (input, h_prev, c_prev)
  3. The output is the new hidden state h_t
  4. Track cell state c_t for the next iteration (available from intermediate computation)
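The steps above can be sketched as a small driver loop. The CellStep delegate is a hypothetical stand-in for one call to the compiled graph. It returns both h_t and c_t for simplicity, which is an assumption, since the exported graph is documented as returning only h_t. States are flat double arrays here rather than tensors.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceDriverSketch
{
    // Hypothetical stand-in for one compiled timestep: (x_t, h_prev, c_prev) -> (h_t, c_t).
    public delegate (double[] H, double[] C) CellStep(double[] x, double[] hPrev, double[] cPrev);

    public static List<double[]> Run(IReadOnlyList<double[]> frames, int stateSize, CellStep step)
    {
        var h = new double[stateSize]; // hidden state starts as zeros
        var c = new double[stateSize]; // cell state starts as zeros
        var outputs = new List<double[]>(frames.Count);

        foreach (var x in frames)
        {
            // Feed the previous states in and carry the new ones forward.
            (h, c) = step(x, h, c);
            outputs.Add((double[])h.Clone()); // keep a copy of h_t for this timestep
        }
        return outputs;
    }
}
```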

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Always true for ConvLSTM layers.

Remarks

This property indicates whether the ConvLSTM layer can be trained through backpropagation. ConvLSTM layers always return true as they contain trainable parameters (weights and biases).

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer can adjust its internal values during training
  • It will improve its performance as it sees more data
  • It participates in the learning process

ConvLSTM layers always return true because they have parameters (like weights and biases) that can be updated during training to learn patterns in spatio-temporal data (like videos or weather data).

Methods

Backward(Tensor<T>)

Performs the backward pass of the ConvLSTM layer, computing gradients for all parameters.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient flowing back from the next layer with shape [batchSize, timeSteps, height, width, filters]

Returns

Tensor<T>

Gradient with respect to the input with shape [batchSize, timeSteps, height, width, channels]

Remarks

This method implements backpropagation through time (BPTT) for the ConvLSTM layer:

  1. Initializes gradient tensors for all parameters
  2. Iterates backward through time steps
  3. Computes gradients for each time step using BackwardStep
  4. Accumulates gradients across all time steps
  5. Stores gradients for later use in parameter updates

For Beginners: This method figures out how to improve the layer during training.

During the backward pass:

  • The layer receives information about how to adjust its output to reduce errors
  • It works backwards through the sequence (from the most recent frame to the earliest)
  • It calculates how each of its internal values (weights and biases) should change
  • It also calculates how the input should have been different to reduce errors

Think of it like a coach reviewing a game film backwards, noting what each player should have done differently at each moment to get a better outcome.
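To make the reverse-time accumulation concrete, here is a minimal sketch of BPTT on a much simpler scalar recurrence, h_t = tanh(w * x_t + u * h_{t-1}). It is not the ConvLSTM backward pass: there are no gates, no cell state, and no convolutions, and BpttSketch is a hypothetical name. It only shows the pattern of walking backward through the time steps, accumulating parameter gradients, and passing a gradient to the previous step.

```csharp
using System;

public static class BpttSketch
{
    // Forward: h_t = tanh(w * x_t + u * h_{t-1}), with the initial hidden state set to 0.
    public static double[] Forward(double[] x, double w, double u)
    {
        var h = new double[x.Length];
        double prev = 0.0;
        for (int t = 0; t < x.Length; t++)
        {
            h[t] = Math.Tanh(w * x[t] + u * prev);
            prev = h[t];
        }
        return h;
    }

    // Backward: walks t = T-1 .. 0, accumulating dW and dU and producing dL/dx_t.
    public static (double[] Dx, double Dw, double Du) Backward(
        double[] x, double[] h, double[] outputGradient, double w, double u)
    {
        var dx = new double[x.Length];
        double dw = 0.0, du = 0.0;
        double dhNext = 0.0; // gradient arriving from step t + 1

        for (int t = x.Length - 1; t >= 0; t--)
        {
            double dh = outputGradient[t] + dhNext;   // from the next layer plus from the future
            double dz = dh * (1.0 - h[t] * h[t]);     // derivative of tanh
            double hPrev = t > 0 ? h[t - 1] : 0.0;

            dw += dz * x[t];   // accumulate across time steps
            du += dz * hPrev;
            dx[t] = dz * w;    // gradient with respect to the input at step t
            dhNext = dz * u;   // hand the gradient back to step t - 1
        }
        return (dx, dw, du);
    }
}
```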

BackwardGpu(IGpuTensor<T>)

Performs GPU-accelerated backward pass for ConvLSTM using Backpropagation Through Time (BPTT).

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

GPU tensor with gradient from next layer [batch, timesteps, H, W, filters].

Returns

IGpuTensor<T>

GPU tensor with input gradients [batch, timesteps, H, W, channels].

Remarks

This method implements full BPTT on GPU, computing gradients through all timesteps in reverse order. It uses the cached gate values, hidden states, and cell states from the forward pass.

Exceptions

InvalidOperationException

Thrown when ForwardGpu has not been called in training mode.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the ConvLSTM computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to which input nodes will be added. The method adds:

  • x_t: Current input tensor [batch, height, width, channels]
  • h_prev: Previous hidden state [batch, height, width, filters]
  • c_prev: Previous cell state [batch, height, width, filters]

Returns

ComputationNode<T>

A computation node representing the new hidden state h_t.

Remarks

This method exports a single timestep of the ConvLSTM cell for JIT compilation. The computation graph implements the full ConvLSTM equations using Conv2D operations:

Gates (all use Conv2D operations):

  • Forget gate: f_t = σ(Conv2D(x_t, W_fi) + Conv2D(h_{t-1}, W_fh) + b_f)
  • Input gate: i_t = σ(Conv2D(x_t, W_ii) + Conv2D(h_{t-1}, W_ih) + b_i)
  • Cell candidate: c̃_t = tanh(Conv2D(x_t, W_ci) + Conv2D(h_{t-1}, W_ch) + b_c)
  • Output gate: o_t = σ(Conv2D(x_t, W_oi) + Conv2D(h_{t-1}, W_oh) + b_o)

State updates:

  • Cell state: c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
  • Hidden state: h_t = o_t ⊙ tanh(c_t)

For Beginners: This method creates a blueprint for running ConvLSTM faster.

For processing sequences:

  1. Initialize h_prev and c_prev to zeros for the first timestep
  2. Call the JIT-compiled graph for each timestep in your sequence
  3. Pass the output hidden state as h_prev for the next timestep
  4. Track cell state separately if needed for stateful operation
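The gate and state equations above can be written out for a tiny case. The following is a minimal sketch that assumes a single input channel, a single filter, stride 1, and zero padding that preserves the frame size. ConvLstmCellSketch, Weights, and the plain double[,] arrays are hypothetical stand-ins, not the layer's internal types.

```csharp
using System;

public static class ConvLstmCellSketch
{
    // Single-channel kernels and biases for the four gates; names mirror W_fi, W_fh, ... above.
    public sealed class Weights
    {
        public double[,] Wfi = new double[1, 1], Wfh = new double[1, 1];
        public double[,] Wii = new double[1, 1], Wih = new double[1, 1];
        public double[,] Wci = new double[1, 1], Wch = new double[1, 1];
        public double[,] Woi = new double[1, 1], Woh = new double[1, 1];
        public double Bf, Bi, Bc, Bo;
    }

    // Zero-padded, stride-1 2D convolution (cross-correlation form); odd kernels keep the input size.
    public static double[,] Conv2D(double[,] input, double[,] kernel)
    {
        int rows = input.GetLength(0), cols = input.GetLength(1);
        int kh = kernel.GetLength(0), kw = kernel.GetLength(1);
        int padRows = kh / 2, padCols = kw / 2;
        var output = new double[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
            {
                double sum = 0.0;
                for (int i = 0; i < kh; i++)
                    for (int j = 0; j < kw; j++)
                    {
                        int rr = r + i - padRows, cc = c + j - padCols;
                        if (rr >= 0 && rr < rows && cc >= 0 && cc < cols)
                            sum += input[rr, cc] * kernel[i, j];
                    }
                output[r, c] = sum;
            }
        return output;
    }

    public static double Sigmoid(double v) => 1.0 / (1.0 + Math.Exp(-v));

    // act(Conv2D(x_t, wx) + Conv2D(h_{t-1}, wh) + b), applied per pixel.
    private static double[,] Gate(double[,] x, double[,] hPrev, double[,] wx, double[,] wh,
                                  double b, Func<double, double> act)
    {
        double[,] fromInput = Conv2D(x, wx), fromHidden = Conv2D(hPrev, wh);
        int rows = x.GetLength(0), cols = x.GetLength(1);
        var gate = new double[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                gate[r, c] = act(fromInput[r, c] + fromHidden[r, c] + b);
        return gate;
    }

    // One timestep: returns the new hidden state h_t and the new cell state c_t.
    public static (double[,] H, double[,] C) Step(double[,] x, double[,] hPrev, double[,] cPrev, Weights p)
    {
        var f = Gate(x, hPrev, p.Wfi, p.Wfh, p.Bf, Sigmoid);   // forget gate
        var i = Gate(x, hPrev, p.Wii, p.Wih, p.Bi, Sigmoid);   // input gate
        var g = Gate(x, hPrev, p.Wci, p.Wch, p.Bc, Math.Tanh); // cell candidate
        var o = Gate(x, hPrev, p.Woi, p.Woh, p.Bo, Sigmoid);   // output gate

        int rows = x.GetLength(0), cols = x.GetLength(1);
        var cNew = new double[rows, cols];
        var hNew = new double[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
            {
                cNew[r, c] = f[r, c] * cPrev[r, c] + i[r, c] * g[r, c];
                hNew[r, c] = o[r, c] * Math.Tanh(cNew[r, c]);
            }
        return (hNew, cNew);
    }
}
```

Returning c_t alongside h_t is a convenience of this sketch; it makes the "track cell state separately" step explicit for the caller.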

Forward(Tensor<T>)

Performs the forward pass of the ConvLSTM layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor with shape [batchSize, timeSteps, height, width, channels].

Returns

Tensor<T>

The output tensor after processing through the ConvLSTM layer.

Remarks

The forward pass processes the input sequence through the ConvLSTM cells, updating the hidden state and cell state at each time step. It applies the convolutional operations within the LSTM structure to maintain spatial information.

For Beginners: This method is like running your video through the analyzer.

During the forward pass, for each frame in the sequence:

  1. The layer looks at the current frame and its memory of previous frames
  2. It updates its memory based on what it sees in the current frame
  3. It produces an output that combines information from the current frame and its memory

This process allows the layer to:

  • Remember important features from earlier in the sequence
  • Understand how spatial patterns are changing over time
  • Produce outputs that consider both the current input and the history

The result is a new sequence that captures the layer's understanding of the spatial-temporal patterns in your input data.

ForwardGpu(params IGpuTensor<T>[])

Performs a GPU-resident forward pass of the ConvLSTM layer.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

GPU-resident input tensor(s).

Returns

IGpuTensor<T>

GPU-resident output tensor after ConvLSTM processing.

Remarks

For Beginners: This is the GPU-optimized version of the Forward method. All data stays on the GPU throughout the computation, avoiding expensive CPU-GPU transfers. The ConvLSTM gates are computed using GPU convolutions and element-wise operations.

During training (IsTrainingMode == true), this method caches gate values and state buffers needed by BackwardGpu to perform full BPTT on GPU.

Exceptions

ArgumentException

Thrown when no input tensor is provided.

InvalidOperationException

Thrown when GPU backend is unavailable.

GetParameters()

Retrieves all trainable parameters of the ConvLSTM layer as a flattened vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all weights and biases of the layer

Remarks

This method flattens all trainable parameters into a single vector in the following order:

  1. Input weights: _weightsFi, _weightsIi, _weightsCi, _weightsOi
  2. Hidden weights: _weightsFh, _weightsIh, _weightsCh, _weightsOh
  3. Biases: _biasF, _biasI, _biasC, _biasO

For Beginners: This method collects all the learnable values into one long list.

It's like taking all the knobs and dials from the control panel and listing them in a single row:

  • First, it counts how many total numbers need to be stored
  • Then it creates a vector (a one-dimensional array) of that size
  • Finally, it copies all the weights and biases into this vector in a specific order

This is useful for:

  • Saving all parameters to a file
  • Loading parameters from a file
  • Certain optimization techniques that work with all parameters at once
  • Tracking how many learnable parameters the layer has in total
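As a rough guide to the length of the returned vector, the sketch below counts parameters for the twelve tensors listed above. It assumes each input weight holds kernelSize × kernelSize × inputChannels × filters values, each hidden weight holds kernelSize × kernelSize × filters × filters values, and each bias holds one value per filter. These shapes are not stated on this page, so treat the result as an estimate; ParameterCountSketch is a hypothetical name.

```csharp
public static class ParameterCountSketch
{
    // Four gates, each with one input weight, one hidden weight, and one bias (assumed shapes).
    public static int Count(int kernelSize, int inputChannels, int filters)
    {
        int inputWeight = kernelSize * kernelSize * inputChannels * filters;
        int hiddenWeight = kernelSize * kernelSize * filters * filters;
        int bias = filters;
        return 4 * (inputWeight + hiddenWeight + bias);
    }
}
```

Under these assumptions, a 3 × 3 kernel with 3 input channels and 16 filters gives 4 × (432 + 2304 + 16) = 11008 values.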

ResetState()

Resets the internal state of the ConvLSTM layer.

public override void ResetState()

Remarks

This method clears all cached values and gradients from previous forward and backward passes:

  1. Clears the cached input tensor (_lastInput)
  2. Clears the cached hidden state (_lastHiddenState)
  3. Clears the cached cell state (_lastCellState)
  4. Clears all accumulated gradients

For Beginners: This method clears the layer's memory to start fresh.

It's like erasing a whiteboard to start a new lesson:

  • The layer forgets the last input it processed
  • It clears its internal memory states (hidden and cell states)
  • It discards any stored gradients from previous training

This is important when:

  • Starting to process a new, unrelated sequence
  • Beginning a new training epoch
  • Testing the model on different data
  • Making sure that information from previous sequences doesn't influence the processing of new sequences

SetParameters(Vector<T>)

Sets all trainable parameters of the ConvLSTM layer from a flattened vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

Vector containing all weights and biases to set

Remarks

This method updates all trainable parameters from a single vector in the following order:

  1. Input weights: _weightsFi, _weightsIi, _weightsCi, _weightsOi
  2. Hidden weights: _weightsFh, _weightsIh, _weightsCh, _weightsOh
  3. Biases: _biasF, _biasI, _biasC, _biasO

For Beginners: This method loads all learnable values from a single list.

It's the opposite of GetParameters():

  • It takes a long list of numbers (the parameters vector)
  • It distributes these numbers back into the appropriate weight and bias tensors
  • It follows the same order that was used when creating the vector

This is useful when:

  • Loading a previously saved model
  • Initializing with pre-trained weights
  • Testing with specific parameter values
  • Implementing advanced optimization techniques
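The round trip between GetParameters() and SetParameters() relies on both sides using the same block order. A minimal sketch of that idea, using plain arrays in place of the weight and bias tensors, is shown below. ParameterPackingSketch is a hypothetical helper, not part of the library.

```csharp
using System;
using System.Collections.Generic;

public static class ParameterPackingSketch
{
    // Concatenates the blocks, in the order given, into one flat vector.
    public static double[] Flatten(IReadOnlyList<double[]> blocks)
    {
        int total = 0;
        foreach (var block in blocks) total += block.Length;

        var flat = new double[total];
        int offset = 0;
        foreach (var block in blocks)
        {
            Array.Copy(block, 0, flat, offset, block.Length);
            offset += block.Length;
        }
        return flat;
    }

    // Copies consecutive slices of the flat vector back into the blocks, in the same order.
    public static void Unflatten(double[] flat, IReadOnlyList<double[]> blocks)
    {
        int total = 0;
        foreach (var block in blocks) total += block.Length;
        if (flat.Length != total)
            throw new ArgumentException($"Expected {total} values but received {flat.Length}.");

        int offset = 0;
        foreach (var block in blocks)
        {
            Array.Copy(flat, offset, block, 0, block.Length);
            offset += block.Length;
        }
    }
}
```

Passing the blocks in the documented order (input weights, hidden weights, then biases) to both methods keeps every value aligned with the tensor it came from.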

UpdateParameters(T)

Updates all trainable parameters of the layer using the computed gradients and specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate controlling how much to adjust parameters

Remarks

This method applies gradient descent with momentum to update all weights and biases:

  1. Checks that gradients are available from a previous backward pass
  2. Updates all input weights (_weightsFi, _weightsIi, _weightsCi, _weightsOi)
  3. Updates all hidden weights (_weightsFh, _weightsIh, _weightsCh, _weightsOh)
  4. Updates all biases (_biasF, _biasI, _biasC, _biasO)
  5. Clears gradients after all updates are complete

For Beginners: This method applies the calculated updates to all weights and biases.

After figuring out how parameters should change:

  • The learningRate controls how big each adjustment is
  • Smaller values make small, cautious changes
  • Larger values make bigger, more aggressive changes

The method also uses "momentum," which is like inertia:

  • If parameters have been moving in a certain direction, they tend to keep going
  • This helps navigate flat regions and avoid getting stuck in local minima
  • Think of it like rolling a ball downhill - it builds up speed in the right direction

After updating all parameters, the gradients are cleared to prepare for the next training batch.
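A common form of gradient descent with momentum is sketched below for a single flat parameter array. The exact update rule and the momentum coefficient used by the layer are not documented here, so the formula and the default of 0.9 are assumptions, and MomentumSketch is a hypothetical name.

```csharp
using System;

public static class MomentumSketch
{
    // One update step: v <- momentum * v - learningRate * gradient; w <- w + v.
    // Gradients are zeroed afterwards, mirroring the "clear gradients" step.
    public static void Update(double[] weights, double[] gradients, double[] velocity,
                              double learningRate, double momentum = 0.9)
    {
        if (weights.Length != gradients.Length || weights.Length != velocity.Length)
            throw new ArgumentException("weights, gradients, and velocity must have the same length.");

        for (int i = 0; i < weights.Length; i++)
        {
            velocity[i] = momentum * velocity[i] - learningRate * gradients[i];
            weights[i] += velocity[i];
            gradients[i] = 0.0;
        }
    }
}
```

The velocity array persists between calls, which is what lets consecutive gradients in the same direction build up speed.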

UpdateParametersGpu(IGpuOptimizerConfig)

GPU-resident parameter update with polymorphic optimizer support. Updates all weight tensors directly on GPU using the specified optimizer configuration.

public override void UpdateParametersGpu(IGpuOptimizerConfig config)

Parameters

config IGpuOptimizerConfig

GPU optimizer configuration specifying the optimizer type and hyperparameters.