Class LSTMLayer<T>
Namespace: AiDotNet.NeuralNetworks.Layers
Assembly: AiDotNet.dll
Represents a Long Short-Term Memory (LSTM) layer for processing sequential data.
public class LSTMLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LSTMLayer<T>
- Implements
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>
IDisposable
Remarks
The LSTM layer is a specialized type of recurrent neural network (RNN) that is designed to capture long-term dependencies in sequential data. It uses a cell state and a series of gates (forget, input, and output) to control the flow of information through the network, allowing it to remember important patterns over long sequences while forgetting irrelevant information.
For Beginners: An LSTM layer is like a smart memory system for your AI.
Think of it like a notepad with special features:
- It can remember important information for a long time (unlike simpler neural networks)
- It can forget irrelevant details (using its "forget gate")
- It can decide what new information to write down (using its "input gate")
- It can decide what information to share (using its "output gate")
LSTMs are great for:
- Text generation and language understanding
- Time series prediction (like stock prices)
- Speech recognition
- Any task where the order and context of information matters
For example, when processing the sentence "The clouds are in the ___", an LSTM would remember that "clouds" appeared earlier, helping it predict "sky" as the missing word.
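For reference, these gates implement the standard LSTM cell equations (x_t is the input at step t, h_t the hidden state, c_t the cell state, and ⊙ element-wise multiplication):
f_t = sigmoid(W_f · x_t + U_f · h_(t-1) + b_f)    (forget gate)
i_t = sigmoid(W_i · x_t + U_i · h_(t-1) + b_i)    (input gate)
g_t = tanh(W_g · x_t + U_g · h_(t-1) + b_g)       (cell candidate)
o_t = sigmoid(W_o · x_t + U_o · h_(t-1) + b_o)    (output gate)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)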
Constructors
LSTMLayer(int, int, int[], IActivationFunction<T>?, IActivationFunction<T>?, IEngine?)
Initializes a new instance of the LSTMLayer<T> class with scalar activation functions.
public LSTMLayer(int inputSize, int hiddenSize, int[] inputShape, IActivationFunction<T>? activation = null, IActivationFunction<T>? recurrentActivation = null, IEngine? engine = null)
Parameters
inputSize (int): The size of each input vector (number of features).
hiddenSize (int): The size of the hidden state (number of LSTM units).
inputShape (int[]): The shape of the input tensor.
activation (IActivationFunction<T>): The activation function to use for the cell state; defaults to tanh if not specified.
recurrentActivation (IActivationFunction<T>): The activation function to use for the gates; defaults to sigmoid if not specified.
engine (IEngine)
Remarks
This constructor creates an LSTM layer with the specified dimensions and activation functions. It initializes all the weights and biases needed for the LSTM gates (forget, input, cell state, and output). The weights are initialized using the Xavier/Glorot initialization technique, which helps with training stability.
For Beginners: This creates a new LSTM layer with your desired settings using standard activation functions.
When setting up this layer:
- inputSize is how many features each data point has
- hiddenSize is how much "memory" each LSTM unit will have
- inputShape defines the expected dimensions of your data
- activation controls how the cell state is processed (usually tanh)
- recurrentActivation controls how the gates operate (usually sigmoid)
For example, if you're processing words represented as 100-dimensional vectors, inputSize would be 100. If you want 200 LSTM units, hiddenSize would be 200.
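For instance, a minimal construction sketch for that example (the [sequenceLength, inputSize] layout of inputShape is an assumption; the tanh and sigmoid defaults apply because no activations are passed):
// 100-dimensional input vectors, 200 LSTM units.
// inputShape layout is assumed to be [sequenceLength, inputSize].
var lstm = new LSTMLayer<float>(
    inputSize: 100,
    hiddenSize: 200,
    inputShape: new[] { 50, 100 });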
LSTMLayer(int, int, int[], IVectorActivationFunction<T>?, IVectorActivationFunction<T>?, IEngine?)
Initializes a new instance of the LSTMLayer<T> class with vector activation functions.
public LSTMLayer(int inputSize, int hiddenSize, int[] inputShape, IVectorActivationFunction<T>? activation = null, IVectorActivationFunction<T>? recurrentActivation = null, IEngine? engine = null)
Parameters
inputSize (int): The size of each input vector (number of features).
hiddenSize (int): The size of the hidden state (number of LSTM units).
inputShape (int[]): The shape of the input tensor.
activation (IVectorActivationFunction<T>): The vector activation function to use for the cell state; defaults to tanh if not specified.
recurrentActivation (IVectorActivationFunction<T>): The vector activation function to use for the gates; defaults to sigmoid if not specified.
engine (IEngine)
Remarks
This constructor creates an LSTM layer that uses vector activation functions, which operate on entire tensors at once rather than element by element. This can be more efficient for certain operations and allows for more complex activation patterns that consider relationships between different elements.
For Beginners: This creates a new LSTM layer using advanced vector-based activation functions.
Vector activation functions:
- Process entire groups of numbers at once, rather than one at a time
- Can be more efficient on certain hardware
- May capture more complex relationships between different values
When you might use this constructor instead of the standard one:
- When working with very large models
- When you need maximum performance
- When using specialized activation functions that work on vectors
The basic functionality is the same as the standard constructor, but with potentially better performance for large-scale applications.
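A hedged construction sketch; TanhVectorActivation<T> and SigmoidVectorActivation<T> are hypothetical names standing in for whatever IVectorActivationFunction<T> implementations the library provides:
// Hypothetical activation class names; substitute real
// IVectorActivationFunction<T> implementations.
var lstm = new LSTMLayer<float>(
    inputSize: 100,
    hiddenSize: 200,
    inputShape: new[] { 50, 100 },
    activation: new TanhVectorActivation<float>(),
    recurrentActivation: new SigmoidVectorActivation<float>());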
Properties
BiasC
Gets the cell gate bias for weight loading.
public Tensor<T> BiasC { get; }
Property Value
- Tensor<T>
BiasF
Gets the forget gate bias for weight loading.
public Tensor<T> BiasF { get; }
Property Value
- Tensor<T>
BiasI
Gets the input gate bias for weight loading.
public Tensor<T> BiasI { get; }
Property Value
- Tensor<T>
BiasO
Gets the output gate bias for weight loading.
public Tensor<T> BiasO { get; }
Property Value
- Tensor<T>
Gradients
Gets a dictionary containing the gradients for all trainable parameters after a backward pass.
public Dictionary<string, Tensor<T>> Gradients { get; }
Property Value
- Dictionary<string, Tensor<T>>
Remarks
This property stores the gradients computed during the backward pass, which indicate how each parameter should be updated to minimize the loss function. The dictionary keys correspond to parameter names, and the values are tensors containing the gradients.
For Beginners: This is like a learning notebook for the layer.
During training:
- The layer calculates how it needs to change its internal values
- These changes (gradients) are stored in this dictionary
- Later, these values are used to update the weights and make the layer smarter
Each key in the dictionary refers to a specific part of the LSTM that needs updating, and the corresponding value shows how much and in what direction to change it.
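A sketch of inspecting the stored gradients after a backward pass (the specific dictionary keys are not documented here, so none are assumed):
var output = lstm.Forward(input);
var inputGradient = lstm.Backward(outputGradient);
foreach (var entry in lstm.Gradients)
{
    // entry.Key names a parameter (a gate weight or bias);
    // entry.Value is the gradient tensor for that parameter.
    Console.WriteLine(entry.Key);
}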
ParameterCount
Gets the total number of trainable parameters in this layer.
public override int ParameterCount { get; }
Property Value
- int
The total number of parameters across all weight matrices and bias vectors. For an LSTM with input size I and hidden size H, this is: 4 * (H * I) + 4 * (H * H) + 4 * H = 4 * H * (I + H + 1)
Remarks
The LSTM has 4 gates (forget, input, cell, output), each with:
- Input-to-hidden weights: [hiddenSize × inputSize]
- Hidden-to-hidden weights: [hiddenSize × hiddenSize]
- Biases: [hiddenSize]
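For example, with inputSize I = 100 and hiddenSize H = 200: 4 * H * (I + H + 1) = 4 * 200 * 301 = 240,800 trainable parameters.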
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsJitCompilation
Gets whether this layer currently supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True for LSTM layers, as single time-step JIT compilation is supported.
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
Remarks
This property returns true because the LSTM layer has trainable parameters (weights and biases) that can be updated during training through backpropagation.
For Beginners: This tells you if the layer can learn from training data.
A value of true means:
- This layer has internal values (weights and biases) that get updated during training
- It can improve its performance as it sees more data
- It actively participates in the learning process
Unlike some layers that just do fixed calculations, LSTM layers can adapt and learn from patterns in your data.
WeightsCh
Gets the cell gate hidden weights for weight loading.
public Tensor<T> WeightsCh { get; }
Property Value
- Tensor<T>
WeightsCi
Gets the cell gate input weights for weight loading.
public Tensor<T> WeightsCi { get; }
Property Value
- Tensor<T>
WeightsFh
Gets the forget gate hidden weights for weight loading.
public Tensor<T> WeightsFh { get; }
Property Value
- Tensor<T>
WeightsFi
Gets the forget gate input weights for weight loading.
public Tensor<T> WeightsFi { get; }
Property Value
- Tensor<T>
WeightsIh
Gets the input gate hidden weights for weight loading.
public Tensor<T> WeightsIh { get; }
Property Value
- Tensor<T>
WeightsIi
Gets the input gate input weights for weight loading.
public Tensor<T> WeightsIi { get; }
Property Value
- Tensor<T>
WeightsOh
Gets the output gate hidden weights for weight loading.
public Tensor<T> WeightsOh { get; }
Property Value
- Tensor<T>
WeightsOi
Gets the output gate input weights for weight loading.
public Tensor<T> WeightsOi { get; }
Property Value
- Tensor<T>
Methods
Backward(Tensor<T>)
Performs the backward pass of the LSTM layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the LSTM layer, which is used during training to propagate error gradients back through the network. It processes the sequence in reverse order (from the last time step to the first), calculating gradients for all parameters and the input. The gradients are stored for use in the UpdateParameters method.
For Beginners: This method is used during training to calculate how the layer's inputs should change to reduce errors.
During the backward pass:
- The layer processes the sequence in reverse order (last step to first)
- At each step, it calculates how each part contributed to the error
- It computes gradients for all weights, biases, and inputs
- These gradients show how to adjust the parameters to improve performance
This process is part of the "backpropagation through time" algorithm that helps recurrent neural networks learn from their mistakes.
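A minimal training-step sketch; input and outputGradient are assumed to be prepared Tensor<float> values, with the gradient normally supplied by a loss function that is elided here:
// One training step (Forward caches the state Backward needs).
var output = lstm.Forward(input);                  // forward pass
var inputGradient = lstm.Backward(outputGradient); // BPTT, stores gradients
lstm.UpdateParameters(0.01f);                      // apply stored gradients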
BackwardGpu(IGpuTensor<T>)
GPU-resident backward pass for the LSTM using a fused sequence kernel. Computes gradients for all weights in a single kernel launch.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The GPU-resident gradient from the upstream layer.
Returns
- IGpuTensor<T>
GPU-resident gradient to pass to previous layer.
Remarks
This method implements the GPU-accelerated backward pass of the LSTM layer using a fused sequence kernel that processes all timesteps in one kernel launch.
For Beginners: This is the GPU version of backpropagation through time. All computations stay on the GPU for maximum performance.
Deserialize(BinaryReader)
Deserializes the LSTM layer's parameters from a binary stream.
public override void Deserialize(BinaryReader reader)
Parameters
reader (BinaryReader): The binary reader to read from.
Remarks
This method loads all weights and biases of the LSTM layer from a binary stream. This allows the layer to restore its state from a previously saved file, which is useful for loading trained models or for transferring parameters between different instances.
For Beginners: This method loads previously saved values into the layer.
Deserialization is like restoring a saved snapshot:
- All weights and biases are read from a file
- The layer's internal state is set to match what was saved
- This lets you use a previously trained model without retraining
For example, you could train a model on a powerful computer, save it, and then load it on a less powerful device for actual use.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the LSTM layer's single time-step computation as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): The list to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the hidden state at one time step.
Remarks
This method exports a single LSTM cell computation for JIT compilation. The graph computes h_t, c_t = LSTMCell(x_t, h_{t-1}, c_{t-1}) using the standard LSTM equations with forget, input, and output gates and a cell candidate.
Forward(Tensor<T>)
Performs the forward pass of the LSTM layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after LSTM processing.
Remarks
This method implements the forward pass of the LSTM layer. It processes the input sequence one time step at a time, updating the hidden state and cell state for each step. The hidden state at each time step is collected to form the output tensor. The input, hidden state, and cell state are cached for use during the backward pass.
For Beginners: This method processes your data through the LSTM layer.
During the forward pass:
- The layer processes the input sequence step by step
- For each step, it updates its internal memory (hidden state and cell state)
- It produces an output for each step in the sequence
- It remembers the inputs and states for later use during training
For example, if processing a sentence, the LSTM would process one word at a time, updating its understanding of the context with each word, and producing an output that reflects that understanding.
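A forward-pass sketch; building Tensor<T> values is library-specific, so input is assumed to be a prepared sequence tensor:
// input: one sequence, e.g. 50 time steps of 100 features each.
Tensor<float> output = lstm.Forward(input);
// output contains the hidden state produced at every time step.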
ForwardGpu(params IGpuTensor<T>[])
Performs a GPU-resident forward pass using GPU-accelerated LSTM operations.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensors.
Returns
- IGpuTensor<T>
GPU-resident output tensor.
Remarks
For Beginners: This is the GPU-optimized version of the Forward method. All data stays on the GPU throughout the computation, avoiding expensive CPU-GPU transfers. The LSTM gates (forget, input, cell, output) are computed using GPU matrix operations.
GetParameters()
Gets all trainable parameters of the LSTM layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters (weights and biases) from the LSTM layer and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights in a uniform format.
For Beginners: This method collects all the learned values into a single list.
The parameters:
- Are the numbers that the neural network has learned during training
- Include all weights and biases for each gate in the LSTM
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk in a simple format
- Advanced optimization techniques that need access to all parameters
- Sharing parameters between different models
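A sketch of transferring learned parameters between two identically configured layers:
// Both layers must have the same inputSize and hiddenSize.
Vector<float> parameters = trainedLstm.GetParameters();
otherLstm.SetParameters(parameters); // throws ArgumentException on length mismatch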
ResetState()
Resets the internal state of the LSTM layer.
public override void ResetState()
Remarks
This method clears any cached data from previous forward passes, essentially resetting the layer to its initial state. This is useful when starting to process a new sequence or when implementing stateful recurrent networks where you want to explicitly control when states are reset.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and hidden states are cleared
- Gradients from previous training steps are cleared
- The layer forgets any information from previous sequences
This is important when:
- Processing a new, unrelated sequence
- Starting a new training episode
- You want the network to forget its previous context
For example, if you've processed one paragraph and want to start with a completely new paragraph, you should reset the state to prevent the new paragraph from being influenced by the previous one.
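A sketch of resetting state between unrelated sequences:
lstm.Forward(firstParagraph);   // process one sequence
lstm.ResetState();              // clear cached inputs, states, and gradients
lstm.Forward(secondParagraph);  // the new sequence starts with a clean state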
Serialize(BinaryWriter)
Serializes the LSTM layer's parameters to a binary stream.
public override void Serialize(BinaryWriter writer)
Parameters
writer (BinaryWriter): The binary writer to write to.
Remarks
This method saves all weights and biases of the LSTM layer to a binary stream. This allows the layer's state to be saved to a file and loaded later, which is useful for saving trained models or for transferring parameters between different instances.
For Beginners: This method saves the layer's learned values to a file.
Serialization is like taking a snapshot of the layer's current state:
- All weights and biases are written to a file
- The exact format ensures they can be loaded back correctly
- This lets you save a trained model for later use
For example, after training your model for hours or days, you can save it and then load it later without having to retrain.
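A save-and-restore sketch using System.IO; the file name is illustrative, and restoredLstm must be constructed with the same dimensions before loading:
// Save the trained layer.
using (var writer = new BinaryWriter(File.Create("lstm.bin")))
    lstm.Serialize(writer);
// Load into a layer created with the same inputSize/hiddenSize.
using (var reader = new BinaryReader(File.OpenRead("lstm.bin")))
    restoredLstm.Deserialize(reader);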
SetParameters(Vector<T>)
Sets the trainable parameters of the LSTM layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all trainable parameters (weights and biases) of the LSTM layer from a single vector. It extracts the appropriate portions of the input vector for each parameter. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learned values from a single list.
When setting parameters:
- The input must be a vector with the correct length
- The method distributes values to the appropriate weights and biases
- This allows you to restore a previously saved model
For example, after loading a parameter vector from a file, this method would update all the internal weights and biases of the LSTM to match what was saved.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the parameters of the LSTM layer based on the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method updates all the weights and biases of the LSTM layer based on the gradients computed during the backward pass. The learning rate controls the size of the parameter updates. Each parameter is updated by subtracting the product of its gradient and the learning rate.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- Each weight and bias is adjusted based on its gradient
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time, gradually adjusting its internal values to better process the input sequences.
UpdateParametersGpu(IGpuOptimizerConfig)
GPU-resident parameter update with polymorphic optimizer support. Updates all weight tensors directly on GPU using the specified optimizer configuration.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
config (IGpuOptimizerConfig): The GPU optimizer configuration specifying the optimizer type and hyperparameters.