Class LSTMLayer<T>
Namespace: AiDotNet.NeuralNetworks.Layers
Assembly: AiDotNet.dll
Represents a Long Short-Term Memory (LSTM) layer for processing sequential data.
public class LSTMLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → LSTMLayer<T>
- Implements
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>
IDisposable
Remarks
The LSTM layer is a specialized type of recurrent neural network (RNN) that is designed to capture long-term dependencies in sequential data. It uses a cell state and a series of gates (forget, input, and output) to control the flow of information through the network, allowing it to remember important patterns over long sequences while forgetting irrelevant information.
For Beginners: An LSTM layer is like a smart memory system for your AI.
Think of it like a notepad with special features:
- It can remember important information for a long time (unlike simpler neural networks)
- It can forget irrelevant details (using its "forget gate")
- It can decide what new information to write down (using its "input gate")
- It can decide what information to share (using its "output gate")
LSTMs are great for:
- Text generation and language understanding
- Time series prediction (like stock prices)
- Speech recognition
- Any task where the order and context of information matters
For example, when processing the sentence "The clouds are in the ___", an LSTM would remember that "clouds" appeared earlier, helping it predict "sky" as the missing word.
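For reference, these gates implement the standard LSTM cell equations (x_t is the input at step t, h_t the hidden state, c_t the cell state, and ⊙ element-wise multiplication):
f_t = sigmoid(W_f · x_t + U_f · h_(t-1) + b_f)    (forget gate)
i_t = sigmoid(W_i · x_t + U_i · h_(t-1) + b_i)    (input gate)
g_t = tanh(W_g · x_t + U_g · h_(t-1) + b_g)       (cell candidate)
o_t = sigmoid(W_o · x_t + U_o · h_(t-1) + b_o)    (output gate)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)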
Constructors
LSTMLayer(int, int, int[], IActivationFunction<T>?, IActivationFunction<T>?, IEngine?)
Initializes a new instance of the LSTMLayer<T> class with scalar activation functions.
public LSTMLayer(int inputSize, int hiddenSize, int[] inputShape, IActivationFunction<T>? activation = null, IActivationFunction<T>? recurrentActivation = null, IEngine? engine = null)
Parameters
inputSize (int): The size of each input vector (number of features).
hiddenSize (int): The size of the hidden state (number of LSTM units).
inputShape (int[]): The shape of the input tensor.
activation (IActivationFunction<T>): The activation function to use for the cell state; defaults to tanh if not specified.
recurrentActivation (IActivationFunction<T>): The activation function to use for the gates; defaults to sigmoid if not specified.
engine (IEngine)
Remarks
This constructor creates an LSTM layer with the specified dimensions and activation functions. It initializes all the weights and biases needed for the LSTM gates (forget, input, cell state, and output). The weights are initialized using the Xavier/Glorot initialization technique, which helps with training stability.
For Beginners: This creates a new LSTM layer with your desired settings using standard activation functions.
When setting up this layer:
- inputSize is how many features each data point has
- hiddenSize is how much "memory" each LSTM unit will have
- inputShape defines the expected dimensions of your data
- activation controls how the cell state is processed (usually tanh)
- recurrentActivation controls how the gates operate (usually sigmoid)
For example, if you're processing words represented as 100-dimensional vectors, inputSize would be 100. If you want 200 LSTM units, hiddenSize would be 200.
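For instance, a minimal construction sketch for that example (the [sequenceLength, inputSize] layout of inputShape is an assumption; the tanh and sigmoid defaults apply because no activations are passed):
// 100-dimensional input vectors, 200 LSTM units.
// inputShape layout is assumed to be [sequenceLength, inputSize].
var lstm = new LSTMLayer<float>(
    inputSize: 100,
    hiddenSize: 200,
    inputShape: new[] { 50, 100 });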
LSTMLayer(int, int, int[], IVectorActivationFunction<T>?, IVectorActivationFunction<T>?, IEngine?)
Initializes a new instance of the LSTMLayer<T> class with vector activation functions.
public LSTMLayer(int inputSize, int hiddenSize, int[] inputShape, IVectorActivationFunction<T>? activation = null, IVectorActivationFunction<T>? recurrentActivation = null, IEngine? engine = null)
Parameters
inputSize (int): The size of each input vector (number of features).
hiddenSize (int): The size of the hidden state (number of LSTM units).
inputShape (int[]): The shape of the input tensor.
activation (IVectorActivationFunction<T>): The vector activation function to use for the cell state; defaults to tanh if not specified.
recurrentActivation (IVectorActivationFunction<T>): The vector activation function to use for the gates; defaults to sigmoid if not specified.
engine (IEngine)
Remarks
This constructor creates an LSTM layer that uses vector activation functions, which operate on entire tensors at once rather than element by element. This can be more efficient for certain operations and allows for more complex activation patterns that consider relationships between different elements.
For Beginners: This creates a new LSTM layer using advanced vector-based activation functions.
Vector activation functions:
- Process entire groups of numbers at once, rather than one at a time
- Can be more efficient on certain hardware
- May capture more complex relationships between different values
When you might use this constructor instead of the standard one:
- When working with very large models
- When you need maximum performance
- When using specialized activation functions that work on vectors
The basic functionality is the same as the standard constructor, but with potentially better performance for large-scale applications.
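A hedged construction sketch; TanhVectorActivation<T> and SigmoidVectorActivation<T> are hypothetical names standing in for whatever IVectorActivationFunction<T> implementations the library provides:
// Hypothetical activation class names; substitute real
// IVectorActivationFunction<T> implementations.
var lstm = new LSTMLayer<float>(
    inputSize: 100,
    hiddenSize: 200,
    inputShape: new[] { 50, 100 },
    activation: new TanhVectorActivation<float>(),
    recurrentActivation: new SigmoidVectorActivation<float>());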
Properties
BiasC
Gets the cell gate bias for weight loading.
public Tensor<T> BiasC { get; }
Property Value
- Tensor<T>
BiasF
Gets the forget gate bias for weight loading.
public Tensor<T> BiasF { get; }
Property Value
- Tensor<T>
BiasI
Gets the input gate bias for weight loading.
public Tensor<T> BiasI { get; }
Property Value
- Tensor<T>
BiasO
Gets the output gate bias for weight loading.
public Tensor<T> BiasO { get; }
Property Value
- Tensor<T>
Gradients
Gets a dictionary containing the gradients for all trainable parameters after a backward pass.
public Dictionary<string, Tensor<T>> Gradients { get; }
Property Value
- Dictionary<string, Tensor<T>>
Remarks
This property stores the gradients computed during the backward pass, which indicate how each parameter should be updated to minimize the loss function. The dictionary keys correspond to parameter names, and the values are tensors containing the gradients.
For Beginners: This is like a learning notebook for the layer.
During training:
- The layer calculates how it needs to change its internal values
- These changes (gradients) are stored in this dictionary
- Later, these values are used to update the weights and make the layer smarter
Each key in the dictionary refers to a specific part of the LSTM that needs updating, and the corresponding value shows how much and in what direction to change it.
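A sketch of inspecting the stored gradients after a backward pass (the specific dictionary keys are not documented here, so none are assumed):
var output = lstm.Forward(input);
var inputGradient = lstm.Backward(outputGradient);
foreach (var entry in lstm.Gradients)
{
    // entry.Key names a parameter (a gate weight or bias);
    // entry.Value is the gradient tensor for that parameter.
    Console.WriteLine(entry.Key);
}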
ParameterCount
Gets the total number of trainable parameters in this layer.
public override int ParameterCount { get; }
Property Value
- int
The total number of parameters across all weight matrices and bias vectors. For an LSTM with input size I and hidden size H, this is: 4 * (H * I) + 4 * (H * H) + 4 * H = 4 * H * (I + H + 1)
Remarks
The LSTM has 4 gates (forget, input, cell, output), each with:
- Input-to-hidden weights: [hiddenSize × inputSize]
- Hidden-to-hidden weights: [hiddenSize × hiddenSize]
- Biases: [hiddenSize]
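For example, with inputSize I = 100 and hiddenSize H = 200: 4 * H * (I + H + 1) = 4 * 200 * 301 = 240,800 trainable parameters.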
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsJitCompilation
Gets whether this layer currently supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True for LSTM layers, as single time-step JIT compilation is supported.
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
Remarks
This property returns true because the LSTM layer has trainable parameters (weights and biases) that can be updated during training through backpropagation.
For Beginners: This tells you if the layer can learn from training data.
A value of true means:
- This layer has internal values (weights and biases) that get updated during training
- It can improve its performance as it sees more data
- It actively participates in the learning process
Unlike some layers that just do fixed calculations, LSTM layers can adapt and learn from patterns in your data.
WeightsCh
Gets the cell gate hidden weights for weight loading.
public Tensor<T> WeightsCh { get; }
Property Value
- Tensor<T>
WeightsCi
Gets the cell gate input weights for weight loading.
public Tensor<T> WeightsCi { get; }
Property Value
- Tensor<T>
WeightsFh
Gets the forget gate hidden weights for weight loading.
public Tensor<T> WeightsFh { get; }
Property Value
- Tensor<T>
WeightsFi
Gets the forget gate input weights for weight loading.
public Tensor<T> WeightsFi { get; }
Property Value
- Tensor<T>
WeightsIh
Gets the input gate hidden weights for weight loading.
public Tensor<T> WeightsIh { get; }
Property Value
- Tensor<T>
WeightsIi
Gets the input gate input weights for weight loading.
public Tensor<T> WeightsIi { get; }
Property Value
- Tensor<T>
WeightsOh
Gets the output gate hidden weights for weight loading.
public Tensor<T> WeightsOh { get; }
Property Value
- Tensor<T>
WeightsOi
Gets the output gate input weights for weight loading.
public Tensor<T> WeightsOi { get; }
Property Value
- Tensor<T>
Methods
Backward(Tensor<T>)
Performs the backward pass of the LSTM layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the LSTM layer, which is used during training to propagate error gradients back through the network. It processes the sequence in reverse order (from the last time step to the first), calculating gradients for all parameters and the input. The gradients are stored for use in the UpdateParameters method.
For Beginners: This method is used during training to calculate how the layer's inputs should change to reduce errors.
During the backward pass:
- The layer processes the sequence in reverse order (last step to first)
- At each step, it calculates how each part contributed to the error
- It computes gradients for all weights, biases, and inputs
- These gradients show how to adjust the parameters to improve performance
This process is part of the "backpropagation through time" algorithm that helps recurrent neural networks learn from their mistakes.
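A minimal training-step sketch; input and outputGradient are assumed to be prepared Tensor<float> values, with the gradient normally supplied by a loss function that is elided here:
// One training step (Forward caches the state Backward needs).
var output = lstm.Forward(input);                  // forward pass
var inputGradient = lstm.Backward(outputGradient); // BPTT, stores gradients
lstm.UpdateParameters(0.01f);                      // apply stored gradients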
BackwardGpu(IGpuTensor<T>)
GPU-resident backward pass for the LSTM using a fused sequence kernel. Computes gradients for all weights in a single kernel launch.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The GPU-resident gradient from the upstream layer.
Returns
- IGpuTensor<T>
GPU-resident gradient to pass to previous layer.
Remarks
This method implements the GPU-accelerated backward pass of the LSTM layer using a fused sequence kernel that processes all timesteps in one kernel launch.
For Beginners: This is the GPU version of backpropagation through time. All computations stay on the GPU for maximum performance.
Deserialize(BinaryReader)
Deserializes the LSTM layer's parameters from a binary stream.
public override void Deserialize(BinaryReader reader)
Parameters
reader (BinaryReader): The binary reader to read from.
Remarks
This method loads all weights and biases of the LSTM layer from a binary stream. This allows the layer to restore its state from a previously saved file, which is useful for loading trained models or for transferring parameters between different instances.
For Beginners: This method loads previously saved values into the layer.
Deserialization is like restoring a saved snapshot:
- All weights and biases are read from a file
- The layer's internal state is set to match what was saved
- This lets you use a previously trained model without retraining
For example, you could train a model on a powerful computer, save it, and then load it on a less powerful device for actual use.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the LSTM layer's single time-step computation as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): The list to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the hidden state at one time step.
Remarks
This method exports a single LSTM cell computation for JIT compilation. The graph computes h_t, c_t = LSTMCell(x_t, h_{t-1}, c_{t-1}) using the standard LSTM equations with forget, input, and output gates and a cell candidate.
Forward(Tensor<T>)
Performs the forward pass of the LSTM layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after LSTM processing.
Remarks
This method implements the forward pass of the LSTM layer. It processes the input sequence one time step at a time, updating the hidden state and cell state for each step. The hidden state at each time step is collected to form the output tensor. The input, hidden state, and cell state are cached for use during the backward pass.
For Beginners: This method processes your data through the LSTM layer.
During the forward pass:
- The layer processes the input sequence step by step
- For each step, it updates its internal memory (hidden state and cell state)
- It produces an output for each step in the sequence
- It remembers the inputs and states for later use during training
For example, if processing a sentence, the LSTM would process one word at a time, updating its understanding of the context with each word, and producing an output that reflects that understanding.
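A forward-pass sketch; building Tensor<T> values is library-specific, so input is assumed to be a prepared sequence tensor:
// input: one sequence, e.g. 50 time steps of 100 features each.
Tensor<float> output = lstm.Forward(input);
// output contains the hidden state produced at every time step.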
ForwardGpu(params IGpuTensor<T>[])
Performs a GPU-resident forward pass using GPU-accelerated LSTM operations.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensors.
Returns
- IGpuTensor<T>
GPU-resident output tensor.
Remarks
For Beginners: This is the GPU-optimized version of the Forward method. All data stays on the GPU throughout the computation, avoiding expensive CPU-GPU transfers. The LSTM gates (forget, input, cell, output) are computed using GPU matrix operations.
GetParameters()
Gets all trainable parameters of the LSTM layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters (weights and biases) from the LSTM layer and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights in a uniform format.
For Beginners: This method collects all the learned values into a single list.
The parameters:
- Are the numbers that the neural network has learned during training
- Include all weights and biases for each gate in the LSTM
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk in a simple format
- Advanced optimization techniques that need access to all parameters
- Sharing parameters between different models
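A sketch of transferring learned parameters between two identically configured layers:
// Both layers must have the same inputSize and hiddenSize.
Vector<float> parameters = trainedLstm.GetParameters();
otherLstm.SetParameters(parameters); // throws ArgumentException on length mismatch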
ResetState()
Resets the internal state of the LSTM layer.
public override void ResetState()
Remarks
This method clears any cached data from previous forward passes, essentially resetting the layer to its initial state. This is useful when starting to process a new sequence or when implementing stateful recurrent networks where you want to explicitly control when states are reset.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and hidden states are cleared
- Gradients from previous training steps are cleared
- The layer forgets any information from previous sequences
This is important when:
- Processing a new, unrelated sequence
- Starting a new training episode
- You want the network to forget its previous context
For example, if you've processed one paragraph and want to start with a completely new paragraph, you should reset the state to prevent the new paragraph from being influenced by the previous one.
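A sketch of resetting state between unrelated sequences:
lstm.Forward(firstParagraph);   // process one sequence
lstm.ResetState();              // clear cached inputs, states, and gradients
lstm.Forward(secondParagraph);  // the new sequence starts with a clean state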
Serialize(BinaryWriter)
Serializes the LSTM layer's parameters to a binary stream.
public override void Serialize(BinaryWriter writer)
Parameters
writer (BinaryWriter): The binary writer to write to.
Remarks
This method saves all weights and biases of the LSTM layer to a binary stream. This allows the layer's state to be saved to a file and loaded later, which is useful for saving trained models or for transferring parameters between different instances.
For Beginners: This method saves the layer's learned values to a file.
Serialization is like taking a snapshot of the layer's current state:
- All weights and biases are written to a file
- The exact format ensures they can be loaded back correctly
- This lets you save a trained model for later use
For example, after training your model for hours or days, you can save it and then load it later without having to retrain.
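A save-and-restore sketch using System.IO; the file name is illustrative, and restoredLstm must be constructed with the same dimensions before loading:
// Save the trained layer.
using (var writer = new BinaryWriter(File.Create("lstm.bin")))
    lstm.Serialize(writer);
// Load into a layer created with the same inputSize/hiddenSize.
using (var reader = new BinaryReader(File.OpenRead("lstm.bin")))
    restoredLstm.Deserialize(reader);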
SetParameters(Vector<T>)
Sets the trainable parameters of the LSTM layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all trainable parameters (weights and biases) of the LSTM layer from a single vector. It extracts the appropriate portions of the input vector for each parameter. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learned values from a single list.
When setting parameters:
- The input must be a vector with the correct length
- The method distributes values to the appropriate weights and biases
- This allows you to restore a previously saved model
For example, after loading a parameter vector from a file, this method would update all the internal weights and biases of the LSTM to match what was saved.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the parameters of the LSTM layer based on the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method updates all the weights and biases of the LSTM layer based on the gradients computed during the backward pass. The learning rate controls the size of the parameter updates. Each parameter is updated by subtracting the product of its gradient and the learning rate.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- Each weight and bias is adjusted based on its gradient
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time, gradually adjusting its internal values to better process the input sequences.
UpdateParametersGpu(IGpuOptimizerConfig)
GPU-resident parameter update with polymorphic optimizer support. Updates all weight tensors directly on GPU using the specified optimizer configuration.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
config (IGpuOptimizerConfig): The GPU optimizer configuration specifying the optimizer type and hyperparameters.