Class GRULayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a Gated Recurrent Unit (GRU) layer for processing sequential data.
public class GRULayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → GRULayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
The GRU (Gated Recurrent Unit) layer is a type of recurrent neural network layer that is designed to capture dependencies over time in sequential data. It addresses the vanishing gradient problem that standard recurrent neural networks face when dealing with long sequences. The GRU uses update and reset gates to control the flow of information, allowing the network to retain relevant information over many time steps while forgetting irrelevant details.
For Beginners: This layer helps neural networks understand sequences of data, like sentences or time series.
Think of the GRU as having a "memory" that helps it understand context:
- When reading a sentence, it remembers important words from earlier
- When analyzing stock prices, it remembers relevant trends from previous days
- It uses special "gates" to decide what information to keep or forget
For example, in the sentence "The clouds were dark and it started to ___", the GRU would recognize the context and predict "rain" because it remembers the earlier words about dark clouds.
A GRU is a simpler alternative to the LSTM (Long Short-Term Memory) unit; it often performs comparably while being more efficient to train.
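The following sketch shows typical use of the layer. The GRULayer<T> constructor and the Forward method (documented below) are taken from this page; the shape-based Tensor<double> constructor is an assumption for illustration only.

using AiDotNet.NeuralNetworks.Layers;

// 32 sequences per batch, 10 time steps, 100 features per step.
var gru = new GRULayer<double>(inputSize: 100, hiddenSize: 200, returnSequences: false);

// Hypothetical shape-based constructor; substitute however your code builds tensors.
var input = new Tensor<double>(new[] { 32, 10, 100 });

// Because returnSequences is false, the output shape is [32, 200].
var output = gru.Forward(input);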
Constructors
GRULayer(int, int, bool, IActivationFunction<T>?, IActivationFunction<T>?)
Initializes a new instance of the GRULayer<T> class with the specified dimensions, return behavior, and element-wise activation functions.
public GRULayer(int inputSize, int hiddenSize, bool returnSequences = false, IActivationFunction<T>? activation = null, IActivationFunction<T>? recurrentActivation = null)
Parameters
inputSize (int): The size of the input feature vector at each time step.
hiddenSize (int): The size of the hidden state vector.
returnSequences (bool): If true, returns all hidden states; if false, returns only the final hidden state.
activation (IActivationFunction<T>): The activation function for the candidate hidden state. Defaults to tanh if not specified.
recurrentActivation (IActivationFunction<T>): The activation function for the gates. Defaults to sigmoid if not specified.
Remarks
This constructor creates a new GRU layer with the specified dimensions and element-wise activation functions. The weights are initialized randomly with a scale factor based on the hidden size, and the biases are initialized to zero.
For Beginners: This creates a new GRU layer with standard activation functions.
When creating a GRU layer, you specify:
- inputSize: How many features each element in your sequence has
- hiddenSize: How large the GRU's "memory" should be
- returnSequences: Whether you want information about every element or just a final summary
- activation: How to shape new information (default is tanh, outputting values between -1 and 1)
- recurrentActivation: How the gates should work (default is sigmoid, outputting values between 0 and 1)
For example, if processing sentences where each word is represented by a 100-dimensional vector, and you want a 200-dimensional memory, you would use inputSize=100 and hiddenSize=200.
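As a sketch of that example, both calls below rely only on the constructor signature above; omitting the activation arguments (or passing null) selects the tanh and sigmoid defaults:

// 100-dimensional word vectors, 200-dimensional memory, final state only.
var gru = new GRULayer<float>(inputSize: 100, hiddenSize: 200);

// Same dimensions, but keep the hidden state for every time step.
var gruSeq = new GRULayer<float>(inputSize: 100, hiddenSize: 200, returnSequences: true);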
GRULayer(int, int, bool, IVectorActivationFunction<T>?, IVectorActivationFunction<T>?)
Initializes a new instance of the GRULayer<T> class with the specified dimensions, return behavior, and vector activation functions.
public GRULayer(int inputSize, int hiddenSize, bool returnSequences = false, IVectorActivationFunction<T>? vectorActivation = null, IVectorActivationFunction<T>? vectorRecurrentActivation = null)
Parameters
inputSize (int): The size of the input feature vector at each time step.
hiddenSize (int): The size of the hidden state vector.
returnSequences (bool): If true, returns all hidden states; if false, returns only the final hidden state.
vectorActivation (IVectorActivationFunction<T>): The vector activation function for the candidate hidden state. Defaults to tanh if not specified.
vectorRecurrentActivation (IVectorActivationFunction<T>): The vector activation function for the gates. Defaults to sigmoid if not specified.
Remarks
This constructor creates a new GRU layer with the specified dimensions and vector activation functions. Vector activation functions operate on entire vectors rather than individual elements, which can capture dependencies between different elements of the vectors.
For Beginners: This creates a new GRU layer with more advanced vector-based activation functions.
Vector activation functions:
- Process entire groups of numbers together, not just one at a time
- Can capture relationships between different features
- May be more powerful for complex patterns
This constructor is useful when you need the layer to understand how different features interact with each other, rather than treating each feature independently.
Properties
ParameterCount
Gets the total number of trainable parameters in the layer.
public override int ParameterCount { get; }
Property Value
- int
The total number of weight and bias parameters in the GRU layer.
Remarks
This property calculates the total number of trainable parameters in the GRU layer, which includes all the weights and biases for the gates and candidate hidden state.
For Beginners: This tells you how many numbers the layer needs to learn.
The formula counts:
- Weights connecting inputs to the GRU (Wz, Wr, Wh)
- Weights connecting the previous hidden state (Uz, Ur, Uh)
- Bias values for each gate and candidate state (bz, br, bh)
A higher parameter count means the model can capture more complex patterns but requires more data and time to train effectively.
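As a sketch, assuming the standard GRU parameterization listed above (three input weight matrices, three recurrent weight matrices, and three bias vectors), the count works out as follows:

int CountGruParameters(int inputSize, int hiddenSize) =>
    3 * (inputSize * hiddenSize)     // Wz, Wr, Wh
    + 3 * (hiddenSize * hiddenSize)  // Uz, Ur, Uh
    + 3 * hiddenSize;                // bz, br, bh

// For inputSize = 100 and hiddenSize = 200:
// 3 * 20,000 + 3 * 40,000 + 3 * 200 = 180,600 parameters.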
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsJitCompilation
Gets whether this layer currently supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True for GRU layers, as single time-step JIT compilation is supported.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true because this layer has trainable parameters (weights and biases).
Remarks
This property indicates whether the layer can be trained through backpropagation. The GRULayer always returns true because it contains trainable weights and biases.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer can adjust its internal values during training
- It will improve its performance as it sees more data
- It participates in the learning process
The GRU layer always supports training because it has weights and biases that can be updated.
Methods
Backward(Tensor<T>)
Performs the backward pass of the GRU layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the GRU layer, which is used during training to propagate error gradients back through the network. It calculates the gradients for all the weights and biases, and returns the gradient with respect to the layer's input for further backpropagation.
For Beginners: This method is used during training to calculate how the layer's input and parameters should change to reduce errors.
During the backward pass:
- The layer receives information about how its output should change to reduce the overall error
- It calculates how each of its weights and biases should change to produce better output
- It calculates how its input should change, which will be used by earlier layers
This complex calculation essentially runs the GRU's logic in reverse, tracking how changes to the output would affect each internal part of the layer.
Exceptions
- InvalidOperationException
Thrown when Forward has not been called before Backward.
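A minimal training-step sketch showing the required call order (Forward, then Backward, then a parameter update). Here input, target, and learningRate are assumed to be defined elsewhere, and ComputeLossGradient is a hypothetical helper standing in for your loss function:

var output = gru.Forward(input);                         // must be called before Backward
var lossGradient = ComputeLossGradient(output, target);  // hypothetical helper
var inputGradient = gru.Backward(lossGradient);          // pass inputGradient to earlier layers
gru.UpdateParameters(learningRate);                      // apply the accumulated gradients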
BackwardGpu(IGpuTensor<T>)
GPU-resident backward pass using a fused sequence kernel. Computes gradients for all weights and biases in a single kernel launch.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): Gradient of the loss with respect to the layer output.
Returns
- IGpuTensor<T>
Gradient of the loss with respect to the layer input.
Clone()
Creates a deep copy of this GRU layer with independent weights and reset state.
public override LayerBase<T> Clone()
Returns
- LayerBase<T>
A new GRULayer with the same weights but independent of the original.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the GRU layer's single time-step computation as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the hidden state at one time step.
Remarks
This method exports a single GRU cell computation for JIT compilation. The graph computes: h_t = GRUCell(x_t, h_{t-1}) using the standard GRU equations with update gate, reset gate, and candidate hidden state.
Forward(Tensor<T>)
Performs the forward pass of the GRU layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process. Shape should be [batchSize, sequenceLength, inputSize].
Returns
- Tensor<T>
The output tensor. If returnSequences is true, shape will be [batchSize, sequenceLength, hiddenSize]; otherwise, [batchSize, hiddenSize].
Remarks
This method implements the forward pass of the GRU layer. It processes the input sequence step by step, updating the hidden state at each time step according to the GRU equations. The update gate (z) controls how much of the previous hidden state to keep, the reset gate (r) controls how much of the previous hidden state to reset, and the candidate hidden state (h_candidate) contains new information from the current input.
For Beginners: This method processes your sequence data through the GRU.
For each element in your sequence (like each word in a sentence):
- The update gate (z) decides how much of the old memory to keep
- The reset gate (r) decides how much of the old memory to forget
- The layer creates new information based on the current input and relevant memory
- It combines the kept memory and new information to update its understanding
This process repeats for each element in the sequence, with the memory evolving to capture the relevant context from the entire sequence.
The final output depends on the returnSequences setting:
- If true: Returns information about every element in the sequence
- If false: Returns only the final memory state
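In one common notation (conventions differ between implementations, so treat this as a sketch rather than the exact form used internally), the per-time-step computation is:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$

where σ is the recurrent activation (sigmoid by default), tanh is the candidate activation, and ⊙ denotes element-wise multiplication. Some implementations use the complementary interpolation h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t; only the role of z_t flips.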
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass on GPU tensors using a fused sequence kernel.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): GPU tensor inputs.
Returns
- IGpuTensor<T>
GPU tensor output after GRU processing.
Exceptions
- ArgumentException
Thrown when no input tensor is provided.
- InvalidOperationException
Thrown when GPU backend is unavailable.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters (weights and biases) and combines them into a single vector. The parameters are arranged in the following order: Wz, Wr, Wh, Uz, Ur, Uh, bz, br, bh. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
It gathers all parameters in this specific order:
- Weights for input to update gate (Wz)
- Weights for input to reset gate (Wr)
- Weights for input to candidate hidden state (Wh)
- Weights for hidden state to update gate (Uz)
- Weights for hidden state to reset gate (Ur)
- Weights for hidden state to candidate hidden state (Uh)
- Biases for update gate (bz)
- Biases for reset gate (br)
- Biases for candidate hidden state (bh)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
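A round-trip sketch using only GetParameters and UpdateParameters(Vector<T>) as documented on this page; the restoring layer must have the same dimensions as the original:

var snapshot = gru.GetParameters();   // order: Wz, Wr, Wh, Uz, Ur, Uh, bz, br, bh

// Later, or in another process: restore into a layer with identical dimensions.
var restored = new GRULayer<double>(inputSize: 100, hiddenSize: 200);
restored.UpdateParameters(snapshot);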
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method resets the internal state of the layer, clearing cached values from forward and backward passes. This includes the last input, hidden state, activation values, and all hidden states if returning sequences.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- The hidden state (memory) is cleared
- All stored information about previous inputs is removed
- All gate activations are reset
This is important for:
- Processing a new, unrelated sequence
- Preventing information from one sequence affecting another
- Starting a new training episode
For example, if you've processed one sentence and want to start with a new sentence, you should reset the state to prevent the new sentence from being influenced by the previous one.
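A sketch of that usage, assuming sentenceA and sentenceB are tensors shaped [batchSize, sequenceLength, inputSize]:

gru.Forward(sentenceA); // process the first sequence
gru.ResetState();       // clear the memory so the sequences stay independent
gru.Forward(sentenceB); // the second sequence starts from a fresh hidden state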
UpdateParameters(Vector<T>)
Updates the parameters of the layer with the given vector of parameter values.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all the weight matrices and bias vectors of the GRU layer from a single vector of parameters. The parameters are arranged in the following order: Wz, Wr, Wh, Uz, Ur, Uh, bz, br, bh.
For Beginners: This method lets you directly set all the learnable values in the layer.
The parameters vector contains all weights and biases in a specific order:
- Weights for input to update gate (Wz)
- Weights for input to reset gate (Wr)
- Weights for input to candidate hidden state (Wh)
- Weights for hidden state to update gate (Uz)
- Weights for hidden state to reset gate (Ur)
- Weights for hidden state to candidate hidden state (Uh)
- Biases for update gate (bz)
- Biases for reset gate (br)
- Biases for candidate hidden state (bh)
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Setting specific parameter values for testing
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method updates all the weight matrices and bias vectors of the GRU layer based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. This is typically called after the backward pass during training.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- All weights and biases are adjusted to reduce prediction errors
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time, gradually improving its ability to understand and predict sequences.
Exceptions
- InvalidOperationException
Thrown when Backward has not been called before UpdateParameters.
UpdateParametersGpu(IGpuOptimizerConfig)
GPU-resident parameter update with polymorphic optimizer support. Updates all weight tensors directly on GPU using the specified optimizer configuration.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
config (IGpuOptimizerConfig): GPU optimizer configuration specifying the optimizer type and hyperparameters.