Class GRULayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a Gated Recurrent Unit (GRU) layer for processing sequential data.
public class GRULayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → GRULayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
The GRU (Gated Recurrent Unit) layer is a type of recurrent neural network layer that is designed to capture dependencies over time in sequential data. It addresses the vanishing gradient problem that standard recurrent neural networks face when dealing with long sequences. The GRU uses update and reset gates to control the flow of information, allowing the network to retain relevant information over many time steps while forgetting irrelevant details.
For Beginners: This layer helps neural networks understand sequences of data, like sentences or time series.
Think of the GRU as having a "memory" that helps it understand context:
- When reading a sentence, it remembers important words from earlier
- When analyzing stock prices, it remembers relevant trends from previous days
- It uses special "gates" to decide what information to keep or forget
For example, in the sentence "The clouds were dark and it started to ___", the GRU would recognize the context and predict "rain" because it remembers the earlier words about dark clouds.
A GRU is a simpler alternative to the LSTM (Long Short-Term Memory) unit; it often performs comparably while being more efficient to train.
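The following sketch shows typical use of the layer. The GRULayer<T> constructor and the Forward method (documented below) are taken from this page; the shape-based Tensor<double> constructor is an assumption for illustration only.

using AiDotNet.NeuralNetworks.Layers;

// 32 sequences per batch, 10 time steps, 100 features per step.
var gru = new GRULayer<double>(inputSize: 100, hiddenSize: 200, returnSequences: false);

// Hypothetical shape-based constructor; substitute however your code builds tensors.
var input = new Tensor<double>(new[] { 32, 10, 100 });

// Because returnSequences is false, the output shape is [32, 200].
var output = gru.Forward(input);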
Constructors
GRULayer(int, int, bool, IActivationFunction<T>?, IActivationFunction<T>?)
Initializes a new instance of the GRULayer<T> class with the specified dimensions, return behavior, and element-wise activation functions.
public GRULayer(int inputSize, int hiddenSize, bool returnSequences = false, IActivationFunction<T>? activation = null, IActivationFunction<T>? recurrentActivation = null)
Parameters
inputSize (int): The size of the input feature vector at each time step.
hiddenSize (int): The size of the hidden state vector.
returnSequences (bool): If true, returns all hidden states; if false, returns only the final hidden state.
activation (IActivationFunction<T>): The activation function for the candidate hidden state. Defaults to tanh if not specified.
recurrentActivation (IActivationFunction<T>): The activation function for the gates. Defaults to sigmoid if not specified.
Remarks
This constructor creates a new GRU layer with the specified dimensions and element-wise activation functions. The weights are initialized randomly with a scale factor based on the hidden size, and the biases are initialized to zero.
For Beginners: This creates a new GRU layer with standard activation functions.
When creating a GRU layer, you specify:
- inputSize: How many features each element in your sequence has
- hiddenSize: How large the GRU's "memory" should be
- returnSequences: Whether you want information about every element or just a final summary
- activation: How to shape new information (default is tanh, outputting values between -1 and 1)
- recurrentActivation: How the gates should work (default is sigmoid, outputting values between 0 and 1)
For example, if processing sentences where each word is represented by a 100-dimensional vector, and you want a 200-dimensional memory, you would use inputSize=100 and hiddenSize=200.
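As a sketch of that example, both calls below rely only on the constructor signature above; omitting the activation arguments (or passing null) selects the tanh and sigmoid defaults:

// 100-dimensional word vectors, 200-dimensional memory, final state only.
var gru = new GRULayer<float>(inputSize: 100, hiddenSize: 200);

// Same dimensions, but keep the hidden state for every time step.
var gruSeq = new GRULayer<float>(inputSize: 100, hiddenSize: 200, returnSequences: true);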
GRULayer(int, int, bool, IVectorActivationFunction<T>?, IVectorActivationFunction<T>?)
Initializes a new instance of the GRULayer<T> class with the specified dimensions, return behavior, and vector activation functions.
public GRULayer(int inputSize, int hiddenSize, bool returnSequences = false, IVectorActivationFunction<T>? vectorActivation = null, IVectorActivationFunction<T>? vectorRecurrentActivation = null)
Parameters
inputSize (int): The size of the input feature vector at each time step.
hiddenSize (int): The size of the hidden state vector.
returnSequences (bool): If true, returns all hidden states; if false, returns only the final hidden state.
vectorActivation (IVectorActivationFunction<T>): The vector activation function for the candidate hidden state. Defaults to tanh if not specified.
vectorRecurrentActivation (IVectorActivationFunction<T>): The vector activation function for the gates. Defaults to sigmoid if not specified.
Remarks
This constructor creates a new GRU layer with the specified dimensions and vector activation functions. Vector activation functions operate on entire vectors rather than individual elements, which can capture dependencies between different elements of the vectors.
For Beginners: This creates a new GRU layer with more advanced vector-based activation functions.
Vector activation functions:
- Process entire groups of numbers together, not just one at a time
- Can capture relationships between different features
- May be more powerful for complex patterns
This constructor is useful when you need the layer to understand how different features interact with each other, rather than treating each feature independently.
Properties
ParameterCount
Gets the total number of trainable parameters in the layer.
public override int ParameterCount { get; }
Property Value
- int
The total number of weight and bias parameters in the GRU layer.
Remarks
This property calculates the total number of trainable parameters in the GRU layer, which includes all the weights and biases for the gates and candidate hidden state.
For Beginners: This tells you how many numbers the layer needs to learn.
The formula counts:
- Weights connecting inputs to the GRU (Wz, Wr, Wh)
- Weights connecting the previous hidden state (Uz, Ur, Uh)
- Bias values for each gate and candidate state (bz, br, bh)
A higher parameter count means the model can capture more complex patterns but requires more data and time to train effectively.
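As a sketch, assuming the standard GRU parameterization listed above (three input weight matrices, three recurrent weight matrices, and three bias vectors), the count works out as follows:

int CountGruParameters(int inputSize, int hiddenSize) =>
    3 * (inputSize * hiddenSize)     // Wz, Wr, Wh
    + 3 * (hiddenSize * hiddenSize)  // Uz, Ur, Uh
    + 3 * hiddenSize;                // bz, br, bh

// For inputSize = 100 and hiddenSize = 200:
// 3 * 20,000 + 3 * 40,000 + 3 * 200 = 180,600 parameters.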
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsJitCompilation
Gets whether this layer currently supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True for GRU layers, as single time-step JIT compilation is supported.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true because this layer has trainable parameters (weights and biases).
Remarks
This property indicates whether the layer can be trained through backpropagation. The GRULayer always returns true because it contains trainable weights and biases.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer can adjust its internal values during training
- It will improve its performance as it sees more data
- It participates in the learning process
The GRU layer always supports training because it has weights and biases that can be updated.
Methods
Backward(Tensor<T>)
Performs the backward pass of the GRU layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the GRU layer, which is used during training to propagate error gradients back through the network. It calculates the gradients for all the weights and biases, and returns the gradient with respect to the layer's input for further backpropagation.
For Beginners: This method is used during training to calculate how the layer's input and parameters should change to reduce errors.
During the backward pass:
- The layer receives information about how its output should change to reduce the overall error
- It calculates how each of its weights and biases should change to produce better output
- It calculates how its input should change, which will be used by earlier layers
This complex calculation essentially runs the GRU's logic in reverse, tracking how changes to the output would affect each internal part of the layer.
Exceptions
- InvalidOperationException
Thrown when Forward has not been called before Backward.
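A minimal training-step sketch showing the required call order (Forward, then Backward, then a parameter update). Here input, target, and learningRate are assumed to be defined elsewhere, and ComputeLossGradient is a hypothetical helper standing in for your loss function:

var output = gru.Forward(input);                         // must be called before Backward
var lossGradient = ComputeLossGradient(output, target);  // hypothetical helper
var inputGradient = gru.Backward(lossGradient);          // pass inputGradient to earlier layers
gru.UpdateParameters(learningRate);                      // apply the accumulated gradients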
BackwardGpu(IGpuTensor<T>)
GPU-resident backward pass using a fused sequence kernel. Computes gradients for all weights and biases in a single kernel launch.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): Gradient of the loss with respect to the layer output.
Returns
- IGpuTensor<T>
Gradient of the loss with respect to the layer input.
Clone()
Creates a deep copy of this GRU layer with independent weights and reset state.
public override LayerBase<T> Clone()
Returns
- LayerBase<T>
A new GRULayer with the same weights but independent of the original.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the GRU layer's single time-step computation as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the hidden state at one time step.
Remarks
This method exports a single GRU cell computation for JIT compilation. The graph computes: h_t = GRUCell(x_t, h_{t-1}) using the standard GRU equations with update gate, reset gate, and candidate hidden state.
Forward(Tensor<T>)
Performs the forward pass of the GRU layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process. Shape should be [batchSize, sequenceLength, inputSize].
Returns
- Tensor<T>
The output tensor. If returnSequences is true, shape will be [batchSize, sequenceLength, hiddenSize]; otherwise, [batchSize, hiddenSize].
Remarks
This method implements the forward pass of the GRU layer. It processes the input sequence step by step, updating the hidden state at each time step according to the GRU equations. The update gate (z) controls how much of the previous hidden state to keep, the reset gate (r) controls how much of the previous hidden state to reset, and the candidate hidden state (h_candidate) contains new information from the current input.
For Beginners: This method processes your sequence data through the GRU.
For each element in your sequence (like each word in a sentence):
- The update gate (z) decides how much of the old memory to keep
- The reset gate (r) decides how much of the old memory to forget
- The layer creates new information based on the current input and relevant memory
- It combines the kept memory and new information to update its understanding
This process repeats for each element in the sequence, with the memory evolving to capture the relevant context from the entire sequence.
The final output depends on the returnSequences setting:
- If true: Returns information about every element in the sequence
- If false: Returns only the final memory state
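In one common notation (conventions differ between implementations, so treat this as a sketch rather than the exact form used internally), the per-time-step computation is:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$

where σ is the recurrent activation (sigmoid by default), tanh is the candidate activation, and ⊙ denotes element-wise multiplication. Some implementations use the complementary interpolation h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t; only the role of z_t flips.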
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass on GPU tensors using a fused sequence kernel.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): GPU tensor inputs.
Returns
- IGpuTensor<T>
GPU tensor output after GRU processing.
Exceptions
- ArgumentException
Thrown when no input tensor is provided.
- InvalidOperationException
Thrown when GPU backend is unavailable.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters (weights and biases) and combines them into a single vector. The parameters are arranged in the following order: Wz, Wr, Wh, Uz, Ur, Uh, bz, br, bh. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
It gathers all parameters in this specific order:
- Weights for input to update gate (Wz)
- Weights for input to reset gate (Wr)
- Weights for input to candidate hidden state (Wh)
- Weights for hidden state to update gate (Uz)
- Weights for hidden state to reset gate (Ur)
- Weights for hidden state to candidate hidden state (Uh)
- Biases for update gate (bz)
- Biases for reset gate (br)
- Biases for candidate hidden state (bh)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
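A round-trip sketch using only GetParameters and UpdateParameters(Vector<T>) as documented on this page; the restoring layer must have the same dimensions as the original:

var snapshot = gru.GetParameters();   // order: Wz, Wr, Wh, Uz, Ur, Uh, bz, br, bh

// Later, or in another process: restore into a layer with identical dimensions.
var restored = new GRULayer<double>(inputSize: 100, hiddenSize: 200);
restored.UpdateParameters(snapshot);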
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method resets the internal state of the layer, clearing cached values from forward and backward passes. This includes the last input, hidden state, activation values, and all hidden states if returning sequences.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- The hidden state (memory) is cleared
- All stored information about previous inputs is removed
- All gate activations are reset
This is important for:
- Processing a new, unrelated sequence
- Preventing information from one sequence affecting another
- Starting a new training episode
For example, if you've processed one sentence and want to start with a new sentence, you should reset the state to prevent the new sentence from being influenced by the previous one.
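A sketch of that usage, assuming sentenceA and sentenceB are tensors shaped [batchSize, sequenceLength, inputSize]:

gru.Forward(sentenceA); // process the first sequence
gru.ResetState();       // clear the memory so the sequences stay independent
gru.Forward(sentenceB); // the second sequence starts from a fresh hidden state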
UpdateParameters(Vector<T>)
Updates the parameters of the layer with the given vector of parameter values.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all the weight matrices and bias vectors of the GRU layer from a single vector of parameters. The parameters are arranged in the following order: Wz, Wr, Wh, Uz, Ur, Uh, bz, br, bh.
For Beginners: This method lets you directly set all the learnable values in the layer.
The parameters vector contains all weights and biases in a specific order:
- Weights for input to update gate (Wz)
- Weights for input to reset gate (Wr)
- Weights for input to candidate hidden state (Wh)
- Weights for hidden state to update gate (Uz)
- Weights for hidden state to reset gate (Ur)
- Weights for hidden state to candidate hidden state (Uh)
- Biases for update gate (bz)
- Biases for reset gate (br)
- Biases for candidate hidden state (bh)
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Setting specific parameter values for testing
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method updates all the weight matrices and bias vectors of the GRU layer based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. This is typically called after the backward pass during training.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- All weights and biases are adjusted to reduce prediction errors
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time, gradually improving its ability to understand and predict sequences.
Exceptions
- InvalidOperationException
Thrown when Backward has not been called before UpdateParameters.
UpdateParametersGpu(IGpuOptimizerConfig)
GPU-resident parameter update with polymorphic optimizer support. Updates all weight tensors directly on GPU using the specified optimizer configuration.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
config (IGpuOptimizerConfig): GPU optimizer configuration specifying the optimizer type and hyperparameters.