Class BidirectionalLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a bidirectional layer that processes input sequences in both forward and backward directions.
public class BidirectionalLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → BidirectionalLayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
A bidirectional layer processes input sequences in two directions: forward (from first to last) and backward (from last to first). This approach allows the layer to capture patterns that depend on both past and future context. The outputs from both directions can either be merged (typically added together) or kept separate, depending on the configuration.
For Beginners: This layer looks at input data in two ways at the same time - both forward and backward.
Think of it like reading a sentence:
- Forward reading: "The cat sat on the mat" (left to right)
- Backward reading: "mat the on sat cat The" (right to left)
By processing data in both directions:
- The layer can understand context from both past and future elements
- It can discover patterns that might be missed if only looking in one direction
- It often improves performance on sequence tasks like text processing
For example, in the sentence "The bank is by the river", the meaning of "bank" depends on both previous words ("The") and future words ("by the river"). A bidirectional layer helps capture these relationships.
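Example: a minimal usage sketch. The LSTMLayer<T> constructor and the data-loading helper below are assumptions for illustration; substitute the actual layer and tensor APIs from your AiDotNet version.
// Wrap an inner recurrent layer so the sequence is read in both directions.
LayerBase<float> inner = new LSTMLayer<float>(inputSize: 16, hiddenSize: 32); // hypothetical signature
var bidirectional = new BidirectionalLayer<float>(inner, mergeMode: true);
Tensor<float> sequence = LoadSequenceTensor();            // placeholder for your own data loading
Tensor<float> output = bidirectional.Forward(sequence);   // forward + backward passes, merged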
Constructors
BidirectionalLayer(LayerBase<T>, bool, IActivationFunction<T>?, IEngine?)
Initializes a new instance of the BidirectionalLayer<T> class with the specified inner layer and a ReLU activation function.
public BidirectionalLayer(LayerBase<T> innerLayer, bool mergeMode = true, IActivationFunction<T>? activationFunction = null, IEngine? engine = null)
Parameters
innerLayer LayerBase<T>: The layer to be used for both forward and backward processing.
mergeMode bool: If true, outputs from both directions are added; otherwise, they are kept separate.
activationFunction IActivationFunction<T>?: The activation function to apply after processing. Defaults to ReLU if not specified.
engine IEngine?: The computation engine (CPU or GPU) for vectorized operations.
Remarks
This constructor creates a bidirectional layer using the specified inner layer for both forward and backward processing. A copy of the inner layer is created for backward processing to ensure independent parameters. The mergeMode parameter determines how outputs from both directions are combined.
For Beginners: This constructor creates a new bidirectional layer with a standard activation function.
When you create a bidirectional layer this way:
- The same type of layer is used for both forward and backward processing
- The mergeMode parameter decides how to combine the results from both directions
- The ReLU activation function is used by default, which helps the network learn non-linear patterns
For example, if innerLayer is an LSTM layer, this creates a bidirectional LSTM that processes sequences in both directions.
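A construction sketch. ReLUActivation<T> and the LSTMLayer<T> signature are assumed names for illustration; check your version for the actual types.
var merged = new BidirectionalLayer<float>(
    new LSTMLayer<float>(inputSize: 16, hiddenSize: 32)); // merged outputs, default ReLU
var separate = new BidirectionalLayer<float>(
    new LSTMLayer<float>(inputSize: 16, hiddenSize: 32),
    mergeMode: false,                                     // keep the two directions separate
    activationFunction: new ReLUActivation<float>());     // assumed activation class name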
BidirectionalLayer(LayerBase<T>, bool, IVectorActivationFunction<T>?, IEngine?)
Initializes a new instance of the BidirectionalLayer<T> class with the specified inner layer and a vector activation function.
public BidirectionalLayer(LayerBase<T> innerLayer, bool mergeMode = true, IVectorActivationFunction<T>? vectorActivationFunction = null, IEngine? engine = null)
Parameters
innerLayer LayerBase<T>: The layer to be used for both forward and backward processing.
mergeMode bool: If true, outputs from both directions are added; otherwise, they are kept separate.
vectorActivationFunction IVectorActivationFunction<T>?: The vector activation function to apply after processing. Defaults to Identity if not specified.
engine IEngine?: The computation engine (CPU or GPU) for vectorized operations.
Remarks
This constructor creates a bidirectional layer using the specified inner layer for both forward and backward processing. A copy of the inner layer is created for backward processing to ensure independent parameters. This overload accepts a vector activation function, which operates on entire vectors rather than individual elements.
For Beginners: This constructor creates a new bidirectional layer with a vector-based activation function.
A vector activation function:
- Operates on entire groups of numbers at once, rather than one at a time
- Can capture relationships between different elements in the output
- Defaults to the Identity function, which doesn't change the values
This constructor is useful when you need more complex activation patterns that consider the relationships between different outputs.
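A sketch of this overload. SoftmaxActivation<T> is an assumed IVectorActivationFunction<T> implementation name, and the LSTMLayer<T> signature is hypothetical.
// Softmax operates on the whole output vector, so it must go through this overload.
var bi = new BidirectionalLayer<float>(
    new LSTMLayer<float>(inputSize: 16, hiddenSize: 32),  // hypothetical signature
    mergeMode: true,
    vectorActivationFunction: new SoftmaxActivation<float>()); // assumed implementation name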
Properties
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU-accelerated forward pass.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
true if both inner layers support GPU execution; otherwise, false.
Remarks
The bidirectional layer supports GPU execution when both the forward and backward inner layers support GPU execution. This ensures that the entire bidirectional processing can be done on the GPU.
For Beginners: This property indicates whether this layer can use the GPU for faster processing. Since the bidirectional layer wraps two inner layers, it can only use the GPU if both of those layers support GPU execution.
SupportsGpuTraining
Gets a value indicating whether this layer supports GPU-resident training.
public override bool SupportsGpuTraining { get; }
Property Value
- bool
true if both inner layers support GPU training; otherwise, false.
Remarks
The bidirectional layer supports GPU training when both inner layers support GPU training.
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
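For example, a caller can gate on this flag and fall back to the standard path. This is a sketch; CompileAndRun stands in for whatever JIT pipeline you use.
Tensor<float> result = layer.SupportsJitCompilation
    ? CompileAndRun(layer, input) // placeholder for your JIT pipeline
    : layer.Forward(input);       // standard, non-compiled path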
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
true if either the forward or backward layer supports training; otherwise, false.
Remarks
This property indicates whether the bidirectional layer can be trained through backpropagation. The layer supports training if either of its internal layers (forward or backward) supports training. This is typically the case for layers that have trainable parameters, such as weights or biases.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer can adjust its internal values during training
- It will improve its performance as it sees more data
- It participates in the learning process
The bidirectional layer supports training if either of its two internal layers (forward or backward) supports training.
Methods
Backward(Tensor<T>)
Performs the backward pass of the bidirectional layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient Tensor<T>: The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the bidirectional layer, which is used during training to propagate error gradients back through the network. It splits the output gradient according to the merge mode, propagates it through both forward and backward inner layers, and combines the resulting input gradients.
For Beginners: This method is used during training to calculate how the layer's input should change to reduce errors.
During the backward pass:
- The error gradient from the next layer is received
- If the outputs were merged, the same gradient is sent to both forward and backward layers
- If the outputs were separate, the gradient is split for each direction
- The gradients from both layers are combined to update the previous layer
This process is part of the "backpropagation" algorithm that helps neural networks learn.
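A minimal training-step sketch. ComputeLossGradient is a placeholder for however your loss function produces the gradient of the loss with respect to the output.
Tensor<float> output = layer.Forward(input);
Tensor<float> outputGradient = ComputeLossGradient(output, target); // placeholder
Tensor<float> inputGradient = layer.Backward(outputGradient);       // flows through both directions
layer.UpdateParameters(0.01f);                                      // apply the accumulated gradients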
BackwardGpu(IGpuTensor<T>)
GPU-accelerated backward pass for the bidirectional layer.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient IGpuTensor<T>: The gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
For Beginners: This is the GPU-optimized backward pass that propagates gradients through both forward and backward inner layers while keeping all data on the GPU.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes List<ComputationNode<T>>: List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
Forward(Tensor<T>)
Performs the forward pass of the bidirectional layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input Tensor<T>: The input tensor to process.
Returns
- Tensor<T>
The output tensor after bidirectional processing.
Remarks
This method implements the forward pass of the bidirectional layer. It processes the input in both forward and backward directions using the respective inner layers, and then combines the outputs according to the merge mode. The input and outputs are cached for use during the backward pass.
For Beginners: This method processes the input data through both forward and backward layers.
During the forward pass:
- The original input is sent through the forward layer
- A reversed version of the input is sent through the backward layer
- The results from both directions are either combined or kept separate
This method also saves the inputs and outputs for later use during training.
For example, with a text sequence, the forward layer sees "Hello world" while the backward layer sees "world Hello", and then the results are combined.
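The call itself is the same regardless of merge mode; only the output differs. A sketch, with shapes illustrative only:
var mergedOut = mergedLayer.Forward(sequence);     // one tensor: the two directions added together
var separateOut = separateLayer.Forward(sequence); // directions kept separate (layout depends on the version)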
ForwardGpu(params IGpuTensor<T>[])
Performs a GPU-resident forward pass of the bidirectional layer.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs IGpuTensor<T>[]: GPU-resident input tensor(s).
Returns
- IGpuTensor<T>
GPU-resident output tensor after bidirectional processing.
Remarks
For Beginners: This is the GPU-optimized version of the Forward method. All data stays on the GPU throughout the computation, avoiding expensive CPU-GPU transfers. The input sequence is processed in both forward and backward directions using GPU operations, and the results are merged on the GPU.
Exceptions
- ArgumentException
Thrown when no input tensor is provided.
- InvalidOperationException
Thrown when GPU backend is unavailable or inner layers don't support GPU.
GetParameters()
Gets all trainable parameters from both the forward and backward layers as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters from both the forward and backward inner layers and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from both forward and backward layers.
The parameters:
- Are the numbers that the neural network learns during training
- Include weights and biases from both forward and backward layers
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
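A save/restore round trip using GetParameters() together with SetParameters(Vector<T>), as a sketch:
Vector<float> snapshot = layer.GetParameters(); // forward-layer values followed by backward-layer values
// ... train or experiment ...
layer.SetParameters(snapshot); // throws ArgumentException if the length doesn't match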
ResetState()
Resets the internal state of the bidirectional layer and its inner layers.
public override void ResetState()
Remarks
This method resets the internal state of the bidirectional layer, including the cached inputs and outputs, as well as the states of both forward and backward inner layers. This is useful when starting to process a new sequence or when implementing stateful recurrent networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and outputs are cleared
- Both forward and backward layers are also reset
- The layer forgets any information from previous sequences
This is important for:
- Processing a new, unrelated sequence
- Preventing information from one sequence affecting another
- Starting a new training episode
For example, if you've processed one sentence and want to start with a new sentence, you should reset the state to prevent the new sentence from being influenced by the previous one.
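For example, when processing independent sentences (a sketch; sentences is your own collection of sequence tensors):
foreach (Tensor<float> sentence in sentences)
{
    layer.ResetState();                             // forget the previous sentence entirely
    Tensor<float> result = layer.Forward(sentence);
    // ... consume result ...
}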
SetParameters(Vector<T>)
Sets the trainable parameters for both the forward and backward layers.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters Vector<T>: A vector containing all parameters to set.
Remarks
This method sets the trainable parameters for both the forward and backward inner layers from a single vector. It extracts the appropriate portions of the input vector for each inner layer. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learnable values in both forward and backward layers.
When setting parameters:
- The input must be a vector with the correct length
- The first part of the vector is used for the forward layer
- The second part of the vector is used for the backward layer
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Testing different parameter values
An error is thrown if the input vector doesn't have the expected number of parameters.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the parameters of both forward and backward layers using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate T: The learning rate to use for the parameter updates.
Remarks
This method updates the parameters of both the forward and backward inner layers based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- Both forward and backward layers are updated independently
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time.
UpdateParametersGpu(IGpuOptimizerConfig)
Updates the parameters of both inner layers on the GPU using the specified optimizer configuration.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
config IGpuOptimizerConfig: The GPU optimizer configuration.