Class DecoderLayer<T>
Namespace: AiDotNet.NeuralNetworks.Layers
Assembly: AiDotNet.dll
Represents a Decoder Layer in a Transformer architecture.
public class DecoderLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations (e.g., float, double).
- Inheritance
- LayerBase<T> → DecoderLayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
The Decoder Layer is a key component in sequence-to-sequence models, particularly in Transformer architectures. It processes the target sequence and incorporates information from the encoder output.
For Beginners: The Decoder Layer helps in generating output sequences (like translations) by considering both what it has generated so far and the information from the input sequence.
It's like writing a story where you:
- Look at what you've written so far (self-attention)
- Refer back to your source material (cross-attention to encoder output)
- Think about how to continue the story (feed-forward network)
This process helps in creating coherent and context-aware outputs.
Constructors
DecoderLayer(int, int, int, IActivationFunction<T>?)
Initializes a new instance of the DecoderLayer class with scalar activation.
public DecoderLayer(int inputSize, int attentionSize, int feedForwardSize, IActivationFunction<T>? activation = null)
Parameters
inputSize (int): The size of the input features.
attentionSize (int): The size of the attention mechanism.
feedForwardSize (int): The size of the feed-forward network.
activation (IActivationFunction<T>): The scalar activation function to use. If null, ReLUActivation is used.
DecoderLayer(int, int, int, IVectorActivationFunction<T>?)
Initializes a new instance of the DecoderLayer class with vector activation.
public DecoderLayer(int inputSize, int attentionSize, int feedForwardSize, IVectorActivationFunction<T>? activation = null)
Parameters
inputSize (int): The size of the input features.
attentionSize (int): The size of the attention mechanism.
feedForwardSize (int): The size of the feed-forward network.
activation (IVectorActivationFunction<T>): The vector activation function to use. If null, ReLUActivation is used.
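Example
A minimal construction sketch, assuming float arithmetic; the sizes below are illustrative only and are not library defaults. The explicitly typed null is used so the scalar-activation overload is selected unambiguously.
// Passing null for the activation falls back to ReLUActivation, per the constructor docs.
IActivationFunction<float>? activation = null;
var decoder = new DecoderLayer<float>(
    inputSize: 512,        // dimensionality of each token embedding
    attentionSize: 64,     // size of the attention projections
    feedForwardSize: 2048, // hidden width of the feed-forward sublayer
    activation: activation);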
Properties
InputSize
Gets the size of the input features for this layer.
public int InputSize { get; }
Property Value
- int
LastBackwardGradients
Gets the most recent gradients calculated during the backward pass.
public (Tensor<T> InputGradient, Tensor<T> EncoderOutputGradient) LastBackwardGradients { get; }
Property Value
- (Tensor<T> InputGradient, Tensor<T> EncoderOutputGradient)
A tuple containing the gradient with respect to the input and the gradient with respect to the encoder output.
Remarks
For Beginners: This property provides easy access to the separate gradients calculated during the last backward pass. It's useful when you need to handle the input gradient and encoder output gradient separately, rather than dealing with the concatenated gradient returned by the Backward method.
Exceptions
- InvalidOperationException
Thrown when accessed before a backward pass has been performed.
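Example
A short sketch of reading the two gradients separately; it assumes Forward and Backward have already been called on a DecoderLayer<float> instance named decoder.
// Deconstruct the named tuple to get each gradient on its own.
var (inputGradient, encoderOutputGradient) = decoder.LastBackwardGradients;
// Accessing the property before any backward pass throws InvalidOperationException.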
ParameterCount
Gets the total number of trainable parameters in the layer.
public override int ParameterCount { get; }
Property Value
- int
Remarks
For Beginners: This property calculates and returns the total number of parameters in the decoder layer by summing the parameter counts of all its components. This is useful for understanding the complexity of the layer and for certain optimization techniques.
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
true, because DecoderLayer can be compiled with multiple input nodes representing the decoder input and the encoder output. The computation graph supports multiple inputs.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Methods
Backward(Tensor<T>)
Performs the backward pass of the decoder layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
A concatenated tensor containing gradients for both the input and the encoder output.
Remarks
For Beginners: This method calculates how much each part of the input and the encoder output contributed to the error. It's used during training to update the layer's parameters. The method expects the forward pass to have been called first, as it uses information stored during the forward pass.
The returned tensor is a concatenation of two gradients: the gradient with respect to the input and the gradient with respect to the encoder output. Use the LastBackwardGradients property to access these gradients separately.
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
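Example
A sketch of the expected call order, assuming decoderInput, encoderOutput, and outputGradient are pre-built Tensor<float> instances with compatible shapes.
// Forward must run first so the layer can cache the values Backward needs.
Tensor<float> output = decoder.Forward(decoderInput, encoderOutput);
// Backward returns the input and encoder-output gradients concatenated into one tensor.
Tensor<float> concatenatedGradients = decoder.Backward(outputGradient);
// Use the property to work with the two gradients separately.
var (inputGrad, encoderGrad) = decoder.LastBackwardGradients;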
BackwardGpu(IGpuTensor<T>)
Computes the gradient of the loss with respect to the inputs on the GPU.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The gradient of the loss with respect to the decoder input.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): The list to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
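Example
A minimal sketch of how calling code might query the graph; the actual JIT compilation step is framework-internal and not shown, and a using System.Collections.Generic directive is assumed.
if (decoder.SupportsJitCompilation)
{
    var inputNodes = new List<ComputationNode<float>>();
    // After the call, inputNodes holds the decoder-input and encoder-output nodes,
    // and outputNode is the root of the layer's forward computation graph.
    ComputationNode<float> outputNode = decoder.ExportComputationGraph(inputNodes);
}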
Forward(Tensor<T>)
Performs a single-input forward pass, using the input as both the decoder input and the encoder output.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor.
Returns
- Tensor<T>
The output tensor after processing through the decoder layer.
Remarks
For Beginners: When only one input is provided, the decoder attends to itself by reusing the input for both decoder and encoder streams. Use the overload that accepts multiple tensors to supply a separate encoder output when available.
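Example
A sketch of the single-input overload, where input is an assumed pre-built Tensor<float>; per the remarks above, it behaves like supplying the same tensor for both streams.
// Self-attention and cross-attention both read from the same tensor here.
Tensor<float> selfOnly = decoder.Forward(input);
// Conceptually equivalent to passing the input twice via the multi-input overload.
Tensor<float> explicitForm = decoder.Forward(input, input);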
Forward(params Tensor<T>[])
Performs the forward pass of the decoder layer.
public override Tensor<T> Forward(params Tensor<T>[] inputs)
Parameters
inputs (Tensor<T>[]): An array of input tensors. The first tensor is the decoder input, the second is the encoder output, and the third (optional) is the attention mask.
Returns
- Tensor<T>
The output tensor after processing through the decoder layer.
Remarks
For Beginners: This method processes the inputs through the decoder layer. It expects two or three input tensors: the decoder's own input, the encoder's output, and optionally an attention mask. The method combines these inputs, processes them through the layer, and returns the final output. The attention mask, if provided, helps control which parts of the input sequence the layer should focus on.
Exceptions
- ArgumentException
Thrown when the number of input tensors or their shapes are invalid.
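Example
A sketch assuming decoderInput, encoderOutput, and attentionMask are pre-built Tensor<float> instances with compatible shapes.
// Two inputs: decoder input plus encoder output.
Tensor<float> output = decoder.Forward(decoderInput, encoderOutput);
// Three inputs: the optional mask restricts which positions attention may look at.
Tensor<float> maskedOutput = decoder.Forward(decoderInput, encoderOutput, attentionMask);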
ForwardGpu(params IGpuTensor<T>[])
Performs the GPU-resident forward pass of the decoder layer.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU input tensors: [decoderInput, encoderOutput].
Returns
- IGpuTensor<T>
The GPU output tensor after self-attention, cross-attention, and FFN.
Remarks
All computations stay on the GPU. The pass chains SelfAttention → Norm1 → CrossAttention → Norm2 → FFN → Norm3.
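Example
A GPU-resident sketch, assuming gpuDecoderInput, gpuEncoderOutput, and gpuOutputGradient are IGpuTensor<float> instances already on the device.
// Forward and backward both stay on the GPU; no host transfers are shown here.
IGpuTensor<float> gpuOutput = decoder.ForwardGpu(gpuDecoderInput, gpuEncoderOutput);
IGpuTensor<float> gpuInputGradient = decoder.BackwardGpu(gpuOutputGradient);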
GetParameters()
Retrieves the current parameters of the layer.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all the parameters of the layer.
Remarks
For Beginners: This method collects all the parameters from the various components of the decoder layer (self-attention, cross-attention, feed-forward network, and layer normalizations) and combines them into a single vector. This is useful for operations that need to work with all parameters at once, such as optimization algorithms.
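Example
A short sketch of snapshotting the layer's parameters as one flat vector; a using System directive is assumed for Console.
Vector<float> parameters = decoder.GetParameters();
// ParameterCount reports the total number of values collected into the vector.
Console.WriteLine($"Trainable parameters: {decoder.ParameterCount}");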
ResetState()
Resets the state of the decoder layer.
public override void ResetState()
Remarks
For Beginners: This method clears any stored state in the decoder layer and its components. It's typically called between processing different sequences or at the start of a new epoch in training. Resetting the state ensures that information from previous inputs doesn't affect the processing of new, unrelated inputs.
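Example
A sketch of resetting between unrelated sequences; sequencePairs is a hypothetical collection of (decoder input, encoder output) Tensor<float> pairs.
foreach (var (decoderInput, encoderOutput) in sequencePairs) // hypothetical collection
{
    // Clear cached state so the previous sequence cannot leak into this one.
    decoder.ResetState();
    Tensor<float> output = decoder.Forward(decoderInput, encoderOutput);
}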
UpdateParameters(Vector<T>)
Updates the layer's parameters with the provided values.
public override void UpdateParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing the new parameter values.
Remarks
For Beginners: This method takes a vector of new parameter values and distributes them to the various components of the decoder layer. It's the opposite of GetParameters() and is typically used after an optimization step to update the layer with improved parameter values.
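Example
A round-trip sketch; ExternalOptimizerStep is a hypothetical stand-in for whatever routine produces the improved values.
Vector<float> current = decoder.GetParameters();
Vector<float> improved = ExternalOptimizerStep(current); // hypothetical helper
// Distribute the new values back to the attention, feed-forward, and normalization sublayers.
decoder.UpdateParameters(improved);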
UpdateParameters(T)
Updates the layer's parameters based on the computed gradients and a learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the update.
Remarks
For Beginners: This method applies the calculated gradients to the layer's parameters, effectively "learning" from the training data. The learning rate determines how big of a step we take in the direction suggested by the gradients.
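Example
A minimal sketch of one training step; the tensors are assumed pre-built and ComputeLossGradient is a hypothetical stand-in for your loss code.
Tensor<float> output = decoder.Forward(decoderInput, encoderOutput);
Tensor<float> lossGradient = ComputeLossGradient(output); // hypothetical helper
decoder.Backward(lossGradient);
// Apply the accumulated gradients with a small learning rate.
decoder.UpdateParameters(0.001f);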
UpdateParametersGpu(IGpuOptimizerConfig)
Updates layer parameters using GPU-resident optimizer.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
config (IGpuOptimizerConfig): The GPU optimizer configuration.
Remarks
This method delegates to each sublayer's UpdateParametersGpu method. All sublayers (self-attention, cross-attention, layer norms, feed-forward) are updated.
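Example
A one-line sketch; gpuOptimizerConfig is an assumed IGpuOptimizerConfig instance configured elsewhere.
// The call fans out to each sublayer's own GPU parameter update.
decoder.UpdateParametersGpu(gpuOptimizerConfig);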