Class DecoderLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a Decoder Layer in a Transformer architecture.

public class DecoderLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations (e.g., float, double).

Inheritance
LayerBase<T>
DecoderLayer<T>

Implements
ILayer<T>
IJitCompilable<T>
IDiagnosticsProvider
IWeightLoadable<T>
IDisposable

Remarks

The Decoder Layer is a key component in sequence-to-sequence models, particularly in Transformer architectures. It processes the target sequence and incorporates information from the encoder output.

For Beginners: The Decoder Layer helps in generating output sequences (like translations) by considering both what it has generated so far and the information from the input sequence.

It's like writing a story where you:

  1. Look at what you've written so far (self-attention)
  2. Refer back to your source material (cross-attention to encoder output)
  3. Think about how to continue the story (feed-forward network)

This process helps in creating coherent and context-aware outputs.
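
Example

A minimal usage sketch. The shape-array Tensor<T> constructor is an assumption, and using directives for Tensor<T> and IActivationFunction<T> are omitted because their namespaces are not stated on this page; adapt tensor creation to however tensors are built in your project.

using AiDotNet.NeuralNetworks.Layers;

// Null falls back to ReLUActivation; a typed variable disambiguates between the
// scalar- and vector-activation constructor overloads.
IActivationFunction<double>? activation = null;
var decoder = new DecoderLayer<double>(inputSize: 64, attentionSize: 32, feedForwardSize: 128, activation);

// Shapes are illustrative; the shape-array constructor is an assumption.
var decoderInput = new Tensor<double>(new[] { 1, 10, 64 });  // [batch, targetLength, inputSize]
var encoderOutput = new Tensor<double>(new[] { 1, 12, 64 }); // [batch, sourceLength, inputSize]

// Self-attention over decoderInput, cross-attention over encoderOutput, then the feed-forward network.
Tensor<double> output = decoder.Forward(decoderInput, encoderOutput);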

Constructors

DecoderLayer(int, int, int, IActivationFunction<T>?)

Initializes a new instance of the DecoderLayer class with scalar activation.

public DecoderLayer(int inputSize, int attentionSize, int feedForwardSize, IActivationFunction<T>? activation = null)

Parameters

inputSize int

The size of the input features.

attentionSize int

The size of the attention mechanism.

feedForwardSize int

The size of the feed-forward network.

activation IActivationFunction<T>

The scalar activation function to use. If null, ReLUActivation is used.

DecoderLayer(int, int, int, IVectorActivationFunction<T>?)

Initializes a new instance of the DecoderLayer class with vector activation.

public DecoderLayer(int inputSize, int attentionSize, int feedForwardSize, IVectorActivationFunction<T>? activation = null)

Parameters

inputSize int

The size of the input features.

attentionSize int

The size of the attention mechanism.

feedForwardSize int

The size of the feed-forward network.

activation IVectorActivationFunction<T>

The vector activation function to use. If null, ReLUActivation is used.
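
Example

A sketch of both constructor overloads. Because each overload has an optional activation parameter, a three-argument call is ambiguous between them, so supply a typed fourth argument to pick one. ReLUActivation is named in the remark above but its generic signature is assumed here, and SoftmaxActivation<float> is a purely hypothetical IVectorActivationFunction<float> implementation; substitute one from your project.

using AiDotNet.NeuralNetworks.Layers;

// Scalar-activation overload; a null IActivationFunction<float> falls back to ReLUActivation.
var scalarDecoder = new DecoderLayer<float>(64, 32, 128, (IActivationFunction<float>?)null);

// Vector-activation overload; SoftmaxActivation<float> is hypothetical.
var vectorDecoder = new DecoderLayer<float>(64, 32, 128, new SoftmaxActivation<float>());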

Properties

InputSize

Gets the size of the input features for this layer.

public int InputSize { get; }

Property Value

int

LastBackwardGradients

Gets the most recent gradients calculated during the backward pass.

public (Tensor<T> InputGradient, Tensor<T> EncoderOutputGradient) LastBackwardGradients { get; }

Property Value

(Tensor<T> InputGradient, Tensor<T> EncoderOutputGradient)

A tuple containing the gradient with respect to the input and the gradient with respect to the encoder output.

Remarks

For Beginners: This property provides easy access to the separate gradients calculated during the last backward pass. It's useful when you need to handle the input gradient and encoder output gradient separately, rather than dealing with the concatenated gradient returned by the Backward method.

Exceptions

InvalidOperationException

Thrown when accessed before a backward pass has been performed.
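
Example

A short sketch of splitting the gradients after a backward pass; decoder is an existing DecoderLayer<double> on which Forward and Backward have already been called.

// Deconstruct the tuple to work with each gradient on its own.
var (inputGradient, encoderOutputGradient) = decoder.LastBackwardGradients;

// inputGradient flows back into earlier decoder layers;
// encoderOutputGradient flows back into the encoder stack.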

ParameterCount

Gets the total number of trainable parameters in the layer.

public override int ParameterCount { get; }

Property Value

int

Remarks

For Beginners: This property calculates and returns the total number of parameters in the decoder layer by summing the parameter counts of all its components (self-attention, cross-attention, feed-forward network, and layer normalizations). This is useful for gauging the layer's complexity and for optimizers that work with the flattened parameter vector returned by GetParameters().

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsJitCompilation

Gets a value indicating whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

true because DecoderLayer can be compiled with multiple input nodes representing the decoder input and encoder output. The computation graph supports multiple inputs.

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Methods

Backward(Tensor<T>)

Performs the backward pass of the decoder layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

A concatenated tensor containing gradients for both the input and the encoder output.

Remarks

For Beginners: This method calculates how much each part of the input and the encoder output contributed to the error. It's used during training to update the layer's parameters. The method expects the forward pass to have been called first, as it uses information stored during the forward pass.

The returned tensor is a concatenation of two gradients: the gradient with respect to the input and the gradient with respect to the encoder output. Use the LastBackwardGradients property to access these gradients separately.

Exceptions

InvalidOperationException

Thrown when backward is called before forward.
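
Example

A sketch of the required call order, assuming decoder, decoderInput, encoderOutput, and an upstream outputGradient already exist; how the two gradients are concatenated is an implementation detail, so prefer LastBackwardGradients when you need them separately.

// Forward must run first so the layer caches what backpropagation needs.
Tensor<double> output = decoder.Forward(decoderInput, encoderOutput);

// ... compute outputGradient from the loss ...

// Returns the input gradient and the encoder-output gradient concatenated into one tensor.
Tensor<double> concatenatedGradient = decoder.Backward(outputGradient);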

BackwardGpu(IGpuTensor<T>)

Computes the gradient of the loss with respect to the inputs on the GPU.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

The gradient of the loss with respect to the layer's output.

Returns

IGpuTensor<T>

The gradient of the loss with respect to the decoder input.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.
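
Example

A minimal sketch of exporting the graph from calling code, using only the members documented on this page; decoder is the layer instance from the earlier sketches, and how the returned node is handed to the JIT compiler is outside the scope of this page.

if (decoder.SupportsJitCompilation)
{
    // The layer populates this list with its input nodes (decoder input and encoder output).
    var inputNodes = new List<ComputationNode<double>>();
    ComputationNode<double> outputNode = decoder.ExportComputationGraph(inputNodes);

    // inputNodes and outputNode can now be passed to the JIT compilation pipeline.
}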

Forward(Tensor<T>)

Performs a single-input forward pass, using the same tensor as both the decoder input and the encoder output.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor.

Returns

Tensor<T>

Remarks

For Beginners: When only one input is provided, the decoder attends to itself by reusing the input for both decoder and encoder streams. Use the overload that accepts multiple tensors to supply a separate encoder output when available.
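
Example

A sketch of the single-input call; decoder and decoderInput come from the earlier sketches.

// With one tensor, the layer attends to itself: the same tensor is used
// as both the decoder input and the encoder output.
Tensor<double> selfAttended = decoder.Forward(decoderInput);

// Equivalent to: decoder.Forward(decoderInput, decoderInput);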

Forward(params Tensor<T>[])

Performs the forward pass of the decoder layer.

public override Tensor<T> Forward(params Tensor<T>[] inputs)

Parameters

inputs Tensor<T>[]

An array of input tensors. The first tensor is the decoder input, the second is the encoder output, and the third (optional) is the attention mask.

Returns

Tensor<T>

The output tensor after processing through the decoder layer.

Remarks

For Beginners: This method expects two or three input tensors: the decoder's own input, the encoder's output, and optionally an attention mask. It runs them through self-attention, cross-attention, and the feed-forward network, and returns the final output. The attention mask, if provided, controls which parts of the input sequence the layer is allowed to attend to.

Exceptions

ArgumentException

Thrown when the number of input tensors or their shapes are invalid.
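
Example

A sketch of the two- and three-tensor calls; the attention-mask shape is an assumption based on common Transformer conventions and may differ in this library.

// Two inputs: decoder input and encoder output.
Tensor<double> output = decoder.Forward(decoderInput, encoderOutput);

// Three inputs: add an attention mask (shape shown is illustrative only).
var attentionMask = new Tensor<double>(new[] { 1, 10, 12 }); // [batch, targetLength, sourceLength]
Tensor<double> maskedOutput = decoder.Forward(decoderInput, encoderOutput, attentionMask);

// An invalid tensor count or incompatible shapes throws ArgumentException.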

ForwardGpu(params IGpuTensor<T>[])

Performs the GPU-resident forward pass of the decoder layer.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The GPU input tensors: [decoderInput, encoderOutput].

Returns

IGpuTensor<T>

The GPU output tensor after self-attention, cross-attention, and FFN.

Remarks

All computations stay on the GPU. The forward pass chains: SelfAttention → Norm1 → CrossAttention → Norm2 → FFN → Norm3.

GetParameters()

Retrieves the current parameters of the layer.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all the parameters of the layer.

Remarks

For Beginners: This method collects all the parameters from the various components of the decoder layer (self-attention, cross-attention, feed-forward network, and layer normalizations) and combines them into a single vector. This is useful for operations that need to work with all parameters at once, such as optimization algorithms.
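
Example

A short sketch relating GetParameters() to ParameterCount; decoder is an existing DecoderLayer<double>.

// Flattened view of every trainable parameter in the layer
// (self-attention, cross-attention, feed-forward network, layer normalizations).
Vector<double> parameters = decoder.GetParameters();

// ParameterCount reports how many entries that vector contains.
Console.WriteLine($"Trainable parameters: {decoder.ParameterCount}");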

ResetState()

Resets the state of the decoder layer.

public override void ResetState()

Remarks

For Beginners: This method clears any stored state in the decoder layer and its components. It's typically called between processing different sequences or at the start of a new epoch in training. Resetting the state ensures that information from previous inputs doesn't affect the processing of new, unrelated inputs.

UpdateParameters(Vector<T>)

Updates the layer's parameters with the provided values.

public override void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing new parameter values.

Remarks

For Beginners: This method takes a vector of new parameter values and distributes them to the various components of the decoder layer. It's the opposite of GetParameters() and is typically used after an optimization step to update the layer with improved parameter values.
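
Example

A sketch of the round trip an external optimizer performs; OptimizerStep is a hypothetical placeholder for whatever produces the new parameter vector.

// Pull the current parameters, let an external optimizer produce new values,
// then push them back into the layer's components.
Vector<double> current = decoder.GetParameters();
Vector<double> updated = OptimizerStep(current); // hypothetical optimizer call
decoder.UpdateParameters(updated);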

UpdateParameters(T)

Updates the layer's parameters based on the computed gradients and a learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the update.

Remarks

For Beginners: This method applies the calculated gradients to the layer's parameters, effectively "learning" from the training data. The learning rate determines how big of a step we take in the direction suggested by the gradients.
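
Example

A sketch of one gradient-descent step using the gradients cached by Backward; computing outputGradient from a loss function is not shown here.

// One training step: forward, backward, then apply the cached gradients.
Tensor<double> prediction = decoder.Forward(decoderInput, encoderOutput);

// ... compare prediction against the target to obtain outputGradient ...

decoder.Backward(outputGradient);

// Scales the cached gradients by the learning rate and updates every sublayer.
decoder.UpdateParameters(0.01);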

UpdateParametersGpu(IGpuOptimizerConfig)

Updates layer parameters using GPU-resident optimizer.

public override void UpdateParametersGpu(IGpuOptimizerConfig config)

Parameters

config IGpuOptimizerConfig

The GPU optimizer configuration.

Remarks

This method delegates to each sublayer's UpdateParametersGpu method. All sublayers (self-attention, cross-attention, layer norms, feed-forward) are updated.