Class PositionalEncodingLayer<T>

Namespace: AiDotNet.NeuralNetworks.Layers

Assembly: AiDotNet.dll

Represents a layer that adds positional encodings to input sequences.

public class PositionalEncodingLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T: The numeric type used for calculations, typically float or double.

Inheritance: object → LayerBase<T> → PositionalEncodingLayer<T>

Implements: ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Inherited Members: LayerBase<T>.Engine

LayerBase<T>.ScalarActivation

LayerBase<T>.VectorActivation

LayerBase<T>.UsingVectorActivation

LayerBase<T>.NumOps

LayerBase<T>.Random

LayerBase<T>.Parameters

LayerBase<T>.ParameterGradients

LayerBase<T>.InputShape

LayerBase<T>.InputShapes

LayerBase<T>.UpdateInputShape(int[])

LayerBase<T>.OutputShape

LayerBase<T>.IsTrainingMode

LayerBase<T>.InitializationStrategy

LayerBase<T>.IsInitialized

LayerBase<T>.InitializationLock

LayerBase<T>.EnsureInitialized()

LayerBase<T>.UseAutodiff

LayerBase<T>.SetTrainingMode(bool)

LayerBase<T>.GetParameterGradients()

LayerBase<T>.ClearGradients()

LayerBase<T>.GetInputShape()

LayerBase<T>.GetInputShapes()

LayerBase<T>.GetOutputShape()

LayerBase<T>.GetWeights()

LayerBase<T>.GetBiases()

LayerBase<T>.MapActivationToFused()

LayerBase<T>.SupportsGpuTraining

LayerBase<T>.CanExecuteOnGpu

LayerBase<T>.CanTrainOnGpu

LayerBase<T>.UpdateParametersGpu(IGpuOptimizerConfig)

LayerBase<T>.UploadWeightsToGpu()

LayerBase<T>.DownloadWeightsFromGpu()

LayerBase<T>.ZeroGradientsGpu()

LayerBase<T>.GetActivationTypes()

LayerBase<T>.Forward(params Tensor<T>[])

LayerBase<T>.ApplyActivation(Tensor<T>)

LayerBase<T>.ApplyActivation(Vector<T>)

LayerBase<T>.ActivateTensor(IActivationFunction<T>, Tensor<T>)

LayerBase<T>.ActivateTensor(IVectorActivationFunction<T>, Tensor<T>)

LayerBase<T>.CalculateInputShape(int, int, int)

LayerBase<T>.CalculateOutputShape(int, int, int)

LayerBase<T>.Clone()

LayerBase<T>.DerivativeTensor(IActivationFunction<T>, Tensor<T>)

LayerBase<T>.ApplyActivationDerivative(T, T)

LayerBase<T>.ApplyActivationDerivative(Tensor<T>, Tensor<T>)

LayerBase<T>.ComputeActivationJacobian(Vector<T>)

LayerBase<T>.ApplyActivationDerivative(Vector<T>, Vector<T>)

LayerBase<T>.UpdateParameters(Vector<T>)

LayerBase<T>.ParameterCount

LayerBase<T>.Serialize(BinaryWriter)

LayerBase<T>.Deserialize(BinaryReader)

LayerBase<T>.SetParameters(Vector<T>)

LayerBase<T>.GetDiagnostics()

LayerBase<T>.ApplyActivationToGraph(ComputationNode<T>)

LayerBase<T>.CanActivationBeJitted()

LayerBase<T>.RegisterTrainableParameter(Tensor<T>, PersistentTensorRole)

LayerBase<T>.InvalidateTrainableParameter(Tensor<T>)

LayerBase<T>.HasGpuActivation()

LayerBase<T>.ApplyActivationForwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int)

LayerBase<T>.ApplyActivationBackwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, IGpuBuffer, IGpuBuffer, int)

LayerBase<T>.GetFusedActivationType()

LayerBase<T>.ApplyGpuActivation(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int, FusedActivationType)

LayerBase<T>.ApplyGpuActivationBackward(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, IGpuBuffer, IGpuBuffer, int, FusedActivationType, float)

LayerBase<T>.Dispose()

LayerBase<T>.Dispose(bool)

LayerBase<T>.WeightParameterName

LayerBase<T>.BiasParameterName

LayerBase<T>.SetWeights(Tensor<T>)

LayerBase<T>.SetBiases(Tensor<T>)

LayerBase<T>.GetParameterNames()

LayerBase<T>.TryGetParameter(string, out Tensor<T>)

LayerBase<T>.SetParameter(string, Tensor<T>)

LayerBase<T>.GetParameterShape(string)

LayerBase<T>.NamedParameterCount

LayerBase<T>.ValidateWeights(IEnumerable<string>, Func<string, string>)

LayerBase<T>.LoadWeights(Dictionary<string, Tensor<T>>, Func<string, string>, bool)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

The PositionalEncodingLayer adds position-dependent signals to input embeddings, which helps sequence models like Transformers understand the order of elements in a sequence. Since attention-based models have no inherent notion of sequence order, positional encodings provide this critical information. The encodings use sine and cosine functions of different frequencies to create unique position-dependent patterns.

For Beginners: This layer adds information about position to your sequence data.

Think of it like numbering the words in a sentence:

Without position information, a model only knows which words are in the sentence
With position information, it knows which word comes first, second, third, etc.

For example, the sentences "dog bites man" and "man bites dog" contain the same words but have completely different meanings because of word order. Positional encoding helps models understand this difference.

The layer uses a clever mathematical pattern of sine and cosine waves to encode positions. This approach has several advantages:

It creates a unique pattern for each position
Similar positions have similar encodings (helpful for generalization)
It can potentially handle sequences longer than those seen during training (for this layer, up to maxSequenceLength)
The encodings have consistent patterns that models can learn from

Constructors

PositionalEncodingLayer(int, int)

Initializes a new instance of the PositionalEncodingLayer<T> class with the specified maximum sequence length and embedding size.

public PositionalEncodingLayer(int maxSequenceLength, int embeddingSize)

Parameters

maxSequenceLength int: The maximum sequence length that this layer can handle.
embeddingSize int: The size of each embedding vector.

Remarks

The constructor pre-computes the positional encodings for every position up to maxSequenceLength, using sine and cosine functions of different frequencies and following the formula from the "Attention Is All You Need" paper.
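For reference, the sinusoidal formula in that paper is usually written as shown below. Here pos is the position in the sequence, i indexes a sine/cosine pair of embedding dimensions, and d_model corresponds to embeddingSize. The base of 10000 and the even/odd layout are taken from the paper; that this layer matches them exactly is an assumption, not something this page states.

```latex
\begin{aligned}
PE_{(pos,\,2i)}   &= \sin\!\left(\frac{pos}{10000^{\,2i/d_{\text{model}}}}\right) \\
PE_{(pos,\,2i+1)} &= \cos\!\left(\frac{pos}{10000^{\,2i/d_{\text{model}}}}\right)
\end{aligned}
```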

For Beginners: This constructor sets up the layer with the necessary dimensions.

When creating a PositionalEncodingLayer, you need to specify:

maxSequenceLength: The longest sequence your model will handle (e.g., 512 for text processing)
embeddingSize: The size of your embedding vectors (e.g., 512 or 768 dimensions)

During initialization, the layer pre-calculates all the positional encodings using the sine/cosine formula. This is more efficient than recalculating them on every forward pass.

The formula alternates between sine and cosine functions across the embedding dimensions, with different frequencies for different dimensions. This creates a unique pattern for each position that the model can learn to recognize.
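The pre-calculation described above can be sketched as follows. This is a minimal illustration using plain arrays, not the AiDotNet implementation: the class name SinusoidalEncodingSketch, the methods BuildTable and RowDot, the use of double[,] instead of Tensor<T>, and the base of 10000 are all assumptions made for the example.

```csharp
using System;

// Illustrative sketch only; names and array types are hypothetical,
// not part of the AiDotNet API.
public static class SinusoidalEncodingSketch
{
    // Builds a [maxSequenceLength, embeddingSize] table in which even columns
    // hold sine values and odd columns hold cosine values of the same frequency.
    public static double[,] BuildTable(int maxSequenceLength, int embeddingSize)
    {
        if (maxSequenceLength <= 0) throw new ArgumentOutOfRangeException(nameof(maxSequenceLength));
        if (embeddingSize <= 0) throw new ArgumentOutOfRangeException(nameof(embeddingSize));

        var table = new double[maxSequenceLength, embeddingSize];
        for (int pos = 0; pos < maxSequenceLength; pos++)
        {
            for (int dim = 0; dim < embeddingSize; dim++)
            {
                // Columns 2i and 2i+1 share the exponent 2i / embeddingSize.
                int pairIndex = dim / 2;
                double angle = pos / Math.Pow(10000.0, 2.0 * pairIndex / embeddingSize);
                table[pos, dim] = (dim % 2 == 0) ? Math.Sin(angle) : Math.Cos(angle);
            }
        }
        return table;
    }

    // Dot product of two rows, used to compare the encodings of two positions.
    public static double RowDot(double[,] table, int rowA, int rowB)
    {
        double sum = 0.0;
        for (int dim = 0; dim < table.GetLength(1); dim++)
        {
            sum += table[rowA, dim] * table[rowB, dim];
        }
        return sum;
    }
}
```

With a table from BuildTable(32, 16), RowDot(table, 0, 1) is larger than RowDot(table, 0, 10), which is one way to see that nearby positions receive more similar encodings than distant ones. The dot product of a row with itself equals embeddingSize / 2 when embeddingSize is even, because each sine/cosine pair contributes sin² + cos² = 1.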

Properties

SupportsGpuExecution

Gets whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool: True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

Have not yet implemented a working ExportComputationGraph()
Use dynamic operations that change based on input data
Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.

SupportsTraining

Gets a value indicating whether this layer supports training (backpropagation).

public override bool SupportsTraining { get; }

Property Value

bool: Always true because the PositionalEncodingLayer supports backpropagation, even though it has no trainable parameters.

Remarks

This property indicates whether the layer supports backpropagation during training. Although the PositionalEncodingLayer has no trainable parameters, it still supports the backward pass to propagate gradients to previous layers.

For Beginners: This property tells you if the layer can participate in the training process.

A value of true means:

The layer can pass gradient information backward during training
It's part of the learning process, even though it doesn't have learnable parameters

While this layer doesn't have weights or biases that get updated during training, it still needs to properly handle gradients to ensure that layers before it can learn correctly.

Methods

Backward(Tensor<T>)

Performs the backward pass of the positional encoding layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>: The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>: The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the positional encoding layer. Since the layer simply adds fixed positional encodings to the input, the gradient flows through unchanged. The gradient of the addition operation with respect to the input is just the gradient of the output.

For Beginners: This method handles how gradients flow backward through this layer.

During the backward pass:

The layer receives gradients indicating how the output should change
Since this layer just adds fixed positional encodings to the input, any change in the output should directly affect the input in the same way
So the gradients are passed back unchanged

This makes sense because:

The derivative of (x + constant) with respect to x is 1
So the gradient flows through addition operations unchanged
The positional encodings are constants that don't change during training
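The reasoning above can be checked numerically with a small sketch. It is not the library's code: the class PositionalBackwardSketch, its methods, the flat double[] arrays, and the sum-of-squares loss are all hypothetical choices made for the illustration, and this sketch returns a copy of the gradient, which may differ from what the library does.

```csharp
using System;

// Illustrative sketch only; names and array types are hypothetical,
// not part of the AiDotNet API.
public static class PositionalBackwardSketch
{
    // Forward: y = x + c, where c is a fixed (non-trainable) encoding.
    public static double[] Forward(double[] input, double[] encoding)
    {
        var output = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            output[i] = input[i] + encoding[i];
        }
        return output;
    }

    // Backward: dL/dx = dL/dy * d(x + c)/dx = dL/dy * 1,
    // so the incoming gradient is passed through unchanged (copied here).
    public static double[] Backward(double[] outputGradient)
    {
        return (double[])outputGradient.Clone();
    }

    // Example loss L = sum(y^2), chosen only so that dL/dy = 2y is easy to verify.
    public static double Loss(double[] output)
    {
        double sum = 0.0;
        foreach (double v in output) sum += v * v;
        return sum;
    }

    // Central finite difference of the loss with respect to input[index].
    public static double NumericalGradient(double[] input, double[] encoding, int index, double eps = 1e-6)
    {
        var plus = (double[])input.Clone();
        var minus = (double[])input.Clone();
        plus[index] += eps;
        minus[index] -= eps;
        return (Loss(Forward(plus, encoding)) - Loss(Forward(minus, encoding))) / (2 * eps);
    }
}
```

For any input, the value that Backward returns for dL/dy = 2y should agree with NumericalGradient at every index, up to floating-point error, which confirms that adding a constant leaves the gradient untouched.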

BackwardGpu(IGpuTensor<T>)

Computes the gradient of the loss with respect to the input on the GPU.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>: The gradient of the loss with respect to the layer's output.

Returns

IGpuTensor<T>: The same output gradient, unchanged.

Remarks

Since positional encodings are constants, the gradient flows through unchanged. The derivative of (x + constant) with respect to x is 1.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>: List to populate with input computation nodes.

Returns

ComputationNode<T>: The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

Implement this method to export its computation graph
Set SupportsJitCompilation to true
Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.

Forward(Tensor<T>)

Performs the forward pass of the positional encoding layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>: The input tensor to process.

Returns

Tensor<T>: The output tensor with positional encodings added.

Remarks

This method implements the forward pass of the positional encoding layer. It first checks that the input sequence length does not exceed the maximum allowed length. Then, it slices the pre-computed encodings tensor to match the input sequence length and adds the encodings to the input tensor element-wise.

For Beginners: This method adds the position information to your input data.

During the forward pass:

The method checks that your sequence isn't too long
It takes the appropriate slice of the pre-computed encodings (matching the length of your input sequence)
It adds these encodings directly to your input data

The addition operation combines your original data (like word embeddings) with the position information, allowing the model to use both.

For example, if your input is word embeddings for "The cat sat on the mat", after this layer, each word's embedding will also contain information about which position in the sentence it occupies.
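The three steps above (length check, slice, element-wise add) can be sketched with plain arrays. This is an illustration under stated assumptions, not the layer's actual code: the class PositionalForwardSketch is hypothetical, the input is assumed to be a single unbatched [sequenceLength, embeddingSize] array, and the embedding-size check is an extra guard added for the example.

```csharp
using System;

// Illustrative sketch only; names and array types are hypothetical,
// not part of the AiDotNet API.
public static class PositionalForwardSketch
{
    // input:     [sequenceLength, embeddingSize]
    // encodings: [maxSequenceLength, embeddingSize], pre-computed once
    public static double[,] Forward(double[,] input, double[,] encodings)
    {
        int sequenceLength = input.GetLength(0);
        int embeddingSize = input.GetLength(1);
        int maxSequenceLength = encodings.GetLength(0);

        // Step 1: reject sequences longer than the pre-computed table.
        if (sequenceLength > maxSequenceLength)
        {
            throw new ArgumentException(
                $"Sequence length {sequenceLength} exceeds maximum {maxSequenceLength}.",
                nameof(input));
        }

        // Extra guard for this sketch (not described in the documentation above).
        if (embeddingSize != encodings.GetLength(1))
        {
            throw new ArgumentException("Embedding size does not match the encodings.", nameof(input));
        }

        // Steps 2 and 3: use only the first sequenceLength rows of the table
        // and add them to the input element by element.
        var output = new double[sequenceLength, embeddingSize];
        for (int pos = 0; pos < sequenceLength; pos++)
        {
            for (int dim = 0; dim < embeddingSize; dim++)
            {
                output[pos, dim] = input[pos, dim] + encodings[pos, dim];
            }
        }
        return output;
    }
}
```

Passing a two-row input with a three-row table adds only the first two rows of the table; passing a four-row input with the same table throws ArgumentException, mirroring the exception documented for this method.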

Exceptions

ArgumentException: Thrown when the input sequence length exceeds the maximum sequence length.

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass on GPU, adding positional encodings to input embeddings.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]: GPU-resident input tensors (uses first input).

Returns

IGpuTensor<T>: GPU-resident output tensor with positional encodings added.

GetParameters()

Gets all trainable parameters from the positional encoding layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>: An empty vector since PositionalEncodingLayer has no trainable parameters.

Remarks

This method retrieves all trainable parameters from the layer as a single vector. Since PositionalEncodingLayer has no trainable parameters, it returns an empty vector. The positional encodings are fixed values determined by a mathematical formula, not learnable parameters.

For Beginners: This method returns all the learnable values in the layer.

Because PositionalEncodingLayer:

Uses fixed encodings based on a mathematical formula
Has no weights, biases, or other learnable parameters

this method returns an empty vector.

This is different from layers like Dense layers, which would return their weights and biases. The positional encodings are fixed by design and don't need to be learned from data.

ResetState()

Resets the internal state of the positional encoding layer.

public override void ResetState()

Remarks

This method is intended to reset any internal state that might change during training or inference. However, since PositionalEncodingLayer has no state that changes (the encodings are fixed), this method does nothing.

For Beginners: This method would normally clear the layer's memory to start fresh.

However, since PositionalEncodingLayer doesn't maintain any changing state during processing (the encodings are fixed at initialization and don't change), this method is empty.

The encodings tensor is a fixed part of the layer that remains constant throughout the lifetime of the layer, so there's nothing to reset.

UpdateParameters(T)

Updates the parameters of the positional encoding layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T: The learning rate to use for the parameter updates.

Remarks

This method is part of the training process, but since PositionalEncodingLayer has no trainable parameters, this method does nothing. The positional encodings are fixed and do not change during training.

For Beginners: This method would normally update a layer's internal values during training.

However, since PositionalEncodingLayer uses fixed encodings that are calculated once at initialization and don't change during training, this method is empty.

This is different from layers like Dense or Convolutional layers, which have weights and biases that get updated during training. The positional encodings are based on a mathematical formula rather than learned from data.
