Class PatchEmbeddingLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Implements a patch embedding layer for Vision Transformer (ViT) architecture.

public class PatchEmbeddingLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for computations (typically float or double).

Inheritance
LayerBase<T> → PatchEmbeddingLayer<T>

Implements
ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

The patch embedding layer divides an input image into fixed-size patches and projects them into an embedding space. This is a key component of Vision Transformers, converting 2D spatial information into a sequence of embeddings that can be processed by transformer encoder blocks.

For Beginners: This layer breaks an image into small square pieces (patches) and converts each patch into a numerical representation that can be processed by a transformer.

Think of it like cutting a photo into a grid of smaller squares, then describing each square with numbers. For example, a 224x224 image with 16x16 patches would be cut into 196 patches (14x14 grid), and each patch would be represented by a vector of numbers (the embedding).

This allows transformers, which were originally designed for text, to process images by treating the patches like "words" in a sentence.
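
As a quick sanity check, the patch-grid arithmetic from the example above can be reproduced in a few lines of C#. The numbers mirror the 224x224 / 16x16 example and do not depend on any AiDotNet API:

    int imageHeight = 224, imageWidth = 224;
    int patchSize = 16, embeddingDim = 768;

    int patchRows = imageHeight / patchSize;        // 14
    int patchCols = imageWidth / patchSize;         // 14
    int numPatches = patchRows * patchCols;         // 196

    // The layer therefore emits a sequence of 196 embeddings per image,
    // each of length 768: output shape [batch, numPatches, embeddingDim].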

Constructors

PatchEmbeddingLayer(int, int, int, int, int, IActivationFunction<T>?)

Creates a new patch embedding layer with the specified dimensions.

public PatchEmbeddingLayer(int imageHeight, int imageWidth, int channels, int patchSize, int embeddingDim, IActivationFunction<T>? activationFunction = null)

Parameters

imageHeight int

The height of the input image.

imageWidth int

The width of the input image.

channels int

The number of color channels in the input image.

patchSize int

The size of each square patch.

embeddingDim int

The dimension of the embedding vector for each patch.

activationFunction IActivationFunction<T>

The activation function to apply (defaults to identity if null).

Remarks

For Beginners: This constructor sets up the patch embedding layer with your image specifications.

The parameters define:

  • imageHeight/imageWidth: The size of your input images
  • channels: How many color channels (3 for RGB, 1 for grayscale)
  • patchSize: How big each patch should be (commonly 16x16 or 32x32)
  • embeddingDim: How many numbers represent each patch (commonly 768 or 1024)

For example, a 224x224 RGB image with 16x16 patches and 768 embedding dimensions would create 196 patches (14x14 grid), each represented by 768 numbers.
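
A minimal construction sketch for this example, with float chosen for T and the activation argument left as null so the default identity activation is used:

    using AiDotNet.NeuralNetworks.Layers;

    var patchEmbedding = new PatchEmbeddingLayer<float>(
        imageHeight: 224,
        imageWidth: 224,
        channels: 3,
        patchSize: 16,
        embeddingDim: 768);

    // 224 is divisible by 16, so this succeeds; dimensions that are not
    // divisible by patchSize cause the ArgumentException described below.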

Exceptions

ArgumentException

Thrown when the image dimensions are not evenly divisible by the patch size.

Properties

ParameterCount

Gets the total number of parameters in this layer.

public override int ParameterCount { get; }

Property Value

int

The total number of trainable parameters (projection weights + projection bias).
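
For a standard ViT-style patch projection, the count works out to the flattened patch length times the embedding dimension, plus one bias per embedding dimension. The exact parameter layout is an implementation detail of this layer, so treat the sketch below as an estimate rather than a guarantee:

    // Assumed layout: one dense projection from flattened patch to embedding.
    int patchVectorLength = patchSize * patchSize * channels;  // 16 * 16 * 3 = 768
    int weightCount = patchVectorLength * embeddingDim;        // 768 * 768 = 589,824
    int biasCount = embeddingDim;                              // 768
    int expectedParameterCount = weightCount + biasCount;      // 590,592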

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.

SupportsTraining

Indicates whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Methods

Backward(Tensor<T>)

Performs the backward pass of the patch embedding layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient flowing back from the next layer.

Returns

Tensor<T>

The gradient to be passed to the previous layer.

Remarks

For Beginners: This method calculates how the layer's parameters should change during training.

During the backward pass:

  • The gradient tells us how much each output contributed to the error
  • We use this to figure out how to adjust the projection weights and biases
  • We also calculate gradients to pass back to earlier layers

This allows the entire network to learn through backpropagation.
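
A sketch of how Backward fits into a single training step. The input batch and the loss gradient are placeholders here, since constructing Tensor<T> values is outside the scope of this page:

    // Forward pass: caches the values the layer needs for backpropagation.
    Tensor<float> patchEmbeddings = patchEmbedding.Forward(inputImages);

    // ... compute the loss and its gradient with respect to patchEmbeddings ...

    // Backward pass: returns the gradient with respect to the input images.
    Tensor<float> inputGradient = patchEmbedding.Backward(lossGradient);

    // Apply the accumulated weight and bias gradients.
    patchEmbedding.UpdateParameters(0.001f);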

BackwardGpu(IGpuTensor<T>)

Performs GPU-resident backward pass for patch embedding.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

The gradient from the subsequent layer, with shape [B, N, embedDim].

Returns

IGpuTensor<T>

The gradient with respect to the input, with shape [B, C, H, W].

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.
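
A hedged usage sketch: guard on SupportsJitCompilation before exporting the graph. What happens to the returned ComputationNode<T> afterwards depends on the JIT infrastructure and is not shown here:

    using System.Collections.Generic;

    var inputNodes = new List<ComputationNode<float>>();

    if (patchEmbedding.SupportsJitCompilation)
    {
        // Populates inputNodes and returns the node producing the layer's output.
        ComputationNode<float> outputNode = patchEmbedding.ExportComputationGraph(inputNodes);
        // ... hand the graph to the JIT compiler ...
    }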

Forward(Tensor<T>)

Performs the forward pass of the patch embedding layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor with shape [batch, channels, height, width].

Returns

Tensor<T>

The output tensor with shape [batch, num_patches, embedding_dim].

Remarks

For Beginners: This method converts the image into patch embeddings.

The forward pass:

  1. Divides the image into patches (like cutting a photo into a grid)
  2. Flattens each patch into a vector (like unrolling each grid square into a line)
  3. Projects each flattened patch to the embedding dimension (transforming the numbers)
  4. Returns a sequence of patch embeddings

For example, a 224x224 image with 16x16 patches and an embedding dimension of 768 becomes 196 embeddings of 768 values each, ready to be processed by transformer encoder blocks.
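
A shape-focused usage sketch; how the input Tensor<T> is created is library-specific and elided here:

    // Assume `input` holds a batch of 8 RGB 224x224 images: shape [8, 3, 224, 224].
    Tensor<float> output = patchEmbedding.Forward(input);

    // With 16x16 patches and embeddingDim = 768, `output` has shape [8, 196, 768]:
    // 196 patch embeddings per image, each a vector of 768 values.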

ForwardGpu(params IGpuTensor<T>[])

Performs GPU-accelerated forward pass for patch embedding.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The input GPU tensors. Expects one tensor with shape [batch, channels, height, width].

Returns

IGpuTensor<T>

The GPU-resident output tensor with shape [batch, num_patches, embedding_dim].

Remarks

This implementation keeps all operations GPU-resident without CPU roundtrips:

  1. Reshape to split spatial dimensions into patches
  2. Permute to group patch dimensions
  3. Reshape to flatten patches
  4. Linear projection with fused bias addition
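
A hedged call sketch; allocating and uploading an IGpuTensor<T> is engine-specific and not shown:

    // `gpuInput` is already resident on the device, shape [batch, channels, height, width].
    IGpuTensor<float> gpuOutput = patchEmbedding.ForwardGpu(gpuInput);

    // `gpuOutput` stays on the GPU with shape [batch, num_patches, embedding_dim],
    // so subsequent GPU layers can consume it without a CPU roundtrip.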

Exceptions

ArgumentException

Thrown when no inputs are provided.

InvalidOperationException

Thrown when the engine is not a DirectGpuTensorEngine.

GetParameters()

Gets all parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all weights and biases.

Remarks

For Beginners: This method collects all the layer's learnable values into one list. This is useful for saving the model or for optimization algorithms.
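
A round-trip sketch combining GetParameters with SetParameters (documented below); persisting the Vector<T> to disk is up to the caller:

    // Snapshot the current projection weights and bias.
    Vector<float> snapshot = patchEmbedding.GetParameters();
    // The vector's length matches ParameterCount.

    // ... later, on the same layer or a freshly constructed layer with identical dimensions ...
    patchEmbedding.SetParameters(snapshot);
    // SetParameters throws ArgumentException if the length does not match.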

ResetState()

Resets the internal state of the patch embedding layer.

public override void ResetState()

Remarks

For Beginners: This method clears cached values to prepare for processing new data. It keeps the learned parameters but clears temporary calculation values.

SetParameters(Vector<T>)

Sets all parameters of the layer from a single vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all weights and biases to set.

Remarks

For Beginners: This method loads saved parameter values back into the layer. This is used when loading a previously trained model.

Exceptions

ArgumentException

Thrown when the parameter vector has incorrect length.

UpdateParameters(T)

Updates the layer's parameters using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate controlling the size of parameter updates.

Remarks

For Beginners: This method adjusts the layer's weights and biases to improve performance.

The learning rate controls:

  • How much to adjust each parameter
  • Larger values mean bigger adjustments (faster learning but less stable)
  • Smaller values mean smaller adjustments (slower but more stable learning)
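
A minimal call sketch; because learningRate is typed as T, for T = float you pass a float literal:

    // A commonly used small step size; tune it for your training setup.
    patchEmbedding.UpdateParameters(0.001f);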

UpdateParametersGpu(IGpuOptimizerConfig)

Updates layer parameters using GPU-resident optimizer.

public override void UpdateParametersGpu(IGpuOptimizerConfig config)

Parameters

config IGpuOptimizerConfig

The GPU optimizer configuration.