Class SpatialTransformerLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a spatial transformer layer that enables spatial manipulations of data via a learnable transformation.

public class SpatialTransformerLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LayerBase<T> → SpatialTransformerLayer<T>

Implements
ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider

Remarks

A spatial transformer layer performs learnable geometric transformations on input feature maps. It consists of three main components: a localization network that predicts transformation parameters, a grid generator that creates a sampling grid, and a sampler that applies the transformation using bilinear interpolation. This allows the network to automatically learn invariance to translation, scale, rotation, and more general warping.
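For reference, this is the standard affine formulation used by spatial transformer networks: the localization network predicts six parameters forming a 2x3 matrix, and the grid generator maps each output (target) coordinate (x^t, y^t) back to the source coordinate (x^s, y^s) at which the input is read:

\begin{pmatrix} x^{s} \\ y^{s} \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x^{t} \\ y^{t} \\ 1 \end{pmatrix}

The sampler then reads the input at (x^s, y^s) with bilinear interpolation, which keeps the whole operation differentiable.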

For Beginners: This layer helps a neural network focus on the important parts of an image by learning to transform it.

Think of it like having a smart camera that can:

  • Zoom in on the important objects
  • Rotate images to make them easier to recognize
  • Crop out distractions
  • Fix distortions or perspective problems

Benefits include:

  • Automatic learning of the best transformation for the task
  • Improved recognition of objects regardless of their position or orientation
  • Better handling of distorted or warped inputs

For example, when recognizing handwritten digits, a spatial transformer might learn to straighten tilted digits or zoom in on the digit, making it easier for the rest of the network to classify.

Constructors

SpatialTransformerLayer(int, int, int, int, IActivationFunction<T>?, SpatialTransformerDataFormat)

Initializes a new instance of the SpatialTransformerLayer<T> class with a scalar activation function.

public SpatialTransformerLayer(int inputHeight, int inputWidth, int outputHeight, int outputWidth, IActivationFunction<T>? activationFunction = null, SpatialTransformerDataFormat dataFormat = SpatialTransformerDataFormat.Auto)

Parameters

inputHeight int

The height of the input feature map.

inputWidth int

The width of the input feature map.

outputHeight int

The height of the output feature map.

outputWidth int

The width of the output feature map.

activationFunction IActivationFunction<T>

The activation function to apply in the localization network. Defaults to Tanh if not specified.

dataFormat SpatialTransformerDataFormat

Specifies the channel layout for inputs of rank 3 or higher. Auto infers the layout from the trailing dimensions.

Remarks

This constructor creates a spatial transformer layer with the specified input and output dimensions and a scalar activation function for the localization network. It initializes the weights and biases of the localization network.

For Beginners: This creates a new spatial transformer layer with basic settings.

When creating a spatial transformer, you need to specify:

  • inputHeight and inputWidth: The dimensions of the data going into the layer
  • outputHeight and outputWidth: The dimensions you want after transformation
  • activationFunction: The function applied to neuron outputs in the localization network

The constructor automatically sets up the localization network that will learn how to transform the data. By default, it uses the tanh activation function, which works well for predicting transformation parameters.
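
Example

A minimal construction relying on the default Tanh activation; the float instantiation and 28x28 shapes below are illustrative:

// 28x28 input transformed to a 28x28 output; the localization network
// uses the default Tanh activation because none is supplied.
var stn = new SpatialTransformerLayer<float>(
    inputHeight: 28,
    inputWidth: 28,
    outputHeight: 28,
    outputWidth: 28,
    activationFunction: null,
    dataFormat: SpatialTransformerDataFormat.Auto);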

SpatialTransformerLayer(int, int, int, int, IVectorActivationFunction<T>?, SpatialTransformerDataFormat)

Initializes a new instance of the SpatialTransformerLayer<T> class with a vector activation function.

public SpatialTransformerLayer(int inputHeight, int inputWidth, int outputHeight, int outputWidth, IVectorActivationFunction<T>? vectorActivationFunction = null, SpatialTransformerDataFormat dataFormat = SpatialTransformerDataFormat.Auto)

Parameters

inputHeight int

The height of the input feature map.

inputWidth int

The width of the input feature map.

outputHeight int

The height of the output feature map.

outputWidth int

The width of the output feature map.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to apply in the localization network. Defaults to Tanh if not specified.

dataFormat SpatialTransformerDataFormat

Specifies the channel layout for inputs of rank 3 or higher. Auto infers the layout from the trailing dimensions.

Remarks

This constructor creates a spatial transformer layer with the specified input and output dimensions and a vector activation function for the localization network. It initializes the weights and biases of the localization network.

For Beginners: This creates a new spatial transformer layer with advanced settings.

This is similar to the basic constructor, but with one key difference:

  • It uses a vector activation function, which works on groups of numbers at once
  • This can capture relationships between different elements in the output

This constructor is for advanced users who need more sophisticated activation patterns for their neural networks. For most cases, the basic constructor is sufficient.
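
Example

The same construction with a vector activation. VectorTanhActivation<T> below is a hypothetical implementation of IVectorActivationFunction<T>, used purely for illustration:

// Hypothetical IVectorActivationFunction<T> implementation (illustrative only).
IVectorActivationFunction<float> vectorTanh = new VectorTanhActivation<float>();

var stn = new SpatialTransformerLayer<float>(
    inputHeight: 32,
    inputWidth: 32,
    outputHeight: 32,
    outputWidth: 32,
    vectorActivationFunction: vectorTanh);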

Properties

AuxiliaryLossWeight

Gets or sets the weight for the auxiliary loss contribution.

public T AuxiliaryLossWeight { get; set; }

Property Value

T

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.

SupportsTraining

Gets a value indicating whether this layer supports training through backpropagation.

public override bool SupportsTraining { get; }

Property Value

bool

Always returns true as spatial transformer layers have trainable parameters.

Remarks

This property indicates that the spatial transformer layer can be trained. The layer contains trainable parameters in the localization network that are updated during the training process.

For Beginners: This property tells you that the layer can learn from data.

A value of true means:

  • The layer contains numbers (parameters) that can be adjusted during training
  • It will improve its performance as it sees more examples
  • It participates in the learning process of the neural network

The spatial transformer will gradually learn the best transformations for the task at hand.

UseAuxiliaryLoss

Gets or sets a value indicating whether auxiliary loss is enabled for this layer.

public bool UseAuxiliaryLoss { get; set; }

Property Value

bool
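
Example

A minimal sketch of enabling the regularizer; the 0.01 weight is an arbitrary illustrative value:

var stn = new SpatialTransformerLayer<float>(28, 28, 28, 28);
stn.UseAuxiliaryLoss = true;     // turn on transformation regularization
stn.AuxiliaryLossWeight = 0.01f; // keep it small relative to the main loss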

Methods

Backward(Tensor<T>)

Performs the backward pass of the spatial transformer layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the spatial transformer layer, which is used during training to propagate error gradients back through the network. It computes gradients for the localization network parameters and returns the gradient with respect to the input.

For Beginners: This method is used during training to calculate how the layer's input and parameters should change to reduce errors.

During the backward pass:

  1. The method throws an exception if the forward pass hasn't been called first
  2. It computes gradients for:
    • The input tensor (how the input should change)
    • The localization network parameters (how the transformation should change)
  3. This involves backpropagating through three components:
    • The sampler (how changes in output affect the sampling process)
    • The grid generator (how changes in sampling affect the grid coordinates)
    • The localization network (how changes in grid coordinates affect the transformation parameters)

The backward pass is complex because it must calculate how small changes in the transformation parameters affect the final output. This allows the network to learn the optimal transformation.
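
Example

One manual training step, as a sketch. The shape-array Tensor<T> constructor is an assumption about the tensor API, and the loss gradient is a placeholder for a value produced by the loss function:

// Assumption: Tensor<T> exposes a shape-array constructor (illustrative only).
var stn = new SpatialTransformerLayer<float>(28, 28, 28, 28);
var input = new Tensor<float>(new[] { 16, 1, 28, 28 }); // batch of 16 single-channel images

var output = stn.Forward(input);                 // forward pass must precede Backward

// In a real pipeline this gradient comes from differentiating the loss
// with respect to 'output'; here it is just a same-shaped placeholder.
var lossGradient = new Tensor<float>(new[] { 16, 1, 28, 28 });

var inputGradient = stn.Backward(lossGradient);  // also caches parameter gradients
stn.UpdateParameters(0.001f);                    // apply cached gradients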

ComputeAuxiliaryLoss()

Computes the auxiliary loss for this layer based on transformation regularization.

public T ComputeAuxiliaryLoss()

Returns

T

The computed auxiliary loss value.
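
Example

Folding the auxiliary loss into the total loss. Whether ComputeAuxiliaryLoss returns a weighted or unweighted value is not stated here, so this sketch applies AuxiliaryLossWeight explicitly as an assumption:

var stn = new SpatialTransformerLayer<float>(28, 28, 28, 28)
{
    UseAuxiliaryLoss = true,
    AuxiliaryLossWeight = 0.01f
};

float totalLoss = 0.42f; // placeholder for the main task loss
if (stn.UseAuxiliaryLoss)
{
    // Assumption: the returned value is unweighted, so scale it here.
    totalLoss += stn.AuxiliaryLossWeight * stn.ComputeAuxiliaryLoss();
}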

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.
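
Example

A sketch of the intended call pattern, guarding on SupportsJitCompilation (List<T> comes from System.Collections.Generic):

var stn = new SpatialTransformerLayer<float>(28, 28, 28, 28);

if (stn.SupportsJitCompilation)
{
    var inputNodes = new List<ComputationNode<float>>();
    ComputationNode<float> outputNode = stn.ExportComputationGraph(inputNodes);
    // inputNodes is now populated with the graph's inputs; outputNode is the
    // root node that the library's JIT compiler would lower to native code.
}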

Forward(Tensor<T>)

Performs the forward pass of the spatial transformer layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor after spatial transformation.

Remarks

This method implements the forward pass of the spatial transformer layer. It first uses the localization network to predict the transformation parameters, then generates a sampling grid in the output space, and finally samples the input according to the grid to produce the transformed output.

For Beginners: This method processes data through the spatial transformer.

The forward pass happens in three steps:

  1. Localization Network: Analyzes the input and decides what transformation to apply

    • Predicts 6 parameters that define how to transform the input (rotation, scaling, etc.)
  2. Grid Generator: Creates a grid of sampling points in the output space

    • Determines where each output pixel should come from in the input
  3. Sampler: Applies the transformation by sampling the input at the calculated positions

    • Uses bilinear interpolation to smoothly sample between pixels
    • Produces the final transformed output

The method also saves the input, output, and transformation matrix for later use during training.
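
Example

A minimal forward call where the layer also resizes its output; the shape-array Tensor<T> constructor is an assumption about the tensor API:

// Map a 28x28 input onto a 20x20 output grid (e.g. zooming in on a region).
var stn = new SpatialTransformerLayer<float>(
    inputHeight: 28, inputWidth: 28,
    outputHeight: 20, outputWidth: 20);

var input = new Tensor<float>(new[] { 1, 1, 28, 28 }); // batch=1, channels=1
var transformed = stn.Forward(input);
// 'transformed' holds the input resampled at the 20x20 grid positions
// produced by the predicted affine transformation.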

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass on GPU tensors by applying spatial transformation.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

GPU tensor inputs (uses first input).

Returns

IGpuTensor<T>

GPU tensor output after spatial transformation.

Remarks

The GPU forward pass:

  1. Downloads the input for localization network processing (uses GPU-accelerated engine operations)
  2. Computes transformation parameters via the localization network
  3. Generates the sampling grid using AffineGrid (GPU)
  4. Samples from the input using GridSample (GPU)
  5. Uploads the result back to the GPU

GetAuxiliaryLossDiagnostics()

Gets diagnostic information about the auxiliary loss computation.

public Dictionary<string, string> GetAuxiliaryLossDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic information about the auxiliary loss.

GetDiagnostics()

Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.

public override Dictionary<string, string> GetDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This method retrieves all trainable parameters of the layer (localization network weights and biases) and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the layer into a single list.

The parameters:

  • Are the weights and biases of the localization network
  • Are converted from matrices and vectors to a single long list (vector)
  • Can be used to save the state of the layer or apply optimization techniques

This method is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters

ResetState()

Resets the internal state of the spatial transformer layer.

public override void ResetState()

Remarks

This method resets the internal state of the spatial transformer layer, including the cached inputs, outputs, transformation matrix, and gradients. This is useful when starting to process a new batch or when implementing stateful networks that need to be reset between sequences.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • Stored inputs and outputs from previous passes are cleared
  • The transformation matrix is cleared
  • All gradients are cleared

This is important for:

  • Processing a new batch of unrelated data
  • Preventing information from one batch affecting another
  • Starting a new training episode

Think of it like clearing your workspace before starting a new project: it ensures that old information doesn't interfere with new processing.
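
Example

A sketch of resetting between unrelated batches ('stn' is a constructed layer and 'batches' stands in for tensors supplied by a data loader):

foreach (var batch in batches)
{
    var output = stn.Forward(batch);
    // ... compute loss, call Backward and UpdateParameters ...
    stn.ResetState(); // clear cached state before the next, unrelated batch
}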

SetParameters(Vector<T>)

Sets the trainable parameters of the layer from a single vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets the trainable parameters of the layer (localization network weights and biases) from a single vector. It expects the vector to contain the parameters in the same order as they are retrieved by GetParameters(). This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the learnable values in the layer from a single list.

When setting parameters:

  • The input must be a vector with exactly the right number of values
  • The values are distributed back into the weights and biases matrices and vectors
  • The order must match how they were stored in GetParameters()

This method is useful for:

  • Loading a previously saved model
  • Transferring parameters from another model
  • Testing different parameter values

An ArgumentException is thrown if the input vector doesn't have the expected number of parameters.

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.
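
Example

A save/restore round trip using GetParameters and SetParameters (serialization itself is elided):

var stn = new SpatialTransformerLayer<float>(28, 28, 28, 28);

// Snapshot every localization-network weight and bias as one vector.
Vector<float> snapshot = stn.GetParameters();

// ... train further, or persist 'snapshot' to disk ...

// Restore the earlier state. The vector length must match exactly,
// otherwise an ArgumentException is thrown.
stn.SetParameters(snapshot);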

UpdateParameters(T)

Updates the parameters of the layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the weights and biases of the localization network based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates.

For Beginners: This method updates the layer's learnable values during training.

When updating parameters:

  • Each weight and bias is adjusted in the direction that reduces the error
  • The learning rate controls how big each update step is
  • Smaller learning rates lead to more stable but slower learning
  • Larger learning rates can learn faster but might become unstable

The update process:

  • Subtract the gradient (multiplied by the learning rate) from each parameter
  • This moves the parameters in the direction that reduces the error
  • Over many updates, the layer learns the optimal transformation for the task
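
In symbols, each parameter follows the standard gradient-descent update

\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}

where \eta is the learningRate argument and L is the loss.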