Class SwinPatchMergingLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Patch merging layer for Swin Transformer that performs downsampling between stages.

public class SwinPatchMergingLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations.

Inheritance
LayerBase<T> ← SwinPatchMergingLayer<T>

Implements
ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

This layer merges each 2x2 group of neighboring patches into a single patch, halving both spatial dimensions (so the number of patches drops by a factor of 4) while doubling the channel dimension. This creates the hierarchical structure characteristic of Swin Transformer.

For Beginners: Think of this like pooling in CNNs, but instead of taking the max or average, we concatenate the 4 neighboring patches in each 2x2 grid and then use a linear layer to reduce the combined channels. This lets the network process information at multiple scales.

Reference: Liu et al., "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows", ICCV 2021
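
A minimal usage sketch of the shape arithmetic, using Swin-T stage-1 sizes. The Tensor<T> constructor shown here is an assumption; consult the Tensor<T> documentation for the exact API.

var merge = new SwinPatchMergingLayer<double>(inputDim: 96);

// A 56x56 grid of patches with 96 channels: seqLen = 56 * 56 = 3136.
// Hypothetical tensor creation; the real Tensor<T> API may differ.
var input = new Tensor<double>(new[] { 1, 56 * 56, 96 });

var output = merge.Forward(input);
// output shape: [1, 28 * 28, 192], i.e. seqLen / 4 patches with dim * 2 channels.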

Constructors

SwinPatchMergingLayer(int)

Creates a new Swin patch merging layer.

public SwinPatchMergingLayer(int inputDim)

Parameters

inputDim int

Input channel dimension.

Exceptions

ArgumentException

Thrown if inputDim is not positive.
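
A short sketch of the validation described above:

try
{
    var invalid = new SwinPatchMergingLayer<float>(0); // inputDim must be positive
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message);
}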

Properties

ParameterCount

Gets the total number of parameters in this layer.

public override int ParameterCount { get; }

Property Value

int

The total number of trainable parameters.

Remarks

This property returns the total number of trainable parameters in the layer. By default, it returns the length of the Parameters vector, but derived classes can override this to calculate the number of parameters differently.

For Beginners: This tells you how many learnable values the layer has.

The parameter count:

  • Shows how complex the layer is
  • Indicates how many values need to be learned during training
  • Can help estimate memory usage and computational requirements

Layers with more parameters can potentially learn more complex patterns but may also require more data to train effectively.
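
For this layer, the Swin paper's patch merging applies a linear projection from 4*dim to 2*dim channels, so a bias-free projection would contribute 4 * dim * 2 * dim = 8 * dim^2 weights. That figure is a paper-level estimate, not a guarantee about this implementation; the property reports the actual count:

var merge = new SwinPatchMergingLayer<double>(inputDim: 96);
Console.WriteLine(merge.ParameterCount);
// A bias-free 4*96 -> 2*96 projection alone would give 73,728 parameters;
// the reported value may differ if this implementation adds a bias or
// normalization parameters.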

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.
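
A caller can branch on this flag to pick the execution path; a minimal sketch:

// input: a Tensor<double> prepared as in the earlier example.
var merge = new SwinPatchMergingLayer<double>(96);
if (merge.SupportsJitCompilation)
{
    // Compile via ExportComputationGraph (see the sketch under that method).
}
else
{
    var output = merge.Forward(input); // standard eager path
}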

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

true if the layer has trainable parameters and supports backpropagation; otherwise, false.

Remarks

This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer has parameters that can be adjusted during training
  • It will improve its performance as it sees more data
  • It participates in the learning process

A value of false means:

  • The layer doesn't have any adjustable parameters
  • It performs the same operation regardless of training
  • It doesn't need to learn (but may still be useful)
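
For example, a sketch that counts trainable parameters across a stack of layers held as their common base type:

var layers = new List<LayerBase<double>>
{
    new SwinPatchMergingLayer<double>(96),
    // ... other layers ...
};

int trainable = 0;
foreach (var layer in layers)
{
    if (layer.SupportsTraining)
    {
        trainable += layer.ParameterCount;
    }
}
Console.WriteLine($"Trainable parameters: {trainable}");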

Methods

Backward(Tensor<T>)

Performs the backward pass.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

Gradient from the next layer of shape [batch, newSeqLen, 2*dim], where newSeqLen = seqLen/4.

Returns

Tensor<T>

Gradient for the input of shape [batch, seqLen, dim].
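
A sketch of the shape round trip. Backward relies on state cached by the preceding Forward call, and the Tensor<T> construction is an assumed API:

var merge = new SwinPatchMergingLayer<double>(96);
var input = new Tensor<double>(new[] { 1, 56 * 56, 96 });

var output = merge.Forward(input);                        // [1, 784, 192]
var outputGradient = new Tensor<double>(new[] { 1, 784, 192 });
var inputGradient = merge.Backward(outputGradient);       // [1, 3136, 96]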

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.
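
A sketch of exporting the graph, guarded by the flag described earlier; the compilation step itself is library-specific and elided:

var merge = new SwinPatchMergingLayer<double>(96);
if (merge.SupportsJitCompilation)
{
    var inputNodes = new List<ComputationNode<double>>();
    ComputationNode<double> outputNode = merge.ExportComputationGraph(inputNodes);
    // inputNodes now holds the graph's inputs; hand outputNode to the
    // JIT compiler (not shown on this page).
}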

Forward(Tensor<T>)

Performs the forward pass, merging 2x2 patches.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

Input tensor of shape [batch, seqLen, dim] where seqLen = H*W.

Returns

Tensor<T>

Output tensor of shape [batch, seqLen/4, dim*2].

Exceptions

InvalidOperationException

Thrown if spatial dimensions are not even.
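
The even-dimension requirement exists because 2x2 merging needs H and W to split cleanly in half. A sketch of the failure mode, assuming the layer infers a square spatial grid from seqLen:

var merge = new SwinPatchMergingLayer<double>(96);

// 55 * 55 = 3025 patches: a 55x55 grid cannot be tiled by 2x2 blocks.
var odd = new Tensor<double>(new[] { 1, 55 * 55, 96 });
try
{
    merge.Forward(odd);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}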

GetParameterGradients()

Gets the gradients of all trainable parameters in this layer.

public override Vector<T> GetParameterGradients()

Returns

Vector<T>

A vector containing the gradients of all trainable parameters.

Remarks

This method returns the gradients of all trainable parameters in the layer. If the gradients haven't been calculated yet, it initializes a new vector of the appropriate size.

For Beginners: This method provides the current adjustment values for all parameters.

The parameter gradients:

  • Show how each parameter should be adjusted during training
  • Are calculated during the backward pass
  • Guide the optimization process

These gradients are usually passed to an optimizer like SGD or Adam, which uses them to update the parameters in a way that reduces errors.
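
A sketch of inspecting the gradients after a backward pass, assuming Vector<T> exposes a Length property:

// ... after merge.Forward(input) and merge.Backward(outputGradient) ...
var gradients = merge.GetParameterGradients();
Console.WriteLine(gradients.Length == merge.ParameterCount); // True
// Hand the gradients to an external optimizer, or use the built-in update:
merge.UpdateParameters(0.001);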

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This abstract method must be implemented by derived classes to provide access to all trainable parameters of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the layer.

The parameters:

  • Are the numbers that the neural network learns during training
  • Include weights, biases, and other learnable values
  • Are combined into a single long list (vector)

This is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters
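
For instance, copying weights between two identically configured layers needs only the members on this page:

var source = new SwinPatchMergingLayer<double>(96);
var target = new SwinPatchMergingLayer<double>(96);

var parameters = source.GetParameters();
target.SetParameters(parameters); // target now mirrors source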

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

This abstract method must be implemented by derived classes to reset any internal state the layer maintains between forward and backward passes. This is useful when starting to process a new sequence or when implementing stateful recurrent networks.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • Cached inputs and outputs are cleared
  • Any temporary calculations are discarded
  • The layer is ready to process new data without being influenced by previous data

This is important for:

  • Processing a new, unrelated sequence
  • Preventing information from one sequence affecting another
  • Starting a new training episode
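
A sketch of resetting between unrelated batches (merge, batch1, gradient1, and batch2 are set up as in the earlier examples):

var output1 = merge.Forward(batch1);
merge.Backward(gradient1);

// Clear cached activations before processing an unrelated batch.
merge.ResetState();
var output2 = merge.Forward(batch2);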

SetParameters(Vector<T>)

Sets the trainable parameters of the layer.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets all the trainable parameters of the layer from a single vector of parameters. The parameters vector must have the correct length to match the total number of parameters in the layer. By default, it simply assigns the parameters vector to the Parameters field, but derived classes may override this to handle the parameters differently.

For Beginners: This method updates all the learnable values in the layer.

When setting parameters:

  • The input must be a vector with the correct length
  • The layer parses this vector to set all its internal parameters
  • An ArgumentException is thrown if the length doesn't match the expected number of parameters

This is useful for:

  • Loading a previously saved model
  • Transferring parameters from another model
  • Setting specific parameter values for testing

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.
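
A sketch of the length check, assuming a Vector<T> constructor that takes an element count:

var merge = new SwinPatchMergingLayer<double>(96);
try
{
    merge.SetParameters(new Vector<double>(10)); // wrong length for this layer
}
catch (ArgumentException ex)
{
    Console.WriteLine(ex.Message);
}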

UpdateParameters(T)

Updates the parameters of the layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This abstract method must be implemented by derived classes to define how the layer's parameters are updated during training. The learning rate controls the size of the parameter updates.

For Beginners: This method updates the layer's internal values during training.

When updating parameters:

  • The weights, biases, or other parameters are adjusted to reduce prediction errors
  • The learning rate controls how big each update step is
  • Smaller learning rates mean slower but more stable learning
  • Larger learning rates mean faster but potentially unstable learning

This is how the layer "learns" from data over time, gradually improving its ability to extract useful patterns from inputs.
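
Putting the pieces together, one hand-rolled training step might look like this sketch; in practice the output gradient comes from a loss function rather than being constructed by hand:

var merge = new SwinPatchMergingLayer<double>(96);
var input = new Tensor<double>(new[] { 1, 56 * 56, 96 });

var output = merge.Forward(input);
var outputGradient = new Tensor<double>(new[] { 1, 784, 192 }); // stand-in for dLoss/dOutput
merge.Backward(outputGradient);
merge.UpdateParameters(0.01); // learningRate scales the size of the update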