
Class SeparableConvolutionalLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a separable convolutional layer that decomposes standard convolution into depthwise and pointwise operations.

public class SeparableConvolutionalLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LayerBase<T> → SeparableConvolutionalLayer<T>
Implements
ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

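A separable convolutional layer splits the standard convolution operation into two simpler operations: a depthwise convolution followed by a pointwise (1×1) convolution. For a k×k kernel with C_in input channels and C_out output channels, a standard convolution uses k²·C_in·C_out weights. The separable version, assuming one depthwise filter per input channel, uses k²·C_in depthwise weights plus C_in·C_out pointwise weights. This factorization substantially reduces both the parameter count and the number of multiply-adds, usually with accuracy comparable to a standard convolution.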
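To make the savings concrete, the helper below compares weight counts (biases excluded) for the two approaches. It is a standalone sketch, not part of the AiDotNet API: the class name SeparableCost is hypothetical, and it assumes one depthwise filter per input channel. When the pointwise step runs at stride 1 over the depthwise output, the multiply-adds per output position follow the same two formulas.

```csharp
using System;

// Hypothetical helper: weight counts (biases excluded) for a k×k convolution.
public static class SeparableCost
{
    // Standard convolution: one k×k×inChannels filter per output channel.
    public static long StandardWeights(int k, int inChannels, int outChannels) =>
        (long)k * k * inChannels * outChannels;

    // Separable convolution (assumes one depthwise filter per input channel):
    // k×k depthwise weights per input channel, then a 1×1 pointwise kernel
    // that maps inChannels to outChannels.
    public static long SeparableWeights(int k, int inChannels, int outChannels) =>
        (long)k * k * inChannels + (long)inChannels * outChannels;
}
```

For k = 3, C_in = 32 and C_out = 64 this gives 18,432 standard weights versus 2,336 separable weights, roughly 7.9 times fewer.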

For Beginners: This layer processes images or other grid-like data more efficiently than standard convolution.

Think of it like a two-step process:

  • First step (depthwise): Applies filters to each input channel separately to extract features
  • Second step (pointwise): Combines these features across all channels to create new feature maps

Benefits include:

  • Fewer calculations needed (faster processing)
  • Fewer parameters to learn (uses less memory)
  • Often comparable accuracy to standard convolution

For example, in image processing, the depthwise convolution might detect edges in each color channel separately, while the pointwise convolution combines these per-channel responses into new features. Stacking several such layers lets the network build up more complex features like shapes or textures.

Constructors

SeparableConvolutionalLayer(int[], int, int, int, int, IActivationFunction<T>?)

Initializes a new instance of the SeparableConvolutionalLayer<T> class with a scalar activation function.

public SeparableConvolutionalLayer(int[] inputShape, int outputDepth, int kernelSize, int stride = 1, int padding = 0, IActivationFunction<T>? scalarActivation = null)

Parameters

inputShape int[]

The shape of the input tensor [batch, height, width, channels].

outputDepth int

The number of output channels (feature maps).

kernelSize int

The size of the convolution kernel (assumed to be square).

stride int

The stride of the convolution. Defaults to 1.

padding int

The padding applied to the input. Defaults to 0 (no padding).

scalarActivation IActivationFunction<T>

The activation function to apply after convolution. Defaults to identity if not specified.

Remarks

This constructor creates a separable convolutional layer with the specified parameters and a scalar activation function that operates on individual elements. The inputShape array should describe a 4D tensor with dimensions [batch, height, width, channels].

For Beginners: This creates a new separable convolutional layer with basic settings.

The parameters control how the layer processes data:

  • inputShape: The size and structure of the incoming data (like image dimensions)
  • outputDepth: How many different features the layer will look for
  • kernelSize: The size of the "window" that slides over the input (e.g., 3×3 or 5×5)
  • stride: How many pixels to move the window each step (smaller = more overlap)
  • padding: How many pixels of padding to add around each edge of the input (0 means none)
  • scalarActivation: A function that adds non-linearity (helping the network learn complex patterns)

For example, with images, larger kernels can detect bigger patterns, while more output channels can detect more varieties of patterns.
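Kernel size, stride and padding together determine the spatial size of the output. The sketch below uses the output-size formula that most convolution implementations follow; it assumes the layer pads every edge by the same amount and rounds down, which this page does not state explicitly. The class name ConvOutputSize is hypothetical.

```csharp
using System;

// Hypothetical helper: spatial output size of a convolution along one axis.
public static class ConvOutputSize
{
    // Assumes `padding` pixels are added on both sides and the result is floored.
    public static int Compute(int inputSize, int kernelSize, int stride, int padding)
    {
        if (stride <= 0) throw new ArgumentOutOfRangeException(nameof(stride));
        int span = inputSize + 2 * padding - kernelSize;
        if (span < 0) throw new ArgumentException("Kernel is larger than the padded input.");
        return span / stride + 1; // integer division floors for non-negative values
    }
}
```

With a 28-pixel input and a 3×3 kernel, stride 1 and padding 0 give 26 output pixels, while padding 1 keeps the size at 28. A 32-pixel input with stride 2 and padding 1 gives 16.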

SeparableConvolutionalLayer(int[], int, int, int, int, IVectorActivationFunction<T>?)

Initializes a new instance of the SeparableConvolutionalLayer<T> class with a vector activation function.

public SeparableConvolutionalLayer(int[] inputShape, int outputDepth, int kernelSize, int stride = 1, int padding = 0, IVectorActivationFunction<T>? vectorActivation = null)

Parameters

inputShape int[]

The shape of the input tensor [batch, height, width, channels].

outputDepth int

The number of output channels (feature maps).

kernelSize int

The size of the convolution kernel (assumed to be square).

stride int

The stride of the convolution. Defaults to 1.

padding int

The padding applied to the input. Defaults to 0 (no padding).

vectorActivation IVectorActivationFunction<T>

The vector activation function to apply after convolution. Defaults to identity if not specified.

Remarks

This constructor creates a separable convolutional layer with the specified parameters and a vector activation function that operates on entire vectors rather than individual elements. The inputShape array should describe a 4D tensor with dimensions [batch, height, width, channels].

For Beginners: This creates a new separable convolutional layer that uses a vector activation function.

Similar to the basic constructor, but with one key difference:

  • It uses a vector activation function instead of a scalar one

A vector activation function:

  • Works on entire groups of numbers at once, not just one at a time
  • Can capture relationships between different elements in the output
  • Is needed for functions such as softmax, where each output depends on every element of the vector

This constructor is for advanced users who need more sophisticated activation patterns for their neural networks.
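The difference between the two kinds of activation can be illustrated with plain arrays. This is a generic sketch, not an implementation of IActivationFunction<T> or IVectorActivationFunction<T>: ReLU is a typical scalar function, and softmax is a typical vector function whose outputs all change when any single input changes. The class name ActivationKinds is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helpers contrasting scalar and vector activations.
public static class ActivationKinds
{
    // Scalar activation: each output depends only on its own input element.
    public static double[] ReluElementwise(double[] v) =>
        v.Select(x => Math.Max(0.0, x)).ToArray();

    // Vector activation: each output depends on the whole vector.
    public static double[] Softmax(double[] v)
    {
        double max = v.Max(); // subtract the max for numerical stability
        double[] exps = v.Select(x => Math.Exp(x - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

Softmax of [0, 0] is [0.5, 0.5]; raising only the first element lowers the second output, which a scalar function like ReLU can never do.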

Properties

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU-accelerated execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

true when kernels and biases are initialized and the engine is a DirectGpuTensorEngine.

Remarks

GPU execution for separable convolution uses DepthwiseConv2DGpu for the depthwise step and FusedConv2DGpu for the pointwise step with fused bias and activation.

SupportsGpuTraining

Gets a value indicating whether this layer supports GPU-resident training.

public override bool SupportsGpuTraining { get; }

Property Value

bool

SupportsJitCompilation

Gets a value indicating whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

true when the kernels are initialized and the activation function supports JIT compilation.

Remarks

Separable convolutional layers support JIT compilation using DepthwiseConv2D and Conv2D operations from TensorOperations. The layer performs depthwise convolution followed by pointwise (1x1) convolution.

SupportsTraining

Gets a value indicating whether this layer supports training through backpropagation.

public override bool SupportsTraining { get; }

Property Value

bool

Always returns true as separable convolutional layers have trainable parameters.

Remarks

This property indicates that the separable convolutional layer can be trained using backpropagation. The layer contains trainable parameters (kernels and biases) that are updated during the training process.

For Beginners: This property tells you that the layer can learn from data.

A value of true means:

  • The layer contains numbers (parameters) that can be adjusted during training
  • Its parameters are adjusted as it sees more training examples
  • It participates in the learning process of the neural network

Think of it like a student who can improve by studying: through a process called backpropagation, the network works out how much each internal value contributed to its errors, and those values are then adjusted to reduce them.

Methods

Backward(Tensor<T>)

Performs the backward pass of the separable convolutional layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the separable convolutional layer, which is used during training to propagate error gradients back through the network. It computes gradients for both depthwise and pointwise kernels, as well as biases, and returns the gradient with respect to the input for further backpropagation.

For Beginners: This method is used during training to calculate how the layer's inputs and parameters should change to reduce errors.

The backward pass:

  1. Starts with gradients (error signals) from the next layer
  2. Computes how to adjust the layer's parameters (kernels and biases)
  3. Calculates how to adjust the input that was received

This happens in reverse order compared to the forward pass:

  • First backpropagates through the activation function
  • Then backpropagates through the pointwise convolution
  • Finally backpropagates through the depthwise convolution

The calculated gradients are stored for later use when updating the parameters, and the input gradient is returned to continue the backpropagation process.
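The gradients for the pointwise step have a simple closed form, because a 1×1 convolution is a matrix multiply applied at every spatial position: Y = X·W + b. The sketch below computes dX = G·Wᵀ, dW = Xᵀ·G and db = Σ G over positions. It is a standalone illustration with plain arrays (positions flattened into rows), not the layer's Backward code, and it assumes g is the gradient with respect to the pointwise output before activation, i.e. the activation derivative has already been applied. The class name PointwiseBackward is hypothetical.

```csharp
using System;

// Hypothetical sketch: backward pass of a 1×1 (pointwise) convolution.
public static class PointwiseBackward
{
    // x: [positions, inChannels]  input to the pointwise step (the depthwise output)
    // w: [inChannels, outChannels] 1×1 kernel
    // g: [positions, outChannels]  dLoss/d(pre-activation output)
    public static (double[,] dX, double[,] dW, double[] dB) Compute(double[,] x, double[,] w, double[,] g)
    {
        int positions = x.GetLength(0), cin = x.GetLength(1), cout = w.GetLength(1);
        var dX = new double[positions, cin];
        var dW = new double[cin, cout];
        var dB = new double[cout];
        for (int p = 0; p < positions; p++)
        {
            for (int o = 0; o < cout; o++)
            {
                double go = g[p, o];
                dB[o] += go;                  // bias gradient: sum over positions
                for (int c = 0; c < cin; c++)
                {
                    dW[c, o] += x[p, c] * go; // weight gradient: Xᵀ·G
                    dX[p, c] += w[c, o] * go; // input gradient: G·Wᵀ
                }
            }
        }
        return (dX, dW, dB);
    }
}
```

The returned dX is what would then flow into the depthwise step's backward pass.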

BackwardGpu(IGpuTensor<T>)

Performs the backward pass on GPU tensors.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

GPU tensor containing the gradient of the loss with respect to the output.

Returns

IGpuTensor<T>

GPU tensor containing the gradient of the loss with respect to the input.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the separable convolutional layer's forward pass as a JIT-compilable computation graph.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the separable convolution output.

Remarks

The separable convolution computation graph implements:

  1. Depthwise convolution: Applies separate filters to each input channel
  2. Pointwise convolution: 1x1 convolution to combine channels
  3. Activation function

For Beginners: This method describes the layer's forward pass as a graph of operations so that the JIT compiler can generate optimized code for it. The graph follows the same steps as Forward: depthwise convolution, pointwise convolution, then activation.

Forward(Tensor<T>)

Performs the forward pass of the separable convolutional layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor after separable convolution and activation.

Remarks

This method implements the forward pass of the separable convolutional layer. It performs a depthwise convolution followed by a pointwise convolution. The depthwise convolution applies a separate filter to each input channel, and the pointwise convolution applies a 1x1 convolution to combine the channels. The result is passed through an activation function.

For Beginners: This method processes the input data through the layer.

The forward pass happens in three steps:

  1. Depthwise convolution: Applies separate filters to each input channel

    • Like having a specialized detector for each input feature
    • Captures spatial patterns within each channel independently
  2. Pointwise convolution: Combines results across all channels

    • Uses 1×1 filters to mix information between channels
    • Creates new feature maps that combine information from all inputs
    • Adds bias values to each output channel
  3. Activation: Applies a non-linear function to the results

    • Helps the network learn more complex patterns

The method also saves the input and output for later use during training.
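The three forward steps can be sketched with plain arrays. This is a standalone illustration, not the layer's implementation: it processes a single image in [height, width, channels] order (NHWC without the batch axis), assumes one depthwise filter per input channel, zero padding, stride and padding applied in the depthwise step only, and a pointwise step at stride 1. The names SeparableForward and Run are hypothetical.

```csharp
using System;

// Hypothetical sketch of a separable convolution forward pass.
public static class SeparableForward
{
    // input: [H, W, C]; depthwise: [K, K, C]; pointwise: [C, Cout]; bias: [Cout]
    public static double[,,] Run(double[,,] input, double[,,] depthwise, double[,] pointwise,
                                 double[] bias, int stride, int padding, Func<double, double> activation)
    {
        int h = input.GetLength(0), w = input.GetLength(1), c = input.GetLength(2);
        int k = depthwise.GetLength(0), cout = pointwise.GetLength(1);
        int outH = (h + 2 * padding - k) / stride + 1;
        int outW = (w + 2 * padding - k) / stride + 1;

        // Step 1: depthwise. Each channel is convolved only with its own K×K filter.
        var dw = new double[outH, outW, c];
        for (int oy = 0; oy < outH; oy++)
        for (int ox = 0; ox < outW; ox++)
        for (int ch = 0; ch < c; ch++)
        {
            double sum = 0;
            for (int ky = 0; ky < k; ky++)
            for (int kx = 0; kx < k; kx++)
            {
                int iy = oy * stride + ky - padding, ix = ox * stride + kx - padding;
                if (iy >= 0 && iy < h && ix >= 0 && ix < w) // outside bounds counts as zero
                    sum += input[iy, ix, ch] * depthwise[ky, kx, ch];
            }
            dw[oy, ox, ch] = sum;
        }

        // Steps 2 and 3: pointwise 1×1 mixes channels at each position,
        // adds the per-output-channel bias, then applies the activation.
        var output = new double[outH, outW, cout];
        for (int oy = 0; oy < outH; oy++)
        for (int ox = 0; ox < outW; ox++)
        for (int o = 0; o < cout; o++)
        {
            double sum = bias[o];
            for (int ch = 0; ch < c; ch++) sum += dw[oy, ox, ch] * pointwise[ch, o];
            output[oy, ox, o] = activation(sum);
        }
        return output;
    }
}
```

Keeping the depthwise result in its own buffer mirrors the two-step structure described above; a production implementation would typically fuse or vectorize these loops.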

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass on GPU, keeping all tensors GPU-resident.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The input GPU tensors in NCHW format [batch, channels, height, width].

Returns

IGpuTensor<T>

The output GPU tensor in NCHW format.

Remarks

This method executes separable convolution entirely on GPU:

  1. Depthwise convolution: Each input channel is convolved with its own filter
  2. Pointwise convolution: A 1x1 convolution combines channels with fused bias and activation

Performance Notes:

  • Input tensors remain GPU-resident throughout computation
  • Intermediate depthwise output is disposed after use
  • Kernels are converted to NCHW format for GPU operations
  • Activation is fused into the pointwise convolution when possible
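The constructor parameters describe the CPU input as [batch, height, width, channels] (NHWC), while ForwardGpu works in NCHW and the notes above mention converting kernels to NCHW. The sketch below shows the index arithmetic for reordering a flat NHWC buffer into NCHW order. It is a standalone illustration with plain float arrays, not the layer's internal conversion routine; the class name LayoutConversion is hypothetical.

```csharp
using System;

// Hypothetical helper: reorder a flat NHWC buffer into NCHW order.
public static class LayoutConversion
{
    public static float[] NhwcToNchw(float[] src, int n, int h, int w, int c)
    {
        if (src.Length != n * h * w * c)
            throw new ArgumentException("Buffer size does not match shape.", nameof(src));
        var dst = new float[src.Length];
        for (int b = 0; b < n; b++)
        for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        for (int ch = 0; ch < c; ch++)
        {
            int nhwc = ((b * h + y) * w + x) * c + ch; // channels vary fastest
            int nchw = ((b * c + ch) * h + y) * w + x; // width varies fastest
            dst[nchw] = src[nhwc];
        }
        return dst;
    }
}
```

For one image of 1×2 pixels with 3 channels, the NHWC buffer [0, 1, 2, 3, 4, 5] becomes [0, 3, 1, 4, 2, 5] in NCHW: all values of channel 0 first, then channel 1, then channel 2.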

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This method retrieves all trainable parameters of the layer (depthwise kernels, pointwise kernels, and biases) and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the layer into a single list.

The parameters:

  • Are the numbers that the neural network learns during training
  • Include depthwise kernels, pointwise kernels, and biases
  • Are combined into a single long list (vector)

This is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters
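The ordering described above (depthwise kernels, then pointwise kernels, then biases) can be sketched with plain arrays. This is not the layer's GetParameters code and does not use Vector<T>; the class name ParameterPacking is hypothetical, and ExpectedLength assumes one depthwise filter per input channel and one bias per output channel.

```csharp
using System;

// Hypothetical sketch: flatten parameter groups into one array.
public static class ParameterPacking
{
    // Concatenates in the documented order: depthwise, pointwise, biases.
    public static double[] Pack(double[] depthwise, double[] pointwise, double[] biases)
    {
        var all = new double[depthwise.Length + pointwise.Length + biases.Length];
        depthwise.CopyTo(all, 0);
        pointwise.CopyTo(all, depthwise.Length);
        biases.CopyTo(all, depthwise.Length + pointwise.Length);
        return all;
    }

    // Total length under the stated assumptions: k·k·C_in + C_in·C_out + C_out.
    public static int ExpectedLength(int k, int inChannels, int outChannels) =>
        k * k * inChannels + inChannels * outChannels + outChannels;
}
```

Under these assumptions, a 3×3 layer with 32 input and 64 output channels has 288 + 2,048 + 64 = 2,400 parameters.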

ResetState()

Resets the internal state of the separable convolutional layer.

public override void ResetState()

Remarks

This method resets the internal state of the separable convolutional layer, including the cached inputs and outputs, gradients, and velocity tensors. This is useful when starting to process a new batch or when implementing stateful networks that need to be reset between sequences.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • Stored inputs and outputs from previous passes are cleared
  • Calculated gradients are cleared
  • Momentum (velocity) information is cleared

This is important for:

  • Processing a new batch of unrelated data
  • Preventing information from one batch affecting another
  • Starting a new training episode

Think of it like erasing the whiteboard before starting a new calculation: it ensures that old information doesn't interfere with new processing.

SetParameters(Vector<T>)

Sets the trainable parameters of the layer from a single vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets the trainable parameters of the layer (depthwise kernels, pointwise kernels, and biases) from a single vector. It expects the vector to contain the parameters in the same order as they are retrieved by GetParameters(). This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the learnable values in the layer from a single list.

When setting parameters:

  • The input must be a vector with exactly the right number of values
  • The values are distributed to the appropriate places (depthwise kernels, pointwise kernels, and biases)
  • The order must match how they were stored in GetParameters()

This is useful for:

  • Loading a previously saved model
  • Transferring parameters from another model
  • Testing different parameter values

An error is thrown if the input vector doesn't have the expected number of parameters.

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.
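The reverse operation splits one flat array back into its groups in the same order and rejects an array of the wrong length, matching the documented ArgumentException. This is a plain-array sketch, not the layer's SetParameters code; the class name ParameterUnpacking and the explicit group counts are hypothetical.

```csharp
using System;

// Hypothetical sketch: split a flat parameter array into its groups.
public static class ParameterUnpacking
{
    public static (double[] Depthwise, double[] Pointwise, double[] Biases) Unpack(
        double[] parameters, int depthwiseCount, int pointwiseCount, int biasCount)
    {
        int expected = depthwiseCount + pointwiseCount + biasCount;
        if (parameters.Length != expected)
            throw new ArgumentException(
                $"Expected {expected} parameters but got {parameters.Length}.", nameof(parameters));

        // Slice in the same order used when packing: depthwise, pointwise, biases.
        var depthwise = parameters.AsSpan(0, depthwiseCount).ToArray();
        var pointwise = parameters.AsSpan(depthwiseCount, pointwiseCount).ToArray();
        var biases = parameters.AsSpan(depthwiseCount + pointwiseCount, biasCount).ToArray();
        return (depthwise, pointwise, biases);
    }
}
```

Validating the length before slicing means a mismatched vector fails immediately instead of silently loading shifted weights.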

UpdateParameters(T)

Updates the parameters of the layer using the calculated gradients and momentum.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the depthwise kernels, pointwise kernels, and biases of the layer based on the gradients calculated during the backward pass. It uses momentum and L2 regularization to improve training stability and prevent overfitting. The learning rate controls the size of the parameter updates.

For Beginners: This method updates the layer's internal values during training.

When updating parameters:

  1. Momentum is used to speed up learning

    • Like a ball rolling downhill, gaining speed in consistent directions
    • Helps overcome small obstacles and reach better solutions faster
  2. L2 regularization helps prevent overfitting

    • Slightly reduces the size of parameters over time
    • Encourages the network to learn simpler patterns
    • Helps the model generalize better to new data
  3. The learning rate controls how big each update step is

    • Smaller learning rates: slower but more stable learning
    • Larger learning rates: faster but potentially unstable learning

This process is repeated many times during training, gradually improving the layer's performance on the task.
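One common way to combine momentum and L2 regularization in a single update is sketched below: the L2 term adds λ·w to the gradient, the velocity accumulates a decaying sum of past steps, and the weights move by the velocity. The momentum coefficient (0.9) and L2 strength (1e-4) are illustrative defaults, not values taken from this layer, and the exact rule inside UpdateParameters may differ. The class name MomentumL2Update is hypothetical.

```csharp
using System;

// Hypothetical sketch: SGD with momentum and L2 regularization.
public static class MomentumL2Update
{
    // momentum and l2 defaults are assumptions for illustration only.
    public static void Step(double[] weights, double[] velocity, double[] gradient,
                            double learningRate, double momentum = 0.9, double l2 = 1e-4)
    {
        for (int i = 0; i < weights.Length; i++)
        {
            double g = gradient[i] + l2 * weights[i];                // L2 pulls weights toward zero
            velocity[i] = momentum * velocity[i] - learningRate * g; // keep a running direction
            weights[i] += velocity[i];
        }
    }
}
```

With a weight of 1.0, a constant gradient of 0.5, learning rate 0.1 and no L2 term, the first step moves the weight to 0.95 and the second to 0.855: the second step is larger because the velocity carries over.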

UpdateParametersGpu(IGpuOptimizerConfig)

Updates parameters on GPU using the configured optimizer.

public override void UpdateParametersGpu(IGpuOptimizerConfig config)

Parameters

config IGpuOptimizerConfig

The GPU optimizer configuration.