Class SeparableConvolutionalLayer<T>
- Namespace: AiDotNet.NeuralNetworks.Layers
- Assembly: AiDotNet.dll
Represents a separable convolutional layer that decomposes standard convolution into depthwise and pointwise operations.
public class SeparableConvolutionalLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
- T: The numeric type used for calculations, typically float or double.
- Inheritance: LayerBase<T> → SeparableConvolutionalLayer<T>
- Implements: ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
A separable convolutional layer splits the standard convolution operation into two simpler operations: a depthwise convolution followed by a pointwise convolution. This factorization significantly reduces computational complexity and number of parameters while maintaining similar model expressiveness.
For Beginners: This layer processes images or other grid-like data more efficiently than standard convolution.
Think of it like a two-step process:
- First step (depthwise): Applies filters to each input channel separately to extract features
- Second step (pointwise): Combines these features across all channels to create new feature maps
Benefits include:
- Fewer calculations needed (faster processing)
- Fewer parameters to learn (uses less memory)
- Often similar performance to standard convolution
For example, in image processing, the depthwise convolution might detect edges in each color channel separately, while the pointwise convolution would combine these edges into more complex features like shapes or textures.
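To make the savings concrete: a standard convolution with C_in input channels, C_out output channels, and a K×K kernel learns C_in × K × K × C_out weights, while the separable version learns only C_in × K × K (depthwise) plus C_in × C_out (pointwise). A quick arithmetic sketch in C# (bias terms ignored):

// Rough parameter-count comparison for 64 input channels, 128 output channels,
// and a 3x3 kernel (bias terms ignored).
int inChannels = 64, outChannels = 128, kernelSize = 3;

int standardConv = inChannels * kernelSize * kernelSize * outChannels;  // 73,728
int depthwisePart = inChannels * kernelSize * kernelSize;               // 576
int pointwisePart = inChannels * outChannels;                           // 8,192
int separableConv = depthwisePart + pointwisePart;                      // 8,768

Console.WriteLine($"Standard: {standardConv}, Separable: {separableConv}");
// The separable layer needs roughly 12% of the parameters of the standard one.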
Constructors
SeparableConvolutionalLayer(int[], int, int, int, int, IActivationFunction<T>?)
Initializes a new instance of the SeparableConvolutionalLayer<T> class with a scalar activation function.
public SeparableConvolutionalLayer(int[] inputShape, int outputDepth, int kernelSize, int stride = 1, int padding = 0, IActivationFunction<T>? scalarActivation = null)
Parameters
- inputShape (int[]): The shape of the input tensor [batch, height, width, channels].
- outputDepth (int): The number of output channels (feature maps).
- kernelSize (int): The size of the convolution kernel (assumed to be square).
- stride (int): The stride of the convolution. Defaults to 1.
- padding (int): The padding applied to the input. Defaults to 0 (no padding).
- scalarActivation (IActivationFunction<T>?): The activation function to apply after convolution. Defaults to identity if not specified.
Remarks
This constructor creates a separable convolutional layer with the specified parameters and a scalar activation function that operates on individual elements. The input shape should be a 4D tensor with dimensions [batch, height, width, channels].
For Beginners: This creates a new separable convolutional layer with basic settings.
The parameters control how the layer processes data:
- inputShape: The size and structure of the incoming data (like image dimensions)
- outputDepth: How many different features the layer will look for
- kernelSize: The size of the "window" that slides over the input (e.g., 3×3 or 5×5)
- stride: How many pixels to move the window each step (smaller = more overlap)
- padding: Whether to add extra space around the input edges
- scalarActivation: A function that adds non-linearity (helping the network learn complex patterns)
For example, with images, larger kernels can detect bigger patterns, while more output channels let the layer detect a greater variety of patterns.
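A minimal construction sketch, assuming float as the numeric type and the default identity activation (any IActivationFunction<float> implementation could be passed instead):

using AiDotNet.NeuralNetworks.Layers;

// Batches of 16 RGB images, 32x32 pixels, in [batch, height, width, channels] layout.
int[] inputShape = { 16, 32, 32, 3 };

var layer = new SeparableConvolutionalLayer<float>(
    inputShape,
    outputDepth: 32,          // look for 32 different features
    kernelSize: 3,            // 3x3 sliding window
    stride: 1,                // move one pixel at a time
    padding: 1,               // keep the spatial size at 32x32
    scalarActivation: null);  // null falls back to the identity activation

With a 3×3 kernel, stride 1, and padding 1, the output keeps the 32×32 spatial size, which makes layers like this easy to stack.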
SeparableConvolutionalLayer(int[], int, int, int, int, IVectorActivationFunction<T>?)
Initializes a new instance of the SeparableConvolutionalLayer<T> class with a vector activation function.
public SeparableConvolutionalLayer(int[] inputShape, int outputDepth, int kernelSize, int stride = 1, int padding = 0, IVectorActivationFunction<T>? vectorActivation = null)
Parameters
- inputShape (int[]): The shape of the input tensor [batch, height, width, channels].
- outputDepth (int): The number of output channels (feature maps).
- kernelSize (int): The size of the convolution kernel (assumed to be square).
- stride (int): The stride of the convolution. Defaults to 1.
- padding (int): The padding applied to the input. Defaults to 0 (no padding).
- vectorActivation (IVectorActivationFunction<T>?): The vector activation function to apply after convolution. Defaults to identity if not specified.
Remarks
This constructor creates a separable convolutional layer with the specified parameters and a vector activation function that operates on entire vectors rather than individual elements. The input shape should be a 4D tensor with dimensions [batch, height, width, channels].
For Beginners: This creates a new separable convolutional layer with advanced settings.
Similar to the basic constructor, but with one key difference:
- It uses a vector activation function instead of a scalar one
A vector activation function:
- Works on entire groups of numbers at once, not just one at a time
- Can capture relationships between different elements in the output
- Is useful for more complex AI tasks
This constructor is for advanced users who need more sophisticated activation patterns for their neural networks.
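The usage mirrors the scalar overload. In the sketch below, float is assumed as the numeric type, and the activation variable is a placeholder for whichever IVectorActivationFunction<float> implementation the project provides:

int[] inputShape = { 16, 32, 32, 3 };

// Placeholder: substitute a real IVectorActivationFunction<float> implementation,
// or leave null to fall back to the identity activation.
IVectorActivationFunction<float>? vectorActivation = null;

var layer = new SeparableConvolutionalLayer<float>(
    inputShape, outputDepth: 32, kernelSize: 3, stride: 1, padding: 1,
    vectorActivation: vectorActivation);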
Properties
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU-accelerated execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
true when kernels and biases are initialized and the engine is a DirectGpuTensorEngine.
Remarks
GPU execution for separable convolution uses DepthwiseConv2DGpu for the depthwise step and FusedConv2DGpu for the pointwise step with fused bias and activation.
SupportsGpuTraining
Gets a value indicating whether this layer supports GPU-resident training.
public override bool SupportsGpuTraining { get; }
Property Value
- bool
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
true when kernels are initialized and the activation function supports JIT.
Remarks
Separable convolutional layers support JIT compilation using DepthwiseConv2D and Conv2D operations from TensorOperations. The layer performs depthwise convolution followed by pointwise (1x1) convolution.
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
Always returns true, as separable convolutional layers have trainable parameters.
Remarks
This property indicates that the separable convolutional layer can be trained using backpropagation. The layer contains trainable parameters (kernels and biases) that are updated during the training process.
For Beginners: This property tells you that the layer can learn from data.
A value of true means:
- The layer contains numbers (parameters) that can be adjusted during training
- It will improve its performance as it sees more examples
- It participates in the learning process of the neural network
Think of it like a student who can improve by studying - this layer can get better at its job through a process called backpropagation, which adjusts its internal values based on errors it makes.
Methods
Backward(Tensor<T>)
Performs the backward pass of the separable convolutional layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
- outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the separable convolutional layer, which is used during training to propagate error gradients back through the network. It computes gradients for both depthwise and pointwise kernels, as well as biases, and returns the gradient with respect to the input for further backpropagation.
For Beginners: This method is used during training to calculate how the layer's inputs and parameters should change to reduce errors.
The backward pass:
- Starts with gradients (error signals) from the next layer
- Computes how to adjust the layer's parameters (kernels and biases)
- Calculates how to adjust the input that was received
This happens in reverse order compared to the forward pass:
- First backpropagates through the pointwise convolution
- Then backpropagates through the depthwise convolution
The calculated gradients are stored for later use when updating the parameters, and the input gradient is returned to continue the backpropagation process.
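A sketch of one forward/backward pair, assuming float as the numeric type and that Tensor<float> can be constructed from a shape array (substitute the library's actual tensor creation API if it differs):

var layer = new SeparableConvolutionalLayer<float>(
    new[] { 16, 32, 32, 3 }, outputDepth: 32, kernelSize: 3,
    stride: 1, padding: 1, scalarActivation: null);

// Assumption: Tensor<float> exposes a shape-array constructor.
var input = new Tensor<float>(new[] { 16, 32, 32, 3 });
var output = layer.Forward(input);

// The loss gradient normally comes from the next layer or the loss function;
// for this configuration the output shape is [16, 32, 32, 32].
var outputGradient = new Tensor<float>(new[] { 16, 32, 32, 32 });

// Stores kernel and bias gradients internally and returns the input gradient,
// which would be passed on to the previous layer.
var inputGradient = layer.Backward(outputGradient);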
BackwardGpu(IGpuTensor<T>)
Performs the backward pass on GPU tensors.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
- outputGradient (IGpuTensor<T>): GPU tensor containing the gradient of the loss with respect to the output.
Returns
- IGpuTensor<T>
GPU tensor containing the gradient of the loss with respect to the input.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the separable convolutional layer's forward pass as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
- inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the separable convolution output.
Remarks
The separable convolution computation graph implements:
1. Depthwise convolution: Applies separate filters to each input channel
2. Pointwise convolution: 1x1 convolution to combine channels
3. Activation function
For Beginners: This creates an optimized version of the separable convolution. It's more efficient than standard convolution by splitting the operation into two steps.
Forward(Tensor<T>)
Performs the forward pass of the separable convolutional layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
- input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after separable convolution and activation.
Remarks
This method implements the forward pass of the separable convolutional layer. It performs a depthwise convolution followed by a pointwise convolution. The depthwise convolution applies a separate filter to each input channel, and the pointwise convolution applies a 1x1 convolution to combine the channels. The result is passed through an activation function.
For Beginners: This method processes the input data through the layer.
The forward pass happens in three steps:
1. Depthwise convolution: Applies separate filters to each input channel
- Like having a specialized detector for each input feature
- Captures spatial patterns within each channel independently
2. Pointwise convolution: Combines results across all channels
- Uses 1×1 filters to mix information between channels
- Creates new feature maps that combine information from all inputs
- Adds bias values to each output channel
3. Activation: Applies a non-linear function to the results
- Helps the network learn more complex patterns
The method also saves the input and output for later use during training.
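Assuming the layer follows the standard convolution output-size arithmetic, each spatial dimension of the output is (inSize + 2 × padding − kernelSize) / stride + 1 (integer division), and the number of output channels equals outputDepth. A small helper shows the effect of stride and padding:

// Standard convolution output-size arithmetic for one spatial dimension.
static int OutputSize(int inSize, int kernelSize, int stride, int padding)
    => (inSize + 2 * padding - kernelSize) / stride + 1;

Console.WriteLine(OutputSize(32, 3, 1, 0)); // 3x3 kernel, stride 1, no padding -> 30
Console.WriteLine(OutputSize(32, 3, 2, 1)); // 3x3 kernel, stride 2, padding 1  -> 16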
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass on GPU, keeping all tensors GPU-resident.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
- inputs (IGpuTensor<T>[]): The input GPU tensors in NCHW format [batch, channels, height, width].
Returns
- IGpuTensor<T>
The output GPU tensor in NCHW format.
Remarks
This method executes separable convolution entirely on GPU:
1. Depthwise convolution: Each input channel is convolved with its own filter
2. Pointwise convolution: A 1x1 convolution combines channels with fused bias and activation
Performance Notes:
- Input tensors remain GPU-resident throughout computation
- Intermediate depthwise output is disposed after use
- Kernels are converted to NCHW format for GPU operations
- Activation is fused into the pointwise convolution when possible
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters of the layer (depthwise kernels, pointwise kernels, and biases) and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer into a single list.
The parameters:
- Are the numbers that the neural network learns during training
- Include depthwise kernels, pointwise kernels, and biases
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
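For example, for a layer built on 3 input channels with outputDepth 32 and a 3×3 kernel, the returned vector holds the depthwise kernels, pointwise kernels, and biases back to back. The counts in the sketch below assume the standard separable decomposition; the exact ordering and total are defined by the layer itself:

var layer = new SeparableConvolutionalLayer<float>(
    new[] { 16, 32, 32, 3 }, outputDepth: 32, kernelSize: 3,
    stride: 1, padding: 1, scalarActivation: null);

var parameters = layer.GetParameters();

// Expected contents (assuming the standard separable decomposition):
//   depthwise kernels: 3 channels * 3 * 3      = 27 values
//   pointwise kernels: 3 channels * 32 outputs = 96 values
//   biases:            32 values
// giving 155 values in total, in the order defined by GetParameters().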
ResetState()
Resets the internal state of the separable convolutional layer.
public override void ResetState()
Remarks
This method resets the internal state of the separable convolutional layer, including the cached inputs and outputs, gradients, and velocity tensors. This is useful when starting to process a new batch or when implementing stateful networks that need to be reset between sequences.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and outputs from previous passes are cleared
- Calculated gradients are cleared
- Momentum (velocity) information is cleared
This is important for:
- Processing a new batch of unrelated data
- Preventing information from one batch affecting another
- Starting a new training episode
Think of it like erasing the whiteboard before starting a new calculation - it ensures that old information doesn't interfere with new processing.
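A short sketch of resetting between unrelated batches, again assuming float as the numeric type and a shape-array constructor on Tensor<float>:

var layer = new SeparableConvolutionalLayer<float>(
    new[] { 16, 32, 32, 3 }, outputDepth: 32, kernelSize: 3,
    stride: 1, padding: 1, scalarActivation: null);

var batchA = new Tensor<float>(new[] { 16, 32, 32, 3 });
var batchB = new Tensor<float>(new[] { 16, 32, 32, 3 });

layer.Forward(batchA);
// ... backward pass and parameter update for batch A ...

// Clear cached inputs/outputs, gradients, and momentum before unrelated data.
layer.ResetState();

layer.Forward(batchB);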
SetParameters(Vector<T>)
Sets the trainable parameters of the layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
- parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets the trainable parameters of the layer (depthwise kernels, pointwise kernels, and biases) from a single vector. It expects the vector to contain the parameters in the same order as they are retrieved by GetParameters(). This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learnable values in the layer from a single list.
When setting parameters:
- The input must be a vector with exactly the right number of values
- The values are distributed to the appropriate places (depthwise kernels, pointwise kernels, and biases)
- The order must match how they were stored in GetParameters()
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Testing different parameter values
An error is thrown if the input vector doesn't have the expected number of parameters.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
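A sketch of transferring weights between two identically configured layers, using only GetParameters() and SetParameters() (float assumed as the numeric type):

int[] inputShape = { 16, 32, 32, 3 };
var source = new SeparableConvolutionalLayer<float>(
    inputShape, outputDepth: 32, kernelSize: 3, stride: 1, padding: 1, scalarActivation: null);
var target = new SeparableConvolutionalLayer<float>(
    inputShape, outputDepth: 32, kernelSize: 3, stride: 1, padding: 1, scalarActivation: null);

// Copies all depthwise kernels, pointwise kernels, and biases. A vector of the
// wrong length (for example, from a differently shaped layer) throws ArgumentException.
target.SetParameters(source.GetParameters());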
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients and momentum.
public override void UpdateParameters(T learningRate)
Parameters
- learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method updates the depthwise kernels, pointwise kernels, and biases of the layer based on the gradients calculated during the backward pass. It uses momentum and L2 regularization to improve training stability and prevent overfitting. The learning rate controls the size of the parameter updates.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- Momentum is used to speed up learning: like a ball rolling downhill, gaining speed in consistent directions, it helps overcome small obstacles and reach better solutions faster.
- L2 regularization helps prevent overfitting: it slightly reduces the size of parameters over time, encourages the network to learn simpler patterns, and helps the model generalize better to new data.
- The learning rate controls how big each update step is: smaller learning rates give slower but more stable learning, while larger learning rates give faster but potentially unstable learning.
This process is repeated many times during training, gradually improving the layer's performance on the task.
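Putting the pieces together, one simplified training iteration might look like the sketch below (float assumed as the numeric type; the shape-array constructor on Tensor<float> is an assumption):

var layer = new SeparableConvolutionalLayer<float>(
    new[] { 16, 32, 32, 3 }, outputDepth: 32, kernelSize: 3,
    stride: 1, padding: 1, scalarActivation: null);

var input = new Tensor<float>(new[] { 16, 32, 32, 3 });
var output = layer.Forward(input);

// Gradient of the loss with respect to the output; [16, 32, 32, 32] matches
// the output shape for this configuration.
var lossGradient = new Tensor<float>(new[] { 16, 32, 32, 32 });

layer.Backward(lossGradient);   // accumulate kernel and bias gradients
layer.UpdateParameters(0.01f);  // apply them with momentum and L2 regularization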
UpdateParametersGpu(IGpuOptimizerConfig)
Updates parameters on GPU using the configured optimizer.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
- config (IGpuOptimizerConfig): The GPU optimizer configuration.