Class DepthwiseSeparableConvolutionalLayer<T>
- Namespace: AiDotNet.NeuralNetworks.Layers
- Assembly: AiDotNet.dll
Represents a depthwise separable convolutional layer that performs convolution as two separate operations.
public class DepthwiseSeparableConvolutionalLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance: LayerBase<T> → DepthwiseSeparableConvolutionalLayer<T>
- Implements: ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
A depthwise separable convolutional layer splits the standard convolution operation into two parts: a depthwise convolution, which applies a single filter per input channel, and a pointwise convolution, which uses 1×1 convolutions to combine the outputs. This approach dramatically reduces the number of parameters and computational cost compared to standard convolution.
For Beginners: A depthwise separable convolution is a more efficient way to filter an image.
Think of it as a two-step process:
- First step (depthwise): Apply separate filters to each input channel (like filtering red, green, and blue separately)
- Second step (pointwise): Mix these filtered channels together (like combining the filtered colors)
For example, in image processing:
- Standard convolution might use 100,000 calculations for a single operation
- Depthwise separable convolution might do the same job with only 10,000 calculations
This makes your neural network faster and smaller while still capturing important patterns. It's commonly used in mobile and edge devices where efficiency is critical.
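To make the savings concrete, here is a minimal sketch in plain C# (independent of any AiDotNet types) that works out the weight counts for both approaches; the sizes are illustrative, not taken from the library.

// Weight counts for one layer, biases omitted for simplicity.
// Illustrative sizes: 3x3 kernel, 64 input channels, 128 output channels.
int kernelSize = 3, inputDepth = 64, outputDepth = 128;

// Standard convolution: one kernelSize x kernelSize filter per (input, output) channel pair.
int standard = kernelSize * kernelSize * inputDepth * outputDepth; // 73,728

// Depthwise separable: one spatial filter per input channel (depthwise step),
// then 1x1 filters to mix the channels (pointwise step).
int separable = kernelSize * kernelSize * inputDepth   // depthwise: 576
              + inputDepth * outputDepth;              // pointwise: 8,192

Console.WriteLine($"{standard} vs {separable}"); // 73728 vs 8768, roughly 8x fewer weights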
Constructors
DepthwiseSeparableConvolutionalLayer(int, int, int, int, int, int, int, IActivationFunction<T>?)
Initializes a new instance of the DepthwiseSeparableConvolutionalLayer<T> class with the specified parameters and a scalar activation function.
public DepthwiseSeparableConvolutionalLayer(int inputDepth, int outputDepth, int kernelSize, int inputHeight, int inputWidth, int stride = 1, int padding = 0, IActivationFunction<T>? activation = null)
Parameters
inputDepth (int): The number of channels in the input data.
outputDepth (int): The number of output channels to create.
kernelSize (int): The size of each filter kernel (width and height).
inputHeight (int): The height of the input data.
inputWidth (int): The width of the input data.
stride (int): The step size for moving the kernel. Defaults to 1.
padding (int): The amount of zero-padding to add around the input. Defaults to 0.
activation (IActivationFunction<T>?): The activation function to apply. Defaults to ReLU if not specified.
Remarks
This constructor creates a depthwise separable convolutional layer with the specified configuration. It initializes both depthwise and pointwise kernels with appropriate scaling factors to help with training convergence. The biases are initialized to zero.
For Beginners: This setup method creates a new depthwise separable convolutional layer with specific settings.
When creating the layer, you specify:
- Input details: How many channels and the dimensions of your data
- How many patterns to look for (outputDepth)
- How big each filter is (kernelSize)
- How to move the filter across the data (stride)
- Whether to add an extra border (padding)
- What mathematical function to apply to the results (activation)
The layer then creates all the necessary filters with random starting values that will be improved during training. This more efficient approach requires fewer parameters than a standard convolutional layer.
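As a usage sketch (the argument values are arbitrary; only the constructor signature shown above is taken from the library):

using AiDotNet.NeuralNetworks.Layers;

// A layer for 3-channel 32x32 input producing 16 output channels,
// with a 3x3 kernel, stride 1, and 1 pixel of zero-padding.
// activation is left null, so the documented ReLU default applies.
var layer = new DepthwiseSeparableConvolutionalLayer<float>(
    inputDepth: 3, outputDepth: 16, kernelSize: 3,
    inputHeight: 32, inputWidth: 32, stride: 1, padding: 1);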
DepthwiseSeparableConvolutionalLayer(int, int, int, int, int, int, int, IVectorActivationFunction<T>?)
Initializes a new instance of the DepthwiseSeparableConvolutionalLayer<T> class with the specified parameters and a vector activation function.
public DepthwiseSeparableConvolutionalLayer(int inputDepth, int outputDepth, int kernelSize, int inputHeight, int inputWidth, int stride = 1, int padding = 0, IVectorActivationFunction<T>? vectorActivation = null)
Parameters
inputDepth (int): The number of channels in the input data.
outputDepth (int): The number of output channels to create.
kernelSize (int): The size of each filter kernel (width and height).
inputHeight (int): The height of the input data.
inputWidth (int): The width of the input data.
stride (int): The step size for moving the kernel. Defaults to 1.
padding (int): The amount of zero-padding to add around the input. Defaults to 0.
vectorActivation (IVectorActivationFunction<T>?): The vector activation function to apply. Defaults to ReLU if not specified.
Remarks
This constructor creates a depthwise separable convolutional layer with the specified configuration and a vector activation function. Vector activation functions operate on entire vectors at once, which can be more efficient for certain operations.
For Beginners: This setup method is similar to the previous one, but uses a different type of activation function.
A vector activation function:
- Works on entire groups of numbers at once
- Can be more efficient for certain types of calculations
- Otherwise works the same as the regular activation function
You would choose this option if you have a specific mathematical operation that needs to be applied to groups of outputs rather than individual values.
Properties
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsGpuTraining
Gets a value indicating whether this layer supports GPU-resident training.
public override bool SupportsGpuTraining { get; }
Property Value
- bool
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
true when kernels are initialized and the activation function supports JIT.
Remarks
Depthwise separable convolutional layers support JIT compilation using DepthwiseConv2D and Conv2D operations from TensorOperations. The layer performs depthwise convolution followed by pointwise (1×1) convolution.
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
Always returns true for depthwise separable convolutional layers, as they contain trainable parameters.
Remarks
This property indicates whether the layer can be trained through backpropagation. Depthwise separable convolutional layers have trainable parameters (kernel weights and biases), so they support training.
For Beginners: This property tells you if the layer can learn from data.
For depthwise separable convolutional layers:
- The value is always true
- This means the layer can adjust its filters and biases during training
- It will improve its pattern recognition as it processes more data
Some other layer types might not have trainable parameters and would return false here.
Methods
ApplyActivationDerivative(T, T)
Applies the derivative of the activation function during backpropagation.
protected T ApplyActivationDerivative(T gradient, T output)
Parameters
gradient (T): The gradient flowing back from the next layer.
output (T): The output value from the forward pass.
Returns
- T
The gradient after applying the activation derivative.
Remarks
This method applies the derivative of the layer's activation function during backpropagation. It handles both scalar and vector activation functions appropriately.
For Beginners: This method helps determine how sensitive the output is to small changes.
During backpropagation:
- The network needs to know how much a small change in the input affects the output
- This is calculated by applying the derivative of the activation function
- The result tells us how to adjust the parameters to improve the network
Think of it like figuring out how steep a hill is - the steeper the hill, the more a small step will change your elevation.
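As an illustration of the chain rule this method applies, here is a standalone sketch for ReLU (the documented default activation); it mirrors the idea rather than calling the protected method itself:

// Chain rule: inputGradient = outputGradient * f'(output).
// For ReLU, f'(x) is 1 where the output was positive and 0 elsewhere,
// so gradients only flow through units that were active in the forward pass.
static double ApplyReluDerivative(double gradient, double output)
    => output > 0 ? gradient : 0.0;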
Exceptions
- InvalidOperationException
Thrown when activation functions are not set.
Backward(Tensor<T>)
Calculates gradients for the input, kernels, and biases during backpropagation.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method performs the backward pass of the depthwise separable convolutional layer during training. It calculates the gradients for the depthwise kernels, pointwise kernels, biases, and the input. These gradients indicate how each parameter should be adjusted to reduce the loss.
For Beginners: This method helps the layer learn from its mistakes.
During the backward pass:
- The layer receives information about how wrong its output was
- It calculates how to adjust each of its filters to be more accurate
- It prepares the adjustments but doesn't apply them yet
- It passes information back to previous layers so they can learn too
The layer has to figure out:
- How to adjust the depthwise filters (first step)
- How to adjust the pointwise filters (second step)
- How to adjust the biases
This is where the actual "learning" happens in the neural network.
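A minimal sketch of where Backward sits in a training step; lossGradient is a stand-in for whatever your loss function produces:

// Forward must run before Backward (see the exception below).
Tensor<float> output = layer.Forward(input);

// lossGradient holds dLoss/dOutput for this batch (stand-in here).
Tensor<float> inputGradient = layer.Backward(lossGradient);

// inputGradient goes to the previous layer; the kernel and bias gradients
// stay cached inside the layer until UpdateParameters is called.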
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
BackwardGpu(IGpuTensor<T>)
Performs the backward pass on GPU tensors.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): GPU tensor containing the gradient of the loss with respect to the output.
Returns
- IGpuTensor<T>
GPU tensor containing the gradient of the loss with respect to the input.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the depthwise separable convolutional layer's forward pass as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the depthwise separable convolution output.
Remarks
The depthwise separable convolution computation graph implements:
1. Depthwise convolution: applies separate filters to each input channel
2. Pointwise convolution: a 1×1 convolution to combine channels and add the bias
3. The activation function
For Beginners: This creates an optimized, compilable version of the depthwise separable convolution, a technique that already costs dramatically less to compute than standard convolution.
Forward(Tensor<T>)
Processes the input data through the depthwise separable convolutional layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after depthwise separable convolution and activation.
Remarks
This method performs the forward pass of the depthwise separable convolutional layer. It first applies the depthwise convolution, then the pointwise convolution, adds biases, and finally applies the activation function. The result is a tensor where each channel represents different features detected by the layer.
For Beginners: This method applies the two-step filtering process to your input data.
During the forward pass:
- First, apply depthwise convolution (filter each channel separately)
- Next, apply pointwise convolution (mix filtered channels together)
- Add biases to each output channel
- Apply the activation function to make results non-linear
Think of it like a cooking process where you:
- Process each ingredient separately (depthwise)
- Mix the processed ingredients together (pointwise)
- Add seasoning (biases)
- Cook everything (activation function)
The result shows which patterns were detected in the input data.
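A sketch of a single forward pass; the output-size formula in the comment is the standard convolution arithmetic, while the exact tensor shape convention is internal to the library:

// Spatial output size follows the usual convolution formula:
//   outSize = (inSize - kernelSize + 2 * padding) / stride + 1
// e.g. (32 - 3 + 2 * 1) / 1 + 1 = 32, so padding 1 preserves a 32x32 input.
Tensor<float> output = layer.Forward(input);
// output contains outputDepth channels of detected features.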
ForwardGpu(params IGpuTensor<T>[])
Performs a GPU-resident forward pass using fused DepthwiseConv2D + pointwise Conv2D + Bias + Activation.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): GPU-resident input tensor.
Returns
- IGpuTensor<T>
GPU-resident output tensor.
Remarks
For Beginners: This is the GPU-optimized version of the Forward method. All data stays on the GPU throughout the computation, avoiding expensive CPU-GPU transfers.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all depthwise kernels, pointwise kernels, and biases.
Remarks
This method extracts all trainable parameters from the layer and returns them as a single vector. This includes all depthwise kernels, pointwise kernels, and biases, concatenated in that order.
For Beginners: This method gathers all the learned values from the layer.
The parameters include:
- All depthwise filter values (first step filters)
- All pointwise filter values (second step filters)
- All bias values
These are combined into a single long list (vector), which can be used for:
- Saving the model
- Sharing parameters between layers
- Advanced optimization techniques
This provides access to all the "knowledge" the layer has learned.
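Given the documented ordering (depthwise kernels, then pointwise kernels, then biases), the expected vector length can be sanity-checked; the Length property on Vector<T> is an assumption:

Vector<float> parameters = layer.GetParameters();

// For inputDepth = 3, outputDepth = 16, kernelSize = 3:
// depthwise 3*3*3 = 27, pointwise 3*16 = 48, biases 16, total 91.
int expected = 3 * 3 * 3 + 3 * 16 + 16;
Console.WriteLine(parameters.Length == expected); // Length is assumed here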
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method clears the cached values from the forward and backward passes, including the input, intermediate outputs, and gradients. This is useful when starting to process a new batch or when implementing stateful networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- The layer forgets the last input it processed
- It forgets the intermediate results (after depthwise convolution)
- It forgets the final output it produced
- It clears any calculated gradients
This is useful for:
- Processing a new, unrelated set of data
- Preventing information from one batch affecting another
- Starting a new training episode
Think of it like wiping a whiteboard clean before starting a new calculation.
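Usage is a single call between unrelated workloads:

layer.ResetState(); // wipe cached input, intermediate results, and gradients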
SetParameters(Vector<T>)
Sets all trainable parameters of the layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all trainable parameters of the layer from a single vector. The vector must contain values for all depthwise kernels, pointwise kernels, and biases, in that order.
For Beginners: This method updates all the layer's learned values at once.
When setting parameters:
- The vector must have exactly the right number of values
- The values are assigned in order: depthwise filters, pointwise filters, then biases
This is useful for:
- Loading a previously saved model
- Copying parameters from another model
- Setting parameters that were optimized externally
It's like replacing all the "knowledge" in the layer with new information.
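A short sketch of copying learned values between two identically configured layers:

// Both layers must share the same configuration, or SetParameters
// throws the ArgumentException documented below.
Vector<float> learned = trainedLayer.GetParameters();
freshLayer.SetParameters(learned);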
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the layer's parameters (kernel weights and biases) using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the update.
Remarks
This method updates the layer's parameters (depthwise kernels, pointwise kernels, and biases) based on the gradients calculated during the backward pass. The learning rate controls the step size of the update.
For Beginners: This method applies the lessons learned during training.
When updating parameters:
- The learning rate controls how big each adjustment is
- Small learning rate = small, careful changes
- Large learning rate = big, faster changes (but might overshoot)
The layer updates:
- The depthwise filters (first step filters)
- The pointwise filters (second step filters)
- The biases
This happens after each batch of data, gradually improving the layer's performance.
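Putting it together, a single training step might look like this; ComputeLossGradient is a hypothetical helper and 0.01 an arbitrary learning rate:

Tensor<float> output = layer.Forward(input);
Tensor<float> lossGradient = ComputeLossGradient(output, target); // hypothetical helper
layer.Backward(lossGradient);    // caches the gradients
layer.UpdateParameters(0.01f);   // applies them; smaller = more careful steps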
Exceptions
- InvalidOperationException
Thrown when update is called before backward.
UpdateParametersGpu(IGpuOptimizerConfig)
Updates parameters on GPU using the configured optimizer.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
config (IGpuOptimizerConfig): The GPU optimizer configuration.