Class DepthwiseSeparableConvolutionalLayer<T>
- Namespace: AiDotNet.NeuralNetworks.Layers
- Assembly: AiDotNet.dll
Represents a depthwise separable convolutional layer that performs convolution as two separate operations.
public class DepthwiseSeparableConvolutionalLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance: LayerBase<T> → DepthwiseSeparableConvolutionalLayer<T>
- Implements: ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
A depthwise separable convolutional layer splits the standard convolution operation into two parts: a depthwise convolution, which applies a single filter per input channel, and a pointwise convolution, which uses 1×1 convolutions to combine the outputs. This approach dramatically reduces the number of parameters and computational cost compared to standard convolution.
For Beginners: A depthwise separable convolution is a more efficient way to filter an image.
Think of it as a two-step process:
- First step (depthwise): Apply separate filters to each input channel (like filtering red, green, and blue separately)
- Second step (pointwise): Mix these filtered channels together (like combining the filtered colors)
For example, in image processing:
- Standard convolution might use 100,000 calculations for a single operation
- Depthwise separable convolution might do the same job with only 10,000 calculations
This makes your neural network faster and smaller while still capturing important patterns. It's commonly used in mobile and edge devices where efficiency is critical.
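To make the savings concrete, here is a minimal sketch in plain C# (independent of any AiDotNet types) that works out the weight counts for both approaches; the sizes are illustrative, not taken from the library.

// Weight counts for one layer, biases omitted for simplicity.
// Illustrative sizes: 3x3 kernel, 64 input channels, 128 output channels.
int kernelSize = 3, inputDepth = 64, outputDepth = 128;

// Standard convolution: one kernelSize x kernelSize filter per (input, output) channel pair.
int standard = kernelSize * kernelSize * inputDepth * outputDepth; // 73,728

// Depthwise separable: one spatial filter per input channel (depthwise step),
// then 1x1 filters to mix the channels (pointwise step).
int separable = kernelSize * kernelSize * inputDepth   // depthwise: 576
              + inputDepth * outputDepth;              // pointwise: 8,192

Console.WriteLine($"{standard} vs {separable}"); // 73728 vs 8768, roughly 8x fewer weights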
Constructors
DepthwiseSeparableConvolutionalLayer(int, int, int, int, int, int, int, IActivationFunction<T>?)
Initializes a new instance of the DepthwiseSeparableConvolutionalLayer<T> class with the specified parameters and a scalar activation function.
public DepthwiseSeparableConvolutionalLayer(int inputDepth, int outputDepth, int kernelSize, int inputHeight, int inputWidth, int stride = 1, int padding = 0, IActivationFunction<T>? activation = null)
Parameters
inputDepth (int): The number of channels in the input data.
outputDepth (int): The number of output channels to create.
kernelSize (int): The size of each filter kernel (width and height).
inputHeight (int): The height of the input data.
inputWidth (int): The width of the input data.
stride (int): The step size for moving the kernel. Defaults to 1.
padding (int): The amount of zero-padding to add around the input. Defaults to 0.
activation (IActivationFunction<T>?): The activation function to apply. Defaults to ReLU if not specified.
Remarks
This constructor creates a depthwise separable convolutional layer with the specified configuration. It initializes both depthwise and pointwise kernels with appropriate scaling factors to help with training convergence. The biases are initialized to zero.
For Beginners: This setup method creates a new depthwise separable convolutional layer with specific settings.
When creating the layer, you specify:
- Input details: How many channels and the dimensions of your data
- How many patterns to look for (outputDepth)
- How big each filter is (kernelSize)
- How to move the filter across the data (stride)
- Whether to add an extra border (padding)
- What mathematical function to apply to the results (activation)
The layer then creates all the necessary filters with random starting values that will be improved during training. This more efficient approach requires fewer parameters than a standard convolutional layer.
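As a usage sketch (the argument values are arbitrary; only the constructor signature shown above is taken from the library):

using AiDotNet.NeuralNetworks.Layers;

// A layer for 3-channel 32x32 input producing 16 output channels,
// with a 3x3 kernel, stride 1, and 1 pixel of zero-padding.
// activation is left null, so the documented ReLU default applies.
var layer = new DepthwiseSeparableConvolutionalLayer<float>(
    inputDepth: 3, outputDepth: 16, kernelSize: 3,
    inputHeight: 32, inputWidth: 32, stride: 1, padding: 1);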
DepthwiseSeparableConvolutionalLayer(int, int, int, int, int, int, int, IVectorActivationFunction<T>?)
Initializes a new instance of the DepthwiseSeparableConvolutionalLayer<T> class with the specified parameters and a vector activation function.
public DepthwiseSeparableConvolutionalLayer(int inputDepth, int outputDepth, int kernelSize, int inputHeight, int inputWidth, int stride = 1, int padding = 0, IVectorActivationFunction<T>? vectorActivation = null)
Parameters
inputDepth (int): The number of channels in the input data.
outputDepth (int): The number of output channels to create.
kernelSize (int): The size of each filter kernel (width and height).
inputHeight (int): The height of the input data.
inputWidth (int): The width of the input data.
stride (int): The step size for moving the kernel. Defaults to 1.
padding (int): The amount of zero-padding to add around the input. Defaults to 0.
vectorActivation (IVectorActivationFunction<T>?): The vector activation function to apply. Defaults to ReLU if not specified.
Remarks
This constructor creates a depthwise separable convolutional layer with the specified configuration and a vector activation function. Vector activation functions operate on entire vectors at once, which can be more efficient for certain operations.
For Beginners: This setup method is similar to the previous one, but uses a different type of activation function.
A vector activation function:
- Works on entire groups of numbers at once
- Can be more efficient for certain types of calculations
- Otherwise works the same as the regular activation function
You would choose this option if you have a specific mathematical operation that needs to be applied to groups of outputs rather than individual values.
Properties
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsGpuTraining
Gets a value indicating whether this layer supports GPU-resident training.
public override bool SupportsGpuTraining { get; }
Property Value
- bool
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
true when kernels are initialized and the activation function supports JIT.
Remarks
Depthwise separable convolutional layers support JIT compilation using DepthwiseConv2D and Conv2D operations from TensorOperations. The layer performs depthwise convolution followed by pointwise (1×1) convolution.
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
Always returns true for depthwise separable convolutional layers, as they contain trainable parameters.
Remarks
This property indicates whether the layer can be trained through backpropagation. Depthwise separable convolutional layers have trainable parameters (kernel weights and biases), so they support training.
For Beginners: This property tells you if the layer can learn from data.
For depthwise separable convolutional layers:
- The value is always true
- This means the layer can adjust its filters and biases during training
- It will improve its pattern recognition as it processes more data
Some other layer types might not have trainable parameters and would return false here.
Methods
ApplyActivationDerivative(T, T)
Applies the derivative of the activation function during backpropagation.
protected T ApplyActivationDerivative(T gradient, T output)
Parameters
gradient (T): The gradient flowing back from the next layer.
output (T): The output value from the forward pass.
Returns
- T
The gradient after applying the activation derivative.
Remarks
This method applies the derivative of the layer's activation function during backpropagation. It handles both scalar and vector activation functions appropriately.
For Beginners: This method helps determine how sensitive the output is to small changes.
During backpropagation:
- The network needs to know how much a small change in the input affects the output
- This is calculated by applying the derivative of the activation function
- The result tells us how to adjust the parameters to improve the network
Think of it like figuring out how steep a hill is - the steeper the hill, the more a small step will change your elevation.
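As an illustration of the chain rule this method applies, here is a standalone sketch for ReLU (the documented default activation); it mirrors the idea rather than calling the protected method itself:

// Chain rule: inputGradient = outputGradient * f'(output).
// For ReLU, f'(x) is 1 where the output was positive and 0 elsewhere,
// so gradients only flow through units that were active in the forward pass.
static double ApplyReluDerivative(double gradient, double output)
    => output > 0 ? gradient : 0.0;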
Exceptions
- InvalidOperationException
Thrown when activation functions are not set.
Backward(Tensor<T>)
Calculates gradients for the input, kernels, and biases during backpropagation.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method performs the backward pass of the depthwise separable convolutional layer during training. It calculates the gradients for the depthwise kernels, pointwise kernels, biases, and the input. These gradients indicate how each parameter should be adjusted to reduce the loss.
For Beginners: This method helps the layer learn from its mistakes.
During the backward pass:
- The layer receives information about how wrong its output was
- It calculates how to adjust each of its filters to be more accurate
- It prepares the adjustments but doesn't apply them yet
- It passes information back to previous layers so they can learn too
The layer has to figure out:
- How to adjust the depthwise filters (first step)
- How to adjust the pointwise filters (second step)
- How to adjust the biases
This is where the actual "learning" happens in the neural network.
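A minimal sketch of where Backward sits in a training step; lossGradient is a stand-in for whatever your loss function produces:

// Forward must run before Backward (see the exception below).
Tensor<float> output = layer.Forward(input);

// lossGradient holds dLoss/dOutput for this batch (stand-in here).
Tensor<float> inputGradient = layer.Backward(lossGradient);

// inputGradient goes to the previous layer; the kernel and bias gradients
// stay cached inside the layer until UpdateParameters is called.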
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
BackwardGpu(IGpuTensor<T>)
Performs the backward pass on GPU tensors.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): GPU tensor containing the gradient of the loss with respect to the output.
Returns
- IGpuTensor<T>
GPU tensor containing the gradient of the loss with respect to the input.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the depthwise separable convolutional layer's forward pass as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the depthwise separable convolution output.
Remarks
The depthwise separable convolution computation graph implements:
1. Depthwise convolution: applies separate filters to each input channel
2. Pointwise convolution: a 1×1 convolution to combine channels and add the bias
3. The activation function
For Beginners: This creates an optimized, compilable version of the depthwise separable convolution, a technique that already costs dramatically less to compute than standard convolution.
Forward(Tensor<T>)
Processes the input data through the depthwise separable convolutional layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after depthwise separable convolution and activation.
Remarks
This method performs the forward pass of the depthwise separable convolutional layer. It first applies the depthwise convolution, then the pointwise convolution, adds biases, and finally applies the activation function. The result is a tensor where each channel represents different features detected by the layer.
For Beginners: This method applies the two-step filtering process to your input data.
During the forward pass:
- First, apply depthwise convolution (filter each channel separately)
- Next, apply pointwise convolution (mix filtered channels together)
- Add biases to each output channel
- Apply the activation function to make results non-linear
Think of it like a cooking process where you:
- Process each ingredient separately (depthwise)
- Mix the processed ingredients together (pointwise)
- Add seasoning (biases)
- Cook everything (activation function)
The result shows which patterns were detected in the input data.
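A sketch of a single forward pass; the output-size formula in the comment is the standard convolution arithmetic, while the exact tensor shape convention is internal to the library:

// Spatial output size follows the usual convolution formula:
//   outSize = (inSize - kernelSize + 2 * padding) / stride + 1
// e.g. (32 - 3 + 2 * 1) / 1 + 1 = 32, so padding 1 preserves a 32x32 input.
Tensor<float> output = layer.Forward(input);
// output contains outputDepth channels of detected features.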
ForwardGpu(params IGpuTensor<T>[])
Performs a GPU-resident forward pass using fused DepthwiseConv2D + pointwise Conv2D + Bias + Activation.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): GPU-resident input tensor.
Returns
- IGpuTensor<T>
GPU-resident output tensor.
Remarks
For Beginners: This is the GPU-optimized version of the Forward method. All data stays on the GPU throughout the computation, avoiding expensive CPU-GPU transfers.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all depthwise kernels, pointwise kernels, and biases.
Remarks
This method extracts all trainable parameters from the layer and returns them as a single vector. This includes all depthwise kernels, pointwise kernels, and biases, concatenated in that order.
For Beginners: This method gathers all the learned values from the layer.
The parameters include:
- All depthwise filter values (first step filters)
- All pointwise filter values (second step filters)
- All bias values
These are combined into a single long list (vector), which can be used for:
- Saving the model
- Sharing parameters between layers
- Advanced optimization techniques
This provides access to all the "knowledge" the layer has learned.
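Given the documented ordering (depthwise kernels, then pointwise kernels, then biases), the expected vector length can be sanity-checked; the Length property on Vector<T> is an assumption:

Vector<float> parameters = layer.GetParameters();

// For inputDepth = 3, outputDepth = 16, kernelSize = 3:
// depthwise 3*3*3 = 27, pointwise 3*16 = 48, biases 16, total 91.
int expected = 3 * 3 * 3 + 3 * 16 + 16;
Console.WriteLine(parameters.Length == expected); // Length is assumed here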
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method clears the cached values from the forward and backward passes, including the input, intermediate outputs, and gradients. This is useful when starting to process a new batch or when implementing stateful networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- The layer forgets the last input it processed
- It forgets the intermediate results (after depthwise convolution)
- It forgets the final output it produced
- It clears any calculated gradients
This is useful for:
- Processing a new, unrelated set of data
- Preventing information from one batch affecting another
- Starting a new training episode
Think of it like wiping a whiteboard clean before starting a new calculation.
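Usage is a single call between unrelated workloads:

layer.ResetState(); // wipe cached input, intermediate results, and gradients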
SetParameters(Vector<T>)
Sets all trainable parameters of the layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all trainable parameters of the layer from a single vector. The vector must contain values for all depthwise kernels, pointwise kernels, and biases, in that order.
For Beginners: This method updates all the layer's learned values at once.
When setting parameters:
- The vector must have exactly the right number of values
- The values are assigned in order: depthwise filters, pointwise filters, then biases
This is useful for:
- Loading a previously saved model
- Copying parameters from another model
- Setting parameters that were optimized externally
It's like replacing all the "knowledge" in the layer with new information.
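A short sketch of copying learned values between two identically configured layers:

// Both layers must share the same configuration, or SetParameters
// throws the ArgumentException documented below.
Vector<float> learned = trainedLayer.GetParameters();
freshLayer.SetParameters(learned);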
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the layer's parameters (kernel weights and biases) using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the update.
Remarks
This method updates the layer's parameters (depthwise kernels, pointwise kernels, and biases) based on the gradients calculated during the backward pass. The learning rate controls the step size of the update.
For Beginners: This method applies the lessons learned during training.
When updating parameters:
- The learning rate controls how big each adjustment is
- Small learning rate = small, careful changes
- Large learning rate = big, faster changes (but might overshoot)
The layer updates:
- The depthwise filters (first step filters)
- The pointwise filters (second step filters)
- The biases
This happens after each batch of data, gradually improving the layer's performance.
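Putting it together, a single training step might look like this; ComputeLossGradient is a hypothetical helper and 0.01 an arbitrary learning rate:

Tensor<float> output = layer.Forward(input);
Tensor<float> lossGradient = ComputeLossGradient(output, target); // hypothetical helper
layer.Backward(lossGradient);    // caches the gradients
layer.UpdateParameters(0.01f);   // applies them; smaller = more careful steps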
Exceptions
- InvalidOperationException
Thrown when update is called before backward.
UpdateParametersGpu(IGpuOptimizerConfig)
Updates parameters on GPU using the configured optimizer.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
config (IGpuOptimizerConfig): The GPU optimizer configuration.