Class ConvolutionalLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a convolutional layer in a neural network that applies filters to input data.

public class ConvolutionalLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LayerBase<T>
ConvolutionalLayer<T>
Implements
ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

A convolutional layer applies a set of learnable filters to input data to extract features. Each filter slides across the input data, performing element-wise multiplication and summing the results. This operation is called convolution and is particularly effective for processing grid-like data such as images.

For Beginners: A convolutional layer is like a spotlight that scans over data looking for specific patterns.

Think of it like examining a photo with a small magnifying glass:

  • You move the magnifying glass across the image, one step at a time
  • At each position, you note what you see in that small area
  • After scanning the whole image, you have a collection of observations

For example, in image recognition:

  • One filter might detect vertical edges
  • Another might detect horizontal edges
  • Together, they help the network recognize complex shapes

Convolutional layers are fundamental for recognizing patterns in images, audio, and other grid-structured data.

Constructors

ConvolutionalLayer(int, int, int, int, int, int, int, IActivationFunction<T>?, IInitializationStrategy<T>?)

Initializes a new instance of the ConvolutionalLayer<T> class with the specified parameters and a scalar activation function.

public ConvolutionalLayer(int inputDepth, int inputHeight, int inputWidth, int outputDepth, int kernelSize, int stride = 1, int padding = 0, IActivationFunction<T>? activationFunction = null, IInitializationStrategy<T>? initializationStrategy = null)

Parameters

inputDepth int

The number of channels in the input data.

inputHeight int

The height of the input data.

inputWidth int

The width of the input data.

outputDepth int

The number of filters (output channels) to create.

kernelSize int

The size of each filter kernel (width and height).

stride int

The step size for moving the kernel. Defaults to 1.

padding int

The amount of zero-padding to add around the input. Defaults to 0.

activationFunction IActivationFunction<T>

The activation function to apply. Defaults to ReLU if not specified.

initializationStrategy IInitializationStrategy<T>

The strategy used to initialize the layer's kernel weights and biases. Defaults to null.

Remarks

This constructor creates a convolutional layer with the specified configuration. The input shape is determined by the inputDepth, inputHeight, and inputWidth parameters, while the output shape is calculated from these values together with the kernel size, stride, and padding. The kernels are initialized with random values; see EnsureInitialized() for how the weights and biases are set.

For Beginners: This setup method creates a new convolutional layer with specific settings.

When creating the layer, you specify:

  • Input details: How many channels and the dimensions of your data
  • How many patterns to look for (outputDepth)
  • How big each pattern detector is (kernelSize)
  • How to move the detector across the data (stride)
  • Whether to add an extra border (padding)
  • What mathematical function to apply to the results (activationFunction)

The layer then creates all the necessary pattern detectors with random starting values that will be improved during training.
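
The output shape mentioned above can be sketched with the standard convolution output-size rule. This is a minimal illustration, assuming the layer follows the usual formula floor((input + 2 × padding − kernel) / stride) + 1; ConvOutputSketch and OutputSize are hypothetical names and are not part of AiDotNet.

```csharp
using System;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class ConvOutputSketch
{
    // Output size along one spatial dimension (height or width):
    // floor((inputSize + 2 * padding - kernelSize) / stride) + 1
    public static int OutputSize(int inputSize, int kernelSize, int stride = 1, int padding = 0)
    {
        if (stride <= 0) throw new ArgumentOutOfRangeException(nameof(stride));
        if (padding < 0) throw new ArgumentOutOfRangeException(nameof(padding));

        int padded = inputSize + 2 * padding;
        if (kernelSize <= 0 || kernelSize > padded)
            throw new ArgumentOutOfRangeException(nameof(kernelSize));

        // Integer division floors here because both operands are non-negative.
        return (padded - kernelSize) / stride + 1;
    }
}
```

Under this rule, a 28×28 input with a 3×3 kernel gives an output size of 26 with no padding, 28 with padding 1, and 14 with stride 2 and padding 1.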

ConvolutionalLayer(int, int, int, int, int, int, int, IVectorActivationFunction<T>, IInitializationStrategy<T>?)

Initializes a new instance of the ConvolutionalLayer<T> class with the specified parameters and a vector activation function.

public ConvolutionalLayer(int inputDepth, int inputHeight, int inputWidth, int outputDepth, int kernelSize, int stride, int padding, IVectorActivationFunction<T> vectorActivationFunction, IInitializationStrategy<T>? initializationStrategy = null)

Parameters

inputDepth int

The number of channels in the input data.

inputHeight int

The height of the input data.

inputWidth int

The width of the input data.

outputDepth int

The number of filters (output channels) to create.

kernelSize int

The size of each filter kernel (width and height).

stride int

The step size for moving the kernel. Required in this overload.

padding int

The amount of zero-padding to add around the input. Required in this overload.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to apply (required to disambiguate from IActivationFunction overload).

initializationStrategy IInitializationStrategy<T>

The strategy used to initialize the layer's kernel weights and biases. Defaults to null.

Remarks

This constructor creates a convolutional layer with the specified configuration and a vector activation function, which operates on entire vectors rather than individual elements. This can be useful when applying more complex activation functions or when performance is a concern.

For Beginners: This setup method is similar to the previous one, but uses a different type of activation function.

A vector activation function:

  • Works on entire groups of numbers at once
  • Can be more efficient for certain types of calculations
  • Otherwise works the same as the regular activation function

You would choose this option if you have a specific mathematical operation that needs to be applied to groups of outputs rather than individual values.
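
The difference between the two kinds of activation can be sketched in plain C#. This is an illustration only: it does not reproduce the library's IActivationFunction<T> or IVectorActivationFunction<T> interfaces, and ActivationSketch is a hypothetical name. ReLU stands in for a scalar activation and softmax for a vector activation.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class ActivationSketch
{
    // Scalar activation: each output depends only on its own input value.
    public static double Relu(double x) => Math.Max(0.0, x);

    // Vector activation: each output depends on every value in the vector.
    public static double[] Softmax(double[] values)
    {
        double max = values.Max(); // subtract the maximum for numerical stability
        double[] exps = values.Select(v => Math.Exp(v - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

Softmax cannot be computed one element at a time because it needs the sum over the whole vector, which is the kind of operation a vector activation function is meant for.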

Properties

InputDepth

Gets the depth (number of channels) of the input data.

public int InputDepth { get; }

Property Value

int

Remarks

The input depth represents the number of channels in the input data. For example, RGB images have a depth of 3 (red, green, and blue channels), while grayscale images have a depth of 1.

For Beginners: Input depth is the number of "layers" in your input data.

Think of it like:

  • A color photo has 3 layers (red, green, blue)
  • A black and white photo has 1 layer

Each layer contains different information about the same data.

IsInitialized

Gets a value indicating whether this layer has been initialized.

public override bool IsInitialized { get; }

Property Value

bool

Remarks

For layers with lazy initialization, this indicates whether the weights have been allocated and initialized. For eager initialization, this is always true after construction.

For Beginners: This tells you if the layer's weights are ready to use.

A value of true means:

  • Weights have been allocated
  • The layer is ready for forward/backward passes

A value of false means:

  • Weights are not yet allocated (lazy initialization)
  • The first Forward() call will initialize them

KernelSize

Gets the size of each filter (kernel) used in the convolution operation.

public int KernelSize { get; }

Property Value

int

Remarks

The kernel size determines the area of the input that is examined at each position. A larger kernel size means a larger area is considered for each output value, potentially capturing more complex patterns.

For Beginners: Kernel size is how big the "spotlight" or "magnifying glass" is.

For example:

  • A kernel size of 3 means a 3×3 area (9 pixels in an image)
  • A kernel size of 5 means a 5×5 area (25 pixels)

Smaller kernels (like 3×3) are good for detecting fine details. Larger kernels (like 7×7) can see broader patterns but may miss small details.

OutputDepth

Gets the depth (number of filters) of the output data.

public int OutputDepth { get; }

Property Value

int

Remarks

The output depth represents the number of filters applied to the input data. Each filter looks for a different pattern in the input, resulting in a different output channel.

For Beginners: Output depth is how many different patterns this layer will look for.

For example:

  • If output depth is 16, the layer will look for 16 different patterns
  • Each pattern creates its own output "layer" or channel
  • More output channels means the network can recognize more complex features

A higher number usually means the network can learn more details, but also requires more processing power.

Padding

Gets the amount of zero-padding added to the input data before convolution.

public int Padding { get; }

Property Value

int

Remarks

Padding involves adding extra values (typically zeros) around the input data before performing the convolution. This allows the kernel to cover the edges of the original input and, with a suitable amount of padding, preserves the spatial dimensions in the output.

For Beginners: Padding is like adding an extra border around your data.

Imagine adding a frame around a photo:

  • The frame is filled with zeros (blank data)
  • This allows the spotlight to analyze edges without going "off the picture"

Benefits of padding:

  • Maintains the size of your data through the layer
  • Ensures border information isn't lost
  • Without padding, each layer would make your data smaller
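
Zero-padding of a single channel can be sketched as follows. This is an illustration with plain arrays, not the library's tensor code; PaddingSketch and ZeroPad are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class PaddingSketch
{
    // Surrounds a 2D channel with a border of zeros, 'padding' cells wide on every side.
    public static double[,] ZeroPad(double[,] input, int padding)
    {
        if (padding < 0) throw new ArgumentOutOfRangeException(nameof(padding));

        int height = input.GetLength(0);
        int width = input.GetLength(1);

        // New arrays are zero-filled, so only the original values need copying.
        var padded = new double[height + 2 * padding, width + 2 * padding];
        for (int i = 0; i < height; i++)
            for (int j = 0; j < width; j++)
                padded[i + padding, j + padding] = input[i, j];

        return padded;
    }
}
```

With a 3×3 kernel and stride 1, a padding of 1 keeps the output the same size as the input under the standard output-size rule.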

ParameterCount

Gets the total number of trainable parameters in the layer.

public override int ParameterCount { get; }

Property Value

int

The total number of kernel weights and biases.

Remarks

This property reports how many trainable values (kernel weights and biases) the layer holds. With kernels of shape [OutputDepth, InputDepth, KernelSize, KernelSize] and one bias per output channel, the count is expected to be OutputDepth × InputDepth × KernelSize × KernelSize + OutputDepth. This should match the length of the vector returned by GetParameters() and the length required by SetParameters().

For Beginners: This property tells you how many learned values the layer has.

The count includes:

  • All values from all pattern detectors (kernels)
  • All bias values

This number is useful for:

  • Checking the size of a parameter vector before loading it
  • Estimating how much memory the layer needs
  • Comparing the sizes of different models

It is a quick measure of how much "knowledge" the layer can store.
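
The number of trainable values in a convolutional layer can be sketched as below. This assumes kernels of shape [OutputDepth, InputDepth, KernelSize, KernelSize] and one bias per output channel, as described elsewhere on this page; ParameterCountSketch is a hypothetical name, not the library's implementation.

```csharp
// Hypothetical helper for illustration only; not an AiDotNet type.
public static class ParameterCountSketch
{
    public static int Count(int inputDepth, int outputDepth, int kernelSize)
    {
        // One kernelSize x kernelSize filter slice per (output channel, input channel) pair.
        int kernelWeights = outputDepth * inputDepth * kernelSize * kernelSize;

        // One bias per output channel.
        int biases = outputDepth;

        return kernelWeights + biases;
    }
}
```

For a layer with 3 input channels, 16 filters, and a 3×3 kernel, this gives 16 × 3 × 9 + 16 = 448 values.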

Stride

Gets the step size for moving the kernel across the input data.

public int Stride { get; }

Property Value

int

Remarks

The stride determines how many positions to move the kernel for each step during the convolution operation. A stride of 1 means the kernel moves one position at a time, examining every possible position. A larger stride means fewer positions are examined, resulting in a smaller output.

For Beginners: Stride is how far you move the spotlight each time.

Think of it like:

  • Stride of 1: Move one step at a time (examine every position)
  • Stride of 2: Skip one position between each examination (move two steps each time)

Using a larger stride:

  • Makes the output smaller (reduces dimensions)
  • Speeds up processing
  • But might miss some information
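
The effect of stride on the positions a kernel visits can be sketched along one dimension. This is an illustration without padding; StrideSketch and KernelStarts are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class StrideSketch
{
    // Lists the starting indices at which a kernel fits entirely inside the input.
    public static List<int> KernelStarts(int inputSize, int kernelSize, int stride)
    {
        if (stride <= 0) throw new ArgumentOutOfRangeException(nameof(stride));

        var starts = new List<int>();
        for (int start = 0; start + kernelSize <= inputSize; start += stride)
            starts.Add(start);
        return starts;
    }
}
```

For an input of width 7 and a kernel of size 3, stride 1 visits starts 0, 1, 2, 3, 4 (five outputs), while stride 2 visits 0, 2, 4 (three outputs).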

SupportsGpuExecution

Gets whether this layer has a GPU implementation.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsJitCompilation

Gets whether this convolutional layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer and its activation function support JIT compilation.

Remarks

This property indicates whether the layer can be JIT compiled. The layer supports JIT if:

  • The layer is properly initialized with weights
  • The activation function (if any) supports JIT compilation

For Beginners: This tells you if this layer can use JIT compilation for faster inference.

The layer can be JIT compiled if:

  • The layer has been trained or initialized with weights
  • The activation function (ReLU, etc.) supports JIT

Conv2D operations are fully supported for JIT compilation.

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

true if the layer has trainable parameters and supports backpropagation; otherwise, false.

Remarks

This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer has parameters that can be adjusted during training
  • It will improve its performance as it sees more data
  • It participates in the learning process

A value of false means:

  • The layer doesn't have any adjustable parameters
  • It performs the same operation regardless of training
  • It doesn't need to learn (but may still be useful)

Methods

Backward(Tensor<T>)

Calculates gradients for the input, kernels, and biases during backpropagation.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method performs the backward pass of the convolutional layer during training. It calculates the gradient of the loss with respect to the input, kernel weights, and biases. The kernel and bias gradients are kept so that the parameters can be adjusted (see UpdateParameters), and the calculated input gradient is returned for propagation to earlier layers.

For Beginners: This method helps the layer learn from its mistakes.

During the backward pass:

  • The layer receives information about how wrong its output was
  • It calculates how to adjust its pattern detectors to be more accurate
  • These adjustments are applied to the kernels and biases when the parameters are updated, improving future predictions
  • It passes information back to previous layers so they can learn too

This is where the actual "learning" happens in the neural network. The layer gradually improves its pattern recognition based on feedback about its performance.
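
The three gradients can be sketched for the simplest case: one input channel, one square kernel, stride 1, no padding, and no activation. This is an illustration with plain arrays, not the library's implementation; ConvBackwardSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class ConvBackwardSketch
{
    // Assumed forward model:
    //   output[i, j] = bias + sum over (ki, kj) of kernel[ki, kj] * input[i + ki, j + kj]
    public static (double[,] InputGrad, double[,] KernelGrad, double BiasGrad) Backward(
        double[,] input, double[,] kernel, double[,] outputGrad)
    {
        int k = kernel.GetLength(0); // square kernel assumed
        int outH = outputGrad.GetLength(0);
        int outW = outputGrad.GetLength(1);

        var inputGrad = new double[input.GetLength(0), input.GetLength(1)];
        var kernelGrad = new double[k, k];
        double biasGrad = 0.0;

        for (int i = 0; i < outH; i++)
        for (int j = 0; j < outW; j++)
        {
            double g = outputGrad[i, j];
            biasGrad += g; // the bias contributes to every output position

            for (int ki = 0; ki < k; ki++)
            for (int kj = 0; kj < k; kj++)
            {
                // Each kernel weight was multiplied by input[i + ki, j + kj] ...
                kernelGrad[ki, kj] += g * input[i + ki, j + kj];
                // ... and each input value was multiplied by kernel[ki, kj].
                inputGrad[i + ki, j + kj] += g * kernel[ki, kj];
            }
        }

        return (inputGrad, kernelGrad, biasGrad);
    }
}
```

The returned InputGrad plays the role of the tensor that Backward passes to the previous layer, while KernelGrad and BiasGrad are the values a later parameter update would consume.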

BackwardGpu(IGpuTensor<T>)

Performs GPU-resident backward pass for the convolutional layer. Computes gradients for kernels, biases, and input entirely on GPU - no CPU roundtrip.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

GPU-resident gradient from the next layer.

Returns

IGpuTensor<T>

GPU-resident gradient to pass to the previous layer.

Exceptions

InvalidOperationException

Thrown if ForwardGpu was not called first.

Configure(int[], int, int, int, int, IActivationFunction<T>?)

Creates a convolutional layer with the specified configuration using a fluent interface.

public static ConvolutionalLayer<T> Configure(int[] inputShape, int kernelSize, int numberOfFilters, int stride = 1, int padding = 0, IActivationFunction<T>? activationFunction = null)

Parameters

inputShape int[]

The shape of the input data as [depth, height, width].

kernelSize int

The size of each filter kernel (width and height).

numberOfFilters int

The number of filters (output channels) to create.

stride int

The step size for moving the kernel. Defaults to 1.

padding int

The amount of zero-padding to add around the input. Defaults to 0.

activationFunction IActivationFunction<T>

The activation function to apply. Defaults to ReLU if not specified.

Returns

ConvolutionalLayer<T>

A new instance of the ConvolutionalLayer<T> class.

Remarks

This static method provides a more convenient way to create a convolutional layer by specifying the input shape as an array rather than individual dimensions. It extracts the depth, height, and width from the input shape array and passes them to the constructor.

For Beginners: This is a simpler way to create a convolutional layer when you already know your input data's shape.

Instead of providing separate numbers for depth, height, and width, you can:

  • Pass all three dimensions in a single array
  • Specify the other settings in a more intuitive way

For example, if your input is 3-channel images that are 28×28 pixels:

  • You would use inputShape = [3, 28, 28]
  • Rather than listing all dimensions separately

This makes your code cleaner and easier to read.

Exceptions

ArgumentException

Thrown when the input shape does not have exactly 3 dimensions.
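
The shape check and unpacking that Configure is described as performing can be sketched as follows. ShapeSketch and Unpack are hypothetical names; the library's actual validation and error message may differ.

```csharp
using System;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class ShapeSketch
{
    // Splits [depth, height, width] into its three parts, rejecting any other length.
    public static (int Depth, int Height, int Width) Unpack(int[] inputShape)
    {
        if (inputShape == null) throw new ArgumentNullException(nameof(inputShape));
        if (inputShape.Length != 3)
            throw new ArgumentException(
                "Input shape must have exactly 3 dimensions: [depth, height, width].",
                nameof(inputShape));

        return (inputShape[0], inputShape[1], inputShape[2]);
    }
}
```

For example, unpacking [3, 28, 28] yields depth 3, height 28, and width 28, while a two-element array raises an ArgumentException.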

Configure(int[], int, int, int, int, IVectorActivationFunction<T>?)

Creates a convolutional layer with the specified configuration and a vector activation function using a fluent interface.

public static ConvolutionalLayer<T> Configure(int[] inputShape, int kernelSize, int numberOfFilters, int stride = 1, int padding = 0, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

inputShape int[]

The shape of the input data as [depth, height, width].

kernelSize int

The size of each filter kernel (width and height).

numberOfFilters int

The number of filters (output channels) to create.

stride int

The step size for moving the kernel. Defaults to 1.

padding int

The amount of zero-padding to add around the input. Defaults to 0.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to apply. Defaults to ReLU if not specified.

Returns

ConvolutionalLayer<T>

A new instance of the ConvolutionalLayer<T> class with a vector activation function.

Remarks

This static method provides a more convenient way to create a convolutional layer with a vector activation function by specifying the input shape as an array rather than individual dimensions. It is similar to the Configure method with a scalar activation function, but uses a vector activation function instead.

For Beginners: This is similar to the previous Configure method, but uses a vector activation function.

This method:

  • Makes it easier to create a layer with an input shape array
  • Uses a vector activation function (works on groups of numbers)
  • Is otherwise identical to the previous Configure method

You would choose this if you need a specific type of mathematical operation applied to groups of values rather than individual numbers.

Exceptions

ArgumentException

Thrown when the input shape does not have exactly 3 dimensions.

Deserialize(BinaryReader)

Loads the layer's configuration and parameters from a binary reader.

public override void Deserialize(BinaryReader reader)

Parameters

reader BinaryReader

The binary reader to load from.

Remarks

This method loads the layer's configuration (input depth, output depth, kernel size, stride, padding) and parameters (kernel weights and biases) from a binary reader. This allows a previously saved layer to be loaded from a file.

For Beginners: This method loads a previously saved layer from a file.

When loading a layer:

  • First, it reads the basic configuration
  • Then it recreates all the pattern detectors (kernels)
  • Finally, it loads the bias values

This allows you to:

  • Continue using a model you trained earlier
  • Use a model someone else trained
  • Compare different versions of your model

It's like restoring a snapshot of a trained model exactly as it was.

Dispose(bool)

Releases resources used by this layer, including GPU tensor handles.

protected override void Dispose(bool disposing)

Parameters

disposing bool

True if called from Dispose(), false if called from finalizer.

EnsureInitialized()

Initializes the kernel weights with random values and the biases with zeros.

protected override void EnsureInitialized()

Remarks

This method initializes the kernel weights using the He initialization method, which scales the random values based on the number of input and output connections. This helps improve training convergence. The biases are initialized to zero.

For Beginners: This method sets up the starting values for the pattern detectors.

When initializing weights:

  • Random values are created for each pattern detector
  • The values are carefully scaled to work well for training
  • Biases start at zero

Good initialization is important because:

  • It helps the network learn faster
  • It prevents certain mathematical problems during training
  • It gives each pattern detector a different starting point

This uses a technique called "He initialization" which works well with modern neural networks.
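
He initialization can be sketched as below. This assumes the common fan-in form, where weights are drawn from a normal distribution with standard deviation sqrt(2 / fanIn) and fanIn = inputDepth × kernelSize × kernelSize; the library's exact scaling may differ. HeInitSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class HeInitSketch
{
    // Returns outputDepth * inputDepth * kernelSize * kernelSize weights in a flat array.
    public static double[] InitKernels(int inputDepth, int outputDepth, int kernelSize, Random rng)
    {
        int fanIn = inputDepth * kernelSize * kernelSize;
        double stdDev = Math.Sqrt(2.0 / fanIn);

        var weights = new double[outputDepth * fanIn];
        for (int i = 0; i < weights.Length; i++)
        {
            // Box-Muller transform: turns two uniform samples into one standard normal sample.
            double u1 = 1.0 - rng.NextDouble(); // in (0, 1], so Log(u1) is finite
            double u2 = rng.NextDouble();
            double standardNormal = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);

            weights[i] = standardNormal * stdDev;
        }
        return weights;
    }

    // Biases start at zero; new arrays are zero-filled.
    public static double[] InitBiases(int outputDepth) => new double[outputDepth];
}
```

Passing a seeded Random (for example new Random(42)) makes the starting weights reproducible between runs.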

ExportComputationGraph(List<ComputationNode<T>>)

Exports the convolutional layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the convolution operation.

Remarks

This method constructs a computation graph representation of the convolutional layer by:

  1. Validating input parameters and layer configuration
  2. Creating a symbolic input node with proper batch dimension
  3. Creating constant nodes for kernels and biases
  4. Applying Conv2D operation
  5. Applying activation function if configured

For Beginners: This method converts the convolutional layer into a computation graph for JIT compilation.

The computation graph describes:

  • Input: A symbolic tensor with shape [1, InputDepth, Height, Width]
  • Kernels: The learned filters [OutputDepth, InputDepth, KernelSize, KernelSize]
  • Operation: 2D convolution with specified stride and padding
  • Activation: Applied to the convolution output
  • Output: Feature maps with shape [1, OutputDepth, OutputHeight, OutputWidth]

JIT compilation can make inference 5-10x faster by optimizing this graph into native code.

Forward(Tensor<T>)

Processes the input data through the convolutional layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process, with shape [batchSize, inputDepth, height, width].

Returns

Tensor<T>

The output tensor after convolution and activation, with shape [batchSize, outputDepth, outputHeight, outputWidth].

Remarks

This method performs the forward pass of the convolutional layer. For each position of the kernel on the input data, it computes the element-wise product of the kernel weights and the corresponding input values, sums the results, adds the bias, and applies the activation function. The result is a tensor where each channel represents the activation of a different filter.

For Beginners: This method applies the pattern detectors to your input data.

During the forward pass:

  • Each pattern detector (kernel) slides across the input
  • At each position, it looks for its specific pattern
  • If it finds a match, it produces a high value in the output
  • The activation function then adjusts these values

Think of it like a series of spotlights scanning across your data, each one lighting up when it finds the pattern it's looking for. The result shows where each pattern was found in the input.
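
The forward computation can be sketched for a single input channel and a single square kernel. This is an illustration with plain arrays and no padding, using ReLU as the activation; it is not the library's implementation, and ConvForwardSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class ConvForwardSketch
{
    public static double[,] Forward(double[,] input, double[,] kernel, double bias, int stride = 1)
    {
        if (stride <= 0) throw new ArgumentOutOfRangeException(nameof(stride));

        int k = kernel.GetLength(0); // square kernel assumed
        int outH = (input.GetLength(0) - k) / stride + 1;
        int outW = (input.GetLength(1) - k) / stride + 1;
        var output = new double[outH, outW];

        for (int i = 0; i < outH; i++)
        for (int j = 0; j < outW; j++)
        {
            // Multiply the kernel with the input window element by element and sum.
            double sum = bias;
            for (int ki = 0; ki < k; ki++)
            for (int kj = 0; kj < k; kj++)
                sum += kernel[ki, kj] * input[i * stride + ki, j * stride + kj];

            // Apply the activation function (ReLU in this sketch).
            output[i, j] = Math.Max(0.0, sum);
        }
        return output;
    }
}
```

A full layer repeats this for every filter and sums over all input channels, producing one output channel per filter.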

ForwardGpu(params IGpuTensor<T>[])

Performs a GPU-resident forward pass using fused Conv2D + Bias + Activation.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The GPU-resident input tensors to process.

Returns

IGpuTensor<T>

GPU-resident output tensor.

Remarks

For Beginners: This is the GPU-optimized version of the Forward method. All data stays on the GPU throughout the computation, avoiding expensive CPU-GPU transfers.

GetBiases()

Gets the biases tensor of the convolutional layer.

public override Tensor<T> GetBiases()

Returns

Tensor<T>

The bias values added to each output channel.

GetFilters()

Gets the filters (kernel weights) tensor of the convolutional layer.

public Tensor<T> GetFilters()

Returns

Tensor<T>

The kernel weights applied to the input during convolution.

Remarks

This method returns the layer's filters, the trainable kernel weights that are adjusted during training. As described for ExportComputationGraph, the kernels have shape [OutputDepth, InputDepth, KernelSize, KernelSize].

For Beginners: This method gives you the layer's pattern detectors.

For convolutional layers:

  • There is one filter for each output channel
  • Each filter holds the values the layer has learned for one pattern
  • Inspecting the filters can help you see what the layer is looking for

GetParameterGradients()

Gets all parameter gradients of the layer as a single vector.

public override Vector<T> GetParameterGradients()

Returns

Vector<T>

A vector containing all parameter gradients (kernel gradients followed by bias gradients).

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This method returns all trainable parameters of the layer (kernel weights and biases) as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the layer.

The parameters:

  • Are the numbers that the neural network learns during training
  • Include weights, biases, and other learnable values
  • Are combined into a single long list (vector)

This is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters
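
Combining kernels and biases into one list can be sketched as follows. The ordering used here (all kernel weights, then all biases) is an assumption that mirrors the order documented for GetParameterGradients; FlattenSketch is a hypothetical name and plain arrays stand in for the library's Tensor<T> and Vector<T>.

```csharp
// Hypothetical helper for illustration only; not an AiDotNet type.
public static class FlattenSketch
{
    // kernels has shape [outputDepth, inputDepth, kernelSize, kernelSize].
    public static double[] Flatten(double[,,,] kernels, double[] biases)
    {
        var flat = new double[kernels.Length + biases.Length];
        int index = 0;

        // foreach visits a multidimensional array in row-major order
        // (the last index changes fastest).
        foreach (double weight in kernels) flat[index++] = weight;
        foreach (double bias in biases) flat[index++] = bias;

        return flat;
    }
}
```

The resulting array has as many entries as there are kernel weights and biases combined.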

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

This method clears the cached input and output values from the most recent forward pass. This is useful when starting to process a new sequence or when implementing stateful layers.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • The layer forgets the last input it processed
  • It forgets the last output it produced

This is useful for:

  • Processing a new, unrelated set of data
  • Preventing information from one batch affecting another
  • Starting a new training episode

Think of it like wiping a whiteboard clean before starting a new calculation.

Serialize(BinaryWriter)

Saves the layer's configuration and parameters to a binary writer.

public override void Serialize(BinaryWriter writer)

Parameters

writer BinaryWriter

The binary writer to save to.

Remarks

This method saves the layer's configuration (input depth, output depth, kernel size, stride, padding) and parameters (kernel weights and biases) to a binary writer. This allows the layer to be saved to a file and loaded later.

For Beginners: This method saves all the layer's settings and learned patterns to a file.

When saving a layer:

  • First, it saves the basic configuration (size, stride, etc.)
  • Then it saves all the learned pattern detectors (kernels)
  • Finally, it saves the bias values

This allows you to:

  • Save a trained model to use later
  • Share your trained model with others
  • Store multiple versions of your model

Think of it like taking a snapshot of everything the model has learned.
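
A save and load round trip in the order described (configuration, then kernels, then biases) can be sketched with BinaryWriter and BinaryReader. The exact binary layout used by the library is not documented here, so the field order and the use of double values below are assumptions; SerializationSketch is a hypothetical name.

```csharp
using System.IO;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class SerializationSketch
{
    public static void Save(BinaryWriter writer, int inputDepth, int outputDepth, int kernelSize,
                            int stride, int padding, double[] kernels, double[] biases)
    {
        // 1. Configuration
        writer.Write(inputDepth);
        writer.Write(outputDepth);
        writer.Write(kernelSize);
        writer.Write(stride);
        writer.Write(padding);
        // 2. Kernel weights, 3. biases
        foreach (double w in kernels) writer.Write(w);
        foreach (double b in biases) writer.Write(b);
    }

    public static (int InputDepth, int OutputDepth, int KernelSize, int Stride, int Padding,
                   double[] Kernels, double[] Biases) Load(BinaryReader reader)
    {
        // Values must be read back in exactly the order they were written.
        int inputDepth = reader.ReadInt32();
        int outputDepth = reader.ReadInt32();
        int kernelSize = reader.ReadInt32();
        int stride = reader.ReadInt32();
        int padding = reader.ReadInt32();

        // The configuration determines how many weights and biases follow.
        var kernels = new double[outputDepth * inputDepth * kernelSize * kernelSize];
        for (int i = 0; i < kernels.Length; i++) kernels[i] = reader.ReadDouble();

        var biases = new double[outputDepth];
        for (int i = 0; i < biases.Length; i++) biases[i] = reader.ReadDouble();

        return (inputDepth, outputDepth, kernelSize, stride, padding, kernels, biases);
    }
}
```

Writing the configuration first lets the reader work out how many values to expect before it reads the kernels and biases.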

SetParameters(Vector<T>)

Sets all trainable parameters of the layer from a single vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets all trainable parameters (kernel weights and biases) of the layer from a single vector. The vector must have the exact length required for all parameters of the layer.

For Beginners: This method updates all the layer's learned values at once.

When setting parameters:

  • The vector must have exactly the right number of values
  • The values are assigned to the kernels and biases in a specific order

This is useful for:

  • Loading a previously saved model
  • Copying parameters from another model
  • Setting parameters that were optimized externally

It's like replacing all the "knowledge" in the layer with new information.

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.
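
The length check and assignment can be sketched with plain arrays. The ordering (kernel weights first, then biases) is an assumption, and SetParametersSketch is a hypothetical name rather than the library's implementation.

```csharp
using System;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class SetParametersSketch
{
    // Copies a flat parameter array into existing kernel and bias storage.
    public static void Assign(double[] parameters, double[] kernels, double[] biases)
    {
        int expected = kernels.Length + biases.Length;
        if (parameters.Length != expected)
            throw new ArgumentException(
                $"Expected {expected} parameters but received {parameters.Length}.",
                nameof(parameters));

        // Kernel weights come first, followed by the biases.
        Array.Copy(parameters, 0, kernels, 0, kernels.Length);
        Array.Copy(parameters, kernels.Length, biases, 0, biases.Length);
    }
}
```

Checking the length before copying means a wrongly sized vector leaves the existing weights untouched.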

UpdateParameters(T)

Updates the layer's parameters (kernel weights and biases) using the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the update.

Remarks

This method updates the layer's parameters (kernel weights and biases) based on the gradients calculated during the backward pass. The learning rate controls the step size of the update, with a smaller learning rate resulting in smaller, more cautious updates.

For Beginners: This method applies the lessons learned during training.

When updating parameters:

  • The learning rate controls how big each adjustment is
  • Small learning rate = small, careful changes
  • Large learning rate = big, faster changes (but might overshoot)

Think of it like adjusting your position in a game:

  • If you're far from the target, you might take big steps
  • As you get closer, you take smaller, more precise steps

The learning rate helps balance between learning quickly and learning accurately.
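
The update step can be sketched as plain gradient descent, where each parameter moves against its gradient by an amount scaled by the learning rate. Whether the library applies exactly this rule, or adds extras such as momentum, is not stated here, so treat this as an assumption; SgdSketch is a hypothetical name.

```csharp
using System;

// Hypothetical helper for illustration only; not an AiDotNet type.
public static class SgdSketch
{
    // parameter[i] = parameter[i] - learningRate * gradient[i]
    public static void Update(double[] parameters, double[] gradients, double learningRate)
    {
        if (parameters.Length != gradients.Length)
            throw new ArgumentException("Parameters and gradients must have the same length.");

        for (int i = 0; i < parameters.Length; i++)
            parameters[i] -= learningRate * gradients[i];
    }
}
```

With a learning rate of 0.1, a parameter of 1.0 with gradient 0.5 moves to about 0.95, and a parameter of 2.0 with gradient -1.0 moves to about 2.1.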