Class ConvolutionalLayer<T>
- Namespace: AiDotNet.NeuralNetworks.Layers
- Assembly: AiDotNet.dll
Represents a convolutional layer in a neural network that applies filters to input data.
public class ConvolutionalLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → ConvolutionalLayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
A convolutional layer applies a set of learnable filters to input data to extract features. Each filter slides across the input data, performing element-wise multiplication and summing the results. This operation is called convolution and is particularly effective for processing grid-like data such as images.
For Beginners: A convolutional layer is like a spotlight that scans over data looking for specific patterns.
Think of it like examining a photo with a small magnifying glass:
- You move the magnifying glass across the image, one step at a time
- At each position, you note what you see in that small area
- After scanning the whole image, you have a collection of observations
For example, in image recognition:
- One filter might detect vertical edges
- Another might detect horizontal edges
- Together, they help the network recognize complex shapes
Convolutional layers are fundamental for recognizing patterns in images, audio, and other grid-structured data.
Constructors
ConvolutionalLayer(int, int, int, int, int, int, int, IActivationFunction<T>?, IInitializationStrategy<T>?)
Initializes a new instance of the ConvolutionalLayer<T> class with the specified parameters and a scalar activation function.
public ConvolutionalLayer(int inputDepth, int inputHeight, int inputWidth, int outputDepth, int kernelSize, int stride = 1, int padding = 0, IActivationFunction<T>? activationFunction = null, IInitializationStrategy<T>? initializationStrategy = null)
Parameters
inputDepth (int): The number of channels in the input data.
inputHeight (int): The height of the input data.
inputWidth (int): The width of the input data.
outputDepth (int): The number of filters (output channels) to create.
kernelSize (int): The size of each filter kernel (width and height).
stride (int): The step size for moving the kernel. Defaults to 1.
padding (int): The amount of zero-padding to add around the input. Defaults to 0.
activationFunction (IActivationFunction<T>): The activation function to apply. Defaults to ReLU if not specified.
initializationStrategy (IInitializationStrategy<T>): The strategy used to initialize the kernel weights and biases. If not specified, the default initialization is used (He initialization for the kernels, zeros for the biases).
Remarks
This constructor creates a convolutional layer with the specified configuration. The input shape is determined by the inputDepth, inputHeight, and inputWidth parameters, while the output shape is calculated based on these values along with the kernel size, stride, and padding. The kernels and biases are initialized with random values.
For Beginners: This setup method creates a new convolutional layer with specific settings.
When creating the layer, you specify:
- Input details: How many channels and the dimensions of your data
- How many patterns to look for (outputDepth)
- How big each pattern detector is (kernelSize)
- How to move the detector across the data (stride)
- Whether to add an extra border (padding)
- What mathematical function to apply to the results (activationFunction)
The layer then creates all the necessary pattern detectors with random starting values that will be improved during training.
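For example, a minimal sketch of creating a layer for 3-channel 28×28 images (float is just one valid choice for T, and the named arguments mirror the constructor signature above):
// 16 filters with 3×3 kernels; stride 1 and padding 1 keep the 28×28 spatial size.
// The activation defaults to ReLU because none is specified.
var conv = new ConvolutionalLayer<float>(
    inputDepth: 3, inputHeight: 28, inputWidth: 28,
    outputDepth: 16, kernelSize: 3, stride: 1, padding: 1);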
ConvolutionalLayer(int, int, int, int, int, int, int, IVectorActivationFunction<T>, IInitializationStrategy<T>?)
Initializes a new instance of the ConvolutionalLayer<T> class with the specified parameters and a vector activation function.
public ConvolutionalLayer(int inputDepth, int inputHeight, int inputWidth, int outputDepth, int kernelSize, int stride, int padding, IVectorActivationFunction<T> vectorActivationFunction, IInitializationStrategy<T>? initializationStrategy = null)
Parameters
inputDepth (int): The number of channels in the input data.
inputHeight (int): The height of the input data.
inputWidth (int): The width of the input data.
outputDepth (int): The number of filters (output channels) to create.
kernelSize (int): The size of each filter kernel (width and height).
stride (int): The step size for moving the kernel.
padding (int): The amount of zero-padding to add around the input.
vectorActivationFunction (IVectorActivationFunction<T>): The vector activation function to apply (required to disambiguate from the IActivationFunction overload).
initializationStrategy (IInitializationStrategy<T>): The strategy used to initialize the kernel weights and biases. If not specified, the default initialization is used.
Remarks
This constructor creates a convolutional layer with the specified configuration and a vector activation function, which operates on entire vectors rather than individual elements. This can be useful when applying more complex activation functions or when performance is a concern.
For Beginners: This setup method is similar to the previous one, but uses a different type of activation function.
A vector activation function:
- Works on entire groups of numbers at once
- Can be more efficient for certain types of calculations
- Otherwise works the same as the regular activation function
You would choose this option if you have a specific mathematical operation that needs to be applied to groups of outputs rather than individual values.
Properties
InputDepth
Gets the depth (number of channels) of the input data.
public int InputDepth { get; }
Property Value
Remarks
The input depth represents the number of channels in the input data. For example, RGB images have a depth of 3 (red, green, and blue channels), while grayscale images have a depth of 1.
For Beginners: Input depth is the number of "layers" in your input data.
Think of it like:
- A color photo has 3 layers (red, green, blue)
- A black and white photo has 1 layer
Each layer contains different information about the same data.
IsInitialized
Gets a value indicating whether this layer has been initialized.
public override bool IsInitialized { get; }
Property Value
Remarks
For layers with lazy initialization, this indicates whether the weights have been allocated and initialized. For eager initialization, this is always true after construction.
For Beginners: This tells you if the layer's weights are ready to use.
A value of true means:
- Weights have been allocated
- The layer is ready for forward/backward passes
A value of false means:
- Weights are not yet allocated (lazy initialization)
- The first Forward() call will initialize them
KernelSize
Gets the size of each filter (kernel) used in the convolution operation.
public int KernelSize { get; }
Property Value
Remarks
The kernel size determines the area of the input that is examined at each position. A larger kernel size means a larger area is considered for each output value, potentially capturing more complex patterns.
For Beginners: Kernel size is how big the "spotlight" or "magnifying glass" is.
For example:
- A kernel size of 3 means a 3×3 area (9 pixels in an image)
- A kernel size of 5 means a 5×5 area (25 pixels)
Smaller kernels (like 3×3) are good for detecting fine details. Larger kernels (like 7×7) can see broader patterns but may miss small details.
OutputDepth
Gets the depth (number of filters) of the output data.
public int OutputDepth { get; }
Property Value
Remarks
The output depth represents the number of filters applied to the input data. Each filter looks for a different pattern in the input, resulting in a different output channel.
For Beginners: Output depth is how many different patterns this layer will look for.
For example:
- If output depth is 16, the layer will look for 16 different patterns
- Each pattern creates its own output "layer" or channel
- More output channels means the network can recognize more complex features
A higher number usually means the network can learn more details, but also requires more processing power.
Padding
Gets the amount of zero-padding added to the input data before convolution.
public int Padding { get; }
Property Value
Remarks
Padding involves adding extra values (typically zeros) around the input data before performing the convolution. This allows the kernel to slide beyond the edges of the original input, preserving the spatial dimensions in the output.
For Beginners: Padding is like adding an extra border around your data.
Imagine adding a frame around a photo:
- The frame is filled with zeros (blank data)
- This allows the spotlight to analyze edges without going "off the picture"
Benefits of padding:
- Maintains the size of your data through the layer
- Ensures border information isn't lost
- Without padding, each layer would make your data smaller
ParameterCount
Gets the total number of trainable parameters (kernel weights and biases) in the layer.
public override int ParameterCount { get; }
Property Value
- int
The total count of kernel weights and biases.
Remarks
This property returns the total number of trainable parameters (kernel weights and biases) in the layer. This is useful for optimization algorithms that need to know how many values they are working with, or for sizing the vector used when saving and loading model weights.
For Beginners: This property tells you how many learned values the layer contains.
The count includes:
- All values from all pattern detectors (kernels)
- All bias values
Knowing the total is useful for:
- Sizing the vector used to save or load the model
- Checking that a parameter vector has the right length
- Advanced optimization techniques that operate on all parameters at once
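As a rough sketch of where that number comes from (assuming the kernel layout [OutputDepth, InputDepth, KernelSize, KernelSize] described under ExportComputationGraph, plus one bias per output channel):
// Not library code: just the arithmetic behind ParameterCount.
int expectedCount = outputDepth * inputDepth * kernelSize * kernelSize + outputDepth;
// e.g. 16 filters over 3 input channels with 3×3 kernels: 16 * 3 * 3 * 3 + 16 = 448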
Stride
Gets the step size for moving the kernel across the input data.
public int Stride { get; }
Property Value
Remarks
The stride determines how many positions to move the kernel for each step during the convolution operation. A stride of 1 means the kernel moves one position at a time, examining every possible position. A larger stride means fewer positions are examined, resulting in a smaller output.
For Beginners: Stride is how far you move the spotlight each time.
Think of it like:
- Stride of 1: Move one step at a time (examine every position)
- Stride of 2: Skip one position between each examination (move two steps each time)
Using a larger stride:
- Makes the output smaller (reduces dimensions)
- Speeds up processing
- But might miss some information
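Kernel size, stride, and padding together determine the output size through the standard convolution formula; a small worked sketch (not library code):
// Output size along one spatial dimension (integer division):
// outputSize = (inputSize + 2 * padding - kernelSize) / stride + 1
// e.g. inputSize = 28, kernelSize = 3, padding = 1, stride = 2:
// (28 + 2 - 3) / 2 + 1 = 13 + 1 = 14
int outputSize = (inputSize + 2 * padding - kernelSize) / stride + 1;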
SupportsGpuExecution
Gets whether this layer has a GPU implementation.
protected override bool SupportsGpuExecution { get; }
Property Value
SupportsJitCompilation
Gets whether this convolutional layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer and its activation function support JIT compilation.
Remarks
This property indicates whether the layer can be JIT compiled. The layer supports JIT if:
- The layer is properly initialized with weights
- The activation function (if any) supports JIT compilation
For Beginners: This tells you if this layer can use JIT compilation for faster inference.
The layer can be JIT compiled if:
- The layer has been trained or initialized with weights
- The activation function (ReLU, etc.) supports JIT
Conv2D operations are fully supported for JIT compilation.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true if the layer has trainable parameters and supports backpropagation; otherwise, false.
Remarks
This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer has parameters that can be adjusted during training
- It will improve its performance as it sees more data
- It participates in the learning process
A value of false means:
- The layer doesn't have any adjustable parameters
- It performs the same operation regardless of training
- It doesn't need to learn (but may still be useful)
Methods
Backward(Tensor<T>)
Calculates gradients for the input, kernels, and biases during backpropagation.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method performs the backward pass of the convolutional layer during training. It calculates the gradient of the loss with respect to the input, kernel weights, and biases, storing the kernel and bias gradients so they can be applied by UpdateParameters(T). The calculated input gradient is returned for propagation to earlier layers.
For Beginners: This method helps the layer learn from its mistakes.
During the backward pass:
- The layer receives information about how wrong its output was
- It calculates how to adjust its pattern detectors to be more accurate
- It stores these adjustments so they can be applied when the parameters are updated
- It passes information back to previous layers so they can learn too
This is where the actual "learning" happens in the neural network. The layer gradually improves its pattern recognition based on feedback about its performance.
BackwardGpu(IGpuTensor<T>)
Performs GPU-resident backward pass for the convolutional layer. Computes gradients for kernels, biases, and input entirely on GPU - no CPU roundtrip.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The GPU-resident gradient from the next layer.
Returns
- IGpuTensor<T>
GPU-resident gradient to pass to the previous layer.
Exceptions
- InvalidOperationException
Thrown if ForwardGpu was not called first.
Configure(int[], int, int, int, int, IActivationFunction<T>?)
Creates a convolutional layer with the specified configuration using a fluent interface.
public static ConvolutionalLayer<T> Configure(int[] inputShape, int kernelSize, int numberOfFilters, int stride = 1, int padding = 0, IActivationFunction<T>? activationFunction = null)
Parameters
inputShape (int[]): The shape of the input data as [depth, height, width].
kernelSize (int): The size of each filter kernel (width and height).
numberOfFilters (int): The number of filters (output channels) to create.
stride (int): The step size for moving the kernel. Defaults to 1.
padding (int): The amount of zero-padding to add around the input. Defaults to 0.
activationFunction (IActivationFunction<T>): The activation function to apply. Defaults to ReLU if not specified.
Returns
- ConvolutionalLayer<T>
A new instance of the ConvolutionalLayer<T> class.
Remarks
This static method provides a more convenient way to create a convolutional layer by specifying the input shape as an array rather than individual dimensions. It extracts the depth, height, and width from the input shape array and passes them to the constructor.
For Beginners: This is a simpler way to create a convolutional layer when you already know your input data's shape.
Instead of providing separate numbers for depth, height, and width, you can:
- Pass all three dimensions in a single array
- Specify the other settings in a more intuitive way
For example, if your input is 3-channel images that are 28×28 pixels:
- You would use inputShape = [3, 28, 28]
- Rather than listing all dimensions separately
This makes your code cleaner and easier to read.
Exceptions
- ArgumentException
Thrown when the input shape does not have exactly 3 dimensions.
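A minimal usage sketch for 3-channel 28×28 input (the float type parameter is just one valid choice):
// The input shape is passed as one array instead of separate depth/height/width arguments.
// stride and padding keep their defaults of 1 and 0; the activation defaults to ReLU.
var conv = ConvolutionalLayer<float>.Configure(
    inputShape: new[] { 3, 28, 28 },
    kernelSize: 3,
    numberOfFilters: 16);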
Configure(int[], int, int, int, int, IVectorActivationFunction<T>?)
Creates a convolutional layer with the specified configuration and a vector activation function using a fluent interface.
public static ConvolutionalLayer<T> Configure(int[] inputShape, int kernelSize, int numberOfFilters, int stride = 1, int padding = 0, IVectorActivationFunction<T>? vectorActivationFunction = null)
Parameters
inputShape (int[]): The shape of the input data as [depth, height, width].
kernelSize (int): The size of each filter kernel (width and height).
numberOfFilters (int): The number of filters (output channels) to create.
stride (int): The step size for moving the kernel. Defaults to 1.
padding (int): The amount of zero-padding to add around the input. Defaults to 0.
vectorActivationFunction (IVectorActivationFunction<T>): The vector activation function to apply. Defaults to ReLU if not specified.
Returns
- ConvolutionalLayer<T>
A new instance of the ConvolutionalLayer<T> class with a vector activation function.
Remarks
This static method provides a more convenient way to create a convolutional layer with a vector activation function by specifying the input shape as an array rather than individual dimensions. It is similar to the Configure method with a scalar activation function, but uses a vector activation function instead.
For Beginners: This is similar to the previous Configure method, but uses a vector activation function.
This method:
- Makes it easier to create a layer with an input shape array
- Uses a vector activation function (works on groups of numbers)
- Is otherwise identical to the previous Configure method
You would choose this if you need a specific type of mathematical operation applied to groups of values rather than individual numbers.
Exceptions
- ArgumentException
Thrown when the input shape does not have exactly 3 dimensions.
Deserialize(BinaryReader)
Loads the layer's configuration and parameters from a binary reader.
public override void Deserialize(BinaryReader reader)
Parameters
reader (BinaryReader): The binary reader to load from.
Remarks
This method loads the layer's configuration (input depth, output depth, kernel size, stride, padding) and parameters (kernel weights and biases) from a binary reader. This allows a previously saved layer to be loaded from a file.
For Beginners: This method loads a previously saved layer from a file.
When loading a layer:
- First, it reads the basic configuration
- Then it recreates all the pattern detectors (kernels)
- Finally, it loads the bias values
This allows you to:
- Continue using a model you trained earlier
- Use a model someone else trained
- Compare different versions of your model
It's like restoring a snapshot of a trained model exactly as it was.
Dispose(bool)
Releases resources used by this layer, including GPU tensor handles.
protected override void Dispose(bool disposing)
Parameters
disposing (bool): True if called from Dispose(); false if called from the finalizer.
EnsureInitialized()
Initializes the kernel weights and biases with random values.
protected override void EnsureInitialized()
Remarks
This method initializes the kernel weights using the He initialization method, which scales the random values based on the number of input and output connections. This helps improve training convergence. The biases are initialized to zero.
For Beginners: This method sets up the starting values for the pattern detectors.
When initializing weights:
- Random values are created for each pattern detector
- The values are carefully scaled to work well for training
- Biases start at zero
Good initialization is important because:
- It helps the network learn faster
- It prevents certain mathematical problems during training
- It gives each pattern detector a different starting point
This uses a technique called "He initialization" which works well with modern neural networks.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the convolutional layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): The list to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the convolution operation.
Remarks
This method constructs a computation graph representation of the convolutional layer by:
1. Validating input parameters and layer configuration
2. Creating a symbolic input node with a proper batch dimension
3. Creating constant nodes for kernels and biases
4. Applying the Conv2D operation
5. Applying the activation function if configured
For Beginners: This method converts the convolutional layer into a computation graph for JIT compilation.
The computation graph describes:
- Input: A symbolic tensor with shape [1, InputDepth, Height, Width]
- Kernels: The learned filters [OutputDepth, InputDepth, KernelSize, KernelSize]
- Operation: 2D convolution with specified stride and padding
- Activation: Applied to the convolution output
- Output: Feature maps with shape [1, OutputDepth, OutputHeight, OutputWidth]
JIT compilation can make inference 5-10x faster by optimizing this graph into native code.
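A minimal sketch of exporting the graph (what you do with the returned node afterwards depends on the JIT pipeline and is not covered on this page):
// The list is populated with the symbolic input nodes;
// the return value is the node representing the layer's output.
var inputNodes = new List<ComputationNode<float>>();
ComputationNode<float> outputNode = conv.ExportComputationGraph(inputNodes);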
Forward(Tensor<T>)
Processes the input data through the convolutional layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process, with shape [batchSize, inputDepth, height, width].
Returns
- Tensor<T>
The output tensor after convolution and activation, with shape [batchSize, outputDepth, outputHeight, outputWidth].
Remarks
This method performs the forward pass of the convolutional layer. For each position of the kernel on the input data, it computes the element-wise product of the kernel weights and the corresponding input values, sums the results, adds the bias, and applies the activation function. The result is a tensor where each channel represents the activation of a different filter.
For Beginners: This method applies the pattern detectors to your input data.
During the forward pass:
- Each pattern detector (kernel) slides across the input
- At each position, it looks for its specific pattern
- If it finds a match, it produces a high value in the output
- The activation function then adjusts these values
Think of it like a series of spotlights scanning across your data, each one lighting up when it finds the pattern it's looking for. The result shows where each pattern was found in the input.
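A sketch of a forward pass, assuming an input tensor has already been created with the expected shape (the exact Tensor<T> construction API is not shown on this page):
// input shape: [batchSize, inputDepth, height, width], e.g. [1, 3, 28, 28]
Tensor<float> output = conv.Forward(input);
// output shape: [batchSize, outputDepth, outputHeight, outputWidth]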
ForwardGpu(params IGpuTensor<T>[])
Performs a GPU-resident forward pass using fused Conv2D + Bias + Activation.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensors to process.
Returns
- IGpuTensor<T>
GPU-resident output tensor.
Remarks
For Beginners: This is the GPU-optimized version of the Forward method. All data stays on the GPU throughout the computation, avoiding expensive CPU-GPU transfers.
GetBiases()
Gets the biases tensor of the convolutional layer.
public override Tensor<T> GetBiases()
Returns
- Tensor<T>
The bias values added to each output channel.
GetFilters()
Gets the filters (kernels) tensor of the convolutional layer.
public Tensor<T> GetFilters()
Returns
- Tensor<T>
The tensor containing all learned kernel (filter) weights.
Remarks
This method returns the learned filter (kernel) weights of the layer. Each filter detects a different pattern in the input and produces its own output channel.
For Beginners: This method gives you the layer's pattern detectors.
For convolutional layers:
- Each filter is a small grid of learned values
- These values are what the layer adjusts during training
- Inspecting them can show what patterns the layer has learned to look for
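A small usage sketch (the shape comment assumes the kernel layout described under ExportComputationGraph):
var filters = conv.GetFilters();
// filters shape: [OutputDepth, InputDepth, KernelSize, KernelSize]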
GetParameterGradients()
Gets all parameter gradients of the layer as a single vector.
public override Vector<T> GetParameterGradients()
Returns
- Vector<T>
A vector containing all parameter gradients (kernel gradients followed by bias gradients).
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method provides access to all trainable parameters of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
The parameters:
- Are the numbers that the neural network learns during training
- Include weights, biases, and other learnable values
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method clears the cached input and output values from the most recent forward pass. This is useful when starting to process a new sequence or when implementing stateful layers.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- The layer forgets the last input it processed
- It forgets the last output it produced
This is useful for:
- Processing a new, unrelated set of data
- Preventing information from one batch affecting another
- Starting a new training episode
Think of it like wiping a whiteboard clean before starting a new calculation.
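For example, between unrelated batches or sequences (a minimal sketch):
conv.ResetState();   // clears the cached input and output from the previous forward pass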
Serialize(BinaryWriter)
Saves the layer's configuration and parameters to a binary writer.
public override void Serialize(BinaryWriter writer)
Parameters
writer (BinaryWriter): The binary writer to save to.
Remarks
This method saves the layer's configuration (input depth, output depth, kernel size, stride, padding) and parameters (kernel weights and biases) to a binary writer. This allows the layer to be saved to a file and loaded later.
For Beginners: This method saves all the layer's settings and learned patterns to a file.
When saving a layer:
- First, it saves the basic configuration (size, stride, etc.)
- Then it saves all the learned pattern detectors (kernels)
- Finally, it saves the bias values
This allows you to:
- Save a trained model to use later
- Share your trained model with others
- Store multiple versions of your model
Think of it like taking a snapshot of everything the model has learned.
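A sketch of saving and later restoring the layer with ordinary .NET binary I/O (the file name is arbitrary; BinaryWriter, BinaryReader, and File come from System.IO):
// Save the trained layer's configuration and parameters.
using (var writer = new BinaryWriter(File.Create("conv_layer.bin")))
    conv.Serialize(writer);

// Later, load them back with Deserialize (documented above).
using (var reader = new BinaryReader(File.OpenRead("conv_layer.bin")))
    conv.Deserialize(reader);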
SetParameters(Vector<T>)
Sets all trainable parameters of the layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all trainable parameters (kernel weights and biases) of the layer from a single vector. The vector must have the exact length required for all parameters of the layer.
For Beginners: This method updates all the layer's learned values at once.
When setting parameters:
- The vector must have exactly the right number of values
- The values are assigned to the kernels and biases in a specific order
This is useful for:
- Loading a previously saved model
- Copying parameters from another model
- Setting parameters that were optimized externally
It's like replacing all the "knowledge" in the layer with new information.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
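A sketch of copying all learned values between two compatible layers (sourceLayer and targetLayer are placeholder names for ConvolutionalLayer<float> instances with the same configuration):
// GetParameters returns the kernels and biases flattened into one vector;
// SetParameters expects a vector of exactly that length.
Vector<float> learned = sourceLayer.GetParameters();
targetLayer.SetParameters(learned);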
UpdateParameters(T)
Updates the layer's parameters (kernel weights and biases) using the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the update.
Remarks
This method updates the layer's parameters (kernel weights and biases) based on the gradients calculated during the backward pass. The learning rate controls the step size of the update, with a smaller learning rate resulting in smaller, more cautious updates.
For Beginners: This method applies the lessons learned during training.
When updating parameters:
- The learning rate controls how big each adjustment is
- Small learning rate = small, careful changes
- Large learning rate = big, faster changes (but might overshoot)
Think of it like adjusting your position in a game:
- If you're far from the target, you might take big steps
- As you get closer, you take smaller, more precise steps
The learning rate helps balance between learning quickly and learning accurately.
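Putting the training methods together, one training step looks roughly like this (outputGradient stands in for the loss gradient computed outside this layer):
// 1. Forward pass caches the input and output needed for backpropagation.
Tensor<float> output = conv.Forward(input);
// 2. Backward pass converts the loss gradient into kernel, bias, and input gradients.
Tensor<float> inputGradient = conv.Backward(outputGradient);
// 3. Apply the accumulated gradients with the chosen learning rate.
conv.UpdateParameters(0.01f);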