Class DeconvolutionalLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a deconvolutional layer (also known as transposed convolution) in a neural network.
public class DeconvolutionalLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T
The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → DeconvolutionalLayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
A deconvolutional layer performs the opposite operation of a convolutional layer. While convolution reduces spatial dimensions by applying filters, deconvolution expands spatial dimensions by applying learnable filters to upsample the input. This is particularly useful in generative models and image segmentation networks where upsampling is required.
For Beginners: A deconvolutional layer is like zooming in on an image in a smart way.
Think of it like the reverse of a convolutional layer:
- A convolutional layer summarizes information (making images smaller)
- A deconvolutional layer expands information (making images larger)
For example, if you have a small feature map representing "cat features," a deconvolutional layer could expand it back to a cat-shaped image.
This is particularly useful for:
- Generating images from small encoded representations
- Increasing the resolution of feature maps
- Creating detailed outputs from simplified inputs
Applications include image generation, super-resolution, and segmentation tasks where you need to expand the spatial dimensions of your data.
Constructors
DeconvolutionalLayer(int[], int, int, int, int, IActivationFunction<T>?)
Initializes a new instance of the DeconvolutionalLayer<T> class with the specified parameters and a scalar activation function.
public DeconvolutionalLayer(int[] inputShape, int outputDepth, int kernelSize, int stride = 1, int padding = 0, IActivationFunction<T>? activationFunction = null)
Parameters
inputShape (int[]): The shape of the input data.
outputDepth (int): The number of output channels to create.
kernelSize (int): The size of each filter kernel (width and height).
stride (int): The step size for positioning the kernel. Defaults to 1.
padding (int): The amount of padding to apply. Defaults to 0.
activationFunction (IActivationFunction<T>?): The activation function to apply. Defaults to ReLU if not specified.
Remarks
This constructor creates a deconvolutional layer with the specified configuration. The output shape is calculated based on the input shape, kernel size, stride, and padding. The kernels and biases are initialized with scaled random values.
For Beginners: This setup method creates a new deconvolutional layer with specific settings.
When creating the layer, you specify:
- Input details: The shape of your data
- How many output channels to create (outputDepth)
- How big each pattern generator is (kernelSize)
- How much enlargement to apply (stride)
- How to adjust the exact output size (padding)
- What mathematical function to apply to the results (activation)
The layer then creates all the necessary pattern generators with random starting values that will be improved during training.
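Examples
A minimal construction sketch (the [depth, height, width] layout for inputShape is an assumption for illustration; the activation argument is omitted, so the layer falls back to ReLU):
var layer = new DeconvolutionalLayer<float>(
    inputShape: new[] { 64, 16, 16 }, // 64 input channels over a 16×16 feature map (assumed layout)
    outputDepth: 32,                  // produce 32 feature maps
    kernelSize: 3,                    // 3×3 pattern generators
    stride: 2,                        // roughly double the spatial size
    padding: 1);                      // trim the border to fine-tune the output size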
DeconvolutionalLayer(int[], int, int, int, int, IVectorActivationFunction<T>?)
Initializes a new instance of the DeconvolutionalLayer<T> class with the specified parameters and a vector activation function.
public DeconvolutionalLayer(int[] inputShape, int outputDepth, int kernelSize, int stride = 1, int padding = 0, IVectorActivationFunction<T>? vectorActivationFunction = null)
Parameters
inputShape (int[]): The shape of the input data.
outputDepth (int): The number of output channels to create.
kernelSize (int): The size of each filter kernel (width and height).
stride (int): The step size for positioning the kernel. Defaults to 1.
padding (int): The amount of padding to apply. Defaults to 0.
vectorActivationFunction (IVectorActivationFunction<T>?): The vector activation function to apply. Defaults to ReLU if not specified.
Remarks
This constructor creates a deconvolutional layer with the specified configuration and a vector activation function, which operates on entire vectors rather than individual elements. This can be useful when applying more complex activation functions or when performance is a concern.
For Beginners: This setup method is similar to the previous one, but uses a different type of activation function.
A vector activation function:
- Works on entire groups of numbers at once
- Can be more efficient for certain types of calculations
- Otherwise works the same as the regular activation function
You would choose this option if you have a specific mathematical operation that needs to be applied to groups of outputs rather than individual values.
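Examples
The call mirrors the scalar-activation constructor; SoftmaxActivation<T> below is a hypothetical IVectorActivationFunction<T> implementation, named purely for illustration:
var layer = new DeconvolutionalLayer<float>(
    inputShape: new[] { 64, 16, 16 },
    outputDepth: 32,
    kernelSize: 3,
    stride: 2,
    padding: 1,
    vectorActivationFunction: new SoftmaxActivation<float>()); // hypothetical implementation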
Properties
InputDepth
Gets the depth (number of channels) of the input data.
public int InputDepth { get; }
Property Value
- int
Remarks
The input depth represents the number of feature channels in the input data. In a neural network, this typically corresponds to the number of features or patterns detected by previous layers.
For Beginners: Input depth is the number of different features in your input data.
Think of it like:
- The number of different patterns the previous layer detected
- The number of "aspects" of the data you're working with
For example, in a deep network, the input depth might be 64 or 128, representing many different detected features.
KernelSize
Gets the size of each filter (kernel) used in the deconvolution operation.
public int KernelSize { get; }
Property Value
- int
Remarks
The kernel size determines the area of the output that is influenced by each input value. A larger kernel size means each input value affects a larger area of the output, potentially creating more detailed or smooth upsampling.
For Beginners: Kernel size is how big each "pattern generator" is.
For example:
- A kernel size of 3 means a 3×3 grid (9 weights)
- A kernel size of 5 means a 5×5 grid (25 weights)
Larger kernels:
- Can create more complex patterns
- Affect larger areas of the output
- But require more computation
OutputDepth
Gets the depth (number of channels) of the output data.
public int OutputDepth { get; }
Property Value
- int
Remarks
The output depth represents the number of feature channels that will be generated in the output. Each output channel is produced by a different set of kernels and captures different aspects of the upsampled data.
For Beginners: Output depth is how many different types of patterns this layer will create.
For example:
- If output depth is 3, the layer might generate RGB color channels
- If output depth is 32, the layer creates 32 different feature maps
A higher number usually means more detailed or varied outputs, but also requires more processing power.
Padding
Gets the amount of padding applied during the deconvolution operation.
public int Padding { get; }
Property Value
- int
Remarks
In deconvolution, padding actually reduces the output size. This might seem counterintuitive, but it allows for more control over the exact output dimensions.
For Beginners: Padding in deconvolution works differently than in convolution.
In deconvolution:
- More padding makes the output smaller
- Zero padding means maximum enlargement
- It helps control the exact output size
This is the opposite of regular convolution, where padding makes outputs larger.
Stride
Gets the step size for positioning the kernel across the output data.
public int Stride { get; }
Property Value
- int
Remarks
In deconvolution, the stride determines how much the output size increases relative to the input. A stride of 2 typically doubles the spatial dimensions, while a stride of 1 increases them by a smaller amount.
For Beginners: Stride controls how much upsampling (enlargement) happens.
Think of it like:
- Stride of 1: Minimal enlargement
- Stride of 2: Roughly doubles the size
- Stride of 4: Roughly quadruples the size
For example, if your input is 16×16 pixels and you use a stride of 2, the output might be around 32×32 pixels (the exact size depends on other factors too).
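A common sizing rule for transposed convolution is:
outputSize = (inputSize - 1) * stride - 2 * padding + kernelSize
For a 16×16 input with kernelSize = 3, stride = 2, and padding = 0, this gives (16 - 1) * 2 - 0 + 3 = 33, which is why the output is only roughly double the input size. The layer's exact calculation may differ slightly, but it follows this pattern.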
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
Always returns true for deconvolutional layers, as they contain trainable parameters.
Remarks
This property indicates whether the layer can be trained through backpropagation. Deconvolutional layers have trainable parameters (kernel weights and biases), so they support training.
For Beginners: This property tells you if the layer can learn from data.
For deconvolutional layers:
- The value is always true
- This means the layer can adjust its pattern generators (filters) during training
- It will improve its upsampling abilities as it processes more data
Methods
Backward(Tensor<T>)
Calculates gradients for the input, kernels, and biases during backpropagation.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method performs the backward pass of the deconvolutional layer during training. It calculates the gradient of the loss with respect to the input, kernel weights, and biases. The calculated input gradient is returned for propagation to earlier layers.
For Beginners: This method helps the layer learn from its mistakes.
During the backward pass:
- The layer receives information about how wrong its output was
- It calculates how to adjust its pattern generators to be more accurate
- It prepares the gradients for updating kernels and biases
- It passes information back to previous layers so they can learn too
This is where the actual "learning" happens. The layer figures out how to adjust all its internal values to make better outputs next time.
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
BackwardGpu(IGpuTensor<T>)
Performs a GPU-resident backward pass computing gradients for input, kernels, and biases.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): GPU-resident gradient from the next layer.
Returns
- IGpuTensor<T>
GPU-resident gradient with respect to the layer's input.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
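Examples
A guarded export sketch, reusing the layer variable from the construction example (how the resulting graph is compiled and executed is outside the scope of this page):
var inputNodes = new List<ComputationNode<float>>();
if (layer.SupportsJitCompilation)
{
    // The layer populates inputNodes and returns the node producing its output.
    ComputationNode<float> outputNode = layer.ExportComputationGraph(inputNodes);
    // ... hand outputNode to the JIT compiler ...
}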
Forward(Tensor<T>)
Processes the input data through the deconvolutional layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after deconvolution and activation.
Remarks
This method performs the forward pass of the deconvolutional layer. For each position in the output, it computes the contribution from all relevant input positions, multiplied by the appropriate kernel weights. The results are summed, the bias is added, and the activation function is applied.
For Beginners: This method enlarges the input data using learned patterns.
During the forward pass:
- Each value in the input helps create a region in the output
- The pattern generators (kernels) determine what that region looks like
- The layer combines all these regions to form a larger, detailed output
- The activation function then adjusts these values
Think of it like painting a mural by stamping many small patterns next to each other, where each stamp design comes from your pattern generators (kernels).
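Examples
A forward-pass sketch, reusing the layer from the construction example (the shape-based Tensor<float> constructor is an assumption for illustration):
var input = new Tensor<float>(new[] { 64, 16, 16 }); // assumed constructor; shape must match the layer's inputShape
Tensor<float> output = layer.Forward(input);         // deconvolution + bias + activation
// output now has outputDepth channels and enlarged spatial dimensions.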
ForwardGpu(params IGpuTensor<T>[])
Performs a GPU-resident forward pass using fused ConvTranspose2D + Bias + Activation.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensor(s).
Returns
- IGpuTensor<T>
GPU-resident output tensor.
Remarks
For Beginners: This is the GPU-optimized version of the Forward method. All data stays on the GPU throughout the computation, avoiding expensive CPU-GPU transfers.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all kernel weights and biases.
Remarks
This method extracts all trainable parameters (kernel weights and biases) from the layer and returns them as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method gathers all the learned values from the layer.
The parameters include:
- All values from all pattern generators (kernels)
- All bias values
These are combined into a single long list (vector), which can be used for:
- Saving the model
- Sharing parameters between layers
- Advanced optimization techniques
This provides access to all the "knowledge" the layer has learned.
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method clears the cached input and output values from the most recent forward pass, as well as the gradients calculated during the backward pass. This is useful when starting to process a new batch or when implementing stateful recurrent networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- The layer forgets the last input it processed
- It forgets the last output it produced
- It clears any calculated gradients
This is useful for:
- Processing a new, unrelated set of data
- Preventing information from one batch affecting another
- Starting a new training episode
Think of it like wiping a whiteboard clean before starting a new calculation.
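Examples
A sketch of resetting between unrelated batches (batchA and batchB are assumed Tensor<float> inputs):
var outA = layer.Forward(batchA); // process one batch
layer.ResetState();               // wipe cached input, output, and gradients
var outB = layer.Forward(batchB); // batchB sees no leftover state from batchA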
SetParameters(Vector<T>)
Sets all trainable parameters of the layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all trainable parameters (kernel weights and biases) of the layer from a single vector. The vector must have the exact length required for all parameters of the layer.
For Beginners: This method updates all the layer's learned values at once.
When setting parameters:
- The vector must have exactly the right number of values
- The values are assigned to the kernels and biases in a specific order
This is useful for:
- Loading a previously saved model
- Copying parameters from another model
- Setting parameters that were optimized externally
It's like replacing all the "knowledge" in the layer with new information.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
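Examples
Together with GetParameters(), this enables a simple snapshot-and-restore roundtrip (persistence itself is not shown):
Vector<float> snapshot = layer.GetParameters(); // capture all kernel weights and biases
// ... train further, experiment, or load other weights ...
layer.SetParameters(snapshot);                  // restore; the vector length must match exactly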
UpdateParameters(T)
Updates the layer's parameters (kernel weights and biases) using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the update.
Remarks
This method updates the layer's parameters (kernel weights and biases) based on the gradients calculated during the backward pass. The learning rate controls the step size of the update.
For Beginners: This method applies the lessons learned during training.
When updating parameters:
- The learning rate controls how big each adjustment is
- Small learning rate = small, careful changes
- Large learning rate = big, faster changes (but might overshoot)
The layer takes the gradients calculated during backward pass and uses them to update all its kernels and biases, making them slightly better for next time.
Exceptions
- InvalidOperationException
Thrown when update is called before backward.
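Examples
A single training step using the documented methods (ComputeLossGradient is an assumed helper standing in for your loss function, not part of this API):
Tensor<float> output = layer.Forward(input);                  // forward pass (caches state)
Tensor<float> lossGrad = ComputeLossGradient(output, target); // assumed helper
Tensor<float> inputGrad = layer.Backward(lossGrad);           // compute and cache gradients
layer.UpdateParameters(0.01f);                                // apply them with learning rate 0.01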