Class PrimaryCapsuleLayer<T>

Namespace: AiDotNet.NeuralNetworks.Layers

Assembly: AiDotNet.dll

Represents a primary capsule layer for capsule networks.

public class PrimaryCapsuleLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T: The numeric type used for calculations, typically float or double.

Inheritance: object → LayerBase<T> → PrimaryCapsuleLayer<T>

Implements: ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Inherited Members: LayerBase<T>.Engine

LayerBase<T>.ScalarActivation

LayerBase<T>.VectorActivation

LayerBase<T>.UsingVectorActivation

LayerBase<T>.NumOps

LayerBase<T>.Random

LayerBase<T>.Parameters

LayerBase<T>.ParameterGradients

LayerBase<T>.InputShape

LayerBase<T>.InputShapes

LayerBase<T>.UpdateInputShape(int[])

LayerBase<T>.OutputShape

LayerBase<T>.IsTrainingMode

LayerBase<T>.InitializationStrategy

LayerBase<T>.IsInitialized

LayerBase<T>.InitializationLock

LayerBase<T>.EnsureInitialized()

LayerBase<T>.UseAutodiff

LayerBase<T>.SetTrainingMode(bool)

LayerBase<T>.GetParameterGradients()

LayerBase<T>.ClearGradients()

LayerBase<T>.GetInputShape()

LayerBase<T>.GetInputShapes()

LayerBase<T>.GetOutputShape()

LayerBase<T>.GetWeights()

LayerBase<T>.GetBiases()

LayerBase<T>.MapActivationToFused()

LayerBase<T>.SupportsGpuTraining

LayerBase<T>.CanExecuteOnGpu

LayerBase<T>.CanTrainOnGpu

LayerBase<T>.BackwardGpu(IGpuTensor<T>)

LayerBase<T>.UpdateParametersGpu(IGpuOptimizerConfig)

LayerBase<T>.UploadWeightsToGpu()

LayerBase<T>.DownloadWeightsFromGpu()

LayerBase<T>.ZeroGradientsGpu()

LayerBase<T>.GetActivationTypes()

LayerBase<T>.Forward(params Tensor<T>[])

LayerBase<T>.ApplyActivation(Tensor<T>)

LayerBase<T>.ApplyActivation(Vector<T>)

LayerBase<T>.ActivateTensor(IActivationFunction<T>, Tensor<T>)

LayerBase<T>.ActivateTensor(IVectorActivationFunction<T>, Tensor<T>)

LayerBase<T>.CalculateInputShape(int, int, int)

LayerBase<T>.CalculateOutputShape(int, int, int)

LayerBase<T>.Clone()

LayerBase<T>.DerivativeTensor(IActivationFunction<T>, Tensor<T>)

LayerBase<T>.ApplyActivationDerivative(T, T)

LayerBase<T>.ApplyActivationDerivative(Tensor<T>, Tensor<T>)

LayerBase<T>.ComputeActivationJacobian(Vector<T>)

LayerBase<T>.ApplyActivationDerivative(Vector<T>, Vector<T>)

LayerBase<T>.UpdateParameters(Vector<T>)

LayerBase<T>.ParameterCount

LayerBase<T>.Serialize(BinaryWriter)

LayerBase<T>.Deserialize(BinaryReader)

LayerBase<T>.GetDiagnostics()

LayerBase<T>.ApplyActivationToGraph(ComputationNode<T>)

LayerBase<T>.CanActivationBeJitted()

LayerBase<T>.RegisterTrainableParameter(Tensor<T>, PersistentTensorRole)

LayerBase<T>.InvalidateTrainableParameter(Tensor<T>)

LayerBase<T>.HasGpuActivation()

LayerBase<T>.ApplyActivationForwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int)

LayerBase<T>.ApplyActivationBackwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, IGpuBuffer, IGpuBuffer, int)

LayerBase<T>.GetFusedActivationType()

LayerBase<T>.ApplyGpuActivation(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int, FusedActivationType)

LayerBase<T>.ApplyGpuActivationBackward(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, IGpuBuffer, IGpuBuffer, int, FusedActivationType, float)

LayerBase<T>.Dispose()

LayerBase<T>.Dispose(bool)

LayerBase<T>.WeightParameterName

LayerBase<T>.BiasParameterName

LayerBase<T>.SetWeights(Tensor<T>)

LayerBase<T>.SetBiases(Tensor<T>)

LayerBase<T>.GetParameterNames()

LayerBase<T>.TryGetParameter(string, out Tensor<T>)

LayerBase<T>.SetParameter(string, Tensor<T>)

LayerBase<T>.GetParameterShape(string)

LayerBase<T>.NamedParameterCount

LayerBase<T>.ValidateWeights(IEnumerable<string>, Func<string, string>)

LayerBase<T>.LoadWeights(Dictionary<string, Tensor<T>>, Func<string, string>, bool)

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

The PrimaryCapsuleLayer is the first layer in a capsule network that transforms traditional scalar feature maps into capsule vectors. It performs a convolution operation followed by reshaping the output into capsules. Each capsule represents a group of neurons that encodes both the presence and properties of a particular entity. This layer serves as a bridge between standard convolutional layers and higher-level capsule layers.

For Beginners: This layer is the first step in creating a capsule network.

In traditional neural networks, each neuron outputs a single number indicating the presence of a feature. In capsule networks, neurons are grouped into "capsules" where each capsule outputs a vector:

The length of the vector represents the presence of an entity
The orientation of the vector represents properties of that entity

Think of it like this:

Standard neurons: "I see a nose with 90% confidence"
Capsule neurons: "I see a nose with 90% confidence, and it's pointing 30° to the left, it's 2cm long, it has a slightly curved shape..."

The primary capsule layer converts traditional feature maps (from convolutional layers) into these vector-based capsules that can capture more detailed information about the entities detected.

This approach helps the network understand spatial relationships and maintain information about pose, orientation, and other properties that are typically lost in traditional networks.

Constructors

PrimaryCapsuleLayer(int, int, int, int, int, IActivationFunction<T>?)

Initializes a new instance of the PrimaryCapsuleLayer<T> class with the specified parameters and a scalar activation function.

public PrimaryCapsuleLayer(int inputChannels, int capsuleChannels, int capsuleDimension, int kernelSize, int stride, IActivationFunction<T>? scalarActivation = null)

Parameters

inputChannels int: The number of input channels.
capsuleChannels int: The number of capsule channels.
capsuleDimension int: The dimension of each capsule.
kernelSize int: The size of the convolutional kernel.
stride int: The stride of the convolution operation.
scalarActivation IActivationFunction<T>: The activation function to apply after processing. Defaults to Squash if not specified.

Remarks

This constructor creates a PrimaryCapsuleLayer with the specified parameters. It initializes the convolution weights and biases and sets up the layer to transform input feature maps into primary capsules.

For Beginners: This constructor sets up the layer with the necessary parameters.

When creating a PrimaryCapsuleLayer, you need to specify:

inputChannels: How many channels your input has (e.g., 3 for RGB images, or more if from a conv layer)
capsuleChannels: How many different types of capsules to create
capsuleDimension: How many values in each capsule's output vector
kernelSize: The size of the area examined by the convolution (e.g., 3 for a 3×3 kernel)
stride: How far to move the kernel each step
scalarActivation: The function applied to each scalar value (defaults to Squash)

For example, if you set capsuleChannels=8 and capsuleDimension=16, you'll have 8 different types of capsules, each outputting a 16-dimensional vector.
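To make these sizes concrete, the following sketch estimates how many convolution channels and capsule vectors such a configuration produces. It assumes an unpadded convolution and one convolution channel per capsule component; the helper class `CapsuleShapeSketch` and its methods are hypothetical and not part of AiDotNet, and the layer's actual padding and output layout may differ.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet: estimates primary-capsule sizes
// under the assumption of an unpadded ("valid") convolution.
public static class CapsuleShapeSketch
{
    // Output size of an unpadded convolution along one spatial axis.
    public static int ConvOutputSize(int inputSize, int kernelSize, int stride)
    {
        if (kernelSize > inputSize)
            throw new ArgumentException("Kernel is larger than the input.");
        return (inputSize - kernelSize) / stride + 1;
    }

    // Assumed number of convolution channels: one per capsule component.
    public static int ConvOutputChannels(int capsuleChannels, int capsuleDimension)
        => capsuleChannels * capsuleDimension;

    // Total number of capsule vectors across the whole output grid.
    public static int CapsuleCount(int height, int width, int kernelSize, int stride, int capsuleChannels)
        => ConvOutputSize(height, kernelSize, stride)
         * ConvOutputSize(width, kernelSize, stride)
         * capsuleChannels;
}
```

Under these assumptions, capsuleChannels=8 and capsuleDimension=16 correspond to 128 convolution channels, and a 20×20 input with kernelSize=9 and stride=2 gives a 6×6 grid, or 288 capsule vectors in total.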

The default Squash activation function is designed specifically for capsule networks. It rescales each capsule's output vector so that its length lies between 0 and 1 while its direction is preserved, which lets the length be read as the probability that an entity is present.

PrimaryCapsuleLayer(int, int, int, int, int, IVectorActivationFunction<T>?)

Initializes a new instance of the PrimaryCapsuleLayer<T> class with the specified parameters and a vector activation function.

public PrimaryCapsuleLayer(int inputChannels, int capsuleChannels, int capsuleDimension, int kernelSize, int stride, IVectorActivationFunction<T>? vectorActivation = null)

Parameters

inputChannels int: The number of input channels.
capsuleChannels int: The number of capsule channels.
capsuleDimension int: The dimension of each capsule.
kernelSize int: The size of the convolutional kernel.
stride int: The stride of the convolution operation.
vectorActivation IVectorActivationFunction<T>: The vector activation function to apply after processing. Defaults to Squash if not specified.

Remarks

This constructor creates a PrimaryCapsuleLayer with the specified parameters and a vector activation function. A vector activation function operates on entire capsule vectors rather than individual elements.

For Beginners: This constructor is similar to the other one, but uses a vector-based activation function.

A vector activation function:

Operates on entire capsule vectors at once, rather than one value at a time
Can better preserve the relationship between values in a capsule
With the default Squash function, keeps vector lengths between 0 and 1

The Squash function is specifically designed for capsule networks. It scales vectors non-linearly so that small vectors shrink to nearly zero length, while large vectors shrink to have a length slightly below 1, preserving their direction.
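The behaviour described above can be illustrated with the squash formula commonly used in capsule networks: the input vector s is scaled by |s|² / (1 + |s|²) and divided by |s|. The sketch below is a standalone illustration on plain arrays; the class `SquashSketch` and the `epsilon` guard are assumptions, and AiDotNet's own Squash implementation may differ in such details.

```csharp
using System;

// Standalone sketch of the commonly used capsule squash function.
// Not AiDotNet code; names and the epsilon guard are illustrative assumptions.
public static class SquashSketch
{
    public static double[] Squash(double[] s, double epsilon = 1e-9)
    {
        double squaredNorm = 0.0;
        foreach (double x in s) squaredNorm += x * x;

        double norm = Math.Sqrt(squaredNorm);
        // scale = |s|^2 / (1 + |s|^2) / |s|; epsilon avoids division by zero.
        double scale = squaredNorm / (1.0 + squaredNorm) / (norm + epsilon);

        var v = new double[s.Length];
        for (int i = 0; i < s.Length; i++) v[i] = scale * s[i];
        return v;
    }

    // Euclidean length of a vector, used to inspect the squashed result.
    public static double Length(double[] v)
    {
        double sum = 0.0;
        foreach (double x in v) sum += x * x;
        return Math.Sqrt(sum);
    }
}
```

For instance, the vector (3, 4) has length 5 and is squashed to a length of about 25/26, while (0.01, 0) is squashed to a length of roughly 0.0001. In both cases the direction of the vector is unchanged.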

Properties

SupportsGpuExecution

Gets whether this layer has a GPU execution implementation for inference.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

Remarks

Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.

For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool: True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

Have not yet implemented a working ExportComputationGraph()
Use dynamic operations that change based on input data
Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool: Always true because the PrimaryCapsuleLayer has trainable parameters.

Remarks

This property indicates that PrimaryCapsuleLayer can be trained through backpropagation. The layer has trainable parameters (convolution weights and biases) that are updated during training to optimize the capsule transformation process.

For Beginners: This property tells you that this layer can learn from data.

A value of true means:

The layer has internal values (weights and biases) that change during training
It will improve its performance as it sees more data
It learns to extract better capsule representations from the input

During training, the layer learns to transform input features into capsule vectors that best represent the entities in the data.

Methods

Backward(Tensor<T>)

Performs the backward pass of the primary capsule layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>: The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>: The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the primary capsule layer, which is used during training to propagate error gradients back through the network. It computes the gradients of the convolution weights and biases, as well as the gradient with respect to the input tensor. The computed weight and bias gradients are stored for later use in the parameter update step.

For Beginners: This method calculates how all parameters should change to reduce errors.

During the backward pass:

The layer receives gradients indicating how the output capsules should change
It calculates how each weight, bias, and input value should change
These gradients are used later to update the parameters during training

This process involves:

Applying the derivative of the activation function
For each location in the output:
- Extracting the corresponding input patch
- Computing the gradients for weights and biases
- Computing the gradients for the input
Aggregating all the gradients

This allows the layer to learn how to better transform input features into meaningful capsule representations.

Exceptions

InvalidOperationException: Thrown when backward is called before forward.
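The per-location step listed above can be sketched for the linear part of the convolution, y = W · patch + b, at a single output position. This is a simplified illustration on plain arrays, not the layer's actual implementation: the class `PatchBackwardSketch`, the flattened patch layout, and the assumption that `dy` already includes the activation derivative are all hypothetical.

```csharp
using System;

// Simplified sketch of the gradient step for one output location.
// Not AiDotNet code; the data layout is an illustrative assumption.
public static class PatchBackwardSketch
{
    // weights:    [outChannels, patchSize] convolution weights.
    // patch:      flattened input patch under the kernel at this location.
    // dy:         gradient w.r.t. the pre-activation output at this location.
    // weightGrad: accumulator for weight gradients (same shape as weights).
    // biasGrad:   accumulator for bias gradients (one entry per output channel).
    // Returns the gradient w.r.t. the input patch.
    public static double[] AccumulateAndBackprop(
        double[,] weights, double[] patch, double[] dy,
        double[,] weightGrad, double[] biasGrad)
    {
        int outChannels = weights.GetLength(0);
        int patchSize = weights.GetLength(1);
        var patchGrad = new double[patchSize];

        for (int o = 0; o < outChannels; o++)
        {
            biasGrad[o] += dy[o];                       // dL/db = dy
            for (int p = 0; p < patchSize; p++)
            {
                weightGrad[o, p] += dy[o] * patch[p];   // dL/dW = dy * patch^T
                patchGrad[p] += weights[o, p] * dy[o];  // dL/dpatch = W^T * dy
            }
        }
        return patchGrad;
    }
}
```

Calling this once per output location and adding each returned patch gradient back into the matching region of the input gradient corresponds to the aggregation step; the accumulators then hold the totals used by the later parameter update.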

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>: List to populate with input computation nodes.

Returns

ComputationNode<T>: The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

Implement this method to export its computation graph
Set SupportsJitCompilation to true
Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.

Forward(Tensor<T>)

Performs the forward pass of the primary capsule layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>: The input tensor to process.

Returns

Tensor<T>: The output tensor containing capsule vectors.

Remarks

This method implements the forward pass of the primary capsule layer. It performs a convolution operation on the input tensor, reshapes the result into capsule vectors, and applies the activation function to produce the final output. The input and output tensors are cached for use during the backward pass.

For Beginners: This method transforms the input features into capsule vectors.

During the forward pass:

The method applies a convolution operation to the input (similar to a standard convolutional layer)
It reshapes the result into groups of vectors (the capsules)
It applies the activation function (typically Squash) to each capsule vector

The output is a set of capsule vectors where:

Each capsule vector's length represents the probability of detecting an entity
The orientation of the vector represents properties of the detected entity

This is the key difference from traditional neural networks - instead of just detecting if something is present, the capsules also capture information about the properties of what they detect.
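The reshaping step can be pictured as splitting the convolution's channel values at one spatial location into fixed-size groups. The sketch below shows this on a plain array; the class `CapsuleReshapeSketch` and the assumed layout (capsule c occupying consecutive channels) are illustrative assumptions, and the layer's real memory layout may differ.

```csharp
using System;

// Sketch of grouping convolution channels into capsule vectors.
// Not AiDotNet code; the channel layout is an illustrative assumption.
public static class CapsuleReshapeSketch
{
    // Splits the channel values at one spatial location into capsule vectors.
    // Assumed layout: capsule c occupies channels [c * dim, (c + 1) * dim).
    public static double[][] ToCapsules(double[] channels, int capsuleChannels, int capsuleDimension)
    {
        if (channels.Length != capsuleChannels * capsuleDimension)
            throw new ArgumentException(
                "Channel count must equal capsuleChannels * capsuleDimension.", nameof(channels));

        var capsules = new double[capsuleChannels][];
        for (int c = 0; c < capsuleChannels; c++)
        {
            capsules[c] = new double[capsuleDimension];
            Array.Copy(channels, c * capsuleDimension, capsules[c], 0, capsuleDimension);
        }
        return capsules;
    }
}
```

With six channel values, two capsule channels, and a capsule dimension of three, the first three values form the first capsule vector and the last three form the second. The activation function is then applied to each of these vectors.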

ForwardGpu(params IGpuTensor<T>[])

Performs GPU-accelerated forward pass through the primary capsule layer.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]: GPU-resident input tensors.

Returns

IGpuTensor<T>: GPU-resident output tensor after capsule transformation.

Remarks

This method implements the forward pass using GPU-resident operations where possible. The convolution and reshape operations are kept on GPU for efficiency.

GetParameters()

Gets all trainable parameters from the primary capsule layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>: A vector containing all trainable parameters.

Remarks

This method retrieves all trainable parameters from the layer as a single vector. It concatenates the convolution weights and biases into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values in the layer.

The parameters:

Are the numbers that the neural network learns during training
Include all the weights and biases from this layer
Are combined into a single long list (vector)

This is useful for:

Saving the model to disk
Loading parameters from a previously trained model
Advanced optimization techniques that need access to all parameters

The method carefully arranges all parameters in a specific order so they can be correctly restored later.
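As an illustration of what combining parameters into a single vector means, the sketch below concatenates a flattened weight array and a bias array. The class `FlattenSketch` and the weights-then-biases ordering are assumptions made for this example; the layer's actual ordering is not specified here.

```csharp
using System;

// Sketch of flattening weights and biases into one parameter array.
// Not AiDotNet code; the ordering is an illustrative assumption.
public static class FlattenSketch
{
    // Returns a new array holding all weights followed by all biases.
    public static double[] Flatten(double[] weights, double[] biases)
    {
        var all = new double[weights.Length + biases.Length];
        Array.Copy(weights, 0, all, 0, weights.Length);
        Array.Copy(biases, 0, all, weights.Length, biases.Length);
        return all;
    }
}
```

Whatever ordering is used, the same ordering has to be applied when the values are written back, otherwise weights and biases end up in the wrong positions.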

ResetState()

Resets the internal state of the primary capsule layer.

public override void ResetState()

Remarks

This method resets the internal state of the primary capsule layer, including the cached inputs, outputs, and gradients. This is useful when starting to process a new sequence or batch of data, or when implementing stateful networks.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

Stored inputs and outputs from previous processing are cleared
All calculated gradients are cleared
The layer forgets any information from previous data batches

This is important for:

Processing a new, unrelated batch of data
Ensuring clean state before a new training epoch
Preventing information from one batch affecting another

Resetting state helps ensure that each forward and backward pass is independent, which is important for correct behavior in many neural network architectures.

SetParameters(Vector<T>)

Sets the trainable parameters for the primary capsule layer.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>: A vector containing all parameters to set.

Remarks

This method sets all trainable parameters of the layer from a single vector. It extracts the appropriate portions of the input vector for each parameter (convolution weights and biases). This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the learnable values in the layer.

When setting parameters:

The input must be a vector with the correct length
The method extracts portions for each weight matrix and bias vector
It places each value in its correct position

This is useful for:

Loading a previously saved model
Transferring parameters from another model
Testing different parameter values

An error is thrown if the input vector doesn't have the expected number of parameters, ensuring that all matrices and vectors maintain their correct dimensions.

Exceptions

ArgumentException: Thrown when the parameters vector has incorrect length.
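The length check and the extraction step can be sketched as follows on plain arrays. The class `RestoreSketch`, the weights-then-biases ordering, and the exception message are assumptions for illustration, not the layer's actual code.

```csharp
using System;

// Sketch of restoring weights and biases from one flat parameter array.
// Not AiDotNet code; ordering and message text are illustrative assumptions.
public static class RestoreSketch
{
    // Copies values from parameters into the existing weights and biases arrays.
    public static void Restore(double[] parameters, double[] weights, double[] biases)
    {
        int expected = weights.Length + biases.Length;
        if (parameters.Length != expected)
            throw new ArgumentException(
                $"Expected {expected} parameters, but got {parameters.Length}.", nameof(parameters));

        // The first block of values fills the weights, the remainder fills the biases.
        Array.Copy(parameters, 0, weights, 0, weights.Length);
        Array.Copy(parameters, weights.Length, biases, 0, biases.Length);
    }
}
```

Validating the length before copying anything means a wrongly sized vector leaves the existing weights and biases untouched.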

UpdateParameters(T)

Updates the parameters of the primary capsule layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T: The learning rate to use for the parameter updates.

Remarks

This method updates the convolution weights and biases of the layer based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates.

For Beginners: This method updates all the layer's weights and biases during training.

After the backward pass calculates how parameters should change, this method:

Takes each weight and bias
Subtracts the corresponding gradient scaled by the learning rate
Moves the parameters in the direction that reduces errors

The learning rate controls how big each update step is:

Smaller learning rates mean slower but more stable learning
Larger learning rates mean faster but potentially unstable learning

This is how the layer gradually improves its ability to transform inputs into meaningful capsule representations over many training iterations.

Exceptions

InvalidOperationException: Thrown when UpdateParameters is called before Backward.
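The update rule described above is plain gradient descent: each parameter is reduced by the learning rate times its gradient. The sketch below shows that rule on plain arrays, including a guard for missing gradients; the class `SgdUpdateSketch` and the exception messages are hypothetical and only illustrate the idea.

```csharp
using System;

// Sketch of a gradient-descent parameter update.
// Not AiDotNet code; names and messages are illustrative assumptions.
public static class SgdUpdateSketch
{
    // Applies parameters[i] -= learningRate * gradients[i] in place.
    public static void Update(double[] parameters, double[] gradients, double learningRate)
    {
        if (gradients == null)
            throw new InvalidOperationException("Backward must be called before UpdateParameters.");
        if (gradients.Length != parameters.Length)
            throw new ArgumentException("Gradient and parameter counts differ.", nameof(gradients));

        for (int i = 0; i < parameters.Length; i++)
            parameters[i] -= learningRate * gradients[i];
    }
}
```

With parameters (1.0, 2.0), gradients (0.5, -1.0), and a learning rate of 0.1, the updated parameters are approximately (0.95, 2.1).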
