Class CapsuleLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a capsule neural network layer that encapsulates groups of neurons to better preserve spatial information.
public class CapsuleLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → CapsuleLayer<T>
- Implements
ILayer<T>
IJitCompilable<T>
IWeightLoadable<T>
IDisposable
IAuxiliaryLossLayer<T>
IDiagnosticsProvider
Remarks
A capsule layer is a specialized neural network layer that groups neurons into "capsules," where each capsule represents a specific entity or feature. Unlike traditional neural networks that use scalar outputs, capsules output vectors whose length represents the probability of the entity's existence and whose orientation encodes the entity's properties. Capsule layers use dynamic routing between capsules, which helps preserve hierarchical relationships between features and improves the network's ability to recognize objects from different viewpoints.
For Beginners: A capsule layer is an advanced type of neural network layer that works differently from standard layers.
Traditional neural network layers use single numbers to represent features, but capsule layers use groups of numbers (vectors) that can capture more information:
- Each "capsule" is a group of neurons that work together
- The length of a capsule's output tells you how likely something exists
- The direction of a capsule's output tells you about its properties (like position, size, rotation)
For example, if detecting faces in images:
- A traditional network might have neurons that detect eyes, nose, mouth separately
- A capsule network would understand how these parts relate to each other spatially
This helps the network recognize objects even when they're viewed from different angles or positions, which is something traditional networks struggle with.
Constructors
CapsuleLayer(int, int, int, int, int, IActivationFunction<T>?)
Initializes a new instance of the CapsuleLayer<T> class with specified dimensions and routing iterations.
public CapsuleLayer(int inputCapsules, int inputDimension, int numCapsules, int capsuleDimension, int numRoutingIterations, IActivationFunction<T>? activationFunction = null)
Parameters
inputCapsules (int): The number of input capsules.
inputDimension (int): The dimension of each input capsule.
numCapsules (int): The number of output capsules.
capsuleDimension (int): The dimension of each output capsule.
numRoutingIterations (int): The number of dynamic routing iterations to perform.
activationFunction (IActivationFunction<T>): The activation function to apply. Defaults to the squash activation if not specified.
Remarks
This constructor creates a new capsule layer with the specified dimensions and routing parameters. It initializes the transformation matrix and bias vector with appropriate values. The transformation matrix is used to convert input capsules to output capsules, and the bias is added to each output capsule. The number of routing iterations determines how many times the dynamic routing algorithm is executed during the forward pass.
For Beginners: This constructor creates a new capsule layer with specific settings.
When creating a capsule layer, you need to specify:
- How many input capsules there are (from the previous layer)
- How many numbers each input capsule contains
- How many output capsules you want this layer to create
- How many numbers each output capsule should contain
- How many routing iterations to perform (more iterations = more accurate but slower)
The "routing" is the special process that capsule networks use to determine which higher-level capsules should receive information from lower-level capsules.
Think of it like this: if you see an eye, nose, and mouth, the routing process helps decide if they should be grouped together as a face.
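For illustration, here is a minimal construction sketch. The specific sizes are arbitrary example values, not requirements:

// 32 input capsules of dimension 8 are routed into 10 output capsules of
// dimension 16, using 3 dynamic routing iterations. The activation function
// is omitted, so the default squash activation is used.
var capsuleLayer = new CapsuleLayer<float>(
    inputCapsules: 32,
    inputDimension: 8,
    numCapsules: 10,
    capsuleDimension: 16,
    numRoutingIterations: 3);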
Properties
AuxiliaryLossWeight
Gets or sets the weight for the routing entropy auxiliary loss.
public T AuxiliaryLossWeight { get; set; }
Property Value
- T
Remarks
This weight controls how much the routing entropy regularization contributes to the total loss. The total loss is: main_loss + (auxiliary_weight * entropy_loss). Typical values range from 0.001 to 0.01.
For Beginners: This controls how much the network should encourage diverse routing.
The weight determines the balance between:
- Task accuracy (main loss)
- Routing diversity (entropy loss)
Common values:
- 0.005 (default): Balanced routing diversity
- 0.001-0.003: Light diversity enforcement
- 0.008-0.01: Strong diversity enforcement
Higher values make routing more diverse but might reduce task performance. Lower values allow more deterministic routing but might lead to overconfidence.
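As a rough sketch, assuming a CapsuleLayer<float> instance named capsuleLayer (how the two losses are combined depends on your training setup):

// Enable routing entropy regularization and pick a weight in the documented range.
capsuleLayer.UseAuxiliaryLoss = true;
capsuleLayer.AuxiliaryLossWeight = 0.005f; // balanced; use ~0.001f for lighter or ~0.01f for stronger diversity

// Per the remarks above, the total loss is then:
//   total_loss = main_loss + (AuxiliaryLossWeight * entropy_loss)
// where entropy_loss is the value returned by ComputeAuxiliaryLoss().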
SupportsGpuExecution
Gets whether this layer has a GPU execution implementation for inference.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.
For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
true, because CapsuleLayer<T> uses dynamic routing with a fixed number of iterations that can be unrolled into a static computation graph.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Always true, as capsule layers have trainable parameters.
Remarks
This property returns true, indicating that the capsule layer can be trained through backpropagation. Capsule layers contain trainable parameters (transformation matrices and biases) that are adjusted during the training process to minimize the network's error.
For Beginners: This property tells you that this layer can learn from data.
A value of true means:
- The layer contains values (parameters) that will change during training
- It will improve its performance as it sees more examples
- It participates in the learning process of the neural network
Capsule layers always support training because they contain transformation matrices and bias values that need to be learned from data.
UseAuxiliaryLoss
Gets or sets whether auxiliary loss (routing entropy regularization) should be used during training.
public bool UseAuxiliaryLoss { get; set; }
Property Value
- bool
Remarks
Routing entropy regularization encourages diversity in the routing coefficients by penalizing low entropy distributions. This prevents routing from becoming too deterministic and helps the capsule layer learn more robust features.
For Beginners: Routing regularization helps capsules make better decisions.
In capsule networks:
- Routing coefficients decide how much information flows from lower to higher capsules
- If routing becomes too "certain" (all weight on one capsule), it might miss important patterns
- Entropy regularization encourages routing to consider multiple options
Think of it like this:
- Without regularization: "This is 100% a face, ignore everything else"
- With regularization: "This is probably a face (80%), but could be other things (20%)"
This helps the network:
- Learn more robust features
- Avoid overconfidence
- Generalize better to new examples
Methods
Backward(Tensor<T>)
Performs the backward pass of the capsule layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the capsule layer, which is used during training to propagate error gradients back through the network. It computes the gradients of the loss with respect to the layer's parameters (transformation matrix and bias) and the layer's input. The gradients are stored internally and used during the parameter update step.
For Beginners: This method is used during training to calculate how the layer's inputs and parameters should change to reduce errors.
The backward pass:
- Takes in gradients (directions of improvement) from the next layer
- Applies the derivative of the activation function
- Calculates how much each parameter (transformation matrix and bias) contributed to the error
- Calculates how the input contributed to the error, to pass gradients to the previous layer
During this process, the method:
- Creates gradient tensors for the transformation matrix and bias
- Uses the coupling coefficients (connection strengths) calculated during the forward pass
- Produces gradients that will be used to update the parameters
This is part of the "backpropagation" algorithm that helps neural networks learn. The error flows backward through the network, and each layer determines how it should change to reduce that error.
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
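A minimal ordering sketch, assuming capsuleLayer is an existing CapsuleLayer<float> and that input and outputGradient are placeholder tensors of the appropriate shapes obtained elsewhere; calling Backward before Forward triggers the exception noted above:

Tensor<float> output = capsuleLayer.Forward(input);                   // caches activations and coupling coefficients
Tensor<float> inputGradient = capsuleLayer.Backward(outputGradient);  // valid only after Forward has run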
ComputeAuxiliaryLoss()
Computes the auxiliary loss for routing entropy regularization.
public T ComputeAuxiliaryLoss()
Returns
- T
The computed routing entropy auxiliary loss.
Remarks
This method computes the entropy of the routing coefficients. Low entropy means the routing is very deterministic (concentrating on one capsule), while high entropy means it's more distributed across multiple capsules. We penalize low entropy to encourage diverse routing. Entropy: H = -Σ(p * log(p)) where p are the routing coefficients. We use negative entropy as loss since we want to maximize entropy (minimize -H).
For Beginners: This calculates how diverse the routing decisions are.
Routing entropy works by:
- Looking at the routing coefficients (how information flows between capsules)
- Measuring how "spread out" these coefficients are
- Penalizing routing that's too concentrated on one capsule
- Encouraging routing that considers multiple capsules
Entropy is a measure of uncertainty/diversity:
- Low entropy: Very certain, concentrated (e.g., [0.99, 0.01, 0.00])
- High entropy: Uncertain, diverse (e.g., [0.33, 0.33, 0.34])
By encouraging higher entropy, we prevent the network from becoming overconfident and help it learn more robust features that work in different situations.
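As a worked illustration of H = -Σ(p * log(p)), independent of the layer's internals, here is a standalone C# sketch comparing a concentrated and a diverse routing distribution:

using System;
using System.Linq;

// H = -Σ(p * log(p)); terms with p = 0 contribute nothing.
static double Entropy(double[] p) => -p.Where(x => x > 0).Sum(x => x * Math.Log(x));

double[] concentrated = { 0.99, 0.01, 0.00 }; // near-deterministic routing
double[] diverse = { 0.33, 0.33, 0.34 };      // spread-out routing

Console.WriteLine(Entropy(concentrated)); // ≈ 0.056 (low entropy)
Console.WriteLine(Entropy(diverse));      // ≈ 1.099 (near the maximum, ln(3) ≈ 1.0986)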
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): The list to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
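A minimal usage sketch, assuming an existing CapsuleLayer<float> named capsuleLayer; the later compilation steps that consume the returned node are outside the scope of this page:

using System.Collections.Generic;

var inputNodes = new List<ComputationNode<float>>();
if (capsuleLayer.SupportsJitCompilation)
{
    // inputNodes is populated with the graph's input nodes; the return value
    // is the node representing the layer's output.
    ComputationNode<float> outputNode = capsuleLayer.ExportComputationGraph(inputNodes);
}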
Forward(Tensor<T>)
Performs the forward pass of the capsule layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after capsule processing.
Remarks
This method implements the forward pass of the capsule layer, including the dynamic routing algorithm. The input capsules are first transformed using the transformation matrix. Then, dynamic routing is performed for the specified number of iterations to determine the coupling coefficients between input and output capsules. These coefficients are used to compute the weighted sum for each output capsule, which is then passed through the squash activation function to produce the final output.
For Beginners: This method processes the input data through the capsule layer.
The forward pass has several steps:
- Transform input capsules using the transformation matrix
- Start with equal connections between all input and output capsules
- Perform dynamic routing:
- Calculate weighted sums for each output capsule
- Add the bias values
- Apply the "squash" activation function to ensure vector lengths are between 0 and 1
- Update the connection strengths based on how well input and output capsules agree
- Repeat the routing process multiple times to refine the connections
The "squash" function is special to capsule networks - it preserves the direction of a vector but adjusts its length to be between 0 and 1 (representing a probability).
Dynamic routing is what makes capsule networks unique - it's how they learn to group lower-level features into higher-level concepts.
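The squash nonlinearity is commonly defined (as in the original capsule network formulation) as v' = (||v||² / (1 + ||v||²)) * (v / ||v||). The layer's internal implementation may differ in details, but a standalone sketch for a single capsule vector looks like this:

using System;

// Scales a capsule vector so its length lies in (0, 1) while preserving direction.
static double[] Squash(double[] v)
{
    double squaredNorm = 0.0;
    foreach (double x in v) squaredNorm += x * x;

    double norm = Math.Sqrt(squaredNorm);
    double scale = squaredNorm / (1.0 + squaredNorm) / (norm + 1e-9); // epsilon guards against zero vectors

    var squashed = new double[v.Length];
    for (int i = 0; i < v.Length; i++) squashed[i] = v[i] * scale;
    return squashed;
}

double[] result = Squash(new[] { 0.6, 0.8 }); // input length 1.0 → output [0.3, 0.4], length 0.5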
ForwardGpu(params IGpuTensor<T>[])
Performs GPU-accelerated forward pass through the capsule layer.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensors.
Returns
- IGpuTensor<T>
GPU-resident output tensor after capsule transformation.
Remarks
This method implements the forward pass using dedicated GPU kernels for capsule transformation and dynamic routing. All operations stay GPU-resident for maximum performance.
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the routing entropy auxiliary loss.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about routing regularization.
Remarks
This method returns detailed diagnostics about the routing entropy regularization, including the computed entropy loss, weight applied, and whether the feature is enabled. This information is useful for monitoring training progress and debugging.
For Beginners: This provides information about how routing regularization is working.
The diagnostics include:
- Total routing entropy loss (how concentrated routing is)
- Weight applied to the entropy loss
- Whether routing regularization is enabled
- Number of routing iterations being used
This helps you:
- Monitor if routing is becoming too deterministic
- Debug issues with capsule layer learning
- Understand the impact of entropy regularization on routing
You can use this information to adjust the auxiliary loss weight or routing iterations for better results.
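A small sketch of reading the diagnostics, assuming an existing CapsuleLayer<float> named capsuleLayer; the exact key names are not specified on this page, so entries are simply printed as returned:

using System;

foreach (var entry in capsuleLayer.GetAuxiliaryLossDiagnostics())
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}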
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.
public override Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
GetParameters()
Gets all trainable parameters from the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters from the layer and combines them into a single vector. It flattens the transformation matrix and concatenates it with the bias vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer into a single list.
The parameters:
- Include all values from the transformation matrix and bias
- Are combined into a single long list (vector)
- Represent everything this layer has learned
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
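A brief capture sketch, assuming an existing CapsuleLayer<float> named capsuleLayer; how the vector is then persisted is up to the caller and not shown:

// One flat vector: the flattened transformation matrix followed by the bias values.
Vector<float> parameters = capsuleLayer.GetParameters();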
ResetState()
Resets the internal state of the capsule layer.
public override void ResetState()
Remarks
This method resets the internal state of the capsule layer, including the cached inputs, outputs, coupling coefficients, and gradients. This is useful when starting to process a new sequence or batch after training on a previous one.
For Beginners: This method clears the layer's temporary memory to start fresh.
When resetting the state:
- Stored inputs and outputs are cleared
- Calculated gradients are cleared
- Coupling coefficients (connection strengths) are cleared
- The layer forgets any information from previous batches
This is important for:
- Processing a new, unrelated batch of data
- Preventing information from one batch affecting another
- Preparing the layer for a new training step
Note that this doesn't reset the learned parameters (transformation matrix and bias), just the temporary information used during a single forward/backward pass.
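A small sketch of where the call typically sits, assuming an existing CapsuleLayer<float> named capsuleLayer and with batches as a placeholder for your own data pipeline:

foreach (var batch in batches)
{
    var output = capsuleLayer.Forward(batch);
    // ... compute the loss, then call Backward and UpdateParameters ...
    capsuleLayer.ResetState(); // clear cached inputs, outputs, and coupling coefficients before the next batch
}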
SetParameters(Vector<T>)
Sets the trainable parameters for the layer.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets the trainable parameters for the layer from a single vector. It extracts the appropriate portions of the input vector for the transformation matrix and bias. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The first part of the vector is used for the transformation matrix
- The second part of the vector is used for the bias
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Testing different parameter values
An error is thrown if the input vector doesn't have the expected number of parameters.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
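A transfer sketch using only the members documented on this page; both layers must be constructed with identical dimensions, otherwise the ArgumentException above is thrown:

// Copy learned values from one capsule layer into another of the same shape.
var source = new CapsuleLayer<float>(32, 8, 10, 16, 3);
var target = new CapsuleLayer<float>(32, 8, 10, 16, 3);

Vector<float> parameters = source.GetParameters();
target.SetParameters(parameters); // throws ArgumentException if the length does not match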
UpdateParameters(T)
Updates the layer's parameters using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method updates the layer's parameters (transformation matrix and bias) based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. The update is performed by subtracting the scaled gradients from the current parameters.
For Beginners: This method updates the layer's internal values during training.
After calculating the gradients in the backward pass:
- This method applies those changes to the transformation matrix and bias
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
The formula is simple: new_value = old_value - (gradient * learning_rate)
This is how the layer "learns" from data over time, gradually improving its ability to make accurate predictions or classifications.
Exceptions
- InvalidOperationException
Thrown when update is called before backward.
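Putting the pieces together, a minimal single-training-step sketch; input comes from your data pipeline and outputGradient from the loss function, and both are placeholders here, as is the capsuleLayer instance:

var output = capsuleLayer.Forward(input);   // 1. forward pass (runs dynamic routing)
capsuleLayer.Backward(outputGradient);      // 2. compute and cache parameter gradients
capsuleLayer.UpdateParameters(0.01f);       // 3. new_value = old_value - (gradient * 0.01)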