Class CapsuleLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a capsule neural network layer that encapsulates groups of neurons to better preserve spatial information.
public class CapsuleLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
LayerBase<T> → CapsuleLayer<T>
- Implements
ILayer<T>
IJitCompilable<T>
IWeightLoadable<T>
IDisposable
IAuxiliaryLossLayer<T>
IDiagnosticsProvider
Remarks
A capsule layer is a specialized neural network layer that groups neurons into "capsules," where each capsule represents a specific entity or feature. Unlike traditional neural networks that use scalar outputs, capsules output vectors whose length represents the probability of the entity's existence and whose orientation encodes the entity's properties. Capsule layers use dynamic routing between capsules, which helps preserve hierarchical relationships between features and improves the network's ability to recognize objects from different viewpoints.
For Beginners: A capsule layer is an advanced type of neural network layer that works differently from standard layers.
Traditional neural network layers use single numbers to represent features, but capsule layers use groups of numbers (vectors) that can capture more information:
- Each "capsule" is a group of neurons that work together
- The length of a capsule's output tells you how likely something exists
- The direction of a capsule's output tells you about its properties (like position, size, rotation)
For example, if detecting faces in images:
- A traditional network might have neurons that detect eyes, nose, mouth separately
- A capsule network would understand how these parts relate to each other spatially
This helps the network recognize objects even when they're viewed from different angles or positions, which is something traditional networks struggle with.
Constructors
CapsuleLayer(int, int, int, int, int, IActivationFunction<T>?)
Initializes a new instance of the CapsuleLayer<T> class with specified dimensions and routing iterations.
public CapsuleLayer(int inputCapsules, int inputDimension, int numCapsules, int capsuleDimension, int numRoutingIterations, IActivationFunction<T>? activationFunction = null)
Parameters
inputCapsules (int): The number of input capsules.
inputDimension (int): The dimension of each input capsule.
numCapsules (int): The number of output capsules.
capsuleDimension (int): The dimension of each output capsule.
numRoutingIterations (int): The number of dynamic routing iterations to perform.
activationFunction (IActivationFunction<T>): The activation function to apply. Defaults to the squash activation if not specified.
Remarks
This constructor creates a new capsule layer with the specified dimensions and routing parameters. It initializes the transformation matrix and bias vector with appropriate values. The transformation matrix is used to convert input capsules to output capsules, and the bias is added to each output capsule. The number of routing iterations determines how many times the dynamic routing algorithm is executed during the forward pass.
For Beginners: This constructor creates a new capsule layer with specific settings.
When creating a capsule layer, you need to specify:
- How many input capsules there are (from the previous layer)
- How many numbers each input capsule contains
- How many output capsules you want this layer to create
- How many numbers each output capsule should contain
- How many routing iterations to perform (more iterations = more accurate but slower)
The "routing" is the special process that capsule networks use to determine which higher-level capsules should receive information from lower-level capsules.
Think of it like this: if you see an eye, nose, and mouth, the routing process helps decide if they should be grouped together as a face.
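For illustration, here is a minimal construction sketch. The specific sizes are arbitrary example values, not requirements:

// 32 input capsules of dimension 8 are routed into 10 output capsules of
// dimension 16, using 3 dynamic routing iterations. The activation function
// is omitted, so the default squash activation is used.
var capsuleLayer = new CapsuleLayer<float>(
    inputCapsules: 32,
    inputDimension: 8,
    numCapsules: 10,
    capsuleDimension: 16,
    numRoutingIterations: 3);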
Properties
AuxiliaryLossWeight
Gets or sets the weight for the routing entropy auxiliary loss.
public T AuxiliaryLossWeight { get; set; }
Property Value
- T
Remarks
This weight controls how much the routing entropy regularization contributes to the total loss. The total loss is: main_loss + (auxiliary_weight * entropy_loss). Typical values range from 0.001 to 0.01.
For Beginners: This controls how much the network should encourage diverse routing.
The weight determines the balance between:
- Task accuracy (main loss)
- Routing diversity (entropy loss)
Common values:
- 0.005 (default): Balanced routing diversity
- 0.001-0.003: Light diversity enforcement
- 0.008-0.01: Strong diversity enforcement
Higher values make routing more diverse but might reduce task performance. Lower values allow more deterministic routing but might lead to overconfidence.
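As a rough sketch, assuming a CapsuleLayer<float> instance named capsuleLayer (how the two losses are combined depends on your training setup):

// Enable routing entropy regularization and pick a weight in the documented range.
capsuleLayer.UseAuxiliaryLoss = true;
capsuleLayer.AuxiliaryLossWeight = 0.005f; // balanced; use ~0.001f for lighter or ~0.01f for stronger diversity

// Per the remarks above, the total loss is then:
//   total_loss = main_loss + (AuxiliaryLossWeight * entropy_loss)
// where entropy_loss is the value returned by ComputeAuxiliaryLoss().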
SupportsGpuExecution
Gets whether this layer has a GPU execution implementation for inference.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.
For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
true, because CapsuleLayer<T> uses dynamic routing with a fixed number of iterations that can be unrolled into a static computation graph.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Always true, as capsule layers have trainable parameters.
Remarks
This property returns true, indicating that the capsule layer can be trained through backpropagation. Capsule layers contain trainable parameters (transformation matrices and biases) that are adjusted during the training process to minimize the network's error.
For Beginners: This property tells you that this layer can learn from data.
A value of true means:
- The layer contains values (parameters) that will change during training
- It will improve its performance as it sees more examples
- It participates in the learning process of the neural network
Capsule layers always support training because they contain transformation matrices and bias values that need to be learned from data.
UseAuxiliaryLoss
Gets or sets whether auxiliary loss (routing entropy regularization) should be used during training.
public bool UseAuxiliaryLoss { get; set; }
Property Value
- bool
Remarks
Routing entropy regularization encourages diversity in the routing coefficients by penalizing low entropy distributions. This prevents routing from becoming too deterministic and helps the capsule layer learn more robust features.
For Beginners: Routing regularization helps capsules make better decisions.
In capsule networks:
- Routing coefficients decide how much information flows from lower to higher capsules
- If routing becomes too "certain" (all weight on one capsule), it might miss important patterns
- Entropy regularization encourages routing to consider multiple options
Think of it like this:
- Without regularization: "This is 100% a face, ignore everything else"
- With regularization: "This is probably a face (80%), but could be other things (20%)"
This helps the network:
- Learn more robust features
- Avoid overconfidence
- Generalize better to new examples
Methods
Backward(Tensor<T>)
Performs the backward pass of the capsule layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the capsule layer, which is used during training to propagate error gradients back through the network. It computes the gradients of the loss with respect to the layer's parameters (transformation matrix and bias) and the layer's input. The gradients are stored internally and used during the parameter update step.
For Beginners: This method is used during training to calculate how the layer's inputs and parameters should change to reduce errors.
The backward pass:
- Takes in gradients (directions of improvement) from the next layer
- Applies the derivative of the activation function
- Calculates how much each parameter (transformation matrix and bias) contributed to the error
- Calculates how the input contributed to the error, to pass gradients to the previous layer
During this process, the method:
- Creates gradient tensors for the transformation matrix and bias
- Uses the coupling coefficients (connection strengths) calculated during the forward pass
- Produces gradients that will be used to update the parameters
This is part of the "backpropagation" algorithm that helps neural networks learn. The error flows backward through the network, and each layer determines how it should change to reduce that error.
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
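A minimal ordering sketch, assuming capsuleLayer is an existing CapsuleLayer<float> and that input and outputGradient are placeholder tensors of the appropriate shapes obtained elsewhere; calling Backward before Forward triggers the exception noted above:

Tensor<float> output = capsuleLayer.Forward(input);                   // caches activations and coupling coefficients
Tensor<float> inputGradient = capsuleLayer.Backward(outputGradient);  // valid only after Forward has run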
ComputeAuxiliaryLoss()
Computes the auxiliary loss for routing entropy regularization.
public T ComputeAuxiliaryLoss()
Returns
- T
The computed routing entropy auxiliary loss.
Remarks
This method computes the entropy of the routing coefficients. Low entropy means the routing is very deterministic (concentrating on one capsule), while high entropy means it's more distributed across multiple capsules. We penalize low entropy to encourage diverse routing. Entropy: H = -Σ(p * log(p)) where p are the routing coefficients. We use negative entropy as loss since we want to maximize entropy (minimize -H).
For Beginners: This calculates how diverse the routing decisions are.
Routing entropy works by:
- Looking at the routing coefficients (how information flows between capsules)
- Measuring how "spread out" these coefficients are
- Penalizing routing that's too concentrated on one capsule
- Encouraging routing that considers multiple capsules
Entropy is a measure of uncertainty/diversity:
- Low entropy: Very certain, concentrated (e.g., [0.99, 0.01, 0.00])
- High entropy: Uncertain, diverse (e.g., [0.33, 0.33, 0.34])
By encouraging higher entropy, we prevent the network from becoming overconfident and help it learn more robust features that work in different situations.
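As a worked illustration of H = -Σ(p * log(p)), independent of the layer's internals, here is a standalone C# sketch comparing a concentrated and a diverse routing distribution:

using System;
using System.Linq;

// H = -Σ(p * log(p)); terms with p = 0 contribute nothing.
static double Entropy(double[] p) => -p.Where(x => x > 0).Sum(x => x * Math.Log(x));

double[] concentrated = { 0.99, 0.01, 0.00 }; // near-deterministic routing
double[] diverse = { 0.33, 0.33, 0.34 };      // spread-out routing

Console.WriteLine(Entropy(concentrated)); // ≈ 0.056 (low entropy)
Console.WriteLine(Entropy(diverse));      // ≈ 1.099 (near the maximum, ln(3) ≈ 1.0986)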
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): The list to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
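A minimal usage sketch, assuming an existing CapsuleLayer<float> named capsuleLayer; the later compilation steps that consume the returned node are outside the scope of this page:

using System.Collections.Generic;

var inputNodes = new List<ComputationNode<float>>();
if (capsuleLayer.SupportsJitCompilation)
{
    // inputNodes is populated with the graph's input nodes; the return value
    // is the node representing the layer's output.
    ComputationNode<float> outputNode = capsuleLayer.ExportComputationGraph(inputNodes);
}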
Forward(Tensor<T>)
Performs the forward pass of the capsule layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after capsule processing.
Remarks
This method implements the forward pass of the capsule layer, including the dynamic routing algorithm. The input capsules are first transformed using the transformation matrix. Then, dynamic routing is performed for the specified number of iterations to determine the coupling coefficients between input and output capsules. These coefficients are used to compute the weighted sum for each output capsule, which is then passed through the squash activation function to produce the final output.
For Beginners: This method processes the input data through the capsule layer.
The forward pass has several steps:
- Transform input capsules using the transformation matrix
- Start with equal connections between all input and output capsules
- Perform dynamic routing:
- Calculate weighted sums for each output capsule
- Add the bias values
- Apply the "squash" activation function to ensure vector lengths are between 0 and 1
- Update the connection strengths based on how well input and output capsules agree
- Repeat the routing process multiple times to refine the connections
The "squash" function is special to capsule networks - it preserves the direction of a vector but adjusts its length to be between 0 and 1 (representing a probability).
Dynamic routing is what makes capsule networks unique - it's how they learn to group lower-level features into higher-level concepts.
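The squash nonlinearity is commonly defined (as in the original capsule network formulation) as v' = (||v||² / (1 + ||v||²)) * (v / ||v||). The layer's internal implementation may differ in details, but a standalone sketch for a single capsule vector looks like this:

using System;

// Scales a capsule vector so its length lies in (0, 1) while preserving direction.
static double[] Squash(double[] v)
{
    double squaredNorm = 0.0;
    foreach (double x in v) squaredNorm += x * x;

    double norm = Math.Sqrt(squaredNorm);
    double scale = squaredNorm / (1.0 + squaredNorm) / (norm + 1e-9); // epsilon guards against zero vectors

    var squashed = new double[v.Length];
    for (int i = 0; i < v.Length; i++) squashed[i] = v[i] * scale;
    return squashed;
}

double[] result = Squash(new[] { 0.6, 0.8 }); // input length 1.0 → output [0.3, 0.4], length 0.5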
ForwardGpu(params IGpuTensor<T>[])
Performs GPU-accelerated forward pass through the capsule layer.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensors.
Returns
- IGpuTensor<T>
GPU-resident output tensor after capsule transformation.
Remarks
This method implements the forward pass using dedicated GPU kernels for capsule transformation and dynamic routing. All operations stay GPU-resident for maximum performance.
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the routing entropy auxiliary loss.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about routing regularization.
Remarks
This method returns detailed diagnostics about the routing entropy regularization, including the computed entropy loss, weight applied, and whether the feature is enabled. This information is useful for monitoring training progress and debugging.
For Beginners: This provides information about how routing regularization is working.
The diagnostics include:
- Total routing entropy loss (how concentrated routing is)
- Weight applied to the entropy loss
- Whether routing regularization is enabled
- Number of routing iterations being used
This helps you:
- Monitor if routing is becoming too deterministic
- Debug issues with capsule layer learning
- Understand the impact of entropy regularization on routing
You can use this information to adjust the auxiliary loss weight or routing iterations for better results.
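A small sketch of reading the diagnostics, assuming an existing CapsuleLayer<float> named capsuleLayer; the exact key names are not specified on this page, so entries are simply printed as returned:

using System;

foreach (var entry in capsuleLayer.GetAuxiliaryLossDiagnostics())
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}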
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.
public override Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
GetParameters()
Gets all trainable parameters from the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters from the layer and combines them into a single vector. It flattens the transformation matrix and concatenates it with the bias vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer into a single list.
The parameters:
- Include all values from the transformation matrix and bias
- Are combined into a single long list (vector)
- Represent everything this layer has learned
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
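A brief capture sketch, assuming an existing CapsuleLayer<float> named capsuleLayer; how the vector is then persisted is up to the caller and not shown:

// One flat vector: the flattened transformation matrix followed by the bias values.
Vector<float> parameters = capsuleLayer.GetParameters();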
ResetState()
Resets the internal state of the capsule layer.
public override void ResetState()
Remarks
This method resets the internal state of the capsule layer, including the cached inputs, outputs, coupling coefficients, and gradients. This is useful when starting to process a new sequence or batch after training on a previous one.
For Beginners: This method clears the layer's temporary memory to start fresh.
When resetting the state:
- Stored inputs and outputs are cleared
- Calculated gradients are cleared
- Coupling coefficients (connection strengths) are cleared
- The layer forgets any information from previous batches
This is important for:
- Processing a new, unrelated batch of data
- Preventing information from one batch affecting another
- Preparing the layer for a new training step
Note that this doesn't reset the learned parameters (transformation matrix and bias), just the temporary information used during a single forward/backward pass.
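A small sketch of where the call typically sits, assuming an existing CapsuleLayer<float> named capsuleLayer and with batches as a placeholder for your own data pipeline:

foreach (var batch in batches)
{
    var output = capsuleLayer.Forward(batch);
    // ... compute the loss, then call Backward and UpdateParameters ...
    capsuleLayer.ResetState(); // clear cached inputs, outputs, and coupling coefficients before the next batch
}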
SetParameters(Vector<T>)
Sets the trainable parameters for the layer.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets the trainable parameters for the layer from a single vector. It extracts the appropriate portions of the input vector for the transformation matrix and bias. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The first part of the vector is used for the transformation matrix
- The second part of the vector is used for the bias
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Testing different parameter values
An error is thrown if the input vector doesn't have the expected number of parameters.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
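A transfer sketch using only the members documented on this page; both layers must be constructed with identical dimensions, otherwise the ArgumentException above is thrown:

// Copy learned values from one capsule layer into another of the same shape.
var source = new CapsuleLayer<float>(32, 8, 10, 16, 3);
var target = new CapsuleLayer<float>(32, 8, 10, 16, 3);

Vector<float> parameters = source.GetParameters();
target.SetParameters(parameters); // throws ArgumentException if the length does not match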
UpdateParameters(T)
Updates the layer's parameters using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method updates the layer's parameters (transformation matrix and bias) based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. The update is performed by subtracting the scaled gradients from the current parameters.
For Beginners: This method updates the layer's internal values during training.
After calculating the gradients in the backward pass:
- This method applies those changes to the transformation matrix and bias
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
The formula is simple: new_value = old_value - (gradient * learning_rate)
This is how the layer "learns" from data over time, gradually improving its ability to make accurate predictions or classifications.
Exceptions
- InvalidOperationException
Thrown when update is called before backward.
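Putting the pieces together, a minimal single-training-step sketch; input comes from your data pipeline and outputGradient from the loss function, and both are placeholders here, as is the capsuleLayer instance:

var output = capsuleLayer.Forward(input);   // 1. forward pass (runs dynamic routing)
capsuleLayer.Backward(outputGradient);      // 2. compute and cache parameter gradients
capsuleLayer.UpdateParameters(0.01f);       // 3. new_value = old_value - (gradient * 0.01)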