Class SqueezeAndExcitationLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a Squeeze-and-Excitation layer that recalibrates channel-wise feature responses adaptively.
public class SqueezeAndExcitationLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider, IChainableComputationGraph<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → SqueezeAndExcitationLayer<T>
- Implements
- ILayer<T>
- Inherited Members
Remarks
A Squeeze-and-Excitation layer enhances the representational power of a network by explicitly modeling the interdependencies between channels. It does this by performing two operations:
1. "Squeeze" - aggregating feature maps across spatial dimensions to produce a channel descriptor
2. "Excitation" - using this descriptor to recalibrate the original feature maps channel-wise
For Beginners: This layer helps the neural network focus on the most important features.
Think of it like how your brain works when looking at a picture:
- First, you get a rough idea of what's in the image (the "squeeze" step)
- Then, you decide which parts to pay more attention to (the "excitation" step)
- Finally, you look at the image again with this focused attention
For example, if the network is processing an image of a cat, the Squeeze-and-Excitation layer might:
- First compress all the information to understand "this is probably a cat"
- Then decide to pay more attention to features that look like ears, whiskers, and fur
- Finally enhance those important features in the original image data
This helps the network become more accurate and efficient by focusing on what matters most.
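The sketch below shows the layer in isolation: construct it and run a tensor through Forward. The Tensor<float> construction is an assumption (a shape-based constructor with batch, channel, and spatial dimensions); adapt it to however tensors are created in your version of the library.
```csharp
using AiDotNet.NeuralNetworks.Layers;

// 64 feature channels, compressed to 64 / 16 = 4 channels in the bottleneck.
var seLayer = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

// Assumed tensor construction: one sample, 64 channels, 32x32 spatial size.
// Replace with however Tensor<T> instances are created in your version of AiDotNet.
var input = new Tensor<float>(new[] { 1, 64, 32, 32 });

// Each output channel is the corresponding input channel scaled by a learned
// importance score between 0 and 1.
Tensor<float> output = seLayer.Forward(input);
```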
Constructors
SqueezeAndExcitationLayer(int, int, IActivationFunction<T>?, IActivationFunction<T>?)
Initializes a new instance of the SqueezeAndExcitationLayer<T> class with scalar activation functions.
public SqueezeAndExcitationLayer(int channels, int reductionRatio, IActivationFunction<T>? firstActivation = null, IActivationFunction<T>? secondActivation = null)
Parameters
channels (int): The number of input and output channels.
reductionRatio (int): The ratio by which to reduce the number of channels in the bottleneck.
firstActivation (IActivationFunction<T>): The activation function for the first fully connected layer. Defaults to ReLU if not specified.
secondActivation (IActivationFunction<T>): The activation function for the second fully connected layer. Defaults to Sigmoid if not specified.
Remarks
This constructor creates a Squeeze-and-Excitation layer with the specified number of channels and reduction ratio. The reduction ratio determines how much the channel dimension is compressed in the bottleneck. The activation functions control the non-linearities applied after each fully connected layer.
For Beginners: This constructor creates a new Squeeze-and-Excitation layer.
The parameters you provide determine:
- channels: How many different feature types the layer will process
- reductionRatio: How much to compress the information (higher means more compression)
- firstActivation: How to process information after the first step (defaults to ReLU, which keeps only positive values)
- secondActivation: How to determine importance of each feature (defaults to Sigmoid, which outputs values between 0 and 1)
Think of it like this: if you have 64 channels (different types of features) and a reduction ratio of 16, the layer will compress those 64 channels down to just 4 during the middle step, forcing it to focus on only the most important patterns.
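A brief sketch of that example follows. The explicit-activation call is commented out because the concrete IActivationFunction<T> class names shown (ReLUActivation<T>, SigmoidActivation<T>) are placeholders; substitute whichever implementations your version of the library provides.
```csharp
// Defaults: ReLU after the first fully connected layer, Sigmoid after the second.
// Bottleneck size: 64 channels / reduction ratio 16 = 4 channels.
var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

// Equivalent with explicit activations (placeholder class names):
// var se = new SqueezeAndExcitationLayer<float>(
//     channels: 64,
//     reductionRatio: 16,
//     firstActivation: new ReLUActivation<float>(),
//     secondActivation: new SigmoidActivation<float>());
```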
SqueezeAndExcitationLayer(int, int, IVectorActivationFunction<T>?, IVectorActivationFunction<T>?)
Initializes a new instance of the SqueezeAndExcitationLayer<T> class with vector activation functions.
public SqueezeAndExcitationLayer(int channels, int reductionRatio, IVectorActivationFunction<T>? firstVectorActivation = null, IVectorActivationFunction<T>? secondVectorActivation = null)
Parameters
channels (int): The number of input and output channels.
reductionRatio (int): The ratio by which to reduce the number of channels in the bottleneck.
firstVectorActivation (IVectorActivationFunction<T>): The vector activation function for the first fully connected layer. Defaults to ReLU if not specified.
secondVectorActivation (IVectorActivationFunction<T>): The vector activation function for the second fully connected layer. Defaults to Sigmoid if not specified.
Remarks
This constructor creates a Squeeze-and-Excitation layer with the specified number of channels and reduction ratio. It uses vector activation functions, which operate on entire vectors rather than individual elements. The reduction ratio determines how much the channel dimension is compressed in the bottleneck.
For Beginners: This constructor is similar to the previous one, but uses vector activations.
Vector activations:
- Process entire groups of numbers at once, rather than one at a time
- Can capture relationships between different elements
- Allow for more complex transformations
This version is useful when you need more sophisticated processing that considers how different features relate to each other, rather than treating each feature independently.
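A minimal sketch of calling this overload. Using explicitly typed IVectorActivationFunction<T> arguments (or the named parameters) selects the vector-activation constructor; the nulls fall back to the documented ReLU and Sigmoid defaults, and any concrete vector activation types are left to your version of the library.
```csharp
// Explicit parameter names select the vector-activation overload.
// Null arguments fall back to the defaults (ReLU, then Sigmoid).
IVectorActivationFunction<float>? firstVec = null;
IVectorActivationFunction<float>? secondVec = null;

var se = new SqueezeAndExcitationLayer<float>(
    channels: 128,
    reductionRatio: 8,
    firstVectorActivation: firstVec,
    secondVectorActivation: secondVec);
```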
Properties
AuxiliaryLossWeight
Gets or sets the weight for the auxiliary loss contribution.
public T AuxiliaryLossWeight { get; set; }
Property Value
- T
Remarks
This value determines how much the channel attention regularization contributes to the total loss. The default value of 0.01 provides a good balance between the main task and regularization.
For Beginners: This controls how much importance to give to the channel attention regularization.
The weight affects training:
- Higher values (e.g., 0.05) make the network prioritize balanced channel attention more strongly
- Lower values (e.g., 0.001) make the regularization less important
- The default (0.01) works well for most computer vision tasks
If your network is overfitting to specific channels, increase this value. If the main task is more important, you might decrease it.
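A small configuration sketch using the values mentioned above.
```csharp
var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

// Default is 0.01. Raise it if the network over-relies on a few channels;
// lower it if the main task loss should dominate.
se.AuxiliaryLossWeight = 0.05f;    // stronger push toward balanced attention
// se.AuxiliaryLossWeight = 0.001f; // lighter regularization
```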
ParameterCount
Gets the total number of trainable parameters in this layer.
public override int ParameterCount { get; }
Property Value
Remarks
This returns the total count of weights and biases in both fully connected layers.
SparsityWeight
Gets or sets the weight for L1 sparsity regularization on attention weights.
public T SparsityWeight { get; set; }
Property Value
- T
The weight to apply to the L1 sparsity loss. Default is 0.0001.
Remarks
This property controls the strength of L1 sparsity regularization applied to the channel attention weights. Higher values encourage more sparse attention (fewer active channels), while lower values allow more distributed attention.
For Beginners: This controls how strongly to encourage sparse attention.
Sparsity regularization:
- Encourages the network to focus on fewer, more important channels
- Helps prevent overfitting by reducing model complexity
- Can improve interpretability by making channel selection clearer
Typical values range from 0.0001 to 0.01. Set to 0 to disable sparsity regularization.
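The snippet below sets the property and then illustrates, on a plain array, what an L1 sparsity penalty over attention scores looks like. The L1Penalty helper is purely conceptual and is not part of the layer's API.
```csharp
using System;

var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);
se.SparsityWeight = 0.001f; // typical range 0.0001 to 0.01; 0 disables it

// Conceptual illustration only: an L1 penalty is the weighted sum of the
// absolute attention scores, so minimizing it drives scores toward zero.
float penalty = L1Penalty(new[] { 0.9f, 0.1f, 0.0f, 0.3f }, sparsityWeight: 0.001f);
Console.WriteLine($"Illustrative L1 penalty: {penalty}");

static float L1Penalty(float[] attentionScores, float sparsityWeight)
{
    float sum = 0f;
    foreach (var score in attentionScores)
        sum += Math.Abs(score);
    return sparsityWeight * sum;
}
```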
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true for this layer, as it contains trainable parameters (weights and biases).
Remarks
This property indicates whether the Squeeze-and-Excitation layer can be trained through backpropagation. Since this layer has trainable parameters (weights and biases), it supports training.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer has internal values (weights and biases) that can be adjusted during training
- It will improve its performance as it sees more data
- It participates in the learning process
For this layer, the value is always true because it needs to learn which features are most important to pay attention to.
UseAuxiliaryLoss
Gets or sets a value indicating whether auxiliary loss is enabled for this layer.
public bool UseAuxiliaryLoss { get; set; }
Property Value
Remarks
When enabled, the layer computes a channel attention regularization loss that encourages balanced channel importance. This helps prevent the layer from over-relying on specific channels.
For Beginners: This setting controls whether the layer uses an additional learning signal.
When enabled (true):
- The layer encourages balanced attention across channels
- This helps prevent over-reliance on specific features
- Training may be more stable and produce more robust representations
When disabled (false):
- Only the main task loss is used for training
- This is the default setting
Methods
Backward(Tensor<T>)
Performs the backward pass of the Squeeze-and-Excitation layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the Squeeze-and-Excitation layer, which is used during training to propagate error gradients back through the network. It calculates gradients for the input and for all trainable parameters (weights and biases).
For Beginners: This method is used during training to calculate how the layer's input and parameters should change to reduce errors.
During the backward pass:
- The layer receives information about how its output should change (outputGradient)
- It calculates how the original input should change to reduce error (inputGradient)
- It calculates how its internal weights and biases should change to reduce error
This process follows the chain rule of calculus, working backward from the output to the input. It's essential for the "learning" part of deep learning, allowing the network to gradually improve its performance based on examples.
Exceptions
- InvalidOperationException
Thrown when trying to perform a backward pass before a forward pass.
BuildComputationGraph(ComputationNode<T>, string)
Builds the computation graph for this layer using the provided input node.
public ComputationNode<T> BuildComputationGraph(ComputationNode<T> inputNode, string namePrefix)
Parameters
inputNode (ComputationNode<T>): The input computation node from the parent layer.
namePrefix (string): Prefix for naming internal nodes (for debugging/visualization).
Returns
- ComputationNode<T>
The output computation node representing this layer's computation.
Remarks
Unlike ILayer<T>.ExportComputationGraph, this method does NOT create a new
input variable. Instead, it uses the provided inputNode as its input,
allowing the parent layer to chain multiple sub-layers together in a single computation graph.
The namePrefix parameter should be used to prefix all internal node names
to avoid naming conflicts when multiple instances of the same layer type are used.
ComputeAuxiliaryLoss()
Computes the auxiliary loss for this layer based on channel attention regularization.
public T ComputeAuxiliaryLoss()
Returns
- T
The computed auxiliary loss value.
Remarks
This method computes a channel attention regularization loss. In a full implementation, this would encourage balanced channel attention by penalizing extreme attention values (all attention on one channel or uniform attention across all channels). The regularization can use L2 norm or entropy-based measures.
For Beginners: This method calculates a penalty to encourage balanced feature importance.
Channel attention regularization:
- Prevents the layer from relying too heavily on specific channels
- Encourages the network to use information from multiple features
- Helps create more robust and generalizable models
Why this is useful:
- In complex tasks, multiple types of features are usually important
- Over-relying on one type of feature can lead to poor generalization
- Balanced attention helps the network learn richer representations
Example: In image classification, instead of only looking at edges (one channel), the network should also consider colors, textures, and shapes (other channels).
Note: This is a placeholder implementation. For full functionality, the layer would need to cache the excitation weights (channel attention scores) during the forward pass. The formula would compute a regularization term based on these attention weights, such as:
- L2 regularization: L = ||excitation||²
- Entropy regularization: L = -Σ(p * log(p)) for normalized excitation weights
- Variance penalty: encouraging variance in attention across channels
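A sketch of folding this method's result into a training step, assuming the main loss is computed elsewhere by your training code. Whether the returned value already includes AuxiliaryLossWeight depends on the implementation, so check before scaling it again.
```csharp
var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16)
{
    UseAuxiliaryLoss = true,
    AuxiliaryLossWeight = 0.01f,
    SparsityWeight = 0.0001f
};

// ... forward pass and main-task loss computation happen elsewhere ...
float mainLoss = 0.42f; // placeholder value from your loss function

float totalLoss = mainLoss;
if (se.UseAuxiliaryLoss)
{
    // Channel attention regularization term for this layer.
    totalLoss += se.ComputeAuxiliaryLoss();
}
```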
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
Forward(Tensor<T>)
Performs the forward pass of the Squeeze-and-Excitation layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after Squeeze-and-Excitation processing.
Remarks
This method implements the forward pass of the Squeeze-and-Excitation layer. It first applies global average pooling to "squeeze" spatial information into a channel descriptor. Then it passes this descriptor through two fully connected layers with activations to produce channel-wise scaling factors. Finally, it multiplies the original input by these scaling factors to recalibrate the feature maps.
For Beginners: This method processes the input data through the Squeeze-and-Excitation steps.
The process works in three main steps:
Squeeze: Compresses all spatial information into a single value per channel
- For each channel, all values are averaged together
- This creates a "summary" of each feature type
Excitation: Determines the importance of each channel
- The summary passes through two neural layers with activations
- This produces an "importance score" between 0 and 1 for each channel
Scaling: Adjusts the original input based on importance
- Each feature map is multiplied by its importance score
- Important features are kept or enhanced
- Less important features are reduced
This helps the network focus attention on the most useful features for the current input.
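The toy function below mirrors the three steps on plain arrays for a single sample. It is an illustration of the math only, not the layer's internal implementation: the excite delegate stands in for the two learned fully connected layers.
```csharp
using System;
using System.Linq;

public static class SqueezeExciteDemo
{
    // x[c][i] holds channel c at spatial position i for one sample. The excite
    // delegate stands in for the two learned fully connected layers (ReLU, then
    // Sigmoid) and must return one score in [0, 1] per channel.
    public static float[][] SqueezeExciteScale(float[][] x, Func<float[], float[]> excite)
    {
        int channels = x.Length;

        // Squeeze: global average pooling collapses each channel to a single value.
        var squeezed = new float[channels];
        for (int c = 0; c < channels; c++)
            squeezed[c] = x[c].Average();

        // Excitation: turn the channel summary into per-channel importance scores.
        float[] scores = excite(squeezed);

        // Scaling: recalibrate every value in a channel by that channel's score.
        var y = new float[channels][];
        for (int c = 0; c < channels; c++)
            y[c] = x[c].Select(v => v * scores[c]).ToArray();

        return y;
    }
}
```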
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass of the Squeeze-and-Excitation layer on GPU tensors.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): GPU tensor inputs.
Returns
- IGpuTensor<T>
GPU tensor output after SE processing.
Remarks
This method implements the GPU-accelerated forward pass of the SE layer. All tensor ranks are handled natively on GPU using GlobalAvgPool2D for squeeze, FusedLinearGpu for excitation, and BroadcastMultiplyFirstAxis for scaling.
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the auxiliary loss computation.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about the auxiliary loss.
Remarks
This method returns diagnostic information that can be used to monitor the auxiliary loss during training. The diagnostics include the total channel attention loss, the weight applied to it, and whether auxiliary loss is enabled.
For Beginners: This method provides information to help you understand how the auxiliary loss is working.
The diagnostics show:
- TotalChannelAttentionLoss: The computed penalty for imbalanced channel attention
- ChannelAttentionWeight: How much this penalty affects the overall training
- UseChannelAttention: Whether this penalty is currently enabled
You can use this information to:
- Monitor if channel attention is becoming more balanced over time
- Debug training issues related to feature selection
- Understand which features the network prioritizes
Example: If TotalChannelAttentionLoss is high, it might indicate that the network is over-relying on specific channels, which could be a sign of overfitting or poor feature diversity.
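A short sketch of printing the diagnostics during training; the key names follow the list above.
```csharp
using System;

var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

foreach (var entry in se.GetAuxiliaryLossDiagnostics())
{
    // Expected keys per the remarks above: TotalChannelAttentionLoss,
    // ChannelAttentionWeight, UseChannelAttention.
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}
```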
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.
public override Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters (weights and biases) of the layer and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
The parameters:
- Are the numbers that the neural network learns during training
- Include all weights and biases from both fully connected layers
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
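A sketch of a save/restore round trip using GetParameters together with SetParameters. Vector<T> is the library's own vector type; persisting it to disk is left to your serialization code.
```csharp
using System;

var trained = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

// Snapshot every weight and bias as one flat vector.
Vector<float> snapshot = trained.GetParameters();
Console.WriteLine($"Parameter count: {trained.ParameterCount}");

// Restore into a freshly constructed layer with the same shape.
var restored = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);
restored.SetParameters(snapshot); // throws ArgumentException if the length is wrong
```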
ResetState()
Resets the internal state of the Squeeze-and-Excitation layer.
public override void ResetState()
Remarks
This method resets the internal state of the Squeeze-and-Excitation layer, including the cached inputs and outputs from the forward pass and the gradients calculated during the backward pass. This is useful when starting to process a new input after training or when implementing stateful networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and outputs are cleared
- Calculated gradients are cleared
- The layer forgets any information from previous inputs
This is important for:
- Processing a new, unrelated input
- Starting a new training epoch
- Preventing information from one input affecting another
Think of it like wiping a whiteboard clean before starting a new problem.
SetParameters(Vector<T>)
Sets the trainable parameters of the layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets the trainable parameters (weights and biases) of the layer from a single vector. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The values are copied back into the layer's weights and biases
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Testing different parameter values
An error is thrown if the input vector doesn't have the expected number of parameters.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the layer's parameters using the calculated gradients and the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate that controls the size of the parameter updates.
Remarks
This method updates the weights and biases of the layer based on the gradients calculated during the backward pass. The learning rate controls the size of the updates, with larger values leading to faster but potentially less stable learning.
For Beginners: This method adjusts the layer's weights and biases to improve performance.
During training:
- The backward pass calculates how each parameter should change to reduce errors
- This method applies those changes to the actual parameters
- The learning rate controls how big each adjustment is
Think of it like learning to ride a bike:
- If you make very small adjustments (small learning rate), you learn slowly but steadily
- If you make large adjustments (large learning rate), you might learn faster but risk overcorrecting
This process of gradual adjustment is how neural networks "learn" from examples.
Exceptions
- InvalidOperationException
Thrown when trying to update parameters before calculating gradients.
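To tie the training-related members together, here is a minimal manual training step for the layer in isolation. The seLayer, input, and lossGradient variables are assumed to come from the surrounding training code; in practice a network or optimizer class usually drives these calls.
```csharp
// `seLayer`, `input`, and `lossGradient` are assumed to exist in the
// surrounding training code (layer instance, input batch, and the gradient
// of the loss with respect to the layer's output, respectively).
Tensor<float> output = seLayer.Forward(input);

// ... compute the loss and its gradient with respect to `output` here ...

Tensor<float> inputGradient = seLayer.Backward(lossGradient);

// Apply the accumulated weight and bias gradients with a small learning rate.
seLayer.UpdateParameters(0.01f);

// Clear cached activations and gradients before the next unrelated input.
seLayer.ResetState();
```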