Class SqueezeAndExcitationLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a Squeeze-and-Excitation layer that recalibrates channel-wise feature responses adaptively.
public class SqueezeAndExcitationLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider, IChainableComputationGraph<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → SqueezeAndExcitationLayer<T>
- Implements
- ILayer<T>
- Inherited Members
Remarks
A Squeeze-and-Excitation layer enhances the representational power of a network by explicitly modeling the interdependencies between channels. It does this by performing two operations:
1. "Squeeze" - aggregating feature maps across spatial dimensions to produce a channel descriptor
2. "Excitation" - using this descriptor to recalibrate the original feature maps channel-wise
For Beginners: This layer helps the neural network focus on the most important features.
Think of it like how your brain works when looking at a picture:
- First, you get a rough idea of what's in the image (the "squeeze" step)
- Then, you decide which parts to pay more attention to (the "excitation" step)
- Finally, you look at the image again with this focused attention
For example, if the network is processing an image of a cat, the Squeeze-and-Excitation layer might:
- First compress all the information to understand "this is probably a cat"
- Then decide to pay more attention to features that look like ears, whiskers, and fur
- Finally enhance those important features in the original image data
This helps the network become more accurate and efficient by focusing on what matters most.
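The sketch below shows the layer in isolation: construct it and run a tensor through Forward. The Tensor<float> construction is an assumption (a shape-based constructor with batch, channel, and spatial dimensions); adapt it to however tensors are created in your version of the library.
```csharp
using AiDotNet.NeuralNetworks.Layers;

// 64 feature channels, compressed to 64 / 16 = 4 channels in the bottleneck.
var seLayer = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

// Assumed tensor construction: one sample, 64 channels, 32x32 spatial size.
// Replace with however Tensor<T> instances are created in your version of AiDotNet.
var input = new Tensor<float>(new[] { 1, 64, 32, 32 });

// Each output channel is the corresponding input channel scaled by a learned
// importance score between 0 and 1.
Tensor<float> output = seLayer.Forward(input);
```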
Constructors
SqueezeAndExcitationLayer(int, int, IActivationFunction<T>?, IActivationFunction<T>?)
Initializes a new instance of the SqueezeAndExcitationLayer<T> class with scalar activation functions.
public SqueezeAndExcitationLayer(int channels, int reductionRatio, IActivationFunction<T>? firstActivation = null, IActivationFunction<T>? secondActivation = null)
Parameters
channels (int): The number of input and output channels.
reductionRatio (int): The ratio by which to reduce the number of channels in the bottleneck.
firstActivation (IActivationFunction<T>): The activation function for the first fully connected layer. Defaults to ReLU if not specified.
secondActivation (IActivationFunction<T>): The activation function for the second fully connected layer. Defaults to Sigmoid if not specified.
Remarks
This constructor creates a Squeeze-and-Excitation layer with the specified number of channels and reduction ratio. The reduction ratio determines how much the channel dimension is compressed in the bottleneck. The activation functions control the non-linearities applied after each fully connected layer.
For Beginners: This constructor creates a new Squeeze-and-Excitation layer.
The parameters you provide determine:
- channels: How many different feature types the layer will process
- reductionRatio: How much to compress the information (higher means more compression)
- firstActivation: How to process information after the first step (defaults to ReLU, which keeps only positive values)
- secondActivation: How to determine importance of each feature (defaults to Sigmoid, which outputs values between 0 and 1)
Think of it like this: if you have 64 channels (different types of features) and a reduction ratio of 16, the layer will compress those 64 channels down to just 4 during the middle step, forcing it to focus on only the most important patterns.
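A brief sketch of that example follows. The explicit-activation call is commented out because the concrete IActivationFunction<T> class names shown (ReLUActivation<T>, SigmoidActivation<T>) are placeholders; substitute whichever implementations your version of the library provides.
```csharp
// Defaults: ReLU after the first fully connected layer, Sigmoid after the second.
// Bottleneck size: 64 channels / reduction ratio 16 = 4 channels.
var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

// Equivalent with explicit activations (placeholder class names):
// var se = new SqueezeAndExcitationLayer<float>(
//     channels: 64,
//     reductionRatio: 16,
//     firstActivation: new ReLUActivation<float>(),
//     secondActivation: new SigmoidActivation<float>());
```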
SqueezeAndExcitationLayer(int, int, IVectorActivationFunction<T>?, IVectorActivationFunction<T>?)
Initializes a new instance of the SqueezeAndExcitationLayer<T> class with vector activation functions.
public SqueezeAndExcitationLayer(int channels, int reductionRatio, IVectorActivationFunction<T>? firstVectorActivation = null, IVectorActivationFunction<T>? secondVectorActivation = null)
Parameters
channels (int): The number of input and output channels.
reductionRatio (int): The ratio by which to reduce the number of channels in the bottleneck.
firstVectorActivation (IVectorActivationFunction<T>): The vector activation function for the first fully connected layer. Defaults to ReLU if not specified.
secondVectorActivation (IVectorActivationFunction<T>): The vector activation function for the second fully connected layer. Defaults to Sigmoid if not specified.
Remarks
This constructor creates a Squeeze-and-Excitation layer with the specified number of channels and reduction ratio. It uses vector activation functions, which operate on entire vectors rather than individual elements. The reduction ratio determines how much the channel dimension is compressed in the bottleneck.
For Beginners: This constructor is similar to the previous one, but uses vector activations.
Vector activations:
- Process entire groups of numbers at once, rather than one at a time
- Can capture relationships between different elements
- Allow for more complex transformations
This version is useful when you need more sophisticated processing that considers how different features relate to each other, rather than treating each feature independently.
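A minimal sketch of calling this overload. Using explicitly typed IVectorActivationFunction<T> arguments (or the named parameters) selects the vector-activation constructor; the nulls fall back to the documented ReLU and Sigmoid defaults, and any concrete vector activation types are left to your version of the library.
```csharp
// Explicit parameter names select the vector-activation overload.
// Null arguments fall back to the defaults (ReLU, then Sigmoid).
IVectorActivationFunction<float>? firstVec = null;
IVectorActivationFunction<float>? secondVec = null;

var se = new SqueezeAndExcitationLayer<float>(
    channels: 128,
    reductionRatio: 8,
    firstVectorActivation: firstVec,
    secondVectorActivation: secondVec);
```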
Properties
AuxiliaryLossWeight
Gets or sets the weight for the auxiliary loss contribution.
public T AuxiliaryLossWeight { get; set; }
Property Value
- T
Remarks
This value determines how much the channel attention regularization contributes to the total loss. The default value of 0.01 provides a good balance between the main task and regularization.
For Beginners: This controls how much importance to give to the channel attention regularization.
The weight affects training:
- Higher values (e.g., 0.05) make the network prioritize balanced channel attention more strongly
- Lower values (e.g., 0.001) make the regularization less important
- The default (0.01) works well for most computer vision tasks
If your network is overfitting to specific channels, increase this value. If the main task is more important, you might decrease it.
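A small configuration sketch using the values mentioned above.
```csharp
var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

// Default is 0.01. Raise it if the network over-relies on a few channels;
// lower it if the main task loss should dominate.
se.AuxiliaryLossWeight = 0.05f;    // stronger push toward balanced attention
// se.AuxiliaryLossWeight = 0.001f; // lighter regularization
```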
ParameterCount
Gets the total number of trainable parameters in this layer.
public override int ParameterCount { get; }
Property Value
Remarks
This returns the total count of weights and biases in both fully connected layers.
SparsityWeight
Gets or sets the weight for L1 sparsity regularization on attention weights.
public T SparsityWeight { get; set; }
Property Value
- T
The weight to apply to the L1 sparsity loss. Default is 0.0001.
Remarks
This property controls the strength of L1 sparsity regularization applied to the channel attention weights. Higher values encourage more sparse attention (fewer active channels), while lower values allow more distributed attention.
For Beginners: This controls how strongly to encourage sparse attention.
Sparsity regularization:
- Encourages the network to focus on fewer, more important channels
- Helps prevent overfitting by reducing model complexity
- Can improve interpretability by making channel selection clearer
Typical values range from 0.0001 to 0.01. Set to 0 to disable sparsity regularization.
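The snippet below sets the property and then illustrates, on a plain array, what an L1 sparsity penalty over attention scores looks like. The L1Penalty helper is purely conceptual and is not part of the layer's API.
```csharp
using System;

var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);
se.SparsityWeight = 0.001f; // typical range 0.0001 to 0.01; 0 disables it

// Conceptual illustration only: an L1 penalty is the weighted sum of the
// absolute attention scores, so minimizing it drives scores toward zero.
float penalty = L1Penalty(new[] { 0.9f, 0.1f, 0.0f, 0.3f }, sparsityWeight: 0.001f);
Console.WriteLine($"Illustrative L1 penalty: {penalty}");

static float L1Penalty(float[] attentionScores, float sparsityWeight)
{
    float sum = 0f;
    foreach (var score in attentionScores)
        sum += Math.Abs(score);
    return sparsityWeight * sum;
}
```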
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true for this layer, as it contains trainable parameters (weights and biases).
Remarks
This property indicates whether the Squeeze-and-Excitation layer can be trained through backpropagation. Since this layer has trainable parameters (weights and biases), it supports training.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer has internal values (weights and biases) that can be adjusted during training
- It will improve its performance as it sees more data
- It participates in the learning process
For this layer, the value is always true because it needs to learn which features are most important to pay attention to.
UseAuxiliaryLoss
Gets or sets a value indicating whether auxiliary loss is enabled for this layer.
public bool UseAuxiliaryLoss { get; set; }
Property Value
Remarks
When enabled, the layer computes a channel attention regularization loss that encourages balanced channel importance. This helps prevent the layer from over-relying on specific channels.
For Beginners: This setting controls whether the layer uses an additional learning signal.
When enabled (true):
- The layer encourages balanced attention across channels
- This helps prevent over-reliance on specific features
- Training may be more stable and produce more robust representations
When disabled (false):
- Only the main task loss is used for training
- This is the default setting
Methods
Backward(Tensor<T>)
Performs the backward pass of the Squeeze-and-Excitation layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the Squeeze-and-Excitation layer, which is used during training to propagate error gradients back through the network. It calculates gradients for the input and for all trainable parameters (weights and biases).
For Beginners: This method is used during training to calculate how the layer's input and parameters should change to reduce errors.
During the backward pass:
- The layer receives information about how its output should change (outputGradient)
- It calculates how the original input should change to reduce error (inputGradient)
- It calculates how its internal weights and biases should change to reduce error
This process follows the chain rule of calculus, working backward from the output to the input. It's essential for the "learning" part of deep learning, allowing the network to gradually improve its performance based on examples.
Exceptions
- InvalidOperationException
Thrown when trying to perform a backward pass before a forward pass.
BuildComputationGraph(ComputationNode<T>, string)
Builds the computation graph for this layer using the provided input node.
public ComputationNode<T> BuildComputationGraph(ComputationNode<T> inputNode, string namePrefix)
Parameters
inputNode (ComputationNode<T>): The input computation node from the parent layer.
namePrefix (string): Prefix for naming internal nodes (for debugging/visualization).
Returns
- ComputationNode<T>
The output computation node representing this layer's computation.
Remarks
Unlike ILayer<T>.ExportComputationGraph, this method does NOT create a new
input variable. Instead, it uses the provided inputNode as its input,
allowing the parent layer to chain multiple sub-layers together in a single computation graph.
The namePrefix parameter should be used to prefix all internal node names
to avoid naming conflicts when multiple instances of the same layer type are used.
ComputeAuxiliaryLoss()
Computes the auxiliary loss for this layer based on channel attention regularization.
public T ComputeAuxiliaryLoss()
Returns
- T
The computed auxiliary loss value.
Remarks
This method computes a channel attention regularization loss. In a full implementation, this would encourage balanced channel attention by penalizing extreme attention values (all attention on one channel or uniform attention across all channels). The regularization can use L2 norm or entropy-based measures.
For Beginners: This method calculates a penalty to encourage balanced feature importance.
Channel attention regularization:
- Prevents the layer from relying too heavily on specific channels
- Encourages the network to use information from multiple features
- Helps create more robust and generalizable models
Why this is useful:
- In complex tasks, multiple types of features are usually important
- Over-relying on one type of feature can lead to poor generalization
- Balanced attention helps the network learn richer representations
Example: In image classification, instead of only looking at edges (one channel), the network should also consider colors, textures, and shapes (other channels).
Note: This is a placeholder implementation. For full functionality, the layer would need to cache the excitation weights (channel attention scores) during the forward pass. The formula would compute a regularization term based on these attention weights, such as:
- L2 regularization: L = ||excitation||²
- Entropy regularization: L = -Σ(p * log(p)) for normalized excitation weights
- Variance penalty: encouraging variance in attention across channels
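A sketch of folding this method's result into a training step, assuming the main loss is computed elsewhere by your training code. Whether the returned value already includes AuxiliaryLossWeight depends on the implementation, so check before scaling it again.
```csharp
var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16)
{
    UseAuxiliaryLoss = true,
    AuxiliaryLossWeight = 0.01f,
    SparsityWeight = 0.0001f
};

// ... forward pass and main-task loss computation happen elsewhere ...
float mainLoss = 0.42f; // placeholder value from your loss function

float totalLoss = mainLoss;
if (se.UseAuxiliaryLoss)
{
    // Channel attention regularization term for this layer.
    totalLoss += se.ComputeAuxiliaryLoss();
}
```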
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
Forward(Tensor<T>)
Performs the forward pass of the Squeeze-and-Excitation layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after Squeeze-and-Excitation processing.
Remarks
This method implements the forward pass of the Squeeze-and-Excitation layer. It first applies global average pooling to "squeeze" spatial information into a channel descriptor. Then it passes this descriptor through two fully connected layers with activations to produce channel-wise scaling factors. Finally, it multiplies the original input by these scaling factors to recalibrate the feature maps.
For Beginners: This method processes the input data through the Squeeze-and-Excitation steps.
The process works in three main steps:
Squeeze: Compresses all spatial information into a single value per channel
- For each channel, all values are averaged together
- This creates a "summary" of each feature type
Excitation: Determines the importance of each channel
- The summary passes through two neural layers with activations
- This produces an "importance score" between 0 and 1 for each channel
Scaling: Adjusts the original input based on importance
- Each feature map is multiplied by its importance score
- Important features are kept or enhanced
- Less important features are reduced
This helps the network focus attention on the most useful features for the current input.
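The toy function below mirrors the three steps on plain arrays for a single sample. It is an illustration of the math only, not the layer's internal implementation: the excite delegate stands in for the two learned fully connected layers.
```csharp
using System;
using System.Linq;

public static class SqueezeExciteDemo
{
    // x[c][i] holds channel c at spatial position i for one sample. The excite
    // delegate stands in for the two learned fully connected layers (ReLU, then
    // Sigmoid) and must return one score in [0, 1] per channel.
    public static float[][] SqueezeExciteScale(float[][] x, Func<float[], float[]> excite)
    {
        int channels = x.Length;

        // Squeeze: global average pooling collapses each channel to a single value.
        var squeezed = new float[channels];
        for (int c = 0; c < channels; c++)
            squeezed[c] = x[c].Average();

        // Excitation: turn the channel summary into per-channel importance scores.
        float[] scores = excite(squeezed);

        // Scaling: recalibrate every value in a channel by that channel's score.
        var y = new float[channels][];
        for (int c = 0; c < channels; c++)
            y[c] = x[c].Select(v => v * scores[c]).ToArray();

        return y;
    }
}
```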
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass of the Squeeze-and-Excitation layer on GPU tensors.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): GPU tensor inputs.
Returns
- IGpuTensor<T>
GPU tensor output after SE processing.
Remarks
This method implements the GPU-accelerated forward pass of the SE layer. All tensor ranks are handled natively on GPU using GlobalAvgPool2D for squeeze, FusedLinearGpu for excitation, and BroadcastMultiplyFirstAxis for scaling.
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the auxiliary loss computation.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about the auxiliary loss.
Remarks
This method returns diagnostic information that can be used to monitor the auxiliary loss during training. The diagnostics include the total channel attention loss, the weight applied to it, and whether auxiliary loss is enabled.
For Beginners: This method provides information to help you understand how the auxiliary loss is working.
The diagnostics show:
- TotalChannelAttentionLoss: The computed penalty for imbalanced channel attention
- ChannelAttentionWeight: How much this penalty affects the overall training
- UseChannelAttention: Whether this penalty is currently enabled
You can use this information to:
- Monitor if channel attention is becoming more balanced over time
- Debug training issues related to feature selection
- Understand which features the network prioritizes
Example: If TotalChannelAttentionLoss is high, it might indicate that the network is over-relying on specific channels, which could be a sign of overfitting or poor feature diversity.
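A short sketch of printing the diagnostics during training; the key names follow the list above.
```csharp
using System;

var se = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

foreach (var entry in se.GetAuxiliaryLossDiagnostics())
{
    // Expected keys per the remarks above: TotalChannelAttentionLoss,
    // ChannelAttentionWeight, UseChannelAttention.
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}
```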
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.
public override Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters (weights and biases) of the layer and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
The parameters:
- Are the numbers that the neural network learns during training
- Include all weights and biases from both fully connected layers
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
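A sketch of a save/restore round trip using GetParameters together with SetParameters. Vector<T> is the library's own vector type; persisting it to disk is left to your serialization code.
```csharp
using System;

var trained = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);

// Snapshot every weight and bias as one flat vector.
Vector<float> snapshot = trained.GetParameters();
Console.WriteLine($"Parameter count: {trained.ParameterCount}");

// Restore into a freshly constructed layer with the same shape.
var restored = new SqueezeAndExcitationLayer<float>(channels: 64, reductionRatio: 16);
restored.SetParameters(snapshot); // throws ArgumentException if the length is wrong
```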
ResetState()
Resets the internal state of the Squeeze-and-Excitation layer.
public override void ResetState()
Remarks
This method resets the internal state of the Squeeze-and-Excitation layer, including the cached inputs and outputs from the forward pass and the gradients calculated during the backward pass. This is useful when starting to process a new input after training or when implementing stateful networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and outputs are cleared
- Calculated gradients are cleared
- The layer forgets any information from previous inputs
This is important for:
- Processing a new, unrelated input
- Starting a new training epoch
- Preventing information from one input affecting another
Think of it like wiping a whiteboard clean before starting a new problem.
SetParameters(Vector<T>)
Sets the trainable parameters of the layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets the trainable parameters (weights and biases) of the layer from a single vector. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The values are copied back into the layer's weights and biases
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Testing different parameter values
An error is thrown if the input vector doesn't have the expected number of parameters.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the layer's parameters using the calculated gradients and the specified learning rate.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate that controls the size of the parameter updates.
Remarks
This method updates the weights and biases of the layer based on the gradients calculated during the backward pass. The learning rate controls the size of the updates, with larger values leading to faster but potentially less stable learning.
For Beginners: This method adjusts the layer's weights and biases to improve performance.
During training:
- The backward pass calculates how each parameter should change to reduce errors
- This method applies those changes to the actual parameters
- The learning rate controls how big each adjustment is
Think of it like learning to ride a bike:
- If you make very small adjustments (small learning rate), you learn slowly but steadily
- If you make large adjustments (large learning rate), you might learn faster but risk overcorrecting
This process of gradual adjustment is how neural networks "learn" from examples.
Exceptions
- InvalidOperationException
Thrown when trying to update parameters before calculating gradients.
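To tie the training-related members together, here is a minimal manual training step for the layer in isolation. The seLayer, input, and lossGradient variables are assumed to come from the surrounding training code; in practice a network or optimizer class usually drives these calls.
```csharp
// `seLayer`, `input`, and `lossGradient` are assumed to exist in the
// surrounding training code (layer instance, input batch, and the gradient
// of the loss with respect to the layer's output, respectively).
Tensor<float> output = seLayer.Forward(input);

// ... compute the loss and its gradient with respect to `output` here ...

Tensor<float> inputGradient = seLayer.Backward(lossGradient);

// Apply the accumulated weight and bias gradients with a small learning rate.
seLayer.UpdateParameters(0.01f);

// Clear cached activations and gradients before the next unrelated input.
seLayer.ResetState();
```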