Class GroupNormalizationLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a Group Normalization layer that normalizes inputs across groups of channels.
public class GroupNormalizationLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T
The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → GroupNormalizationLayer<T>
- Implements
- ILayer<T>
- IJitCompilable<T>
- IDiagnosticsProvider
- IWeightLoadable<T>
- IDisposable
Remarks
Group Normalization divides channels into groups and normalizes the features within each group. Because its statistics do not depend on the batch dimension, it is invariant to batch size, making it suitable for small batch sizes or applications where batch statistics are unreliable (such as VAEs and generative models).
For Beginners: This layer helps stabilize training for convolutional networks.
Think of Group Normalization like organizing students into study groups:
- Each group (of channels) studies together and normalizes their behavior
- It works the same regardless of class size (batch size)
- This is especially useful for generative models like VAEs where batch sizes may be small
Key advantages:
- Works well with small batch sizes (even batch size of 1)
- More stable than Batch Normalization for generative models
- Used extensively in modern architectures like Stable Diffusion VAE
Typical usage (see the construction sketch after this list):
- numGroups=32 for 256+ channels
- numGroups=16 for 128 channels
- numGroups=8 for 64 channels
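For example, a minimal construction sketch (the constructor signature and the 1e-05 default come from this page; the choice of float is just for illustration):
// Create a group-normalization layer for a 256-channel feature map,
// following the numGroups guidance above.
var groupNorm = new GroupNormalizationLayer<float>(numGroups: 32, numChannels: 256);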
Constructors
GroupNormalizationLayer(int, int, double)
public GroupNormalizationLayer(int numGroups, int numChannels, double epsilon = 1E-05)
Parameters
numGroups int
The number of groups to divide the channels into. numChannels must be divisible by numGroups so that the channels split evenly.
numChannels int
The total number of channels to normalize.
epsilon double
A small constant added to the variance for numerical stability. Defaults to 1e-05.
Properties
NumChannels
Gets the total number of channels this layer normalizes.
public int NumChannels { get; }
Property Value
- int
NumGroups
Gets the number of groups the channels are divided into.
public int NumGroups { get; }
Property Value
- int
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsGpuTraining
Gets a value indicating whether this layer supports GPU-resident training.
public override bool SupportsGpuTraining { get; }
Property Value
- bool
SupportsJitCompilation
Gets a value indicating whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true if the layer has trainable parameters and supports backpropagation; otherwise, false.
Remarks
This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer has parameters that can be adjusted during training
- It will improve its performance as it sees more data
- It participates in the learning process
A value of false means:
- The layer doesn't have any adjustable parameters
- It performs the same operation regardless of training
- It doesn't need to learn (but may still be useful)
Methods
Backward(Tensor<T>)
Performs the backward pass of the layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient Tensor<T>
The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method defines the backward pass of the layer. The backward pass propagates error gradients from the output of the layer back to its input and calculates gradients for the trainable gamma and beta parameters.
For Beginners: This method is used during training to calculate how the layer's input should change to reduce errors.
During the backward pass:
- The layer receives information about how its output contributed to errors
- It calculates how its parameters should change to reduce errors
- It calculates how its input should change, which will be used by earlier layers
This is the core of how neural networks learn from their mistakes during training.
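A minimal sketch of a backward call, assuming input and lossGradient are Tensor<float> instances produced elsewhere in a training loop:
// Forward must run first so the layer caches the inputs and statistics
// that the backward pass needs.
var output = groupNorm.Forward(input);
// lossGradient has the same shape as output; the returned gradient has
// the same shape as input and is passed back to earlier layers.
var inputGradient = groupNorm.Backward(lossGradient);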
BackwardGpu(IGpuTensor<T>)
GPU-resident backward pass for the group normalization layer. Computes gradients for the gamma and beta parameters.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient IGpuTensor<T>
The gradient of the loss with respect to the layer's output (GPU tensor).
Returns
- IGpuTensor<T>
The gradient of the loss with respect to the layer's input (GPU tensor).
ExportComputationGraph(List<ComputationNode<T>>)
Exports the computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes List<ComputationNode<T>>
List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the GroupNormalization operation.
Remarks
This method builds a computation graph representing the GroupNormalization layer. The graph divides channels into groups and normalizes within each group, then applies learned scale (gamma) and shift (beta) parameters per channel.
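In the standard Group Normalization formulation, for an input $x$ this corresponds to
$$\hat{x} = \frac{x - \mu_g}{\sqrt{\sigma_g^2 + \epsilon}}, \qquad y = \gamma_c \, \hat{x} + \beta_c$$
where $\mu_g$ and $\sigma_g^2$ are the mean and variance computed over the channels and spatial positions of group $g$, and $\gamma_c$, $\beta_c$ are the learned scale and shift for channel $c$.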
Forward(Tensor<T>)
Performs the forward pass of the layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input Tensor<T>
The input tensor to process.
Returns
- Tensor<T>
The output tensor after processing.
Remarks
This method defines the forward pass of the layer. The forward pass transforms the input tensor according to the layer's normalization operation.
For Beginners: This method processes your data through the layer.
The forward pass:
- Takes input data from the previous layer or the network input
- Applies the layer's specific transformation (like convolution or matrix multiplication)
- Applies any activation function
- Passes the result to the next layer
This is where the actual data processing happens during both training and prediction.
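A minimal forward-pass sketch; the tensor construction shown here is an assumption for illustration (the exact Tensor<T> creation API may differ):
// NCHW input: batch of 1, 256 channels, 32x32 spatial (assumed constructor).
var input = new Tensor<float>(new[] { 1, 256, 32, 32 });
// Group normalization preserves the input shape.
var output = groupNorm.Forward(input);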
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass using GPU-resident tensors.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs IGpuTensor<T>[]
The GPU-resident input tensors.
Returns
- IGpuTensor<T>
A GPU-resident output tensor after group normalization.
Remarks
This method performs group normalization entirely on the GPU without downloading intermediate results to the CPU, using the native GroupNorm GPU kernel for maximum performance.
GetBeta()
Gets the learned per-channel shift (beta) parameters as a vector.
public Vector<T> GetBeta()
Returns
- Vector<T>
The beta vector, one shift value per channel.
GetBetaTensor()
Gets the learned per-channel shift (beta) parameters as a tensor.
public Tensor<T> GetBetaTensor()
Returns
- Tensor<T>
The beta tensor.
GetEpsilon()
Gets the epsilon constant added to the variance for numerical stability.
public T GetEpsilon()
Returns
- T
The epsilon value.
GetGamma()
Gets the learned per-channel scale (gamma) parameters as a vector.
public Vector<T> GetGamma()
Returns
- Vector<T>
The gamma vector, one scale value per channel.
GetGammaTensor()
Gets the learned per-channel scale (gamma) parameters as a tensor.
public Tensor<T> GetGammaTensor()
Returns
- Tensor<T>
The gamma tensor.
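A minimal sketch of reading the learned parameters via the getters above:
var gamma = groupNorm.GetGamma();     // per-channel scale values
var beta = groupNorm.GetBeta();       // per-channel shift values
var epsilon = groupNorm.GetEpsilon(); // numerical-stability constant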
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method provides access to all trainable parameters of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
The parameters:
- Are the numbers that the neural network learns during training
- Include weights, biases, and other learnable values
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
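A minimal sketch; the expectation of 2 * NumChannels values (a gamma and a beta per channel) and the Length property on Vector<T> are assumptions for illustration:
var parameters = groupNorm.GetParameters();
// One scale and one shift per channel is assumed, i.e. 2 * 256 = 512
// values for the 256-channel layer constructed earlier.
Console.WriteLine(parameters.Length);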
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method resets any internal state the layer maintains between forward and backward passes. This is useful when starting to process a new sequence or when implementing stateful recurrent networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Cached inputs and outputs are cleared
- Any temporary calculations are discarded
- The layer is ready to process new data without being influenced by previous data
This is important for:
- Processing a new, unrelated sequence
- Preventing information from one sequence affecting another
- Starting a new training episode
SetParameters(Vector<T>)
Sets the trainable parameters of the layer.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters Vector<T>
A vector containing all parameters to set.
Remarks
This method sets all the trainable parameters of the layer from a single vector of parameters. The parameters vector must have the correct length to match the total number of parameters in the layer. By default, it simply assigns the parameters vector to the Parameters field, but derived classes may override this to handle the parameters differently.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The layer parses this vector to set all its internal parameters
- Throws an error if the input doesn't match the expected number of parameters
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Setting specific parameter values for testing
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
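A minimal sketch of transferring parameters between two identically configured layers:
var source = new GroupNormalizationLayer<float>(numGroups: 32, numChannels: 256);
var target = new GroupNormalizationLayer<float>(numGroups: 32, numChannels: 256);
// Copies all learnable values (gamma and beta); an ArgumentException is
// thrown if the vector length does not match target's parameter count.
target.SetParameters(source.GetParameters());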
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate T
The learning rate to use for the parameter updates.
Remarks
This method defines how the layer's parameters are updated during training. The learning rate controls the size of the parameter updates.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- The weights, biases, or other parameters are adjusted to reduce prediction errors
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time, gradually improving its ability to extract useful patterns from inputs.
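A minimal single-step training sketch, assuming input and lossGradient come from the surrounding training loop:
var output = groupNorm.Forward(input);            // forward pass
var inputGrad = groupNorm.Backward(lossGradient); // computes parameter gradients
groupNorm.UpdateParameters(0.001f);               // apply a small update step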
UpdateParametersGpu(IGpuOptimizerConfig)
GPU-resident parameter update using the provided optimizer configuration. Updates the gamma and beta parameters directly on the GPU.
public override void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
config IGpuOptimizerConfig
GPU optimizer configuration specifying the optimizer type and hyperparameters.