Class BatchNormalizationLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Implements batch normalization for neural networks, which normalizes the inputs across a mini-batch.
public class BatchNormalizationLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for computations (e.g., float, double).
- Inheritance
- LayerBase<T>
- BatchNormalizationLayer<T>
- Implements
- ILayer<T>
- IJitCompilable<T>
- IDiagnosticsProvider
- IWeightLoadable<T>
- IDisposable
Remarks
Batch normalization helps stabilize and accelerate training by normalizing layer inputs. It works by normalizing each feature to have zero mean and unit variance across the batch, then applying learnable scale (gamma) and shift (beta) parameters.
Benefits include:
- Faster training convergence
- Reduced sensitivity to weight initialization
- Ability to use higher learning rates
- Acts as a form of regularization
For Beginners: Batch normalization is like standardizing test scores in a classroom.
Imagine a class where each student (input) has a raw test score. Batch normalization:
- Calculates the average score and how spread out the scores are
- Converts each score to show how many standard deviations it is from the average
- Applies adjustable scaling and shifting to the standardized scores
This helps neural networks learn more efficiently by:
- Keeping input values in a consistent range
- Reducing the "internal covariate shift" problem
- Making the network less sensitive to poor weight initialization
- Allowing higher learning rates without divergence
In practice, this means your network will typically train faster and perform better.
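The arithmetic can be illustrated with a small, library-independent example (plain C#; the values are made up purely for illustration):

using System;
using System.Linq;

// Illustrative only: batch-normalizing one feature across a batch of 4 samples.
double[] x = { 2.0, 4.0, 6.0, 8.0 };
double epsilon = 1e-5;

// 1. Batch statistics.
double mean = x.Average();                                            // 5.0
double variance = x.Select(v => (v - mean) * (v - mean)).Average();   // 5.0

// 2. Standardize: roughly zero mean, unit variance.
double[] normalized = x.Select(v => (v - mean) / Math.Sqrt(variance + epsilon)).ToArray();

// 3. Learnable scale (gamma) and shift (beta); example values.
double gamma = 1.5, beta = 0.5;
double[] y = normalized.Select(n => gamma * n + beta).ToArray();

Console.WriteLine(string.Join(", ", y));  // approximately -1.51, -0.17, 1.17, 2.51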
Constructors
BatchNormalizationLayer(int, double, double)
public BatchNormalizationLayer(int numFeatures, double epsilon = 1E-05, double momentum = 0.9)
Parameters
numFeatures (int): The number of features (inputs) to normalize.
epsilon (double): A small constant added to the variance for numerical stability. Default: 1e-05.
momentum (double): The momentum used when updating the running mean and variance. Default: 0.9.
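A minimal construction sketch, assuming only the constructor signature shown above (the namespace is AiDotNet.NeuralNetworks.Layers; surrounding model code is omitted):

using AiDotNet.NeuralNetworks.Layers;

// Layer for 64 features with the default epsilon (1e-05) and momentum (0.9).
var batchNorm = new BatchNormalizationLayer<double>(numFeatures: 64);

// Explicit epsilon and momentum.
var customBatchNorm = new BatchNormalizationLayer<float>(numFeatures: 128, epsilon: 1e-3, momentum: 0.99);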
Properties
ParameterCount
Gets the total number of trainable parameters in the batch normalization layer.
public override int ParameterCount { get; }
Property Value
- int
The total number of trainable parameters: one gamma (scale) and one beta (shift) value per feature, i.e., twice the number of features.
Remarks
This property reports how many trainable parameters the layer has:
- Gamma (scale) parameters: one per feature
- Beta (shift) parameters: one per feature
This is useful for optimization algorithms that need to size parameter buffers, or for code that saves and loads model weights.
For Beginners: This property tells you how many learnable values the layer has.
Batch normalization has two sets of learnable parameters:
- Gamma (scale): Controls how much to stretch or compress the normalized data
- Beta (shift): Controls how much to move the normalized data up or down
There is one gamma and one beta per feature, so the total count is twice the number of features. For example, with 3 features the count is 6, and GetParameters() returns them in the order:
[gamma1, gamma2, gamma3, beta1, beta2, beta3]
Knowing this count is useful for:
- Saving and loading models
- Advanced optimization algorithms that work with all parameters at once
- Regularization techniques that need to access all parameters
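A short sketch using only members documented on this page (ParameterCount, GetGamma, GetBeta); how individual values are read from the returned tensors is not shown:

using System;
using AiDotNet.NeuralNetworks.Layers;

var layer = new BatchNormalizationLayer<double>(numFeatures: 3);

// One gamma and one beta per feature: 2 * 3 = 6.
Console.WriteLine(layer.ParameterCount);   // 6

// The scale and shift tensors can also be read individually.
var gamma = layer.GetGamma();   // 3 scale values
var beta = layer.GetBeta();     // 3 shift values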
SupportsGpuExecution
Gets whether this layer has a GPU implementation.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsJitCompilation
Gets whether this batch normalization layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
true if the layer parameters and running statistics are initialized; otherwise, false.
Remarks
This property indicates whether the layer can be JIT compiled. The layer supports JIT if:
- Gamma (scale) and beta (shift) parameters are initialized
- Running mean and variance statistics are initialized (from training)
For Beginners: This tells you if this layer can use JIT compilation for faster inference.
The layer can be JIT compiled if:
- The layer has been initialized with learnable parameters (gamma and beta)
- The model has been trained, so running statistics are available
Batch normalization during inference requires running statistics collected during training, so JIT compilation is only supported after the model has been trained at least once.
Once these conditions are met, JIT compilation can provide significant speedup (5-10x) by optimizing the normalization, scaling, and shifting operations.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true if the layer has trainable parameters and supports backpropagation; otherwise, false.
Remarks
This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer has parameters that can be adjusted during training
- It will improve its performance as it sees more data
- It participates in the learning process
A value of false means:
- The layer doesn't have any adjustable parameters
- It performs the same operation regardless of training
- It doesn't need to learn (but may still be useful)
Methods
Backward(Tensor<T>)
Performs the backward pass of batch normalization.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
The backward pass computes three types of gradients:
1. Gradients for the input (to pass to previous layers)
2. Gradients for gamma (scale parameter)
3. Gradients for beta (shift parameter)
This is a complex calculation that accounts for how each input affects:
- The normalized value directly
- The batch mean
- The batch variance
The implementation follows the chain rule of calculus to properly backpropagate through all operations in the forward pass.
For Beginners: This method calculates how the error gradients flow backward through this layer.
During backpropagation, this method:
- Checks that Forward() was called first
- Creates tensors to hold the gradients for inputs and parameters
- Calculates the inverse standard deviation (1/sqrt(variance + epsilon))
- For each feature:
- Sums the output gradients across the batch
- Sums the product of output gradients and normalized values
- Calculates gradients for gamma and beta parameters
- Calculates gradients for each input value
The calculation is complex because in batch normalization, each input affects:
- Its own normalized value directly
- The mean of the batch (which affects all normalized values)
- The variance of the batch (which affects all normalized values)
The formula accounts for all these dependencies using the chain rule of calculus.
This method stores the gradients for gamma and beta to use during parameter updates, and returns the gradient for the input to pass to previous layers.
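The per-feature gradient formulas implied by this description can be sketched in plain C#; this is illustrative math only, not the layer's internal code, and it assumes the normalized values and inverse standard deviation were cached by the forward pass:

using System.Linq;

// Gradients for a single feature across a batch of m samples.
// dy:     gradient of the loss w.r.t. this layer's output
// xhat:   normalized values cached from the forward pass
// invStd: 1 / sqrt(variance + epsilon), also from the forward pass
static (double dGamma, double dBeta, double[] dX) BackwardOneFeature(
    double[] dy, double[] xhat, double invStd, double gamma)
{
    int m = dy.Length;
    double dBeta = dy.Sum();                              // gradient for the shift parameter
    double dGamma = dy.Zip(xhat, (g, n) => g * n).Sum();  // gradient for the scale parameter

    // Input gradient: the direct path plus the paths through the
    // batch mean and variance, combined via the chain rule.
    double[] dX = new double[m];
    for (int i = 0; i < m; i++)
        dX[i] = gamma * invStd / m * (m * dy[i] - dBeta - xhat[i] * dGamma);

    return (dGamma, dBeta, dX);
}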
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
BackwardGpu(IGpuTensor<T>)
Performs the GPU-resident backward pass for the batch normalization layer, computing gradients for the input, gamma, and beta entirely on the GPU.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The GPU-resident gradient from the next layer.
Returns
- IGpuTensor<T>
GPU-resident gradient to pass to the previous layer.
Exceptions
- InvalidOperationException
Thrown if ForwardGpu was not called first.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the batch normalization layer as a computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): The list to which the input node will be added.
Returns
- ComputationNode<T>
The output computation node representing the batch normalization operation.
Remarks
This method creates a symbolic computation graph for JIT compilation:
1. Creates a symbolic input node with shape [batch=1, features]
2. Creates constant nodes for gamma (scale) and beta (shift) parameters
3. Uses running statistics (mean and variance) for inference mode
4. Applies the batch normalization operation: gamma * ((x - mean) / sqrt(variance + epsilon)) + beta
For Beginners: This method builds a symbolic representation of batch normalization for JIT.
JIT compilation converts the batch normalization operation into optimized native code. During inference (prediction), batch normalization uses:
- Running mean and variance collected during training (not batch statistics)
- Learned scale (gamma) and shift (beta) parameters
The symbolic graph allows the JIT compiler to:
- Optimize the normalization formula: (x - mean) / sqrt(variance + epsilon)
- Fuse the scale and shift operations: result * gamma + beta
- Generate SIMD-optimized code for better performance
This typically provides 5-10x speedup compared to interpreted execution.
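The fusion mentioned above works because at inference time everything except the input is constant; a hedged sketch of the idea (not the compiler's actual output):

using System;

// At inference time mean, variance, gamma, and beta are all constants, so
// y = gamma * ((x - mean) / sqrt(variance + epsilon)) + beta
// can be folded into a single multiply-add per element: y = x * scale + shift.
double gamma = 1.2, beta = 0.3, mean = 0.5, variance = 2.0, epsilon = 1e-5;

double scale = gamma / Math.Sqrt(variance + epsilon);
double shift = beta - mean * scale;

double x = 1.7;
double y = x * scale + shift;   // identical to the unfused formula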
Exceptions
- ArgumentNullException
Thrown when inputNodes is null.
- InvalidOperationException
Thrown when layer shape or parameters are not initialized.
Forward(Tensor<T>)
Performs the forward pass of batch normalization.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor with shape [batchSize, featureSize].
Returns
- Tensor<T>
The normalized, scaled, and shifted output tensor.
Remarks
The forward pass performs these steps:
1. If in training mode:
- Compute the mean and variance of the current batch
- Update the running statistics for inference
- Normalize using the batch statistics
2. If in inference mode:
- Normalize using the running statistics collected during training
3. Apply the scale (gamma) and shift (beta) parameters
The normalization formula is: y = gamma * ((x - mean) / sqrt(variance + epsilon)) + beta
For Beginners: This method normalizes the input data and applies learned scaling and shifting.
During the forward pass, this method:
- Saves the input for later use in backpropagation
- If in training mode:
- Calculates the mean and variance of each feature across the batch
- Updates the running statistics for use during inference
- Normalizes the data using the batch statistics
- If in inference/testing mode:
- Uses the running statistics collected during training
- Applies the learned scale (gamma) and shift (beta) parameters
The normalization makes each feature have approximately zero mean and unit variance, while the scale and shift parameters allow the network to learn the optimal distribution for each feature.
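A compact, library-independent sketch of the two modes for a single feature, illustrating the formula above rather than the layer's actual implementation:

using System;
using System.Linq;

// Forward pass for a single feature; illustrative only.
static double[] ForwardOneFeature(
    double[] x, double gamma, double beta, double epsilon,
    bool training, double runningMean, double runningVariance)
{
    double mean, variance;
    if (training)
    {
        // Training: statistics of the current batch.
        mean = x.Average();
        variance = x.Select(v => (v - mean) * (v - mean)).Average();
        // (The layer also updates its running statistics here; see GetMomentum.)
    }
    else
    {
        // Inference: running statistics collected during training.
        mean = runningMean;
        variance = runningVariance;
    }

    double invStd = 1.0 / Math.Sqrt(variance + epsilon);
    return x.Select(v => gamma * (v - mean) * invStd + beta).ToArray();
}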
ForwardGpu(params IGpuTensor<T>[])
Performs the GPU-resident batch normalization forward pass.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensors.
Returns
- IGpuTensor<T>
GPU-resident output tensor with same shape as input.
Remarks
This method performs batch normalization entirely on GPU, avoiding CPU round-trips. The input and output tensors remain GPU-resident for chained GPU operations.
During training mode, running statistics (mean and variance) are updated on GPU and then downloaded back to CPU for persistence.
Exceptions
- InvalidOperationException
Thrown when GPU engine is not available.
GetBeta()
Gets the beta (shift) parameters of the batch normalization layer.
public Tensor<T> GetBeta()
Returns
- Tensor<T>
The beta tensor used for shifting scaled values.
GetEpsilon()
Gets the epsilon value used for numerical stability.
public T GetEpsilon()
Returns
- T
The epsilon value.
GetGamma()
Gets the gamma (scale) parameters of the batch normalization layer.
public Tensor<T> GetGamma()
Returns
- Tensor<T>
The gamma tensor used for scaling normalized values.
Remarks
Gamma is the learnable scale parameter of batch normalization, with one value per feature. It is applied after the input has been normalized: y = gamma * normalized + beta.
For Beginners: Gamma controls how much the normalized data is stretched or compressed for each feature.
Normalization gives each feature approximately zero mean and unit variance. Gamma then lets the network learn the best scale for each feature, and together with beta (the shift parameter) it allows the layer to recover whatever distribution works best for the layers that follow.
GetMomentum()
Gets the momentum value for running statistics.
public T GetMomentum()
Returns
- T
The momentum value.
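For intuition, running statistics are typically maintained as an exponential moving average controlled by this momentum; a common convention looks like the following (the layer's exact update formula may differ):

// Common exponential-moving-average convention (illustrative only).
double momentum = 0.9;
double runningMean = 0.0, runningVariance = 1.0;   // example starting values
double batchMean = 0.4, batchVariance = 1.3;       // statistics of the current batch

runningMean = momentum * runningMean + (1 - momentum) * batchMean;             // 0.04
runningVariance = momentum * runningVariance + (1 - momentum) * batchVariance; // 1.03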
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method returns all trainable parameters of the layer as a single vector: the gamma (scale) values followed by the beta (shift) values. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
The parameters:
- Are the numbers that the neural network learns during training
- Include weights, biases, and other learnable values
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
GetRunningMean()
Gets the running mean of the batch normalization layer.
public Tensor<T> GetRunningMean()
Returns
- Tensor<T>
The running mean tensor used during inference.
GetRunningVariance()
Gets the running variance of the batch normalization layer.
public Tensor<T> GetRunningVariance()
Returns
- Tensor<T>
The running variance tensor used during inference.
ResetState()
Resets the internal state of the batch normalization layer.
public override void ResetState()
Remarks
This method clears all cached values from the forward and backward passes, including:
- Last input tensor
- Last normalized values
- Last batch mean and variance
- Gradients for gamma and beta parameters
It does NOT reset the learned parameters (gamma and beta) or the running statistics (running mean and variance) used for inference.
This is typically called when starting a new training epoch or when switching between training and inference modes.
For Beginners: This method clears the layer's memory of previous calculations.
During training, the batch normalization layer keeps track of:
- The last input it processed
- The normalized values it calculated
- The mean and variance of the last batch
- The gradients for its parameters
This method clears all of these temporary values, which is useful when:
- Starting a new training epoch
- Switching between training and testing modes
- Ensuring the layer behaves deterministically
Important: This does NOT reset the learned parameters (gamma and beta) or the running statistics (running mean and variance) that are used during inference. It only clears temporary calculation values.
Think of it as clearing the layer's short-term memory while preserving its long-term learning.
SetParameters(Vector<T>)
Sets all trainable parameters of the batch normalization layer.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters (gamma and beta) concatenated together.
Remarks
This method expects a single vector containing all trainable parameters:
- First half: gamma (scale) parameters
- Second half: beta (shift) parameters
The length of the parameters vector must be exactly twice the feature size. This method is useful for loading pre-trained weights or setting parameters after optimization.
For Beginners: This method loads parameters into the layer from a single vector.
This is the counterpart to GetParameters() - it takes a vector containing all parameters and sets them in the layer. The vector must have the format:
[gamma1, gamma2, ..., gammaN, beta1, beta2, ..., betaN]
Where N is the number of features. The total length must be exactly 2*N.
This method is commonly used for:
- Loading pre-trained models
- Setting parameters after external optimization
- Implementing transfer learning
- Testing different parameter configurations
If the vector doesn't have the expected length, the method will throw an exception to prevent incorrect parameter assignments.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the layer's parameters using the computed gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate for parameter updates.
Remarks
This method updates the gamma (scale) and beta (shift) parameters using gradient descent:
- gamma = gamma - learningRate * gammaGradient
- beta = beta - learningRate * betaGradient
The gradients are computed during the backward pass and represent how much each parameter should change to reduce the loss function.
For Beginners: This method updates the layer's learnable parameters during training.
After the backward pass calculates how each parameter affects the error, this method adjusts those parameters to reduce the error:
- It checks that the backward pass has been called first
- It updates the gamma (scale) parameters: gamma = gamma - learningRate * gammaGradient
- It updates the beta (shift) parameters: beta = beta - learningRate * betaGradient
The learning rate controls how big the updates are:
- A larger learning rate means bigger changes (faster learning but potentially unstable)
- A smaller learning rate means smaller changes (slower but more stable learning)
For example, if a particular gamma value is causing high error, its gradient will be large, and this method will adjust that parameter more significantly to reduce the error in the next forward pass.
This is the step where actual "learning" happens in the neural network.
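Spelled out per feature, the update is plain gradient descent (illustrative snippet, not the layer's internal code):

// Gradient-descent update for a single feature's parameters.
double learningRate = 0.01;
double gamma = 1.2, beta = 0.1;                    // current parameter values
double gammaGradient = 0.5, betaGradient = -0.2;   // computed by the backward pass

gamma -= learningRate * gammaGradient;   // 1.2 - 0.005 = 1.195
beta  -= learningRate * betaGradient;    // 0.1 - (-0.002) = 0.102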
Exceptions
- InvalidOperationException
Thrown when update is called before backward.
ZeroInitGamma()
Initializes gamma (scale) parameters to zero.
public void ZeroInitGamma()
Remarks
This is used for zero-init residual in ResNet, where the last BatchNorm in each residual block has gamma initialized to zero. This makes the residual blocks start as identity mappings, which can improve training.
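A usage sketch for the zero-init-residual pattern, assuming only the members documented on this page (wiring the layer into a residual block is up to the surrounding model code):

using AiDotNet.NeuralNetworks.Layers;

// The last batch normalization layer in a residual block.
var lastBatchNorm = new BatchNormalizationLayer<float>(numFeatures: 256);

// Zero-init residual: the block initially contributes nothing,
// so the residual path starts as an identity mapping.
lastBatchNorm.ZeroInitGamma();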