Class MeanLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a layer that computes the mean (average) of input values along a specified axis.
public class MeanLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T
The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → MeanLayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
The MeanLayer reduces the dimensionality of data by taking the average of values along a specified axis. This operation is useful for aggregating feature information or reducing sequence data to a fixed-size representation. The output shape has one fewer dimension than the input shape, with the specified axis being removed.
For Beginners: This layer calculates the average of values in your data along one direction.
Think of it like calculating the average test score for each student across multiple subjects:
- Input: A table of scores where rows are students and columns are subjects
- MeanLayer with axis=1 (columns): Gives each student's average score across all subjects
Some practical examples:
- In image processing: Taking the average across color channels
- In text analysis: Taking the average of word embeddings to get a sentence representation
- In time series: Taking the average across time steps to get a summary
For instance, if you have data with shape [10, 5, 20] (e.g., a batch of 10 samples, 5 time steps, 20 features), a MeanLayer with axis=1 would output shape [10, 20], giving you the average across all time steps.
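A minimal usage sketch is shown below. The MeanLayer constructor and Forward call match the signatures documented on this page, but the Tensor<float> construction is an assumption and may differ in your AiDotNet version.
// Input shaped [10, 5, 20]: a batch of 10 samples, 5 time steps, 20 features.
// Hypothetical tensor construction; substitute your version's actual Tensor<T> factory.
var input = new Tensor<float>(new[] { 10, 5, 20 });
// Average over axis 1 (the time steps).
var meanLayer = new MeanLayer<float>(new[] { 10, 5, 20 }, axis: 1);
var output = meanLayer.Forward(input);   // output shape: [10, 20]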
Constructors
MeanLayer(int[], int)
Initializes a new instance of the MeanLayer<T> class with the specified input shape and axis.
public MeanLayer(int[] inputShape, int axis)
Parameters
inputShape int[]: The shape of the input tensor.
axis int: The axis along which to compute the mean (this axis is removed from the output shape).
Remarks
This constructor creates a MeanLayer that computes the mean along the specified axis. The output shape is calculated by removing the specified axis from the input shape.
For Beginners: This constructor sets up the layer with the necessary information.
When creating a MeanLayer, you need to specify:
- inputShape: The shape of your data (e.g., [32, 10, 128] for 32 samples, 10 time steps, 128 features)
- axis: Which dimension to average over (e.g., 1 to average over the 10 time steps)
The constructor automatically calculates what shape your data will have after averaging. For example, with inputShape=[32, 10, 128] and axis=1, the output shape would be [32, 128].
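The shape rule itself is easy to reproduce in plain C#. The helper below is a standalone illustration (not AiDotNet code) of how removing the axis produces the output shape.
using System.Linq;
static int[] ComputeOutputShape(int[] inputShape, int axis) =>
    // Drop the averaged axis; every other dimension is kept in order.
    inputShape.Where((_, i) => i != axis).ToArray();
// ComputeOutputShape(new[] { 32, 10, 128 }, axis: 1) => [32, 128]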
Properties
Axis
Gets the axis along which the mean is calculated.
public int Axis { get; }
Property Value
- int
The index of the axis for mean calculation.
Remarks
This property indicates which dimension of the input tensor will be averaged and removed in the output. For example, with a 3D input tensor, axis=0 would average across batches, axis=1 would average across the second dimension (often time steps or rows), and axis=2 would average across the third dimension (often features or columns).
For Beginners: The axis tells the layer which direction to calculate averages in.
Think of your data as a multi-dimensional array:
- axis=0: First dimension (often batch samples)
- axis=1: Second dimension (often rows or time steps)
- axis=2: Third dimension (often columns or features)
For example, with image data shaped as [batch, height, width, channels]:
- axis=1 would average across the height dimension
- axis=3 would average across the channels dimension
The axis you choose determines what kind of summary you get from your data.
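As a concrete sketch using the documented constructor (shapes chosen purely for illustration):
// Image data shaped [batch, height, width, channels] = [32, 224, 224, 3].
// Averaging over axis 3 collapses the channels, leaving one value per pixel.
var channelMean = new MeanLayer<float>(new[] { 32, 224, 224, 3 }, axis: 3);
// Output shape: [32, 224, 224]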
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsGpuTraining
Gets whether this layer has full GPU training support (forward, backward, and parameter updates).
public override bool SupportsGpuTraining { get; }
Property Value
- bool
Remarks
This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:
- ForwardGpu is implemented
- BackwardGpu is implemented
- UpdateParametersGpu is implemented (for layers with trainable parameters)
- GPU weight/bias/gradient buffers are properly managed
For Beginners: This tells you if training can happen entirely on GPU.
GPU-resident training is much faster because:
- Data stays on GPU between forward and backward passes
- No expensive CPU-GPU transfers during each training step
- GPU kernels handle all gradient computation
Only layers that return true here can participate in fully GPU-resident training.
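A caller might branch on this flag before choosing a training path. In the sketch below, RunGpuTrainingStep and RunCpuTrainingStep are hypothetical helpers, not AiDotNet API.
if (layer.SupportsGpuTraining)
{
    // All of ForwardGpu, BackwardGpu, and parameter updates can stay on GPU.
    RunGpuTrainingStep(layer);   // hypothetical helper
}
else
{
    // Fall back to the CPU path (Forward / Backward / UpdateParameters).
    RunCpuTrainingStep(layer);   // hypothetical helper
}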
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Always false, because the MeanLayer has no trainable parameters.
Remarks
This property indicates that MeanLayer cannot be trained through backpropagation. Since the mean operation is a fixed mathematical procedure with no learnable parameters, this layer always returns false for SupportsTraining.
For Beginners: This property tells you that this layer doesn't learn from data.
A value of false means:
- The layer has no internal values that change during training
- It always performs the same mathematical operation (averaging)
- It's a fixed transformation rather than a learned one
Many layers in neural networks learn patterns from data (like Convolutional or Dense layers), but some layers, like MeanLayer, simply apply a fixed mathematical operation.
Methods
Backward(Tensor<T>)
Performs the backward pass of the mean layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient Tensor<T>: The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the mean layer, which is used during training to propagate error gradients back through the network. Since the mean operation averages multiple input values to produce each output value, during backpropagation, the gradient for each output value is distributed equally among all corresponding input values.
For Beginners: This method is used during training to calculate how the layer's input should change to reduce errors.
During the backward pass:
- The layer receives the error gradient from the next layer
- It needs to distribute this gradient back to its inputs
- For a mean operation, each input that contributed to an average receives an equal portion of the gradient
For example, if 5 values were averaged to produce one output and that output's gradient is 10, each of the 5 input values would receive a gradient of 10/5 = 2.
This process is part of the "backpropagation" algorithm that helps neural networks learn.
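The rule from the example above can be checked with plain arrays. This standalone sketch (not AiDotNet types) distributes one output gradient equally across the inputs that were averaged.
// Five inputs were averaged into one output whose gradient is 10.
double outputGradient = 10.0;
int n = 5;
var inputGradients = new double[n];
for (int i = 0; i < n; i++)
{
    // Each input contributed 1/n of the mean, so it receives gradient / n.
    inputGradients[i] = outputGradient / n;   // 2.0 for every input
}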
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
BackwardGpu(IGpuTensor<T>)
Performs the backward pass of the layer on GPU.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient IGpuTensor<T>: The GPU-resident gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The GPU-resident gradient of the loss with respect to the layer's input.
Remarks
This method performs the layer's backward computation entirely on GPU, including:
- Computing input gradients to pass to previous layers
- Computing and storing weight gradients on GPU (for layers with trainable parameters)
- Computing and storing bias gradients on GPU
For Beginners: This is like Backward() but runs entirely on GPU.
During GPU training:
- Output gradients come in (on GPU)
- Input gradients are computed (stay on GPU)
- Weight/bias gradients are computed and stored (on GPU)
- Input gradients are returned for the previous layer
All data stays on GPU - no CPU round-trips needed!
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU training.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes List<ComputationNode<T>>: The list to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
Forward(Tensor<T>)
Performs the forward pass of the mean layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input Tensor<T>: The input tensor to process.
Returns
- Tensor<T>
The output tensor after mean calculation.
Remarks
This method implements the forward pass of the mean layer. It computes the mean of the input tensor along the specified axis and returns a tensor with one fewer dimension. The input and output tensors are cached for use during the backward pass.
For Beginners: This method performs the actual averaging operation on your data.
During the forward pass:
- The layer receives input data
- It calculates the average along the specified axis
- It returns the averaged result with one fewer dimension
- It also saves both the input and output for later use during training
The averaging works by:
- Creating an output tensor with the correct shape
- For each position in the output, averaging all corresponding values in the input
- Storing this average in the output tensor
For example, with a 2D array like [[1,2,3], [4,5,6]] and axis=0, the result would be [2.5, 3.5, 4.5] (average of each column).
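The axis=0 example above, written out with plain C# arrays (a standalone illustration, not AiDotNet types):
double[,] data = { { 1, 2, 3 }, { 4, 5, 6 } };
int rows = data.GetLength(0), cols = data.GetLength(1);
var columnMeans = new double[cols];
for (int c = 0; c < cols; c++)
{
    double sum = 0;
    for (int r = 0; r < rows; r++)
        sum += data[r, c];
    columnMeans[c] = sum / rows;   // [2.5, 3.5, 4.5]
}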
ForwardGpu(params IGpuTensor<T>[])
Performs GPU-accelerated forward pass for mean reduction.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs IGpuTensor<T>[]: The input GPU tensors (only the first input is used).
Returns
- IGpuTensor<T>
GPU-resident output tensor with mean values.
GetParameters()
Gets all trainable parameters from the mean layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
An empty vector since MeanLayer has no trainable parameters.
Remarks
This method retrieves all trainable parameters from the layer as a single vector. Since MeanLayer has no trainable parameters, it returns an empty vector.
For Beginners: This method returns all the learnable values in the layer.
Since MeanLayer only performs a fixed mathematical operation (averaging) and has no weights, biases, or other learnable parameters, this method returns an empty vector.
This is different from layers like Dense layers, which would return their weights and biases.
ResetState()
Resets the internal state of the mean layer.
public override void ResetState()
Remarks
This method resets the internal state of the mean layer, including the cached inputs and outputs. This is useful when starting to process a new sequence or batch.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and outputs from previous processing are cleared
- The layer forgets any information from previous data batches
This is important for:
- Processing a new, unrelated batch of data
- Ensuring clean state before a new training epoch
- Preventing information from one batch affecting another
While the MeanLayer doesn't maintain long-term state across samples (unlike recurrent layers), clearing these cached values helps with memory management and ensuring a clean processing pipeline.
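A typical call site sits between unrelated batches. The loop below is an illustrative sketch (batches and the surrounding training code are assumptions), while Forward and ResetState match the documented signatures.
foreach (var batch in batches)
{
    var output = meanLayer.Forward(batch);
    // ... compute loss, run Backward, etc. ...
    // Clear the cached input/output before the next, unrelated batch.
    meanLayer.ResetState();
}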
UpdateParameters(T)
Updates the parameters of the mean layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate T: The learning rate to use for the parameter updates.
Remarks
This method is part of the training process, but since MeanLayer has no trainable parameters, this method does nothing.
For Beginners: This method would normally update a layer's internal values during training.
However, since MeanLayer just performs a fixed mathematical operation (averaging) and doesn't have any internal values that can be learned or adjusted, this method is empty.
This is unlike layers such as Dense or Convolutional layers, which have weights and biases that get updated during training.