Class MeanLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a layer that computes the mean (average) of input values along a specified axis.
public class MeanLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T
The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → MeanLayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
The MeanLayer reduces the dimensionality of data by taking the average of values along a specified axis. This operation is useful for aggregating feature information or reducing sequence data to a fixed-size representation. The output shape has one fewer dimension than the input shape, with the specified axis being removed.
For Beginners: This layer calculates the average of values in your data along one direction.
Think of it like calculating the average test score for each student across multiple subjects:
- Input: A table of scores where rows are students and columns are subjects
- MeanLayer with axis=1 (columns): Gives each student's average score across all subjects
Some practical examples:
- In image processing: Taking the average across color channels
- In text analysis: Taking the average of word embeddings to get a sentence representation
- In time series: Taking the average across time steps to get a summary
For instance, if you have data with shape [10, 5, 20] (e.g., a batch of 10 samples, 5 time steps, 20 features), a MeanLayer with axis=1 would output shape [10, 20], giving you the average across all time steps.
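A minimal usage sketch is shown below. The MeanLayer constructor and Forward call match the signatures documented on this page, but the Tensor<float> construction is an assumption and may differ in your AiDotNet version.
// Input shaped [10, 5, 20]: a batch of 10 samples, 5 time steps, 20 features.
// Hypothetical tensor construction; substitute your version's actual Tensor<T> factory.
var input = new Tensor<float>(new[] { 10, 5, 20 });
// Average over axis 1 (the time steps).
var meanLayer = new MeanLayer<float>(new[] { 10, 5, 20 }, axis: 1);
var output = meanLayer.Forward(input);   // output shape: [10, 20]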
Constructors
MeanLayer(int[], int)
Initializes a new instance of the MeanLayer<T> class with the specified input shape and axis.
public MeanLayer(int[] inputShape, int axis)
Parameters
inputShape int[]: The shape of the input tensor.
axis int: The axis along which to compute the mean (this axis is removed from the output shape).
Remarks
This constructor creates a MeanLayer that computes the mean along the specified axis. The output shape is calculated by removing the specified axis from the input shape.
For Beginners: This constructor sets up the layer with the necessary information.
When creating a MeanLayer, you need to specify:
- inputShape: The shape of your data (e.g., [32, 10, 128] for 32 samples, 10 time steps, 128 features)
- axis: Which dimension to average over (e.g., 1 to average over the 10 time steps)
The constructor automatically calculates what shape your data will have after averaging. For example, with inputShape=[32, 10, 128] and axis=1, the output shape would be [32, 128].
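The shape rule itself is easy to reproduce in plain C#. The helper below is a standalone illustration (not AiDotNet code) of how removing the axis produces the output shape.
using System.Linq;
static int[] ComputeOutputShape(int[] inputShape, int axis) =>
    // Drop the averaged axis; every other dimension is kept in order.
    inputShape.Where((_, i) => i != axis).ToArray();
// ComputeOutputShape(new[] { 32, 10, 128 }, axis: 1) => [32, 128]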
Properties
Axis
Gets the axis along which the mean is calculated.
public int Axis { get; }
Property Value
- int
The index of the axis for mean calculation.
Remarks
This property indicates which dimension of the input tensor will be averaged and removed in the output. For example, with a 3D input tensor, axis=0 would average across batches, axis=1 would average across the second dimension (often time steps or rows), and axis=2 would average across the third dimension (often features or columns).
For Beginners: The axis tells the layer which direction to calculate averages in.
Think of your data as a multi-dimensional array:
- axis=0: First dimension (often batch samples)
- axis=1: Second dimension (often rows or time steps)
- axis=2: Third dimension (often columns or features)
For example, with image data shaped as [batch, height, width, channels]:
- axis=1 would average across the height dimension
- axis=3 would average across the channels dimension
The axis you choose determines what kind of summary you get from your data.
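As a concrete sketch using the documented constructor (shapes chosen purely for illustration):
// Image data shaped [batch, height, width, channels] = [32, 224, 224, 3].
// Averaging over axis 3 collapses the channels, leaving one value per pixel.
var channelMean = new MeanLayer<float>(new[] { 32, 224, 224, 3 }, axis: 3);
// Output shape: [32, 224, 224]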
SupportsGpuExecution
Gets a value indicating whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
SupportsGpuTraining
Gets whether this layer has full GPU training support (forward, backward, and parameter updates).
public override bool SupportsGpuTraining { get; }
Property Value
- bool
Remarks
This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:
- ForwardGpu is implemented
- BackwardGpu is implemented
- UpdateParametersGpu is implemented (for layers with trainable parameters)
- GPU weight/bias/gradient buffers are properly managed
For Beginners: This tells you if training can happen entirely on GPU.
GPU-resident training is much faster because:
- Data stays on GPU between forward and backward passes
- No expensive CPU-GPU transfers during each training step
- GPU kernels handle all gradient computation
Only layers that return true here can participate in fully GPU-resident training.
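A caller might branch on this flag before choosing a training path. In the sketch below, RunGpuTrainingStep and RunCpuTrainingStep are hypothetical helpers, not AiDotNet API.
if (layer.SupportsGpuTraining)
{
    // All of ForwardGpu, BackwardGpu, and parameter updates can stay on GPU.
    RunGpuTrainingStep(layer);   // hypothetical helper
}
else
{
    // Fall back to the CPU path (Forward / Backward / UpdateParameters).
    RunCpuTrainingStep(layer);   // hypothetical helper
}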
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
Always false, because the MeanLayer has no trainable parameters.
Remarks
This property indicates that MeanLayer cannot be trained through backpropagation. Since the mean operation is a fixed mathematical procedure with no learnable parameters, this layer always returns false for SupportsTraining.
For Beginners: This property tells you that this layer doesn't learn from data.
A value of false means:
- The layer has no internal values that change during training
- It always performs the same mathematical operation (averaging)
- It's a fixed transformation rather than a learned one
Many layers in neural networks learn patterns from data (like Convolutional or Dense layers), but some layers, like MeanLayer, simply apply a fixed mathematical operation.
Methods
Backward(Tensor<T>)
Performs the backward pass of the mean layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient Tensor<T>: The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the mean layer, which is used during training to propagate error gradients back through the network. Since the mean operation averages multiple input values to produce each output value, during backpropagation, the gradient for each output value is distributed equally among all corresponding input values.
For Beginners: This method is used during training to calculate how the layer's input should change to reduce errors.
During the backward pass:
- The layer receives the error gradient from the next layer
- It needs to distribute this gradient back to its inputs
- For a mean operation, each input that contributed to an average receives an equal portion of the gradient
For example, if 5 values were averaged to produce one output and that output's gradient is 10, each of the 5 input values would receive a gradient of 10/5 = 2.
This process is part of the "backpropagation" algorithm that helps neural networks learn.
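The rule from the example above can be checked with plain arrays. This standalone sketch (not AiDotNet types) distributes one output gradient equally across the inputs that were averaged.
// Five inputs were averaged into one output whose gradient is 10.
double outputGradient = 10.0;
int n = 5;
var inputGradients = new double[n];
for (int i = 0; i < n; i++)
{
    // Each input contributed 1/n of the mean, so it receives gradient / n.
    inputGradients[i] = outputGradient / n;   // 2.0 for every input
}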
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
BackwardGpu(IGpuTensor<T>)
Performs the backward pass of the layer on GPU.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient IGpuTensor<T>: The GPU-resident gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The GPU-resident gradient of the loss with respect to the layer's input.
Remarks
This method performs the layer's backward computation entirely on GPU, including:
- Computing input gradients to pass to previous layers
- Computing and storing weight gradients on GPU (for layers with trainable parameters)
- Computing and storing bias gradients on GPU
For Beginners: This is like Backward() but runs entirely on GPU.
During GPU training:
- Output gradients come in (on GPU)
- Input gradients are computed (stay on GPU)
- Weight/bias gradients are computed and stored (on GPU)
- Input gradients are returned for the previous layer
All data stays on GPU - no CPU round-trips needed!
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU training.
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes List<ComputationNode<T>>: The list to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
Forward(Tensor<T>)
Performs the forward pass of the mean layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input Tensor<T>: The input tensor to process.
Returns
- Tensor<T>
The output tensor after mean calculation.
Remarks
This method implements the forward pass of the mean layer. It computes the mean of the input tensor along the specified axis and returns a tensor with one fewer dimension. The input and output tensors are cached for use during the backward pass.
For Beginners: This method performs the actual averaging operation on your data.
During the forward pass:
- The layer receives input data
- It calculates the average along the specified axis
- It returns the averaged result with one fewer dimension
- It also saves both the input and output for later use during training
The averaging works by:
- Creating an output tensor with the correct shape
- For each position in the output, averaging all corresponding values in the input
- Storing this average in the output tensor
For example, with a 2D array like [[1,2,3], [4,5,6]] and axis=0, the result would be [2.5, 3.5, 4.5] (average of each column).
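The axis=0 example above, written out with plain C# arrays (a standalone illustration, not AiDotNet types):
double[,] data = { { 1, 2, 3 }, { 4, 5, 6 } };
int rows = data.GetLength(0), cols = data.GetLength(1);
var columnMeans = new double[cols];
for (int c = 0; c < cols; c++)
{
    double sum = 0;
    for (int r = 0; r < rows; r++)
        sum += data[r, c];
    columnMeans[c] = sum / rows;   // [2.5, 3.5, 4.5]
}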
ForwardGpu(params IGpuTensor<T>[])
Performs GPU-accelerated forward pass for mean reduction.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs IGpuTensor<T>[]: The input GPU tensors (only the first input is used).
Returns
- IGpuTensor<T>
GPU-resident output tensor with mean values.
GetParameters()
Gets all trainable parameters from the mean layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
An empty vector since MeanLayer has no trainable parameters.
Remarks
This method retrieves all trainable parameters from the layer as a single vector. Since MeanLayer has no trainable parameters, it returns an empty vector.
For Beginners: This method returns all the learnable values in the layer.
Since MeanLayer only performs a fixed mathematical operation (averaging) and has no weights, biases, or other learnable parameters, this method returns an empty vector.
This is different from layers like Dense layers, which would return their weights and biases.
ResetState()
Resets the internal state of the mean layer.
public override void ResetState()
Remarks
This method resets the internal state of the mean layer, including the cached inputs and outputs. This is useful when starting to process a new sequence or batch.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and outputs from previous processing are cleared
- The layer forgets any information from previous data batches
This is important for:
- Processing a new, unrelated batch of data
- Ensuring clean state before a new training epoch
- Preventing information from one batch affecting another
While the MeanLayer doesn't maintain long-term state across samples (unlike recurrent layers), clearing these cached values helps with memory management and ensuring a clean processing pipeline.
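A typical call site sits between unrelated batches. The loop below is an illustrative sketch (batches and the surrounding training code are assumptions), while Forward and ResetState match the documented signatures.
foreach (var batch in batches)
{
    var output = meanLayer.Forward(batch);
    // ... compute loss, run Backward, etc. ...
    // Clear the cached input/output before the next, unrelated batch.
    meanLayer.ResetState();
}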
UpdateParameters(T)
Updates the parameters of the mean layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate T: The learning rate to use for the parameter updates.
Remarks
This method is part of the training process, but since MeanLayer has no trainable parameters, this method does nothing.
For Beginners: This method would normally update a layer's internal values during training.
However, since MeanLayer just performs a fixed mathematical operation (averaging) and doesn't have any internal values that can be learned or adjusted, this method is empty.
This is unlike layers such as Dense or Convolutional layers, which have weights and biases that get updated during training.