
Class LayerBase<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents the base class for all neural network layers, providing common functionality and interfaces.

public abstract class LayerBase<T> : ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object → LayerBase<T>
Implements
ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

LayerBase is an abstract class that serves as the foundation for all neural network layers. It defines the common structure and functionality that all layers must implement, such as forward and backward propagation, parameter management, and activation functions. This class handles the core mechanics of layers in a neural network, allowing derived classes to focus on their specific implementations.

For Beginners: This is the blueprint that all neural network layers follow.

Think of LayerBase as the common foundation that all layers are built upon:

  • It defines what every layer must be able to do (process data forward and backward)
  • It provides shared tools that all layers can use (like activation functions)
  • It manages the shapes of data flowing in and out of layers
  • It handles saving and loading layer parameters

All specific layer types (like convolutional, dense, etc.) inherit from this class, which ensures they all work together consistently in a neural network.
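
As a rough illustration, a minimal custom layer derived from LayerBase<T> might look like the sketch below. It is a skeleton under stated assumptions (a simple pass-through transform); additional abstract members not shown in this excerpt may also need to be overridden:

public class PassThroughLayer<T> : LayerBase<T>
{
    public PassThroughLayer(int[] shape)
        : base(shape, shape) // input and output shapes are identical for a pass-through layer
    {
    }

    // No trainable parameters and no computation graph export in this sketch.
    public override bool SupportsTraining => false;
    public override bool SupportsJitCompilation => false;

    public override Tensor<T> Forward(Tensor<T> input)
    {
        // Apply the configured activation (values pass through unchanged if none was supplied).
        return ApplyActivation(input);
    }

    public override Tensor<T> Backward(Tensor<T> outputGradient)
    {
        // With no parameters, the incoming gradient flows straight back to the previous layer.
        return outputGradient;
    }

    public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
    {
        throw new NotSupportedException("JIT compilation is not implemented in this example.");
    }
}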

Constructors

LayerBase(int[], int[])

Initializes a new instance of the LayerBase<T> class with the specified input and output shapes.

protected LayerBase(int[] inputShape, int[] outputShape)

Parameters

inputShape int[]

The shape of the input tensor.

outputShape int[]

The shape of the output tensor.

Remarks

This constructor creates a new Layer with the specified input and output shapes. It initializes an empty parameter vector and sets up the single input shape.

For Beginners: This creates a new layer with the specified data shapes.

When creating a layer, you need to define:

  • The shape of data coming in (inputShape)
  • The shape of data going out (outputShape)

This helps the layer organize its operations and connect properly with other layers.

LayerBase(int[], int[], IActivationFunction<T>)

Initializes a new instance of the LayerBase<T> class with the specified shapes and element-wise activation function.

protected LayerBase(int[] inputShape, int[] outputShape, IActivationFunction<T> scalarActivation)

Parameters

inputShape int[]

The shape of the input tensor.

outputShape int[]

The shape of the output tensor.

scalarActivation IActivationFunction<T>

The element-wise activation function to apply.

Remarks

This constructor creates a new Layer with the specified input and output shapes and element-wise activation function.

For Beginners: This creates a new layer with a standard activation function.

In addition to the shapes, this also sets up:

  • A scalar activation function that processes each value independently
  • The foundation for a layer that transforms data in a specific way

For example, you might create a layer with a ReLU activation function, which turns all negative values to zero while keeping positive values.
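
Continuing that example, a derived layer could forward a ReLU activation to this overload. The fragment below is only a sketch of such a constructor; ReLUActivation<T> is an assumed IActivationFunction<T> implementation, not a name confirmed by this page:

// Inside a derived layer: pick the scalar-activation overload of the base constructor.
// ReLUActivation<T> is an assumed implementation of IActivationFunction<T>.
protected MyDenseLikeLayer(int inputSize, int outputSize, IActivationFunction<T>? activation = null)
    : base(new[] { inputSize }, new[] { outputSize }, activation ?? new ReLUActivation<T>())
{
}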

LayerBase(int[], int[], IVectorActivationFunction<T>)

Initializes a new instance of the LayerBase<T> class with the specified shapes and vector activation function.

protected LayerBase(int[] inputShape, int[] outputShape, IVectorActivationFunction<T> vectorActivation)

Parameters

inputShape int[]

The shape of the input tensor.

outputShape int[]

The shape of the output tensor.

vectorActivation IVectorActivationFunction<T>

The vector activation function to apply.

Remarks

This constructor creates a new Layer with the specified input and output shapes and vector activation function. Vector activation functions operate on entire vectors rather than individual elements.

For Beginners: This creates a new layer with an advanced vector-based activation.

This constructor:

  • Sets up the layer's input and output shapes
  • Configures a vector activation that processes groups of values together
  • Marks the layer as using vector activation

Vector activations like Softmax are important for specific tasks like classification, where outputs need to be interpreted as probabilities.

LayerBase(int[][], int[])

Initializes a new instance of the LayerBase<T> class with multiple input shapes and a specified output shape.

protected LayerBase(int[][] inputShapes, int[] outputShape)

Parameters

inputShapes int[][]

The shapes of the input tensors.

outputShape int[]

The shape of the output tensor.

Remarks

This constructor creates a new Layer that accepts multiple inputs with different shapes. This is useful for layers that combine multiple inputs, such as concatenation or addition layers.

For Beginners: This creates a layer that can handle multiple input sources.

When creating a layer that combines different data sources:

  • You need to specify the shape of each input source
  • The layer needs to know how to handle multiple inputs
  • The output shape defines what comes out after combining them

For example, a layer that combines features from images and text would need to know the shape of both the image and text data.

LayerBase(int[][], int[], IActivationFunction<T>)

Initializes a new instance of the LayerBase<T> class with multiple input shapes, a specified output shape, and an element-wise activation function.

protected LayerBase(int[][] inputShapes, int[] outputShape, IActivationFunction<T> scalarActivation)

Parameters

inputShapes int[][]

The shapes of the input tensors.

outputShape int[]

The shape of the output tensor.

scalarActivation IActivationFunction<T>

The element-wise activation function to apply.

Remarks

This constructor creates a new Layer that accepts multiple inputs with different shapes and applies an element-wise activation function to the output.

For Beginners: This creates a layer that handles multiple inputs and applies a standard activation.

This constructor:

  • Sets up the layer to accept multiple input sources
  • Defines the shape of the combined output
  • Adds a scalar activation function that processes each output value independently

This is useful for creating complex networks that merge data from different sources.

LayerBase(int[][], int[], IVectorActivationFunction<T>)

Initializes a new instance of the LayerBase<T> class with multiple input shapes, a specified output shape, and a vector activation function.

protected LayerBase(int[][] inputShapes, int[] outputShape, IVectorActivationFunction<T> vectorActivation)

Parameters

inputShapes int[][]

The shapes of the input tensors.

outputShape int[]

The shape of the output tensor.

vectorActivation IVectorActivationFunction<T>

The vector activation function to apply.

Remarks

This constructor creates a new Layer that accepts multiple inputs with different shapes and applies a vector activation function to the output.

For Beginners: This creates a layer that handles multiple inputs and applies a vector-based activation.

This constructor:

  • Sets up the layer to accept multiple input sources
  • Defines the shape of the combined output
  • Adds a vector activation function that processes groups of output values together
  • Marks the layer as using vector activation

This combines the flexibility of multiple inputs with the power of vector activations.

Fields

BiasParameterName

Standard parameter name for bias tensors.

protected const string BiasParameterName = "bias"

Field Value

string

InitializationLock

Object used for thread-safe lazy initialization.

protected readonly object InitializationLock

Field Value

object

IsTrainingMode

Indicates whether the layer is in training mode.

protected bool IsTrainingMode

Field Value

bool

Remarks

This flag indicates whether the layer is currently in training mode or inference (evaluation) mode. Some layers behave differently during training versus inference, such as Dropout or BatchNormalization.

For Beginners: This tells the layer whether it's currently training or being used for predictions.

This mode flag:

  • Affects how certain layers behave
  • Can turn on/off special training features
  • Helps the network switch between learning and using what it learned

For example, dropout layers randomly turn off neurons during training to improve generalization, but during inference they don't drop anything.
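
As an illustration, a dropout-style layer might branch on this flag in its forward pass. This is a sketch only; DropRandomValues is a hypothetical helper, not part of LayerBase<T>:

public override Tensor<T> Forward(Tensor<T> input)
{
    // Only drop values while training; during inference the data passes through unchanged.
    Tensor<T> result = IsTrainingMode
        ? DropRandomValues(input)   // hypothetical helper that zeroes a random fraction of values
        : input;

    return ApplyActivation(result);
}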

ParameterGradients

The gradients of the trainable parameters.

protected Vector<T>? ParameterGradients

Field Value

Vector<T>

Remarks

This vector contains the gradients of all trainable parameters for the layer. These gradients indicate how each parameter should be adjusted during training to reduce the error.

For Beginners: These values show how to adjust the parameters during training.

Parameter gradients:

  • Tell the network which direction to change each parameter
  • Show how sensitive the error is to each parameter
  • Guide the learning process

A larger gradient means a parameter has more influence on the error and needs a bigger adjustment during training.

Parameters

The trainable parameters of this layer.

protected Vector<T> Parameters

Field Value

Vector<T>

Remarks

This vector contains all trainable parameters for the layer, such as weights and biases. The specific interpretation of these parameters depends on the layer type.

For Beginners: These are the values that the layer learns during training.

Parameters include:

  • Weights that determine how important each input is
  • Biases that provide a baseline or starting point
  • Other learnable values specific to certain layer types

During training, these values are adjusted to make the network's predictions better.
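
As a rough sketch, a derived layer with a hand-rolled update step could adjust these values from their gradients as shown below. The Vector<T> indexer and Length property and the NumOps method names are assumptions for illustration:

// Inside a derived layer: a plain gradient-descent step over the flat parameter vector.
protected void ApplySgdStep(T learningRate)
{
    var gradients = GetParameterGradients();
    for (int i = 0; i < Parameters.Length; i++)
    {
        // parameter <- parameter - learningRate * gradient
        Parameters[i] = NumOps.Subtract(Parameters[i], NumOps.Multiply(learningRate, gradients[i]));
    }
}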

WeightParameterName

Standard parameter name for weight tensors.

protected const string WeightParameterName = "weight"

Field Value

string

Properties

CanExecuteOnGpu

Gets whether this layer can execute its forward pass on GPU.

public virtual bool CanExecuteOnGpu { get; }

Property Value

bool

Remarks

Returns true when both the layer supports GPU execution AND a GPU engine is currently active. Use this to check at runtime whether GPU forward pass is available.

For Beginners: Check this before calling ForwardGpu. It combines "does the layer have GPU code?" with "is the GPU engine active?"

CanTrainOnGpu

Gets whether this layer can execute GPU training (forward, backward, parameter update).

public virtual bool CanTrainOnGpu { get; }

Property Value

bool

Remarks

Returns true when both the layer supports GPU training AND a GPU engine is currently active.

For Beginners: Check this before attempting GPU-resident training. If false, training will fall back to CPU operations.

Engine

Gets the global execution engine for vector operations.

protected IEngine Engine { get; }

Property Value

IEngine

InitializationStrategy

Gets or sets the initialization strategy for this layer.

public IInitializationStrategy<T>? InitializationStrategy { get; set; }

Property Value

IInitializationStrategy<T>

Remarks

The initialization strategy controls when and how the layer's weights are allocated and initialized. Lazy initialization defers weight allocation until the first forward pass, which significantly speeds up network construction.

For Beginners: This controls when the layer sets up its internal weights.

Lazy initialization:

  • Defers weight allocation until the layer is actually used
  • Makes network construction much faster
  • Useful for tests and when comparing network architectures

Eager initialization:

  • Allocates weights immediately at construction time
  • Traditional behavior, weights are ready immediately

InputShape

Gets the input shape for this layer.

protected int[] InputShape { get; }

Property Value

int[]

Remarks

This property contains the shape of the input tensor that the layer expects. For example, a 2D convolutional layer might expect an input shape of [batchSize, channels, height, width].

For Beginners: This defines the shape of data this layer expects to receive.

The input shape:

  • Tells the layer how many dimensions the input data has
  • Specifies the size of each dimension
  • Helps the layer organize its operations properly

For example, if processing images that are 28x28 pixels with 1 color channel, the input shape might be [1, 28, 28] (channels, height, width).

InputShapes

Gets the input shapes for this layer, supporting multiple inputs.

protected int[][] InputShapes { get; }

Property Value

int[][]

Remarks

This property contains the shapes of all input tensors that the layer expects, for layers that accept multiple inputs (such as merge layers).

For Beginners: This defines the shapes of all input sources for layers that take multiple inputs.

For layers that combine multiple data sources:

  • Each input may have a different shape
  • This array stores all those shapes
  • Helps the layer handle multiple inputs properly

For example, a layer that combines features from two different sources would need to know the shape of each source.

IsInitialized

Gets a value indicating whether this layer has been initialized.

public virtual bool IsInitialized { get; }

Property Value

bool

Remarks

For layers with lazy initialization, this indicates whether the weights have been allocated and initialized. For eager initialization, this is always true after construction.

For Beginners: This tells you if the layer's weights are ready to use.

A value of true means:

  • Weights have been allocated
  • The layer is ready for forward/backward passes

A value of false means:

  • Weights are not yet allocated (lazy initialization)
  • The first Forward() call will initialize them

NamedParameterCount

Gets the total number of named parameters.

public virtual int NamedParameterCount { get; }

Property Value

int

NumOps

Gets the numeric operations provider for type T.

protected INumericOperations<T> NumOps { get; }

Property Value

INumericOperations<T>

Remarks

This property provides access to numeric operations (like addition, multiplication, etc.) that work with the generic type T. This allows the layer to perform mathematical operations regardless of whether T is float, double, or another numeric type.

For Beginners: This is a toolkit for math operations that works with different number types.

It provides:

  • Basic math operations (add, subtract, multiply, etc.)
  • Ways to convert between different number formats
  • Special math functions needed by neural networks

This allows the layer to work with different types of numbers (float, double, etc.) without needing different code for each type.

OutputShape

Gets the output shape for this layer.

protected int[] OutputShape { get; }

Property Value

int[]

Remarks

This property contains the shape of the output tensor that the layer produces. For example, a 2D convolutional layer with 16 filters might produce an output shape of [batchSize, 16, height, width].

For Beginners: This defines the shape of data this layer produces as output.

The output shape:

  • Tells the next layer what shape of data to expect
  • Shows how this layer transforms the data dimensions
  • Helps verify the network is structured correctly

For example, if a layer reduces image size from 28x28 to 14x14 and produces 16 feature maps, the output shape might be [16, 14, 14] (channels, height, width).

ParameterCount

Gets the total number of parameters in this layer.

public virtual int ParameterCount { get; }

Property Value

int

The total number of trainable parameters.

Remarks

This property returns the total number of trainable parameters in the layer. By default, it returns the length of the Parameters vector, but derived classes can override this to calculate the number of parameters differently.

For Beginners: This tells you how many learnable values the layer has.

The parameter count:

  • Shows how complex the layer is
  • Indicates how many values need to be learned during training
  • Can help estimate memory usage and computational requirements

Layers with more parameters can potentially learn more complex patterns but may also require more data to train effectively.
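
For example, you can sum this property across the layers of a network to gauge its overall size (the snippet assumes you already hold the layers in a collection):

// Count all trainable values across a hand-built list of layers.
var layers = new List<LayerBase<float>> { /* ... your layers ... */ };

int totalParameters = 0;
foreach (var layer in layers)
{
    totalParameters += layer.ParameterCount;
}

Console.WriteLine($"Total trainable parameters: {totalParameters}");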

Random

Gets the thread-safe random number generator.

protected static Random Random { get; }

Property Value

Random

Remarks

This property provides access to the centralized thread-safe random number generator, which is used for initializing weights and other parameters that require randomization.

For Beginners: This provides random numbers for initializing the layer.

Random numbers are needed to:

  • Set starting values for weights and biases
  • Add randomness to avoid symmetry problems
  • Help the network learn diverse patterns

Good initialization with proper randomness is important for neural networks to learn effectively.

ScalarActivation

Gets the element-wise activation function for this layer, if specified.

public IActivationFunction<T>? ScalarActivation { get; }

Property Value

IActivationFunction<T>

Remarks

The scalar activation function applies to individual values in the layer's output tensor. Common activation functions include ReLU, Sigmoid, and Tanh.

For Beginners: This is the function that adds non-linearity to each value individually.

Activation functions:

  • Add non-linearity, helping the network learn complex patterns
  • Process each number one at a time
  • Transform values into more useful ranges (like 0 to 1, or -1 to 1)

For example, ReLU turns all negative values to zero while keeping positive values unchanged. Without activation functions, neural networks couldn't learn complex patterns.

SupportsGpuExecution

Gets whether this layer has a GPU execution implementation for inference.

protected virtual bool SupportsGpuExecution { get; }

Property Value

bool

Remarks

Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.

For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.

SupportsGpuTraining

Gets whether this layer has full GPU training support (forward, backward, and parameter updates).

public virtual bool SupportsGpuTraining { get; }

Property Value

bool

Remarks

This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:

  • ForwardGpu is implemented
  • BackwardGpu is implemented
  • UpdateParametersGpu is implemented (for layers with trainable parameters)
  • GPU weight/bias/gradient buffers are properly managed

For Beginners: This tells you if training can happen entirely on GPU.

GPU-resident training is much faster because:

  • Data stays on GPU between forward and backward passes
  • No expensive CPU-GPU transfers during each training step
  • GPU kernels handle all gradient computation

Only layers that return true here can participate in fully GPU-resident training.

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public abstract bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.

SupportsTraining

Gets a value indicating whether this layer supports training.

public abstract bool SupportsTraining { get; }

Property Value

bool

true if the layer has trainable parameters and supports backpropagation; otherwise, false.

Remarks

This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer has parameters that can be adjusted during training
  • It will improve its performance as it sees more data
  • It participates in the learning process

A value of false means:

  • The layer doesn't have any adjustable parameters
  • It performs the same operation regardless of training
  • It doesn't need to learn (but may still be useful)

UseAutodiff

Gets or sets a value indicating whether this layer uses automatic differentiation for backward passes.

public bool UseAutodiff { get; set; }

Property Value

bool

true if the layer should use autodiff; false if it uses manual backward implementation. Default is false.

Remarks

This property controls whether the layer uses the automatic differentiation system (autodiff) or manual backward pass implementations during training. Manual backward passes are typically faster but require explicit gradient computation code. Autodiff is more flexible and can be useful for:

  • Custom layer implementations where manual gradients are complex
  • Research and experimentation with novel architectures
  • Rapid prototyping of new layer types

For Beginners: This controls how the layer computes gradients during training.

Two modes are available:

  • Manual (default, false): Uses hand-written, optimized gradient code. Faster but requires careful implementation.
  • Autodiff (true): Uses automatic differentiation to compute gradients. Slower but more flexible and less error-prone.

Most users should leave this as false (default) for best performance. Set to true only for:

  • Custom layers with complex gradients
  • Experimental or research purposes
  • When you need guaranteed correct gradients for a new operation

Note: Autodiff support must be implemented by the specific layer type. Not all layers support autodiff mode yet.
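
For example (DenseLayer<float> is borrowed from the Dispose() example elsewhere on this page; whether that particular layer supports autodiff mode is not confirmed):

var layer = new DenseLayer<float>(784, 128);

// Default: fast, hand-written gradients.
Console.WriteLine(layer.UseAutodiff); // false

// Opt in to automatic differentiation, e.g. while validating a custom gradient implementation.
layer.UseAutodiff = true;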

UsingVectorActivation

Gets a value indicating whether this layer uses a vector activation function.

protected bool UsingVectorActivation { get; }

Property Value

bool

Remarks

This property indicates whether the layer is using a vector activation function or an element-wise activation function. It is used to determine which type of activation to apply during forward and backward passes.

For Beginners: This tells the layer which type of activation function to use.

It's like a switch that determines:

  • Whether to process values one by one (scalar activation)
  • Or to process groups of values together (vector activation)

This helps the layer know which method to use when applying activations.

VectorActivation

Gets the vector activation function for this layer, if specified.

public IVectorActivationFunction<T>? VectorActivation { get; }

Property Value

IVectorActivationFunction<T>

Remarks

The vector activation function applies to entire vectors in the layer's output tensor. This can capture dependencies between different elements of the vectors, such as in Softmax.

For Beginners: This is a more advanced function that processes groups of values together.

Vector activation functions:

  • Process entire groups of numbers together, not just one at a time
  • Can capture relationships between different features
  • Are used for special purposes like classification (Softmax)

For example, Softmax turns a vector of numbers into probabilities that sum to 1, which is useful for classifying inputs into categories.

Methods

ActivateTensor(IActivationFunction<T>?, Tensor<T>)

Applies a scalar activation function to each element of a tensor.

protected Tensor<T> ActivateTensor(IActivationFunction<T>? activation, Tensor<T> input)

Parameters

activation IActivationFunction<T>

The scalar activation function to apply.

input Tensor<T>

The input tensor to activate.

Returns

Tensor<T>

The activated tensor.

Remarks

This helper method applies a scalar activation function to each element of a tensor. If the activation function is null, it returns the input tensor unchanged.

For Beginners: This method applies an activation function to each value in a tensor.

Activation functions:

  • Transform values in specific ways (like sigmoid squeezes values between 0 and 1)
  • Add non-linearity, which helps neural networks learn complex patterns
  • Are applied individually to each number in the data

If no activation function is provided, the values pass through unchanged.

ActivateTensor(IVectorActivationFunction<T>?, Tensor<T>)

Applies a vector activation function to a tensor.

protected Tensor<T> ActivateTensor(IVectorActivationFunction<T>? activation, Tensor<T> input)

Parameters

activation IVectorActivationFunction<T>

The vector activation function to apply.

input Tensor<T>

The input tensor to activate.

Returns

Tensor<T>

The activated tensor.

Remarks

This helper method applies a vector activation function to a tensor. If the activation function is null, it returns the input tensor unchanged. Vector activation functions operate on entire tensors at once, which can be more efficient than element-wise operations.

For Beginners: This method applies an activation function to an entire tensor at once.

Vector activation functions:

  • Process entire groups of values simultaneously
  • Can be more efficient than processing one value at a time
  • Provide the same mathematical result but often faster

If no activation function is provided, the values pass through unchanged.

ApplyActivation(Tensor<T>)

Applies the activation function to a tensor.

protected Tensor<T> ApplyActivation(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to activate.

Returns

Tensor<T>

The activated tensor.

ApplyActivation(Vector<T>)

Applies the activation function to a vector.

protected Vector<T> ApplyActivation(Vector<T> input)

Parameters

input Vector<T>

The input vector to activate.

Returns

Vector<T>

The activated vector.

Remarks

This method applies the layer's activation function to a vector. It uses the vector activation function if one is specified, or applies the scalar activation function element-wise if no vector activation is available.

For Beginners: This method applies the activation function to a vector of values.

This method:

  • First checks if a vector activation function is available (processes all elements together)
  • If not, uses the scalar activation function (processes each element independently)
  • If neither is available, returns the input unchanged (identity function)

This flexibility allows the layer to use the most appropriate activation method based on what was specified during creation.

ApplyActivationBackwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer?, IGpuBuffer?, IGpuBuffer, int)

Applies the layer's activation function backward pass on GPU using the activation's own GPU method.

protected bool ApplyActivationBackwardGpu(IDirectGpuBackend backend, IGpuBuffer gradOutput, IGpuBuffer? input, IGpuBuffer? output, IGpuBuffer gradInput, int size)

Parameters

backend IDirectGpuBackend

The GPU backend to use for execution.

gradOutput IGpuBuffer

The gradient flowing back from the next layer.

input IGpuBuffer

The input buffer from the forward pass (needed for some activations).

output IGpuBuffer

The output buffer from the forward pass (needed for some activations).

gradInput IGpuBuffer

The buffer to store the input gradient.

size int

The number of elements to process.

Returns

bool

True if the backward pass was applied on GPU; false if no activation or GPU not supported.

Remarks

This method follows the Open/Closed Principle by delegating to the activation function's own GPU backward implementation. Each activation function knows what it needs:

  • ReLU, GELU, Swish, LeakyReLU, SiLU, Mish, etc.: need the input from the forward pass
  • Sigmoid, Tanh: need the output from the forward pass
  • ELU: needs both the input and the output from the forward pass

For Beginners: During training, we need to compute how the activation affects the gradients. Each activation function handles this differently, and by delegating to the activation's BackwardGpu method, we don't need to know the details here.

ApplyActivationDerivative(Tensor<T>, Tensor<T>)

Applies the derivative of the activation function to a tensor.

protected Tensor<T> ApplyActivationDerivative(Tensor<T> input, Tensor<T> outputGradient)

Parameters

input Tensor<T>

The input tensor.

outputGradient Tensor<T>

The output gradient tensor.

Returns

Tensor<T>

The input gradient tensor after applying the activation derivative.

Remarks

This method applies the derivative of the layer's activation function to a tensor during the backward pass. It multiplies the derivative of the activation function at each point in the input tensor by the corresponding output gradient.

For Beginners: This calculates how small changes in values affect the output.

During backpropagation:

  • This method handles tensors (multi-dimensional arrays of values)
  • It applies the correct derivative calculation based on the activation type
  • For vector activations, it uses the specialized derivative method
  • For scalar activations, it applies the derivative to each value independently

This is a key part of the math that allows neural networks to learn through backpropagation.

Exceptions

ArgumentException

Thrown when the input and output gradient tensors have different ranks.

ApplyActivationDerivative(Vector<T>, Vector<T>)

Applies the derivative of the activation function to a vector.

protected Vector<T> ApplyActivationDerivative(Vector<T> input, Vector<T> outputGradient)

Parameters

input Vector<T>

The input vector.

outputGradient Vector<T>

The output gradient vector.

Returns

Vector<T>

The input gradient vector after applying the activation derivative.

Remarks

This method applies the derivative of the activation function to a vector during the backward pass. It computes the Jacobian matrix of the activation function and multiplies it by the output gradient.

For Beginners: This calculates how changes in a vector of values affect the output.

For vector operations:

  • This method computes the full matrix of relationships between inputs and outputs
  • It then multiplies this matrix by the incoming gradient
  • The result shows how each input value should be adjusted

This is a more comprehensive approach than the element-wise method, accounting for cases where each output depends on multiple inputs.

ApplyActivationDerivative(T, T)

Applies the derivative of the activation function to a single value.

protected T ApplyActivationDerivative(T input, T outputGradient)

Parameters

input T

The input value.

outputGradient T

The output gradient.

Returns

T

The input gradient after applying the activation derivative.

Remarks

This method applies the derivative of the layer's activation function to a single value during the backward pass. It multiplies the derivative of the activation function at the input value by the output gradient.

For Beginners: This calculates how a small change in one value affects the output.

During backpropagation:

  • We need to know how sensitive each value is to changes
  • This method calculates that sensitivity for a single value
  • It multiplies the activation derivative by the incoming gradient

This helps determine how much each individual value should be adjusted during training.
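
In a hand-written backward pass, these helpers are typically applied to values cached during the forward pass. The fragment below is a sketch; the _lastInput field and the surrounding layer are assumed:

private Tensor<T>? _lastInput;

public override Tensor<T> Forward(Tensor<T> input)
{
    _lastInput = input;            // cache the pre-activation input for the backward pass
    return ApplyActivation(input);
}

public override Tensor<T> Backward(Tensor<T> outputGradient)
{
    // Chain rule: scale the incoming gradient by the activation's derivative at the cached input.
    return ApplyActivationDerivative(_lastInput!, outputGradient);
}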

ApplyActivationForwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int)

Applies the layer's activation function forward pass on GPU using the activation's own GPU method.

protected bool ApplyActivationForwardGpu(IDirectGpuBackend backend, IGpuBuffer input, IGpuBuffer output, int size)

Parameters

backend IDirectGpuBackend

The GPU backend to use for execution.

input IGpuBuffer

The input GPU buffer.

output IGpuBuffer

The output GPU buffer.

size int

The number of elements to process.

Returns

bool

True if the activation was applied on GPU; false if no activation or GPU not supported.

Remarks

This method follows the Open/Closed Principle by delegating to the activation function's own GPU implementation rather than using a switch statement on activation types. Each activation function knows how to apply itself on GPU.

For Beginners: Instead of having one giant switch statement that handles every possible activation type, each activation function has its own ForwardGpu method. This makes it easy to add new activation functions without modifying this code.

ApplyActivationToGraph(ComputationNode<T>)

Applies the layer's configured activation function to a computation graph node.

protected ComputationNode<T> ApplyActivationToGraph(ComputationNode<T> input)

Parameters

input ComputationNode<T>

The computation node to apply activation to.

Returns

ComputationNode<T>

The computation node with activation applied.

Remarks

This helper method delegates to the activation's ApplyToGraph method, following the Open/Closed Principle. Adding new activations does not require modifying layer code.

For Beginners: This method adds the activation function to the computation graph.

Instead of the layer code checking what type of activation is configured (which would require changing the layer every time a new activation is added), this method simply asks the activation to add itself to the graph. This makes the code more maintainable and extensible.

Exceptions

ArgumentNullException

Thrown if input is null.

NotSupportedException

Thrown if activation does not support JIT.

ApplyGpuActivation(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int, FusedActivationType)

Applies the specified activation function on GPU using the direct backend operations.

protected static void ApplyGpuActivation(IDirectGpuBackend backend, IGpuBuffer input, IGpuBuffer output, int size, FusedActivationType activation)

Parameters

backend IDirectGpuBackend

The GPU backend to use for activation.

input IGpuBuffer

The input GPU buffer.

output IGpuBuffer

The output GPU buffer.

size int

The number of elements to process.

activation FusedActivationType

The type of activation function to apply.

Remarks

This method is primarily used for fused kernel operations where the activation type is specified via the AiDotNet.Tensors.Engines.FusedActivationType enum. It maps enum values to the corresponding backend activation kernels.

Note: For new code, prefer using ApplyActivationForwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int) which follows the Open/Closed Principle by delegating to each activation function's own GPU implementation. This allows new activation functions to be added without modifying this switch statement.

This static method only supports common activations (ReLU, Sigmoid, Tanh, GELU, LeakyReLU, Swish). For other activations, use the OCP-compliant method instead.

ApplyGpuActivationBackward(IDirectGpuBackend, IGpuBuffer, IGpuBuffer?, IGpuBuffer?, IGpuBuffer, int, FusedActivationType, float)

Applies the backward pass of the specified activation function on GPU.

protected static bool ApplyGpuActivationBackward(IDirectGpuBackend backend, IGpuBuffer gradOutput, IGpuBuffer? input, IGpuBuffer? output, IGpuBuffer gradInput, int size, FusedActivationType activation, float alpha = 0.01)

Parameters

backend IDirectGpuBackend

The GPU backend to use for activation backward.

gradOutput IGpuBuffer

The gradient from the next layer.

input IGpuBuffer

The input from the forward pass (needed for ReLU, LeakyReLU, GELU, Swish).

output IGpuBuffer

The output from the forward pass (needed for Sigmoid, Tanh).

gradInput IGpuBuffer

The buffer to store the input gradient.

size int

The number of elements to process.

activation FusedActivationType

The type of activation function.

alpha float

Alpha parameter for LeakyReLU (default 0.01).

Returns

bool

True if the backward was handled on GPU, false if CPU fallback is needed.

Remarks

This method is primarily used for fused kernel operations where the activation type is specified via the AiDotNet.Tensors.Engines.FusedActivationType enum.

Note: For new code, prefer using ApplyActivationBackwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer?, IGpuBuffer?, IGpuBuffer, int) which follows the Open/Closed Principle by delegating to each activation function's own GPU backward implementation. This allows new activation functions to be added without modifying this switch statement.

Different activation functions require different cached values from forward pass:

  • ReLU, LeakyReLU, GELU, Swish: Need the input from forward pass
  • Sigmoid, Tanh: Need the output from forward pass

Backward(Tensor<T>)

Performs the backward pass of the layer.

public abstract Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This abstract method must be implemented by derived classes to define the backward pass of the layer. The backward pass propagates error gradients from the output of the layer back to its input, and calculates gradients for any trainable parameters.

For Beginners: This method is used during training to calculate how the layer's input should change to reduce errors.

During the backward pass:

  1. The layer receives information about how its output contributed to errors
  2. It calculates how its parameters should change to reduce errors
  3. It calculates how its input should change, which will be used by earlier layers

This is the core of how neural networks learn from their mistakes during training.

BackwardGpu(IGpuTensor<T>)

Performs the backward pass of the layer on GPU.

public virtual IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

The GPU-resident gradient of the loss with respect to the layer's output.

Returns

IGpuTensor<T>

The GPU-resident gradient of the loss with respect to the layer's input.

Remarks

This method performs the layer's backward computation entirely on GPU, including:

  • Computing input gradients to pass to previous layers
  • Computing and storing weight gradients on GPU (for layers with trainable parameters)
  • Computing and storing bias gradients on GPU

For Beginners: This is like Backward() but runs entirely on GPU.

During GPU training:

  1. Output gradients come in (on GPU)
  2. Input gradients are computed (stay on GPU)
  3. Weight/bias gradients are computed and stored (on GPU)
  4. Input gradients are returned for the previous layer

All data stays on GPU - no CPU round-trips needed!

Exceptions

NotSupportedException

Thrown when the layer does not support GPU training.

CalculateInputShape(int, int, int)

Calculates a standard input shape for 2D data with batch size of 1.

protected static int[] CalculateInputShape(int inputDepth, int height, int width)

Parameters

inputDepth int

The depth (number of channels) of the input.

height int

The height of the input.

width int

The width of the input.

Returns

int[]

An array representing the input shape [batch, depth, height, width].

Remarks

This helper method calculates a standard input shape for 2D data (like images) with a batch size of 1. The shape follows the NCHW (batch, channels, height, width) format.

For Beginners: This method creates a standard shape for image-like data.

When working with images or similar 2D data:

  • This creates a standard shape array in the format [batch, channels, height, width]
  • The batch dimension is set to 1 (processing one item at a time)
  • The other dimensions come from the parameters

For example, for a 28x28 grayscale image, you might use inputDepth=1, height=28, width=28, resulting in a shape of [1, 1, 28, 28].
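
A derived layer might use this helper (together with CalculateOutputShape, documented below) when calling the base constructor. This is an illustrative fragment only:

// Inside a derived convolution-style layer's constructor:
// a 1-channel 28x28 input producing 16 feature maps of size 14x14.
protected MyConvLikeLayer()
    : base(CalculateInputShape(inputDepth: 1, height: 28, width: 28),
           CalculateOutputShape(outputDepth: 16, outputHeight: 14, outputWidth: 14))
{
}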

CalculateOutputShape(int, int, int)

Calculates a standard output shape for 2D data with batch size of 1.

protected static int[] CalculateOutputShape(int outputDepth, int outputHeight, int outputWidth)

Parameters

outputDepth int

The depth (number of channels) of the output.

outputHeight int

The height of the output.

outputWidth int

The width of the output.

Returns

int[]

An array representing the output shape [batch, depth, height, width].

Remarks

This helper method calculates a standard output shape for 2D data (like images) with a batch size of 1. The shape follows the NCHW (batch, channels, height, width) format.

For Beginners: This method creates a standard shape for image-like output data.

When defining the output shape for 2D data:

  • This creates a standard shape array in the format [batch, channels, height, width]
  • The batch dimension is set to 1 (producing one output at a time)
  • The other dimensions come from the parameters

For example, if a convolutional layer produces 16 feature maps of size 14x14, you might use outputDepth=16, outputHeight=14, outputWidth=14.

CanActivationBeJitted()

Checks if the layer's current activation function supports JIT compilation.

protected bool CanActivationBeJitted()

Returns

bool

True if the activation can be JIT compiled, false otherwise.

Remarks

This method checks whether the layer's configured activation function supports JIT compilation by querying the activation's SupportsJitCompilation property. If no activation is configured, returns true (identity function is always JIT-compatible).

For Beginners: This method checks if the activation is ready for JIT compilation.

The layer uses this to determine if it can export a computation graph for faster inference. If the activation does not support JIT yet (because gradients are not implemented), the layer will fall back to the standard execution path.

ClearGradients()

Clears all parameter gradients in this layer.

public virtual void ClearGradients()

Remarks

This method sets all parameter gradients to zero. This is typically called at the beginning of each batch during training to ensure that gradients from previous batches don't affect the current batch.

For Beginners: This method resets all adjustment values to zero to start fresh.

Clearing gradients:

  • Erases all previous adjustment information
  • Prepares the layer for a new training batch
  • Prevents old adjustments from interfering with new ones

This is typically done at the start of processing each batch of training data to ensure clean, accurate gradient calculations.

Clone()

Creates a copy of this layer.

public virtual LayerBase<T> Clone()

Returns

LayerBase<T>

A new instance of the layer with the same configuration.

Remarks

This method creates a shallow copy of the layer with deep copies of the input/output shapes and activation functions. Derived classes should override this method to properly copy any additional fields they define.

For Beginners: This method creates a duplicate of this layer.

When copying a layer:

  • Basic properties like shapes are duplicated
  • Activation functions are cloned
  • The new layer works independently from the original

This is useful for:

  • Creating similar layers with small variations
  • Implementing complex network architectures with repeated patterns
  • Saving a layer's state before making changes
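
For example (DenseLayer<float> is borrowed from the Dispose() example elsewhere on this page):

var original = new DenseLayer<float>(784, 128);

// The copy has the same configuration but works independently of the original.
LayerBase<float> copy = original.Clone();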

ComputeActivationJacobian(Vector<T>)

Computes the Jacobian matrix of the activation function for a given input vector.

protected Matrix<T> ComputeActivationJacobian(Vector<T> input)

Parameters

input Vector<T>

The input vector.

Returns

Matrix<T>

The Jacobian matrix of the activation function at the input.

Remarks

This method computes the Jacobian matrix of the activation function, which represents how each output element changes with respect to each input element. For vector activation functions, it uses the function's derivative method. For scalar activation functions, it creates a diagonal matrix with the derivatives.

For Beginners: This calculates a matrix that shows how changes in inputs affect outputs.

The Jacobian matrix:

  • Shows how each output value depends on each input value
  • For scalar activations, it's a diagonal matrix (each output depends only on the corresponding input)
  • For vector activations, it can have off-diagonal elements (outputs depend on multiple inputs)

This is an advanced concept used in certain optimization techniques and for precise gradient calculations.

DerivativeTensor(IActivationFunction<T>?, Tensor<T>)

Calculates the derivative of a scalar activation function for each element of a tensor.

protected Tensor<T> DerivativeTensor(IActivationFunction<T>? activation, Tensor<T> input)

Parameters

activation IActivationFunction<T>

The scalar activation function.

input Tensor<T>

The input tensor.

Returns

Tensor<T>

A tensor containing the derivatives.

Remarks

This helper method calculates the derivative of a scalar activation function for each element of a tensor. If the activation function is null, it returns a tensor filled with ones, representing the derivative of the identity function.

For Beginners: This method calculates how sensitive each value is to changes.

The derivative:

  • Measures how much the output changes when the input changes slightly
  • Is essential for the backpropagation algorithm during training
  • Helps determine how to adjust weights to reduce errors

If no activation function is provided, it assumes the identity function (y = x), which has a derivative of 1 everywhere.

Deserialize(BinaryReader)

Deserializes the layer's parameters from a binary reader.

public virtual void Deserialize(BinaryReader reader)

Parameters

reader BinaryReader

The binary reader to read from.

Remarks

This method reads the layer's parameters from a binary reader, which can be used to load the layer's state from a file or other storage medium. It reads the parameter count followed by each parameter value.

For Beginners: This method loads the layer's learned values from storage.

When deserializing a layer:

  • The number of parameters is read first
  • Then each parameter value is read
  • All values are converted from doubles to the appropriate numeric type

This allows you to load a previously trained layer without having to retrain it from scratch.
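
For example, restoring a layer's parameters from a file might look like the sketch below. The file path is an assumption, and the file must have been produced by the matching serialization method:

var layer = new DenseLayer<float>(784, 128);

using var stream = File.OpenRead("dense-layer.bin");
using var reader = new BinaryReader(stream);

// Reads the parameter count followed by each stored parameter value.
layer.Deserialize(reader);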

Dispose()

Releases all resources used by this layer, including any GPU resources.

public void Dispose()

Remarks

This method releases any resources allocated by the layer, including GPU memory for persistent tensors. All layers that allocate resources should override Dispose(bool) to properly release them.

For Beginners: GPU memory is limited and precious.

When you're done with a layer:

  • Call Dispose() or use a 'using' statement
  • This frees up GPU memory for other operations
  • Failing to dispose can cause memory leaks

Example:

using var layer = new DenseLayer<float>(784, 128);
// ... use layer ...
// Automatically disposed when out of scope

Dispose(bool)

Releases resources used by this layer.

protected virtual void Dispose(bool disposing)

Parameters

disposing bool

True if called from Dispose(), false if called from finalizer.

Remarks

Override this method in derived classes to release layer-specific resources. Always call base.Dispose(disposing) after releasing your resources.

For Beginners: When creating a custom layer with resources:

protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        // Release your managed resources here
        _myGpuHandle?.Dispose();
        _myGpuHandle = null;
    }
    base.Dispose(disposing);
}

DownloadWeightsFromGpu()

Downloads the layer's weights and biases from GPU memory back to CPU.

public virtual void DownloadWeightsFromGpu()

Remarks

Call this after GPU training to sync weights back to CPU for:

  • Model checkpointing / saving
  • CPU inference
  • Inspection of trained weights

For Beginners: This copies learned values back from GPU to CPU.

During GPU training, weights are modified on GPU and the CPU copy is stale. Call this to:

  • Save the model to disk
  • Switch to CPU inference
  • Examine what the layer learned

This is relatively expensive, so only do it when necessary (not every batch).
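
A typical pattern looks like the sketch below (the GPU training loop itself is omitted):

var layer = new DenseLayer<float>(784, 128);

if (layer.CanTrainOnGpu)
{
    // ... run the GPU-resident training loop here ...
}

// After GPU training the CPU copy of the weights is stale; sync it back once,
// e.g. before saving a checkpoint or switching to CPU inference.
layer.DownloadWeightsFromGpu();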

EnsureInitialized()

Ensures that the layer is initialized. Call this at the start of Forward() for lazy initialization.

protected virtual void EnsureInitialized()

Remarks

For layers that support lazy initialization, this method should be called at the start of Forward() to ensure weights are allocated before use. The default implementation does nothing (for layers without lazy initialization support).

For Beginners: This makes sure the layer is ready before processing data.

For lazy initialization:

  • First call allocates and initializes weights
  • Subsequent calls do nothing (weights already initialized)
  • Thread-safe for parallel execution

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public abstract ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.

Forward(Tensor<T>)

Performs the forward pass of the layer.

public abstract Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor after processing.

Remarks

This abstract method must be implemented by derived classes to define the forward pass of the layer. The forward pass transforms the input tensor according to the layer's operation and activation function.

For Beginners: This method processes your data through the layer.

The forward pass:

  • Takes input data from the previous layer or the network input
  • Applies the layer's specific transformation (like convolution or matrix multiplication)
  • Applies any activation function
  • Passes the result to the next layer

This is where the actual data processing happens during both training and prediction.

Forward(params Tensor<T>[])

Performs the forward pass of the layer with multiple input tensors.

public virtual Tensor<T> Forward(params Tensor<T>[] inputs)

Parameters

inputs Tensor<T>[]

The input tensors to process.

Returns

Tensor<T>

The output tensor after processing.

Remarks

This method implements a default forward pass for layers that accept multiple inputs. By default, it concatenates the inputs along the channel dimension. Derived classes can override this method to implement more specific behavior for multiple inputs.

For Beginners: This method handles processing multiple inputs through the layer.

When a layer needs to combine multiple data sources:

  • This method takes all the input tensors
  • By default, it combines them by stacking them along the channel dimension
  • It checks that the inputs are compatible (same shape except for channels)
  • It then passes the combined data forward

For example, if combining features from two sources each with 10 channels, this would create a tensor with 20 channels by default.

Specialized layers can override this to combine inputs in different ways.

Exceptions

ArgumentException

Thrown when no input tensors are provided or when input tensors have incompatible shapes.

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass of the layer on GPU.

public virtual IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

The GPU-resident input tensor(s).

Returns

IGpuTensor<T>

The GPU-resident output tensor.

Remarks

This method performs the layer's forward computation entirely on GPU. The input and output tensors remain in GPU memory, avoiding expensive CPU-GPU transfers.

For Beginners: This is like Forward() but runs on the graphics card.

The key difference:

  • Forward() uses CPU tensors that may be copied to/from GPU
  • ForwardGpu() keeps everything on GPU the whole time

Override this in derived classes that support GPU acceleration.

Exceptions

NotSupportedException

Thrown when the layer does not support GPU execution.

GetActivationTypes()

Gets the types of activation functions used by this layer.

public virtual IEnumerable<ActivationFunction> GetActivationTypes()

Returns

IEnumerable<ActivationFunction>

An enumerable of activation function types.

Remarks

This method returns the types of activation functions used by this layer. This is useful for serialization and debugging purposes.

For Beginners: This method tells you what kinds of activation functions the layer uses.

This information:

  • Helps identify what non-linearities are applied in the layer
  • Is useful for saving/loading models
  • Helps with debugging and visualization

The information is returned as standardized activation types (like ReLU, Sigmoid, etc.) rather than the actual function objects.

GetBiases()

Gets the bias tensor for layers that have trainable biases.

public virtual Tensor<T>? GetBiases()

Returns

Tensor<T>

The bias tensor, or null if the layer has no biases.

Remarks

This method provides access to the layer's bias tensor for layers that use biases during computation. Layers without biases return null.

For Beginners: Biases are learnable offsets added to the layer's output.

Think of biases as a starting point:

  • Without bias: output = weights × input
  • With bias: output = weights × input + bias

Biases help the network learn more flexible patterns by shifting the activation function.

GetDiagnostics()

Gets diagnostic information about this layer's state and behavior.

public virtual Dictionary<string, string> GetDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic metrics for this layer. Base implementation provides common metrics like layer type, input/output shapes, and parameter count. Derived classes can override this method to add layer-specific diagnostics.

Remarks

The base implementation provides the following diagnostics:

  • layer.type: The concrete type name of the layer
  • layer.input_shape: The shape of input tensors
  • layer.output_shape: The shape of output tensors
  • layer.parameter_count: The total number of trainable parameters
  • layer.supports_training: Whether the layer has trainable parameters
  • layer.activation: The activation function type, if any

For Beginners: This method returns a report card with useful information about the layer.

The diagnostics help you understand:

  • What type of layer this is (Dense, Convolutional, etc.)
  • What size of data it expects (input shape)
  • What size of data it produces (output shape)
  • How many parameters it's learning
  • What activation function it uses

Derived classes (specific layer types) can add more detailed information:

  • Attention layers might report attention weights statistics
  • Batch normalization layers might report running mean/variance
  • Dropout layers might report dropout rate

Example usage:

var diagnostics = layer.GetDiagnostics();
foreach (var (key, value) in diagnostics)
{
    Console.WriteLine($"{key}: {value}");
}

Override Guidelines: When overriding in derived classes:

  1. Call base.GetDiagnostics() first to get common metrics
  2. Add your layer-specific diagnostics to the returned dictionary
  3. Use consistent key naming (e.g., "activation.mean", "gradient.norm")
  4. Provide human-readable string values
  5. Keep computations lightweight to avoid impacting performance

Example override:

public override Dictionary<string, string> GetDiagnostics()
{
    var diagnostics = base.GetDiagnostics();

    if (_lastActivations != null)
    {
        diagnostics["activation.mean"] = ComputeMean(_lastActivations).ToString();
        diagnostics["activation.std"] = ComputeStd(_lastActivations).ToString();
    }

    return diagnostics;
}

GetFusedActivationType()

Gets the fused activation type for IEngine fused operations.

protected FusedActivationType GetFusedActivationType()

Returns

FusedActivationType

The FusedActivationType enum value for the current activation function.

Remarks

This method maps the layer's activation function to a FusedActivationType enum value, allowing IEngine to use optimized fused GPU kernels (e.g., GEMM+Bias+ReLU in one kernel).

For Beginners: GPU operations are faster when combined. Instead of doing MatMul, then adding bias, then applying ReLU as separate steps, fused operations do all three in one GPU kernel - this is 20-50% faster. This method tells the GPU which activation to fuse with other operations.

Supported Activations:

  • ReLU → FusedActivationType.ReLU
  • Sigmoid → FusedActivationType.Sigmoid
  • Tanh → FusedActivationType.Tanh
  • GELU → FusedActivationType.GELU
  • LeakyReLU → FusedActivationType.LeakyReLU
  • Swish/SiLU → FusedActivationType.Swish
  • Other/None → FusedActivationType.None (activation applied separately)
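
As an illustration, a derived layer might check the result before deciding how to apply its activation (a hedged sketch; the surrounding forward-pass code is omitted):

var fused = GetFusedActivationType();
if (fused == FusedActivationType.None)
{
    // The activation cannot be fused into the GPU kernel,
    // so apply it as a separate step after the main operation.
}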

GetInputShape()

Gets the input shape for this layer.

public virtual int[] GetInputShape()

Returns

int[]

The input shape as an array of integers.

Remarks

This method returns the input shape of the layer. If the layer has multiple input shapes, it returns the first one.

For Beginners: This method tells you what shape of data the layer expects.

The input shape:

  • Shows the dimensions of data this layer processes
  • Is needed to connect this layer with previous layers
  • Helps verify the network structure is correct

For layers with multiple inputs, this returns just the first input shape.

GetInputShapes()

Gets all input shapes for this layer.

public virtual int[][] GetInputShapes()

Returns

int[][]

An array of input shapes.

Remarks

This method returns all input shapes of the layer. This is particularly useful for layers that accept multiple inputs with different shapes.

For Beginners: This method tells you the shapes of all data sources this layer can accept.

For layers that combine multiple inputs:

  • This returns all the input shapes in an array
  • Each shape defines the dimensions of one input source
  • Helpful for understanding complex network connections

This is most useful for layers like concatenation or merge layers.

GetOutputShape()

Gets the output shape for this layer.

public int[] GetOutputShape()

Returns

int[]

The output shape as an array of integers.

Remarks

This method returns the output shape of the layer, which defines the dimensions of the tensor that will be produced when data flows through this layer.

For Beginners: This method tells you what shape of data the layer produces.

The output shape:

  • Shows the dimensions of data after this layer processes it
  • Is needed to connect this layer with the next layer
  • Helps verify that data flows correctly through the network

For example, a convolutional layer might change the number of channels in the data, which would be reflected in the output shape.
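
For example, you can check that two layers line up before chaining them (a minimal sketch; firstLayer and secondLayer are hypothetical layer instances):

// Requires System.Linq for SequenceEqual.
int[] produced = firstLayer.GetOutputShape();
int[] expected = secondLayer.GetInputShape();

if (!produced.SequenceEqual(expected))
{
    throw new InvalidOperationException("Layer shapes do not match.");
}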

GetParameterGradients()

Gets the gradients of all trainable parameters in this layer.

public virtual Vector<T> GetParameterGradients()

Returns

Vector<T>

A vector containing the gradients of all trainable parameters.

Remarks

This method returns the gradients of all trainable parameters in the layer. If the gradients haven't been calculated yet, it initializes a new vector of the appropriate size.

For Beginners: This method provides the current adjustment values for all parameters.

The parameter gradients:

  • Show how each parameter should be adjusted during training
  • Are calculated during the backward pass
  • Guide the optimization process

These gradients are usually passed to an optimizer like SGD or Adam, which uses them to update the parameters in a way that reduces errors.
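
For example, after a backward pass you could peek at the gradients before handing them to an optimizer (a sketch, assuming Vector<T> exposes a Length property):

var gradients = layer.GetParameterGradients();
Console.WriteLine($"Gradient values available: {gradients.Length}");
// Typically you would now pass these gradients to your optimizer.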

GetParameterNames()

Gets all parameter names in this layer.

public virtual IEnumerable<string> GetParameterNames()

Returns

IEnumerable<string>

A collection of parameter names ("weight", "bias", or both depending on layer type).

Remarks

The default implementation returns "weight" and/or "bias" based on whether GetWeights() and GetBiases() return non-null values.

GetParameterShape(string)

Gets the expected shape for a parameter.

public virtual int[]? GetParameterShape(string name)

Parameters

name string

The parameter name ("weight" or "bias").

Returns

int[]

The expected shape, or null if the parameter doesn't exist.
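
Together with GetParameterNames(), this lets you list a layer's parameters and their expected shapes (a minimal sketch):

foreach (var name in layer.GetParameterNames())
{
    int[]? shape = layer.GetParameterShape(name);
    string formatted = shape == null ? "unknown" : string.Join(" x ", shape);
    Console.WriteLine($"{name}: {formatted}");
}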

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public abstract Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This abstract method must be implemented by derived classes to provide access to all trainable parameters of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the layer.

The parameters:

  • Are the numbers that the neural network learns during training
  • Include weights, biases, and other learnable values
  • Are combined into a single long list (vector)

This is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters
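
For example, you could snapshot a layer's parameters and restore them later (a minimal sketch using GetParameters together with SetParameters, documented later on this page):

// Take a snapshot of the current parameters.
var snapshot = layer.GetParameters();

// ... training continues and the parameters change ...

// Roll the layer back to the saved values.
layer.SetParameters(snapshot);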

GetWeights()

Gets the weight tensor for layers that have trainable weights.

public virtual Tensor<T>? GetWeights()

Returns

Tensor<T>

The weight tensor, or null if the layer has no weights.

Remarks

This method provides access to the layer's weight tensor for layers that use weights during computation. Layers without weights (like pooling or activation layers) return null.

For Beginners: Weights are the learnable parameters that define how a layer transforms data.

For example:

  • Dense layers use a weight matrix to transform inputs
  • Convolutional layers use filters (which are weights) to detect patterns
  • Pooling layers have no weights, so they return null

This method lets you inspect or modify the weights after training.
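
For example, you can check whether a layer exposes weights (and, similarly, biases via GetBiases) before inspecting them (a minimal sketch):

var weights = layer.GetWeights();
var biases = layer.GetBiases();

if (weights == null)
{
    Console.WriteLine("This layer has no trainable weights (e.g., a pooling layer).");
}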

HasGpuActivation()

Checks if the layer's scalar activation function supports GPU training.

protected bool HasGpuActivation()

Returns

bool

True if the activation function has GPU kernels; false otherwise.

Remarks

For Beginners: Not all activation functions have GPU implementations yet. This method checks whether the layer's activation can run entirely on the GPU. If false, the layer must fall back to CPU computation for the activation.

InvalidateTrainableParameter(Tensor<T>)

Notifies the engine that a registered persistent tensor's data has changed.

protected void InvalidateTrainableParameter(Tensor<T> tensor)

Parameters

tensor Tensor<T>

The tensor whose data has been modified.

Remarks

Call this method after modifying a registered tensor's data (e.g., during parameter updates). The engine will re-upload the data to GPU on the next operation that uses the tensor.

For Beginners: When you change the values in a registered tensor (like updating weights during training), you need to tell the GPU that the copy it has is outdated. This method does that - it tells the GPU "hey, this data changed, please get a fresh copy."

Usage Pattern:

Call after UpdateParameters modifies weights:

public override void UpdateParameters(T learningRate)
{
    // Update weights using gradients
    _weights = _weights.Subtract(_weightGradients.Multiply(learningRate));

    // Notify engine that GPU copy is stale
    InvalidateTrainableParameter(_weights);
}

LoadWeights(Dictionary<string, Tensor<T>>, Func<string, string?>?, bool)

Loads weights from a dictionary of tensors using optional name mapping.

public virtual WeightLoadResult LoadWeights(Dictionary<string, Tensor<T>> weights, Func<string, string?>? mapping = null, bool strict = false)

Parameters

weights Dictionary<string, Tensor<T>>

Dictionary of weight name to tensor.

mapping Func<string, string>

Optional function to map source names to target names.

strict bool

If true, fails when any mapped weight fails to load.

Returns

WeightLoadResult

Load result with statistics.

MapActivationToFused()

Maps the layer's activation function to an AiDotNet.Tensors.Engines.FusedActivationType for GPU-fused operations.

protected FusedActivationType MapActivationToFused()

Returns

FusedActivationType

The corresponding AiDotNet.Tensors.Engines.FusedActivationType for the layer's activation function, or AiDotNet.Tensors.Engines.FusedActivationType.None if no activation is configured or the activation type is not supported for GPU fusion.

Remarks

This method is used by GPU-optimized layers to determine which fused activation kernel to use. Fused operations combine matrix multiplication, bias addition, and activation into a single GPU kernel, reducing memory bandwidth and improving performance.

For Beginners: When running on a GPU, combining multiple operations (like matrix multiply and activation) into one step is faster than doing them separately. This method tells the GPU which activation function to include in the combined operation.

RegisterTrainableParameter(Tensor<T>, PersistentTensorRole)

Registers a trainable parameter tensor with the engine for GPU memory optimization.

protected void RegisterTrainableParameter(Tensor<T> tensor, PersistentTensorRole role)

Parameters

tensor Tensor<T>

The tensor to register (typically weights or biases).

role PersistentTensorRole

The role of the tensor for optimization hints.

Remarks

This method hints to the engine that the tensor will be reused across many operations and should be kept resident in GPU memory when a GPU engine is active. This avoids expensive CPU-GPU data transfers on every forward pass.

Performance Impact:

Without registration: Layer weights (e.g., 285MB for a large Dense layer) are transferred to GPU on every forward pass.

With registration: Weights are transferred once and cached on GPU. Only activations (much smaller) are transferred per pass. Expected speedup: 100-1000x for large layers.

For Beginners: This method tells the GPU to keep certain data (like learned weights) in its fast memory instead of copying it back and forth every time. Think of it like keeping frequently used books on your desk instead of walking to the library each time.

Usage Pattern:

Call this method in the layer's constructor after initializing weight tensors:

public DenseLayer(int inputSize, int outputSize)
{
    _weights = new Tensor<T>(outputSize, inputSize);
    _biases = new Tensor<T>(outputSize);
    InitializeWeights();

    // Register for GPU persistence
    RegisterTrainableParameter(_weights, PersistentTensorRole.Weights);
    RegisterTrainableParameter(_biases, PersistentTensorRole.Biases);
}

ResetState()

Resets the internal state of the layer.

public abstract void ResetState()

Remarks

This abstract method must be implemented by derived classes to reset any internal state the layer maintains between forward and backward passes. This is useful when starting to process a new sequence or when implementing stateful recurrent networks.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • Cached inputs and outputs are cleared
  • Any temporary calculations are discarded
  • The layer is ready to process new data without being influenced by previous data

This is important for:

  • Processing a new, unrelated sequence
  • Preventing information from one sequence affecting another
  • Starting a new training episode
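
For example, when processing several unrelated sequences you would reset the layer between them (a minimal sketch; sequences and Process are hypothetical):

foreach (var sequence in sequences)
{
    layer.ResetState();   // start fresh so the previous sequence has no influence
    Process(layer, sequence);
}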

Serialize(BinaryWriter)

Serializes the layer's parameters to a binary writer.

public virtual void Serialize(BinaryWriter writer)

Parameters

writer BinaryWriter

The binary writer to write to.

Remarks

This method writes the layer's parameters to a binary writer, which can be used to save the layer's state to a file or other storage medium. It writes the parameter count followed by each parameter value.

For Beginners: This method saves the layer's learned values to storage.

When serializing a layer:

  • The number of parameters is written first
  • Then each parameter value is written
  • All values are converted to doubles for consistent storage

This allows you to save a trained layer and reload it later without having to retrain it from scratch.
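
For example, you can write a layer's parameters to a file with a standard BinaryWriter (a minimal sketch; the file name is illustrative):

// Requires System.IO.
using var stream = File.Create("layer-weights.bin");
using var writer = new BinaryWriter(stream);
layer.Serialize(writer);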

SetBiases(Tensor<T>)

Sets the bias tensor for this layer.

protected virtual void SetBiases(Tensor<T> biases)

Parameters

biases Tensor<T>

The bias tensor to set.

Remarks

Derived classes with trainable biases should override this method to update their internal bias storage. The default implementation throws an exception since LayerBase doesn't know the layer's bias structure.

Exceptions

InvalidOperationException

Thrown if the layer does not support biases.

SetParameter(string, Tensor<T>)

Sets a parameter tensor by name.

public virtual bool SetParameter(string name, Tensor<T> value)

Parameters

name string

The parameter name ("weight" or "bias").

value Tensor<T>

The tensor value to set.

Returns

bool

True if the parameter was set successfully, false if the name was not found.

Exceptions

ArgumentException

Thrown when the tensor shape doesn't match expected shape.

SetParameters(Vector<T>)

Sets the trainable parameters of the layer.

public virtual void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets all the trainable parameters of the layer from a single vector of parameters. The parameters vector must have the correct length to match the total number of parameters in the layer. By default, it simply assigns the parameters vector to the Parameters field, but derived classes may override this to handle the parameters differently.

For Beginners: This method updates all the learnable values in the layer.

When setting parameters:

  • The input must be a vector with the correct length
  • The layer parses this vector to set all its internal parameters
  • Throws an error if the input doesn't match the expected number of parameters

This is useful for:

  • Loading a previously saved model
  • Transferring parameters from another model
  • Setting specific parameter values for testing

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

SetTrainingMode(bool)

Sets whether the layer is in training mode or inference mode.

public virtual void SetTrainingMode(bool isTraining)

Parameters

isTraining bool

true to set the layer to training mode; false to set it to inference mode.

Remarks

This method sets the layer's mode to either training or inference (evaluation). Some layers behave differently during training versus inference, such as Dropout or BatchNormalization. This method only has an effect if the layer supports training.

For Beginners: This method switches the layer between learning mode and prediction mode.

Setting this mode:

  • Tells the layer whether to optimize for learning or for making predictions
  • Changes behavior in layers like Dropout (which randomly ignores neurons during training)
  • Has no effect in layers that don't support training

It's important to set this correctly before using a network - training mode for learning, inference mode for making predictions.
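
For example (a minimal sketch):

layer.SetTrainingMode(true);    // before running training batches
// ... train ...
layer.SetTrainingMode(false);   // before evaluating or making predictions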

SetWeights(Tensor<T>)

Sets the weight tensor for this layer.

protected virtual void SetWeights(Tensor<T> weights)

Parameters

weights Tensor<T>

The weight tensor to set.

Remarks

Derived classes with trainable weights should override this method to update their internal weight storage. The default implementation throws an exception since LayerBase doesn't know the layer's weight structure.
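
A derived layer with its own weight storage might override it like this (a hedged sketch; _weights is a hypothetical field of the derived layer):

protected override void SetWeights(Tensor<T> weights)
{
    // Replace the layer's internal weight storage with the provided tensor.
    _weights = weights;
}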

Exceptions

InvalidOperationException

Thrown if the layer does not support weights.

TryGetParameter(string, out Tensor<T>?)

Tries to get a parameter tensor by name.

public virtual bool TryGetParameter(string name, out Tensor<T>? tensor)

Parameters

name string

The parameter name ("weight" or "bias").

tensor Tensor<T>

The parameter tensor if found.

Returns

bool

True if the parameter was found, false otherwise.
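
For example (a minimal sketch):

if (layer.TryGetParameter("weight", out var weightTensor) && weightTensor != null)
{
    // Inspect or copy the weight tensor here.
}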

UpdateInputShape(int[])

Updates the stored input shape for this layer.

protected void UpdateInputShape(int[] inputShape)

Parameters

inputShape int[]

The new input shape to store.

UpdateParameters(Vector<T>)

Updates the parameters of the layer with the given vector of parameter values.

public virtual void UpdateParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets all the parameters of the layer from a single vector of parameters. The parameters vector must have the correct length to match the total number of parameters in the layer.

For Beginners: This method updates all the learnable values in the layer at once.

When updating parameters:

  • The input must be a vector with the correct length
  • This replaces all the current parameters with the new ones
  • Throws an error if the input doesn't match the expected number of parameters

This is useful for:

  • Optimizers that work with all parameters at once
  • Applying parameters from another source
  • Setting parameters to specific values for testing

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

UpdateParameters(T)

Updates the parameters of the layer using the calculated gradients.

public abstract void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This abstract method must be implemented by derived classes to define how the layer's parameters are updated during training. The learning rate controls the size of the parameter updates.

For Beginners: This method updates the layer's internal values during training.

When updating parameters:

  • The weights, biases, or other parameters are adjusted to reduce prediction errors
  • The learning rate controls how big each update step is
  • Smaller learning rates mean slower but more stable learning
  • Larger learning rates mean faster but potentially unstable learning

This is how the layer "learns" from data over time, gradually improving its ability to extract useful patterns from inputs.
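
For example, after a backward pass has filled in the gradients (a minimal sketch for a float-based layer):

layer.UpdateParameters(0.01f);   // apply one update step with a learning rate of 0.01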

UpdateParametersGpu(IGpuOptimizerConfig)

Updates the layer's parameters on GPU using the specified optimizer configuration.

public virtual void UpdateParametersGpu(IGpuOptimizerConfig config)

Parameters

config IGpuOptimizerConfig

The GPU optimizer configuration specifying the update algorithm and hyperparameters.

Remarks

This method updates weights and biases directly on GPU using the optimizer specified in the config. Supported optimizers include SGD, Adam, AdamW, RMSprop, Adagrad, NAG, LARS, and LAMB.

For Beginners: This updates the layer's learned values entirely on GPU.

The config determines which optimizer algorithm to use:

  • SGD: Simple gradient descent with optional momentum
  • Adam: Adaptive learning rates with moment estimates (most popular)
  • AdamW: Adam with proper weight decay (recommended for transformers)

Using this method keeps all training computation on the GPU for maximum speed.

Exceptions

NotSupportedException

Thrown when the layer does not support GPU training.

UploadWeightsToGpu()

Uploads the layer's weights and biases to GPU memory for GPU-resident training.

public virtual void UploadWeightsToGpu()

Remarks

Call this before starting GPU training to initialize GPU weight buffers. The CPU weights are copied to GPU and remain there until DownloadWeightsFromGpu is called.

For Beginners: This copies the layer's learned values to the GPU.

Call this once at the start of training to:

  • Create GPU buffers for weights and biases
  • Copy current values from CPU to GPU
  • Create GPU buffers for gradients and optimizer states (momentum, etc.)

After this, all training can happen on GPU without CPU involvement.

ValidateWeights(IEnumerable<string>, Func<string, string?>?)

Validates that a set of weight names can be loaded into this layer.

public virtual WeightLoadValidation ValidateWeights(IEnumerable<string> weightNames, Func<string, string?>? mapping = null)

Parameters

weightNames IEnumerable<string>

Names of weights to validate.

mapping Func<string, string>

Optional weight name mapping function.

Returns

WeightLoadValidation

Validation result with matched and unmatched names.

ZeroGradientsGpu()

Resets the GPU gradient accumulators to zero.

public virtual void ZeroGradientsGpu()

Remarks

Call this at the start of each training batch to clear accumulated gradients from the previous batch.

For Beginners: This clears the "how to improve" information from the last batch.

Each batch computes new gradients. Before processing a new batch, you need to:

  • Clear the old gradients
  • Compute fresh gradients for the current batch
  • Update weights based on the new gradients

If you forget to zero the gradients, they accumulate across batches and the resulting parameter updates will be incorrect.
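
Putting the GPU methods together, a training loop might look like this (a hedged sketch; the forward and backward calls are omitted, batches is hypothetical, and config is an IGpuOptimizerConfig obtained from your optimizer setup):

layer.UploadWeightsToGpu();               // once, before training starts

foreach (var batch in batches)
{
    layer.ZeroGradientsGpu();             // clear gradients from the previous batch
    // ... forward and backward passes for this batch ...
    layer.UpdateParametersGpu(config);    // apply the optimizer update on GPU
}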