Class LayerBase<T>
Namespace: AiDotNet.NeuralNetworks.Layers
Assembly: AiDotNet.dll
Represents the base class for all neural network layers, providing common functionality and interfaces.
public abstract class LayerBase<T> : ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Type Parameters
T: The numeric type used for calculations, typically float or double.
Inheritance: LayerBase<T>
Implements: ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
LayerBase is an abstract class that serves as the foundation for all neural network layers. It defines the common structure and functionality that all layers must implement, such as forward and backward propagation, parameter management, and activation functions. This class handles the core mechanics of layers in a neural network, allowing derived classes to focus on their specific implementations.
For Beginners: This is the blueprint that all neural network layers follow.
Think of LayerBase as the common foundation that all layers are built upon:
- It defines what every layer must be able to do (process data forward and backward)
- It provides shared tools that all layers can use (like activation functions)
- It manages the shapes of data flowing in and out of layers
- It handles saving and loading layer parameters
All specific layer types (like convolutional, dense, etc.) inherit from this class, which ensures they all work together consistently in a neural network.
Constructors
LayerBase(int[], int[])
Initializes a new instance of the LayerBase<T> class with the specified input and output shapes.
protected LayerBase(int[] inputShape, int[] outputShape)
Parameters
inputShape (int[]): The shape of the input tensor.
outputShape (int[]): The shape of the output tensor.
Remarks
This constructor creates a new Layer with the specified input and output shapes. It initializes an empty parameter vector and sets up the single input shape.
For Beginners: This creates a new layer with the specified data shapes.
When creating a layer, you need to define:
- The shape of data coming in (inputShape)
- The shape of data going out (outputShape)
This helps the layer organize its operations and connect properly with other layers.
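For example, a custom layer might chain to this constructor as in the sketch below (a hypothetical, simplified layer; the class name and anything not documented on this page are assumptions):

public class MyCustomLayer<T> : LayerBase<T>
{
    public MyCustomLayer(int inputSize, int outputSize)
        : base(new[] { inputSize }, new[] { outputSize })
    {
    }

    // Abstract members (Forward, Backward, GetParameters, and so on) still need to be implemented.
}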
LayerBase(int[], int[], IActivationFunction<T>)
Initializes a new instance of the LayerBase<T> class with the specified shapes and element-wise activation function.
protected LayerBase(int[] inputShape, int[] outputShape, IActivationFunction<T> scalarActivation)
Parameters
inputShape (int[]): The shape of the input tensor.
outputShape (int[]): The shape of the output tensor.
scalarActivation (IActivationFunction<T>): The element-wise activation function to apply.
Remarks
This constructor creates a new Layer with the specified input and output shapes and element-wise activation function.
For Beginners: This creates a new layer with a standard activation function.
In addition to the shapes, this also sets up:
- A scalar activation function that processes each value independently
- The foundation for a layer that transforms data in a specific way
For example, you might create a layer with a ReLU activation function, which turns all negative values to zero while keeping positive values.
LayerBase(int[], int[], IVectorActivationFunction<T>)
Initializes a new instance of the LayerBase<T> class with the specified shapes and vector activation function.
protected LayerBase(int[] inputShape, int[] outputShape, IVectorActivationFunction<T> vectorActivation)
Parameters
inputShape (int[]): The shape of the input tensor.
outputShape (int[]): The shape of the output tensor.
vectorActivation (IVectorActivationFunction<T>): The vector activation function to apply.
Remarks
This constructor creates a new Layer with the specified input and output shapes and vector activation function. Vector activation functions operate on entire vectors rather than individual elements.
For Beginners: This creates a new layer with an advanced vector-based activation.
This constructor:
- Sets up the layer's input and output shapes
- Configures a vector activation that processes groups of values together
- Marks the layer as using vector activation
Vector activations like Softmax are important for specific tasks like classification, where outputs need to be interpreted as probabilities.
LayerBase(int[][], int[])
Initializes a new instance of the LayerBase<T> class with multiple input shapes and a specified output shape.
protected LayerBase(int[][] inputShapes, int[] outputShape)
Parameters
inputShapes (int[][]): The shapes of the input tensors.
outputShape (int[]): The shape of the output tensor.
Remarks
This constructor creates a new Layer that accepts multiple inputs with different shapes. This is useful for layers that combine multiple inputs, such as concatenation or addition layers.
For Beginners: This creates a layer that can handle multiple input sources.
When creating a layer that combines different data sources:
- You need to specify the shape of each input source
- The layer needs to know how to handle multiple inputs
- The output shape defines what comes out after combining them
For example, a layer that combines features from images and text would need to know the shape of both the image and text data.
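As a rough sketch (hypothetical class and parameter names), a merge layer could forward both shapes to this constructor:

public class MyMergeLayer<T> : LayerBase<T>
{
    public MyMergeLayer(int[] imageShape, int[] textShape, int[] mergedShape)
        : base(new[] { imageShape, textShape }, mergedShape)
    {
    }

    // Abstract members omitted for brevity.
}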
LayerBase(int[][], int[], IActivationFunction<T>)
Initializes a new instance of the LayerBase<T> class with multiple input shapes, a specified output shape, and an element-wise activation function.
protected LayerBase(int[][] inputShapes, int[] outputShape, IActivationFunction<T> scalarActivation)
Parameters
inputShapes (int[][]): The shapes of the input tensors.
outputShape (int[]): The shape of the output tensor.
scalarActivation (IActivationFunction<T>): The element-wise activation function to apply.
Remarks
This constructor creates a new Layer that accepts multiple inputs with different shapes and applies an element-wise activation function to the output.
For Beginners: This creates a layer that handles multiple inputs and applies a standard activation.
This constructor:
- Sets up the layer to accept multiple input sources
- Defines the shape of the combined output
- Adds a scalar activation function that processes each output value independently
This is useful for creating complex networks that merge data from different sources.
LayerBase(int[][], int[], IVectorActivationFunction<T>)
Initializes a new instance of the LayerBase<T> class with multiple input shapes, a specified output shape, and a vector activation function.
protected LayerBase(int[][] inputShapes, int[] outputShape, IVectorActivationFunction<T> vectorActivation)
Parameters
inputShapes (int[][]): The shapes of the input tensors.
outputShape (int[]): The shape of the output tensor.
vectorActivation (IVectorActivationFunction<T>): The vector activation function to apply.
Remarks
This constructor creates a new Layer that accepts multiple inputs with different shapes and applies a vector activation function to the output.
For Beginners: This creates a layer that handles multiple inputs and applies a vector-based activation.
This constructor:
- Sets up the layer to accept multiple input sources
- Defines the shape of the combined output
- Adds a vector activation function that processes groups of output values together
- Marks the layer as using vector activation
This combines the flexibility of multiple inputs with the power of vector activations.
Fields
BiasParameterName
Standard parameter name for bias tensors.
protected const string BiasParameterName = "bias"
Field Value
InitializationLock
Object used for thread-safe lazy initialization.
protected readonly object InitializationLock
Field Value
IsTrainingMode
Gets or sets a value indicating whether the layer is in training mode.
protected bool IsTrainingMode
Field Value
Remarks
This flag indicates whether the layer is currently in training mode or inference (evaluation) mode. Some layers behave differently during training versus inference, such as Dropout or BatchNormalization.
For Beginners: This tells the layer whether it's currently training or being used for predictions.
This mode flag:
- Affects how certain layers behave
- Can turn on/off special training features
- Helps the network switch between learning and using what it learned
For example, dropout layers randomly turn off neurons during training to improve generalization, but during inference they don't drop anything.
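A derived layer's Forward method might branch on this flag roughly as in the sketch below (hypothetical; ApplyDropoutMask is an assumed helper, not part of this class):

public override Tensor<T> Forward(Tensor<T> input)
{
    if (!IsTrainingMode)
        return input;               // inference: pass values through unchanged
    return ApplyDropoutMask(input); // training: randomly zero out some elements
}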
ParameterGradients
The gradients of the trainable parameters.
protected Vector<T>? ParameterGradients
Field Value
- Vector<T>
Remarks
This vector contains the gradients of all trainable parameters for the layer. These gradients indicate how each parameter should be adjusted during training to reduce the error.
For Beginners: These values show how to adjust the parameters during training.
Parameter gradients:
- Tell the network which direction to change each parameter
- Show how sensitive the error is to each parameter
- Guide the learning process
A larger gradient means a parameter has more influence on the error and needs a bigger adjustment during training.
Parameters
The trainable parameters of this layer.
protected Vector<T> Parameters
Field Value
- Vector<T>
Remarks
This vector contains all trainable parameters for the layer, such as weights and biases. The specific interpretation of these parameters depends on the layer type.
For Beginners: These are the values that the layer learns during training.
Parameters include:
- Weights that determine how important each input is
- Biases that provide a baseline or starting point
- Other learnable values specific to certain layer types
During training, these values are adjusted to make the network's predictions better.
WeightParameterName
Standard parameter name for weight tensors.
protected const string WeightParameterName = "weight"
Field Value
Properties
CanExecuteOnGpu
Gets whether this layer can execute its forward pass on GPU.
public virtual bool CanExecuteOnGpu { get; }
Property Value
Remarks
Returns true when both the layer supports GPU execution AND a GPU engine is currently active. Use this to check at runtime whether GPU forward pass is available.
For Beginners: Check this before calling ForwardGpu. It combines "does the layer have GPU code?" with "is the GPU engine active?"
CanTrainOnGpu
Gets whether this layer can execute GPU training (forward, backward, parameter update).
public virtual bool CanTrainOnGpu { get; }
Property Value
Remarks
Returns true when both the layer supports GPU training AND a GPU engine is currently active.
For Beginners: Check this before attempting GPU-resident training. If false, training will fall back to CPU operations.
Engine
Gets the global execution engine for vector operations.
protected IEngine Engine { get; }
Property Value
- IEngine
InitializationStrategy
Gets or sets the initialization strategy for this layer.
public IInitializationStrategy<T>? InitializationStrategy { get; set; }
Property Value
Remarks
The initialization strategy controls when and how the layer's weights are allocated and initialized. Lazy initialization defers weight allocation until the first forward pass, which significantly speeds up network construction.
For Beginners: This controls when the layer sets up its internal weights.
Lazy initialization:
- Defers weight allocation until the layer is actually used
- Makes network construction much faster
- Useful for tests and when comparing network architectures
Eager initialization:
- Allocates weights immediately at construction time
- Traditional behavior, weights are ready immediately
InputShape
Gets the input shape for this layer.
protected int[] InputShape { get; }
Property Value
- int[]
Remarks
This property contains the shape of the input tensor that the layer expects. For example, a 2D convolutional layer might expect an input shape of [batchSize, channels, height, width].
For Beginners: This defines the shape of data this layer expects to receive.
The input shape:
- Tells the layer how many dimensions the input data has
- Specifies the size of each dimension
- Helps the layer organize its operations properly
For example, if processing images that are 28x28 pixels with 1 color channel, the input shape might be [1, 28, 28] (channels, height, width).
InputShapes
Gets the input shapes for this layer, supporting multiple inputs.
protected int[][] InputShapes { get; }
Property Value
- int[][]
Remarks
This property contains the shapes of all input tensors that the layer expects, for layers that accept multiple inputs (such as merge layers).
For Beginners: This defines the shapes of all input sources for layers that take multiple inputs.
For layers that combine multiple data sources:
- Each input may have a different shape
- This array stores all those shapes
- Helps the layer handle multiple inputs properly
For example, a layer that combines features from two different sources would need to know the shape of each source.
IsInitialized
Gets a value indicating whether this layer has been initialized.
public virtual bool IsInitialized { get; }
Property Value
Remarks
For layers with lazy initialization, this indicates whether the weights have been allocated and initialized. For eager initialization, this is always true after construction.
For Beginners: This tells you if the layer's weights are ready to use.
A value of true means:
- Weights have been allocated
- The layer is ready for forward/backward passes
A value of false means:
- Weights are not yet allocated (lazy initialization)
- The first Forward() call will initialize them
NamedParameterCount
Gets the total number of named parameters.
public virtual int NamedParameterCount { get; }
Property Value
NumOps
Gets the numeric operations provider for type T.
protected INumericOperations<T> NumOps { get; }
Property Value
- INumericOperations<T>
Remarks
This property provides access to numeric operations (like addition, multiplication, etc.) that work with the generic type T. This allows the layer to perform mathematical operations regardless of whether T is float, double, or another numeric type.
For Beginners: This is a toolkit for math operations that works with different number types.
It provides:
- Basic math operations (add, subtract, multiply, etc.)
- Ways to convert between different number formats
- Special math functions needed by neural networks
This allows the layer to work with different types of numbers (float, double, etc.) without needing different code for each type.
OutputShape
Gets the output shape for this layer.
protected int[] OutputShape { get; }
Property Value
- int[]
Remarks
This property contains the shape of the output tensor that the layer produces. For example, a 2D convolutional layer with 16 filters might produce an output shape of [batchSize, 16, height, width].
For Beginners: This defines the shape of data this layer produces as output.
The output shape:
- Tells the next layer what shape of data to expect
- Shows how this layer transforms the data dimensions
- Helps verify the network is structured correctly
For example, if a layer reduces image size from 28x28 to 14x14 and produces 16 feature maps, the output shape might be [16, 14, 14] (channels, height, width).
ParameterCount
Gets the total number of parameters in this layer.
public virtual int ParameterCount { get; }
Property Value
- int
The total number of trainable parameters.
Remarks
This property returns the total number of trainable parameters in the layer. By default, it returns the length of the Parameters vector, but derived classes can override this to calculate the number of parameters differently.
For Beginners: This tells you how many learnable values the layer has.
The parameter count:
- Shows how complex the layer is
- Indicates how many values need to be learned during training
- Can help estimate memory usage and computational requirements
Layers with more parameters can potentially learn more complex patterns but may also require more data to train effectively.
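For example, you could estimate a model's size by summing ParameterCount over its layers (a sketch; the layers collection is assumed):

int totalParameters = layers.Sum(layer => layer.ParameterCount); // requires System.Linq
Console.WriteLine($"Total trainable parameters: {totalParameters:N0}");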
Random
Gets the thread-safe random number generator.
protected static Random Random { get; }
Property Value
Remarks
This property provides access to the centralized thread-safe random number generator, which is used for initializing weights and other parameters that require randomization.
For Beginners: This provides random numbers for initializing the layer.
Random numbers are needed to:
- Set starting values for weights and biases
- Add randomness to avoid symmetry problems
- Help the network learn diverse patterns
Good initialization with proper randomness is important for neural networks to learn effectively.
ScalarActivation
Gets the element-wise activation function for this layer, if specified.
public IActivationFunction<T>? ScalarActivation { get; }
Property Value
Remarks
The scalar activation function applies to individual values in the layer's output tensor. Common activation functions include ReLU, Sigmoid, and Tanh.
For Beginners: This is the function that adds non-linearity to each value individually.
Activation functions:
- Add non-linearity, helping the network learn complex patterns
- Process each number one at a time
- Transform values into more useful ranges (like 0 to 1, or -1 to 1)
For example, ReLU turns all negative values to zero while keeping positive values unchanged. Without activation functions, neural networks couldn't learn complex patterns.
SupportsGpuExecution
Gets whether this layer has a GPU execution implementation for inference.
protected virtual bool SupportsGpuExecution { get; }
Property Value
Remarks
Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.
For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.
SupportsGpuTraining
Gets whether this layer has full GPU training support (forward, backward, and parameter updates).
public virtual bool SupportsGpuTraining { get; }
Property Value
Remarks
This property indicates whether the layer can perform its entire training cycle on GPU without downloading data to CPU. A layer has full GPU training support when:
- ForwardGpu is implemented
- BackwardGpu is implemented
- UpdateParametersGpu is implemented (for layers with trainable parameters)
- GPU weight/bias/gradient buffers are properly managed
For Beginners: This tells you if training can happen entirely on GPU.
GPU-resident training is much faster because:
- Data stays on GPU between forward and backward passes
- No expensive CPU-GPU transfers during each training step
- GPU kernels handle all gradient computation
Only layers that return true here can participate in fully GPU-resident training.
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public abstract bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
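One possible override in a derived layer ties the answer to the configured activation (a sketch, not the required implementation):

public override bool SupportsJitCompilation => CanActivationBeJitted();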
SupportsTraining
Gets a value indicating whether this layer supports training.
public abstract bool SupportsTraining { get; }
Property Value
- bool
true if the layer has trainable parameters and supports backpropagation; otherwise, false.
Remarks
This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer has parameters that can be adjusted during training
- It will improve its performance as it sees more data
- It participates in the learning process
A value of false means:
- The layer doesn't have any adjustable parameters
- It performs the same operation regardless of training
- It doesn't need to learn (but may still be useful)
UseAutodiff
Gets or sets a value indicating whether this layer uses automatic differentiation for backward passes.
public bool UseAutodiff { get; set; }
Property Value
- bool
true if the layer should use autodiff; false if it uses a manual backward implementation. Default is false.
Remarks
This property controls whether the layer uses the automatic differentiation system (autodiff) or manual backward pass implementations during training. Manual backward passes are typically faster but require explicit gradient computation code. Autodiff is more flexible and can be useful for:
- Custom layer implementations where manual gradients are complex
- Research and experimentation with novel architectures
- Rapid prototyping of new layer types
For Beginners: This controls how the layer computes gradients during training.
Two modes are available:
- Manual (default, false): Uses hand-written, optimized gradient code. Faster but requires careful implementation.
- Autodiff (true): Uses automatic differentiation to compute gradients. Slower but more flexible and less error-prone.
Most users should leave this as false (default) for best performance. Set to true only for:
- Custom layers with complex gradients
- Experimental or research purposes
- When you need guaranteed correct gradients for a new operation
Note: Autodiff support must be implemented by the specific layer type. Not all layers support autodiff mode yet.
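Enabling autodiff is a single property assignment (sketch; MyCustomLayer is a hypothetical layer type that supports autodiff mode):

var layer = new MyCustomLayer<double>(inputSize: 32, outputSize: 16);
layer.UseAutodiff = true; // gradients now come from automatic differentiation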
UsingVectorActivation
Gets a value indicating whether this layer uses a vector activation function.
protected bool UsingVectorActivation { get; }
Property Value
Remarks
This property indicates whether the layer is using a vector activation function or an element-wise activation function. It is used to determine which type of activation to apply during forward and backward passes.
For Beginners: This tells the layer which type of activation function to use.
It's like a switch that determines:
- Whether to process values one by one (scalar activation)
- Or to process groups of values together (vector activation)
This helps the layer know which method to use when applying activations.
VectorActivation
Gets the vector activation function for this layer, if specified.
public IVectorActivationFunction<T>? VectorActivation { get; }
Property Value
Remarks
The vector activation function applies to entire vectors in the layer's output tensor. This can capture dependencies between different elements of the vectors, such as in Softmax.
For Beginners: This is a more advanced function that processes groups of values together.
Vector activation functions:
- Process entire groups of numbers together, not just one at a time
- Can capture relationships between different features
- Are used for special purposes like classification (Softmax)
For example, Softmax turns a vector of numbers into probabilities that sum to 1, which is useful for classifying inputs into categories.
Methods
ActivateTensor(IActivationFunction<T>?, Tensor<T>)
Applies a scalar activation function to each element of a tensor.
protected Tensor<T> ActivateTensor(IActivationFunction<T>? activation, Tensor<T> input)
Parameters
activation (IActivationFunction<T>): The scalar activation function to apply.
input (Tensor<T>): The input tensor to activate.
Returns
- Tensor<T>
The activated tensor.
Remarks
This helper method applies a scalar activation function to each element of a tensor. If the activation function is null, it returns the input tensor unchanged.
For Beginners: This method applies an activation function to each value in a tensor.
Activation functions:
- Transform values in specific ways (like sigmoid squeezes values between 0 and 1)
- Add non-linearity, which helps neural networks learn complex patterns
- Are applied individually to each number in the data
If no activation function is provided, the values pass through unchanged.
ActivateTensor(IVectorActivationFunction<T>?, Tensor<T>)
Applies a vector activation function to a tensor.
protected Tensor<T> ActivateTensor(IVectorActivationFunction<T>? activation, Tensor<T> input)
Parameters
activation (IVectorActivationFunction<T>): The vector activation function to apply.
input (Tensor<T>): The input tensor to activate.
Returns
- Tensor<T>
The activated tensor.
Remarks
This helper method applies a vector activation function to a tensor. If the activation function is null, it returns the input tensor unchanged. Vector activation functions operate on entire tensors at once, which can be more efficient than element-wise operations.
For Beginners: This method applies an activation function to an entire tensor at once.
Vector activation functions:
- Process entire groups of values simultaneously
- Can be more efficient than processing one value at a time
- Provide the same mathematical result but often faster
If no activation function is provided, the values pass through unchanged.
ApplyActivation(Tensor<T>)
Applies the activation function to a tensor.
protected Tensor<T> ApplyActivation(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to activate.
Returns
- Tensor<T>
The activated tensor.
ApplyActivation(Vector<T>)
Applies the activation function to a vector.
protected Vector<T> ApplyActivation(Vector<T> input)
Parameters
input (Vector<T>): The input vector to activate.
Returns
- Vector<T>
The activated vector.
Remarks
This method applies the layer's activation function to a vector. It uses the vector activation function if one is specified, or applies the scalar activation function element-wise if no vector activation is available.
For Beginners: This method applies the activation function to a vector of values.
This method:
- First checks if a vector activation function is available (processes all elements together)
- If not, uses the scalar activation function (processes each element independently)
- If neither is available, returns the input unchanged (identity function)
This flexibility allows the layer to use the most appropriate activation method based on what was specified during creation.
ApplyActivationBackwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer?, IGpuBuffer?, IGpuBuffer, int)
Applies the layer's activation function backward pass on GPU using the activation's own GPU method.
protected bool ApplyActivationBackwardGpu(IDirectGpuBackend backend, IGpuBuffer gradOutput, IGpuBuffer? input, IGpuBuffer? output, IGpuBuffer gradInput, int size)
Parameters
backend (IDirectGpuBackend): The GPU backend to use for execution.
gradOutput (IGpuBuffer): The gradient flowing back from the next layer.
input (IGpuBuffer): The input buffer from the forward pass (needed for some activations).
output (IGpuBuffer): The output buffer from the forward pass (needed for some activations).
gradInput (IGpuBuffer): The buffer to store the input gradient.
size (int): The number of elements to process.
Returns
- bool
True if the backward pass was applied on GPU; false if no activation or GPU not supported.
Remarks
This method follows the Open/Closed Principle by delegating to the activation function's own GPU backward implementation. Each activation function knows what it needs:
- ReLU, GELU, Swish, LeakyReLU, SiLU, Mish, etc.: need the input from the forward pass
- Sigmoid, Tanh: need the output from the forward pass
- ELU: needs both the input and the output from the forward pass
For Beginners: During training, we need to compute how the activation affects the gradients. Each activation function handles this differently, and by delegating to the activation's BackwardGpu method, we don't need to know the details here.
ApplyActivationDerivative(Tensor<T>, Tensor<T>)
Applies the derivative of the activation function to a tensor.
protected Tensor<T> ApplyActivationDerivative(Tensor<T> input, Tensor<T> outputGradient)
Parameters
input (Tensor<T>): The input tensor.
outputGradient (Tensor<T>): The output gradient tensor.
Returns
- Tensor<T>
The input gradient tensor after applying the activation derivative.
Remarks
This method applies the derivative of the layer's activation function to a tensor during the backward pass. It multiplies the derivative of the activation function at each point in the input tensor by the corresponding output gradient.
For Beginners: This calculates how small changes in values affect the output.
During backpropagation:
- This method handles tensors (multi-dimensional arrays of values)
- It applies the correct derivative calculation based on the activation type
- For vector activations, it uses the specialized derivative method
- For scalar activations, it applies the derivative to each value independently
This is a key part of the math that allows neural networks to learn through backpropagation.
Exceptions
- ArgumentException
Thrown when the input and output gradient tensors have different ranks.
ApplyActivationDerivative(Vector<T>, Vector<T>)
Applies the derivative of the activation function to a vector.
protected Vector<T> ApplyActivationDerivative(Vector<T> input, Vector<T> outputGradient)
Parameters
input (Vector<T>): The input vector.
outputGradient (Vector<T>): The output gradient vector.
Returns
- Vector<T>
The input gradient vector after applying the activation derivative.
Remarks
This method applies the derivative of the activation function to a vector during the backward pass. It computes the Jacobian matrix of the activation function and multiplies it by the output gradient.
For Beginners: This calculates how changes in a vector of values affect the output.
For vector operations:
- This method computes the full matrix of relationships between inputs and outputs
- It then multiplies this matrix by the incoming gradient
- The result shows how each input value should be adjusted
This is a more comprehensive approach than the element-wise method, accounting for cases where each output depends on multiple inputs.
ApplyActivationDerivative(T, T)
Applies the derivative of the activation function to a single value.
protected T ApplyActivationDerivative(T input, T outputGradient)
Parameters
input (T): The input value.
outputGradient (T): The output gradient.
Returns
- T
The input gradient after applying the activation derivative.
Remarks
This method applies the derivative of the layer's activation function to a single value during the backward pass. It multiplies the derivative of the activation function at the input value by the output gradient.
For Beginners: This calculates how a small change in one value affects the output.
During backpropagation:
- We need to know how sensitive each value is to changes
- This method calculates that sensitivity for a single value
- It multiplies the activation derivative by the incoming gradient
This helps determine how much each individual value should be adjusted during training.
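Conceptually, this overload computes inputGradient = f'(input) * outputGradient for the layer's activation f. Inside a derived layer's backward code it might be used per element, roughly as follows (variable names are illustrative):

T inputGradient = ApplyActivationDerivative(preActivationValue, incomingGradient);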
ApplyActivationForwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int)
Applies the layer's activation function forward pass on GPU using the activation's own GPU method.
protected bool ApplyActivationForwardGpu(IDirectGpuBackend backend, IGpuBuffer input, IGpuBuffer output, int size)
Parameters
backend (IDirectGpuBackend): The GPU backend to use for execution.
input (IGpuBuffer): The input GPU buffer.
output (IGpuBuffer): The output GPU buffer.
size (int): The number of elements to process.
Returns
- bool
True if the activation was applied on GPU; false if no activation or GPU not supported.
Remarks
This method follows the Open/Closed Principle by delegating to the activation function's own GPU implementation rather than using a switch statement on activation types. Each activation function knows how to apply itself on GPU.
For Beginners: Instead of having one giant switch statement that handles every possible activation type, each activation function has its own ForwardGpu method. This makes it easy to add new activation functions without modifying this code.
ApplyActivationToGraph(ComputationNode<T>)
Applies the layer's configured activation function to a computation graph node.
protected ComputationNode<T> ApplyActivationToGraph(ComputationNode<T> input)
Parameters
inputComputationNode<T>The computation node to apply activation to.
Returns
- ComputationNode<T>
The computation node with activation applied.
Remarks
This helper method delegates to the activation's ApplyToGraph method, following the Open/Closed Principle. Adding new activations does not require modifying layer code.
For Beginners: This method adds the activation function to the computation graph.
Instead of the layer code checking what type of activation is configured (which would require changing the layer every time a new activation is added), this method simply asks the activation to add itself to the graph. This makes the code more maintainable and extensible.
Exceptions
- ArgumentNullException
Thrown if input is null.
- NotSupportedException
Thrown if activation does not support JIT.
ApplyGpuActivation(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int, FusedActivationType)
Applies the specified activation function on GPU using the direct backend operations.
protected static void ApplyGpuActivation(IDirectGpuBackend backend, IGpuBuffer input, IGpuBuffer output, int size, FusedActivationType activation)
Parameters
backend (IDirectGpuBackend): The GPU backend to use for activation.
input (IGpuBuffer): The input GPU buffer.
output (IGpuBuffer): The output GPU buffer.
size (int): The number of elements to process.
activation (FusedActivationType): The type of activation function to apply.
Remarks
This method is primarily used for fused kernel operations where the activation type is specified via the AiDotNet.Tensors.Engines.FusedActivationType enum. It maps enum values to the corresponding backend activation kernels.
Note: For new code, prefer using ApplyActivationForwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer, int) which follows the Open/Closed Principle by delegating to each activation function's own GPU implementation. This allows new activation functions to be added without modifying this switch statement.
This static method only supports common activations (ReLU, Sigmoid, Tanh, GELU, LeakyReLU, Swish). For other activations, use the OCP-compliant method instead.
ApplyGpuActivationBackward(IDirectGpuBackend, IGpuBuffer, IGpuBuffer?, IGpuBuffer?, IGpuBuffer, int, FusedActivationType, float)
Applies the backward pass of the specified activation function on GPU.
protected static bool ApplyGpuActivationBackward(IDirectGpuBackend backend, IGpuBuffer gradOutput, IGpuBuffer? input, IGpuBuffer? output, IGpuBuffer gradInput, int size, FusedActivationType activation, float alpha = 0.01)
Parameters
backend (IDirectGpuBackend): The GPU backend to use for activation backward.
gradOutput (IGpuBuffer): The gradient from the next layer.
input (IGpuBuffer): The input from the forward pass (needed for ReLU, LeakyReLU, GELU, Swish).
output (IGpuBuffer): The output from the forward pass (needed for Sigmoid, Tanh).
gradInput (IGpuBuffer): The buffer to store the input gradient.
size (int): The number of elements to process.
activation (FusedActivationType): The type of activation function.
alpha (float): Alpha parameter for LeakyReLU (default 0.01).
Returns
- bool
True if the backward was handled on GPU, false if CPU fallback is needed.
Remarks
This method is primarily used for fused kernel operations where the activation type is specified via the AiDotNet.Tensors.Engines.FusedActivationType enum.
Note: For new code, prefer using ApplyActivationBackwardGpu(IDirectGpuBackend, IGpuBuffer, IGpuBuffer?, IGpuBuffer?, IGpuBuffer, int) which follows the Open/Closed Principle by delegating to each activation function's own GPU backward implementation. This allows new activation functions to be added without modifying this switch statement.
Different activation functions require different cached values from forward pass:
- ReLU, LeakyReLU, GELU, Swish: Need the input from forward pass
- Sigmoid, Tanh: Need the output from forward pass
Backward(Tensor<T>)
Performs the backward pass of the layer.
public abstract Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This abstract method must be implemented by derived classes to define the backward pass of the layer. The backward pass propagates error gradients from the output of the layer back to its input, and calculates gradients for any trainable parameters.
For Beginners: This method is used during training to calculate how the layer's input should change to reduce errors.
During the backward pass:
- The layer receives information about how its output contributed to errors
- It calculates how its parameters should change to reduce errors
- It calculates how its input should change, which will be used by earlier layers
This is the core of how neural networks learn from their mistakes during training.
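Putting Forward and Backward together, a manual training step over a stack of layers looks roughly like the sketch below (ComputeLossGradient is a hypothetical helper and the layers collection is assumed; parameter updates are omitted):

Tensor<T> activation = inputBatch;
foreach (var layer in layers)
    activation = layer.Forward(activation);

Tensor<T> gradient = ComputeLossGradient(activation, targets); // hypothetical loss-gradient helper
for (int i = layers.Count - 1; i >= 0; i--)
    gradient = layers[i].Backward(gradient); // propagate gradients back through each layer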
BackwardGpu(IGpuTensor<T>)
Performs the backward pass of the layer on GPU.
public virtual IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The GPU-resident gradient of the loss with respect to the layer's output.
Returns
- IGpuTensor<T>
The GPU-resident gradient of the loss with respect to the layer's input.
Remarks
This method performs the layer's backward computation entirely on GPU, including:
- Computing input gradients to pass to previous layers
- Computing and storing weight gradients on GPU (for layers with trainable parameters)
- Computing and storing bias gradients on GPU
For Beginners: This is like Backward() but runs entirely on GPU.
During GPU training:
- Output gradients come in (on GPU)
- Input gradients are computed (stay on GPU)
- Weight/bias gradients are computed and stored (on GPU)
- Input gradients are returned for the previous layer
All data stays on GPU - no CPU round-trips needed!
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU training.
CalculateInputShape(int, int, int)
Calculates a standard input shape for 2D data with batch size of 1.
protected static int[] CalculateInputShape(int inputDepth, int height, int width)
Parameters
inputDepth (int): The depth (number of channels) of the input.
height (int): The height of the input.
width (int): The width of the input.
Returns
- int[]
An array representing the input shape [batch, depth, height, width].
Remarks
This helper method calculates a standard input shape for 2D data (like images) with a batch size of 1. The shape follows the NCHW (batch, channels, height, width) format.
For Beginners: This method creates a standard shape for image-like data.
When working with images or similar 2D data:
- This creates a standard shape array in the format [batch, channels, height, width]
- The batch dimension is set to 1 (processing one item at a time)
- The other dimensions come from the parameters
For example, for a 28x28 grayscale image, you might use inputDepth=1, height=28, width=28, resulting in a shape of [1, 1, 28, 28].
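In code, the 28x28 grayscale case above would be written like this (a sketch; the method is protected, so it is called from within a derived layer):

int[] inputShape = CalculateInputShape(inputDepth: 1, height: 28, width: 28);
// inputShape is [1, 1, 28, 28] in NCHW order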
CalculateOutputShape(int, int, int)
Calculates a standard output shape for 2D data with batch size of 1.
protected static int[] CalculateOutputShape(int outputDepth, int outputHeight, int outputWidth)
Parameters
outputDepth (int): The depth (number of channels) of the output.
outputHeight (int): The height of the output.
outputWidth (int): The width of the output.
Returns
- int[]
An array representing the output shape [batch, depth, height, width].
Remarks
This helper method calculates a standard output shape for 2D data (like images) with a batch size of 1. The shape follows the NCHW (batch, channels, height, width) format.
For Beginners: This method creates a standard shape for image-like output data.
When defining the output shape for 2D data:
- This creates a standard shape array in the format [batch, channels, height, width]
- The batch dimension is set to 1 (producing one output at a time)
- The other dimensions come from the parameters
For example, if a convolutional layer produces 16 feature maps of size 14x14, you might use outputDepth=16, outputHeight=14, outputWidth=14.
CanActivationBeJitted()
Checks if the layer's current activation function supports JIT compilation.
protected bool CanActivationBeJitted()
Returns
- bool
True if the activation can be JIT compiled, false otherwise.
Remarks
This method checks whether the layer's configured activation function supports JIT compilation by querying the activation's SupportsJitCompilation property. If no activation is configured, returns true (identity function is always JIT-compatible).
For Beginners: This method checks if the activation is ready for JIT compilation.
The layer uses this to determine if it can export a computation graph for faster inference. If the activation does not support JIT yet (because gradients are not implemented), the layer will fall back to the standard execution path.
ClearGradients()
Clears all parameter gradients in this layer.
public virtual void ClearGradients()
Remarks
This method sets all parameter gradients to zero. This is typically called at the beginning of each batch during training to ensure that gradients from previous batches don't affect the current batch.
For Beginners: This method resets all adjustment values to zero to start fresh.
Clearing gradients:
- Erases all previous adjustment information
- Prepares the layer for a new training batch
- Prevents old adjustments from interfering with new ones
This is typically done at the start of processing each batch of training data to ensure clean, accurate gradient calculations.
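A typical training loop resets gradients at the start of each batch, roughly as follows (a sketch; the layers collection is assumed):

foreach (var layer in layers)
    layer.ClearGradients(); // start the batch with zeroed gradients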
Clone()
Creates a copy of this layer.
public virtual LayerBase<T> Clone()
Returns
- LayerBase<T>
A new instance of the layer with the same configuration.
Remarks
This method creates a shallow copy of the layer with deep copies of the input/output shapes and activation functions. Derived classes should override this method to properly copy any additional fields they define.
For Beginners: This method creates a duplicate of this layer.
When copying a layer:
- Basic properties like shapes are duplicated
- Activation functions are cloned
- The new layer works independently from the original
This is useful for:
- Creating similar layers with small variations
- Implementing complex network architectures with repeated patterns
- Saving a layer's state before making changes
ComputeActivationJacobian(Vector<T>)
Computes the Jacobian matrix of the activation function for a given input vector.
protected Matrix<T> ComputeActivationJacobian(Vector<T> input)
Parameters
input (Vector<T>): The input vector.
Returns
- Matrix<T>
The Jacobian matrix of the activation function at the input.
Remarks
This method computes the Jacobian matrix of the activation function, which represents how each output element changes with respect to each input element. For vector activation functions, it uses the function's derivative method. For scalar activation functions, it creates a diagonal matrix with the derivatives.
For Beginners: This calculates a matrix that shows how changes in inputs affect outputs.
The Jacobian matrix:
- Shows how each output value depends on each input value
- For scalar activations, it's a diagonal matrix (each output depends only on the corresponding input)
- For vector activations, it can have off-diagonal elements (outputs depend on multiple inputs)
This is an advanced concept used in certain optimization techniques and for precise gradient calculations.
DerivativeTensor(IActivationFunction<T>?, Tensor<T>)
Calculates the derivative of a scalar activation function for each element of a tensor.
protected Tensor<T> DerivativeTensor(IActivationFunction<T>? activation, Tensor<T> input)
Parameters
activation (IActivationFunction<T>): The scalar activation function.
input (Tensor<T>): The input tensor.
Returns
- Tensor<T>
A tensor containing the derivatives.
Remarks
This helper method calculates the derivative of a scalar activation function for each element of a tensor. If the activation function is null, it returns a tensor filled with ones, representing the derivative of the identity function.
For Beginners: This method calculates how sensitive each value is to changes.
The derivative:
- Measures how much the output changes when the input changes slightly
- Is essential for the backpropagation algorithm during training
- Helps determine how to adjust weights to reduce errors
If no activation function is provided, it assumes the identity function (y = x), which has a derivative of 1 everywhere.
Deserialize(BinaryReader)
Deserializes the layer's parameters from a binary reader.
public virtual void Deserialize(BinaryReader reader)
Parameters
reader (BinaryReader): The binary reader to read from.
Remarks
This method reads the layer's parameters from a binary reader, which can be used to load the layer's state from a file or other storage medium. It reads the parameter count followed by each parameter value.
For Beginners: This method loads the layer's learned values from storage.
When deserializing a layer:
- The number of parameters is read first
- Then each parameter value is read
- All values are converted from doubles to the appropriate numeric type
This allows you to load a previously trained layer without having to retrain it from scratch.
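Loading a saved layer might look like the sketch below (the file name is illustrative, and the data is assumed to have been written by the matching serialization routine):

using var stream = File.OpenRead("layer.bin"); // requires System.IO
using var reader = new BinaryReader(stream);
layer.Deserialize(reader);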
Dispose()
Releases all resources used by this layer, including any GPU resources.
public void Dispose()
Remarks
This method releases any resources allocated by the layer, including GPU memory for persistent tensors. All layers that allocate resources should override Dispose(bool) to properly release them.
For Beginners: GPU memory is limited and precious.
When you're done with a layer:
- Call Dispose() or use a 'using' statement
- This frees up GPU memory for other operations
- Failing to dispose can cause memory leaks
Example:
using var layer = new DenseLayer<float>(784, 128);
// ... use layer ...
// Automatically disposed when out of scope
Dispose(bool)
Releases resources used by this layer.
protected virtual void Dispose(bool disposing)
Parameters
disposing (bool): True if called from Dispose(), false if called from finalizer.
Remarks
Override this method in derived classes to release layer-specific resources. Always call base.Dispose(disposing) after releasing your resources.
For Beginners: When creating a custom layer with resources:
protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        // Release your managed resources here
        _myGpuHandle?.Dispose();
        _myGpuHandle = null;
    }
    base.Dispose(disposing);
}
DownloadWeightsFromGpu()
Downloads the layer's weights and biases from GPU memory back to CPU.
public virtual void DownloadWeightsFromGpu()
Remarks
Call this after GPU training to sync weights back to CPU for:
- Model checkpointing / saving
- CPU inference
- Inspection of trained weights
For Beginners: This copies learned values back from GPU to CPU.
During GPU training, weights are modified on GPU and the CPU copy is stale. Call this to:
- Save the model to disk
- Switch to CPU inference
- Examine what the layer learned
This is relatively expensive, so only do it when necessary (not every batch).
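For example, after finishing GPU training you might sync everything back before saving (a sketch; the layers collection is assumed):

foreach (var layer in layers)
{
    if (layer.SupportsGpuTraining)
        layer.DownloadWeightsFromGpu(); // refresh the CPU copy of the learned weights
}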
EnsureInitialized()
Ensures that the layer is initialized. Call this at the start of Forward() for lazy initialization.
protected virtual void EnsureInitialized()
Remarks
For layers that support lazy initialization, this method should be called at the start of Forward() to ensure weights are allocated before use. The default implementation does nothing (for layers without lazy initialization support).
For Beginners: This makes sure the layer is ready before processing data.
For lazy initialization:
- First call allocates and initializes weights
- Subsequent calls do nothing (weights already initialized)
- Thread-safe for parallel execution
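A lazily initialized layer would typically call it first thing in its Forward override, roughly as in this sketch (Compute is a hypothetical layer-specific helper):

public override Tensor<T> Forward(Tensor<T> input)
{
    EnsureInitialized();          // allocates and initializes weights on the first call only
    var output = Compute(input);  // hypothetical layer-specific transformation
    return ApplyActivation(output);
}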
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public abstract ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
Forward(Tensor<T>)
Performs the forward pass of the layer.
public abstract Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after processing.
Remarks
This abstract method must be implemented by derived classes to define the forward pass of the layer. The forward pass transforms the input tensor according to the layer's operation and activation function.
For Beginners: This method processes your data through the layer.
The forward pass:
- Takes input data from the previous layer or the network input
- Applies the layer's specific transformation (like convolution or matrix multiplication)
- Applies any activation function
- Passes the result to the next layer
This is where the actual data processing happens during both training and prediction.
Forward(params Tensor<T>[])
Performs the forward pass of the layer with multiple input tensors.
public virtual Tensor<T> Forward(params Tensor<T>[] inputs)
Parameters
inputs (Tensor<T>[]): The input tensors to process.
Returns
- Tensor<T>
The output tensor after processing.
Remarks
This method implements a default forward pass for layers that accept multiple inputs. By default, it concatenates the inputs along the channel dimension. Derived classes can override this method to implement more specific behavior for multiple inputs.
For Beginners: This method handles processing multiple inputs through the layer.
When a layer needs to combine multiple data sources:
- This method takes all the input tensors
- By default, it combines them by stacking them along the channel dimension
- It checks that the inputs are compatible (same shape except for channels)
- It then passes the combined data forward
For example, if combining features from two sources each with 10 channels, this would create a tensor with 20 channels by default.
Specialized layers can override this to combine inputs in different ways.
Exceptions
- ArgumentException
Thrown when no input tensors are provided or when input tensors have incompatible shapes.
ForwardGpu(params IGpuTensor<T>[])
Performs the forward pass of the layer on GPU.
public virtual IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): The GPU-resident input tensor(s).
Returns
- IGpuTensor<T>
The GPU-resident output tensor.
Remarks
This method performs the layer's forward computation entirely on GPU. The input and output tensors remain in GPU memory, avoiding expensive CPU-GPU transfers.
For Beginners: This is like Forward() but runs on the graphics card.
The key difference:
- Forward() uses CPU tensors that may be copied to/from GPU
- ForwardGpu() keeps everything on GPU the whole time
Override this in derived classes that support GPU acceleration.
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU execution.
GetActivationTypes()
Gets the types of activation functions used by this layer.
public virtual IEnumerable<ActivationFunction> GetActivationTypes()
Returns
- IEnumerable<ActivationFunction>
An enumerable of activation function types.
Remarks
This method returns the types of activation functions used by this layer. This is useful for serialization and debugging purposes.
For Beginners: This method tells you what kinds of activation functions the layer uses.
This information:
- Helps identify what non-linearities are applied in the layer
- Is useful for saving/loading models
- Helps with debugging and visualization
The information is returned as standardized activation types (like ReLU, Sigmoid, etc.) rather than the actual function objects.
GetBiases()
Gets the bias tensor for layers that have trainable biases.
public virtual Tensor<T>? GetBiases()
Returns
- Tensor<T>
The bias tensor, or null if the layer has no biases.
Remarks
This method provides access to the layer's bias tensor for layers that use biases during computation. Layers without biases return null.
For Beginners: Biases are learnable offsets added to the layer's output.
Think of biases as a starting point:
- Without bias: output = weights × input
- With bias: output = weights × input + bias
Biases help the network learn more flexible patterns by shifting the activation function.
GetDiagnostics()
Gets diagnostic information about this layer's state and behavior.
public virtual Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics for this layer. Base implementation provides common metrics like layer type, input/output shapes, and parameter count. Derived classes can override this method to add layer-specific diagnostics.
Remarks
The base implementation provides the following diagnostics:
- layer.type: The concrete type name of the layer
- layer.input_shape: The shape of input tensors
- layer.output_shape: The shape of output tensors
- layer.parameter_count: The total number of trainable parameters
- layer.supports_training: Whether the layer has trainable parameters
- layer.activation: The activation function type, if any
For Beginners: This method returns a report card with useful information about the layer.
The diagnostics help you understand:
- What type of layer this is (Dense, Convolutional, etc.)
- What size of data it expects (input shape)
- What size of data it produces (output shape)
- How many parameters it's learning
- What activation function it uses
Derived classes (specific layer types) can add more detailed information:
- Attention layers might report attention weights statistics
- Batch normalization layers might report running mean/variance
- Dropout layers might report dropout rate
Example usage:
var diagnostics = layer.GetDiagnostics();
foreach (var (key, value) in diagnostics)
{
    Console.WriteLine($"{key}: {value}");
}
Override Guidelines: When overriding in derived classes:
- Call base.GetDiagnostics() first to get common metrics
- Add your layer-specific diagnostics to the returned dictionary
- Use consistent key naming (e.g., "activation.mean", "gradient.norm")
- Provide human-readable string values
- Keep computations lightweight to avoid impacting performance
Example override:
public override Dictionary<string, string> GetDiagnostics()
{
    var diagnostics = base.GetDiagnostics();
    if (_lastActivations != null)
    {
        diagnostics["activation.mean"] = ComputeMean(_lastActivations).ToString();
        diagnostics["activation.std"] = ComputeStd(_lastActivations).ToString();
    }
    return diagnostics;
}
GetFusedActivationType()
Gets the fused activation type for IEngine fused operations.
protected FusedActivationType GetFusedActivationType()
Returns
- FusedActivationType
The FusedActivationType enum value for the current activation function.
Remarks
This method maps the layer's activation function to a FusedActivationType enum value, allowing IEngine to use optimized fused GPU kernels (e.g., GEMM+Bias+ReLU in one kernel).
For Beginners: GPU operations are faster when combined. Instead of doing MatMul, then adding bias, then applying ReLU as separate steps, fused operations do all three in one GPU kernel - this is 20-50% faster. This method tells the GPU which activation to fuse with other operations.
Supported Activations:
- ReLU → FusedActivationType.ReLU
- Sigmoid → FusedActivationType.Sigmoid
- Tanh → FusedActivationType.Tanh
- GELU → FusedActivationType.GELU
- LeakyReLU → FusedActivationType.LeakyReLU
- Swish/SiLU → FusedActivationType.Swish
- Other/None → FusedActivationType.None (activation applied separately)
GetInputShape()
Gets the input shape for this layer.
public virtual int[] GetInputShape()
Returns
- int[]
The input shape as an array of integers.
Remarks
This method returns the input shape of the layer. If the layer has multiple input shapes, it returns the first one.
For Beginners: This method tells you what shape of data the layer expects.
The input shape:
- Shows the dimensions of data this layer processes
- Is needed to connect this layer with previous layers
- Helps verify the network structure is correct
For layers with multiple inputs, this returns just the first input shape.
GetInputShapes()
Gets all input shapes for this layer.
public virtual int[][] GetInputShapes()
Returns
- int[][]
An array of input shapes.
Remarks
This method returns all input shapes of the layer. This is particularly useful for layers that accept multiple inputs with different shapes.
For Beginners: This method tells you the shapes of all data sources this layer can accept.
For layers that combine multiple inputs:
- This returns all the input shapes in an array
- Each shape defines the dimensions of one input source
- Helpful for understanding complex network connections
This is most useful for layers like concatenation or merge layers.
GetOutputShape()
Gets the output shape for this layer.
public int[] GetOutputShape()
Returns
- int[]
The output shape as an array of integers.
Remarks
This method returns the output shape of the layer, which defines the dimensions of the tensor that will be produced when data flows through this layer.
For Beginners: This method tells you what shape of data the layer produces.
The output shape:
- Shows the dimensions of data after this layer processes it
- Is needed to connect this layer with the next layer
- Helps verify that data flows correctly through the network
For example, a convolutional layer might change the number of channels in the data, which would be reflected in the output shape.
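For example, you can verify that two layers connect correctly by comparing the documented shape accessors (a small sketch; firstLayer and secondLayer are placeholders for your own layers):
// Requires System.Linq for SequenceEqual.
var producedShape = firstLayer.GetOutputShape();
var expectedShape = secondLayer.GetInputShape();
if (!producedShape.SequenceEqual(expectedShape))
{
    throw new InvalidOperationException(
        $"Shape mismatch: [{string.Join(", ", producedShape)}] vs [{string.Join(", ", expectedShape)}]");
}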
GetParameterGradients()
Gets the gradients of all trainable parameters in this layer.
public virtual Vector<T> GetParameterGradients()
Returns
- Vector<T>
A vector containing the gradients of all trainable parameters.
Remarks
This method returns the gradients of all trainable parameters in the layer. If the gradients haven't been calculated yet, it initializes a new vector of the appropriate size.
For Beginners: This method provides the current adjustment values for all parameters.
The parameter gradients:
- Show how each parameter should be adjusted during training
- Are calculated during the backward pass
- Guide the optimization process
These gradients are usually passed to an optimizer like SGD or Adam, which uses them to update the parameters in a way that reduces errors.
GetParameterNames()
Gets all parameter names in this layer.
public virtual IEnumerable<string> GetParameterNames()
Returns
- IEnumerable<string>
A collection of parameter names ("weight", "bias", or both depending on layer type).
Remarks
The default implementation returns "weight" and/or "bias" based on whether GetWeights() and GetBiases() return non-null values.
GetParameterShape(string)
Gets the expected shape for a parameter.
public virtual int[]? GetParameterShape(string name)
Parameters
namestringThe parameter name ("weight" or "bias").
Returns
- int[]
The expected shape, or null if the parameter doesn't exist.
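A small sketch that lists every named parameter together with its expected shape, using only GetParameterNames and GetParameterShape:
foreach (var name in layer.GetParameterNames())
{
    var shape = layer.GetParameterShape(name);
    Console.WriteLine($"{name}: [{string.Join(", ", shape ?? Array.Empty<int>())}]");
}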
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public abstract Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This abstract method must be implemented by derived classes to provide access to all trainable parameters of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
The parameters:
- Are the numbers that the neural network learns during training
- Include weights, biases, and other learnable values
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
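For example, a simple snapshot-and-restore of a layer's parameters might pair GetParameters with SetParameters (a sketch; if the layer mutates the returned vector in place, clone it before experimenting):
var snapshot = layer.GetParameters();   // capture the current learnable values
// ... experiment with the layer, train it further, etc. ...
layer.SetParameters(snapshot);          // restore the captured values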
GetWeights()
Gets the weight tensor for layers that have trainable weights.
public virtual Tensor<T>? GetWeights()
Returns
- Tensor<T>
The weight tensor, or null if the layer has no weights.
Remarks
This method provides access to the layer's weight tensor for layers that use weights during computation. Layers without weights (like pooling or activation layers) return null.
For Beginners: Weights are the learnable parameters that define how a layer transforms data.
For example:
- Dense layers use a weight matrix to transform inputs
- Convolutional layers use filters (which are weights) to detect patterns
- Pooling layers have no weights, so they return null
This method lets you inspect or modify the weights after training.
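A short sketch that inspects the weights after training, using GetParameterShape for the dimensions so no undocumented tensor members are needed:
var weights = layer.GetWeights();
if (weights is null)
{
    Console.WriteLine("This layer has no trainable weights (e.g., a pooling or activation layer).");
}
else
{
    var shape = layer.GetParameterShape("weight");
    Console.WriteLine($"weight shape: [{string.Join(", ", shape ?? Array.Empty<int>())}]");
}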
HasGpuActivation()
Checks if the layer's scalar activation function supports GPU training.
protected bool HasGpuActivation()
Returns
- bool
True if the activation function has GPU kernels; false otherwise.
Remarks
For Beginners: Not all activation functions have GPU implementations yet. This method checks whether the layer's activation can run entirely on the GPU. If false, the layer must fall back to CPU computation for the activation.
InvalidateTrainableParameter(Tensor<T>)
Notifies the engine that a registered persistent tensor's data has changed.
protected void InvalidateTrainableParameter(Tensor<T> tensor)
Parameters
tensorTensor<T>The tensor whose data has been modified.
Remarks
Call this method after modifying a registered tensor's data (e.g., during parameter updates). The engine will re-upload the data to GPU on the next operation that uses the tensor.
For Beginners: When you change the values in a registered tensor (like updating weights during training), you need to tell the GPU that the copy it has is outdated. This method does that - it tells the GPU "hey, this data changed, please get a fresh copy."
Usage Pattern:
Call after UpdateParameters modifies weights:
public override void UpdateParameters(T learningRate)
{
    // Update weights using gradients
    _weights = _weights.Subtract(_weightGradients.Multiply(learningRate));
    // Notify engine that GPU copy is stale
    InvalidateTrainableParameter(_weights);
}
LoadWeights(Dictionary<string, Tensor<T>>, Func<string, string?>?, bool)
Loads weights from a dictionary of tensors using optional name mapping.
public virtual WeightLoadResult LoadWeights(Dictionary<string, Tensor<T>> weights, Func<string, string?>? mapping = null, bool strict = false)
Parameters
weightsDictionary<string, Tensor<T>>Dictionary of weight name to tensor.
mappingFunc<string, string>Optional function to map source names to target names.
strictboolIf true, fails when any mapped weight fails to load.
Returns
- WeightLoadResult
Load result with statistics.
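A sketch of loading externally exported weights whose names differ from this layer's; the "kernel" to "weight" mapping and the pretrainedTensors dictionary are purely illustrative:
var result = layer.LoadWeights(
    pretrainedTensors,                             // Dictionary<string, Tensor<T>> from another model or file
    name => name == "kernel" ? "weight" : name,    // map the source naming convention onto "weight"/"bias"
    strict: false);                                // tolerate weights that cannot be matched instead of failing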
MapActivationToFused()
Maps the layer's activation function to an AiDotNet.Tensors.Engines.FusedActivationType for GPU-fused operations.
protected FusedActivationType MapActivationToFused()
Returns
- FusedActivationType
The corresponding AiDotNet.Tensors.Engines.FusedActivationType for the layer's activation function, or AiDotNet.Tensors.Engines.FusedActivationType.None if no activation is configured or the activation type is not supported for GPU fusion.
Remarks
This method is used by GPU-optimized layers to determine which fused activation kernel to use. Fused operations combine matrix multiplication, bias addition, and activation into a single GPU kernel, reducing memory bandwidth and improving performance.
For Beginners: When running on a GPU, combining multiple operations (like matrix multiply and activation) into one step is faster than doing them separately. This method tells the GPU which activation function to include in the combined operation.
RegisterTrainableParameter(Tensor<T>, PersistentTensorRole)
Registers a trainable parameter tensor with the engine for GPU memory optimization.
protected void RegisterTrainableParameter(Tensor<T> tensor, PersistentTensorRole role)
Parameters
tensorTensor<T>The tensor to register (typically weights or biases).
rolePersistentTensorRoleThe role of the tensor for optimization hints.
Remarks
This method hints to the engine that the tensor will be reused across many operations and should be kept resident in GPU memory when a GPU engine is active. This avoids expensive CPU-GPU data transfers on every forward pass.
Performance Impact:
Without registration: Layer weights (e.g., 285MB for a large Dense layer) are transferred to GPU on every forward pass.
With registration: Weights are transferred once and cached on GPU. Only activations (much smaller) are transferred per pass. Expected speedup: 100-1000x for large layers.
For Beginners: This method tells the GPU to keep certain data (like learned weights) in its fast memory instead of copying it back and forth every time. Think of it like keeping frequently used books on your desk instead of walking to the library each time.
Usage Pattern:
Call this method in the layer's constructor after initializing weight tensors:
public DenseLayer(int inputSize, int outputSize)
{
    _weights = new Tensor<T>(outputSize, inputSize);
    _biases = new Tensor<T>(outputSize);
    InitializeWeights();
    // Register for GPU persistence
    RegisterTrainableParameter(_weights, PersistentTensorRole.Weights);
    RegisterTrainableParameter(_biases, PersistentTensorRole.Biases);
}
ResetState()
Resets the internal state of the layer.
public abstract void ResetState()
Remarks
This abstract method must be implemented by derived classes to reset any internal state the layer maintains between forward and backward passes. This is useful when starting to process a new sequence or when implementing stateful recurrent networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Cached inputs and outputs are cleared
- Any temporary calculations are discarded
- The layer is ready to process new data without being influenced by previous data
This is important for:
- Processing a new, unrelated sequence
- Preventing information from one sequence affecting another
- Starting a new training episode
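For example, a stateful recurrent layer would typically be reset before each unrelated sequence (ProcessSequence is a hypothetical helper standing in for your own forward/backward calls):
foreach (var sequence in sequences)
{
    layer.ResetState();                   // forget everything carried over from the previous sequence
    ProcessSequence(layer, sequence);     // hypothetical: run the forward (and backward) passes for this sequence
}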
Serialize(BinaryWriter)
Serializes the layer's parameters to a binary writer.
public virtual void Serialize(BinaryWriter writer)
Parameters
writerBinaryWriterThe binary writer to write to.
Remarks
This method writes the layer's parameters to a binary writer, which can be used to save the layer's state to a file or other storage medium. It writes the parameter count followed by each parameter value.
For Beginners: This method saves the layer's learned values to storage.
When serializing a layer:
- The number of parameters is written first
- Then each parameter value is written
- All values are converted to doubles for consistent storage
This allows you to save a trained layer and reload it later without having to retrain it from scratch.
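For example, saving a trained layer to a file uses only a standard .NET BinaryWriter plus the documented Serialize method:
using var stream = File.Create("dense-layer.bin");
using var writer = new BinaryWriter(stream);
layer.Serialize(writer);    // writes the parameter count followed by each parameter value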
SetBiases(Tensor<T>)
Sets the bias tensor for this layer.
protected virtual void SetBiases(Tensor<T> biases)
Parameters
biasesTensor<T>The bias tensor to set.
Remarks
Derived classes with trainable biases should override this method to update their internal bias storage. The default implementation throws an exception since LayerBase doesn't know the layer's bias structure.
Exceptions
- InvalidOperationException
Thrown if the layer does not support biases.
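A sketch of how a derived layer with trainable biases might override this method (the _biases field is a hypothetical field of the derived layer; InvalidateTrainableParameter is the documented staleness hint):
protected override void SetBiases(Tensor<T> biases)
{
    _biases = biases;                         // store the new biases in the layer's own field
    InvalidateTrainableParameter(_biases);    // tell the engine that any cached GPU copy is now stale
}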
SetParameter(string, Tensor<T>)
Sets a parameter tensor by name.
public virtual bool SetParameter(string name, Tensor<T> value)
Parameters
namestringThe parameter name ("weight" or "bias").
valueTensor<T>The tensor value to set.
Returns
- bool
True if the parameter was set successfully, false if the name was not found.
Exceptions
- ArgumentException
Thrown when the tensor shape doesn't match expected shape.
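For example, replacing a layer's bias tensor by name; newBiases is illustrative and must match the shape reported by GetParameterShape("bias"):
if (!layer.SetParameter("bias", newBiases))
{
    Console.WriteLine("This layer has no parameter named \"bias\".");
}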
SetParameters(Vector<T>)
Sets the trainable parameters of the layer.
public virtual void SetParameters(Vector<T> parameters)
Parameters
parametersVector<T>A vector containing all parameters to set.
Remarks
This method sets all the trainable parameters of the layer from a single vector of parameters. The parameters vector must have the correct length to match the total number of parameters in the layer. By default, it simply assigns the parameters vector to the Parameters field, but derived classes may override this to handle the parameters differently.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The layer parses this vector to set all its internal parameters
- Throws an error if the input doesn't match the expected number of parameters
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Setting specific parameter values for testing
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
SetTrainingMode(bool)
Sets whether the layer is in training mode or inference mode.
public virtual void SetTrainingMode(bool isTraining)
Parameters
isTrainingbooltrue to set the layer to training mode; false to set it to inference mode.
Remarks
This method sets the layer's mode to either training or inference (evaluation). Some layers behave differently during training versus inference, such as Dropout or BatchNormalization. This method only has an effect if the layer supports training.
For Beginners: This method switches the layer between learning mode and prediction mode.
Setting this mode:
- Tells the layer whether to optimize for learning or for making predictions
- Changes behavior in layers like Dropout (which randomly ignores neurons during training)
- Has no effect in layers that don't support training
It's important to set this correctly before using a network - training mode for learning, inference mode for making predictions.
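A typical sketch of switching modes around training and evaluation:
layer.SetTrainingMode(true);     // training mode: Dropout drops neurons, BatchNormalization updates running statistics
// ... run the training loop ...
layer.SetTrainingMode(false);    // inference mode: deterministic behavior for predictions
// ... evaluate or make predictions ...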
SetWeights(Tensor<T>)
Sets the weight tensor for this layer.
protected virtual void SetWeights(Tensor<T> weights)
Parameters
weightsTensor<T>The weight tensor to set.
Remarks
Derived classes with trainable weights should override this method to update their internal weight storage. The default implementation throws an exception since LayerBase doesn't know the layer's weight structure.
Exceptions
- InvalidOperationException
Thrown if the layer does not support weights.
TryGetParameter(string, out Tensor<T>?)
Tries to get a parameter tensor by name.
public virtual bool TryGetParameter(string name, out Tensor<T>? tensor)
Parameters
namestringThe parameter name ("weight" or "bias").
tensorTensor<T>The parameter tensor if found.
Returns
- bool
True if the parameter was found, false otherwise.
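A small sketch that reads the weight tensor only when it exists:
if (layer.TryGetParameter("weight", out var weightTensor) && weightTensor is not null)
{
    // Inspect or export weightTensor here.
}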
UpdateInputShape(int[])
Updates the stored input shape for this layer.
protected void UpdateInputShape(int[] inputShape)
Parameters
inputShapeint[]The new input shape to use for this layer.
UpdateParameters(Vector<T>)
Updates the parameters of the layer with the given vector of parameter values.
public virtual void UpdateParameters(Vector<T> parameters)
Parameters
parametersVector<T>A vector containing all parameters to set.
Remarks
This method sets all the parameters of the layer from a single vector of parameters. The parameters vector must have the correct length to match the total number of parameters in the layer.
For Beginners: This method updates all the learnable values in the layer at once.
When updating parameters:
- The input must be a vector with the correct length
- This replaces all the current parameters with the new ones
- Throws an error if the input doesn't match the expected number of parameters
This is useful for:
- Optimizers that work with all parameters at once
- Applying parameters from another source
- Setting parameters to specific values for testing
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
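A sketch of an optimizer-style update that operates on the whole parameter vector at once (ComputeUpdatedParameters is hypothetical and must return a vector with the same length as GetParameters()):
var current = layer.GetParameters();
var gradients = layer.GetParameterGradients();
var updated = ComputeUpdatedParameters(current, gradients);    // hypothetical optimizer step
layer.UpdateParameters(updated);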
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public abstract void UpdateParameters(T learningRate)
Parameters
learningRateTThe learning rate to use for the parameter updates.
Remarks
This abstract method must be implemented by derived classes to define how the layer's parameters are updated during training. The learning rate controls the size of the parameter updates.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- The weights, biases, or other parameters are adjusted to reduce prediction errors
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time, gradually improving its ability to extract useful patterns from inputs.
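A sketch of a minimal override implementing plain gradient descent, mirroring the pattern shown for InvalidateTrainableParameter above (_weights, _biases, and their gradient counterparts are hypothetical fields of the derived layer):
public override void UpdateParameters(T learningRate)
{
    _weights = _weights.Subtract(_weightGradients.Multiply(learningRate));
    _biases = _biases.Subtract(_biasGradients.Multiply(learningRate));
    InvalidateTrainableParameter(_weights);    // any cached GPU copies are now stale
    InvalidateTrainableParameter(_biases);
}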
UpdateParametersGpu(IGpuOptimizerConfig)
Updates the layer's parameters on GPU using the specified optimizer configuration.
public virtual void UpdateParametersGpu(IGpuOptimizerConfig config)
Parameters
configIGpuOptimizerConfigThe GPU optimizer configuration specifying the update algorithm and hyperparameters.
Remarks
This method updates weights and biases directly on GPU using the optimizer specified in the config. Supported optimizers include SGD, Adam, AdamW, RMSprop, Adagrad, NAG, LARS, and LAMB.
For Beginners: This updates the layer's learned values entirely on GPU.
The config determines which optimizer algorithm to use:
- SGD: Simple gradient descent with optional momentum
- Adam: Adaptive learning rates with moment estimates (most popular)
- AdamW: Adam with proper weight decay (recommended for transformers)
Using this method keeps all training computation on the GPU for maximum speed.
Exceptions
- NotSupportedException
Thrown when the layer does not support GPU training.
UploadWeightsToGpu()
Uploads the layer's weights and biases to GPU memory for GPU-resident training.
public virtual void UploadWeightsToGpu()
Remarks
Call this before starting GPU training to initialize GPU weight buffers. The CPU weights are copied to GPU and remain there until DownloadWeightsFromGpu is called.
For Beginners: This copies the layer's learned values to the GPU.
Call this once at the start of training to:
- Create GPU buffers for weights and biases
- Copy current values from CPU to GPU
- Create GPU buffers for gradients and optimizer states (momentum, etc.)
After this, all training can happen on GPU without CPU involvement.
ValidateWeights(IEnumerable<string>, Func<string, string?>?)
Validates that a set of weight names can be loaded into this layer.
public virtual WeightLoadValidation ValidateWeights(IEnumerable<string> weightNames, Func<string, string?>? mapping = null)
Parameters
weightNamesIEnumerable<string>Names of weights to validate.
mappingFunc<string, string>Optional weight name mapping function.
Returns
- WeightLoadValidation
Validation result with matched and unmatched names.
ZeroGradientsGpu()
Resets the GPU gradient accumulators to zero.
public virtual void ZeroGradientsGpu()
Remarks
Call this at the start of each training batch to clear accumulated gradients from the previous batch.
For Beginners: This clears the "how to improve" information from the last batch.
Each batch computes new gradients. Before processing a new batch, you need to:
- Clear the old gradients
- Compute fresh gradients for the current batch
- Update weights based on the new gradients
If you forget to zero gradients, they accumulate and training goes wrong!
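Putting the GPU methods together, a GPU-resident training loop might look like this sketch (adamConfig is a hypothetical IGpuOptimizerConfig instance, and RunForwardBackwardOnGpu stands in for the layer's actual GPU forward/backward calls):
layer.UploadWeightsToGpu();                     // copy weights to GPU once, before training starts
foreach (var batch in batches)
{
    layer.ZeroGradientsGpu();                   // clear the gradients accumulated for the previous batch
    RunForwardBackwardOnGpu(layer, batch);      // hypothetical: compute fresh gradients for this batch on GPU
    layer.UpdateParametersGpu(adamConfig);      // apply the optimizer update entirely on GPU
}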