Class DenseLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a fully connected (dense) layer in a neural network.

public class DenseLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LayerBase<T> → DenseLayer<T>

Implements
ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider

Remarks

A dense layer connects every input neuron to every output neuron, with each connection having a learnable weight. This is the most basic and widely used type of neural network layer. Dense layers are capable of learning complex patterns by adjusting these weights during training.

For Beginners: A dense layer is like a voting system where every input gets to vote on every output.

Think of it like this:

  • Each input sends information to every output
  • Each connection has a different "importance" (weight)
  • The layer learns which connections should be strong and which should be weak

For example, in an image recognition task:

  • One input might detect a curved edge
  • Another might detect a straight line
  • The dense layer combines these features to recognize higher-level patterns

Dense layers are the building blocks of many neural networks because they can learn almost any relationship between inputs and outputs, given enough neurons and training data.

Thread Safety: This layer is not thread-safe. Each layer instance maintains internal state during forward and backward passes. If you need concurrent execution, use separate layer instances per thread or synchronize access to shared instances.
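
Example (a minimal sketch of the usage described above; the Tensor<float> shape-based constructor is an assumed API, and only the DenseLayer<T> members documented on this page are taken as given):

using var layer = new DenseLayer<float>(784, 128);   // 784 inputs, 128 outputs, default ReLU activation
var input = new Tensor<float>(new[] { 32, 784 });    // a batch of 32 samples; assumed shape-based constructor
Tensor<float> output = layer.Forward(input);         // shape [32, 128]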

Constructors

DenseLayer(int, int, IActivationFunction<T>?, IInitializationStrategy<T>?)

Initializes a new instance of the DenseLayer<T> class with the specified input and output sizes and a scalar activation function.

public DenseLayer(int inputSize, int outputSize, IActivationFunction<T>? activationFunction = null, IInitializationStrategy<T>? initializationStrategy = null)

Parameters

inputSize int

The number of input neurons.

outputSize int

The number of output neurons.

activationFunction IActivationFunction<T>

The activation function to apply. Defaults to ReLU if not specified.

initializationStrategy IInitializationStrategy<T>

The strategy used to initialize the layer's weights. If not specified, Xavier/Glorot initialization is used.

Remarks

This constructor creates a dense layer with the specified number of input and output neurons. By default, the weights are initialized using Xavier/Glorot initialization, which scales the random values based on the number of input and output neurons, and the biases are initialized to zero. A custom initialization strategy can be supplied via the initializationStrategy parameter.

For Beginners: This setup method creates a new dense layer with specific dimensions.

When creating the layer, you specify:

  • How many inputs it will receive (inputSize)
  • How many outputs it will produce (outputSize)
  • What mathematical function to apply to the results (activation)

For example, a layer with inputSize=784 and outputSize=10 could connect the flattened pixels of a 28×28 image to 10 output neurons (one for each digit 0-9).

The layer automatically initializes all the weights and biases with carefully chosen starting values that help with training.
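
Example (a minimal sketch; the ReLUActivation<float> class name is an assumption, while the constructor signature follows the declaration above):

// Assumed activation class name for illustration; omit the argument to use the default ReLU.
var classifier = new DenseLayer<float>(784, 10, new ReLUActivation<float>());
var withDefaults = new DenseLayer<float>(784, 10);   // default ReLU activation, Xavier/Glorot initialization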

DenseLayer(int, int, IVectorActivationFunction<T>, IInitializationStrategy<T>?)

Initializes a new instance of the DenseLayer<T> class with the specified input and output sizes and a vector activation function.

public DenseLayer(int inputSize, int outputSize, IVectorActivationFunction<T> vectorActivation, IInitializationStrategy<T>? initializationStrategy = null)

Parameters

inputSize int

The number of input neurons.

outputSize int

The number of output neurons.

vectorActivation IVectorActivationFunction<T>

The vector activation function to apply (required to disambiguate from IActivationFunction overload).

initializationStrategy IInitializationStrategy<T>

The strategy used to initialize the layer's weights. If not specified, Xavier/Glorot initialization is used.

Remarks

This constructor creates a dense layer with the specified number of input and output neurons and a vector activation function. Vector activation functions operate on entire vectors at once, which can be more efficient for certain operations.

For Beginners: This setup method is similar to the previous one, but uses a different type of activation function.

A vector activation function:

  • Works on all outputs at once instead of one at a time
  • Can be more efficient for certain calculations
  • Might capture relationships between different outputs

Most of the time, you'll use the standard constructor, but this one gives you flexibility if you need special activation functions that work on the entire output vector at once.

Note: If your activation function implements both IActivationFunction and IVectorActivationFunction, use WithActivation or WithVectorActivation factory methods to avoid ambiguity.
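
Example (a minimal sketch; the SoftmaxVectorActivation<float> class name is an assumption, and typing the variable as the interface makes the intended overload unambiguous):

// Assumed class name; any implementation of IVectorActivationFunction<T> works here.
IVectorActivationFunction<float> softmax = new SoftmaxVectorActivation<float>();
var outputLayer = new DenseLayer<float>(128, 10, softmax);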

Properties

AuxiliaryLossWeight

Gets or sets the weight for the regularization auxiliary loss.

public T AuxiliaryLossWeight { get; set; }

Property Value

T

Remarks

This weight controls how much the regularization penalty contributes to the total loss. The total loss is: main_loss + (auxiliary_weight * regularization_loss). Typical values range from 0.0001 to 0.1.

For Beginners: This controls how much the network should prefer simple models.

The weight determines the balance between:

  • Fitting the training data well (main loss)
  • Keeping the model simple (regularization loss)

Common values:

  • 0.01 (default): Moderate regularization
  • 0.001-0.005: Light regularization
  • 0.05-0.1: Strong regularization

Higher values make the network simpler but might underfit the data. Lower values allow more complexity but might overfit.

IsInitialized

Gets a value indicating whether this layer has been initialized.

public override bool IsInitialized { get; }

Property Value

bool

Remarks

For layers with lazy initialization, this indicates whether the weights have been allocated and initialized. For eager initialization, this is always true after construction.

For Beginners: This tells you if the layer's weights are ready to use.

A value of true means:

  • Weights have been allocated
  • The layer is ready for forward/backward passes

A value of false means:

  • Weights are not yet allocated (lazy initialization)
  • The first Forward() call will initialize them

L1Strength

Gets or sets the L1 regularization strength (used when Regularization is L1 or L1L2).

public T L1Strength { get; set; }

Property Value

T

L2Strength

Gets or sets the L2 regularization strength (used when Regularization is L2 or L1L2).

public T L2Strength { get; set; }

Property Value

T

ParameterCount

Gets the total number of trainable parameters in the layer.

public override int ParameterCount { get; }

Property Value

int

The sum of the number of weights and biases in the layer.

Remarks

This property returns the total number of trainable parameters in the layer, which is the sum of the number of elements in the weights matrix and the biases vector. This is useful for understanding the complexity of the layer.

For Beginners: This tells you how many individual numbers the layer can adjust during training.

The parameter count:

  • Equals (number of inputs × number of outputs) + number of outputs
  • First part counts the weights, second part counts the biases
  • Higher numbers mean more flexibility but also more risk of overfitting

For example, a dense layer with 100 inputs and 50 outputs would have 100 × 50 = 5,000 weights plus 50 biases, for a total of 5,050 parameters.
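
Example (a minimal sketch confirming the formula above):

var layer = new DenseLayer<float>(100, 50);
int parameters = layer.ParameterCount;   // (100 * 50) + 50 = 5,050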

Regularization

Gets or sets the type of regularization to apply.

public RegularizationType Regularization { get; set; }

Property Value

RegularizationType

SupportsGpuExecution

Gets whether this layer has a GPU execution implementation for inference.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

Remarks

Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.

For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.

SupportsJitCompilation

Gets whether this layer currently supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer's activation function is supported for JIT compilation. Supported activations: ReLU, Sigmoid, Tanh, Softmax, Identity.

SupportsTraining

Gets a value indicating whether this layer supports training through backpropagation.

public override bool SupportsTraining { get; }

Property Value

bool

Always returns true for dense layers, as they contain trainable parameters.

Remarks

This property indicates whether the layer can be trained through backpropagation. Dense layers have trainable parameters (weights and biases), so they support training.

For Beginners: This property tells you if the layer can learn from data.

For dense layers:

  • The value is always true
  • This means the layer can adjust its weights and biases during training
  • It will improve its performance as it sees more examples

Some other layer types might not have trainable parameters and would return false here.

UseAuxiliaryLoss

Gets or sets whether auxiliary loss (weight regularization) should be used during training.

public bool UseAuxiliaryLoss { get; set; }

Property Value

bool

Remarks

Weight regularization adds a penalty based on the magnitude of the weights to prevent overfitting. This helps the network generalize better to unseen data by discouraging overly complex models.

For Beginners: Weight regularization is like encouraging simplicity in your model.

Why use regularization:

  • Prevents the network from memorizing training data (overfitting)
  • Encourages the network to learn general patterns instead of specific details
  • Makes the model work better on new, unseen data

Think of it like learning to recognize cats:

  • Without regularization: "This cat has exactly 157 whiskers" (too specific)
  • With regularization: "Cats have fur, whiskers, and pointy ears" (general pattern)

Regularization is especially helpful when you have limited training data.
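
Example (a minimal sketch configuring the regularization-related properties documented above, assuming T is float; the RegularizationType member names follow the remarks on this page):

var layer = new DenseLayer<float>(256, 64);
layer.UseAuxiliaryLoss = true;
layer.Regularization = RegularizationType.L1L2;   // enum members as described in the remarks: L1, L2, L1L2, None
layer.L1Strength = 0.0001f;
layer.L2Strength = 0.001f;
layer.AuxiliaryLossWeight = 0.01f;                // moderate regularization (the documented default)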

Methods

Backward(Tensor<T>)

Calculates gradients for the input, weights, and biases during backpropagation.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method performs the backward pass of the dense layer during training. It calculates the gradient of the loss with respect to the input, weights, and biases. The calculated gradients for weights and biases are stored for the subsequent parameter update, and the input gradient is returned for propagation to earlier layers.

For Beginners: This method helps the layer learn from its mistakes.

During the backward pass:

  • The layer receives information about how wrong its output was
  • It calculates how to adjust its weights and biases to be more accurate
  • It prepares the adjustments but doesn't apply them yet
  • It passes information back to previous layers so they can learn too

This is where the actual "learning" happens. The layer figures out which connections should be strengthened and which should be weakened based on the error in its output.

Exceptions

InvalidOperationException

Thrown when Backward is called before Forward.
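
Example (a minimal sketch of the forward-then-backward order required above; the Tensor<float> shape-based constructor is an assumed API):

using var layer = new DenseLayer<float>(784, 128);
var input = new Tensor<float>(new[] { 32, 784 });           // assumed shape-based constructor
Tensor<float> output = layer.Forward(input);                // Forward must run before Backward
var outputGradient = new Tensor<float>(new[] { 32, 128 });  // dLoss/dOutput, same shape as the output
Tensor<float> inputGradient = layer.Backward(outputGradient);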

BackwardGpu(IGpuTensor<T>)

Performs a GPU-resident backward pass for the dense layer, computing gradients for weights, biases, and input entirely on the GPU with no CPU round-trip.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

GPU-resident gradient from the next layer.

Returns

IGpuTensor<T>

GPU-resident gradient to pass to the previous layer.

Exceptions

InvalidOperationException

Thrown if ForwardGpu was not called first.

ClearGradients()

Clears stored gradients for weights and biases.

public override void ClearGradients()

Clone()

Creates a deep copy of the layer with the same configuration and parameters.

public override LayerBase<T> Clone()

Returns

LayerBase<T>

A new instance of the DenseLayer<T> class with the same configuration and parameters.

Remarks

This method creates a deep copy of the dense layer, including its configuration and parameters. This is useful when you need multiple instances of the same layer, such as in ensemble methods or when implementing layer factories.

For Beginners: This method creates an exact duplicate of the layer.

The copy:

  • Has the same input and output dimensions
  • Has the same weights and biases
  • Is completely independent from the original

This is useful for:

  • Creating multiple similar layers
  • Experimenting with variations of a layer
  • Implementing certain advanced techniques

Think of it like making a perfect clone that starts exactly where the original is.
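
Example (a minimal sketch; Clone returns LayerBase<T>, so the cast recovers the concrete type):

var original = new DenseLayer<float>(64, 64);
var copy = (DenseLayer<float>)original.Clone();   // independent duplicate with identical weights and biases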

ComputeAuxiliaryLoss()

Computes the auxiliary loss for weight regularization (L1, L2, or both).

public T ComputeAuxiliaryLoss()

Returns

T

The computed regularization auxiliary loss.

Remarks

This method computes the regularization loss based on the magnitude of the weights. L1 regularization computes the sum of absolute values of weights. L2 regularization computes the sum of squared values of weights. L1L2 combines both penalties.

For Beginners: This calculates how "complex" the layer's weights are.

Different regularization types:

  1. L1 (Lasso): Σ|weight|

    • Encourages many weights to become exactly zero
    • Creates sparse networks (many connections turned off)
    • Good for feature selection
  2. L2 (Ridge): Σ(weight²)

    • Encourages all weights to be small
    • Prevents any single weight from dominating
    • Smooths the network's behavior
  3. L1L2 (Elastic Net): Combines both

    • Gets benefits of both L1 and L2
    • More flexible regularization

The loss is added to the main loss during training to discourage large weights.
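
Example (a minimal sketch assuming T is float; mainLoss is a placeholder value standing in for your loss function's output):

float mainLoss = 0.35f;                                             // placeholder value from your loss function
float regLoss = layer.ComputeAuxiliaryLoss();
float totalLoss = mainLoss + layer.AuxiliaryLossWeight * regLoss;   // as described in the AuxiliaryLossWeight remarks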

Dispose(bool)

Releases resources used by this layer, including GPU tensor handles.

protected override void Dispose(bool disposing)

Parameters

disposing bool

True if called from Dispose(), false if called from finalizer.

Remarks

This method releases GPU memory allocated for persistent weight tensors. It is called by the base class Dispose() method.

For Beginners: GPU memory is limited and precious.

When you're done with a layer:

  • Call Dispose() or use a 'using' statement
  • This frees up GPU memory for other operations
  • Failing to dispose can cause memory leaks on the GPU

Example:

using var layer = new DenseLayer<float>(784, 128);
// ... use layer ...
// Automatically disposed when out of scope

EnsureInitialized()

Ensures that weights are allocated and initialized for lazy initialization.

protected override void EnsureInitialized()

ExportComputationGraph(List<ComputationNode<T>>)

Exports the dense layer's forward pass as a JIT-compilable computation graph.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes (input data, weights, biases).

Returns

ComputationNode<T>

The output computation node representing the layer's prediction.

Remarks

This method builds a computation graph that mirrors the layer's forward pass logic. The graph uses TensorOperations which now integrates with IEngine for GPU acceleration where supported (e.g., Add operations use IEngine.TensorAdd).

Current IEngine integration status:

  • Addition operations: fully GPU-accelerated via IEngine.TensorAdd
  • Matrix multiplication: uses Tensor.MatrixMultiply (pending IEngine integration)
  • Transpose operations: uses Tensor.Transpose (pending IEngine integration)

The computation graph enables:

  • JIT compilation for optimized inference
  • Operation fusion and dead code elimination
  • Automatic differentiation via backpropagation
  • Deferred execution with GPU acceleration
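
Example (a minimal sketch based only on the signature above; layer is an already-constructed DenseLayer<float>):

var inputNodes = new List<ComputationNode<float>>();
ComputationNode<float> outputNode = layer.ExportComputationGraph(inputNodes);
// inputNodes is now populated with nodes for the input data, weights, and biases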

Forward(Tensor<T>)

Processes the input data through the dense layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor after applying the dense layer transformation and activation.

Remarks

This method performs the forward pass of the dense layer. It multiplies the input by the weights, adds the biases, and applies the activation function. The result is a tensor where each element represents the activation of an output neuron.

Industry Standard: Like PyTorch's nn.Linear, this layer supports any-rank input tensors. The transformation is applied to the last dimension, preserving all batch/sequence dimensions. For example, input [..., inputSize] produces output [..., outputSize].

For Beginners: This method transforms input data into output data.

During the forward pass:

  • The input values are multiplied by their corresponding weights
  • All weighted inputs for each output neuron are added together
  • The bias is added to each sum
  • The activation function is applied to each result

For example, if your inputs represent image features, the outputs might represent the probability of the image belonging to different categories.

This is where the actual "thinking" happens in the neural network.
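
Example (a minimal sketch of the any-rank behavior described above; the Tensor<float> shape-based constructor is an assumed API):

var layer = new DenseLayer<float>(64, 32);
var sequence = new Tensor<float>(new[] { 8, 20, 64 });   // [batch, timesteps, features]; assumed constructor
Tensor<float> result = layer.Forward(sequence);          // transformation applies to the last dimension: shape [8, 20, 32]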

ForwardGpu(params IGpuTensor<T>[])

Performs a GPU-resident forward pass, keeping tensors on GPU. Use this for chained layer execution to avoid CPU round-trips. Supports any-rank tensor input (1D, 2D, or ND), matching CPU Forward behavior.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

GPU-resident input tensors (only the first input is used). The last dimension is the feature dimension.

Returns

IGpuTensor<T>

GPU-resident output tensor with the same batch dimensions and outputSize as the last dimension.

Exceptions

InvalidOperationException

Thrown if GPU execution is not available.

GetAuxiliaryLossDiagnostics()

Gets diagnostic information about the weight regularization auxiliary loss.

public Dictionary<string, string> GetAuxiliaryLossDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic information about regularization.

Remarks

This method returns detailed diagnostics about the weight regularization, including the computed regularization loss, type of regularization, strengths, and whether it's enabled. This information is useful for monitoring training progress and debugging.

For Beginners: This provides information about how regularization is affecting the layer.

The diagnostics include:

  • Total regularization loss (penalty for large weights)
  • Type of regularization being used (L1, L2, L1L2, or None)
  • Strength parameters for L1 and L2
  • Weight applied to the regularization loss
  • Whether regularization is enabled

This helps you:

  • Monitor if regularization is helping prevent overfitting
  • Debug issues with model complexity
  • Understand the impact of different regularization settings

You can use this information to adjust regularization parameters for better results.
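
Example (a minimal sketch printing the diagnostics dictionary; layer is an already-constructed DenseLayer<float>):

foreach (var entry in layer.GetAuxiliaryLossDiagnostics())
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}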

GetBiases()

Gets the biases tensor of the layer.

public override Tensor<T> GetBiases()

Returns

Tensor<T>

The bias values added to each output neuron.

GetDiagnostics()

Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.

public override Dictionary<string, string> GetDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().

GetParameterGradients()

Gets the gradients of all trainable parameters in this layer.

public override Vector<T> GetParameterGradients()

Returns

Vector<T>

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all weights and biases.

Remarks

This method extracts all trainable parameters (weights and biases) from the layer and returns them as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method gathers all the learned values from the layer.

The parameters include:

  • All weight values (connections between inputs and outputs)
  • All bias values (base values for each output)

These are combined into a single long list (vector), which can be used for:

  • Saving the model
  • Sharing parameters between layers
  • Advanced optimization techniques

This provides access to all the "knowledge" the layer has learned.
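
Example (a minimal sketch pairing GetParameters with SetParameters, assuming T is float):

Vector<float> snapshot = layer.GetParameters();   // all weights followed by all biases
// ... train, persist, or modify the snapshot elsewhere ...
layer.SetParameters(snapshot);                    // the vector length must equal ParameterCount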

GetWeights()

Gets the weights tensor of the layer.

public override Tensor<T> GetWeights()

Returns

Tensor<T>

The weight tensor connecting input neurons to output neurons.

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

This method clears the cached input values from the most recent forward pass and the gradients calculated during the backward pass. This is useful when starting to process a new batch or when implementing stateful recurrent networks.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • The layer forgets the last input it processed
  • It clears any calculated gradients

This is useful for:

  • Processing a new, unrelated set of data
  • Preventing information from one batch affecting another
  • Starting a new training episode

Think of it like wiping a whiteboard clean before starting a new calculation.

SetParameters(Vector<T>)

Sets all trainable parameters of the layer from a single vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets all trainable parameters (weights and biases) of the layer from a single vector. The vector must have the exact length required for all parameters of the layer.

For Beginners: This method updates all the layer's learned values at once.

When setting parameters:

  • The vector must have exactly the right number of values
  • The values are assigned to the weights and biases in a specific order

This is useful for:

  • Loading a previously saved model
  • Copying parameters from another model
  • Setting parameters that were optimized externally

It's like replacing all the "knowledge" in the layer with new information.

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

SetWeights(Tensor<T>)

Sets the weights of the layer to specified values.

protected override void SetWeights(Tensor<T> weights)

Parameters

weights Tensor<T>

The weight matrix to set.

Remarks

This method allows direct setting of the weight matrix, which can be useful for transfer learning, weight initialization with custom algorithms, or loading pre-trained models. The dimensions of the provided matrix must match the layer's input and output dimensions.

For Beginners: This method lets you directly set all connection strengths at once.

You might use this to:

  • Load pre-trained weights from another model
  • Test the layer with specific weight values
  • Implement custom initialization strategies

The weight matrix must have exactly the right dimensions:

  • Rows equal to the number of inputs (inputSize)
  • Columns equal to the number of outputs (outputSize)

If the dimensions don't match, the method will throw an error.

Exceptions

ArgumentNullException

Thrown when the weights parameter is null.

ArgumentException

Thrown when the weights matrix has incorrect dimensions.

UpdateParameters(T)

Updates the layer's parameters (weights and biases) using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the update.

Remarks

This method updates the layer's parameters (weights and biases) based on the gradients calculated during the backward pass. The learning rate controls the step size of the update.

For Beginners: This method applies the lessons learned during training.

When updating parameters:

  • The learning rate controls how big each adjustment is
  • Small learning rate = small, careful changes
  • Large learning rate = big, faster changes (but might overshoot)

The weights and biases are adjusted by subtracting the gradient multiplied by the learning rate. This moves them in the direction that reduces the error the most.
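
Example (a minimal sketch of one training step; layer, input, and target are assumed to exist, and ComputeLossGradient is a hypothetical helper, not part of this API):

Tensor<float> prediction = layer.Forward(input);
Tensor<float> lossGradient = ComputeLossGradient(prediction, target);   // hypothetical helper, not part of this API
layer.Backward(lossGradient);                                           // computes and stores the gradients
layer.UpdateParameters(0.01f);                                          // applies: parameter -= learningRate * gradient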

Exceptions

InvalidOperationException

Thrown when UpdateParameters is called before Backward.