Class DenseLayer<T>
Namespace: AiDotNet.NeuralNetworks.Layers
Assembly: AiDotNet.dll
Represents a fully connected (dense) layer in a neural network.
public class DenseLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → DenseLayer<T>
- Implements
- ILayer<T>, IJitCompilable<T>, IWeightLoadable<T>, IDisposable, IAuxiliaryLossLayer<T>, IDiagnosticsProvider
Remarks
A dense layer connects every input neuron to every output neuron, with each connection having a learnable weight. This is the most basic and widely used type of neural network layer. Dense layers are capable of learning complex patterns by adjusting these weights during training.
For Beginners: A dense layer is like a voting system where every input gets to vote on every output.
Think of it like this:
- Each input sends information to every output
- Each connection has a different "importance" (weight)
- The layer learns which connections should be strong and which should be weak
For example, in an image recognition task:
- One input might detect a curved edge
- Another might detect a straight line
- The dense layer combines these features to recognize higher-level patterns
Dense layers are the building blocks of many neural networks because they can learn almost any relationship between inputs and outputs, given enough neurons and training data.
Thread Safety: This layer is not thread-safe. Each layer instance maintains internal state during forward and backward passes. If you need concurrent execution, use separate layer instances per thread or synchronize access to shared instances.
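Example (a minimal usage sketch; the Tensor<float> constructor shown here is illustrative and may differ from the actual tensor API):
var layer = new DenseLayer<float>(784, 128);      // 784 inputs, 128 outputs, ReLU activation by default
var input = new Tensor<float>(new[] { 32, 784 }); // hypothetical: batch of 32 samples, 784 features each
var output = layer.Forward(input);                // output shape: [32, 128]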
Constructors
DenseLayer(int, int, IActivationFunction<T>?, IInitializationStrategy<T>?)
Initializes a new instance of the DenseLayer<T> class with the specified input and output sizes and a scalar activation function.
public DenseLayer(int inputSize, int outputSize, IActivationFunction<T>? activationFunction = null, IInitializationStrategy<T>? initializationStrategy = null)
Parameters
inputSize (int): The number of input neurons.
outputSize (int): The number of output neurons.
activationFunction (IActivationFunction<T>): The activation function to apply. Defaults to ReLU if not specified.
initializationStrategy (IInitializationStrategy<T>): The strategy used to initialize the weights. If not specified, Xavier/Glorot initialization is used.
Remarks
This constructor creates a dense layer with the specified number of input and output neurons. The weights are initialized using Xavier/Glorot initialization, which scales the random values based on the number of input and output neurons. The biases are initialized to zero.
For Beginners: This setup method creates a new dense layer with specific dimensions.
When creating the layer, you specify:
- How many inputs it will receive (inputSize)
- How many outputs it will produce (outputSize)
- What mathematical function to apply to the results (activation)
For example, a layer with inputSize=784 and outputSize=10 could connect the flattened pixels of a 28×28 image to 10 output neurons (one for each digit 0-9).
The layer automatically initializes all the weights and biases with carefully chosen starting values that help with training.
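Example (a sketch of passing an explicit activation; ReLUActivation<float> is a hypothetical IActivationFunction<float> implementation named only for illustration):
var withDefaults = new DenseLayer<float>(784, 10);                              // ReLU activation, Xavier/Glorot weights
var withExplicit = new DenseLayer<float>(784, 10, new ReLUActivation<float>()); // hypothetical activation class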
DenseLayer(int, int, IVectorActivationFunction<T>, IInitializationStrategy<T>?)
Initializes a new instance of the DenseLayer<T> class with the specified input and output sizes and a vector activation function.
public DenseLayer(int inputSize, int outputSize, IVectorActivationFunction<T> vectorActivation, IInitializationStrategy<T>? initializationStrategy = null)
Parameters
inputSize (int): The number of input neurons.
outputSize (int): The number of output neurons.
vectorActivation (IVectorActivationFunction<T>): The vector activation function to apply (required to disambiguate from the IActivationFunction<T> overload).
initializationStrategy (IInitializationStrategy<T>): The strategy used to initialize the weights; optional.
Remarks
This constructor creates a dense layer with the specified number of input and output neurons and a vector activation function. Vector activation functions operate on entire vectors at once, which can be more efficient for certain operations.
For Beginners: This setup method is similar to the previous one, but uses a different type of activation function.
A vector activation function:
- Works on all outputs at once instead of one at a time
- Can be more efficient for certain calculations
- Might capture relationships between different outputs
Most of the time, you'll use the standard constructor, but this one gives you flexibility if you need special activation functions that work on the entire output vector at once.
Note: If your activation function implements both IActivationFunction and IVectorActivationFunction, use WithActivation or WithVectorActivation factory methods to avoid ambiguity.
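Example (a sketch assuming a SoftmaxActivation<float> class that implements IVectorActivationFunction<float>; the concrete class name is an assumption):
IVectorActivationFunction<float> softmax = new SoftmaxActivation<float>(); // hypothetical vector activation class
var outputLayer = new DenseLayer<float>(128, 10, softmax);                 // softmax is applied across all 10 outputs at once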
Properties
AuxiliaryLossWeight
Gets or sets the weight for the regularization auxiliary loss.
public T AuxiliaryLossWeight { get; set; }
Property Value
- T
Remarks
This weight controls how much the regularization penalty contributes to the total loss. The total loss is: main_loss + (auxiliary_weight * regularization_loss). Typical values range from 0.0001 to 0.1.
For Beginners: This controls how much the network should prefer simple models.
The weight determines the balance between:
- Fitting the training data well (main loss)
- Keeping the model simple (regularization loss)
Common values:
- 0.01 (default): Moderate regularization
- 0.001-0.005: Light regularization
- 0.05-0.1: Strong regularization
Higher values make the network simpler but might underfit the data. Lower values allow more complexity but might overfit.
IsInitialized
Gets a value indicating whether this layer has been initialized.
public override bool IsInitialized { get; }
Property Value
- bool
Remarks
For layers with lazy initialization, this indicates whether the weights have been allocated and initialized. For eager initialization, this is always true after construction.
For Beginners: This tells you if the layer's weights are ready to use.
A value of true means:
- Weights have been allocated
- The layer is ready for forward/backward passes
A value of false means:
- Weights are not yet allocated (lazy initialization)
- The first Forward() call will initialize them
L1Strength
Gets or sets the L1 regularization strength (used when Regularization is L1 or L1L2).
public T L1Strength { get; set; }
Property Value
- T
L2Strength
Gets or sets the L2 regularization strength (used when Regularization is L2 or L1L2).
public T L2Strength { get; set; }
Property Value
- T
ParameterCount
Gets the total number of trainable parameters in the layer.
public override int ParameterCount { get; }
Property Value
- int
The sum of the number of weights and biases in the layer.
Remarks
This property returns the total number of trainable parameters in the layer, which is the sum of the number of elements in the weights matrix and the biases vector. This is useful for understanding the complexity of the layer.
For Beginners: This tells you how many individual numbers the layer can adjust during training.
The parameter count:
- Equals (number of inputs × number of outputs) + number of outputs
- First part counts the weights, second part counts the biases
- Higher numbers mean more flexibility but also more risk of overfitting
For example, a dense layer with 100 inputs and 50 outputs would have 100 × 50 = 5,000 weights plus 50 biases, for a total of 5,050 parameters.
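The same arithmetic in code:
var layer = new DenseLayer<float>(100, 50);
Console.WriteLine(layer.ParameterCount); // 5050 = 100 * 50 weights + 50 biases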
Regularization
Gets or sets the type of regularization to apply.
public RegularizationType Regularization { get; set; }
Property Value
- RegularizationType
SupportsGpuExecution
Gets whether this layer has a GPU execution implementation for inference.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.
For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.
SupportsJitCompilation
Gets whether this layer currently supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer's activation function is supported for JIT compilation. Supported activations: ReLU, Sigmoid, Tanh, Softmax, Identity.
SupportsTraining
Gets a value indicating whether this layer supports training through backpropagation.
public override bool SupportsTraining { get; }
Property Value
- bool
Always returns true for dense layers, as they contain trainable parameters.
Remarks
This property indicates whether the layer can be trained through backpropagation. Dense layers have trainable parameters (weights and biases), so they support training.
For Beginners: This property tells you if the layer can learn from data.
For dense layers:
- The value is always true
- This means the layer can adjust its weights and biases during training
- It will improve its performance as it sees more examples
Some other layer types might not have trainable parameters and would return false here.
UseAuxiliaryLoss
Gets or sets whether auxiliary loss (weight regularization) should be used during training.
public bool UseAuxiliaryLoss { get; set; }
Property Value
- bool
Remarks
Weight regularization adds a penalty based on the magnitude of the weights to prevent overfitting. This helps the network generalize better to unseen data by discouraging overly complex models.
For Beginners: Weight regularization is like encouraging simplicity in your model.
Why use regularization:
- Prevents the network from memorizing training data (overfitting)
- Encourages the network to learn general patterns instead of specific details
- Makes the model work better on new, unseen data
Think of it like learning to recognize cats:
- Without regularization: "This cat has exactly 157 whiskers" (too specific)
- With regularization: "Cats have fur, whiskers, and pointy ears" (general pattern)
Regularization is especially helpful when you have limited training data.
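Example (a configuration sketch; the RegularizationType member name follows the L1/L2/L1L2/None values described in this documentation):
var layer = new DenseLayer<double>(256, 64);
layer.UseAuxiliaryLoss = true;                // include the penalty in the total loss during training
layer.Regularization = RegularizationType.L2; // discourage large weights
layer.L2Strength = 0.01;                      // strength of the L2 term
layer.AuxiliaryLossWeight = 0.01;             // total loss = main loss + 0.01 * regularization loss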
Methods
Backward(Tensor<T>)
Calculates gradients for the input, weights, and biases during backpropagation.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method performs the backward pass of the dense layer during training. It calculates the gradient of the loss with respect to the input, weights, and biases. The calculated gradients for weights and biases are stored for the subsequent parameter update, and the input gradient is returned for propagation to earlier layers.
For Beginners: This method helps the layer learn from its mistakes.
During the backward pass:
- The layer receives information about how wrong its output was
- It calculates how to adjust its weights and biases to be more accurate
- It prepares the adjustments but doesn't apply them yet
- It passes information back to previous layers so they can learn too
This is where the actual "learning" happens. The layer figures out which connections should be strengthened and which should be weakened based on the error in its output.
Exceptions
- InvalidOperationException
Thrown when backward is called before forward.
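Example (a single training step, sketched; the tensor constructor, the Shape property, and the placeholder gradient are assumptions; in practice the output gradient comes from the loss function):
var layer = new DenseLayer<float>(784, 128);
var input = new Tensor<float>(new[] { 32, 784 });          // hypothetical tensor constructor
var prediction = layer.Forward(input);                     // forward pass must come first
var outputGradient = new Tensor<float>(prediction.Shape);  // placeholder; normally supplied by the loss function
var inputGradient = layer.Backward(outputGradient);        // stores weight/bias gradients, returns the input gradient
layer.UpdateParameters(0.01f);                             // apply the stored gradients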
BackwardGpu(IGpuTensor<T>)
Performs GPU-resident backward pass for the dense layer. Computes gradients for weights, biases, and input entirely on GPU - no CPU roundtrip.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>): The GPU-resident gradient from the next layer.
Returns
- IGpuTensor<T>
GPU-resident gradient to pass to the previous layer.
Exceptions
- InvalidOperationException
Thrown if ForwardGpu was not called first.
ClearGradients()
Clears stored gradients for weights and biases.
public override void ClearGradients()
Clone()
Creates a deep copy of the layer with the same configuration and parameters.
public override LayerBase<T> Clone()
Returns
- LayerBase<T>
A new instance of the DenseLayer<T> class with the same configuration and parameters.
Remarks
This method creates a deep copy of the dense layer, including its configuration and parameters. This is useful when you need multiple instances of the same layer, such as in ensemble methods or when implementing layer factories.
For Beginners: This method creates an exact duplicate of the layer.
The copy:
- Has the same input and output dimensions
- Has the same weights and biases
- Is completely independent from the original
This is useful for:
- Creating multiple similar layers
- Experimenting with variations of a layer
- Implementing certain advanced techniques
Think of it like making a perfect clone that starts exactly where the original is.
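Example:
var layer = new DenseLayer<float>(784, 128);
LayerBase<float> copy = layer.Clone(); // independent duplicate with identical configuration, weights, and biases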
ComputeAuxiliaryLoss()
Computes the auxiliary loss for weight regularization (L1, L2, or both).
public T ComputeAuxiliaryLoss()
Returns
- T
The computed regularization auxiliary loss.
Remarks
This method computes the regularization loss based on the magnitude of the weights. L1 regularization computes the sum of absolute values of weights. L2 regularization computes the sum of squared values of weights. L1L2 combines both penalties.
For Beginners: This calculates how "complex" the layer's weights are.
Different regularization types:
L1 (Lasso): Σ|weight|
- Encourages many weights to become exactly zero
- Creates sparse networks (many connections turned off)
- Good for feature selection
L2 (Ridge): Σ(weight²)
- Encourages all weights to be small
- Prevents any single weight from dominating
- Smooths the network's behavior
L1L2 (Elastic Net): Combines both
- Gets benefits of both L1 and L2
- More flexible regularization
The loss is added to the main loss during training to discourage large weights.
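Example (a sketch; the enum members follow the L1/L2/L1L2/None values described above):
var layer = new DenseLayer<double>(64, 32);
layer.Regularization = RegularizationType.L1L2;
layer.L1Strength = 0.001;
layer.L2Strength = 0.01;
double penalty = layer.ComputeAuxiliaryLoss(); // Σ|w| and Σ(w²) terms, per the Regularization setting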
Dispose(bool)
Releases resources used by this layer, including GPU tensor handles.
protected override void Dispose(bool disposing)
Parameters
disposing (bool): True if called from Dispose(); false if called from the finalizer.
Remarks
This method releases GPU memory allocated for persistent weight tensors. It is called by the base class Dispose() method.
For Beginners: GPU memory is limited and precious.
When you're done with a layer:
- Call Dispose() or use a 'using' statement
- This frees up GPU memory for other operations
- Failing to dispose can cause memory leaks on the GPU
Example:
using var layer = new DenseLayer<float>(784, 128);
// ... use layer ...
// Automatically disposed when out of scope
EnsureInitialized()
Ensures that weights are allocated and initialized for lazy initialization.
protected override void EnsureInitialized()
ExportComputationGraph(List<ComputationNode<T>>)
Exports the dense layer's forward pass as a JIT-compilable computation graph.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): The list to populate with input computation nodes (input data, weights, biases).
Returns
- ComputationNode<T>
The output computation node representing the layer's prediction.
Remarks
This method builds a computation graph that mirrors the layer's forward pass logic. The graph uses TensorOperations which now integrates with IEngine for GPU acceleration where supported (e.g., Add operations use IEngine.TensorAdd).
Current IEngine integration status:
- Addition operations: fully GPU-accelerated via IEngine.TensorAdd
- Matrix multiplication: uses Tensor.MatrixMultiply (pending IEngine integration)
- Transpose operations: uses Tensor.Transpose (pending IEngine integration)
The computation graph enables:
- JIT compilation for optimized inference
- Operation fusion and dead code elimination
- Automatic differentiation via backpropagation
- Deferred execution with GPU acceleration
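Example (based on the signature above; the caller supplies an empty node list, which the call populates):
var layer = new DenseLayer<float>(784, 10);
var inputNodes = new List<ComputationNode<float>>();
ComputationNode<float> outputNode = layer.ExportComputationGraph(inputNodes); // inputNodes now holds the input, weight, and bias nodes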
Forward(Tensor<T>)
Processes the input data through the dense layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after applying the dense layer transformation and activation.
Remarks
This method performs the forward pass of the dense layer. It multiplies the input by the weights, adds the biases, and applies the activation function. The result is a tensor where each element represents the activation of an output neuron.
Industry Standard: Like PyTorch's nn.Linear, this layer supports any-rank input tensors. The transformation is applied to the last dimension, preserving all batch/sequence dimensions. For example, input [..., inputSize] produces output [..., outputSize].
For Beginners: This method transforms input data into output data.
During the forward pass:
- The input values are multiplied by their corresponding weights
- All weighted inputs for each output neuron are added together
- The bias is added to each sum
- The activation function is applied to each result
For example, if your inputs represent image features, the outputs might represent the probability of the image belonging to different categories.
This is where the actual "thinking" happens in the neural network.
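Example (a sketch of the shape behavior; the Tensor<float> constructor shown is hypothetical):
var layer = new DenseLayer<float>(64, 32);
var input = new Tensor<float>(new[] { 8, 20, 64 }); // hypothetical: [batch, sequence, features]
var output = layer.Forward(input);                  // shape [8, 20, 32]; only the last dimension changes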
ForwardGpu(params IGpuTensor<T>[])
Performs a GPU-resident forward pass, keeping tensors on GPU. Use this for chained layer execution to avoid CPU round-trips. Supports any-rank tensor input (1D, 2D, or ND), matching CPU Forward behavior.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): GPU-resident input tensors (uses the first input). The last dimension is the feature dimension.
Returns
- IGpuTensor<T>
GPU-resident output tensor with same batch dimensions, outputSize as last dim.
Exceptions
- InvalidOperationException
Thrown if GPU execution is not available.
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the weight regularization auxiliary loss.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about regularization.
Remarks
This method returns detailed diagnostics about the weight regularization, including the computed regularization loss, type of regularization, strengths, and whether it's enabled. This information is useful for monitoring training progress and debugging.
For Beginners: This provides information about how regularization is affecting the layer.
The diagnostics include:
- Total regularization loss (penalty for large weights)
- Type of regularization being used (L1, L2, L1L2, or None)
- Strength parameters for L1 and L2
- Weight applied to the regularization loss
- Whether regularization is enabled
This helps you:
- Monitor if regularization is helping prevent overfitting
- Debug issues with model complexity
- Understand the impact of different regularization settings
You can use this information to adjust regularization parameters for better results.
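Example:
var layer = new DenseLayer<double>(128, 64);
foreach (var entry in layer.GetAuxiliaryLossDiagnostics())
    Console.WriteLine($"{entry.Key}: {entry.Value}");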
GetBiases()
Gets the biases tensor of the layer.
public override Tensor<T> GetBiases()
Returns
- Tensor<T>
The bias values added to each output neuron.
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.
public override Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
GetParameterGradients()
Gets the gradients of all trainable parameters in this layer.
public override Vector<T> GetParameterGradients()
Returns
- Vector<T>
A vector containing the gradients of all weights and biases.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all weights and biases.
Remarks
This method extracts all trainable parameters (weights and biases) from the layer and returns them as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method gathers all the learned values from the layer.
The parameters include:
- All weight values (connections between inputs and outputs)
- All bias values (base values for each output)
These are combined into a single long list (vector), which can be used for:
- Saving the model
- Sharing parameters between layers
- Advanced optimization techniques
This provides access to all the "knowledge" the layer has learned.
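Example (a save/restore sketch using GetParameters together with SetParameters):
var layer = new DenseLayer<float>(784, 128);
Vector<float> snapshot = layer.GetParameters(); // copy of all weights and biases
// ... training continues, parameters change ...
layer.SetParameters(snapshot);                  // restore the earlier state (length must equal ParameterCount)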
GetWeights()
Gets the weights tensor of the layer.
public override Tensor<T> GetWeights()
Returns
- Tensor<T>
The weight tensor connecting input neurons to output neurons.
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method clears the cached input values from the most recent forward pass and the gradients calculated during the backward pass. This is useful when starting to process a new batch or when implementing stateful recurrent networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- The layer forgets the last input it processed
- It clears any calculated gradients
This is useful for:
- Processing a new, unrelated set of data
- Preventing information from one batch affecting another
- Starting a new training episode
Think of it like wiping a whiteboard clean before starting a new calculation.
SetParameters(Vector<T>)
Sets all trainable parameters of the layer from a single vector.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all trainable parameters (weights and biases) of the layer from a single vector. The vector must have the exact length required for all parameters of the layer.
For Beginners: This method updates all the layer's learned values at once.
When setting parameters:
- The vector must have exactly the right number of values
- The values are assigned to the weights and biases in a specific order
This is useful for:
- Loading a previously saved model
- Copying parameters from another model
- Setting parameters that were optimized externally
It's like replacing all the "knowledge" in the layer with new information.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
SetWeights(Tensor<T>)
Sets the weights of the layer to specified values.
protected override void SetWeights(Tensor<T> weights)
Parameters
weights (Tensor<T>): The weight matrix to set.
Remarks
This method allows direct setting of the weight matrix, which can be useful for transfer learning, weight initialization with custom algorithms, or loading pre-trained models. The dimensions of the provided matrix must match the layer's input and output dimensions.
For Beginners: This method lets you directly set all connection strengths at once.
You might use this to:
- Load pre-trained weights from another model
- Test the layer with specific weight values
- Implement custom initialization strategies
The weight matrix must have exactly the right dimensions:
- Rows equal to the number of inputs (inputSize)
- Columns equal to the number of outputs (outputSize)
If the dimensions don't match, the method will throw an error.
Exceptions
- ArgumentNullException
Thrown when the weights parameter is null.
- ArgumentException
Thrown when the weights matrix has incorrect dimensions.
UpdateParameters(T)
Updates the layer's parameters (weights and biases) using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the update.
Remarks
This method updates the layer's parameters (weights and biases) based on the gradients calculated during the backward pass. The learning rate controls the step size of the update.
For Beginners: This method applies the lessons learned during training.
When updating parameters:
- The learning rate controls how big each adjustment is
- Small learning rate = small, careful changes
- Large learning rate = big, faster changes (but might overshoot)
The weights and biases are adjusted by subtracting the gradient multiplied by the learning rate. This moves them in the direction that reduces the error the most.
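Conceptually, for each weight w and bias b (a sketch of the update rule, not the actual implementation):
// w = w - learningRate * dLoss/dw
// b = b - learningRate * dLoss/db
layer.UpdateParameters(0.01f);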
Exceptions
- InvalidOperationException
Thrown when update is called before backward.