Class FullyConnectedLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a fully connected layer in a neural network where every input neuron connects to every output neuron.

public class FullyConnectedLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LayerBase<T> → FullyConnectedLayer<T>

Implements
ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

A fully connected layer, also known as a dense layer, is a fundamental building block in neural networks. It connects every input neuron to every output neuron with learnable weights. Each output neuron also has a learnable bias term. The layer applies a linear transformation followed by an activation function to produce its output. Fully connected layers are particularly useful for learning complex patterns and for classification tasks.

For Beginners: A fully connected layer connects every input to every output, like a complete web of connections.

Imagine you have inputs representing different features:

  • Each feature (input) connects to every possible output
  • Each connection has a strength (weight) that can be adjusted
  • Each output also has a starting value (bias)

For example, in an image classification task:

  • Inputs might be flattened features from convolutional layers
  • Each output might represent a score for a different category
  • The connections (weights) learn which features are important for each category

Fully connected layers are excellent at combining features to make final decisions. They're often used toward the end of a neural network to interpret the features extracted by earlier layers.
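For example, here is a minimal usage sketch. The Tensor<double> construction shown is an assumption for illustration; the exact tensor API may differ.

using AiDotNet.NeuralNetworks.Layers;

// Map 128 flattened features to 10 class scores.
// Omitting the activation argument uses the layer's default.
var layer = new FullyConnectedLayer<double>(inputSize: 128, outputSize: 10);

// Assumed tensor creation: a batch of 32 examples with 128 features each.
var input = new Tensor<double>(new[] { 32, 128 });

// Output shape: [32, 10], one score per category for each example.
Tensor<double> output = layer.Forward(input);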

Constructors

FullyConnectedLayer(int, int, IActivationFunction<T>?)

public FullyConnectedLayer(int inputSize, int outputSize, IActivationFunction<T>? activationFunction = null)

Parameters

inputSize int
outputSize int
activationFunction IActivationFunction<T>

FullyConnectedLayer(int, int, IVectorActivationFunction<T>?)

public FullyConnectedLayer(int inputSize, int outputSize, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

inputSize int
outputSize int
vectorActivationFunction IVectorActivationFunction<T>
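For example, the two overloads can be used as follows. ReluActivation<float> and SoftmaxActivation<float> are hypothetical implementations of the activation interfaces, shown only for illustration.

// Default activation: omit the argument or pass null.
var dense = new FullyConnectedLayer<float>(inputSize: 256, outputSize: 64);

// Element-wise activation (hypothetical IActivationFunction<float> implementation).
var withRelu = new FullyConnectedLayer<float>(256, 64, new ReluActivation<float>());

// Vector activation such as softmax (hypothetical IVectorActivationFunction<float>).
var withSoftmax = new FullyConnectedLayer<float>(256, 10, new SoftmaxActivation<float>());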

Properties

SupportsGpuExecution

Gets whether this layer has a GPU implementation.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.
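For example, a caller might guard on this flag (sketch only; layer is an existing FullyConnectedLayer<double> and input a prepared Tensor<double>):

if (layer.SupportsJitCompilation)
{
    // Export and compile the graph (see ExportComputationGraph below).
}
else
{
    var output = layer.Forward(input); // standard eager path
}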

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Always true because fully connected layers have trainable parameters (weights and biases).

Remarks

This property indicates that the fully connected layer supports training through backpropagation. The layer has trainable parameters (weights and biases) that are updated during the training process.

For Beginners: This property tells you that this layer can learn from data.

A value of true means:

  • The layer can adjust its weights and biases during training
  • It will improve its performance as it sees more data
  • It has parameters that are updated to make better predictions

Fully connected layers are primary learning components in neural networks, as they contain trainable parameters that adapt to recognize patterns in the data.

Methods

Backward(Tensor<T>)

Performs the backward pass of the fully connected layer to compute gradients.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient tensor from the next layer. Shape: [batchSize, outputSize].

Returns

Tensor<T>

The gradient tensor to be passed to the previous layer. Shape: [batchSize, inputSize].

Remarks

This method implements the backward pass (backpropagation) of the fully connected layer. It computes the gradients of the loss with respect to the layer's weights, biases, and inputs. These gradients are used to update the parameters during training and to propagate the error signal back to the previous layer.

For Beginners: This is where the layer learns from its mistakes during training.

The backward pass works in these steps for each example in the batch:

  1. Extract the gradient and necessary vectors for this example
  2. Apply the activation function derivative to the gradient
    • This accounts for how the activation function affected the output
  3. Calculate weight gradients using outer product
    • Shows how each weight contributed to the error
  4. Accumulate bias gradients
    • Shows how each bias affected the output
  5. Calculate input gradients to pass back to previous layer
    • Helps earlier layers learn as well

The gradients tell us:

  • How to adjust each weight and bias to reduce errors
  • How the error signal should flow back to previous layers

All gradients are accumulated across the batch before being used to update parameters.
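The per-example math can be sketched with plain arrays (illustrative only; the actual implementation operates on Tensor<T>). Here dOut is the output gradient after the activation derivative has been applied, and inputGrad is assumed to be zero-initialized:

static void AccumulateGradients(
    double[] input, double[,] weights, double[] dOut,
    double[,] weightGrad, double[] biasGrad, double[] inputGrad)
{
    int inputSize = input.Length, outputSize = dOut.Length;
    for (int j = 0; j < outputSize; j++)
    {
        biasGrad[j] += dOut[j];                      // step 4: bias gradients
        for (int i = 0; i < inputSize; i++)
        {
            weightGrad[i, j] += input[i] * dOut[j];  // step 3: outer product
            inputGrad[i] += weights[i, j] * dOut[j]; // step 5: gradient for previous layer
        }
    }
}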

Exceptions

InvalidOperationException

Thrown when Backward is called before Forward.

BackwardGpu(IGpuTensor<T>)

Performs the backward pass using GPU-resident tensors.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

GPU-resident gradient of the loss w.r.t. output.

Returns

IGpuTensor<T>

GPU-resident gradient of the loss w.r.t. input.

ClearGradients()

Clears stored gradients for weights and biases.

public override void ClearGradients()

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.
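A minimal export sketch using only the documented signature (layer is an existing FullyConnectedLayer<double>; compiling and executing the returned graph is up to the caller):

var inputNodes = new List<ComputationNode<double>>();
ComputationNode<double> outputNode = layer.ExportComputationGraph(inputNodes);
// inputNodes now holds the graph's input nodes; outputNode is the root of
// the layer's forward computation, ready to hand to a JIT compiler.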

Forward(Tensor<T>)

Performs the forward pass of the fully connected layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process. Shape: [batchSize, inputSize].

Returns

Tensor<T>

The output tensor after the linear transformation and activation. Shape: [batchSize, outputSize].

Remarks

This method implements the forward pass of the fully connected layer. For each example in the batch, it performs a matrix multiplication between the input vector and the weight matrix, adds the bias vector, and applies the activation function to produce the final output. The input and output are cached for use during the backward pass.

For Beginners: This is where the layer processes input data to produce outputs.

The forward pass works in these steps for each example in the batch:

  1. Extract the input vector for this example
  2. Multiply the input vector by the weight matrix
    • Each output neuron computes a weighted sum of all inputs
  3. Add the bias vector to the result
    • Each output gets its own bias value added
  4. Apply the activation function
    • This introduces non-linearity, helping the network learn complex patterns
  5. Store the result in the output tensor

This process transforms the input data through the layer's learned parameters, producing output values that will either be passed to the next layer or used as the final network output.
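The per-example math can be sketched with plain arrays (illustrative only; the actual implementation operates on Tensor<T>):

static double[] DenseForward(
    double[] input, double[,] weights, double[] biases,
    Func<double, double> activation)
{
    int outputSize = biases.Length;
    var output = new double[outputSize];
    for (int j = 0; j < outputSize; j++)
    {
        double sum = biases[j];                  // step 3: add the bias
        for (int i = 0; i < input.Length; i++)
            sum += input[i] * weights[i, j];     // step 2: weighted sum
        output[j] = activation(sum);             // step 4: non-linearity
    }
    return output;
}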

ForwardGpu(params IGpuTensor<T>[])

Performs a GPU-resident forward pass, keeping tensors on GPU. Use this for chained layer execution to avoid CPU round-trips.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

GPU-resident input tensors (uses first input).

Returns

IGpuTensor<T>

GPU-resident output tensor.

Exceptions

InvalidOperationException

Thrown if GPU execution is not available.

GetParameterGradients()

Gets the gradients of all trainable parameters in this layer.

public override Vector<T> GetParameterGradients()

Returns

Vector<T>

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This method retrieves all trainable parameters (weights and biases) of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the layer's learnable values into a single list.

The parameters include:

  • All the weight values (the majority of the parameters)
  • All the bias values (one per output neuron)

This combined list is useful for:

  • Saving a trained model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need all parameters together

For example, a layer with 100 inputs and 10 outputs would have:

  • 1,000 weight parameters (100 × 10)
  • 10 bias parameters (one per output)
  • 1,010 parameters in total in the returned vector
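A quick sketch of that count (assuming Vector<T> exposes a Length property):

var layer = new FullyConnectedLayer<double>(inputSize: 100, outputSize: 10);
Vector<double> parameters = layer.GetParameters();
// parameters.Length == 1010  (100 * 10 weights + 10 biases)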

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

This method resets the internal state of the layer by clearing all cached values from forward and backward passes. This is useful when starting to process a new batch of data.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • The saved input and output are cleared
  • The calculated gradients are cleared
  • The layer forgets previous calculations it performed

This is typically called:

  • Between training batches to free up memory
  • When switching from training to evaluation mode
  • When starting to process completely new data

It's like wiping a whiteboard clean before starting a new calculation. Note that this doesn't affect the learned weights and biases, just the temporary working data.

SetParameters(Vector<T>)

Sets the trainable parameters of the layer from a single vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets all trainable parameters (weights and biases) of the layer from a single vector. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the layer's learnable values from a provided list.

When setting parameters:

  • The input must be a vector with the exact right length
  • The values are distributed back to the weights and biases
  • This allows loading previously trained weights

Use cases include:

  • Restoring a saved model
  • Using pre-trained weights
  • Testing specific weight configurations

The method throws an ArgumentException if the provided vector doesn't contain exactly the right number of values.
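For example, a round-trip sketch, where trainedLayer is assumed to be a previously trained layer with the same 100-input, 10-output dimensions:

// Copy parameters from a trained layer into a fresh one.
Vector<double> saved = trainedLayer.GetParameters();
var restored = new FullyConnectedLayer<double>(100, 10);
restored.SetParameters(saved); // ArgumentException if the length is wrong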

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

UpdateParameters(T)

Updates the weights and biases using the calculated gradients and the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the weights and biases based on the gradients calculated during the backward pass. The learning rate determines the size of the parameter updates. Smaller learning rates lead to more stable but slower training, while larger learning rates can lead to faster but potentially unstable training.

For Beginners: This method actually changes the weights and biases to improve future predictions.

After figuring out how each parameter should change:

  • Each weight and bias is adjusted in the direction that reduces errors
  • The learning rate controls how big these adjustments are

Think of it like adjusting a recipe after tasting:

  • Too salty? Reduce salt next time (adjust weights/biases)
  • But make small adjustments (learning rate), not drastic ones

For example, with a learning rate of 0.01:

  • A gradient of 0.5 would change the parameter by -0.005
  • A gradient of -2.0 would change the parameter by +0.02

The minus sign in the update rule is there because we move in the opposite direction of the gradient to minimize error.
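The underlying rule, applied independently to every weight and bias, can be sketched as:

// newValue = oldValue - learningRate * gradient
// With learningRate = 0.01:
//   gradient  0.5  -> change of -0.005
//   gradient -2.0  -> change of +0.02
static double ApplyUpdate(double value, double gradient, double learningRate)
    => value - learningRate * gradient;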

Exceptions

InvalidOperationException

Thrown when UpdateParameters is called before Backward.