Class FullyConnectedLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a fully connected layer in a neural network where every input neuron connects to every output neuron.

public class FullyConnectedLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
LayerBase<T> → FullyConnectedLayer<T>

Implements
ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Remarks

A fully connected layer, also known as a dense layer, is a fundamental building block in neural networks. It connects every input neuron to every output neuron with learnable weights. Each output neuron also has a learnable bias term. The layer applies a linear transformation followed by an activation function to produce its output. Fully connected layers are particularly useful for learning complex patterns and for classification tasks.

For Beginners: A fully connected layer connects every input to every output, like a complete web of connections.

Imagine you have inputs representing different features:

  • Each feature (input) connects to every possible output
  • Each connection has a strength (weight) that can be adjusted
  • Each output also has a starting value (bias)

For example, in an image classification task:

  • Inputs might be flattened features from convolutional layers
  • Each output might represent a score for a different category
  • The connections (weights) learn which features are important for each category

Fully connected layers are excellent at combining features to make final decisions. They're often used toward the end of a neural network to interpret the features extracted by earlier layers.
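For example, here is a minimal usage sketch. The Tensor<double> construction shown is an assumption for illustration; the exact tensor API may differ.

using AiDotNet.NeuralNetworks.Layers;

// Map 128 flattened features to 10 class scores.
// Omitting the activation argument uses the layer's default.
var layer = new FullyConnectedLayer<double>(inputSize: 128, outputSize: 10);

// Assumed tensor creation: a batch of 32 examples with 128 features each.
var input = new Tensor<double>(new[] { 32, 128 });

// Output shape: [32, 10], one score per category for each example.
Tensor<double> output = layer.Forward(input);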

Constructors

FullyConnectedLayer(int, int, IActivationFunction<T>?)

public FullyConnectedLayer(int inputSize, int outputSize, IActivationFunction<T>? activationFunction = null)

Parameters

inputSize int
outputSize int
activationFunction IActivationFunction<T>

FullyConnectedLayer(int, int, IVectorActivationFunction<T>?)

public FullyConnectedLayer(int inputSize, int outputSize, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

inputSize int
outputSize int
vectorActivationFunction IVectorActivationFunction<T>
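For example, the two overloads can be used as follows. ReluActivation<float> and SoftmaxActivation<float> are hypothetical implementations of the activation interfaces, shown only for illustration.

// Default activation: omit the argument or pass null.
var dense = new FullyConnectedLayer<float>(inputSize: 256, outputSize: 64);

// Element-wise activation (hypothetical IActivationFunction<float> implementation).
var withRelu = new FullyConnectedLayer<float>(256, 64, new ReluActivation<float>());

// Vector activation such as softmax (hypothetical IVectorActivationFunction<float>).
var withSoftmax = new FullyConnectedLayer<float>(256, 10, new SoftmaxActivation<float>());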

Properties

SupportsGpuExecution

Gets whether this layer has a GPU implementation.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.
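For example, a caller might guard on this flag (sketch only; layer is an existing FullyConnectedLayer<double> and input a prepared Tensor<double>):

if (layer.SupportsJitCompilation)
{
    // Export and compile the graph (see ExportComputationGraph below).
}
else
{
    var output = layer.Forward(input); // standard eager path
}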

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Always true because fully connected layers have trainable parameters (weights and biases).

Remarks

This property indicates that the fully connected layer supports training through backpropagation. The layer has trainable parameters (weights and biases) that are updated during the training process.

For Beginners: This property tells you that this layer can learn from data.

A value of true means:

  • The layer can adjust its weights and biases during training
  • It will improve its performance as it sees more data
  • It has parameters that are updated to make better predictions

Fully connected layers are primary learning components in neural networks, as they contain trainable parameters that adapt to recognize patterns in the data.

Methods

Backward(Tensor<T>)

Performs the backward pass of the fully connected layer to compute gradients.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient tensor from the next layer. Shape: [batchSize, outputSize].

Returns

Tensor<T>

The gradient tensor to be passed to the previous layer. Shape: [batchSize, inputSize].

Remarks

This method implements the backward pass (backpropagation) of the fully connected layer. It computes the gradients of the loss with respect to the layer's weights, biases, and inputs. These gradients are used to update the parameters during training and to propagate the error signal back to the previous layer.

For Beginners: This is where the layer learns from its mistakes during training.

The backward pass works in these steps for each example in the batch:

  1. Extract the gradient and necessary vectors for this example
  2. Apply the activation function derivative to the gradient
    • This accounts for how the activation function affected the output
  3. Calculate weight gradients using outer product
    • Shows how each weight contributed to the error
  4. Accumulate bias gradients
    • Shows how each bias affected the output
  5. Calculate input gradients to pass back to previous layer
    • Helps earlier layers learn as well

The gradients tell us:

  • How to adjust each weight and bias to reduce errors
  • How the error signal should flow back to previous layers

All gradients are accumulated across the batch before being used to update parameters.
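The per-example math can be sketched with plain arrays (illustrative only; the actual implementation operates on Tensor<T>). Here dOut is the output gradient after the activation derivative has been applied, and inputGrad is assumed to be zero-initialized:

static void AccumulateGradients(
    double[] input, double[,] weights, double[] dOut,
    double[,] weightGrad, double[] biasGrad, double[] inputGrad)
{
    int inputSize = input.Length, outputSize = dOut.Length;
    for (int j = 0; j < outputSize; j++)
    {
        biasGrad[j] += dOut[j];                      // step 4: bias gradients
        for (int i = 0; i < inputSize; i++)
        {
            weightGrad[i, j] += input[i] * dOut[j];  // step 3: outer product
            inputGrad[i] += weights[i, j] * dOut[j]; // step 5: gradient for previous layer
        }
    }
}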

Exceptions

InvalidOperationException

Thrown when Backward is called before Forward.

BackwardGpu(IGpuTensor<T>)

Performs the backward pass using GPU-resident tensors.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

GPU-resident gradient of the loss w.r.t. output.

Returns

IGpuTensor<T>

GPU-resident gradient of the loss w.r.t. input.

ClearGradients()

Clears stored gradients for weights and biases.

public override void ClearGradients()

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.
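A minimal export sketch using only the documented signature (layer is an existing FullyConnectedLayer<double>; compiling and executing the returned graph is up to the caller):

var inputNodes = new List<ComputationNode<double>>();
ComputationNode<double> outputNode = layer.ExportComputationGraph(inputNodes);
// inputNodes now holds the graph's input nodes; outputNode is the root of
// the layer's forward computation, ready to hand to a JIT compiler.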

Forward(Tensor<T>)

Performs the forward pass of the fully connected layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process. Shape: [batchSize, inputSize].

Returns

Tensor<T>

The output tensor after the linear transformation and activation. Shape: [batchSize, outputSize].

Remarks

This method implements the forward pass of the fully connected layer. For each example in the batch, it performs a matrix multiplication between the input vector and the weight matrix, adds the bias vector, and applies the activation function to produce the final output. The input and output are cached for use during the backward pass.

For Beginners: This is where the layer processes input data to produce outputs.

The forward pass works in these steps for each example in the batch:

  1. Extract the input vector for this example
  2. Multiply the input vector by the weight matrix
    • Each output neuron computes a weighted sum of all inputs
  3. Add the bias vector to the result
    • Each output gets its own bias value added
  4. Apply the activation function
    • This introduces non-linearity, helping the network learn complex patterns
  5. Store the result in the output tensor

This process transforms the input data through the layer's learned parameters, producing output values that will either be passed to the next layer or used as the final network output.
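The per-example math can be sketched with plain arrays (illustrative only; the actual implementation operates on Tensor<T>):

static double[] DenseForward(
    double[] input, double[,] weights, double[] biases,
    Func<double, double> activation)
{
    int outputSize = biases.Length;
    var output = new double[outputSize];
    for (int j = 0; j < outputSize; j++)
    {
        double sum = biases[j];                  // step 3: add the bias
        for (int i = 0; i < input.Length; i++)
            sum += input[i] * weights[i, j];     // step 2: weighted sum
        output[j] = activation(sum);             // step 4: non-linearity
    }
    return output;
}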

ForwardGpu(params IGpuTensor<T>[])

Performs a GPU-resident forward pass, keeping tensors on GPU. Use this for chained layer execution to avoid CPU round-trips.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

GPU-resident input tensors (uses first input).

Returns

IGpuTensor<T>

GPU-resident output tensor.

Exceptions

InvalidOperationException

Thrown if GPU execution is not available.

GetParameterGradients()

Gets the gradients of all trainable parameters in this layer.

public override Vector<T> GetParameterGradients()

Returns

Vector<T>

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This method retrieves all trainable parameters (weights and biases) of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the layer's learnable values into a single list.

The parameters include:

  • All the weight values (the majority of the parameters)
  • All the bias values (one per output neuron)

This combined list is useful for:

  • Saving a trained model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need all parameters together

For example, a layer with 100 inputs and 10 outputs would have:

  • 1,000 weight parameters (100 × 10)
  • 10 bias parameters (one per output)
  • 1,010 parameters in total in the returned vector
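A quick sketch of that count (assuming Vector<T> exposes a Length property):

var layer = new FullyConnectedLayer<double>(inputSize: 100, outputSize: 10);
Vector<double> parameters = layer.GetParameters();
// parameters.Length == 1010  (100 * 10 weights + 10 biases)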

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

This method resets the internal state of the layer by clearing all cached values from forward and backward passes. This is useful when starting to process a new batch of data.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • The saved input and output are cleared
  • The calculated gradients are cleared
  • The layer forgets previous calculations it performed

This is typically called:

  • Between training batches to free up memory
  • When switching from training to evaluation mode
  • When starting to process completely new data

It's like wiping a whiteboard clean before starting a new calculation. Note that this doesn't affect the learned weights and biases, just the temporary working data.

SetParameters(Vector<T>)

Sets the trainable parameters of the layer from a single vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets all trainable parameters (weights and biases) of the layer from a single vector. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the layer's learnable values from a provided list.

When setting parameters:

  • The input must be a vector with the exact right length
  • The values are distributed back to the weights and biases
  • This allows loading previously trained weights

Use cases include:

  • Restoring a saved model
  • Using pre-trained weights
  • Testing specific weight configurations

The method throws an ArgumentException if the provided vector doesn't contain exactly the right number of values.
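For example, a round-trip sketch, where trainedLayer is assumed to be a previously trained layer with the same 100-input, 10-output dimensions:

// Copy parameters from a trained layer into a fresh one.
Vector<double> saved = trainedLayer.GetParameters();
var restored = new FullyConnectedLayer<double>(100, 10);
restored.SetParameters(saved); // ArgumentException if the length is wrong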

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

UpdateParameters(T)

Updates the weights and biases using the calculated gradients and the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the weights and biases based on the gradients calculated during the backward pass. The learning rate determines the size of the parameter updates. Smaller learning rates lead to more stable but slower training, while larger learning rates can lead to faster but potentially unstable training.

For Beginners: This method actually changes the weights and biases to improve future predictions.

After figuring out how each parameter should change:

  • Each weight and bias is adjusted in the direction that reduces errors
  • The learning rate controls how big these adjustments are

Think of it like adjusting a recipe after tasting:

  • Too salty? Reduce salt next time (adjust weights/biases)
  • But make small adjustments (learning rate), not drastic ones

For example, with a learning rate of 0.01:

  • A gradient of 0.5 would change the parameter by -0.005
  • A gradient of -2.0 would change the parameter by +0.02

The minus sign in the update rule is there because we move in the opposite direction of the gradient to minimize error.
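The underlying rule, applied independently to every weight and bias, can be sketched as:

// newValue = oldValue - learningRate * gradient
// With learningRate = 0.01:
//   gradient  0.5  -> change of -0.005
//   gradient -2.0  -> change of +0.02
static double ApplyUpdate(double value, double gradient, double learningRate)
    => value - learningRate * gradient;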

Exceptions

InvalidOperationException

Thrown when UpdateParameters is called before Backward.