Class FeedForwardLayer<T>

Namespace
AiDotNet.NeuralNetworks.Layers
Assembly
AiDotNet.dll

Represents a fully connected (dense) feed-forward layer in a neural network.

public class FeedForwardLayer<T> : LayerBase<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
FeedForwardLayer<T>

Remarks

A feed-forward layer, also known as a fully connected or dense layer, is one of the most common types of neural network layers. It connects every input neuron to every output neuron with learnable weights. Each output neuron also has a learnable bias term. The layer applies a linear transformation followed by an activation function to produce its output.

For Beginners: A feed-forward layer is like a voting system where every input gets to vote on every output.

Imagine you have 3 inputs and 2 outputs:

  • Each input has a different level of influence (weight) on each output
  • Each output has its own starting value (bias)
  • The layer calculates each output by combining all input influences plus the bias
  • Finally, an activation function adds non-linearity (like setting a threshold)

For example:

  • Input: [0.2, 0.5, 0.1] (representing features from previous layer)
  • Weights: [[0.1, 0.8], [0.4, 0.3], [0.7, 0.2]] (each input's influence on each output)
  • Biases: [0.1, -0.2] (starting values for each output)
  • Output before activation: [0.2×0.1 + 0.5×0.4 + 0.1×0.7 + 0.1, 0.2×0.8 + 0.5×0.3 + 0.1×0.2 - 0.2] = [0.39, 0.13]
  • After activation (e.g., ReLU): [0.39, 0.13] (unchanged, since both values are already positive)
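The arithmetic in this example can be reproduced with a short sketch. It uses plain arrays rather than the library's tensor types, and DenseForward is a hypothetical helper written only for illustration; it is not part of AiDotNet.

```csharp
using System;

// Values taken from the example: 3 inputs, 2 outputs.
double[] input = { 0.2, 0.5, 0.1 };
double[,] weights =
{
    { 0.1, 0.8 },
    { 0.4, 0.3 },
    { 0.7, 0.2 },
};
double[] biases = { 0.1, -0.2 };

// Hypothetical helper: weighted sum of all inputs plus the bias, then ReLU.
static double[] DenseForward(double[] x, double[,] w, double[] b)
{
    int inputSize = w.GetLength(0);
    int outputSize = w.GetLength(1);
    var output = new double[outputSize];
    for (int j = 0; j < outputSize; j++)
    {
        double sum = b[j]; // start from the bias
        for (int i = 0; i < inputSize; i++)
        {
            sum += x[i] * w[i, j]; // add each input's influence
        }
        output[j] = Math.Max(0.0, sum); // ReLU keeps positive values unchanged
    }
    return output;
}

double[] result = DenseForward(input, weights, biases);
Console.WriteLine(FormattableString.Invariant($"{result[0]:F2}, {result[1]:F2}")); // prints "0.39, 0.13"
```

Each row of the weights array holds one input's influence on every output, matching the layout used in the example.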

Feed-forward layers are the building blocks of many neural networks. Multiple feed-forward layers stacked together form a "deep" neural network that can learn increasingly complex patterns.

Constructors

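FeedForwardLayer(int, int, IActivationFunction<T>?)

Initializes a new instance of the FeedForwardLayer<T> class with a scalar (element-wise) activation function.

public FeedForwardLayer(int inputSize, int outputSize, IActivationFunction<T>? activationFunction = null)

Parameters

inputSize int

The number of input neurons.

outputSize int

The number of output neurons.

activationFunction IActivationFunction<T>

The scalar activation function to apply after the linear transformation.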

FeedForwardLayer(int, int, IVectorActivationFunction<T>?)

Initializes a new instance of the FeedForwardLayer<T> class with a vector activation function.

public FeedForwardLayer(int inputSize, int outputSize, IVectorActivationFunction<T>? activationFunction = null)

Parameters

inputSize int

The number of input neurons.

outputSize int

The number of output neurons.

activationFunction IVectorActivationFunction<T>

The vector activation function to apply after the linear transformation.

Remarks

This constructor creates a new feed-forward layer with the specified input size, output size, and vector activation function. The weights are initialized with small random values, and the biases are initialized to zero. Unlike the other constructor, this one accepts a vector activation function that operates on entire vectors rather than individual scalar values.

For Beginners: This is an alternative setup that uses a different kind of activation function.

This constructor is almost identical to the first one, but with one key difference:

  • Regular activation: processes each output value separately
  • Vector activation: processes the entire output vector together

Vector activation functions like Softmax are useful for:

  • Classification problems (choosing between multiple categories)
  • Problems where outputs need to sum to 1 (like probabilities)
  • Cases where output values should influence each other

For example, Softmax normalizes the outputs so that they sum to 1. Raising one output's score therefore lowers the share assigned to all the others, which suits classification tasks.
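The Softmax behavior described here can be checked with a minimal sketch. The Softmax function below is a hypothetical stand-alone helper on plain arrays, not the library's IVectorActivationFunction<T> implementation.

```csharp
using System;
using System.Linq;

// Hypothetical helper: numerically stable softmax over a whole vector.
static double[] Softmax(double[] scores)
{
    double max = scores.Max(); // subtracting the max avoids overflow in Exp
    double[] exps = scores.Select(v => Math.Exp(v - max)).ToArray();
    double total = exps.Sum();
    return exps.Select(e => e / total).ToArray();
}

double[] before = Softmax(new double[] { 1.0, 2.0, 3.0 });
double[] after = Softmax(new double[] { 1.0, 2.0, 4.0 }); // only the last score was raised

// Both results sum to 1 (up to rounding), and raising the last score
// lowers the probabilities of the first two entries.
Console.WriteLine(after[0] < before[0] && after[1] < before[1]); // prints "True"
```

Because every output depends on the sum over all scores, the function cannot be applied one value at a time, which is why it needs the vector form of the constructor.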

Properties

ParameterCount

Gets the total number of trainable parameters in this layer.

public override int ParameterCount { get; }

Property Value

int

Remarks

This includes all weights (inputSize × outputSize) and all biases (outputSize).

SupportsGpuExecution

Gets a value indicating whether this layer supports GPU execution.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

True if the layer can be JIT compiled, false otherwise.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

Always true because feed-forward layers have trainable parameters (weights and biases).

Remarks

This property indicates that the feed-forward layer supports training through backpropagation. The layer has trainable parameters (weights and biases) that are updated during the training process.

For Beginners: This property tells you that this layer can learn from data.

A value of true means:

  • The layer can adjust its weights and biases during training
  • It will improve its performance as it sees more data
  • It has parameters that are updated to make better predictions

Feed-forward layers are the primary learning components in many neural networks, as they contain most of the trainable parameters.

Methods

Backward(Tensor<T>)

Performs the backward pass of the feed-forward layer to compute gradients.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass (backpropagation) of the feed-forward layer. It computes the gradients of the loss with respect to the layer's weights, biases, and inputs. These gradients are used to update the parameters during training and to propagate the error signal back to the previous layer.

For Beginners: This is where the layer learns from its mistakes during training.

The backward pass has several steps:

  1. Apply activation function derivative:
    • This determines how sensitive the output is to small changes
  2. Calculate gradient for weights:
    • Shows how each weight contributed to errors
  3. Calculate gradient for biases:
    • Shows how each bias affected the output
  4. Calculate gradient to pass to previous layer:
    • Helps the earlier layers learn as well

It's like figuring out who was responsible for a mistake in a team:

  • How much did each weight contribute to the error?
  • How much did each bias contribute?
  • How should we adjust them to do better next time?
  • What feedback should we give to the previous layer?
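The four steps can be sketched on plain arrays for a single input vector. This is an illustration under stated assumptions, not the library's implementation: DenseBackward is a hypothetical helper, the activation is assumed to be ReLU, and the numeric values are illustrative.

```csharp
using System;

// Hypothetical helper: gradients for one dense layer with ReLU activation.
static (double[,] WeightGrad, double[] BiasGrad, double[] InputGrad) DenseBackward(
    double[] input, double[,] weights, double[] preActivation, double[] outputGradient)
{
    int inputSize = weights.GetLength(0);
    int outputSize = weights.GetLength(1);

    // Step 1: apply the activation derivative (ReLU: 1 where the pre-activation is positive, else 0).
    var delta = new double[outputSize];
    for (int j = 0; j < outputSize; j++)
    {
        delta[j] = preActivation[j] > 0 ? outputGradient[j] : 0.0;
    }

    var weightGrad = new double[inputSize, outputSize];
    var inputGrad = new double[inputSize];
    for (int i = 0; i < inputSize; i++)
    {
        for (int j = 0; j < outputSize; j++)
        {
            weightGrad[i, j] = input[i] * delta[j];  // Step 2: gradient for each weight
            inputGrad[i] += weights[i, j] * delta[j]; // Step 4: gradient passed to the previous layer
        }
    }

    // Step 3: the bias gradient equals the activation-adjusted output gradient.
    var biasGrad = (double[])delta.Clone();
    return (weightGrad, biasGrad, inputGrad);
}

var grads = DenseBackward(
    new double[] { 0.2, 0.5, 0.1 },
    new double[,] { { 0.1, 0.8 }, { 0.4, 0.3 }, { 0.7, 0.2 } },
    new double[] { 0.39, 0.13 },
    new double[] { 1.0, -0.5 });

Console.WriteLine(grads.WeightGrad[0, 0]); // input 0.2 times delta 1.0
```

The input gradient is what the previous layer receives as its own output gradient, which is how the error signal travels backward through the network.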

BackwardGpu(IGpuTensor<T>)

Performs the backward pass using GPU-resident tensors.

public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)

Parameters

outputGradient IGpuTensor<T>

GPU-resident gradient of the loss with respect to the layer's output.

Returns

IGpuTensor<T>

GPU-resident gradient of the loss with respect to the layer's input.

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.

Forward(Tensor<T>)

Performs the forward pass of the feed-forward layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor after the linear transformation and activation.

Remarks

This method implements the forward pass of the feed-forward layer. It performs a matrix multiplication between the input and the weights, adds the biases, and applies the activation function to produce the final output. The input and output are cached for use during the backward pass.

For Beginners: This is where the layer processes input data to produce predictions.

The forward pass works in three steps:

  1. Linear transformation: Multiply inputs by weights and add biases
    • Each output is a weighted sum of all inputs plus a bias term
  2. Apply activation function: Add non-linearity
    • This enables the network to learn complex patterns
  3. Store inputs and outputs for later use in training
    • This information is needed when updating weights and biases

This simple operation (multiply by weights, add bias, apply activation) is the core of how neural networks transform data.

ForwardGpu(params IGpuTensor<T>[])

Performs the forward pass using GPU-resident tensors.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

Returns

IGpuTensor<T>

A GPU-resident output tensor.

Remarks

This method performs the feed-forward computation (matmul + bias + activation) entirely on GPU without downloading intermediate results to CPU.

GetBiasesTensor()

Gets the bias tensor for JIT compilation and graph composition.

public Tensor<T> GetBiasesTensor()

Returns

Tensor<T>

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This method retrieves all trainable parameters (weights and biases) of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the layer's learnable values into a single list.

The parameters include:

  • All the weight values (the majority of the parameters)
  • All the bias values (one per output neuron)

This combined list is useful for:

  • Saving a trained model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need all parameters together

For example, a layer with 100 inputs and 10 outputs would have:

  • 1,000 weight parameters (100 × 10)
  • 10 bias parameters (one per output)
  • Totaling 1,010 parameters in the returned vector
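The parameter count in this example can be confirmed with a small sketch. FlattenParameters is a hypothetical helper on plain arrays, and the ordering it uses (all weights in row-major order, then all biases) is an assumption made for illustration; the actual layout of the vector returned by GetParameters() may differ.

```csharp
using System;

// Hypothetical helper: copy weights, then biases, into one flat array.
static double[] FlattenParameters(double[,] weights, double[] biases)
{
    int inputSize = weights.GetLength(0);
    int outputSize = weights.GetLength(1);
    var flat = new double[inputSize * outputSize + outputSize];
    int k = 0;
    foreach (double w in weights) // a 2D array enumerates in row-major order
    {
        flat[k++] = w;
    }
    foreach (double b in biases)
    {
        flat[k++] = b;
    }
    return flat;
}

// A layer with 100 inputs and 10 outputs.
double[] parameters = FlattenParameters(new double[100, 10], new double[10]);
Console.WriteLine(parameters.Length); // prints "1010"
```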

GetWeightsTensor()

Gets the weight tensor for JIT compilation and graph composition.

public Tensor<T> GetWeightsTensor()

Returns

Tensor<T>

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

This method resets the internal state of the layer by clearing all cached values from forward and backward passes. This is useful when starting to process a new batch of data.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • The saved input and output are cleared
  • The calculated gradients are cleared
  • The layer forgets previous calculations it performed

This is typically called:

  • Between training batches to free up memory
  • When switching from training to evaluation mode
  • When starting to process completely new data

It's like wiping a whiteboard clean before starting a new calculation. Note that this doesn't affect the learned weights and biases, just the temporary working data.

SetParameters(Vector<T>)

Sets the trainable parameters of the layer from a single vector.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets all trainable parameters (weights and biases) of the layer from a single vector. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the layer's learnable values from a provided list.

When setting parameters:

  • The input must be a vector with the exact right length
  • The values are distributed back to the weights and biases
  • This allows loading previously trained weights

Use cases include:

  • Restoring a saved model
  • Using pre-trained weights
  • Testing specific weight configurations

The method throws an error if the provided vector doesn't contain exactly the right number of values.
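The length check and the redistribution of values can be sketched as follows. LoadParameters is a hypothetical helper on plain arrays, and the weights-then-biases ordering is an assumption for illustration rather than a documented guarantee of SetParameters.

```csharp
using System;

// Hypothetical helper: validate the length, then copy values back into weights and biases.
static void LoadParameters(double[] parameters, double[,] weights, double[] biases)
{
    int inputSize = weights.GetLength(0);
    int outputSize = weights.GetLength(1);
    int expected = inputSize * outputSize + outputSize;
    if (parameters.Length != expected)
    {
        throw new ArgumentException(
            $"Expected {expected} parameters, but got {parameters.Length}.", nameof(parameters));
    }

    int k = 0;
    for (int i = 0; i < inputSize; i++)
    {
        for (int j = 0; j < outputSize; j++)
        {
            weights[i, j] = parameters[k++]; // weights first, row by row (assumed order)
        }
    }
    for (int j = 0; j < outputSize; j++)
    {
        biases[j] = parameters[k++]; // biases last (assumed order)
    }
}

var weights = new double[3, 2];
var biases = new double[2];
LoadParameters(new double[] { 1, 2, 3, 4, 5, 6, 7, 8 }, weights, biases);
Console.WriteLine(biases[1]); // prints "8"
```

Validating the length before copying anything means a bad vector leaves the existing weights and biases untouched.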

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

UpdateParameters(T)

Updates the weights and biases using the calculated gradients and the specified learning rate.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the weights and biases based on the gradients calculated during the backward pass. The learning rate determines the size of the parameter updates. Smaller learning rates lead to more stable but slower training, while larger learning rates can lead to faster but potentially unstable training.

For Beginners: This method actually changes the weights and biases to improve future predictions.

After figuring out how each parameter should change:

  • Each weight and bias is adjusted in the direction that reduces errors
  • The learning rate controls how big these adjustments are

Think of it like adjusting a recipe after tasting:

  • Too salty? Reduce salt next time (adjust weights/biases)
  • But make small adjustments (learning rate), not drastic ones

For example, with a learning rate of 0.01:

  • A gradient of 0.5 would change the parameter by -0.005
  • A gradient of -2.0 would change the parameter by +0.02

The update subtracts the scaled gradient because moving in the opposite direction of the gradient is what reduces the error.
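The two numeric cases above can be reproduced with a minimal gradient-descent sketch. ApplyGradientStep is a hypothetical helper on plain arrays, and it assumes a plain update rule (parameter minus learning rate times gradient) with no momentum or other optimizer state.

```csharp
using System;

// Hypothetical helper: one plain gradient-descent step.
static void ApplyGradientStep(double[] parameters, double[] gradients, double learningRate)
{
    for (int i = 0; i < parameters.Length; i++)
    {
        parameters[i] -= learningRate * gradients[i]; // move against the gradient
    }
}

double[] parameters = { 1.0, 1.0 };
double[] gradients = { 0.5, -2.0 };
ApplyGradientStep(parameters, gradients, 0.01);

// The first parameter drops by 0.005 and the second rises by 0.02.
Console.WriteLine(FormattableString.Invariant($"{parameters[0]:F3}, {parameters[1]:F3}")); // prints "0.995, 1.020"
```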