Class GraphConvolutionalLayer<T>

Namespace: AiDotNet.NeuralNetworks.Layers
Assembly: AiDotNet.dll

Represents a Graph Convolutional Network (GCN) layer for processing graph-structured data.

public class GraphConvolutionalLayer<T> : LayerBase<T>, IDisposable, IAuxiliaryLossLayer<T>, IGraphConvolutionLayer<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
object → LayerBase<T> → GraphConvolutionalLayer<T>

Implements
IDisposable, IAuxiliaryLossLayer<T>, IGraphConvolutionLayer<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>

Remarks

A Graph Convolutional Layer applies convolution operations to graph-structured data by leveraging an adjacency matrix that defines connections between nodes in the graph. This layer learns representations for nodes in a graph by aggregating feature information from a node's local neighborhood. The layer performs the transformation: output = adjacency_matrix * input * weights + bias.

For Beginners: This layer helps neural networks understand data that's organized like a network or graph.

Think of a social network where people are connected to friends:

  • Each person is a "node" with certain features (age, interests, etc.)
  • Connections between people are "edges"
  • This layer helps the network learn patterns by looking at each person AND their connections

For example, in a social network recommendation system, this layer can help understand that a person might like something because their friends like it, even if their personal profile doesn't suggest they would.
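
Example

A minimal end-to-end sketch. The layer methods shown (SetAdjacencyMatrix, Forward) are documented below; the two helper calls are hypothetical placeholders for however you build your tensors.

// Map 4 input features per node to 8 output features per node.
var layer = new GraphConvolutionalLayer<float>(inputFeatures: 4, outputFeatures: 8);

// Hypothetical helpers: adjacency is [numNodes, numNodes], features are
// [batchSize, numNodes, 4]. Substitute your own tensor construction.
Tensor<float> adjacency = BuildAdjacency();
Tensor<float> nodeFeatures = LoadNodeFeatures();

layer.SetAdjacencyMatrix(adjacency);                  // required before Forward
Tensor<float> output = layer.Forward(nodeFeatures);   // [batchSize, numNodes, 8]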

Constructors

GraphConvolutionalLayer(int, int, IActivationFunction<T>?)

Initializes a new instance of the GraphConvolutionalLayer<T> class with the specified dimensions and activation function.

public GraphConvolutionalLayer(int inputFeatures, int outputFeatures, IActivationFunction<T>? activationFunction = null)

Parameters

inputFeatures int

The number of features in the input data for each node.

outputFeatures int

The number of features to output for each node.

activationFunction IActivationFunction<T>

The activation function to apply after the convolution. Defaults to identity if not specified.

Remarks

This constructor creates a new Graph Convolutional Layer with randomly initialized weights and zero biases. The activation function is applied element-wise to the output of the convolution operation.

For Beginners: This creates a new layer with specific input and output sizes.

When you create this layer, you specify:

  • How many features each node in your graph has (inputFeatures)
  • How many features you want in the output for each node (outputFeatures)
  • An optional activation function that adds non-linearity (making the network more powerful)

For example, if your graph represents molecules where each atom has 8 features, and you want to transform this into 16 features per atom, you would use inputFeatures=8 and outputFeatures=16.
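
Example

Following the molecule example above, a sketch of constructing the layer. ReLUActivation<T> is an assumed IActivationFunction<T> implementation; substitute whichever activation type your build provides.

// 8 features per atom in, 16 features per atom out, with a ReLU non-linearity.
var layer = new GraphConvolutionalLayer<float>(
    inputFeatures: 8,
    outputFeatures: 16,
    activationFunction: new ReLUActivation<float>()); // assumed activation type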

GraphConvolutionalLayer(int, int, IVectorActivationFunction<T>?)

Initializes a new instance of the GraphConvolutionalLayer<T> class with the specified dimensions and vector activation function.

public GraphConvolutionalLayer(int inputFeatures, int outputFeatures, IVectorActivationFunction<T>? vectorActivationFunction = null)

Parameters

inputFeatures int

The number of features in the input data for each node.

outputFeatures int

The number of features to output for each node.

vectorActivationFunction IVectorActivationFunction<T>

The vector activation function to apply after the convolution. Defaults to identity if not specified.

Remarks

This constructor creates a new Graph Convolutional Layer with randomly initialized weights and zero biases. The vector activation function is applied to vectors of output features rather than individual elements.

For Beginners: This creates a new layer with a vector-based activation function.

A vector activation function:

  • Operates on entire groups of numbers at once, rather than one at a time
  • Can capture relationships between different elements in the output
  • Defaults to the Identity function, which doesn't change the values

This constructor is useful when you need more complex activation patterns that consider the relationships between different outputs.

Properties

AuxiliaryLossWeight

Gets or sets the weight for the auxiliary loss contribution.

public T AuxiliaryLossWeight { get; set; }

Property Value

T

Remarks

This value determines how much the graph smoothness loss contributes to the total loss. The default value of 0.01 provides a good balance between the main task and smoothness regularization.

For Beginners: This controls how much importance to give to the smoothness penalty.

The weight affects training:

  • Higher values (e.g., 0.1) make the network prioritize smooth features more strongly
  • Lower values (e.g., 0.001) make the smoothness penalty less important
  • The default (0.01) works well for most graph learning tasks

If your graph has very clear structure, you might increase this value. If the main task is more important, you might decrease it.
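
Example

A sketch of enabling the auxiliary loss and raising its weight for a strongly structured graph (here T is float, so the weight is assigned directly):

var layer = new GraphConvolutionalLayer<float>(16, 32);
layer.UseAuxiliaryLoss = true;     // off by default
layer.AuxiliaryLossWeight = 0.1f;  // default 0.01; higher = stronger smoothing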

InputFeatures

Gets the number of input features per node.

public int InputFeatures { get; }

Property Value

int

OutputFeatures

Gets the number of output features per node.

public int OutputFeatures { get; }

Property Value

int

ParameterCount

Gets the total number of trainable parameters in this layer.

public override int ParameterCount { get; }

Property Value

int

SmoothnessWeight

Gets or sets the weight for Laplacian smoothness regularization.

public T SmoothnessWeight { get; set; }

Property Value

T

The weight to apply to the smoothness loss. Default is 0.001.

Remarks

This property controls the strength of Laplacian smoothness regularization applied to node features. Higher values encourage more similar representations for connected nodes, while lower values allow more variation between neighbors.

For Beginners: This controls how strongly to encourage smooth features across edges.

Smoothness regularization:

  • Encourages connected nodes to have similar features
  • Helps the network learn coherent representations across the graph
  • Can improve generalization by enforcing local consistency

Typical values range from 0.0001 to 0.01. Set to 0 to disable smoothness regularization.

SupportsGpuExecution

Gets whether this layer has GPU execution support.

protected override bool SupportsGpuExecution { get; }

Property Value

bool

Remarks

GraphConvolutionalLayer supports GPU execution using sparse matrix operations (CSR SpMM) for efficient message passing on large graphs. When edges are set via SetEdges(), the layer uses O(E) scatter-add operations instead of O(N²) dense matrix multiplication.

SupportsJitCompilation

Gets whether this layer supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

true if the layer can be JIT compiled; otherwise, false.

Remarks

This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.

For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.

Layers should return false if they:

  • Have not yet implemented a working ExportComputationGraph()
  • Use dynamic operations that change based on input data
  • Are too simple to benefit from JIT compilation

When false, the layer will use the standard Forward() method instead.

SupportsTraining

Gets a value indicating whether this layer supports training.

public override bool SupportsTraining { get; }

Property Value

bool

true because this layer has trainable parameters (weights and biases).

Remarks

This property indicates whether the layer can be trained through backpropagation. The GraphConvolutionalLayer always returns true because it contains trainable weights and biases.

For Beginners: This property tells you if the layer can learn from data.

A value of true means:

  • The layer can adjust its internal values during training
  • It will improve its performance as it sees more data
  • It participates in the learning process

This layer always supports training because it has weights and biases that can be updated.

UseAuxiliaryLoss

Gets or sets a value indicating whether auxiliary loss is enabled for this layer.

public bool UseAuxiliaryLoss { get; set; }

Property Value

bool

Remarks

When enabled, the layer computes a graph smoothness auxiliary loss that encourages connected nodes to have similar learned representations. This helps the network learn more coherent graph embeddings.

For Beginners: This setting controls whether the layer uses an additional learning signal.

When enabled (true):

  • The layer encourages connected nodes to learn similar features
  • This helps the network understand that connected nodes should be related
  • Training may be more stable and produce better results

When disabled (false):

  • Only the main task loss is used for training
  • This is the default setting

UsesSparseAggregation

Gets whether sparse (edge-based) aggregation is currently enabled.

public bool UsesSparseAggregation { get; }

Property Value

bool

Methods

Backward(Tensor<T>)

Performs the backward pass of the graph convolutional layer.

public override Tensor<T> Backward(Tensor<T> outputGradient)

Parameters

outputGradient Tensor<T>

The gradient of the loss with respect to the layer's output.

Returns

Tensor<T>

The gradient of the loss with respect to the layer's input.

Remarks

This method implements the backward pass of the graph convolutional layer, which is used during training to propagate error gradients back through the network. It calculates the gradients for the weights and biases, and returns the gradient with respect to the input for further backpropagation.

For Beginners: This method is used during training to calculate how the layer's input and parameters should change to reduce errors.

During the backward pass:

  1. The layer receives information about how its output should change to reduce the overall error
  2. It calculates how its weights and biases should change to produce better output
  3. It calculates how its input should change, which will be used by earlier layers

This complex calculation considers how information flows through the graph structure and ensures that connected nodes properly influence each other during learning.

Exceptions

InvalidOperationException

Thrown when Forward has not been called before Backward.
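
Example

A minimal training-step sketch tying Forward, Backward, and UpdateParameters together. ComputeLossGradient is a hypothetical helper standing in for your task-specific loss; the layer calls are this class's documented API.

Tensor<float> output = layer.Forward(nodeFeatures);
Tensor<float> outputGradient = ComputeLossGradient(output, targets); // hypothetical
layer.Backward(outputGradient);  // throws if Forward was not called first
layer.UpdateParameters(0.01f);   // learning rate scales each update step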

ClearEdges()

Clears the edge list and switches back to dense adjacency matrix aggregation.

public void ClearEdges()

ClearGradients()

Clears the stored gradients for this layer.

public override void ClearGradients()

ComputeAuxiliaryLoss()

Computes the Laplacian smoothness regularization loss on node features.

public T ComputeAuxiliaryLoss()

Returns

T

The Laplacian smoothness loss value.

Remarks

This method computes graph Laplacian smoothness regularization to encourage connected nodes to have similar feature representations. The loss is calculated as the sum of squared L2 distances between features of connected nodes, normalized by the number of edges. This is equivalent to trace(X^T * L * X) where L is the graph Laplacian matrix (L = D - A).

For Beginners: This method encourages connected nodes to have similar features.

Laplacian smoothness regularization:

  • Measures how different neighboring nodes are from each other
  • Adds a penalty when connected nodes have very different features
  • Helps the network learn coherent representations across the graph structure

The process:

  1. For each edge (i, j) in the graph
  2. Calculate the squared distance between node i's features and node j's features
  3. Sum all these distances
  4. Normalize by the number of edges

A higher loss means connected nodes have very different features. A lower loss means connected nodes have similar features, indicating a smooth representation.

This loss encourages the network to learn representations where nearby nodes in the graph have similar feature vectors, which often improves generalization on graph-structured data.
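
Example

The computation described above amounts to the following standalone illustration of the formula (not the layer's internal code):

// Mean over edges (i, j) of ||x_i - x_j||^2, i.e. trace(X^T * L * X)
// normalized by the number of edges.
static float SmoothnessLoss(float[][] features, (int Src, int Dst)[] edges)
{
    float total = 0f;
    foreach (var (i, j) in edges)
    {
        for (int f = 0; f < features[i].Length; f++)
        {
            float diff = features[i][f] - features[j][f];
            total += diff * diff;
        }
    }
    return edges.Length > 0 ? total / edges.Length : 0f;
}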

ExportComputationGraph(List<ComputationNode<T>>)

Exports the layer's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The output computation node representing the layer's operation.

Remarks

This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.

For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.

To support JIT compilation, a layer must:

  1. Implement this method to export its computation graph
  2. Set SupportsJitCompilation to true
  3. Use ComputationNode and TensorOperations to build the graph

All layers are required to implement this method, even if they set SupportsJitCompilation = false.

Forward(Tensor<T>)

Performs the forward pass of the graph convolutional layer.

public override Tensor<T> Forward(Tensor<T> input)

Parameters

input Tensor<T>

The input tensor to process.

Returns

Tensor<T>

The output tensor after graph convolution and activation.

Remarks

This method implements the forward pass of the graph convolutional layer according to the formula: output = activation(adjacency_matrix * input * weights + bias). The input tensor should have shape [batchSize, numNodes, inputFeatures], and the output will have shape [batchSize, numNodes, outputFeatures].

For Beginners: This method processes data through the graph convolutional layer.

During the forward pass:

  1. The layer checks if you've provided a map of connections (adjacency matrix)
  2. It multiplies the input features by the weights to transform them
  3. It uses the adjacency matrix to gather information from connected nodes
  4. It adds a bias value to each output
  5. It applies an activation function to add non-linearity

This process allows each node to update its features based on both its own data and data from its neighbors in the graph.

Exceptions

InvalidOperationException

Thrown when the adjacency matrix has not been set.
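
Example

A shape-focused sketch for a batch of 2 graphs with 50 nodes each. The Tensor<float> shape-array constructor shown here is an assumption; use whatever tensor construction your build provides.

var layer = new GraphConvolutionalLayer<float>(4, 8);
var adjacency = new Tensor<float>(new[] { 50, 50 });  // assumed constructor
var input = new Tensor<float>(new[] { 2, 50, 4 });    // [batchSize, numNodes, inputFeatures]

layer.SetAdjacencyMatrix(adjacency);  // Forward throws without this
var output = layer.Forward(input);    // shape [2, 50, 8]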

ForwardGpu(params IGpuTensor<T>[])

GPU-accelerated forward pass for graph convolution using sparse matrix operations. Computes: output = activation(A * X * W + bias) entirely on GPU.

public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)

Parameters

inputs IGpuTensor<T>[]

GPU-resident input tensors. First input is node features [batch, numNodes, inputFeatures].

Returns

IGpuTensor<T>

GPU-resident output tensor [batch, numNodes, outputFeatures].

Remarks

This method implements GPU-accelerated graph convolution with support for both:

  • Sparse aggregation (O(E) complexity) when edges are set via SetEdges()
  • Dense aggregation (O(N²) complexity) using adjacency matrix multiplication

For large sparse graphs (typical in GNN applications with 90%+ sparsity), sparse aggregation provides significant speedup over dense operations.

Exceptions

InvalidOperationException

Thrown when GPU execution is unavailable or the graph structure has not been set.

GetAdjacencyMatrix()

Gets the adjacency matrix currently being used by this layer.

public Tensor<T>? GetAdjacencyMatrix()

Returns

Tensor<T>

The adjacency matrix tensor, or null if not set.

Remarks

This method retrieves the adjacency matrix that was set using SetAdjacencyMatrix. It may return null if the adjacency matrix has not been set yet.

For Beginners: This method lets you check what graph structure the layer is using.

This can be useful for:

  • Verifying the correct graph was loaded
  • Debugging graph connectivity issues
  • Visualizing the graph structure

GetAuxiliaryLossDiagnostics()

Gets diagnostic information about the auxiliary loss computation.

public Dictionary<string, string> GetAuxiliaryLossDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic information about the auxiliary loss.

Remarks

This method returns diagnostic information that can be used to monitor the auxiliary loss during training. The diagnostics include the total smoothness loss, the weight applied to it, and whether auxiliary loss is enabled.

For Beginners: This method provides information to help you understand how the auxiliary loss is working.

The diagnostics show:

  • TotalSmoothnessLoss: The computed penalty for feature differences between connected nodes
  • SmoothnessWeight: How much this penalty affects the overall training
  • UseSmoothnessLoss: Whether this penalty is currently enabled

You can use this information to:

  • Monitor if the smoothness penalty is too high or too low
  • Debug training issues
  • Understand how the graph structure affects learning

Example: If TotalSmoothnessLoss is very high, the network may be learning very different features for connected nodes, which can indicate that hyperparameters need adjusting.
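
Example

A sketch of logging the diagnostics during training, using the key names listed above:

Dictionary<string, string> diag = layer.GetAuxiliaryLossDiagnostics();
Console.WriteLine($"Smoothness loss: {diag["TotalSmoothnessLoss"]}");
Console.WriteLine($"Weight applied:  {diag["SmoothnessWeight"]}");
Console.WriteLine($"Enabled:         {diag["UseSmoothnessLoss"]}");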

GetDiagnostics()

Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.

public override Dictionary<string, string> GetDiagnostics()

Returns

Dictionary<string, string>

A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().

GetParameterGradients()

Gets the gradients of all trainable parameters in this layer.

public override Vector<T> GetParameterGradients()

Returns

Vector<T>

GetParameters()

Gets all trainable parameters of the layer as a single vector.

public override Vector<T> GetParameters()

Returns

Vector<T>

A vector containing all trainable parameters.

Remarks

This method retrieves all trainable parameters (weights and biases) and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.

For Beginners: This method collects all the learnable values from the layer.

The parameters:

  • Are the numbers that the neural network learns during training
  • Include weights and biases
  • Are combined into a single long list (vector)

This is useful for:

  • Saving the model to disk
  • Loading parameters from a previously trained model
  • Advanced optimization techniques that need access to all parameters
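
Example

A round-trip sketch pairing GetParameters with SetParameters (documented below):

// Snapshot all weights and biases, then restore them later.
Vector<float> snapshot = layer.GetParameters();
// ... train or experiment ...
layer.SetParameters(snapshot); // throws ArgumentException on length mismatch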

ResetState()

Resets the internal state of the layer.

public override void ResetState()

Remarks

This method resets the internal state of the layer, clearing cached values from forward and backward passes. This is useful when starting to process a new sequence or when implementing stateful recurrent networks.

For Beginners: This method clears the layer's memory to start fresh.

When resetting the state:

  • Stored inputs and outputs are cleared
  • Gradient information is cleared
  • The layer forgets any information from previous data

This is important for:

  • Processing a new, unrelated graph
  • Preventing information from one training batch from affecting another
  • Starting a new training episode

For example, if you've processed one graph and want to start with a new graph, you should reset the state to prevent the new graph from being influenced by the previous one.

SetAdjacencyMatrix(Tensor<T>)

Sets the adjacency matrix that defines the graph structure.

public void SetAdjacencyMatrix(Tensor<T> adjacencyMatrix)

Parameters

adjacencyMatrix Tensor<T>

The adjacency matrix tensor.

Remarks

This method sets the adjacency matrix that defines the graph structure. The adjacency matrix must be set before calling the Forward method. A non-zero value at position [i,j] indicates that node i is connected to node j. This method also extracts the edge list from the adjacency matrix for use in auxiliary loss computation.

Important Limitation: Edge extraction only examines the first batch element (batch index 0). This assumes all samples in a batch share the same graph structure. If different samples have different graph topologies, the smoothness loss computation will only reflect the structure of the first sample. For per-sample graph structures, consider extracting edges dynamically or using separate forward passes.

For Beginners: This method tells the layer how the nodes in your graph are connected.

The adjacency matrix is like a road map:

  • It shows which nodes can directly communicate with each other
  • It determines how information flows through your graph
  • It must be provided before processing data through the layer

For example, in a social network, the adjacency matrix would show who is friends with whom. In a molecule, it would show which atoms are bonded to each other.
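
Example

A sketch for a concrete undirected 3-node graph where node 0 connects to nodes 1 and 2. The Tensor<float> constructor and indexer used here are assumptions for illustration.

var layer = new GraphConvolutionalLayer<float>(4, 8);
var adjacency = new Tensor<float>(new[] { 3, 3 });  // assumed constructor
adjacency[0, 1] = 1f; adjacency[1, 0] = 1f;         // edge (0,1), both directions
adjacency[0, 2] = 1f; adjacency[2, 0] = 1f;         // edge (0,2), both directions
layer.SetAdjacencyMatrix(adjacency);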

SetEdges(Tensor<int>, Tensor<int>)

Sets the edge list representation of the graph structure for sparse aggregation.

public void SetEdges(Tensor<int> sourceIndices, Tensor<int> targetIndices)

Parameters

sourceIndices Tensor<int>

Tensor containing source node indices for each edge. Shape: [numEdges].

targetIndices Tensor<int>

Tensor containing target node indices for each edge. Shape: [numEdges].

Remarks

This method provides an edge-list representation of the graph, enabling memory-efficient sparse aggregation using scatter operations. This is the recommended approach for production GNN workloads, especially for large sparse graphs where O(E) complexity is much better than O(N^2) dense adjacency matrix operations.
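
Example

The same 3-node graph from the SetAdjacencyMatrix example above, expressed as an edge list. This is a sketch in which the Tensor<int> constructor and indexer are assumptions.

// Edges (0,1), (1,0), (0,2), (2,0) as parallel source/target tensors, shape [4].
var sources = new Tensor<int>(new[] { 4 });
var targets = new Tensor<int>(new[] { 4 });
sources[0] = 0; targets[0] = 1;
sources[1] = 1; targets[1] = 0;
sources[2] = 0; targets[2] = 2;
sources[3] = 2; targets[3] = 0;
layer.SetEdges(sources, targets); // switches to O(E) sparse aggregation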

SetParameters(Vector<T>)

Sets the trainable parameters of the layer.

public override void SetParameters(Vector<T> parameters)

Parameters

parameters Vector<T>

A vector containing all parameters to set.

Remarks

This method sets the weights and biases of the layer from a single vector of parameters. The vector must have the correct length to match the total number of parameters in the layer. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.

For Beginners: This method updates all the learnable values in the layer.

When setting parameters:

  • The input must be a vector with the correct length
  • The first part of the vector is used for the weights
  • The second part of the vector is used for the biases

This is useful for:

  • Loading a previously saved model
  • Transferring parameters from another model
  • Testing different parameter values

An error is thrown if the input vector doesn't have the expected number of parameters.

Exceptions

ArgumentException

Thrown when the parameters vector has incorrect length.

UpdateParameters(T)

Updates the parameters of the layer using the calculated gradients.

public override void UpdateParameters(T learningRate)

Parameters

learningRate T

The learning rate to use for the parameter updates.

Remarks

This method updates the weights and biases of the layer based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. This is typically called after the backward pass during training.

For Beginners: This method updates the layer's internal values during training.

When updating parameters:

  • The weights and biases are adjusted to reduce prediction errors
  • The learning rate controls how big each update step is
  • Smaller learning rates mean slower but more stable learning
  • Larger learning rates mean faster but potentially unstable learning

This is how the layer "learns" from data over time, gradually improving its ability to extract useful patterns from graph-structured data.

Exceptions

InvalidOperationException

Thrown when Backward has not been called before UpdateParameters.