Class GraphConvolutionalLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Represents a Graph Convolutional Network (GCN) layer for processing graph-structured data.
public class GraphConvolutionalLayer<T> : LayerBase<T>, IDisposable, IAuxiliaryLossLayer<T>, IGraphConvolutionLayer<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → GraphConvolutionalLayer<T>
- Implements
- IDisposable, IAuxiliaryLossLayer<T>, IGraphConvolutionLayer<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Remarks
A Graph Convolutional Layer applies convolution operations to graph-structured data by leveraging an adjacency matrix that defines connections between nodes in the graph. This layer learns representations for nodes in a graph by aggregating feature information from a node's local neighborhood. The layer performs the transformation: output = adjacency_matrix * input * weights + bias.
For Beginners: This layer helps neural networks understand data that's organized like a network or graph.
Think of a social network where people are connected to friends:
- Each person is a "node" with certain features (age, interests, etc.)
- Connections between people are "edges"
- This layer helps the network learn patterns by looking at each person AND their connections
For example, in a social network recommendation system, this layer can help understand that a person might like something because their friends like it, even if their personal profile doesn't suggest they would.
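Example
A minimal end-to-end sketch. The Tensor<double> shape-array constructor and three-index indexer used below are assumptions for illustration; check the Tensor<T> API for the exact construction syntax.
using AiDotNet.NeuralNetworks.Layers;

// Four nodes with 8 features each, transformed to 16 features per node.
var layer = new GraphConvolutionalLayer<double>(inputFeatures: 8, outputFeatures: 16);

// Adjacency matrix [batch, numNodes, numNodes]: a non-zero value at [b, i, j]
// means node i is connected to node j. (Assumed Tensor<double> constructor/indexer.)
var adjacency = new Tensor<double>(new[] { 1, 4, 4 });
adjacency[0, 0, 1] = 1.0; // node 0 -> node 1
adjacency[0, 1, 0] = 1.0; // node 1 -> node 0 (undirected edge)
layer.SetAdjacencyMatrix(adjacency);

// Input shape: [batchSize, numNodes, inputFeatures].
var input = new Tensor<double>(new[] { 1, 4, 8 });
var output = layer.Forward(input); // output shape: [1, 4, 16]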
Constructors
GraphConvolutionalLayer(int, int, IActivationFunction<T>?)
Initializes a new instance of the GraphConvolutionalLayer<T> class with the specified dimensions and activation function.
public GraphConvolutionalLayer(int inputFeatures, int outputFeatures, IActivationFunction<T>? activationFunction = null)
Parameters
inputFeatures (int): The number of features in the input data for each node.
outputFeatures (int): The number of features to output for each node.
activationFunction (IActivationFunction<T>): The activation function to apply after the convolution. Defaults to identity if not specified.
Remarks
This constructor creates a new Graph Convolutional Layer with randomly initialized weights and zero biases. The activation function is applied element-wise to the output of the convolution operation.
For Beginners: This creates a new layer with specific input and output sizes.
When you create this layer, you specify:
- How many features each node in your graph has (inputFeatures)
- How many features you want in the output for each node (outputFeatures)
- An optional activation function that adds non-linearity (making the network more powerful)
For example, if your graph represents molecules where each atom has 8 features, and you want to transform this into 16 features per atom, you would use inputFeatures=8 and outputFeatures=16.
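Example
A sketch of that molecule setup. ReLUActivation<double> is a placeholder for whatever IActivationFunction<double> implementation your project provides; the class name is an assumption, not confirmed by this documentation.
// 8 features per atom in, 16 features per atom out, with a non-linear activation.
var layer = new GraphConvolutionalLayer<double>(
    inputFeatures: 8,
    outputFeatures: 16,
    activationFunction: new ReLUActivation<double>()); // assumed class name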
GraphConvolutionalLayer(int, int, IVectorActivationFunction<T>?)
Initializes a new instance of the GraphConvolutionalLayer<T> class with the specified dimensions and vector activation function.
public GraphConvolutionalLayer(int inputFeatures, int outputFeatures, IVectorActivationFunction<T>? vectorActivationFunction = null)
Parameters
inputFeatures (int): The number of features in the input data for each node.
outputFeatures (int): The number of features to output for each node.
vectorActivationFunction (IVectorActivationFunction<T>): The vector activation function to apply after the convolution. Defaults to identity if not specified.
Remarks
This constructor creates a new Graph Convolutional Layer with randomly initialized weights and zero biases. The vector activation function is applied to vectors of output features rather than individual elements.
For Beginners: This creates a new layer with a vector-based activation function.
A vector activation function:
- Operates on entire groups of numbers at once, rather than one at a time
- Can capture relationships between different elements in the output
- Defaults to the Identity function, which doesn't change the values
This constructor is useful when you need more complex activation patterns that consider the relationships between different outputs.
Properties
AuxiliaryLossWeight
Gets or sets the weight for the auxiliary loss contribution.
public T AuxiliaryLossWeight { get; set; }
Property Value
- T
Remarks
This value determines how much the graph smoothness loss contributes to the total loss. The default value of 0.01 provides a good balance between the main task and smoothness regularization.
For Beginners: This controls how much importance to give to the smoothness penalty.
The weight affects training:
- Higher values (e.g., 0.1) make the network prioritize smooth features more strongly
- Lower values (e.g., 0.001) make the smoothness penalty less important
- The default (0.01) works well for most graph learning tasks
If your graph has very clear structure, you might increase this value. If the main task is more important, you might decrease it.
InputFeatures
Gets the number of input features per node.
public int InputFeatures { get; }
Property Value
- int
OutputFeatures
Gets the number of output features per node.
public int OutputFeatures { get; }
Property Value
- int
ParameterCount
Gets the total number of trainable parameters in this layer.
public override int ParameterCount { get; }
Property Value
- int
SmoothnessWeight
Gets or sets the weight for Laplacian smoothness regularization.
public T SmoothnessWeight { get; set; }
Property Value
- T
The weight to apply to the smoothness loss. Default is 0.001.
Remarks
This property controls the strength of Laplacian smoothness regularization applied to node features. Higher values encourage more similar representations for connected nodes, while lower values allow more variation between neighbors.
For Beginners: This controls how strongly to encourage smooth features across edges.
Smoothness regularization:
- Encourages connected nodes to have similar features
- Helps the network learn coherent representations across the graph
- Can improve generalization by enforcing local consistency
Typical values range from 0.0001 to 0.01. Set to 0 to disable smoothness regularization.
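Example
A configuration sketch using the defaults documented above (with T = double, so the weights are plain doubles):
var layer = new GraphConvolutionalLayer<double>(inputFeatures: 8, outputFeatures: 16);

layer.UseAuxiliaryLoss = true;     // enable the smoothness auxiliary loss
layer.AuxiliaryLossWeight = 0.01;  // contribution to the total loss (default 0.01)
layer.SmoothnessWeight = 0.001;    // regularization strength (default 0.001)
// layer.SmoothnessWeight = 0.0;   // set to 0 to disable smoothness regularization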
SupportsGpuExecution
Gets whether this layer has GPU execution support.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
GraphConvolutionalLayer supports GPU execution using sparse matrix operations (CSR SpMM) for efficient message passing on large graphs. When edges are set via SetEdges(), the layer uses O(E) scatter-add operations instead of O(N²) dense matrix multiplication.
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true because this layer has trainable parameters (weights and biases).
Remarks
This property indicates whether the layer can be trained through backpropagation. The GraphConvolutionalLayer always returns true because it contains trainable weights and biases.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer can adjust its internal values during training
- It will improve its performance as it sees more data
- It participates in the learning process
This layer always supports training because it has weights and biases that can be updated.
UseAuxiliaryLoss
Gets or sets a value indicating whether auxiliary loss is enabled for this layer.
public bool UseAuxiliaryLoss { get; set; }
Property Value
- bool
Remarks
When enabled, the layer computes a graph smoothness auxiliary loss that encourages connected nodes to have similar learned representations. This helps the network learn more coherent graph embeddings.
For Beginners: This setting controls whether the layer uses an additional learning signal.
When enabled (true):
- The layer encourages connected nodes to learn similar features
- This helps the network understand that connected nodes should be related
- Training may be more stable and produce better results
When disabled (false):
- Only the main task loss is used for training
- This is the default setting
UsesSparseAggregation
Gets whether sparse (edge-based) aggregation is currently enabled.
public bool UsesSparseAggregation { get; }
Property Value
- bool
Methods
Backward(Tensor<T>)
Performs the backward pass of the graph convolutional layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Remarks
This method implements the backward pass of the graph convolutional layer, which is used during training to propagate error gradients back through the network. It calculates the gradients for the weights and biases, and returns the gradient with respect to the input for further backpropagation.
For Beginners: This method is used during training to calculate how the layer's input and parameters should change to reduce errors.
During the backward pass:
- The layer receives information about how its output should change to reduce the overall error
- It calculates how its weights and biases should change to produce better output
- It calculates how its input should change, which will be used by earlier layers
This complex calculation considers how information flows through the graph structure and ensures that connected nodes properly influence each other during learning.
Exceptions
- InvalidOperationException
Thrown when Forward has not been called before Backward.
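Example
A single training step, sketched. ComputeLossGradient stands in for your loss function's gradient and is a hypothetical helper, not part of this layer; layer, input, and targets are assumed to exist.
// Forward pass (the adjacency matrix must already be set).
var output = layer.Forward(input);

// Gradient of the loss with respect to the layer output.
Tensor<double> lossGradient = ComputeLossGradient(output, targets); // hypothetical

// Backward pass: accumulates weight/bias gradients and returns the
// gradient with respect to the input for earlier layers.
var inputGradient = layer.Backward(lossGradient);

// Apply the accumulated gradients with a learning rate of 0.01.
layer.UpdateParameters(0.01);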
ClearEdges()
Clears the edge list and switches back to dense adjacency matrix aggregation.
public void ClearEdges()
ClearGradients()
Clears the stored gradients for this layer.
public override void ClearGradients()
ComputeAuxiliaryLoss()
Computes the Laplacian smoothness regularization loss on node features.
public T ComputeAuxiliaryLoss()
Returns
- T
The Laplacian smoothness loss value.
Remarks
This method computes graph Laplacian smoothness regularization to encourage connected nodes to have similar feature representations. The loss is calculated as the sum of squared L2 distances between features of connected nodes, normalized by the number of edges. This is equivalent to trace(X^T * L * X) where L is the graph Laplacian matrix (L = D - A).
For Beginners: This method encourages connected nodes to have similar features.
Laplacian smoothness regularization:
- Measures how different neighboring nodes are from each other
- Adds a penalty when connected nodes have very different features
- Helps the network learn coherent representations across the graph structure
The process:
- For each edge (i, j) in the graph
- Calculate the squared distance between node i's features and node j's features
- Sum all these distances
- Normalize by the number of edges
A higher loss means connected nodes have very different features. A lower loss means connected nodes have similar features, indicating a smooth representation.
This loss encourages the network to learn representations where nearby nodes in the graph have similar feature vectors, which often improves generalization on graph-structured data.
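Example
A sketch of folding this loss into a total training loss (with T = double so plain arithmetic applies; ComputeTaskLoss is a hypothetical stand-in for your main loss):
double mainLoss = ComputeTaskLoss(output, targets); // hypothetical
double totalLoss = mainLoss;
if (layer.UseAuxiliaryLoss)
{
    // Weighted sum of squared feature distances across edges,
    // normalized by the number of edges.
    totalLoss += layer.AuxiliaryLossWeight * layer.ComputeAuxiliaryLoss();
}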
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
Forward(Tensor<T>)
Performs the forward pass of the graph convolutional layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after graph convolution and activation.
Remarks
This method implements the forward pass of the graph convolutional layer according to the formula: output = activation(adjacency_matrix * input * weights + bias). The input tensor should have shape [batchSize, numNodes, inputFeatures], and the output will have shape [batchSize, numNodes, outputFeatures].
For Beginners: This method processes data through the graph convolutional layer.
During the forward pass:
- The layer checks if you've provided a map of connections (adjacency matrix)
- It multiplies the input features by the weights to transform them
- It uses the adjacency matrix to gather information from connected nodes
- It adds a bias value to each output
- It applies an activation function to add non-linearity
This process allows each node to update its features based on both its own data and data from its neighbors in the graph.
Exceptions
- InvalidOperationException
Thrown when the adjacency matrix has not been set.
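Example
A shape-focused sketch (Tensor<double> construction assumed, as in the class-level example):
// 2 graphs in the batch, 5 nodes each, 8 features per node.
var input = new Tensor<double>(new[] { 2, 5, 8 });

layer.SetAdjacencyMatrix(adjacency); // required; otherwise Forward throws
var output = layer.Forward(input);   // shape: [2, 5, outputFeatures]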
ForwardGpu(params IGpuTensor<T>[])
GPU-accelerated forward pass for graph convolution using sparse matrix operations. Computes: output = activation(A * X * W + bias) entirely on GPU.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[]): GPU-resident input tensors. The first input is the node features [batch, numNodes, inputFeatures].
Returns
- IGpuTensor<T>
GPU-resident output tensor [batch, numNodes, outputFeatures].
Remarks
This method implements GPU-accelerated graph convolution with support for both:
- Sparse aggregation (O(E) complexity) when edges are set via SetEdges()
- Dense aggregation (O(N²) complexity) using adjacency matrix multiplication
For large sparse graphs (typical in GNN applications with 90%+ sparsity), sparse aggregation provides significant speedup over dense operations.
Exceptions
- InvalidOperationException
Thrown when GPU execution is unavailable or the graph structure has not been set.
GetAdjacencyMatrix()
Gets the adjacency matrix currently being used by this layer.
public Tensor<T>? GetAdjacencyMatrix()
Returns
- Tensor<T>
The adjacency matrix tensor, or null if not set.
Remarks
This method retrieves the adjacency matrix that was set using SetAdjacencyMatrix. It may return null if the adjacency matrix has not been set yet.
For Beginners: This method lets you check what graph structure the layer is using.
This can be useful for:
- Verifying the correct graph was loaded
- Debugging graph connectivity issues
- Visualizing the graph structure
GetAuxiliaryLossDiagnostics()
Gets diagnostic information about the auxiliary loss computation.
public Dictionary<string, string> GetAuxiliaryLossDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic information about the auxiliary loss.
Remarks
This method returns diagnostic information that can be used to monitor the auxiliary loss during training. The diagnostics include the total smoothness loss, the weight applied to it, and whether auxiliary loss is enabled.
For Beginners: This method provides information to help you understand how the auxiliary loss is working.
The diagnostics show:
- TotalSmoothnessLoss: The computed penalty for feature differences between connected nodes
- SmoothnessWeight: How much this penalty affects the overall training
- UseSmoothnessLoss: Whether this penalty is currently enabled
You can use this information to:
- Monitor if the smoothness penalty is too high or too low
- Debug training issues
- Understand how the graph structure affects learning
Example: If TotalSmoothnessLoss is very high, it might mean your network is learning very different features for connected nodes, which might indicate the need to adjust hyperparameters.
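Example
A sketch that dumps the diagnostics during training:
foreach (var entry in layer.GetAuxiliaryLossDiagnostics())
{
    Console.WriteLine($"{entry.Key}: {entry.Value}");
}
// Expected keys per the remarks above:
// TotalSmoothnessLoss, SmoothnessWeight, UseSmoothnessLoss.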
GetDiagnostics()
Gets diagnostic information about this component's state and behavior. Overrides GetDiagnostics() to include auxiliary loss diagnostics.
public override Dictionary<string, string> GetDiagnostics()
Returns
- Dictionary<string, string>
A dictionary containing diagnostic metrics including both base layer diagnostics and auxiliary loss diagnostics from GetAuxiliaryLossDiagnostics().
GetParameterGradients()
Gets the gradients of all trainable parameters in this layer.
public override Vector<T> GetParameterGradients()
Returns
- Vector<T>
A vector containing the gradients of all trainable parameters.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This method retrieves all trainable parameters (weights and biases) and combines them into a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
The parameters:
- Are the numbers that the neural network learns during training
- Include weights and biases
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
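Example
A snapshot/restore sketch pairing GetParameters with SetParameters:
// Snapshot the current weights and biases as one flat vector.
Vector<double> snapshot = layer.GetParameters();

// ... train or experiment ...

// Restore; throws ArgumentException if the vector length is wrong.
layer.SetParameters(snapshot);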
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This method resets the internal state of the layer, clearing cached values from forward and backward passes. This is useful when starting to process a new sequence or when implementing stateful recurrent networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Stored inputs and outputs are cleared
- Gradient information is cleared
- The layer forgets any information from previous data
This is important for:
- Processing a new, unrelated graph
- Preventing information from one training batch affecting another
- Starting a new training episode
For example, if you've processed one graph and want to start with a new graph, you should reset the state to prevent the new graph from being influenced by the previous one.
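Example
A sketch of switching to a new, unrelated graph:
layer.ResetState();                     // clear cached activations and gradients
layer.SetAdjacencyMatrix(newAdjacency); // then provide the new graph structure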
SetAdjacencyMatrix(Tensor<T>)
Sets the adjacency matrix that defines the graph structure.
public void SetAdjacencyMatrix(Tensor<T> adjacencyMatrix)
Parameters
adjacencyMatrix (Tensor<T>): The adjacency matrix tensor.
Remarks
This method sets the adjacency matrix that defines the graph structure. The adjacency matrix must be set before calling the Forward method. A non-zero value at position [i,j] indicates that node i is connected to node j. This method also extracts the edge list from the adjacency matrix for use in auxiliary loss computation.
Important Limitation: Edge extraction only examines the first batch element (batch index 0). This assumes all samples in a batch share the same graph structure. If different samples have different graph topologies, the smoothness loss computation will only reflect the structure of the first sample. For per-sample graph structures, consider extracting edges dynamically or using separate forward passes.
For Beginners: This method tells the layer how the nodes in your graph are connected.
The adjacency matrix is like a road map:
- It shows which nodes can directly communicate with each other
- It determines how information flows through your graph
- It must be provided before processing data through the layer
For example, in a social network, the adjacency matrix would show who is friends with whom. In a molecule, it would show which atoms are bonded to each other.
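Example
A sketch that builds the adjacency matrix for a 3-node undirected triangle (Tensor<double> construction assumed, as in the class-level example):
var adjacency = new Tensor<double>(new[] { 1, 3, 3 }); // [batch, nodes, nodes]

// Undirected edges: set both [i, j] and [j, i].
adjacency[0, 0, 1] = 1.0; adjacency[0, 1, 0] = 1.0;
adjacency[0, 1, 2] = 1.0; adjacency[0, 2, 1] = 1.0;
adjacency[0, 0, 2] = 1.0; adjacency[0, 2, 0] = 1.0;

layer.SetAdjacencyMatrix(adjacency);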
SetEdges(Tensor<int>, Tensor<int>)
Sets the edge list representation of the graph structure for sparse aggregation.
public void SetEdges(Tensor<int> sourceIndices, Tensor<int> targetIndices)
Parameters
sourceIndices (Tensor<int>): Tensor containing source node indices for each edge. Shape: [numEdges].
targetIndices (Tensor<int>): Tensor containing target node indices for each edge. Shape: [numEdges].
Remarks
This method provides an edge-list representation of the graph, enabling memory-efficient sparse aggregation using scatter operations. This is the recommended approach for production GNN workloads, especially for large sparse graphs where O(E) complexity is much better than O(N^2) dense adjacency matrix operations.
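Example
The same triangle as an edge list, sketched. The Tensor<int> shape-array constructor and single-index indexer are assumptions for illustration.
// Both directions of each undirected edge are listed explicitly.
int[] src = { 0, 1, 1, 2, 0, 2 };
int[] dst = { 1, 0, 2, 1, 2, 0 };

var sources = new Tensor<int>(new[] { src.Length }); // assumed constructor
var targets = new Tensor<int>(new[] { dst.Length });
for (int e = 0; e < src.Length; e++)
{
    sources[e] = src[e]; // assumed single-index indexer
    targets[e] = dst[e];
}

layer.SetEdges(sources, targets);
// layer.UsesSparseAggregation is now true; call ClearEdges() to revert to dense.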
SetParameters(Vector<T>)
Sets the trainable parameters of the layer.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets the weights and biases of the layer from a single vector of parameters. The vector must have the correct length to match the total number of parameters in the layer. This is useful for loading saved model weights or for implementing optimization algorithms that operate on all parameters at once.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The first part of the vector is used for the weights
- The second part of the vector is used for the biases
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Testing different parameter values
An error is thrown if the input vector doesn't have the expected number of parameters.
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This method updates the weights and biases of the layer based on the gradients calculated during the backward pass. The learning rate controls the size of the parameter updates. This is typically called after the backward pass during training.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- The weights and biases are adjusted to reduce prediction errors
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time, gradually improving its ability to extract useful patterns from graph-structured data.
Exceptions
- InvalidOperationException
Thrown when Backward has not been called before UpdateParameters.