Class GraphTransformerLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Implements Graph Transformer layer using self-attention mechanisms on graph-structured data.
public class GraphTransformerLayer<T> : LayerBase<T>, IDisposable, IGraphConvolutionLayer<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
- T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → GraphTransformerLayer<T>
- Implements
- ILayer<T>
Remarks
Graph Transformers apply the transformer architecture to graphs by treating graph structure as a bias in the attention mechanism. Unlike standard transformers that process sequences, Graph Transformers incorporate graph connectivity through:
1. Structural encodings (e.g., Laplacian eigenvectors)
2. Attention biasing based on graph structure
3. Relative positional encodings for graph nodes
The attention computation is:
Attention(Q, K, V) = softmax((QK^T + B) / √d_k) V
where Q, K, and V are the query, key, and value matrices, d_k is the per-head key dimension, and B is a learned bias based on graph structure.
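The formula above can be sketched for a single attention head on plain arrays. This is a minimal illustration, not the layer's implementation: the class name BiasedAttentionSketch and the method Attend are hypothetical, the bias matrix is passed in rather than learned, and multi-head splitting, dropout, and the feed-forward network are left out.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class BiasedAttentionSketch
{
    // Computes softmax((Q K^T + B) / sqrt(d_k)) V for one head.
    // q, k: [n, dK]; v: [n, dV]; bias: [n, n]. Returns [n, dV].
    public static double[,] Attend(double[,] q, double[,] k, double[,] v, double[,] bias)
    {
        int n = q.GetLength(0);
        int dK = q.GetLength(1);
        int dV = v.GetLength(1);
        double scale = Math.Sqrt(dK);
        var output = new double[n, dV];

        for (int i = 0; i < n; i++)
        {
            // Biased, scaled scores of node i against every node j.
            var scores = new double[n];
            double max = double.NegativeInfinity;
            for (int j = 0; j < n; j++)
            {
                double dot = 0.0;
                for (int d = 0; d < dK; d++) dot += q[i, d] * k[j, d];
                scores[j] = (dot + bias[i, j]) / scale;
                if (scores[j] > max) max = scores[j];
            }

            // Numerically stable softmax over row i.
            double sum = 0.0;
            for (int j = 0; j < n; j++)
            {
                scores[j] = Math.Exp(scores[j] - max);
                sum += scores[j];
            }

            // Weighted sum of value rows.
            for (int j = 0; j < n; j++)
            {
                double weight = scores[j] / sum;
                for (int d = 0; d < dV; d++) output[i, d] += weight * v[j, d];
            }
        }

        return output;
    }
}
```

With an all-zero bias every node attends to every other node purely by query-key similarity. Raising bias[i, j] for connected pairs shifts attention weight toward graph neighbors, which is the "priority" described in the meeting analogy.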
For Beginners: Graph Transformers combine the power of transformers with graph structure.
Think of it like a meeting where:
- Standard transformers: Everyone can talk to everyone equally
- Graph transformers: People connected in the organizational chart get priority
Key advantages:
- Captures long-range dependencies in graphs
- More flexible than fixed neighborhood aggregation
- Can attend to any node, not just immediate neighbors
- Learns importance of connections dynamically
Use cases:
- Large molecules: Atoms far apart but chemically important
- Social networks: Identifying influential users across communities
- Knowledge graphs: Multi-hop reasoning
- Program analysis: Understanding code dependencies
Example: In a citation network, a paper can learn from:
- Direct citations (immediate neighbors)
- Indirectly related papers (through attention)
- Important papers even if not directly cited
Constructors
GraphTransformerLayer(int, int, int, int, bool, double, IActivationFunction<T>?, IActivationFunction<T>?)
Initializes a new instance of the GraphTransformerLayer<T> class with a scalar activation function.
public GraphTransformerLayer(int inputFeatures, int outputFeatures, int numHeads = 8, int headDim = 64, bool useStructuralEncoding = true, double dropoutRate = 0.1, IActivationFunction<T>? activationFunction = null, IActivationFunction<T>? ffnActivation = null)
Parameters
- inputFeatures (int): Number of input features per node.
- outputFeatures (int): Number of output features per node.
- numHeads (int): Number of attention heads (default: 8).
- headDim (int): Dimension per attention head (default: 64).
- useStructuralEncoding (bool): Whether to use structural bias (default: true).
- dropoutRate (double): Dropout rate for attention (default: 0.1).
- activationFunction (IActivationFunction<T>): Scalar activation function for the layer output. Defaults to Identity if not specified.
- ffnActivation (IActivationFunction<T>): Scalar activation function for the FFN hidden layer. Defaults to GELU if not specified.
Remarks
Creates a Graph Transformer layer with multi-head attention and feed-forward network. The layer includes skip connections and layer normalization for stable training.
For Beginners: This creates a new Graph Transformer layer with scalar activation functions.
Key parameters:
- numHeads: How many parallel attention mechanisms (more = capture different patterns)
- headDim: Size of each attention head (bigger = more expressive per head)
- useStructuralEncoding: Whether to bias attention toward connected nodes
  - true: Graph structure guides attention (recommended for most graphs)
  - false: Pure attention without graph bias (for dense/complete graphs)
- dropoutRate: Randomly ignore some attention during training (prevents overfitting)
- ffnActivation: Activation for the feed-forward network (GELU, ReLU, SiLU, etc.)
The layer has two main components:
- Multi-head attention: Learns which nodes to focus on
- Feed-forward network: Processes the attended information
Both use skip connections (adding input back to output) for better gradient flow.
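The skip connection and normalization step can be sketched as follows. This is an illustration under assumptions: ResidualSketch and AddAndNormalize are hypothetical names, the learned scale and shift of a full layer normalization are omitted, and this page does not say whether the layer normalizes before or after each sublayer, so the "add, then normalize" order shown here is only one possibility.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class ResidualSketch
{
    // Adds the sublayer output back onto its input (the skip connection),
    // then normalizes the sum to zero mean and (approximately) unit variance.
    public static double[] AddAndNormalize(double[] input, double[] sublayerOutput, double epsilon = 1e-5)
    {
        if (input.Length != sublayerOutput.Length)
            throw new ArgumentException("Input and sublayer output must have the same length.");

        int n = input.Length;
        var sum = new double[n];
        double mean = 0.0;
        for (int i = 0; i < n; i++)
        {
            sum[i] = input[i] + sublayerOutput[i];
            mean += sum[i];
        }
        mean /= n;

        double variance = 0.0;
        for (int i = 0; i < n; i++) variance += (sum[i] - mean) * (sum[i] - mean);
        variance /= n;

        // epsilon guards against division by zero when all values are equal.
        double denom = Math.Sqrt(variance + epsilon);
        var result = new double[n];
        for (int i = 0; i < n; i++) result[i] = (sum[i] - mean) / denom;
        return result;
    }
}
```

Because the input is added back before normalization, the gradient has a direct path around the sublayer, which is the "better gradient flow" mentioned above.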
GraphTransformerLayer(int, int, int, int, bool, double, IVectorActivationFunction<T>?, IActivationFunction<T>?)
Initializes a new instance of the GraphTransformerLayer<T> class with a vector activation function.
public GraphTransformerLayer(int inputFeatures, int outputFeatures, int numHeads = 8, int headDim = 64, bool useStructuralEncoding = true, double dropoutRate = 0.1, IVectorActivationFunction<T>? vectorActivationFunction = null, IActivationFunction<T>? ffnActivation = null)
Parameters
- inputFeatures (int): Number of input features per node.
- outputFeatures (int): Number of output features per node.
- numHeads (int): Number of attention heads (default: 8).
- headDim (int): Dimension per attention head (default: 64).
- useStructuralEncoding (bool): Whether to use structural bias (default: true).
- dropoutRate (double): Dropout rate for attention (default: 0.1).
- vectorActivationFunction (IVectorActivationFunction<T>): Vector activation function for the layer output. Defaults to Identity if not specified.
- ffnActivation (IActivationFunction<T>): Scalar activation function for the FFN hidden layer. Defaults to GELU if not specified.
Remarks
Creates a Graph Transformer layer with multi-head attention and feed-forward network, using a vector activation function that operates on entire vectors rather than individual elements.
For Beginners: This constructor is similar to the scalar version but uses a vector activation function.
Vector activation functions like Softmax are useful for:
- Node classification problems (choosing between multiple node types)
- Problems where outputs need to sum to 1 (like probabilities)
- Cases where output values should influence each other
For example, in a graph with molecules, you might use Softmax to classify each atom node into one of several element types.
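The difference between a scalar and a vector activation can be sketched on plain arrays. ActivationSketch is a hypothetical class and does not implement IActivationFunction<T> or IVectorActivationFunction<T>; it only shows why Softmax needs the whole vector while an element-wise function such as ReLU does not.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of AiDotNet.
public static class ActivationSketch
{
    // Scalar activation: each element is transformed independently.
    public static double[] Relu(double[] x) =>
        x.Select(value => Math.Max(0.0, value)).ToArray();

    // Vector activation: every output depends on the whole input vector,
    // and the outputs sum to 1.
    public static double[] Softmax(double[] x)
    {
        double max = x.Max(); // subtract the max for numerical stability
        double[] exps = x.Select(value => Math.Exp(value - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

Applied to one node's output features, Softmax turns the raw scores into a probability distribution over classes, which suits the atom-classification example above.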
Properties
InputFeatures
Gets the number of input features per node.
public int InputFeatures { get; }
Property Value
- int
Remarks
This property indicates how many features each node in the graph has as input. For example, in a molecular graph, this might be properties of each atom.
For Beginners: This tells you how many pieces of information each node starts with.
Examples:
- In a social network: age, location, interests (3 features)
- In a molecule: atomic number, charge, mass (3 features)
- In a citation network: word embeddings (300 features)
Each node has the same number of input features.
OutputFeatures
Gets the number of output features per node.
public int OutputFeatures { get; }
Property Value
- int
Remarks
This property indicates how many features each node will have after processing through this layer. The layer transforms each node's input features into output features through learned transformations.
For Beginners: This tells you how many pieces of information each node will have after processing.
The layer learns to:
- Combine input features in useful ways
- Extract important patterns
- Create new representations that are better for the task
For example, if you start with 10 features per node and the layer has 16 output features, each node's 10 numbers will be transformed into 16 numbers that hopefully capture more useful information for your specific task.
SupportsGpuExecution
Gets whether this layer has a GPU execution implementation for inference.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
Override this to return true when the layer implements ForwardGpu(params IGpuTensor<T>[]). The actual CanExecuteOnGpu property combines this with engine availability.
For Beginners: This flag indicates if the layer has GPU code for the forward pass. Set this to true in derived classes that implement ForwardGpu.
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true if the layer has trainable parameters and supports backpropagation; otherwise, false.
Remarks
This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer has parameters that can be adjusted during training
- It will improve its performance as it sees more data
- It participates in the learning process
A value of false means:
- The layer doesn't have any adjustable parameters
- It performs the same operation regardless of training
- It doesn't need to learn (but may still be useful)
Methods
Backward(Tensor<T>)
Performs the backward pass of the graph transformer layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
- outputGradient (Tensor<T>): The gradient of the loss with respect to the layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to the layer's input.
Exceptions
- InvalidOperationException
Thrown when Forward has not been called before Backward.
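The ordering requirement between Forward and Backward can be sketched with a toy layer that caches its input. CachedScaleLayerSketch is hypothetical and far simpler than the graph transformer layer; it only illustrates the pattern of caching state in the forward pass and throwing InvalidOperationException when that state is missing.

```csharp
#nullable enable
using System;

// Hypothetical toy layer computing y = scale * x; not part of AiDotNet.
public sealed class CachedScaleLayerSketch
{
    private readonly double _scale;
    private double[]? _lastInput; // set by Forward, cleared by ResetState

    public CachedScaleLayerSketch(double scale) => _scale = scale;

    public double[] Forward(double[] input)
    {
        _lastInput = (double[])input.Clone();
        var output = new double[input.Length];
        for (int i = 0; i < input.Length; i++) output[i] = _scale * input[i];
        return output;
    }

    public double[] Backward(double[] outputGradient)
    {
        if (_lastInput is null)
            throw new InvalidOperationException("Forward must be called before Backward.");
        if (outputGradient.Length != _lastInput.Length)
            throw new ArgumentException("Gradient length must match the cached input length.", nameof(outputGradient));

        // d(scale * x)/dx = scale, so the input gradient is scale * outputGradient.
        // The cache is only used for shape validation here; a layer with
        // trainable weights would also need it to compute weight gradients.
        var inputGradient = new double[outputGradient.Length];
        for (int i = 0; i < outputGradient.Length; i++) inputGradient[i] = _scale * outputGradient[i];
        return inputGradient;
    }

    // Discards the cached input so a later Backward call fails fast.
    public void ResetState() => _lastInput = null;
}
```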
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
- inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
Forward(Tensor<T>)
Performs the forward pass of the layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
- input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after processing.
Remarks
This abstract method must be implemented by derived classes to define the forward pass of the layer. The forward pass transforms the input tensor according to the layer's operation and activation function.
For Beginners: This method processes your data through the layer.
The forward pass:
- Takes input data from the previous layer or the network input
- Applies the layer's specific transformation (like convolution or matrix multiplication)
- Applies any activation function
- Passes the result to the next layer
This is where the actual data processing happens during both training and prediction.
ForwardGpu(params IGpuTensor<T>[])
GPU-accelerated forward pass for GraphTransformerLayer. Implements multi-head self-attention with structural bias and FFN on GPU.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
- inputs (IGpuTensor<T>[])
Returns
- IGpuTensor<T>
GetAdjacencyMatrix()
Gets the adjacency matrix currently being used by this layer.
public Tensor<T>? GetAdjacencyMatrix()
Returns
- Tensor<T>
The adjacency matrix tensor, or null if not set.
Remarks
This method retrieves the adjacency matrix that was set using SetAdjacencyMatrix. It may return null if the adjacency matrix has not been set yet.
For Beginners: This method lets you check what graph structure the layer is using.
This can be useful for:
- Verifying the correct graph was loaded
- Debugging graph connectivity issues
- Visualizing the graph structure
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This abstract method must be implemented by derived classes to provide access to all trainable parameters of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
The parameters:
- Are the numbers that the neural network learns during training
- Include weights, biases, and other learnable values
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This abstract method must be implemented by derived classes to reset any internal state the layer maintains between forward and backward passes. This is useful when starting to process a new sequence or when implementing stateful recurrent networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Cached inputs and outputs are cleared
- Any temporary calculations are discarded
- The layer is ready to process new data without being influenced by previous data
This is important for:
- Processing a new, unrelated sequence
- Preventing information from one sequence affecting another
- Starting a new training episode
SetAdjacencyMatrix(Tensor<T>)
Sets the adjacency matrix that defines the graph structure.
public void SetAdjacencyMatrix(Tensor<T> adjacencyMatrix)
Parameters
- adjacencyMatrix (Tensor<T>): The adjacency matrix tensor representing node connections.
Remarks
The adjacency matrix is a square matrix where element [i,j] indicates whether and how strongly node i is connected to node j. Common formats include:
- Binary adjacency: 1 if connected, 0 otherwise
- Weighted adjacency: connection strength as a value
- Normalized adjacency: preprocessed for better training
For Beginners: This method tells the layer how nodes in the graph are connected.
Think of the adjacency matrix as a map:
- Each row represents a node
- Each column represents a potential connection
- The value at position [i,j] tells if node i connects to node j
For example, in a social network:
- adjacencyMatrix[Alice, Bob] = 1 means Alice is friends with Bob
- adjacencyMatrix[Alice, Charlie] = 0 means Alice is not friends with Charlie
This connectivity information is crucial for graph neural networks to propagate information between connected nodes.
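A binary adjacency matrix like the one in the social network example can be built from an edge list. The sketch below uses a plain double[,] array and a hypothetical AdjacencySketch class. Converting the result into a Tensor<T> for SetAdjacencyMatrix is not shown, because the tensor construction API is not documented on this page. Edges are treated as undirected, which is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of AiDotNet.
public static class AdjacencySketch
{
    // Builds a binary adjacency matrix for an undirected graph:
    // entry [i, j] is 1 when nodes i and j share an edge, 0 otherwise.
    public static double[,] FromEdges(int nodeCount, IEnumerable<(int Source, int Target)> edges)
    {
        var adjacency = new double[nodeCount, nodeCount];
        foreach (var (source, target) in edges)
        {
            if (source < 0 || source >= nodeCount || target < 0 || target >= nodeCount)
                throw new ArgumentOutOfRangeException(nameof(edges), "Edge refers to a node outside the graph.");

            adjacency[source, target] = 1.0;
            adjacency[target, source] = 1.0; // mirror the edge for an undirected graph
        }
        return adjacency;
    }
}
```

With Alice = 0, Bob = 1, and Charlie = 2, calling AdjacencySketch.FromEdges(3, new[] { (0, 1) }) marks Alice and Bob as connected and leaves Charlie isolated.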
SetParameters(Vector<T>)
Sets the trainable parameters of the layer.
public override void SetParameters(Vector<T> parameters)
Parameters
- parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all the trainable parameters of the layer from a single vector of parameters. The parameters vector must have the correct length to match the total number of parameters in the layer. By default, it simply assigns the parameters vector to the Parameters field, but derived classes may override this to handle the parameters differently.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The layer parses this vector to set all its internal parameters
- Throws an error if the input doesn't match the expected number of parameters
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Setting specific parameter values for testing
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
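The round trip between GetParameters and SetParameters, including the length check, can be sketched with a toy layer that stores one weight matrix and one bias vector. TinyLinearSketch is hypothetical, and the ordering used here (weights in row-major order, then biases) is an assumption; this page does not document how GraphTransformerLayer<T> orders its parameters.

```csharp
using System;

// Hypothetical toy layer, not part of AiDotNet.
public sealed class TinyLinearSketch
{
    private readonly double[,] _weights;
    private readonly double[] _biases;

    public TinyLinearSketch(int inputs, int outputs)
    {
        _weights = new double[inputs, outputs];
        _biases = new double[outputs];
    }

    public int ParameterCount => _weights.Length + _biases.Length;

    // Flattens weights (row-major) followed by biases into one array.
    public double[] GetParameters()
    {
        var flat = new double[ParameterCount];
        int index = 0;
        foreach (double w in _weights) flat[index++] = w; // foreach visits a 2D array in row-major order
        foreach (double b in _biases) flat[index++] = b;
        return flat;
    }

    // Restores parameters from the same layout, rejecting a wrong length.
    public void SetParameters(double[] parameters)
    {
        if (parameters.Length != ParameterCount)
            throw new ArgumentException(
                $"Expected {ParameterCount} parameters but got {parameters.Length}.", nameof(parameters));

        int index = 0;
        for (int r = 0; r < _weights.GetLength(0); r++)
            for (int c = 0; c < _weights.GetLength(1); c++)
                _weights[r, c] = parameters[index++];
        for (int i = 0; i < _biases.Length; i++)
            _biases[i] = parameters[index++];
    }
}
```

Whatever layout a layer chooses, the key property is that SetParameters(GetParameters()) leaves the layer unchanged, which is what makes saving and loading models work.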
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
- learningRate (T): The learning rate to use for the parameter updates.
Remarks
This abstract method must be implemented by derived classes to define how the layer's parameters are updated during training. The learning rate controls the size of the parameter updates.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- The weights, biases, or other parameters are adjusted to reduce prediction errors
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time, gradually improving its ability to extract useful patterns from inputs.
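The effect of the learning rate can be sketched with a plain gradient descent step. This page does not state which update rule the layer applies internally, so the rule below (parameter minus learning rate times gradient) is an assumption, and SgdSketch is a hypothetical helper.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet.
public static class SgdSketch
{
    // Applies one in-place gradient descent step: p[i] -= learningRate * g[i].
    public static void Update(double[] parameters, double[] gradients, double learningRate)
    {
        if (parameters.Length != gradients.Length)
            throw new ArgumentException("Parameters and gradients must have the same length.");

        for (int i = 0; i < parameters.Length; i++)
            parameters[i] -= learningRate * gradients[i];
    }
}
```

Doubling the learning rate doubles the size of each step, which is why larger values learn faster but can overshoot and become unstable.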