Class MessagePassingLayer<T>
- Namespace
- AiDotNet.NeuralNetworks.Layers
- Assembly
- AiDotNet.dll
Implements a general Message Passing Neural Network (MPNN) layer.
public class MessagePassingLayer<T> : LayerBase<T>, IDisposable, IGraphConvolutionLayer<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- LayerBase<T> → MessagePassingLayer<T>
- Implements
- IGraphConvolutionLayer<T>, ILayer<T>, IJitCompilable<T>, IDiagnosticsProvider, IWeightLoadable<T>, IDisposable
Remarks
Message Passing Neural Networks provide a general framework for graph neural networks. The framework consists of three key functions:
1. Message: computes messages from neighbors.
2. Aggregate: combines messages from all neighbors.
3. Update: updates node representations using aggregated messages.
The layer performs the following computation for each node v (a plain-code sketch of these steps appears after the use cases below):
- m_v = AGGREGATE({MESSAGE(h_u, h_v, e_uv) : u ∈ N(v)})
- h_v' = UPDATE(h_v, m_v)
where h_v are node features, e_uv are edge features, and N(v) is the neighborhood of v.
For Beginners: Think of message passing like spreading information through a network.
Imagine a social network where:
- Message: Each friend sends you a message (combining their info with yours)
- Aggregate: You collect and summarize all messages from friends
- Update: You update your own status based on the summary
This happens for all people simultaneously, allowing information to flow through the network.
Use cases:
- Molecule analysis: Atoms sharing information about chemical bonds
- Social networks: Users influenced by their connections
- Citation networks: Papers learning from papers they cite
- Recommendation systems: Items learning from similar items
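To make the three steps concrete, here is a minimal, self-contained sketch in plain C# that runs one round of message passing over a dense adjacency matrix. The toy message and update functions stand in for the learned 2-layer MLP and GRU-style gate this layer actually uses; the sketch shows the dataflow only, not the library's internals.

// One round of message passing on a dense adjacency matrix, using plain
// jagged arrays. The real layer learns its message and update functions;
// here they are fixed toy functions for clarity.
static double[][] MessagePassingRound(double[][] h, double[][] adj)
{
    int n = h.Length, d = h[0].Length;
    var hNext = new double[n][];
    for (int v = 0; v < n; v++)
    {
        // Aggregate: sum the messages from every neighbor u of v.
        var m = new double[d];
        for (int u = 0; u < n; u++)
        {
            if (adj[v][u] == 0) continue;
            // Message: a toy function of the (neighbor, self) pair;
            // the layer replaces this with a learned 2-layer MLP.
            for (int k = 0; k < d; k++)
                m[k] += 0.5 * (h[u][k] + h[v][k]);
        }
        // Update: a toy blend of the old state and the aggregated message;
        // the layer replaces this with a GRU-style gate.
        hNext[v] = new double[d];
        for (int k = 0; k < d; k++)
            hNext[v][k] = 0.5 * h[v][k] + 0.5 * m[k];
    }
    return hNext;
}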
Constructors
MessagePassingLayer(int, int, int, bool, int, IActivationFunction<T>?)
Initializes a new instance of the MessagePassingLayer<T> class.
public MessagePassingLayer(int inputFeatures, int outputFeatures, int messageFeatures = -1, bool useEdgeFeatures = false, int edgeFeatureDim = 0, IActivationFunction<T>? activationFunction = null)
Parameters
inputFeatures (int): Number of input features per node.
outputFeatures (int): Number of output features per node.
messageFeatures (int): Hidden dimension for message computation (default: same as outputFeatures).
useEdgeFeatures (bool): Whether to incorporate edge features (default: false).
edgeFeatureDim (int): Dimension of edge features if used.
activationFunction (IActivationFunction<T>?): Activation function to apply.
Remarks
Creates a message passing layer with learnable message, aggregate, and update functions. The message function is implemented as a 2-layer MLP, aggregation uses sum, and update uses a GRU-style gated mechanism.
For Beginners: This creates a new message passing layer.
Key parameters:
- messageFeatures: Size of the messages exchanged between nodes
- useEdgeFeatures: Whether connections (edges) have their own information
- true: Use edge properties (like "strength of friendship" in social networks)
- false: All connections are treated equally
The layer learns three things:
- How to create messages from node pairs
- How to combine multiple messages
- How to update nodes based on received messages
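A constructor call might look like the sketch below; the dimension values are illustrative, not recommendations.

// Hypothetical setup: map 10 input features per node to 16 output
// features, exchange 32-dimensional messages, and use 4-dimensional
// edge features.
var layer = new MessagePassingLayer<float>(
    inputFeatures: 10,
    outputFeatures: 16,
    messageFeatures: 32,
    useEdgeFeatures: true,
    edgeFeatureDim: 4);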
Properties
InputFeatures
Gets the number of input features per node.
public int InputFeatures { get; }
Property Value
- int
Remarks
This property indicates how many features each node in the graph has as input. For example, in a molecular graph, this might be properties of each atom.
For Beginners: This tells you how many pieces of information each node starts with.
Examples:
- In a social network: age, location, interests (3 features)
- In a molecule: atomic number, charge, mass (3 features)
- In a citation network: word embeddings (300 features)
Each node has the same number of input features.
OutputFeatures
Gets the number of output features per node.
public int OutputFeatures { get; }
Property Value
- int
Remarks
This property indicates how many features each node will have after processing through this layer. The layer transforms each node's input features into output features through learned transformations.
For Beginners: This tells you how many pieces of information each node will have after processing.
The layer learns to:
- Combine input features in useful ways
- Extract important patterns
- Create new representations that are better for the task
For example, if you start with 10 features per node and the layer has 16 output features, each node's 10 numbers will be transformed into 16 numbers that hopefully capture more useful information for your specific task.
SupportsGpuExecution
Gets whether this layer supports GPU execution.
protected override bool SupportsGpuExecution { get; }
Property Value
- bool
Remarks
MessagePassingLayer supports GPU execution with efficient message computation, sum aggregation, and GRU-style update on GPU.
SupportsJitCompilation
Gets whether this layer supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
True if the layer can be JIT compiled, false otherwise.
Remarks
This property indicates whether the layer has implemented ExportComputationGraph() and can benefit from JIT compilation. All layers MUST implement this property.
For Beginners: JIT compilation can make inference 5-10x faster by converting the layer's operations into optimized native code.
Layers should return false if they:
- Have not yet implemented a working ExportComputationGraph()
- Use dynamic operations that change based on input data
- Are too simple to benefit from JIT compilation
When false, the layer will use the standard Forward() method instead.
SupportsTraining
Gets a value indicating whether this layer supports training.
public override bool SupportsTraining { get; }
Property Value
- bool
true if the layer has trainable parameters and supports backpropagation; otherwise, false.
Remarks
This property indicates whether the layer can be trained through backpropagation. Layers with trainable parameters such as weights and biases typically return true, while layers that only perform fixed transformations (like pooling or activation layers) typically return false.
For Beginners: This property tells you if the layer can learn from data.
A value of true means:
- The layer has parameters that can be adjusted during training
- It will improve its performance as it sees more data
- It participates in the learning process
A value of false means:
- The layer doesn't have any adjustable parameters
- It performs the same operation regardless of training
- It doesn't need to learn (but may still be useful)
UsesSparseAggregation
Gets whether sparse (edge-based) aggregation is currently enabled.
public bool UsesSparseAggregation { get; }
Property Value
- bool
Methods
Backward(Tensor<T>)
Computes the backward pass for this Message Passing layer.
public override Tensor<T> Backward(Tensor<T> outputGradient)
Parameters
outputGradient (Tensor<T>): The gradient of the loss with respect to this layer's output.
Returns
- Tensor<T>
The gradient of the loss with respect to this layer's input.
Remarks
Implementation Note: This backward pass computes actual gradients for all parameters through the full message passing, aggregation, and GRU-style update operations. The implementation properly handles gradient flow through:
- Message network weights and biases (2-layer MLP with ReLU)
- Update gate weights and biases (GRU-style gating)
- Reset gate weights and biases (GRU-style gating)
- Input gradients for proper backpropagation to upstream layers
This enables effective training of the message passing layer with full gradient-based optimization.
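As a sketch, a manual training step could chain the forward pass, this backward pass, and a parameter update. The input and lossGrad tensors are assumed to come from surrounding code; lossGrad is the gradient of the loss with respect to this layer's output.

// Hypothetical manual training step (input and lossGrad supplied elsewhere).
Tensor<float> output = layer.Forward(input);        // forward pass
Tensor<float> inputGrad = layer.Backward(lossGrad); // compute parameter and input gradients
layer.UpdateParameters(0.01f);                      // apply gradients with learning rate 0.01
// inputGrad would be passed to the preceding layer's Backward call.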
BackwardGpu(IGpuTensor<T>)
GPU-accelerated backward pass for Message Passing Neural Network.
public override IGpuTensor<T> BackwardGpu(IGpuTensor<T> outputGradient)
Parameters
outputGradient (IGpuTensor<T>)
Returns
- IGpuTensor<T>
Remarks
Computes gradients through the MPNN:
1. Backward through the GRU-style update (sigmoid derivatives).
2. Backward through aggregation (scatter-add backward = gather).
3. Backward through the message MLP (layer 2, ReLU, layer 1).
4. Accumulate gradients for all weight matrices.
ClearEdges()
Clears the edge list and switches back to dense adjacency matrix aggregation.
public void ClearEdges()
ExportComputationGraph(List<ComputationNode<T>>)
Exports the layer's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The output computation node representing the layer's operation.
Remarks
This method constructs a computation graph representation of the layer's forward pass that can be JIT compiled for faster inference. All layers MUST implement this method to support JIT compilation.
For Beginners: JIT (Just-In-Time) compilation converts the layer's operations into optimized native code for 5-10x faster inference.
To support JIT compilation, a layer must:
- Implement this method to export its computation graph
- Set SupportsJitCompilation to true
- Use ComputationNode and TensorOperations to build the graph
All layers are required to implement this method, even if they set SupportsJitCompilation = false.
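Based only on the signature above, an export call might look like the following sketch; the compiler or runtime that consumes the resulting graph is not shown.

// Hypothetical export call. inputNodes starts empty and is populated
// by the layer; outputNode represents the layer's forward computation.
if (layer.SupportsJitCompilation)
{
    var inputNodes = new List<ComputationNode<float>>();
    ComputationNode<float> outputNode = layer.ExportComputationGraph(inputNodes);
}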
Forward(Tensor<T>)
Performs the forward pass of the layer.
public override Tensor<T> Forward(Tensor<T> input)
Parameters
input (Tensor<T>): The input tensor to process.
Returns
- Tensor<T>
The output tensor after processing.
Remarks
This abstract method must be implemented by derived classes to define the forward pass of the layer. The forward pass transforms the input tensor according to the layer's operation and activation function.
For Beginners: This method processes your data through the layer.
The forward pass:
- Takes input data from the previous layer or the network input
- Applies the layer's specific transformation (like convolution or matrix multiplication)
- Applies any activation function
- Passes the result to the next layer
This is where the actual data processing happens during both training and prediction.
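An inference call might look like the sketch below. GetNodeFeatures is a hypothetical helper standing in for however node features are loaded; the graph structure must be provided before calling Forward.

// Hypothetical inference call.
layer.SetAdjacencyMatrix(adjacency);            // square adjacency matrix set beforehand
Tensor<float> nodeFeatures = GetNodeFeatures(); // hypothetical helper
Tensor<float> embeddings = layer.Forward(nodeFeatures);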
ForwardGpu(params IGpuTensor<T>[])
GPU-accelerated forward pass for Message Passing Neural Network.
public override IGpuTensor<T> ForwardGpu(params IGpuTensor<T>[] inputs)
Parameters
inputs (IGpuTensor<T>[])
Returns
- IGpuTensor<T>
Remarks
Implements the actual MPNN algorithm on GPU:
1. For each edge (i→j), gather source and target features.
2. Compute the per-edge message: m_ij = MLP(concat(h_source, h_target)).
3. Scatter-add to aggregate messages per target node: m_i = Σ_{j∈N(i)} m_ji.
4. Apply the GRU-style update: h'_i = (1-z)*h_i + z*m_i.
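For intuition, the following plain C# is a CPU analogue of the gather and scatter-add steps (1–3), with the learned message MLP replaced by a toy element-wise function; it is not the GPU kernel itself.

// CPU analogue of edge-based aggregation: for each edge (src -> tgt),
// compute a message from the endpoint features and scatter-add it into
// the target node's accumulator.
static double[][] ScatterAggregate(double[][] h, int[] src, int[] tgt)
{
    int n = h.Length, d = h[0].Length;
    var agg = new double[n][];
    for (int v = 0; v < n; v++) agg[v] = new double[d];
    for (int e = 0; e < src.Length; e++)
    {
        int s = src[e], t = tgt[e];
        for (int k = 0; k < d; k++)
            agg[t][k] += 0.5 * (h[s][k] + h[t][k]); // toy message in place of the MLP
    }
    return agg; // agg[i] = sum of messages over edges into node i
}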
GetAdjacencyMatrix()
Gets the adjacency matrix currently being used by this layer.
public Tensor<T>? GetAdjacencyMatrix()
Returns
- Tensor<T>?
The adjacency matrix tensor, or null if not set.
Remarks
This method retrieves the adjacency matrix that was set using SetAdjacencyMatrix. It may return null if the adjacency matrix has not been set yet.
For Beginners: This method lets you check what graph structure the layer is using.
This can be useful for:
- Verifying the correct graph was loaded
- Debugging graph connectivity issues
- Visualizing the graph structure
GetParameterTensors()
Gets all trainable parameters as a list of tensors.
public List<Tensor<T>> GetParameterTensors()
Returns
- List<Tensor<T>>
A list containing all trainable parameter tensors.
GetParameters()
Gets all trainable parameters of the layer as a single vector.
public override Vector<T> GetParameters()
Returns
- Vector<T>
A vector containing all trainable parameters.
Remarks
This abstract method must be implemented by derived classes to provide access to all trainable parameters of the layer as a single vector. This is useful for optimization algorithms that operate on all parameters at once, or for saving and loading model weights.
For Beginners: This method collects all the learnable values from the layer.
The parameters:
- Are the numbers that the neural network learns during training
- Include weights, biases, and other learnable values
- Are combined into a single long list (vector)
This is useful for:
- Saving the model to disk
- Loading parameters from a previously trained model
- Advanced optimization techniques that need access to all parameters
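A save/restore roundtrip using this method together with SetParameters(Vector<T>) might look like the sketch below; how the vector is actually persisted (file format, serialization) is left to the caller.

// Hypothetical save/restore roundtrip.
Vector<float> saved = layer.GetParameters();  // snapshot every trainable value
// ... later, on a layer constructed with the same dimensions ...
layer.SetParameters(saved);                   // restore the snapshot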
ResetState()
Resets the internal state of the layer.
public override void ResetState()
Remarks
This abstract method must be implemented by derived classes to reset any internal state the layer maintains between forward and backward passes. This is useful when starting to process a new sequence or when implementing stateful recurrent networks.
For Beginners: This method clears the layer's memory to start fresh.
When resetting the state:
- Cached inputs and outputs are cleared
- Any temporary calculations are discarded
- The layer is ready to process new data without being influenced by previous data
This is important for:
- Processing a new, unrelated sequence
- Preventing information from one sequence affecting another
- Starting a new training episode
SetAdjacencyMatrix(Tensor<T>)
Sets the adjacency matrix that defines the graph structure.
public void SetAdjacencyMatrix(Tensor<T> adjacencyMatrix)
Parameters
adjacencyMatrix (Tensor<T>): The adjacency matrix tensor representing node connections.
Remarks
The adjacency matrix is a square matrix where element [i,j] indicates whether and how strongly node i is connected to node j. Common formats include:
- Binary adjacency: 1 if connected, 0 otherwise
- Weighted adjacency: connection strength as a value
- Normalized adjacency: preprocessed for better training
For Beginners: This method tells the layer how nodes in the graph are connected.
Think of the adjacency matrix as a map:
- Each row represents a node
- Each column represents a potential connection
- The value at position [i,j] tells if node i connects to node j
For example, in a social network:
- adjacencyMatrix[Alice, Bob] = 1 means Alice is friends with Bob
- adjacencyMatrix[Alice, Charlie] = 0 means Alice is not friends with Charlie
This connectivity information is crucial for graph neural networks to propagate information between connected nodes.
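As a sketch, the three-person friendship graph above could be built as follows; the Tensor<T> shape-array constructor and two-index indexer used here are assumptions about the tensor API, not confirmed by this page.

// Hypothetical construction of a 3-node graph (Alice = 0, Bob = 1, Charlie = 2).
var adjacency = new Tensor<float>(new[] { 3, 3 }); // assumed shape-array constructor
adjacency[0, 1] = 1f; adjacency[1, 0] = 1f;        // Alice <-> Bob
adjacency[1, 2] = 1f; adjacency[2, 1] = 1f;        // Bob <-> Charlie
layer.SetAdjacencyMatrix(adjacency);
// GetAdjacencyMatrix() should now return this tensor for verification.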
SetEdgeFeatures(Tensor<T>)
Sets the edge features tensor.
public void SetEdgeFeatures(Tensor<T> edgeFeatures)
Parameters
edgeFeatures (Tensor<T>): Tensor of edge features with shape [batch, numNodes * numNodes, edgeFeatureDim]. Edge features are indexed by flattened source-target node indices: edgeIdx = sourceNode * numNodes + targetNode.
Remarks
Shape Contract: The tensor must have shape [batch, numNodes * numNodes, edgeFeatureDim] where numNodes is the number of nodes in the graph. Edge (i, j) is accessed at index i * numNodes + j. This dense representation includes slots for all possible edges; non-existent edges are ignored based on the adjacency matrix during forward computation.
Design Note: Dense edge feature storage is used for efficient random access during message computation. For sparse graphs, consider using attention-based layers (GAT) instead, which do not require explicit edge features.
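The indexing convention in the shape contract can be captured in a small helper (hypothetical, for illustration only):

// Flattened edge index per the shape contract above:
// edge (i, j) lives at i * numNodes + j along the second dimension.
static int EdgeIndex(int source, int target, int numNodes)
    => source * numNodes + target;
// e.g. in a 4-node graph, edge (2 -> 3) sits at index 2 * 4 + 3 = 11.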
SetEdges(Tensor<int>, Tensor<int>)
Sets the edge list representation of the graph structure for sparse aggregation.
public void SetEdges(Tensor<int> sourceIndices, Tensor<int> targetIndices)
Parameters
sourceIndices (Tensor<int>): Tensor containing source node indices for each edge. Shape: [numEdges].
targetIndices (Tensor<int>): Tensor containing target node indices for each edge. Shape: [numEdges].
Remarks
This method provides an edge-list representation of the graph, enabling memory-efficient sparse message passing using scatter operations.
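A sketch of supplying an edge list for a small graph follows; the Tensor<int> data-plus-shape constructor used here is an assumption about the tensor API.

// Hypothetical edge list for a 3-node path graph, both directions:
// 0 -> 1, 1 -> 0, 1 -> 2, 2 -> 1.
var sources = new Tensor<int>(new[] { 0, 1, 1, 2 }, new[] { 4 }); // assumed constructor
var targets = new Tensor<int>(new[] { 1, 0, 2, 1 }, new[] { 4 });
layer.SetEdges(sources, targets); // enables sparse aggregation
// UsesSparseAggregation is now true; ClearEdges() reverts to dense mode.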
SetParameterTensors(List<Tensor<T>>)
Sets all trainable parameters from a list of tensors.
public void SetParameterTensors(List<Tensor<T>> parameters)
Parameters
parameters (List<Tensor<T>>): The list of parameter tensors to set.
SetParameters(Vector<T>)
Sets the trainable parameters of the layer.
public override void SetParameters(Vector<T> parameters)
Parameters
parameters (Vector<T>): A vector containing all parameters to set.
Remarks
This method sets all the trainable parameters of the layer from a single vector of parameters. The parameters vector must have the correct length to match the total number of parameters in the layer. By default, it simply assigns the parameters vector to the Parameters field, but derived classes may override this to handle the parameters differently.
For Beginners: This method updates all the learnable values in the layer.
When setting parameters:
- The input must be a vector with the correct length
- The layer parses this vector to set all its internal parameters
- Throws an error if the input doesn't match the expected number of parameters
This is useful for:
- Loading a previously saved model
- Transferring parameters from another model
- Setting specific parameter values for testing
Exceptions
- ArgumentException
Thrown when the parameters vector has incorrect length.
UpdateParameters(T)
Updates the parameters of the layer using the calculated gradients.
public override void UpdateParameters(T learningRate)
Parameters
learningRate (T): The learning rate to use for the parameter updates.
Remarks
This abstract method must be implemented by derived classes to define how the layer's parameters are updated during training. The learning rate controls the size of the parameter updates.
For Beginners: This method updates the layer's internal values during training.
When updating parameters:
- The weights, biases, or other parameters are adjusted to reduce prediction errors
- The learning rate controls how big each update step is
- Smaller learning rates mean slower but more stable learning
- Larger learning rates mean faster but potentially unstable learning
This is how the layer "learns" from data over time, gradually improving its ability to extract useful patterns from inputs.
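Concretely, each parameter follows the standard gradient-descent rule, newValue = oldValue - learningRate * gradient. A worked one-parameter example:

// One gradient-descent step on a single weight.
double weight = 0.80, gradient = 2.0, learningRate = 0.01;
weight -= learningRate * gradient; // 0.80 - 0.02 = 0.78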