Table of Contents

Class GraphAnalytics

Namespace
AiDotNet.RetrievalAugmentedGeneration.Graph
Assembly
AiDotNet.dll

Provides graph analytics algorithms for analyzing knowledge graphs.

public static class GraphAnalytics
Inheritance
GraphAnalytics
Inherited Members

Remarks

This class implements common graph algorithms used to analyze the structure and importance of nodes and edges in a knowledge graph.

For Beginners: Graph analytics help you understand your graph.

Think of a social network:

  • PageRank: Who are the most influential people?
  • Degree Centrality: Who has the most connections?
  • Closeness Centrality: Who can reach everyone quickly?
  • Betweenness Centrality: Who connects different groups?

These algorithms answer "who's important?" and "how are things connected?"

Methods

CalculateAverageClusteringCoefficient<T>(KnowledgeGraph<T>)

Calculates the average clustering coefficient for the entire graph.

public static double CalculateAverageClusteringCoefficient<T>(KnowledgeGraph<T> graph)

Parameters

graph KnowledgeGraph<T>

The knowledge graph to analyze.

Returns

double

The average clustering coefficient (0 to 1).

Type Parameters

T

The numeric type used for vector operations.

Remarks

For Beginners: This measures how "clustered" the entire graph is.

  • Close to 1: Graph has many tight groups (like friend circles)
  • Close to 0: Graph is sparse, few triangles (like a tree)

Compare to random graphs:

  • Random graph: Low clustering (~0.01 for large graphs)
  • Real social networks: High clustering (~0.3-0.6)
  • Small-world networks: High clustering + short paths

This is one measure of graph structure used in network science.

CalculateBetweennessCentrality<T>(KnowledgeGraph<T>, bool)

Calculates betweenness centrality for all nodes in the graph.

public static Dictionary<string, double> CalculateBetweennessCentrality<T>(KnowledgeGraph<T> graph, bool normalized = true)

Parameters

graph KnowledgeGraph<T>

The knowledge graph to analyze.

normalized bool

Whether to normalize scores (default: true).

Returns

Dictionary<string, double>

Dictionary mapping node IDs to their betweenness centrality scores.

Type Parameters

T

The numeric type used for vector operations.

Remarks

Betweenness centrality measures how often a node appears on shortest paths between other nodes. Nodes that act as "bridges" between different parts of the graph have high betweenness centrality.

For Beginners: Betweenness centrality finds "bridge" nodes.

Think of a transportation network:

  • A bridge connecting two cities has high betweenness
  • Many shortest paths go through the bridge
  • If you remove it, people must take longer routes

In social networks:

  • People connecting different friend groups have high betweenness
  • They're "brokers" or "gatekeepers" of information

Algorithm (simplified Brandes' algorithm):

  1. For all pairs of nodes, find shortest paths
  2. Count how many paths go through each node
  3. Higher count = Higher betweenness

High betweenness = Important connector/bridge in the network

CalculateClosenessCentrality<T>(KnowledgeGraph<T>)

Calculates closeness centrality for all nodes in the graph.

public static Dictionary<string, double> CalculateClosenessCentrality<T>(KnowledgeGraph<T> graph)

Parameters

graph KnowledgeGraph<T>

The knowledge graph to analyze.

Returns

Dictionary<string, double>

Dictionary mapping node IDs to their closeness centrality scores.

Type Parameters

T

The numeric type used for vector operations.

Remarks

Closeness centrality measures how close a node is to all other nodes in the graph. It's calculated as the inverse of the average shortest path distance to all other nodes. Nodes that can quickly reach all others have high closeness centrality.

For Beginners: Closeness centrality measures "how close" you are to everyone.

Think of an airport network:

  • Hub airports (like Atlanta) can reach anywhere with few layovers → high closeness
  • Remote airports need many connections → low closeness

Algorithm:

  1. For each node, find shortest path to every other node
  2. Calculate average distance
  3. Closeness = 1 / average_distance

Higher score = Can reach everyone more quickly

Note: If nodes are unreachable, they're excluded from the calculation.

CalculateClusteringCoefficient<T>(KnowledgeGraph<T>)

Calculates the clustering coefficient for each node.

public static Dictionary<string, double> CalculateClusteringCoefficient<T>(KnowledgeGraph<T> graph)

Parameters

graph KnowledgeGraph<T>

The knowledge graph to analyze.

Returns

Dictionary<string, double>

Dictionary mapping node IDs to their clustering coefficients (0 to 1).

Type Parameters

T

The numeric type used for vector operations.

Remarks

The clustering coefficient measures how well a node's neighbors are connected to each other. A high coefficient means the node is part of a tightly-knit cluster.

For Beginners: Clustering coefficient measures how "clique-like" connections are.

Think of your friend group:

  • High clustering: Your friends all know each other (tight group)
  • Low clustering: Your friends don't know each other (you're the hub)

Formula:

  • Count how many of your friends are friends with each other
  • Divide by maximum possible friendships between them
  • Result: 0 (no friends know each other) to 1 (everyone knows everyone)

Example:

  • You have 3 friends: Alice, Bob, Charlie
  • Maximum connections between them: 3 (Alice-Bob, Bob-Charlie, Alice-Charlie)
  • Actual connections: 2 (Alice-Bob, Bob-Charlie)
  • Clustering coefficient: 2/3 = 0.67

Uses:

  • Identify tight communities
  • Find nodes embedded in groups vs connectors between groups
  • Measure graph's overall "cliquishness"

High coefficient = Node in dense cluster Low coefficient = Node connects different groups

CalculateDegreeCentrality<T>(KnowledgeGraph<T>, bool)

Calculates degree centrality for all nodes in the graph.

public static Dictionary<string, double> CalculateDegreeCentrality<T>(KnowledgeGraph<T> graph, bool normalized = true)

Parameters

graph KnowledgeGraph<T>

The knowledge graph to analyze.

normalized bool

Whether to normalize scores by the maximum possible degree (default: true).

Returns

Dictionary<string, double>

Dictionary mapping node IDs to their degree centrality scores.

Type Parameters

T

The numeric type used for vector operations.

Remarks

Degree centrality is the simplest centrality measure. It counts the number of edges connected to each node. Nodes with more connections are considered more central.

For Beginners: Degree centrality counts connections.

In a social network:

  • Person with 100 friends has higher degree centrality than person with 10 friends
  • It's the simplest measure: "Who knows the most people?"

Types:

  • Out-degree: How many edges go OUT (how many people do you follow?)
  • In-degree: How many edges come IN (how many followers do you have?)
  • Total degree: Sum of both

This implementation calculates total degree (in + out).

Normalized: Divides by (N-1) where N is total nodes, giving a score from 0 to 1.

CalculatePageRank<T>(KnowledgeGraph<T>, double, int, double)

Calculates PageRank scores for all nodes in the graph.

public static Dictionary<string, double> CalculatePageRank<T>(KnowledgeGraph<T> graph, double dampingFactor = 0.85, int maxIterations = 100, double convergenceThreshold = 0.0001)

Parameters

graph KnowledgeGraph<T>

The knowledge graph to analyze.

dampingFactor double

The damping factor (default: 0.85). Must be between 0 and 1.

maxIterations int

Maximum number of iterations (default: 100).

convergenceThreshold double

Convergence threshold (default: 0.0001).

Returns

Dictionary<string, double>

Dictionary mapping node IDs to their PageRank scores.

Type Parameters

T

The numeric type used for vector operations.

Remarks

PageRank is an algorithm used by Google to rank web pages. In a knowledge graph, it identifies the most "important" or "central" nodes based on the structure of relationships. Nodes pointed to by many important nodes get higher scores.

For Beginners: PageRank finds the most important nodes.

Imagine a citation network:

  • Papers cited by many other papers get high PageRank
  • Papers cited by highly-cited papers get even higher PageRank
  • It's like asking: "Which papers are most influential?"

The algorithm:

  1. Start: All nodes have equal rank
  2. Iterate: Each node shares its rank with nodes it points to
  3. Damping: 85% of rank flows through edges, 15% jumps randomly
  4. Repeat until scores stabilize

Higher PageRank = More important/central in the graph

Exceptions

ArgumentNullException

Thrown when graph is null.

ArgumentOutOfRangeException

Thrown when dampingFactor is not between 0 and 1.

DetectCommunitiesLabelPropagation<T>(KnowledgeGraph<T>, int)

Detects communities using Label Propagation algorithm.

public static Dictionary<string, int> DetectCommunitiesLabelPropagation<T>(KnowledgeGraph<T> graph, int maxIterations = 100)

Parameters

graph KnowledgeGraph<T>

The knowledge graph to analyze.

maxIterations int

Maximum number of iterations (default: 100).

Returns

Dictionary<string, int>

Dictionary mapping node IDs to their community labels.

Type Parameters

T

The numeric type used for vector operations.

Remarks

Label Propagation is a fast community detection algorithm. Each node starts with a unique label, then iteratively adopts the most common label among its neighbors. Nodes in the same community will converge to the same label.

For Beginners: Label Propagation finds groups of nodes that cluster together.

Imagine a party where people wear colored hats:

  1. Everyone starts with a random color
  2. Every minute, you look at your friends' hats
  3. You change to the most popular color among your friends
  4. After a while, friend groups wear the same color

In graphs:

  • Start: Each node has unique label
  • Iterate: Each node adopts most common neighbor label
  • Converge: Nodes in same community have same label

Why it works:

  • Densely connected nodes influence each other
  • They converge to the same label
  • Weakly connected nodes drift apart

Result: Community labels (numbers) for each node. Nodes with the same label are in the same community.

Fast: O(k * E) where k = iterations, E = edges Great for: Large graphs, quick community detection

FindConnectedComponents<T>(KnowledgeGraph<T>)

Finds all connected components in the graph.

public static List<HashSet<string>> FindConnectedComponents<T>(KnowledgeGraph<T> graph)

Parameters

graph KnowledgeGraph<T>

The knowledge graph to analyze.

Returns

List<HashSet<string>>

List of connected components, each containing a set of node IDs.

Type Parameters

T

The numeric type used for vector operations.

Remarks

A connected component is a maximal subgraph where every node can reach every other node. This helps identify isolated clusters or communities in the graph.

For Beginners: Connected components find separate "islands" in your graph.

Think of a social network:

  • Component 1: Alice's friend group (all connected to each other)
  • Component 2: Bob's friend group (completely separate from Alice's)
  • Component 3: Isolated person Charlie (no connections)

Uses:

  • Find isolated communities
  • Detect fragmented knowledge bases
  • Identify separate discussion topics
  • Check if graph is fully connected

Algorithm (Union-Find/DFS):

  1. Start with first unvisited node
  2. Find all nodes reachable from it (BFS/DFS)
  3. That's one component
  4. Repeat for remaining unvisited nodes

Result: List of components, each containing node IDs in that island.

GetTopKNodes(Dictionary<string, double>, int)

Identifies the top-k most central nodes based on a centrality measure.

public static List<(string NodeId, double Score)> GetTopKNodes(Dictionary<string, double> centrality, int k)

Parameters

centrality Dictionary<string, double>

Dictionary of node IDs to centrality scores.

k int

Number of top nodes to return.

Returns

List<(string Label, double Confidence)>

List of (nodeId, score) tuples for the top-k nodes, ordered by descending score.

Remarks

For Beginners: This finds the "most important" nodes.

After running PageRank or centrality calculations, use this to get:

  • Top 10 most influential people (PageRank)
  • Top 5 most connected nodes (Degree)
  • Top 3 bridge nodes (Betweenness)

Example:

var pageRank = GraphAnalytics.CalculatePageRank(graph);
var top10 = GraphAnalytics.GetTopKNodes(pageRank, 10);
// Returns the 10 most important nodes with their scores