Class GraphAnalytics
- Namespace
- AiDotNet.RetrievalAugmentedGeneration.Graph
- Assembly
- AiDotNet.dll
Provides graph analytics algorithms for analyzing knowledge graphs.
public static class GraphAnalytics
- Inheritance
-
GraphAnalytics
- Inherited Members
Remarks
This class implements common graph algorithms used to analyze the structure and importance of nodes and edges in a knowledge graph.
For Beginners: Graph analytics help you understand your graph.
Think of a social network:
- PageRank: Who are the most influential people?
- Degree Centrality: Who has the most connections?
- Closeness Centrality: Who can reach everyone quickly?
- Betweenness Centrality: Who connects different groups?
These algorithms answer "who's important?" and "how are things connected?"
Methods
CalculateAverageClusteringCoefficient<T>(KnowledgeGraph<T>)
Calculates the average clustering coefficient for the entire graph.
public static double CalculateAverageClusteringCoefficient<T>(KnowledgeGraph<T> graph)
Parameters
graphKnowledgeGraph<T>The knowledge graph to analyze.
Returns
- double
The average clustering coefficient (0 to 1).
Type Parameters
TThe numeric type used for vector operations.
Remarks
For Beginners: This measures how "clustered" the entire graph is.
- Close to 1: Graph has many tight groups (like friend circles)
- Close to 0: Graph is sparse, few triangles (like a tree)
Compare to random graphs:
- Random graph: Low clustering (~0.01 for large graphs)
- Real social networks: High clustering (~0.3-0.6)
- Small-world networks: High clustering + short paths
This is one measure of graph structure used in network science.
CalculateBetweennessCentrality<T>(KnowledgeGraph<T>, bool)
Calculates betweenness centrality for all nodes in the graph.
public static Dictionary<string, double> CalculateBetweennessCentrality<T>(KnowledgeGraph<T> graph, bool normalized = true)
Parameters
graphKnowledgeGraph<T>The knowledge graph to analyze.
normalizedboolWhether to normalize scores (default: true).
Returns
- Dictionary<string, double>
Dictionary mapping node IDs to their betweenness centrality scores.
Type Parameters
TThe numeric type used for vector operations.
Remarks
Betweenness centrality measures how often a node appears on shortest paths between other nodes. Nodes that act as "bridges" between different parts of the graph have high betweenness centrality.
For Beginners: Betweenness centrality finds "bridge" nodes.
Think of a transportation network:
- A bridge connecting two cities has high betweenness
- Many shortest paths go through the bridge
- If you remove it, people must take longer routes
In social networks:
- People connecting different friend groups have high betweenness
- They're "brokers" or "gatekeepers" of information
Algorithm (simplified Brandes' algorithm):
- For all pairs of nodes, find shortest paths
- Count how many paths go through each node
- Higher count = Higher betweenness
High betweenness = Important connector/bridge in the network
CalculateClosenessCentrality<T>(KnowledgeGraph<T>)
Calculates closeness centrality for all nodes in the graph.
public static Dictionary<string, double> CalculateClosenessCentrality<T>(KnowledgeGraph<T> graph)
Parameters
graphKnowledgeGraph<T>The knowledge graph to analyze.
Returns
- Dictionary<string, double>
Dictionary mapping node IDs to their closeness centrality scores.
Type Parameters
TThe numeric type used for vector operations.
Remarks
Closeness centrality measures how close a node is to all other nodes in the graph. It's calculated as the inverse of the average shortest path distance to all other nodes. Nodes that can quickly reach all others have high closeness centrality.
For Beginners: Closeness centrality measures "how close" you are to everyone.
Think of an airport network:
- Hub airports (like Atlanta) can reach anywhere with few layovers → high closeness
- Remote airports need many connections → low closeness
Algorithm:
- For each node, find shortest path to every other node
- Calculate average distance
- Closeness = 1 / average_distance
Higher score = Can reach everyone more quickly
Note: If nodes are unreachable, they're excluded from the calculation.
CalculateClusteringCoefficient<T>(KnowledgeGraph<T>)
Calculates the clustering coefficient for each node.
public static Dictionary<string, double> CalculateClusteringCoefficient<T>(KnowledgeGraph<T> graph)
Parameters
graphKnowledgeGraph<T>The knowledge graph to analyze.
Returns
- Dictionary<string, double>
Dictionary mapping node IDs to their clustering coefficients (0 to 1).
Type Parameters
TThe numeric type used for vector operations.
Remarks
The clustering coefficient measures how well a node's neighbors are connected to each other. A high coefficient means the node is part of a tightly-knit cluster.
For Beginners: Clustering coefficient measures how "clique-like" connections are.
Think of your friend group:
- High clustering: Your friends all know each other (tight group)
- Low clustering: Your friends don't know each other (you're the hub)
Formula:
- Count how many of your friends are friends with each other
- Divide by maximum possible friendships between them
- Result: 0 (no friends know each other) to 1 (everyone knows everyone)
Example:
- You have 3 friends: Alice, Bob, Charlie
- Maximum connections between them: 3 (Alice-Bob, Bob-Charlie, Alice-Charlie)
- Actual connections: 2 (Alice-Bob, Bob-Charlie)
- Clustering coefficient: 2/3 = 0.67
Uses:
- Identify tight communities
- Find nodes embedded in groups vs connectors between groups
- Measure graph's overall "cliquishness"
High coefficient = Node in dense cluster Low coefficient = Node connects different groups
CalculateDegreeCentrality<T>(KnowledgeGraph<T>, bool)
Calculates degree centrality for all nodes in the graph.
public static Dictionary<string, double> CalculateDegreeCentrality<T>(KnowledgeGraph<T> graph, bool normalized = true)
Parameters
graphKnowledgeGraph<T>The knowledge graph to analyze.
normalizedboolWhether to normalize scores by the maximum possible degree (default: true).
Returns
- Dictionary<string, double>
Dictionary mapping node IDs to their degree centrality scores.
Type Parameters
TThe numeric type used for vector operations.
Remarks
Degree centrality is the simplest centrality measure. It counts the number of edges connected to each node. Nodes with more connections are considered more central.
For Beginners: Degree centrality counts connections.
In a social network:
- Person with 100 friends has higher degree centrality than person with 10 friends
- It's the simplest measure: "Who knows the most people?"
Types:
- Out-degree: How many edges go OUT (how many people do you follow?)
- In-degree: How many edges come IN (how many followers do you have?)
- Total degree: Sum of both
This implementation calculates total degree (in + out).
Normalized: Divides by (N-1) where N is total nodes, giving a score from 0 to 1.
CalculatePageRank<T>(KnowledgeGraph<T>, double, int, double)
Calculates PageRank scores for all nodes in the graph.
public static Dictionary<string, double> CalculatePageRank<T>(KnowledgeGraph<T> graph, double dampingFactor = 0.85, int maxIterations = 100, double convergenceThreshold = 0.0001)
Parameters
graphKnowledgeGraph<T>The knowledge graph to analyze.
dampingFactordoubleThe damping factor (default: 0.85). Must be between 0 and 1.
maxIterationsintMaximum number of iterations (default: 100).
convergenceThresholddoubleConvergence threshold (default: 0.0001).
Returns
- Dictionary<string, double>
Dictionary mapping node IDs to their PageRank scores.
Type Parameters
TThe numeric type used for vector operations.
Remarks
PageRank is an algorithm used by Google to rank web pages. In a knowledge graph, it identifies the most "important" or "central" nodes based on the structure of relationships. Nodes pointed to by many important nodes get higher scores.
For Beginners: PageRank finds the most important nodes.
Imagine a citation network:
- Papers cited by many other papers get high PageRank
- Papers cited by highly-cited papers get even higher PageRank
- It's like asking: "Which papers are most influential?"
The algorithm:
- Start: All nodes have equal rank
- Iterate: Each node shares its rank with nodes it points to
- Damping: 85% of rank flows through edges, 15% jumps randomly
- Repeat until scores stabilize
Higher PageRank = More important/central in the graph
Exceptions
- ArgumentNullException
Thrown when graph is null.
- ArgumentOutOfRangeException
Thrown when dampingFactor is not between 0 and 1.
DetectCommunitiesLabelPropagation<T>(KnowledgeGraph<T>, int)
Detects communities using Label Propagation algorithm.
public static Dictionary<string, int> DetectCommunitiesLabelPropagation<T>(KnowledgeGraph<T> graph, int maxIterations = 100)
Parameters
graphKnowledgeGraph<T>The knowledge graph to analyze.
maxIterationsintMaximum number of iterations (default: 100).
Returns
- Dictionary<string, int>
Dictionary mapping node IDs to their community labels.
Type Parameters
TThe numeric type used for vector operations.
Remarks
Label Propagation is a fast community detection algorithm. Each node starts with a unique label, then iteratively adopts the most common label among its neighbors. Nodes in the same community will converge to the same label.
For Beginners: Label Propagation finds groups of nodes that cluster together.
Imagine a party where people wear colored hats:
- Everyone starts with a random color
- Every minute, you look at your friends' hats
- You change to the most popular color among your friends
- After a while, friend groups wear the same color
In graphs:
- Start: Each node has unique label
- Iterate: Each node adopts most common neighbor label
- Converge: Nodes in same community have same label
Why it works:
- Densely connected nodes influence each other
- They converge to the same label
- Weakly connected nodes drift apart
Result: Community labels (numbers) for each node. Nodes with the same label are in the same community.
Fast: O(k * E) where k = iterations, E = edges Great for: Large graphs, quick community detection
FindConnectedComponents<T>(KnowledgeGraph<T>)
Finds all connected components in the graph.
public static List<HashSet<string>> FindConnectedComponents<T>(KnowledgeGraph<T> graph)
Parameters
graphKnowledgeGraph<T>The knowledge graph to analyze.
Returns
Type Parameters
TThe numeric type used for vector operations.
Remarks
A connected component is a maximal subgraph where every node can reach every other node. This helps identify isolated clusters or communities in the graph.
For Beginners: Connected components find separate "islands" in your graph.
Think of a social network:
- Component 1: Alice's friend group (all connected to each other)
- Component 2: Bob's friend group (completely separate from Alice's)
- Component 3: Isolated person Charlie (no connections)
Uses:
- Find isolated communities
- Detect fragmented knowledge bases
- Identify separate discussion topics
- Check if graph is fully connected
Algorithm (Union-Find/DFS):
- Start with first unvisited node
- Find all nodes reachable from it (BFS/DFS)
- That's one component
- Repeat for remaining unvisited nodes
Result: List of components, each containing node IDs in that island.
GetTopKNodes(Dictionary<string, double>, int)
Identifies the top-k most central nodes based on a centrality measure.
public static List<(string NodeId, double Score)> GetTopKNodes(Dictionary<string, double> centrality, int k)
Parameters
centralityDictionary<string, double>Dictionary of node IDs to centrality scores.
kintNumber of top nodes to return.
Returns
- List<(string Label, double Confidence)>
List of (nodeId, score) tuples for the top-k nodes, ordered by descending score.
Remarks
For Beginners: This finds the "most important" nodes.
After running PageRank or centrality calculations, use this to get:
- Top 10 most influential people (PageRank)
- Top 5 most connected nodes (Degree)
- Top 3 bridge nodes (Betweenness)
Example:
var pageRank = GraphAnalytics.CalculatePageRank(graph);
var top10 = GraphAnalytics.GetTopKNodes(pageRank, 10);
// Returns the 10 most important nodes with their scores