Class CitationNetworkLoader<T>
Loads citation network datasets (Cora, CiteSeer, PubMed) for node classification.
public class CitationNetworkLoader<T> : GraphDataLoaderBase<T>, IGraphDataLoader<T>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<GraphData<T>>
Type Parameters
T: The numeric type used for calculations, typically float or double.
Inheritance
- CitationNetworkLoader<T>
Implements
- IDataLoader<T>
Remarks
Citation networks are classic benchmarks for graph neural networks. Each dataset represents academic papers as nodes and citations as edges, with the task being to classify papers into research topics.
For Beginners: Citation networks are graphs of research papers.
Structure:
- Nodes: Research papers
- Edges: Citations (Paper A cites Paper B)
- Node Features: Bag-of-words representation of paper abstracts
- Labels: Research topic/category
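The structure above can be sketched with plain arrays. This is a toy illustration, not the loader's internal GraphData<T> layout: the class name ToyCitationGraph, the four-word vocabulary, and the sample abstracts are all made up for the example, and the features are assumed to be binary word indicators.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy example; it does not use or mirror the library's types.
public static class ToyCitationGraph
{
    // Shared vocabulary: one feature column per word.
    public static readonly string[] Vocabulary = { "neural", "network", "genetic", "algorithm" };

    // Bag-of-words: 1.0 if the vocabulary word occurs in the abstract, otherwise 0.0.
    public static double[] BagOfWords(string abstractText)
    {
        var words = new HashSet<string>(
            abstractText.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries));
        var features = new double[Vocabulary.Length];
        for (int i = 0; i < Vocabulary.Length; i++)
            features[i] = words.Contains(Vocabulary[i]) ? 1.0 : 0.0;
        return features;
    }

    public static void Main()
    {
        // Nodes: three papers, each with a feature vector and a topic label.
        double[][] nodeFeatures =
        {
            BagOfWords("A neural network for vision"),
            BagOfWords("Training a deep neural network"),
            BagOfWords("A genetic algorithm for scheduling"),
        };
        int[] labels = { 0, 0, 1 }; // 0 = Neural_Networks, 1 = Genetic_Algorithms

        // Edges: (citing, cited) pairs of node indices.
        (int Citing, int Cited)[] edges = { (1, 0), (2, 0) };

        Console.WriteLine($"Nodes: {nodeFeatures.Length}, Edges: {edges.Length}, " +
                          $"Features: {Vocabulary.Length}, Labels: {labels.Length}");
    }
}
```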
Datasets:
Cora:
- 2,708 papers
- 5,429 citations
- 1,433 features (unique words)
- 7 classes (topics): Case_Based, Genetic_Algorithms, Neural_Networks, Probabilistic_Methods, Reinforcement_Learning, Rule_Learning, Theory
- Task: Classify papers by topic
CiteSeer:
- 3,312 papers
- 4,732 citations
- 3,703 features
- 6 classes: Agents, AI, DB, IR, ML, HCI
PubMed:
- 19,717 papers (about diabetes)
- 44,338 citations
- 500 features
- 3 classes: Diabetes Mellitus Experimental, Diabetes Mellitus Type 1, Diabetes Mellitus Type 2
Key Property: Homophily
Papers tend to cite papers on similar topics. This is what makes GNNs effective on these datasets:
- Because neighboring papers usually share a topic, aggregating their features gives a useful signal
- A GNN learns to propagate topic information through the citation network
- Even unlabeled papers can be classified based on what they cite
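The homophily idea can be made concrete with two small helpers. This is a minimal sketch, not the library's GNN implementation: HomophilySketch, EdgeHomophily, and MeanAggregate are hypothetical names, and mean aggregation over undirected edges is only one of several possible aggregation schemes.

```csharp
using System;
using System.Linq;

// Hypothetical helpers for illustration; not part of the documented API.
public static class HomophilySketch
{
    // Fraction of edges whose two endpoints carry the same label.
    public static double EdgeHomophily((int Source, int Target)[] edges, int[] labels)
    {
        if (edges.Length == 0) return 0.0;
        int same = edges.Count(e => labels[e.Source] == labels[e.Target]);
        return (double)same / edges.Length;
    }

    // One round of mean aggregation: each node's new vector is the average of
    // its own vector and its neighbors' vectors, treating edges as undirected.
    public static double[][] MeanAggregate(double[][] features, (int Source, int Target)[] edges)
    {
        int n = features.Length;
        if (n == 0) return Array.Empty<double[]>();
        int d = features[0].Length;

        var sums = features.Select(f => (double[])f.Clone()).ToArray();
        var counts = Enumerable.Repeat(1, n).ToArray(); // each node counts itself

        foreach (var (s, t) in edges)
        {
            for (int k = 0; k < d; k++)
            {
                sums[s][k] += features[t][k];
                sums[t][k] += features[s][k];
            }
            counts[s]++;
            counts[t]++;
        }

        for (int i = 0; i < n; i++)
            for (int k = 0; k < d; k++)
                sums[i][k] /= counts[i];
        return sums;
    }

    public static void Main()
    {
        var edges = new[] { (0, 1), (1, 2), (2, 3) };
        int[] labels = { 0, 0, 0, 1 };
        Console.WriteLine($"Edge homophily: {EdgeHomophily(edges, labels):F2}");
    }
}
```

A homophily ratio close to 1 means most citations connect papers with the same topic, which is the situation in which averaging neighbor features helps classification.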
Constructors
CitationNetworkLoader(CitationDataset, string?, bool)
Initializes a new instance of the CitationNetworkLoader<T> class.
public CitationNetworkLoader(CitationNetworkLoader<T>.CitationDataset dataset, string? dataPath = null, bool autoDownload = true)
Parameters
- dataset (CitationNetworkLoader<T>.CitationDataset): Which citation dataset to load.
- dataPath (string): Path to the dataset files. If null, uses the default cache directory.
- autoDownload (bool): Whether to automatically download the dataset if it is not found locally.
Remarks
The loader expects data files in the standard Planetoid format:
- {dataset}.content: Tab-separated file with paper_id, word features, class_label
- {dataset}.cites: Tab-separated file with cited_paper_id, citing_paper_id
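To make the two file layouts concrete, here is a minimal parsing sketch for a single line of each file. It is not the loader's actual parser: CitationFileSketch and its method names are hypothetical, and the sample values in Main are invented for illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical parser for illustration; the real loader may differ.
public static class CitationFileSketch
{
    // Parses one .content line: paper_id, word features..., class_label.
    public static (string PaperId, double[] Features, string Label) ParseContentLine(string line)
    {
        string[] parts = line.Split('\t');
        if (parts.Length < 3)
            throw new FormatException("Expected paper_id, at least one feature, and class_label.");

        // Everything between the first and last column is a feature value.
        double[] features = parts
            .Skip(1)
            .Take(parts.Length - 2)
            .Select(p => double.Parse(p, CultureInfo.InvariantCulture))
            .ToArray();

        return (parts[0], features, parts[parts.Length - 1]);
    }

    // Parses one .cites line: cited_paper_id, citing_paper_id.
    public static (string Cited, string Citing) ParseCitesLine(string line)
    {
        string[] parts = line.Split('\t');
        if (parts.Length != 2)
            throw new FormatException("Expected cited_paper_id and citing_paper_id.");
        return (parts[0], parts[1]);
    }

    public static void Main()
    {
        var paper = ParseContentLine("p1\t0\t1\t0\tNeural_Networks");
        var edge = ParseCitesLine("p1\tp2");
        Console.WriteLine($"{paper.PaperId}: {paper.Features.Length} features, label {paper.Label}");
        Console.WriteLine($"{edge.Citing} cites {edge.Cited}");
    }
}
```

Note the column order in the .cites file: the cited paper comes first, so an edge from citing to cited paper reverses the two columns.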
For Beginners: Using this loader:
// Load Cora dataset (auto-downloads if not present)
var loader = new CitationNetworkLoader<double>(
    CitationNetworkLoader<double>.CitationDataset.Cora,
    autoDownload: true);

// Load the data
await loader.LoadAsync();

// Get the graph
var graph = loader.GetNextBatch();

// Access data
Console.WriteLine($"Nodes: {loader.NumNodes}");
Console.WriteLine($"Edges: {loader.NumEdges}");
Console.WriteLine($"Features per node: {loader.NumNodeFeatures}");

// Create node classification task
var task = loader.CreateNodeClassificationTask();
Properties
Description
Gets a description of the dataset and its intended use.
public override string Description { get; }
Property Value
- string
Name
Gets the human-readable name of this data loader.
public override string Name { get; }
Property Value
- string
Remarks
Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"
NumClasses
Gets the number of classes for classification tasks.
public override int NumClasses { get; }
Property Value
- int
Methods
CreateGraphClassificationTask(double, double, int?)
Creates a graph classification task for datasets with multiple graphs.
public override GraphClassificationTask<T> CreateGraphClassificationTask(double trainRatio = 0.8, double valRatio = 0.1, int? seed = null)
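Parameters
- trainRatio (double): Defaults to 0.8.
- valRatio (double): Defaults to 0.1.
- seed (int?): Defaults to null.
Returns
- GraphClassificationTask<T>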
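The trainRatio, valRatio, and seed parameters suggest a seeded random split. The sketch below shows one common way such a split can be computed. It is an assumption about the general technique, not the library's implementation: SplitSketch is a hypothetical name, and treating the remainder after the train and validation portions as the test set is a guess based on the default values.

```csharp
using System;
using System.Linq;

// Hypothetical split helper for illustration; not part of the documented API.
public static class SplitSketch
{
    public static (int[] Train, int[] Val, int[] Test) Split(
        int count, double trainRatio = 0.8, double valRatio = 0.1, int? seed = null)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (trainRatio < 0 || valRatio < 0 || trainRatio + valRatio > 1)
            throw new ArgumentException("Ratios must be non-negative and sum to at most 1.");

        // A fixed seed makes the shuffle, and therefore the split, reproducible.
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        int[] indices = Enumerable.Range(0, count).ToArray();

        // Fisher-Yates shuffle.
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        int trainCount = (int)Math.Round(count * trainRatio);
        int valCount = Math.Min((int)Math.Round(count * valRatio), count - trainCount);

        return (
            indices.Take(trainCount).ToArray(),
            indices.Skip(trainCount).Take(valCount).ToArray(),
            indices.Skip(trainCount + valCount).ToArray()); // remainder is the test set
    }

    public static void Main()
    {
        var split = Split(100, seed: 42);
        Console.WriteLine($"{split.Train.Length} / {split.Val.Length} / {split.Test.Length}");
    }
}
```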
CreateLinkPredictionTask(double, double, int?)
Creates a link prediction task for predicting missing edges.
public override LinkPredictionTask<T> CreateLinkPredictionTask(double trainRatio = 0.85, double negativeRatio = 1, int? seed = null)
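Parameters
- trainRatio (double): Defaults to 0.85.
- negativeRatio (double): Defaults to 1.
- seed (int?): Defaults to null.
Returns
- LinkPredictionTask<T>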
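Link prediction needs negative examples, that is, node pairs with no edge between them. The sketch below shows one plausible reading of negativeRatio as the number of sampled non-edges per existing edge. This interpretation is an assumption, as are the names NegativeSamplingSketch and SampleNegativeEdges; the library's own sampling strategy may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical negative sampler for illustration; not part of the documented API.
public static class NegativeSamplingSketch
{
    public static List<(int Source, int Target)> SampleNegativeEdges(
        int numNodes,
        IReadOnlyCollection<(int Source, int Target)> positiveEdges,
        double negativeRatio = 1.0,
        int? seed = null)
    {
        // Track node pairs that are unavailable, ignoring direction and self-loops.
        var taken = new HashSet<(int, int)>();
        foreach (var (s, t) in positiveEdges)
            if (s != t) taken.Add((Math.Min(s, t), Math.Max(s, t)));

        int target = (int)Math.Round(positiveEdges.Count * negativeRatio);
        long available = (long)numNodes * (numNodes - 1) / 2 - taken.Count;
        if (target > available)
            throw new ArgumentException("Not enough non-edges to sample from.");

        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        var negatives = new List<(int Source, int Target)>(target);

        // Rejection sampling: draw random pairs, keep those that are not edges
        // and have not been drawn before.
        while (negatives.Count < target)
        {
            int u = rng.Next(numNodes), v = rng.Next(numNodes);
            if (u == v) continue;
            if (taken.Add((Math.Min(u, v), Math.Max(u, v))))
                negatives.Add((u, v));
        }
        return negatives;
    }

    public static void Main()
    {
        var positives = new[] { (0, 1), (1, 2) };
        var negatives = SampleNegativeEdges(5, positives, negativeRatio: 1.0, seed: 42);
        Console.WriteLine($"Sampled {negatives.Count} negative edges");
    }
}
```

Rejection sampling is cheap on sparse graphs such as citation networks, where almost every random pair is a non-edge.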
LoadDataCoreAsync(CancellationToken)
Core data loading implementation to be provided by derived classes.
protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)
Parameters
- cancellationToken (CancellationToken): Cancellation token for the async operation.
Returns
- Task
A task that completes when loading is finished.
Remarks
Derived classes must implement this to perform actual data loading:
- Load from files, databases, or remote sources
- Parse and validate data format
- Store in appropriate internal structures
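The three steps above, plus the matching unload step, can be sketched as a standalone class. This is a minimal illustration under stated assumptions: ToyTsvLoader is a hypothetical class that does not derive from GraphDataLoaderBase<T>, and the tab-separated input with at least two columns is invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical loader for illustration; it does not derive from the library's base class.
public sealed class ToyTsvLoader
{
    private List<string[]> _rows = new List<string[]>();

    public int RowCount => _rows.Count;

    public async Task LoadAsync(string path, CancellationToken cancellationToken = default)
    {
        // 1. Load from the source, passing the token so the read can be cancelled.
        string[] lines = await File.ReadAllLinesAsync(path, cancellationToken);

        // 2. Parse and validate each line.
        var rows = new List<string[]>(lines.Length);
        foreach (string line in lines)
        {
            cancellationToken.ThrowIfCancellationRequested();
            if (string.IsNullOrWhiteSpace(line)) continue;

            string[] parts = line.Split('\t');
            if (parts.Length < 2)
                throw new InvalidDataException($"Expected at least 2 columns, got {parts.Length}.");
            rows.Add(parts);
        }

        // 3. Store only after the whole file parsed successfully.
        _rows = rows;
    }

    // Dropping the reference lets the garbage collector reclaim the loaded data.
    public void Unload() => _rows = new List<string[]>();

    public static async Task Main()
    {
        string path = Path.GetTempFileName();
        await File.WriteAllLinesAsync(path, new[] { "a\tb", "c\td" });
        var loader = new ToyTsvLoader();
        await loader.LoadAsync(path);
        Console.WriteLine($"Rows loaded: {loader.RowCount}");
        File.Delete(path);
    }
}
```

Assigning the parsed rows only at the end keeps the loader's state unchanged if parsing fails or the operation is cancelled partway through.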
UnloadDataCore()
Core data unloading implementation to be provided by derived classes.
protected override void UnloadDataCore()
Remarks
Derived classes should implement this to release resources:
- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data