
Class CitationNetworkLoader<T>

Namespace: AiDotNet.Data.Graph
Assembly: AiDotNet.dll

Loads citation network datasets (Cora, CiteSeer, PubMed) for node classification.

public class CitationNetworkLoader<T> : GraphDataLoaderBase<T>, IGraphDataLoader<T>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<GraphData<T>>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
GraphDataLoaderBase<T> → CitationNetworkLoader<T>
Implements
IGraphDataLoader<T>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<GraphData<T>>

Remarks

Citation networks are classic benchmarks for graph neural networks. Each dataset represents academic papers as nodes and citations as edges, and the task is to classify papers into research topics.

For Beginners: Citation networks are graphs of research papers.

Structure:

  • Nodes: Research papers
  • Edges: Citations (Paper A cites Paper B)
  • Node Features: Bag-of-words representation of paper abstracts
  • Labels: Research topic/category
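To make the "bag-of-words" node features concrete: in Cora and CiteSeer each paper is a 0/1 vector over a fixed vocabulary, where entry i says whether word i occurs. The sketch below builds such a vector from raw text. `BagOfWordsSketch` and `Encode` are hypothetical names, and the simple punctuation-based tokenizer is an assumption; the real datasets ship precomputed vectors.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of AiDotNet): turns raw text into a 0/1
// bag-of-words vector over a fixed vocabulary, the same shape as Cora/CiteSeer features.
public static class BagOfWordsSketch
{
    // Naive tokenizer: split on whitespace and common punctuation.
    private static readonly char[] Separators = { ' ', ',', '.', ';', ':', '\t', '\n' };

    // Entry i is 1.0 when vocabulary[i] appears in the text, otherwise 0.0.
    // Vocabulary entries are assumed to be lowercase.
    public static double[] Encode(IReadOnlyList<string> vocabulary, string text)
    {
        var words = new HashSet<string>(
            text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries));

        var features = new double[vocabulary.Count];
        for (int i = 0; i < vocabulary.Count; i++)
        {
            features[i] = words.Contains(vocabulary[i]) ? 1.0 : 0.0;
        }
        return features;
    }
}
```

With a 1,433-word vocabulary this yields Cora-shaped feature rows; a loader only has to read the stored vectors rather than rebuild them.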

Datasets:

Cora:

  • 2,708 papers
  • 5,429 citations
  • 1,433 features (unique words)
  • 7 classes (topics): Case_Based, Genetic_Algorithms, Neural_Networks, Probabilistic_Methods, Reinforcement_Learning, Rule_Learning, Theory
  • Task: Classify papers by topic

CiteSeer:

  • 3,312 papers
  • 4,732 citations
  • 3,703 features
  • 6 classes: Agents, AI, DB, IR, ML, HCI

PubMed:

  • 19,717 papers on diabetes (from the PubMed database)
  • 44,338 citations
  • 500 features (TF-IDF weighted words rather than 0/1 word indicators)
  • 3 classes: Diabetes Mellitus Type 1, Type 2, Experimental
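The statistics above also say how sparse these graphs are. The sketch below copies the node and edge counts from the list and computes average degree (2·|E|/|V|) and density. Treating every citation as one undirected edge is an assumption (a common convention for GNN use), and `DatasetStats`/`CitationStats` are illustrative names. Records require C# 9 or later.

```csharp
using System;

// Figures copied from the dataset list above. Average degree and density treat
// every citation as one undirected edge (an assumed convention).
public sealed record DatasetStats(string Name, int Nodes, int Edges, int Features, int Classes)
{
    // Mean number of neighbors per paper: 2 * |E| / |V|.
    public double AverageDegree => 2.0 * Edges / Nodes;

    // Fraction of all possible paper pairs that are linked: 2 * |E| / (|V| * (|V| - 1)).
    public double Density => 2.0 * Edges / ((double)Nodes * (Nodes - 1));
}

public static class CitationStats
{
    public static readonly DatasetStats[] All =
    {
        new DatasetStats("Cora", 2708, 5429, 1433, 7),
        new DatasetStats("CiteSeer", 3312, 4732, 3703, 6),
        new DatasetStats("PubMed", 19717, 44338, 500, 3),
    };
}
```

Roughly 3 to 4.5 neighbors per paper and a density far below 1% are why graph loaders typically keep edges as lists or sparse matrices instead of dense adjacency matrices.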

Key Property: Homophily. Papers tend to cite papers on similar topics, which makes GNNs effective:

  • Because neighbors usually share a topic, aggregating their features gives a strong signal for a paper's own topic
  • The GNN learns to propagate topic information through the citation network
  • Even unlabeled papers can be classified based on the papers they cite
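Both ideas can be measured directly on an edge list. The sketch below computes the edge homophily ratio (fraction of edges whose two endpoints share a label) and one round of mean aggregation, the basic neighbor-averaging step that many GNN layers repeat with learned weights. It is an illustration under assumed inputs (integer node ids, an array of labels, edges treated as undirected), not AiDotNet's GNN code; `HomophilySketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HomophilySketch
{
    // Edge homophily: share of edges whose endpoints have the same label.
    // Values near 1 mean neighbors almost always share a topic.
    public static double EdgeHomophily(IReadOnlyList<(int Src, int Dst)> edges, int[] labels)
    {
        if (edges.Count == 0) return 0.0;
        int same = edges.Count(e => labels[e.Src] == labels[e.Dst]);
        return (double)same / edges.Count;
    }

    // One mean-aggregation step: each node's new features are the average of its
    // own features and its neighbors' features. Edges are treated as undirected.
    public static double[][] MeanAggregate(double[][] features, IReadOnlyList<(int Src, int Dst)> edges)
    {
        int n = features.Length, d = features[0].Length;
        var neighbors = new List<int>[n];
        for (int i = 0; i < n; i++) neighbors[i] = new List<int> { i }; // self-loop
        foreach (var (s, t) in edges)
        {
            neighbors[s].Add(t);
            neighbors[t].Add(s);
        }

        var result = new double[n][];
        for (int i = 0; i < n; i++)
        {
            result[i] = new double[d];
            foreach (int j in neighbors[i])
                for (int k = 0; k < d; k++) result[i][k] += features[j][k];
            for (int k = 0; k < d; k++) result[i][k] /= neighbors[i].Count;
        }
        return result;
    }
}
```

Stacking such steps lets label-relevant information from labeled papers spread to unlabeled neighbors, which is why homophilous graphs suit GNNs.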

Constructors

CitationNetworkLoader(CitationDataset, string?, bool)

Initializes a new instance of the CitationNetworkLoader<T> class.

public CitationNetworkLoader(CitationNetworkLoader<T>.CitationDataset dataset, string? dataPath = null, bool autoDownload = true)

Parameters

dataset CitationNetworkLoader<T>.CitationDataset

Which citation dataset to load.

dataPath string

Path to the dataset files. If null, the default cache directory is used.

autoDownload bool

Whether to automatically download the dataset if not found locally.

Remarks

The loader expects data files in the standard Planetoid format:

  • {dataset}.content: Tab-separated file; each line holds paper_id, the word features, and class_label
  • {dataset}.cites: Tab-separated file; each line holds cited_paper_id, citing_paper_id
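A minimal sketch of reading these two files follows. It is not the loader's own implementation: `CitationTextParser`, `ParseContent`, and `ParseCites` are hypothetical names, numeric feature columns are assumed, class names are mapped to integer ids in order of first appearance, and each `.cites` row is turned around into a citing-to-cited edge because the cited id comes first.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CitationTextParser
{
    // .content line: paper_id <TAB> feature_1 ... feature_N <TAB> class_label
    public static (Dictionary<string, int> Index, double[][] Features, int[] Labels, List<string> ClassNames)
        ParseContent(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, int>();
        var features = new List<double[]>();
        var labels = new List<int>();
        var classNames = new List<string>();

        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            var parts = line.Split('\t');
            index[parts[0]] = features.Count; // paper id -> node index

            // Everything between the id and the label is a feature value.
            features.Add(parts.Skip(1).Take(parts.Length - 2)
                .Select(s => double.Parse(s, CultureInfo.InvariantCulture))
                .ToArray());

            // Class names become integer ids in order of first appearance.
            string label = parts[parts.Length - 1];
            int classId = classNames.IndexOf(label);
            if (classId < 0)
            {
                classId = classNames.Count;
                classNames.Add(label);
            }
            labels.Add(classId);
        }
        return (index, features.ToArray(), labels.ToArray(), classNames);
    }

    // .cites line: cited_paper_id <TAB> citing_paper_id.
    // Returns directed edges (citing -> cited); rows naming an unknown paper are skipped.
    public static List<(int Citing, int Cited)> ParseCites(
        IEnumerable<string> lines, IReadOnlyDictionary<string, int> index)
    {
        var edges = new List<(int Citing, int Cited)>();
        foreach (var line in lines)
        {
            var parts = line.Split('\t', StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length < 2) continue;
            if (index.TryGetValue(parts[0], out int cited) &&
                index.TryGetValue(parts[1], out int citing))
            {
                edges.Add((citing, cited));
            }
        }
        return edges;
    }
}
```

Skipping unknown ids is a defensive choice so that an incomplete `.cites` file does not abort parsing; a stricter loader might report them instead.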

For Beginners: Using this loader:

using System;
using AiDotNet.Data.Graph;

// Load Cora dataset (auto-downloads if not present)
var loader = new CitationNetworkLoader<double>(
    CitationNetworkLoader<double>.CitationDataset.Cora,
    autoDownload: true);

// Load the data
await loader.LoadAsync();

// Get the graph
var graph = loader.GetNextBatch();

// Access data
Console.WriteLine($"Nodes: {loader.NumNodes}");
Console.WriteLine($"Edges: {loader.NumEdges}");
Console.WriteLine($"Features per node: {loader.NumNodeFeatures}");

// Create node classification task
var task = loader.CreateNodeClassificationTask();

Properties

Description

Gets a description of the dataset and its intended use.

public override string Description { get; }

Property Value

string

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

NumClasses

Gets the number of classes for classification tasks.

public override int NumClasses { get; }

Property Value

int

Methods

CreateGraphClassificationTask(double, double, int?)

Creates a graph classification task for datasets with multiple graphs.

public override GraphClassificationTask<T> CreateGraphClassificationTask(double trainRatio = 0.8, double valRatio = 0.1, int? seed = null)

Parameters

trainRatio double
valRatio double
seed int?

Returns

GraphClassificationTask<T>
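The parameter names suggest a ratio-based split: a `trainRatio` share of graphs for training, a `valRatio` share for validation, the remainder for testing, with `seed` making the shuffle reproducible. The sketch below shows that common pattern on item indices. Whether CitationNetworkLoader<T> splits this way is an assumption (each citation dataset is a single graph, and this page does not say how the method behaves for it); `RatioSplit` is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Hypothetical helper: shuffled three-way split of item indices driven by
// trainRatio, valRatio and an optional seed; the remainder goes to test.
public static class RatioSplit
{
    public static (int[] Train, int[] Val, int[] Test) Split(
        int count, double trainRatio = 0.8, double valRatio = 0.1, int? seed = null)
    {
        if (count < 0 || trainRatio < 0 || valRatio < 0 || trainRatio + valRatio > 1)
            throw new ArgumentException("Need count >= 0, non-negative ratios, and trainRatio + valRatio <= 1.");

        // A fixed seed gives the same split on every run.
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        int[] order = Enumerable.Range(0, count).ToArray();

        // Fisher-Yates shuffle.
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (order[i], order[j]) = (order[j], order[i]);
        }

        int nTrain = (int)(count * trainRatio);
        int nVal = (int)(count * valRatio);
        return (order.Take(nTrain).ToArray(),
                order.Skip(nTrain).Take(nVal).ToArray(),
                order.Skip(nTrain + nVal).ToArray());
    }
}
```

Truncating the counts with `(int)` sends any rounding leftovers to the test set, so the three parts always cover every item exactly once.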

CreateLinkPredictionTask(double, double, int?)

Creates a link prediction task for predicting missing edges.

public override LinkPredictionTask<T> CreateLinkPredictionTask(double trainRatio = 0.85, double negativeRatio = 1, int? seed = null)

Parameters

trainRatio double
negativeRatio double
seed int?

Returns

LinkPredictionTask<T>
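A typical link-prediction setup keeps a `trainRatio` share of edges for training, holds out the rest as positive test edges, and samples node pairs that are not linked as negatives, here `negativeRatio` negatives per held-out positive. The sketch below illustrates that reading of the parameters; it is an assumption about their meaning, not the library's implementation, and `LinkSplitSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinkSplitSketch
{
    public static (List<(int U, int V)> TrainPos, List<(int U, int V)> TestPos, List<(int U, int V)> TestNeg)
        Split(int numNodes, IReadOnlyList<(int U, int V)> edges,
              double trainRatio = 0.85, double negativeRatio = 1, int? seed = null)
    {
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();

        // Shuffle edges by a random key, then cut at the train ratio.
        var shuffled = edges.OrderBy(_ => rng.Next()).ToList();
        int nTrain = (int)(shuffled.Count * trainRatio);
        var trainPos = shuffled.Take(nTrain).ToList();
        var testPos = shuffled.Skip(nTrain).ToList();

        // Treat edges as undirected when checking whether a pair is already linked.
        var existing = new HashSet<(int, int)>();
        foreach (var (u, v) in edges)
        {
            existing.Add((u, v));
            existing.Add((v, u));
        }

        // Rejection sampling of non-edges; the attempt cap prevents an endless loop
        // on a dense graph, in which case fewer negatives are returned.
        int nNeg = (int)(testPos.Count * negativeRatio);
        var testNeg = new List<(int U, int V)>();
        var used = new HashSet<(int, int)>();
        for (int attempt = 0; testNeg.Count < nNeg && attempt < 100 * (nNeg + 1); attempt++)
        {
            int a = rng.Next(numNodes), b = rng.Next(numNodes);
            if (a == b || existing.Contains((a, b))) continue;
            if (!used.Add((Math.Min(a, b), Math.Max(a, b)))) continue; // no duplicate pairs
            testNeg.Add((a, b));
        }
        return (trainPos, testPos, testNeg);
    }
}
```

Rejection sampling is cheap here because citation graphs are very sparse, so a random pair is almost always a non-edge.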

LoadDataCoreAsync(CancellationToken)

Core data loading implementation to be provided by derived classes.

protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)

Parameters

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task

A task that completes when loading is finished.

Remarks

Derived classes must implement this to perform actual data loading:

  • Load from files, databases, or remote sources
  • Parse and validate the data format
  • Store the results in appropriate internal structures
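The load, validate, and store steps can be sketched as a standalone class that reads a `.cites`-style stream while honoring a CancellationToken. This is not a real GraphDataLoaderBase<T> override (that base class's other members are not shown here); `CitesLoaderSketch` is a hypothetical name, and reading from a TextReader rather than a file path is an assumption made so the example runs without disk access.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class CitesLoaderSketch
{
    // Internal storage filled by LoadAsync.
    public List<(string Cited, string Citing)> Citations { get; } = new List<(string Cited, string Citing)>();

    public async Task LoadAsync(TextReader reader, CancellationToken cancellationToken)
    {
        var parsed = new List<(string Cited, string Citing)>();
        int lineNumber = 0;
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            // Check for cancellation once per line.
            cancellationToken.ThrowIfCancellationRequested();
            lineNumber++;
            if (line.Length == 0) continue;

            // Validate: every row must hold exactly two tab-separated ids.
            var parts = line.Split('\t');
            if (parts.Length != 2)
                throw new InvalidDataException(
                    $"Line {lineNumber}: expected 2 tab-separated ids, found {parts.Length}.");
            parsed.Add((parts[0], parts[1]));
        }

        // Store only after the whole input parsed, so a bad file leaves old data intact.
        Citations.Clear();
        Citations.AddRange(parsed);
    }

    // Counterpart of unloading: drop the data so it can be garbage-collected.
    public void Unload() => Citations.Clear();
}
```

Parsing into a temporary list before swapping it in keeps the loader in a consistent state if validation fails or the operation is canceled partway through.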

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:

  • Clear internal data structures
  • Release file handles or connections
  • Allow garbage collection of loaded data