Class OGBDatasetLoader<T>

Namespace
AiDotNet.Data.Graph
Assembly
AiDotNet.dll

Loads datasets from the Open Graph Benchmark (OGB) for standardized evaluation.

public class OGBDatasetLoader<T> : GraphDataLoaderBase<T>, IGraphDataLoader<T>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<GraphData<T>>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
OGBDatasetLoader<T>

Remarks

The Open Graph Benchmark (OGB) is a collection of realistic, large-scale graph datasets with standardized evaluation protocols for graph machine learning research.

For Beginners: OGB provides standard benchmarks for fair comparison.

What is OGB?

  • Collection of real-world graph datasets
  • Standardized train/val/test splits
  • Automated evaluation metrics
  • Enables fair comparison between different GNN methods

Why OGB matters:

  • Reproducibility: Everyone uses the same data splits
  • Realism: Real-world graphs, not toy datasets
  • Scale: Large graphs that test scalability
  • Diversity: Multiple domains and tasks

OGB Dataset Categories:

1. Node Property Prediction:

  • ogbn-arxiv: Citation network (169K papers)
  • ogbn-products: Amazon product co-purchasing network (2.4M products)
  • ogbn-proteins: Protein association network (132K proteins)

2. Link Property Prediction:

  • ogbl-collab: Author collaboration network
  • ogbl-citation2: Citation network
  • ogbl-ddi: Drug-drug interaction network

3. Graph Property Prediction:

  • ogbg-molhiv: Molecular graphs for HIV activity prediction (41K molecules)
  • ogbg-molpcba: Molecular graphs for biological assays (437K molecules)
  • ogbg-ppa: Protein association graphs

Constructors

OGBDatasetLoader(string, OGBTask, int, string?, bool)

Initializes a new instance of the OGBDatasetLoader<T> class.

public OGBDatasetLoader(string datasetName, OGBDatasetLoader<T>.OGBTask taskType, int batchSize = 32, string? dataPath = null, bool autoDownload = true)

Parameters

datasetName string

OGB dataset name (e.g., "ogbn-arxiv", "ogbg-molhiv").

taskType OGBDatasetLoader<T>.OGBTask

Type of OGB task.

batchSize int

Batch size for loading graphs (graph-level tasks only).

dataPath string

Directory where datasets are downloaded and cached (optional).

autoDownload bool

Whether to automatically download the dataset if not found locally.

Remarks

Common OGB datasets:

  • Node: ogbn-arxiv, ogbn-products, ogbn-proteins, ogbn-papers100M
  • Link: ogbl-collab, ogbl-ddi, ogbl-citation2, ogbl-ppa
  • Graph: ogbg-molhiv, ogbg-molpcba, ogbg-ppa, ogbg-code2

For Beginners: Using OGB datasets:

// Load molecular HIV dataset
var loader = new OGBDatasetLoader<double>(
    "ogbg-molhiv",
    OGBDatasetLoader<double>.OGBTask.GraphPrediction,
    batchSize: 32,
    autoDownload: true);

// Load the data
await loader.LoadAsync();

// Get batches of graphs
while (loader.HasNext)
{
    var batch = loader.GetNextBatch();
    // Train on batch
}

// Or create a task directly
var task = loader.CreateGraphClassificationTask();

Properties

Description

Gets a description of the dataset and its intended use.

public override string Description { get; }

Property Value

string

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

NumClasses

Gets the number of classes for classification tasks.

public override int NumClasses { get; }

Property Value

int

Methods

CreateGraphClassificationTask(double, double, int?)

Creates a graph classification task for datasets with multiple graphs.

public override GraphClassificationTask<T> CreateGraphClassificationTask(double trainRatio = 0.8, double valRatio = 0.1, int? seed = null)

Parameters

trainRatio double
valRatio double
seed int?

Returns

GraphClassificationTask<T>
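
The trainRatio, valRatio, and seed parameters suggest a seeded, ratio-based split. The sketch below shows one common way such a split can be computed; it is not taken from the AiDotNet source. SplitSketch and Split are hypothetical names, and the assumptions are that counts are truncated toward zero and that whatever remains after the train and validation sets becomes the test set. OGB also ships its own predefined splits, and this page does not say how the ratios interact with them.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class SplitSketch
{
    // Shuffles the indices 0..count-1 with a Fisher-Yates pass, then cuts them
    // into train/validation/test by ratio. The remainder goes to the test set.
    public static (int[] Train, int[] Val, int[] Test) Split(
        int count, double trainRatio = 0.8, double valRatio = 0.1, int? seed = null)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (trainRatio < 0 || valRatio < 0 || trainRatio + valRatio > 1.0)
            throw new ArgumentException("Ratios must be non-negative and sum to at most 1.");

        // A fixed seed makes the shuffle, and therefore the split, repeatable.
        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        int[] indices = Enumerable.Range(0, count).ToArray();
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        // Assumption: sizes are truncated toward zero.
        int trainCount = (int)(count * trainRatio);
        int valCount = (int)(count * valRatio);

        return (indices.Take(trainCount).ToArray(),
                indices.Skip(trainCount).Take(valCount).ToArray(),
                indices.Skip(trainCount + valCount).ToArray());
    }
}
```

Under these assumptions, SplitSketch.Split(1000, seed: 42) returns 800 training, 100 validation, and 100 test indices, and repeating the call with the same seed returns the same assignment.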

CreateLinkPredictionTask(double, double, int?)

Creates a link prediction task for predicting missing edges.

public override LinkPredictionTask<T> CreateLinkPredictionTask(double trainRatio = 0.85, double negativeRatio = 1, int? seed = null)

Parameters

trainRatio double
negativeRatio double
seed int?

Returns

LinkPredictionTask<T>
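
The negativeRatio parameter is not described on this page. In link prediction it conventionally means the number of sampled non-edges (negatives) per existing edge (positive), so the default of 1 would give a balanced set. The sketch below illustrates that convention with simple rejection sampling. NegativeSamplingSketch and SampleNegatives are hypothetical names, edges are treated as directed pairs, and every positive edge is assumed to reference nodes in the range 0..numNodes-1. The library's actual sampling strategy may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class NegativeSamplingSketch
{
    // Draws round(positiveEdges.Count * negativeRatio) distinct node pairs that
    // are neither self-loops nor existing edges.
    public static List<(int Source, int Target)> SampleNegatives(
        IReadOnlyCollection<(int Source, int Target)> positiveEdges,
        int numNodes, double negativeRatio = 1.0, int? seed = null)
    {
        if (negativeRatio < 0)
            throw new ArgumentOutOfRangeException(nameof(negativeRatio));

        var existing = new HashSet<(int, int)>(positiveEdges);
        int wanted = (int)Math.Round(positiveEdges.Count * negativeRatio);

        // Number of candidate pairs that are neither self-loops nor positives.
        int nonLoopPositives = 0;
        foreach (var (s, t) in existing)
            if (s != t) nonLoopPositives++;
        long capacity = (long)numNodes * (numNodes - 1) - nonLoopPositives;
        if (wanted > capacity)
            throw new ArgumentException("Not enough non-edges to sample from.");

        var rng = seed.HasValue ? new Random(seed.Value) : new Random();
        var negatives = new HashSet<(int, int)>();
        while (negatives.Count < wanted)
        {
            int source = rng.Next(numNodes);
            int target = rng.Next(numNodes);
            if (source == target || existing.Contains((source, target)))
                continue; // reject self-loops and real edges
            negatives.Add((source, target)); // the set ignores duplicates
        }
        return new List<(int Source, int Target)>(negatives);
    }
}
```

Rejection sampling is cheap on sparse graphs, where most random pairs are non-edges. On dense graphs it slows down as valid candidates become rare, and enumerating the non-edges explicitly may be the better choice.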

CreateNodeClassificationTask(double, double, int?)

Creates a node classification task with train/val/test split.

public override NodeClassificationTask<T> CreateNodeClassificationTask(double trainRatio = 0.1, double valRatio = 0.1, int? seed = null)

Parameters

trainRatio double
valRatio double
seed int?

Returns

NodeClassificationTask<T>

LoadDataCoreAsync(CancellationToken)

Core data loading implementation to be provided by derived classes.

protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)

Parameters

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task

A task that completes when loading is finished.

Remarks

Derived classes must implement this to perform actual data loading:

  • Load from files, databases, or remote sources
  • Parse and validate data format
  • Store in appropriate internal structures
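
The load, parse, validate, and store steps described in these remarks can be sketched as a small cancellation-aware reader. This is not the loader's actual implementation. EdgeListLoadingSketch and ReadEdgesAsync are hypothetical names, and the "source,target" comma-separated line format is an assumption chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of AiDotNet.
public static class EdgeListLoadingSketch
{
    // Reads "source,target" lines, validates them, and returns the edges.
    // The token is checked once per line so a long load can be abandoned early.
    public static async Task<List<(int Source, int Target)>> ReadEdgesAsync(
        TextReader reader, int numNodes, CancellationToken cancellationToken)
    {
        var edges = new List<(int Source, int Target)>();
        int lineNumber = 0;
        string? line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            cancellationToken.ThrowIfCancellationRequested();
            lineNumber++;
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            // Parse: exactly two integers separated by a comma.
            string[] parts = line.Split(',');
            if (parts.Length != 2
                || !int.TryParse(parts[0], NumberStyles.Integer,
                                 CultureInfo.InvariantCulture, out int source)
                || !int.TryParse(parts[1], NumberStyles.Integer,
                                 CultureInfo.InvariantCulture, out int target))
            {
                throw new FormatException($"Line {lineNumber}: expected 'source,target'.");
            }

            // Validate: both endpoints must be known node indices.
            if (source < 0 || source >= numNodes || target < 0 || target >= numNodes)
                throw new FormatException($"Line {lineNumber}: node index out of range.");

            // Store in the in-memory structure.
            edges.Add((source, target));
        }
        return edges;
    }
}
```

A caller could pass a StringReader for tests or a StreamReader over a cached file. Checking the token inside the loop, rather than only before it, is what lets a cancelled load stop partway through a large file.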

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:

  • Clear internal data structures
  • Release file handles or connections
  • Allow garbage collection of loaded data