Class OGBDatasetLoader<T>
Loads datasets from the Open Graph Benchmark (OGB) for standardized evaluation.
public class OGBDatasetLoader<T> : GraphDataLoaderBase<T>, IGraphDataLoader<T>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<GraphData<T>>
Type Parameters
T: The numeric type used for calculations, typically float or double.
Inheritance
GraphDataLoaderBase<T>
OGBDatasetLoader<T>
Implements
IGraphDataLoader<T>
IDataLoader<T>
IResettable
ICountable
IBatchIterable<GraphData<T>>
Remarks
The Open Graph Benchmark (OGB) is a collection of realistic, large-scale graph datasets with standardized evaluation protocols for graph machine learning research.
For Beginners: OGB provides standard benchmarks for fair comparison.
What is OGB?
- Collection of real-world graph datasets
- Standardized train/val/test splits
- Automated evaluation metrics
- Enables fair comparison between different GNN methods
Why OGB matters:
- Reproducibility: Everyone uses the same data splits
- Realism: Real-world graphs, not toy datasets
- Scale: Large graphs that test scalability
- Diversity: Multiple domains and tasks
OGB Dataset Categories:
1. Node Property Prediction:
- ogbn-arxiv: Citation network (169K papers)
- ogbn-products: Amazon product co-purchasing network (2.4M products)
- ogbn-proteins: Protein association network (132K proteins)
2. Link Property Prediction:
- ogbl-collab: Author collaboration network
- ogbl-citation2: Citation network
- ogbl-ddi: Drug-drug interaction network
3. Graph Property Prediction:
- ogbg-molhiv: Molecular graphs for HIV activity prediction (41K molecules)
- ogbg-molpcba: Molecular graphs for biological assays (437K molecules)
- ogbg-ppa: Protein association graphs
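The dataset names above follow a prefix convention: ogbn- for node tasks, ogbl- for link tasks, and ogbg- for graph tasks. A minimal sketch of how calling code might infer the category from a name follows. TaskKindFromName is a hypothetical helper written for illustration and is not part of OGBDatasetLoader<T>.

```csharp
using System;

// Hypothetical helper (not part of OGBDatasetLoader<T>): infers the OGB task
// category from the dataset-name prefix used in the list above.
static string TaskKindFromName(string datasetName)
{
    if (datasetName.StartsWith("ogbn-", StringComparison.Ordinal)) return "node";
    if (datasetName.StartsWith("ogbl-", StringComparison.Ordinal)) return "link";
    if (datasetName.StartsWith("ogbg-", StringComparison.Ordinal)) return "graph";
    throw new ArgumentException($"Unrecognized OGB dataset name: {datasetName}", nameof(datasetName));
}

Console.WriteLine(TaskKindFromName("ogbn-arxiv"));   // node
Console.WriteLine(TaskKindFromName("ogbg-molhiv"));  // graph
```

A check like this could be used to catch a mismatch between datasetName and taskType before constructing the loader.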
Constructors
OGBDatasetLoader(string, OGBTask, int, string?, bool)
Initializes a new instance of the OGBDatasetLoader<T> class.
public OGBDatasetLoader(string datasetName, OGBDatasetLoader<T>.OGBTask taskType, int batchSize = 32, string? dataPath = null, bool autoDownload = true)
Parameters
datasetName (string): OGB dataset name (e.g., "ogbn-arxiv", "ogbg-molhiv").
taskType (OGBDatasetLoader<T>.OGBTask): Type of OGB task.
batchSize (int): Batch size for loading graphs (graph-level tasks only).
dataPath (string?): Path used to download and cache datasets (optional).
autoDownload (bool): Whether to automatically download the dataset if it is not found locally.
Remarks
Common OGB datasets:
- Node: ogbn-arxiv, ogbn-products, ogbn-proteins, ogbn-papers100M
- Link: ogbl-collab, ogbl-ddi, ogbl-citation2, ogbl-ppa
- Graph: ogbg-molhiv, ogbg-molpcba, ogbg-ppa, ogbg-code2
For Beginners: Using OGB datasets:
// Load the molecular HIV dataset
var loader = new OGBDatasetLoader<double>(
    "ogbg-molhiv",
    OGBDatasetLoader<double>.OGBTask.GraphPrediction,
    batchSize: 32,
    autoDownload: true);

// Load the data
await loader.LoadAsync();

// Get batches of graphs
while (loader.HasNext)
{
    var batch = loader.GetNextBatch();
    // Train on batch
}

// Or create a task directly
var task = loader.CreateGraphClassificationTask();
Properties
Description
Gets a description of the dataset and its intended use.
public override string Description { get; }
Property Value
string
Name
Gets the human-readable name of this data loader.
public override string Name { get; }
Property Value
string
Remarks
Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"
NumClasses
Gets the number of classes for classification tasks.
public override int NumClasses { get; }
Property Value
int
Methods
CreateGraphClassificationTask(double, double, int?)
Creates a graph classification task for datasets with multiple graphs.
public override GraphClassificationTask<T> CreateGraphClassificationTask(double trainRatio = 0.8, double valRatio = 0.1, int? seed = null)
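Parameters
trainRatio (double): Defaults to 0.8.
valRatio (double): Defaults to 0.1.
seed (int?): Defaults to null.
Returns
GraphClassificationTask<T>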
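To illustrate how trainRatio, valRatio, and seed could interact, here is a minimal sketch of a seeded ratio split over graph indices. It is an assumption about the general technique, not the loader's confirmed implementation; this page does not state whether the loader applies these ratios or OGB's standardized splits. SplitIndices is a hypothetical helper.

```csharp
using System;
using System.Linq;

// Illustrative only: shuffles graph indices with an optional seed and cuts them
// into train/validation/test by ratio. The loader's real split logic may differ.
static (int[] Train, int[] Val, int[] Test) SplitIndices(
    int count, double trainRatio = 0.8, double valRatio = 0.1, int? seed = null)
{
    if (trainRatio < 0 || valRatio < 0 || trainRatio + valRatio > 1)
        throw new ArgumentException("Ratios must be non-negative and sum to at most 1.");

    var rng = seed.HasValue ? new Random(seed.Value) : new Random();
    int[] indices = Enumerable.Range(0, count).ToArray();

    // Fisher-Yates shuffle.
    for (int i = indices.Length - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);
        (indices[i], indices[j]) = (indices[j], indices[i]);
    }

    int trainCount = (int)(count * trainRatio);
    int valCount = (int)(count * valRatio);

    // Whatever remains after train and validation becomes the test set.
    return (
        indices[..trainCount],
        indices[trainCount..(trainCount + valCount)],
        indices[(trainCount + valCount)..]);
}

var (train, val, test) = SplitIndices(1000, seed: 42);
Console.WriteLine($"{train.Length} / {val.Length} / {test.Length}");
```

Passing the same seed to the same runtime reproduces the same partition, which is the usual reason for exposing a seed parameter.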
CreateLinkPredictionTask(double, double, int?)
Creates a link prediction task for predicting missing edges.
public override LinkPredictionTask<T> CreateLinkPredictionTask(double trainRatio = 0.85, double negativeRatio = 1, int? seed = null)
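Parameters
trainRatio (double): Defaults to 0.85.
negativeRatio (double): Defaults to 1.
seed (int?): Defaults to null.
Returns
LinkPredictionTask<T>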
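Link prediction is commonly trained on existing edges (positives) together with sampled non-edges (negatives). The sketch below assumes negativeRatio means "negatives per positive edge"; that reading is an assumption, since this page does not define the parameter. SampleNegativeEdges is a hypothetical helper, and the edge set is assumed to store each undirected edge once as (smaller, larger).

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: samples node pairs that are NOT in the edge set, producing
// negativeRatio negatives per positive edge. Assumes an undirected graph with
// no self-loops, with edges stored as (smaller, larger).
static List<(int, int)> SampleNegativeEdges(
    HashSet<(int, int)> edges, int numNodes, double negativeRatio = 1.0, int? seed = null)
{
    int target = (int)(edges.Count * negativeRatio);
    long availableNonEdges = (long)numNodes * (numNodes - 1) / 2 - edges.Count;
    if (target > availableNonEdges)
        throw new ArgumentException("Not enough non-edges to sample the requested negatives.");

    var rng = seed.HasValue ? new Random(seed.Value) : new Random();
    var negatives = new HashSet<(int, int)>();

    while (negatives.Count < target)
    {
        int a = rng.Next(numNodes), b = rng.Next(numNodes);
        if (a == b) continue;                 // skip self-loops
        var pair = a < b ? (a, b) : (b, a);   // canonical order for undirected edges
        if (!edges.Contains(pair)) negatives.Add(pair);
    }
    return new List<(int, int)>(negatives);
}

var knownEdges = new HashSet<(int, int)> { (0, 1), (1, 2), (2, 3) };
var sampled = SampleNegativeEdges(knownEdges, numNodes: 5, negativeRatio: 1.0, seed: 7);
Console.WriteLine(sampled.Count);  // 3
```

Rejection sampling like this is simple but slows down on dense graphs, where most random pairs are already edges.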
CreateNodeClassificationTask(double, double, int?)
Creates a node classification task with train/val/test split.
public override NodeClassificationTask<T> CreateNodeClassificationTask(double trainRatio = 0.1, double valRatio = 0.1, int? seed = null)
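Parameters
trainRatio (double): Defaults to 0.1.
valRatio (double): Defaults to 0.1.
seed (int?): Defaults to null.
Returns
NodeClassificationTask<T>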
LoadDataCoreAsync(CancellationToken)
Core data loading implementation to be provided by derived classes.
protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)
Parameters
cancellationToken (CancellationToken): Cancellation token for the async operation.
Returns
Task: A task that completes when loading is finished.
Remarks
Derived classes must implement this to perform actual data loading:
- Load from files, databases, or remote sources
- Parse and validate data format
- Store in appropriate internal structures
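The three responsibilities above (load, parse and validate, store) can be sketched with standard .NET file APIs. The example below is a generic pattern, not the body of OGBDatasetLoader<T>.LoadDataCoreAsync. LoadEdgeListAsync and the "src,dst" file layout are assumptions made for illustration and do not reflect OGB's actual file format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only: reads a "src,dst" edge list, validates each row, and
// stores the result in memory. The file layout is an assumption, not OGB's.
static async Task<List<(int, int)>> LoadEdgeListAsync(string path, CancellationToken cancellationToken)
{
    // Load from file; the token lets callers abort a long read.
    string[] lines = await File.ReadAllLinesAsync(path, cancellationToken);

    var edges = new List<(int, int)>(lines.Length);
    foreach (string line in lines)
    {
        cancellationToken.ThrowIfCancellationRequested();
        if (string.IsNullOrWhiteSpace(line)) continue;

        // Parse and validate the row format.
        string[] parts = line.Split(',');
        if (parts.Length != 2 ||
            !int.TryParse(parts[0], out int src) ||
            !int.TryParse(parts[1], out int dst))
        {
            throw new FormatException($"Malformed edge row: '{line}'");
        }

        // Store in an in-memory structure.
        edges.Add((src, dst));
    }
    return edges;
}

string tempPath = Path.GetTempFileName();
await File.WriteAllLinesAsync(tempPath, new[] { "0,1", "1,2" });
var loaded = await LoadEdgeListAsync(tempPath, CancellationToken.None);
Console.WriteLine(loaded.Count);  // 2
File.Delete(tempPath);
```

Checking the token inside the parse loop keeps cancellation responsive even after the file read has completed.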
UnloadDataCore()
Core data unloading implementation to be provided by derived classes.
protected override void UnloadDataCore()
Remarks
Derived classes should implement this to release resources:
- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data