Class LeafRedditFederatedDatasetLoader
- Namespace
- AiDotNet.FederatedLearning.Benchmarks.Leaf
- Assembly
- AiDotNet.dll
Loads the LEAF Reddit benchmark JSON files into per-client token-sequence datasets.
public sealed class LeafRedditFederatedDatasetLoader
- Inheritance
- object
LeafRedditFederatedDatasetLoader
- Inherited Members
Remarks
In the LEAF Reddit preprocessing pipeline, each sample consists of a list of token chunks (x) and a metadata
object (y) that holds target_tokens (the shifted next-token targets) and, optionally, count_tokens.
This loader converts each sample into one fixed-length token sequence paired with one next-token label.
In v1, that label is the last non-padding token in target_tokens.
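The conversion described above can be pictured with a small standalone sketch. It is not the loader's actual implementation: the class name `RedditSampleSketch`, the `<PAD>` symbol, and the choice to truncate from the end and pad on the right are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RedditSampleSketch
{
    // Hypothetical padding token; the real pipeline's pad symbol may differ.
    public const string Pad = "<PAD>";

    // Flattens the token chunks into one sequence, then truncates or
    // right-pads it so the result always has exactly `length` tokens.
    public static string[] ToFixedLength(IEnumerable<string[]> chunks, int length)
    {
        var flat = chunks.SelectMany(c => c).Take(length).ToList();
        while (flat.Count < length)
        {
            flat.Add(Pad);
        }
        return flat.ToArray();
    }

    // Returns the last target token that is not padding,
    // or null when every target token is padding.
    public static string? LastNonPadTarget(IEnumerable<string[]> targetChunks)
    {
        return targetChunks.SelectMany(c => c).LastOrDefault(t => t != Pad);
    }
}
```

Under these assumptions, the chunks `["hello", "world"]` and `["<EOS>"]` with a length of 5 become `hello, world, <EOS>, <PAD>, <PAD>`, and target chunks ending in `<EOS>, <PAD>` yield the label `<EOS>`.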
For Beginners: The Reddit dataset is very large. This loader can load only a subset of users and sample a limited number of examples per user, so benchmark checks stay small enough to run in CI.
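To make the idea of user and per-user sampling concrete, here is a minimal sketch of capped, seeded subsampling over an in-memory dictionary. It does not use `LeafFederatedDatasetLoadOptions`, whose properties are not listed on this page; `SubsetSketch`, `Subsample`, and all parameter names are hypothetical, and the loader itself may select users and samples differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubsetSketch
{
    // Keeps at most maxUsers users and at most maxSamplesPerUser samples
    // per kept user. A fixed seed makes the selection repeatable between runs.
    public static Dictionary<string, List<T>> Subsample<T>(
        IReadOnlyDictionary<string, List<T>> perUser,
        int maxUsers,
        int maxSamplesPerUser,
        int seed)
    {
        var rng = new Random(seed);
        var result = new Dictionary<string, List<T>>();

        // Sort user ids first so the shuffle does not depend on
        // the dictionary's iteration order.
        var users = perUser.Keys
            .OrderBy(u => u, StringComparer.Ordinal)
            .ToList();

        foreach (var user in users.OrderBy(_ => rng.Next()).Take(maxUsers))
        {
            result[user] = perUser[user]
                .OrderBy(_ => rng.Next())
                .Take(maxSamplesPerUser)
                .ToList();
        }

        return result;
    }
}
```

Capping both dimensions bounds the total work at `maxUsers * maxSamplesPerUser` samples, which is what keeps a CI run predictable regardless of the size of the source file.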
Methods
LoadDatasetFromFiles(string, string?, LeafFederatedDatasetLoadOptions?)
Loads a LEAF Reddit train dataset and optional test dataset from files.
public LeafFederatedDataset<string[][], string[]> LoadDatasetFromFiles(string trainFilePath, string? testFilePath = null, LeafFederatedDatasetLoadOptions? options = null)
Parameters
- trainFilePath: string
- testFilePath: string
- options: LeafFederatedDatasetLoadOptions
Returns
- LeafFederatedDataset<string[][], string[]>
LoadSplitFromFile(string, LeafFederatedDatasetLoadOptions?)
Loads a LEAF Reddit split (train/test) from a JSON file.
public LeafFederatedSplit<string[][], string[]> LoadSplitFromFile(string filePath, LeafFederatedDatasetLoadOptions? options = null)
Parameters
- filePath: string
- options: LeafFederatedDatasetLoadOptions
Returns
- LeafFederatedSplit<string[][], string[]>
LoadSplitFromJson(string, LeafFederatedDatasetLoadOptions?)
Loads a LEAF Reddit split (train/test) from a JSON string.
public LeafFederatedSplit<string[][], string[]> LoadSplitFromJson(string json, LeafFederatedDatasetLoadOptions? options = null)
Parameters
- json: string
- options: LeafFederatedDatasetLoadOptions
Returns
- LeafFederatedSplit<string[][], string[]>
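For orientation, the sketch below shows a hand-written JSON string in the general LEAF layout (`users`, `num_samples`, `user_data`) and reads it with `System.Text.Json`. The payload is illustrative only: the user id, the tokens, and the exact shape of `count_tokens` are assumptions, and `LeafJsonSketch` is a hypothetical helper, not part of AiDotNet.

```csharp
using System;
using System.Text.Json;

public static class LeafJsonSketch
{
    // Illustrative payload: one user with one sample. Each sample in "x" is a
    // list of token chunks; each entry in "y" is a metadata object.
    public const string Sample = @"{
      ""users"": [""user_0""],
      ""num_samples"": [1],
      ""user_data"": {
        ""user_0"": {
          ""x"": [[[""hello"", ""world""]]],
          ""y"": [{ ""target_tokens"": [[""world"", ""<EOS>""]], ""count_tokens"": [2] }]
        }
      }
    }";

    // Counts the samples stored for one user by reading the length of its "x" array.
    public static int CountSamples(string json, string userId)
    {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement
            .GetProperty("user_data")
            .GetProperty(userId)
            .GetProperty("x")
            .GetArrayLength();
    }
}
```

A string of this general shape is presumably what `LoadSplitFromJson` takes as its `json` argument, which makes the method convenient for unit tests that should not touch the file system.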