Class LeafFederatedDataLoader<T>
Data loader that reads LEAF benchmark JSON splits and exposes both aggregated (X, Y) data and per-client partitions.
public sealed class LeafFederatedDataLoader<T> : InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>, IFederatedClientDataLoader<T, Matrix<T>, Vector<T>>, IInputOutputDataLoader<T, Matrix<T>, Vector<T>>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<(Matrix<T> Features, Vector<T> Labels)>, IShuffleable
Type Parameters
- T
The numeric type used for calculations, typically float or double.
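- Inheritance
InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>
LeafFederatedDataLoader<T>
- Implements
IFederatedClientDataLoader<T, Matrix<T>, Vector<T>>
IInputOutputDataLoader<T, Matrix<T>, Vector<T>>
IDataLoader<T>
IResettable
ICountable
IBatchIterable<(Matrix<T> Features, Vector<T> Labels)>
IShuffleable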
Remarks
LEAF is a federated learning benchmark suite where each user corresponds to one client. This loader preserves that structure through ClientData while also providing aggregated Features/Labels for compatibility with the standard training facade.
For Beginners: Use this loader when you want to run federated learning with realistic per-user splits provided by LEAF datasets.
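The relationship between per-client partitions and the aggregated view can be sketched without the library. The snippet below is only an illustration: plain arrays stand in for Matrix<T> and Vector<T>, the `Aggregate` helper is hypothetical, and concatenating clients in ascending client-ID order is an assumption rather than documented behavior of this loader.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-client data: client ID -> (feature rows, labels).
// Plain arrays stand in for the library's Matrix<T> and Vector<T>.
var clientData = new Dictionary<int, (double[][] Features, double[] Labels)>
{
    [0] = (new[] { new[] { 0.1, 0.2 }, new[] { 0.3, 0.4 } }, new[] { 0.0, 1.0 }),
    [1] = (new[] { new[] { 0.5, 0.6 } }, new[] { 1.0 }),
};

// Aggregated view: concatenate every client's rows in ascending client-ID order
// (the ordering is an assumption made for this sketch).
static (double[][] Features, double[] Labels) Aggregate(
    IReadOnlyDictionary<int, (double[][] Features, double[] Labels)> clients)
{
    var ordered = clients.OrderBy(kv => kv.Key).Select(kv => kv.Value).ToList();
    return (ordered.SelectMany(c => c.Features).ToArray(),
            ordered.SelectMany(c => c.Labels).ToArray());
}

var (features, labels) = Aggregate(clientData);
Console.WriteLine(
    $"clients={clientData.Count}, samples={features.Length}, featureCount={features[0].Length}");
```

The per-client dictionary is what a federated simulation iterates over, while the concatenated arrays are what a conventional, non-federated training loop would consume.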
Constructors
LeafFederatedDataLoader(string, string?, LeafFederatedDatasetLoadOptions?)
Initializes a new instance of the LeafFederatedDataLoader<T> class from LEAF JSON files.
public LeafFederatedDataLoader(string trainFilePath, string? testFilePath = null, LeafFederatedDatasetLoadOptions? options = null)
Parameters
- trainFilePath string
Path to the LEAF train JSON file.
- testFilePath string
Optional path to the LEAF test JSON file.
- options LeafFederatedDatasetLoadOptions
Optional load options (subset, validation).
Properties
ClientData
Gets the per-client datasets used for federated learning simulation.
public IReadOnlyDictionary<int, FederatedClientDataset<Matrix<T>, Vector<T>>> ClientData { get; }
Property Value
- IReadOnlyDictionary<int, FederatedClientDataset<Matrix<T>, Vector<T>>>
Remarks
Keys are stable client IDs (typically 0..N-1). Values contain each client's local features and labels.
ClientIdToUserId
Gets the mapping from internal client IDs (0..N-1) to original LEAF user IDs.
public IReadOnlyDictionary<int, string> ClientIdToUserId { get; }
Property Value
- IReadOnlyDictionary<int, string>
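To make the ID mapping concrete, here is a minimal sketch that reads a tiny inline document in the LEAF JSON layout (top-level "users", "num_samples", and "user_data" keys) and assigns client IDs 0..N-1. It uses only System.Text.Json and does not call this class. Assigning IDs in the order of the "users" array is an assumption of the sketch, and the user IDs and values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Made-up sample in the LEAF layout; real files are much larger.
const string json = @"{
  ""users"": [""f0001"", ""f0002""],
  ""num_samples"": [2, 1],
  ""user_data"": {
    ""f0001"": { ""x"": [[0.1, 0.2], [0.3, 0.4]], ""y"": [0, 1] },
    ""f0002"": { ""x"": [[0.5, 0.6]], ""y"": [1] }
  }
}";

using var doc = JsonDocument.Parse(json);
var root = doc.RootElement;

var clientIdToUserId = new Dictionary<int, string>();
var sampleCounts = new Dictionary<int, int>();

// Assumption: client IDs follow the order of the "users" array.
int clientId = 0;
foreach (var user in root.GetProperty("users").EnumerateArray())
{
    string userId = user.GetString()
        ?? throw new InvalidOperationException("User ID must be a string.");
    var userData = root.GetProperty("user_data").GetProperty(userId);

    clientIdToUserId[clientId] = userId;
    sampleCounts[clientId] = userData.GetProperty("y").GetArrayLength();
    clientId++;
}

foreach (var (id, uid) in clientIdToUserId)
    Console.WriteLine($"client {id} -> user {uid} ({sampleCounts[id]} samples)");
```

Keeping the integer-to-string mapping separate lets simulation code work with compact integer keys while still being able to report results against the original LEAF user IDs.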
Description
Gets a description of the dataset and its intended use.
public override string Description { get; }
Property Value
- string
FeatureCount
Gets the number of features per sample.
public override int FeatureCount { get; }
Property Value
- int
Name
Gets the human-readable name of this data loader.
public override string Name { get; }
Property Value
- string
Remarks
Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"
OutputDimension
Gets the number of output dimensions (1 for regression/binary classification, N for multi-class with N classes).
public override int OutputDimension { get; }
Property Value
- int
TestSplit
Gets the optional test split, if one was loaded (one dataset per LEAF user).
public LeafFederatedSplit<Matrix<T>, Vector<T>>? TestSplit { get; }
Property Value
- LeafFederatedSplit<Matrix<T>, Vector<T>>
TotalCount
Gets the total number of samples in the dataset.
public override int TotalCount { get; }
Property Value
- int
TrainSplit
Gets the loaded training split (one dataset per LEAF user).
public LeafFederatedSplit<Matrix<T>, Vector<T>> TrainSplit { get; }
Property Value
- LeafFederatedSplit<Matrix<T>, Vector<T>>
Methods
ExtractBatch(int[])
Extracts a batch of features and labels at the specified indices.
protected override (Matrix<T> Features, Vector<T> Labels) ExtractBatch(int[] indices)
Parameters
- indices int[]
The indices of samples to extract.
Returns
- (Matrix<T> Features, Vector<T> Labels)
Remarks
Derived classes must implement this to extract data based on their specific TInput and TOutput types.
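Index-based batch extraction amounts to gathering the requested rows and their labels. The following is a minimal sketch of that idea using plain arrays in place of Matrix<T> and Vector<T>. The free-standing `ExtractBatch` function here is a hypothetical stand-in, not the protected method of this class.

```csharp
using System;
using System.Linq;

// Gather the feature rows and labels at the given sample indices.
// Rows are returned in the order the indices are listed.
static (double[][] Features, double[] Labels) ExtractBatch(
    double[][] features, double[] labels, int[] indices)
{
    var batchFeatures = indices.Select(i => features[i]).ToArray();
    var batchLabels = indices.Select(i => labels[i]).ToArray();
    return (batchFeatures, batchLabels);
}

// Made-up data: four samples with two features each.
var allFeatures = new[]
{
    new[] { 0.0, 0.1 },
    new[] { 1.0, 1.1 },
    new[] { 2.0, 2.1 },
    new[] { 3.0, 3.1 },
};
var allLabels = new[] { 0.0, 1.0, 0.0, 1.0 };

var batch = ExtractBatch(allFeatures, allLabels, new[] { 2, 0 });
Console.WriteLine($"batch size={batch.Features.Length}, first row starts with {batch.Features[0][0]}");
```

Because the batch follows the order of the index array, shuffling the indices is enough to shuffle the batches.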
LoadDataCoreAsync(CancellationToken)
Core data loading implementation to be provided by derived classes.
protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)
Parameters
- cancellationToken CancellationToken
Cancellation token for async operation.
Returns
- Task
A task that completes when loading is finished.
Remarks
Derived classes must implement this to perform actual data loading:
- Load from files, databases, or remote sources
- Parse and validate data format
- Store in appropriate internal structures
Split(double, double, int?)
Creates a train/validation/test split of the data.
public override (IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Train, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Validation, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Test) Split(double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)
Parameters
- trainRatio double
Fraction of data for training (0.0 to 1.0).
- validationRatio double
Fraction of data for validation (0.0 to 1.0).
- seed int?
Optional random seed for reproducible splits.
Returns
- (IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Train, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Validation, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Test)
A tuple containing three data loaders: (train, validation, test).
Remarks
The test ratio is implicitly 1 - trainRatio - validationRatio.
For Beginners: Splitting data is crucial for evaluating your model:
- Training set: Data the model learns from
- Validation set: Data used to tune hyperparameters and prevent overfitting
- Test set: Data used only once at the end to get an unbiased performance estimate
Common splits are 60/20/20 or 70/15/15 (train/validation/test).
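The ratio arithmetic can be illustrated with a small stand-alone sketch. The `SplitCounts` helper below is hypothetical, and truncating the train and validation counts so that the test set absorbs any remainder is an assumption. The library may round differently.

```csharp
using System;

// Sample counts for a three-way split; the test share is whatever remains,
// i.e. 1 - trainRatio - validationRatio of the data.
static (int Train, int Validation, int Test) SplitCounts(
    int totalCount, double trainRatio = 0.7, double validationRatio = 0.15)
{
    if (trainRatio < 0 || validationRatio < 0 || trainRatio + validationRatio > 1.0)
        throw new ArgumentException("Ratios must be non-negative and sum to at most 1.");

    // Assumption: truncate, so any leftover samples fall into the test set.
    int train = (int)(totalCount * trainRatio);
    int validation = (int)(totalCount * validationRatio);
    int test = totalCount - train - validation;
    return (train, validation, test);
}

var counts = SplitCounts(1000);
Console.WriteLine($"train={counts.Train}, validation={counts.Validation}, test={counts.Test}");
```

With 1000 samples and the default ratios this gives 700 training, 150 validation, and 150 test samples.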
UnloadDataCore()
Core data unloading implementation to be provided by derived classes.
protected override void UnloadDataCore()
Remarks
Derived classes should implement this to release resources:
- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data