Class LeafFederatedDataLoader<T>

Namespace
AiDotNet.Data.Loaders
Assembly
AiDotNet.dll

Data loader that reads LEAF benchmark JSON splits and exposes both aggregated (X, Y) data and per-client partitions.

public sealed class LeafFederatedDataLoader<T> : InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>, IFederatedClientDataLoader<T, Matrix<T>, Vector<T>>, IInputOutputDataLoader<T, Matrix<T>, Vector<T>>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<(Matrix<T> Features, Vector<T> Labels)>, IShuffleable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>
LeafFederatedDataLoader<T>
Implements
IFederatedClientDataLoader<T, Matrix<T>, Vector<T>>
IInputOutputDataLoader<T, Matrix<T>, Vector<T>>
IDataLoader<T>
IResettable
ICountable
IBatchIterable<(Matrix<T> Features, Vector<T> Labels)>
IShuffleable

Remarks

LEAF is a federated learning benchmark suite where each user corresponds to one client. This loader preserves that structure through ClientData while also providing aggregated Features/Labels for compatibility with the standard training facade.

For Beginners: Use this loader when you want to run federated learning with realistic per-user splits provided by LEAF datasets.
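To make the "one user, one client" structure concrete, the sketch below reads a LEAF-style JSON document, which stores a top-level `users` array, a `num_samples` array, and a `user_data` object mapping each user ID to its `x` (features) and `y` (labels) arrays. This is not the loader's implementation: `LeafSketch` and its methods are hypothetical names, and assigning client IDs in the order of the `users` array is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of AiDotNet: illustrates the LEAF JSON layout only.
public static class LeafSketch
{
    // Maps client IDs (0..N-1, assumed to follow the order of the "users" array)
    // to the original LEAF user IDs.
    public static Dictionary<int, string> ReadUsers(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        var map = new Dictionary<int, string>();
        int clientId = 0;
        foreach (JsonElement user in doc.RootElement.GetProperty("users").EnumerateArray())
        {
            map[clientId++] = user.GetString() ?? string.Empty;
        }
        return map;
    }

    // Counts the samples stored for one user by reading the length of its "y" array.
    public static int CountSamples(string json, string userId)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement
            .GetProperty("user_data")
            .GetProperty(userId)
            .GetProperty("y")
            .GetArrayLength();
    }
}
```

Calling `LeafSketch.ReadUsers(File.ReadAllText(path))` on a LEAF train file would yield a dictionary with the same shape as the ClientIdToUserId property described below.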

Constructors

LeafFederatedDataLoader(string, string?, LeafFederatedDatasetLoadOptions?)

Initializes a new instance of the LeafFederatedDataLoader<T> class from LEAF JSON files.

public LeafFederatedDataLoader(string trainFilePath, string? testFilePath = null, LeafFederatedDatasetLoadOptions? options = null)

Parameters

trainFilePath string

Path to the LEAF train JSON file.

testFilePath string

Optional path to the LEAF test JSON file.

options LeafFederatedDatasetLoadOptions

Optional load options (subset, validation).

Properties

ClientData

Gets the per-client datasets used for federated learning simulation.

public IReadOnlyDictionary<int, FederatedClientDataset<Matrix<T>, Vector<T>>> ClientData { get; }

Property Value

IReadOnlyDictionary<int, FederatedClientDataset<Matrix<T>, Vector<T>>>

Remarks

Keys are stable client IDs (typically 0..N-1). Values contain each client's local features and labels.

ClientIdToUserId

Gets the mapping from internal client IDs (0..N-1) to original LEAF user IDs.

public IReadOnlyDictionary<int, string> ClientIdToUserId { get; }

Property Value

IReadOnlyDictionary<int, string>
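Because the mapping runs from client ID to user ID, resolving a LEAF user ID back to its client ID requires an inverse lookup. The following is a minimal sketch that works on any `IReadOnlyDictionary<int, string>`; `ClientLookupSketch.Invert` is a hypothetical helper, not a library member, and it assumes user IDs are unique.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of AiDotNet.
public static class ClientLookupSketch
{
    // Builds the inverse of a client-ID-to-user-ID map so that a LEAF user ID
    // can be resolved back to its internal client ID.
    // Assumes user IDs are unique; a duplicate would overwrite the earlier entry.
    public static Dictionary<string, int> Invert(IReadOnlyDictionary<int, string> clientIdToUserId)
    {
        var userIdToClientId = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (KeyValuePair<int, string> pair in clientIdToUserId)
        {
            userIdToClientId[pair.Value] = pair.Key;
        }
        return userIdToClientId;
    }
}
```

Given a loader instance named `loader` (a placeholder), `ClientLookupSketch.Invert(loader.ClientIdToUserId)` would return the user-to-client map, whose values can then be used as keys into ClientData.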

Description

Gets a description of the dataset and its intended use.

public override string Description { get; }

Property Value

string

FeatureCount

Gets the number of features per sample.

public override int FeatureCount { get; }

Property Value

int

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

OutputDimension

Gets the number of output dimensions (1 for regression/binary classification, N for multi-class with N classes).

public override int OutputDimension { get; }

Property Value

int

TestSplit

Gets the optional test split, if one was loaded (one dataset per LEAF user).

public LeafFederatedSplit<Matrix<T>, Vector<T>>? TestSplit { get; }

Property Value

LeafFederatedSplit<Matrix<T>, Vector<T>>

TotalCount

Gets the total number of samples in the dataset.

public override int TotalCount { get; }

Property Value

int

TrainSplit

Gets the loaded training split (one dataset per LEAF user).

public LeafFederatedSplit<Matrix<T>, Vector<T>> TrainSplit { get; }

Property Value

LeafFederatedSplit<Matrix<T>, Vector<T>>

Methods

ExtractBatch(int[])

Extracts a batch of features and labels at the specified indices.

protected override (Matrix<T> Features, Vector<T> Labels) ExtractBatch(int[] indices)

Parameters

indices int[]

The indices of samples to extract.

Returns

(Matrix<T> Features, Vector<T> Labels)

A tuple containing the features and labels for the batch.

Remarks

Derived classes must implement this to extract data based on their specific TInput and TOutput types.
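As an illustration of index-based batch extraction, the sketch below copies the rows and labels at the requested indices. It is not the library's implementation: plain `double[][]` and `double[]` arrays stand in for Matrix<T> and Vector<T>, and `BatchSketch.Extract` is a hypothetical name.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet:
// plain arrays stand in for Matrix<T> and Vector<T>.
public static class BatchSketch
{
    // Copies the feature rows and labels at the given indices,
    // preserving the order in which the indices were supplied.
    public static (double[][] Features, double[] Labels) Extract(
        double[][] features, double[] labels, int[] indices)
    {
        var batchFeatures = new double[indices.Length][];
        var batchLabels = new double[indices.Length];
        for (int i = 0; i < indices.Length; i++)
        {
            int row = indices[i];
            batchFeatures[i] = (double[])features[row].Clone();
            batchLabels[i] = labels[row];
        }
        return (batchFeatures, batchLabels);
    }
}
```

Copying each row keeps the batch independent of the source data, at the cost of extra allocation; whether the real loader copies or shares storage is not stated here.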

LoadDataCoreAsync(CancellationToken)

Core data loading implementation to be provided by derived classes.

protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)

Parameters

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task

A task that completes when loading is finished.

Remarks

Derived classes must implement this to perform actual data loading:
- Load from files, databases, or remote sources
- Parse and validate data format
- Store in appropriate internal structures

Split(double, double, int?)

Creates a train/validation/test split of the data.

public override (IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Train, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Validation, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Test) Split(double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)

Parameters

trainRatio double

Fraction of data for training (0.0 to 1.0).

validationRatio double

Fraction of data for validation (0.0 to 1.0).

seed int?

Optional random seed for reproducible splits.

Returns

(IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Train, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Validation, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Test)

A tuple containing three data loaders: (train, validation, test).

Remarks

The test ratio is implicitly 1 - trainRatio - validationRatio.

For Beginners: Splitting data is crucial for evaluating your model:
- **Training set**: Data the model learns from
- **Validation set**: Data used to tune hyperparameters and prevent overfitting
- **Test set**: Data used only once at the end to get an unbiased performance estimate

Common splits are 60/20/20 or 70/15/15 (train/validation/test).
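The ratio arithmetic can be sketched as follows. `SplitSketch.Sizes` is a hypothetical helper, not a library member, and flooring the train and validation counts is an assumption of this sketch; the loader's actual rounding and shuffling behavior is not documented here.

```csharp
using System;

// Hypothetical helper, not part of AiDotNet: shows the ratio arithmetic only.
public static class SplitSketch
{
    // Returns sample counts for (train, validation, test). The test split takes
    // whatever remains, mirroring "test ratio = 1 - trainRatio - validationRatio".
    // Flooring the train and validation counts is an assumption of this sketch.
    public static (int Train, int Validation, int Test) Sizes(
        int totalCount, double trainRatio = 0.7, double validationRatio = 0.15)
    {
        if (trainRatio < 0 || validationRatio < 0 || trainRatio + validationRatio > 1.0)
        {
            throw new ArgumentException("Ratios must be non-negative and sum to at most 1.");
        }

        int train = (int)Math.Floor(totalCount * trainRatio);
        int validation = (int)Math.Floor(totalCount * validationRatio);
        int test = totalCount - train - validation;
        return (train, validation, test);
    }
}
```

Assigning the remainder to the test split guarantees that the three counts always sum to `totalCount`, even when the ratios do not divide the sample count evenly.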

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:
- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data