Class LeafFederatedDataLoader<T>

Namespace: AiDotNet.Data.Loaders

Assembly: AiDotNet.dll

Data loader that reads LEAF benchmark JSON splits and exposes both aggregated (X, Y) data and per-client partitions.

public sealed class LeafFederatedDataLoader<T> : InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>, IFederatedClientDataLoader<T, Matrix<T>, Vector<T>>, IInputOutputDataLoader<T, Matrix<T>, Vector<T>>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<(Matrix<T> Features, Vector<T> Labels)>, IShuffleable

Type Parameters

T: The numeric type used for calculations, typically float or double.

Inheritance: object

DataLoaderBase<T>

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>

LeafFederatedDataLoader<T>

Implements: IFederatedClientDataLoader<T, Matrix<T>, Vector<T>>

IInputOutputDataLoader<T, Matrix<T>, Vector<T>>

IDataLoader<T>

IResettable

ICountable

IBatchIterable<(Matrix<T> Features, Vector<T> Labels)>

IShuffleable

Inherited Members: InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.Features

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.Labels

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.FeatureCount

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.OutputDimension

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.BatchSize

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.HasNext

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.IsShuffled

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.GetNextBatch()

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.TryGetNextBatch(out (Matrix<T> Features, Vector<T> Labels))

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.Shuffle(int?)

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.Unshuffle()

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.Split(double, double, int?)

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.GetBatches(int?, bool, bool, int?)

InputOutputDataLoaderBase<T, Matrix<T>, Vector<T>>.GetBatchesAsync(int?, bool, bool, int?, int, CancellationToken)

DataLoaderBase<T>.IsLoaded

DataLoaderBase<T>.CurrentIndex

DataLoaderBase<T>.BatchSize

DataLoaderBase<T>.BatchCount

DataLoaderBase<T>.CurrentBatchIndex

DataLoaderBase<T>.Progress

DataLoaderBase<T>.Reset()

DataLoaderBase<T>.LoadAsync(CancellationToken)

DataLoaderBase<T>.Unload()

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.ReferenceEquals(object, object)

object.ToString()

Extension Methods: ParallelBatchLoaderExtensions.WithParallelLoading<TBatch>(IBatchIterable<TBatch>, Func<int[], TBatch>, int, int?, int?)

DataPipelineExtensions.ToPipeline<TBatch>(IBatchIterable<TBatch>)

DataLoaderExtensions.CreateBatchesAsync<TBatch>(IBatchIterable<TBatch>, int?, int)

DataLoaderExtensions.CreateBatches<TBatch>(IBatchIterable<TBatch>, int?)

Remarks

LEAF is a federated learning benchmark suite where each user corresponds to one client. This loader preserves that structure through ClientData while also providing aggregated Features/Labels for compatibility with the standard training facade.

For Beginners: Use this loader when you want to run federated learning with realistic per-user splits provided by LEAF datasets.
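To make the "one user, one client" structure concrete, the following is a minimal sketch that does not use AiDotNet at all. It assumes the usual LEAF JSON layout (a `users` array plus a `user_data` object holding `x` and `y` per user) and uses plain arrays and tuples as hypothetical stand-ins for Matrix<T>, Vector<T>, and FederatedClientDataset. The loader's real parsing logic may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// A tiny LEAF-style split, written with single quotes for readability
// and converted to valid JSON before parsing.
string json = @"{
  'users': ['u_a', 'u_b'],
  'num_samples': [2, 1],
  'user_data': {
    'u_a': { 'x': [[0.1, 0.2], [0.3, 0.4]], 'y': [0, 1] },
    'u_b': { 'x': [[0.5, 0.6]], 'y': [1] }
  }
}".Replace('\'', '"');

using var doc = JsonDocument.Parse(json);
JsonElement root = doc.RootElement;

// Hypothetical stand-ins for ClientIdToUserId and ClientData.
var clientIdToUserId = new Dictionary<int, string>();
var clientData = new Dictionary<int, (double[][] X, double[] Y)>();

int clientId = 0;
foreach (JsonElement user in root.GetProperty("users").EnumerateArray())
{
    string userId = user.GetString() ?? throw new FormatException("User ID must be a string.");
    JsonElement data = root.GetProperty("user_data").GetProperty(userId);

    // Each entry of "x" is one sample's feature row; "y" holds one label per sample.
    double[][] x = data.GetProperty("x").EnumerateArray()
        .Select(row => row.EnumerateArray().Select(v => v.GetDouble()).ToArray())
        .ToArray();
    double[] y = data.GetProperty("y").EnumerateArray()
        .Select(v => v.GetDouble())
        .ToArray();

    // Assign sequential client IDs (0..N-1) in the order users appear.
    clientIdToUserId[clientId] = userId;
    clientData[clientId] = (x, y);
    clientId++;
}

// Aggregated view: concatenate every client's samples in client-ID order.
double[][] allX = clientData.OrderBy(kv => kv.Key).SelectMany(kv => kv.Value.X).ToArray();
double[] allY = clientData.OrderBy(kv => kv.Key).SelectMany(kv => kv.Value.Y).ToArray();

Console.WriteLine($"clients={clientData.Count}, samples={allX.Length}, features={allX[0].Length}");
// prints "clients=2, samples=3, features=2"
```

The per-client dictionary plays the role of ClientData, while the concatenated arrays correspond to the aggregated Features and Labels.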

Constructors

LeafFederatedDataLoader(string, string?, LeafFederatedDatasetLoadOptions?)

Initializes a new instance of the LeafFederatedDataLoader<T> class from LEAF JSON files.

public LeafFederatedDataLoader(string trainFilePath, string? testFilePath = null, LeafFederatedDatasetLoadOptions? options = null)

Parameters

trainFilePath string: Path to the LEAF train JSON file.
testFilePath string?: Optional path to the LEAF test JSON file. Defaults to null.
options LeafFederatedDatasetLoadOptions?: Optional load options (subset, validation). Defaults to null.

Properties

ClientData

Gets the per-client datasets used for federated learning simulation.

public IReadOnlyDictionary<int, FederatedClientDataset<Matrix<T>, Vector<T>>> ClientData { get; }

Property Value

IReadOnlyDictionary<int, FederatedClientDataset<Matrix<T>, Vector<T>>>

Remarks

Keys are stable client IDs (typically 0..N-1). Values contain each client's local features and labels.

ClientIdToUserId

Gets the mapping from internal client IDs (0..N-1) to original LEAF user IDs.

public IReadOnlyDictionary<int, string> ClientIdToUserId { get; }

Property Value

IReadOnlyDictionary<int, string>

Description

Gets a description of the dataset and its intended use.

public override string Description { get; }

Property Value

string

FeatureCount

Gets the number of features per sample.

public override int FeatureCount { get; }

Property Value

int

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Example names: "MNIST", "Cora Citation Network", "IMDB Reviews".

OutputDimension

Gets the number of output dimensions (1 for regression/binary classification, N for multi-class with N classes).

public override int OutputDimension { get; }

Property Value

int

TestSplit

Gets the optional test split, if one was loaded (one dataset per LEAF user).

public LeafFederatedSplit<Matrix<T>, Vector<T>>? TestSplit { get; }

Property Value

LeafFederatedSplit<Matrix<T>, Vector<T>>?

TotalCount

Gets the total number of samples in the dataset.

public override int TotalCount { get; }

Property Value

int

TrainSplit

Gets the loaded training split (one dataset per LEAF user).

public LeafFederatedSplit<Matrix<T>, Vector<T>> TrainSplit { get; }

Property Value

LeafFederatedSplit<Matrix<T>, Vector<T>>

Methods

ExtractBatch(int[])

Extracts a batch of features and labels at the specified indices.

protected override (Matrix<T> Features, Vector<T> Labels) ExtractBatch(int[] indices)

Parameters

indices int[]: The indices of samples to extract.

Returns

(Matrix<T> Features, Vector<T> Labels): A tuple containing the features and labels for the batch.

Remarks

Derived classes must implement this to extract data based on their specific TInput and TOutput types. For this loader, those types are Matrix<T> and Vector<T>.
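As an illustration of index-based batch extraction, the sketch below gathers the requested rows and their labels in the order given. It uses plain arrays as stand-ins for Matrix<T> and Vector<T>, and the helper name `ExtractRows` is hypothetical. It is not the library's implementation.

```csharp
using System;

// Hypothetical stand-in for ExtractBatch: picks the rows named by `indices`,
// preserving the order in which the indices are listed.
static (double[][] Features, double[] Labels) ExtractRows(
    double[][] features, double[] labels, int[] indices)
{
    var batchX = new double[indices.Length][];
    var batchY = new double[indices.Length];

    for (int i = 0; i < indices.Length; i++)
    {
        int row = indices[i];
        if (row < 0 || row >= features.Length)
            throw new ArgumentOutOfRangeException(nameof(indices), $"Index {row} is outside the dataset.");

        batchX[i] = features[row]; // shares the row array with the source; copy it if mutation is a concern
        batchY[i] = labels[row];
    }

    return (batchX, batchY);
}

double[][] x = { new[] { 1.0, 2.0 }, new[] { 3.0, 4.0 }, new[] { 5.0, 6.0 } };
double[] y = { 10.0, 20.0, 30.0 };

var batch = ExtractRows(x, y, new[] { 2, 0 });
Console.WriteLine(string.Join(", ", batch.Labels)); // prints "30, 10"
```

Because the indices are honored in order, a shuffled index array yields a shuffled batch without moving the underlying data.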

LoadDataCoreAsync(CancellationToken)

Core data loading implementation to be provided by derived classes.

protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)

Parameters

cancellationToken CancellationToken: Cancellation token for async operation.

Returns

Task: A task that completes when loading is finished.

Remarks

Derived classes must implement this to perform the actual data loading:
- Load from files, databases, or remote sources
- Parse and validate the data format
- Store the data in appropriate internal structures

Split(double, double, int?)

Creates a train/validation/test split of the data.

public override (IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Train, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Validation, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Test) Split(double trainRatio = 0.7, double validationRatio = 0.15, int? seed = null)

Parameters

trainRatio double: Fraction of data for training (0.0 to 1.0).
validationRatio double: Fraction of data for validation (0.0 to 1.0).
seed int?: Optional random seed for reproducible splits.

Returns

(IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Train, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Validation, IInputOutputDataLoader<T, Matrix<T>, Vector<T>> Test): A tuple containing three data loaders: (train, validation, test).

Remarks

The test ratio is implicitly 1 - trainRatio - validationRatio.

For Beginners: Splitting data is crucial for evaluating your model:
- **Training set**: Data the model learns from
- **Validation set**: Data used to tune hyperparameters and prevent overfitting
- **Test set**: Data used only once at the end to get an unbiased performance estimate

Common splits are 60/20/20 or 70/15/15 (train/validation/test).
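The ratio arithmetic can be sketched as follows. The helper `SplitCounts` is hypothetical, and rounding down with the remainder assigned to the test set is an assumption made for this example. The library's actual rounding behavior is not documented here.

```csharp
using System;

// Hypothetical helper: converts split ratios into sample counts.
// The test set receives whatever remains, so its ratio is implicitly
// 1 - trainRatio - validationRatio.
static (int Train, int Validation, int Test) SplitCounts(
    int total, double trainRatio = 0.7, double validationRatio = 0.15)
{
    if (trainRatio < 0 || validationRatio < 0 || trainRatio + validationRatio > 1.0)
        throw new ArgumentException("Ratios must be non-negative and sum to at most 1.");

    // Rounding down guarantees train + validation never exceeds total.
    int train = (int)Math.Floor(total * trainRatio);
    int validation = (int)Math.Floor(total * validationRatio);
    int test = total - train - validation; // remainder, including any rounding leftovers

    return (train, validation, test);
}

var counts = SplitCounts(1000);
Console.WriteLine(counts); // prints "(700, 150, 150)"
```

With the default arguments shown in the signature (0.7 and 0.15), the implicit test ratio is 0.15.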

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:
- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data
