Class EpisodicDataLoaderBase<T, TInput, TOutput>

Namespace
AiDotNet.Data.Loaders
Assembly
AiDotNet.dll

Provides a base implementation for episodic data loaders with common functionality for N-way K-shot meta-learning.

public abstract class EpisodicDataLoaderBase<T, TInput, TOutput> : DataLoaderBase<T>, IEpisodicDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<MetaLearningTask<T, TInput, TOutput>>

Type Parameters

T

The numeric data type used for calculations (e.g., float, double).

TInput

The input data type for tasks (e.g., Matrix<T>, Tensor<T>, double[]).

TOutput

The output data type for tasks (e.g., Vector<T>, Tensor<T>, double[]).

Inheritance
EpisodicDataLoaderBase<T, TInput, TOutput>
Implements
IEpisodicDataLoader<T, TInput, TOutput>
IBatchIterable<MetaLearningTask<T, TInput, TOutput>>

Remarks

This abstract class implements the IEpisodicDataLoader interface and provides common functionality for episodic task sampling. It handles dataset validation, class-to-indices preprocessing, and parameter validation while allowing derived classes to focus on implementing specific sampling strategies.

For Beginners: This is the foundation that all episodic data loaders build upon.

Think of it as a template for loaders that create meta-learning tasks:

  • It handles the shared work (validating inputs, organizing data by class, checking requirements)
  • Specific loaders only implement how tasks are sampled
  • Because the shared logic lives in one place, all episodic loaders behave consistently and follow SOLID principles

The base class takes care of:

  • Validating that your dataset has enough classes and examples
  • Building an efficient index (class to example indices) for fast sampling
  • Storing configuration (N-way, K-shot, query shots)
  • Providing protected access to the dataset and configuration
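
The class-to-indices index mentioned above can be pictured as a dictionary from each label to the row positions that carry it. The following is a minimal sketch using plain int labels rather than Vector<T>; BuildClassIndex is a hypothetical helper written for illustration and is not part of AiDotNet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of AiDotNet): group example positions by label.
static Dictionary<int, List<int>> BuildClassIndex(IReadOnlyList<int> labels)
{
    var index = new Dictionary<int, List<int>>();
    for (int i = 0; i < labels.Count; i++)
    {
        if (!index.TryGetValue(labels[i], out var rows))
        {
            rows = new List<int>();
            index[labels[i]] = rows;
        }
        rows.Add(i);
    }
    return index;
}

// Six examples spread over three classes.
int[] labels = { 0, 1, 0, 2, 1, 0 };
Dictionary<int, List<int>> classToIndices = BuildClassIndex(labels);

foreach (var (label, rows) in classToIndices)
{
    Console.WriteLine($"class {label}: rows [{string.Join(", ", rows)}]");
}
```

Building the index once costs a single pass over the labels; afterwards, sampling from a class is a direct lookup instead of a scan over the whole label vector.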

Constructors

EpisodicDataLoaderBase(Matrix<T>, Vector<T>, int, int, int, int?)

Initializes a new instance of the EpisodicDataLoaderBase class with commonly used defaults (5-way, 5-shot, 15 query examples per class).

protected EpisodicDataLoaderBase(Matrix<T> datasetX, Vector<T> datasetY, int nWay = 5, int kShot = 5, int queryShots = 15, int? seed = null)

Parameters

datasetX Matrix<T>

The feature matrix where each row is an example. Shape: [num_examples, num_features].

datasetY Vector<T>

The label vector containing class labels for each example. Length: num_examples.

nWay int

The number of unique classes per task. Default is 5 (standard in meta-learning).

kShot int

The number of support examples per class. Default is 5 (balanced difficulty).

queryShots int

The number of query examples per class. Default is 15 (three times the default kShot).

seed int?

Optional random seed for reproducible task sampling. If null, uses a time-based seed.

Remarks

For Beginners: This constructor sets up the base infrastructure for episodic sampling.

It performs several important tasks:

  1. Validates all inputs to catch configuration errors early
  2. Builds an index mapping each class to its example indices for fast lookup
  3. Verifies the dataset has enough classes and examples per class
  4. Stores all configuration for use by derived classes

After construction, the derived class can use the protected fields to implement its specific sampling strategy.
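
To make the validation step concrete, here is a sketch of the kind of checks described above. ValidateEpisodeConfig is a hypothetical helper; the real constructor's exact rules and messages may differ. The sketch assumes every class must supply at least kShot + queryShots examples so that support and query sets do not overlap.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical check (the real constructor's rules and messages may differ).
// Assumes every class must supply kShot + queryShots distinct examples.
static void ValidateEpisodeConfig(
    IReadOnlyDictionary<int, int> examplesPerClass, int nWay, int kShot, int queryShots)
{
    if (nWay < 1 || kShot < 1 || queryShots < 1)
        throw new ArgumentException("nWay, kShot and queryShots must all be positive.");

    if (examplesPerClass.Count < nWay)
        throw new ArgumentException(
            $"Dataset has {examplesPerClass.Count} classes but nWay is {nWay}.");

    int needed = kShot + queryShots;
    foreach (var (label, count) in examplesPerClass)
    {
        if (count < needed)
            throw new ArgumentException(
                $"Class {label} has {count} examples but {needed} are required.");
    }
}

// With the documented defaults (5-way, 5-shot, 15 query) each class needs 20 examples.
var counts = new Dictionary<int, int> { [0] = 20, [1] = 25, [2] = 20, [3] = 30, [4] = 22 };
ValidateEpisodeConfig(counts, nWay: 5, kShot: 5, queryShots: 15); // passes

bool rejected = false;
try
{
    ValidateEpisodeConfig(counts, nWay: 6, kShot: 5, queryShots: 15); // only 5 classes exist
}
catch (ArgumentException)
{
    rejected = true;
}
Console.WriteLine($"6-way request rejected: {rejected}"); // prints "6-way request rejected: True"
```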

Exceptions

ArgumentNullException

Thrown when datasetX or datasetY is null.

ArgumentException

Thrown when dimensions are invalid or dataset is too small.

Fields

ClassToIndices

Mapping from class label to list of example indices for that class.

protected readonly Dictionary<int, List<int>> ClassToIndices

Field Value

Dictionary<int, List<int>>

DatasetX

The feature matrix containing all examples.

protected readonly Matrix<T> DatasetX

Field Value

Matrix<T>

DatasetY

The label vector containing class labels for all examples.

protected readonly Vector<T> DatasetY

Field Value

Vector<T>

NumOps

Provides mathematical operations for the numeric type T.

protected static readonly INumericOperations<T> NumOps

Field Value

INumericOperations<T>

RandomInstance

Random number generator for task sampling.

protected Random RandomInstance

Field Value

Random

_availableClasses

Array of all available class labels in the dataset.

protected readonly int[] _availableClasses

Field Value

int[]

_kShot

The number of support examples per class (K in K-shot).

protected readonly int _kShot

Field Value

int

_nWay

The number of classes per task (N in N-way).

protected readonly int _nWay

Field Value

int

_queryShots

The number of query examples per class.

protected readonly int _queryShots

Field Value

int

Properties

AvailableClasses

Gets the total number of available classes in the dataset.

public int AvailableClasses { get; }

Property Value

int

BatchSize

Gets or sets the batch size for iteration.

public override int BatchSize { get; set; }

Property Value

int

Description

Gets a description of the dataset and its intended use.

public override string Description { get; }

Property Value

string

HasNext

Gets whether there are more batches available in the current iteration.

public bool HasNext { get; }

Property Value

bool

KShot

Gets the number of support examples per class (K in K-shot).

public int KShot { get; }

Property Value

int

NWay

Gets the number of classes per task (N in N-way).

public int NWay { get; }

Property Value

int

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

QueryShots

Gets the number of query examples per class.

public int QueryShots { get; }

Property Value

int

TotalCount

Gets the total number of samples in the dataset.

public override int TotalCount { get; }

Property Value

int

Methods

BuildMetaLearningTask(List<Vector<T>>, List<T>, List<Vector<T>>, List<T>)

Builds a MetaLearningTask from lists of examples and labels.

protected MetaLearningTask<T, TInput, TOutput> BuildMetaLearningTask(List<Vector<T>> supportExamples, List<T> supportLabels, List<Vector<T>> queryExamples, List<T> queryLabels)

Parameters

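supportExamples List<Vector<T>>

The feature vectors that make up the support set.

supportLabels List<T>

The class labels for the support examples.

queryExamples List<Vector<T>>

The feature vectors that make up the query set.

queryLabels List<T>

The class labels for the query examples.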

Returns

MetaLearningTask<T, TInput, TOutput>

GetBatches(int?, bool, bool, int?)

Iterates through all batches in the dataset using lazy evaluation.

public virtual IEnumerable<MetaLearningTask<T, TInput, TOutput>> GetBatches(int? batchSize = null, bool shuffle = true, bool dropLast = false, int? seed = null)

Parameters

batchSize int?

Optional batch size override. Uses default BatchSize if null.

shuffle bool

Whether to shuffle data before batching. Default is true.

dropLast bool

Whether to drop the last incomplete batch. Default is false.

seed int?

Optional random seed for reproducible shuffling.

Returns

IEnumerable<MetaLearningTask<T, TInput, TOutput>>

An enumerable sequence of batches using yield return for memory efficiency.

Remarks

This method provides a PyTorch-style iteration pattern using IEnumerable and yield return for memory-efficient lazy evaluation. Each call creates a fresh iteration, automatically handling reset and shuffle operations.

For Beginners: This is the recommended way to iterate through your data:

foreach (var task in dataLoader.GetBatches(batchSize: 32, shuffle: true))
{
    // Each item is a MetaLearningTask (support set + query set):
    // adapt on the support set, then evaluate on the query set
}

Unlike GetNextBatch(), you don't need to call Reset() - each GetBatches() call starts fresh. The yield return pattern means batches are generated on-demand, not all loaded into memory at once.
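
The lazy, start-fresh behavior described above can be observed in isolation. In this sketch, LazyBatches is a hypothetical stand-in for GetBatches that yields plain integers instead of tasks; it only illustrates how yield return defers work until an item is requested, not how AiDotNet builds its batches.

```csharp
using System;
using System.Collections.Generic;

var produced = new List<int>();

// Hypothetical stand-in for GetBatches: each item is created only when requested.
IEnumerable<int> LazyBatches(int count)
{
    for (int i = 0; i < count; i++)
    {
        produced.Add(i);      // record the moment the "batch" is actually built
        yield return i;
    }
}

IEnumerable<int> batches = LazyBatches(1000);
Console.WriteLine(produced.Count);   // prints 0: nothing has been built yet

foreach (int batch in batches)
{
    if (batch == 2) break;           // stop early: later batches are never built
}
Console.WriteLine(produced.Count);   // prints 3

// Enumerating again starts a fresh pass from the first batch.
foreach (int batch in batches) { break; }
```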

GetBatchesAsync(int?, bool, bool, int?, int, CancellationToken)

Asynchronously iterates through all batches with prefetching support.

public virtual IAsyncEnumerable<MetaLearningTask<T, TInput, TOutput>> GetBatchesAsync(int? batchSize = null, bool shuffle = true, bool dropLast = false, int? seed = null, int prefetchCount = 2, CancellationToken cancellationToken = default)

Parameters

batchSize int?

Optional batch size override. Uses default BatchSize if null.

shuffle bool

Whether to shuffle data before batching. Default is true.

dropLast bool

Whether to drop the last incomplete batch. Default is false.

seed int?

Optional random seed for reproducible shuffling.

prefetchCount int

Number of batches to prefetch ahead. Default is 2.

cancellationToken CancellationToken

Token to cancel the iteration.

Returns

IAsyncEnumerable<MetaLearningTask<T, TInput, TOutput>>

An async enumerable sequence of batches.

Remarks

This method enables async batch iteration with configurable prefetching, similar to PyTorch's num_workers or TensorFlow's prefetch(). Batches are prepared in the background while the current batch is being processed.

For Beginners: Use this for large datasets or when batch preparation is slow:

await foreach (var task in dataLoader.GetBatchesAsync(prefetchCount: 2))
{
    // While this task is being processed, the next 2 are being prepared.
    // Adapt on the support set, then evaluate on the query set
}

Prefetching helps hide data loading latency, especially useful for:

  • Large images that need decoding
  • Data that requires preprocessing
  • Slow storage (network drives, cloud storage)
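
One common way to get this kind of overlap between loading and processing is a bounded producer/consumer buffer. The sketch below uses System.Threading.Channels to that end; Prefetch and its parameters are hypothetical names, and nothing here is claimed to match how GetBatchesAsync is implemented internally.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical prefetcher (not AiDotNet's implementation): a bounded channel holds at
// most `prefetchCount` finished items while the consumer is busy with the current one.
static IAsyncEnumerable<int> Prefetch(
    Func<int, Task<int>> loadBatch, int batchCount, int prefetchCount,
    CancellationToken cancellationToken = default)
{
    var channel = Channel.CreateBounded<int>(prefetchCount);

    // Producer: loads batches in the background; WriteAsync waits while the buffer is full.
    _ = Task.Run(async () =>
    {
        try
        {
            for (int i = 0; i < batchCount; i++)
            {
                int batch = await loadBatch(i);
                await channel.Writer.WriteAsync(batch, cancellationToken);
            }
            channel.Writer.Complete();
        }
        catch (Exception ex)
        {
            channel.Writer.Complete(ex);   // surface failures to the consumer
        }
    });

    return channel.Reader.ReadAllAsync(cancellationToken);
}

var received = new List<int>();
await foreach (int batch in Prefetch(
    async i => { await Task.Delay(10); return i * i; },   // simulated slow load
    batchCount: 5, prefetchCount: 2))
{
    received.Add(batch);   // the producer is already loading later batches here
}
Console.WriteLine(string.Join(", ", received));   // prints "0, 1, 4, 9, 16"
```

The bounded capacity is what keeps memory use predictable: a fast producer cannot run arbitrarily far ahead of a slow consumer.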

GetNextBatch()

Gets the next batch of data.

public MetaLearningTask<T, TInput, TOutput> GetNextBatch()

Returns

MetaLearningTask<T, TInput, TOutput>

The next batch of data.

Exceptions

InvalidOperationException

Thrown when no more batches are available.

GetNextTask()

Gets the next meta-learning task (support set + query set).

public MetaLearningTask<T, TInput, TOutput> GetNextTask()

Returns

MetaLearningTask<T, TInput, TOutput>

A MetaLearningTask with support and query sets.

Remarks

Each call returns a new randomly sampled task with:

  • N randomly selected classes from the available classes
  • K support examples per class
  • QueryShots query examples per class
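
As a quick worked example of what these numbers imply, the sketch below computes the size of one task. EpisodeSizes is a hypothetical helper used only for the arithmetic.

```csharp
using System;

// Hypothetical helper: per-task example counts for an N-way K-shot configuration.
static (int Support, int Query) EpisodeSizes(int nWay, int kShot, int queryShots)
    => (nWay * kShot, nWay * queryShots);

var (support, query) = EpisodeSizes(nWay: 5, kShot: 5, queryShots: 15);
Console.WriteLine($"support = {support}, query = {query}, total = {support + query}");
// prints "support = 25, query = 75, total = 100"
```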

GetNextTaskCore()

Core task sampling logic to be implemented by derived classes.

protected abstract MetaLearningTask<T, TInput, TOutput> GetNextTaskCore()

Returns

MetaLearningTask<T, TInput, TOutput>

A MetaLearningTask with sampled support and query sets.

Remarks

For Implementers: This is where you implement your specific task sampling strategy.

You have access to:

  • DatasetX and DatasetY: The full dataset
  • ClassToIndices: Fast lookup of examples by class
  • _availableClasses: Array of all class labels
  • _nWay, _kShot, _queryShots: Task configuration
  • RandomInstance: For randomized sampling
  • NumOps: For numeric operations

Your implementation should:

  1. Select NWay classes
  2. Sample KShot + QueryShots examples per class
  3. Split into support and query sets
  4. Build and return a MetaLearningTask
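
The four steps above can be sketched on row indices alone. SampleEpisode is a hypothetical function that works on a plain dictionary rather than the protected fields of this class, and it stops short of building a MetaLearningTask, since a real implementation would gather the feature rows and labels and pass them to BuildMetaLearningTask.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sampler over row indices only; a real GetNextTaskCore would also gather
// the feature rows and labels and build a MetaLearningTask from them.
static (List<int> Support, List<int> Query) SampleEpisode(
    IReadOnlyDictionary<int, List<int>> classToIndices,
    int nWay, int kShot, int queryShots, Random rng)
{
    var support = new List<int>();
    var query = new List<int>();

    // 1. Select nWay distinct classes: shuffle the labels, keep the first nWay.
    var chosen = classToIndices.Keys.OrderBy(_ => rng.Next()).Take(nWay);

    foreach (int label in chosen)
    {
        // 2. Sample kShot + queryShots distinct rows from this class.
        List<int> rows = classToIndices[label]
            .OrderBy(_ => rng.Next())
            .Take(kShot + queryShots)
            .ToList();

        // 3. Split: the first kShot rows are support, the rest are query.
        support.AddRange(rows.Take(kShot));
        query.AddRange(rows.Skip(kShot));
    }

    // 4. A real loader would now build and return the task from these rows.
    return (support, query);
}

// Four classes with six rows each; class c owns rows 6c .. 6c + 5.
var index = new Dictionary<int, List<int>>();
for (int c = 0; c < 4; c++)
    index[c] = Enumerable.Range(c * 6, 6).ToList();

var (supportRows, queryRows) = SampleEpisode(
    index, nWay: 3, kShot: 2, queryShots: 3, rng: new Random(42));
Console.WriteLine($"support: {supportRows.Count}, query: {queryRows.Count}");
// prints "support: 6, query: 9"
```

Sampling kShot + queryShots rows in one draw and then splitting them guarantees that no example appears in both the support and the query set of the same task.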

GetTaskBatch(int)

Gets multiple meta-learning tasks as a batch.

public IReadOnlyList<MetaLearningTask<T, TInput, TOutput>> GetTaskBatch(int numTasks)

Parameters

numTasks int

Number of tasks to sample.

Returns

IReadOnlyList<MetaLearningTask<T, TInput, TOutput>>

A list of MetaLearningTasks.

LoadDataCoreAsync(CancellationToken)

Core data loading implementation to be provided by derived classes.

protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)

Parameters

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task

A task that completes when loading is finished.

Remarks

Derived classes must implement this to perform actual data loading:

  • Load from files, databases, or remote sources
  • Parse and validate data format
  • Store in appropriate internal structures

OnReset()

Called after Reset() to allow derived classes to perform additional reset operations.

protected override void OnReset()

Remarks

Override this to reset any domain-specific state. The base indices are already reset when this is called.

SetSeed(int)

Sets the random seed for reproducible task sampling.

public void SetSeed(int seed)

Parameters

seed int

Random seed value.
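
Seeding matters because a seeded System.Random always produces the same sequence of draws. The sketch below shows that property on its own; DrawClasses is a hypothetical helper, and how SetSeed reinitializes RandomInstance internally is an assumption not confirmed by this page.

```csharp
using System;
using System.Linq;

// Two generators created with the same seed produce the same sequence of draws,
// which is what makes seeded task sampling repeatable.
static int[] DrawClasses(int seed, int count, int numClasses)
{
    var rng = new Random(seed);
    return Enumerable.Range(0, count).Select(_ => rng.Next(numClasses)).ToArray();
}

int[] first = DrawClasses(seed: 42, count: 8, numClasses: 10);
int[] second = DrawClasses(seed: 42, count: 8, numClasses: 10);
Console.WriteLine(first.SequenceEqual(second));   // prints "True"
```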

TryGetNextBatch(out MetaLearningTask<T, TInput, TOutput>)

Attempts to get the next batch without throwing if unavailable.

public bool TryGetNextBatch(out MetaLearningTask<T, TInput, TOutput> batch)

Parameters

batch MetaLearningTask<T, TInput, TOutput>

The batch if available, default otherwise.

Returns

bool

True if a batch was available, false if iteration is complete.

Remarks

When false is returned, batch contains the default value of the batch type, which for this class is MetaLearningTask<T, TInput, TOutput>. Callers should check the return value before using batch.
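
The same Try-pattern appears throughout .NET. As an analogy only, the sketch below uses Queue<T>.TryDequeue as a stand-in for a loader, since it has the same shape as TryGetNextBatch: a bool result plus an out value that is meaningful only when the result is true.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a loader: Queue<T>.TryDequeue follows the same Try-pattern as
// TryGetNextBatch (bool result plus an out value that is only meaningful on true).
var pending = new Queue<string>(new[] { "task-0", "task-1", "task-2" });
var seen = new List<string>();

while (pending.TryDequeue(out var task))
{
    seen.Add(task);   // safe: the return value was true
}

// Once the queue is exhausted the call returns false and the out value is the default.
bool gotMore = pending.TryDequeue(out var leftover);
Console.WriteLine($"{seen.Count} consumed, more available: {gotMore}, leftover is null: {leftover is null}");
// prints "3 consumed, more available: False, leftover is null: True"
```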

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:

  • Clear internal data structures
  • Release file handles or connections
  • Allow garbage collection of loaded data