Class EpisodicDataLoaderBase<T, TInput, TOutput>
Provides a base implementation for episodic data loaders with common functionality for N-way K-shot meta-learning.
public abstract class EpisodicDataLoaderBase<T, TInput, TOutput> : DataLoaderBase<T>, IEpisodicDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<MetaLearningTask<T, TInput, TOutput>>
Type Parameters
T: The numeric data type used for calculations (e.g., float, double).
TInput: The input data type for tasks (e.g., Matrix<T>, Tensor<T>, double[]).
TOutput: The output data type for tasks (e.g., Vector<T>, Tensor<T>, double[]).
Inheritance: DataLoaderBase<T> → EpisodicDataLoaderBase<T, TInput, TOutput>
Implements: IEpisodicDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<MetaLearningTask<T, TInput, TOutput>>
Remarks
This abstract class implements the IEpisodicDataLoader interface and provides common functionality for episodic task sampling. It handles dataset validation, class-to-indices preprocessing, and parameter validation while allowing derived classes to focus on implementing specific sampling strategies.
For Beginners: This is the foundation that all episodic data loaders build upon.
Think of it like a template for creating meta-learning tasks:
- It handles common tasks (validating inputs, organizing data by class, checking requirements)
- Specific loaders just implement how they sample tasks
- This ensures all episodic loaders work consistently and follow SOLID principles
The base class takes care of:
- Validating that your dataset has enough classes and examples
- Building an efficient index (class to example indices) for fast sampling
- Storing configuration (N-way, K-shot, query shots)
- Providing protected access to the dataset and configuration
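The class-to-indices index can be sketched on its own as follows. This is a standalone illustration, not the library's code: it assumes integer labels held in a plain int[] (the class itself reads labels from a Vector<T> and stores the result in the protected ClassToIndices field), and BuildIndex is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for datasetY: one integer class label per example (row).
int[] labels = { 0, 1, 0, 2, 1, 0, 2, 2, 1 };

// Map each class label to the row indices of its examples, preserving row order.
static Dictionary<int, List<int>> BuildIndex(int[] y)
{
    var index = new Dictionary<int, List<int>>();
    for (int i = 0; i < y.Length; i++)
    {
        if (!index.TryGetValue(y[i], out var rows))
        {
            rows = new List<int>();
            index[y[i]] = rows;
        }
        rows.Add(i);
    }
    return index;
}

var classToIndices = BuildIndex(labels);

// Distinct labels, analogous to _availableClasses (sorted here only for readability;
// the library's ordering is not documented).
int[] availableClasses = classToIndices.Keys.OrderBy(c => c).ToArray();

foreach (int c in availableClasses)
{
    Console.WriteLine($"class {c}: rows [{string.Join(", ", classToIndices[c])}]");
}
// class 0: rows [0, 2, 5]
// class 1: rows [1, 4, 8]
// class 2: rows [3, 6, 7]
```

Building the index once means each task sample can fetch a class's rows with a single dictionary lookup instead of rescanning every label.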
Constructors
EpisodicDataLoaderBase(Matrix<T>, Vector<T>, int, int, int, int?)
Initializes a new instance of the EpisodicDataLoaderBase class with commonly used defaults (5-way 5-shot tasks with 15 query examples per class).
protected EpisodicDataLoaderBase(Matrix<T> datasetX, Vector<T> datasetY, int nWay = 5, int kShot = 5, int queryShots = 15, int? seed = null)
Parameters
datasetX (Matrix<T>): The feature matrix where each row is an example. Shape: [num_examples, num_features].
datasetY (Vector<T>): The label vector containing the class label of each example. Length: num_examples.
nWay (int): The number of unique classes per task. Default is 5, a common choice in few-shot benchmarks.
kShot (int): The number of support examples per class. Default is 5.
queryShots (int): The number of query examples per class. Default is 15 (three times the default kShot).
seed (int?): Optional random seed for reproducible task sampling. If null, uses a time-based seed.
Remarks
For Beginners: This constructor sets up the base infrastructure for episodic sampling.
It performs several important tasks:
- Validates all inputs to catch configuration errors early
- Builds an index mapping each class to its example indices for fast lookup
- Verifies the dataset has enough classes and examples per class
- Stores all configuration for use by derived classes
After construction, the derived class can use the protected fields to implement its specific sampling strategy.
Exceptions
- ArgumentNullException
Thrown when datasetX or datasetY is null.
- ArgumentException
Thrown when dimensions are invalid, or when the dataset has too few classes or too few examples per class to build a task.
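A minimal sketch of the kind of checks behind these exceptions follows. The exact rules and messages in the library are not documented here; the sketch assumes every class must supply at least kShot + queryShots examples so that support and query examples never overlap, and Validate is a hypothetical helper working on plain arrays.

```csharp
using System;
using System.Linq;

// Hypothetical validation mirroring the documented checks. Assumes every class must
// provide kShot + queryShots distinct examples so support and query never overlap.
static void Validate(int rowCount, int[] y, int nWay, int kShot, int queryShots)
{
    if (y is null) throw new ArgumentNullException(nameof(y));
    if (rowCount != y.Length)
        throw new ArgumentException($"datasetX has {rowCount} rows but datasetY has {y.Length} labels.");
    if (nWay < 1 || kShot < 1 || queryShots < 1)
        throw new ArgumentException("nWay, kShot and queryShots must all be positive.");

    // Count examples per class label.
    var counts = y.GroupBy(label => label).ToDictionary(g => g.Key, g => g.Count());
    if (counts.Count < nWay)
        throw new ArgumentException($"Need at least {nWay} classes but found {counts.Count}.");

    int needed = kShot + queryShots;
    var tooSmall = counts.Where(kv => kv.Value < needed).Select(kv => kv.Key).ToList();
    if (tooSmall.Count > 0)
        throw new ArgumentException($"Classes [{string.Join(", ", tooSmall)}] have fewer than {needed} examples.");
}

// Three classes with two examples each: 2-way 1-shot tasks with 2 queries need 3 examples per class.
int[] labels = { 0, 0, 1, 1, 2, 2 };
string error = "valid";
try
{
    Validate(labels.Length, labels, nWay: 2, kShot: 1, queryShots: 2);
}
catch (ArgumentException ex)
{
    error = ex.Message;
}
Console.WriteLine(error);
```

Failing fast in the constructor means a misconfigured loader is rejected once, rather than failing midway through training when a small class happens to be sampled.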
Fields
ClassToIndices
Mapping from class label to list of example indices for that class.
protected readonly Dictionary<int, List<int>> ClassToIndices
Field Value
- Dictionary<int, List<int>>
DatasetX
The feature matrix containing all examples.
protected readonly Matrix<T> DatasetX
Field Value
- Matrix<T>
DatasetY
The label vector containing class labels for all examples.
protected readonly Vector<T> DatasetY
Field Value
- Vector<T>
NumOps
Provides mathematical operations for the numeric type T.
protected static readonly INumericOperations<T> NumOps
Field Value
- INumericOperations<T>
RandomInstance
Random number generator for task sampling.
protected Random RandomInstance
Field Value
- Random
_availableClasses
Array of all available class labels in the dataset.
protected readonly int[] _availableClasses
Field Value
- int[]
_kShot
The number of support examples per class (K in K-shot).
protected readonly int _kShot
Field Value
- int
_nWay
The number of classes per task (N in N-way).
protected readonly int _nWay
Field Value
- int
_queryShots
The number of query examples per class.
protected readonly int _queryShots
Field Value
- int
Properties
AvailableClasses
Gets the total number of available classes in the dataset.
public int AvailableClasses { get; }
Property Value
- int
BatchSize
Gets or sets the batch size for iteration.
public override int BatchSize { get; set; }
Property Value
- int
Description
Gets a description of the dataset and its intended use.
public override string Description { get; }
Property Value
- string
HasNext
Gets whether there are more batches available in the current iteration.
public bool HasNext { get; }
Property Value
- bool
KShot
Gets the number of support examples per class (K in K-shot).
public int KShot { get; }
Property Value
- int
NWay
Gets the number of classes per task (N in N-way).
public int NWay { get; }
Property Value
- int
Name
Gets the human-readable name of this data loader.
public override string Name { get; }
Property Value
- string
Remarks
Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"
QueryShots
Gets the number of query examples per class.
public int QueryShots { get; }
Property Value
- int
TotalCount
Gets the total number of samples in the dataset.
public override int TotalCount { get; }
Property Value
- int
Methods
BuildMetaLearningTask(List<Vector<T>>, List<T>, List<Vector<T>>, List<T>)
Builds a MetaLearningTask from lists of examples and labels.
protected MetaLearningTask<T, TInput, TOutput> BuildMetaLearningTask(List<Vector<T>> supportExamples, List<T> supportLabels, List<Vector<T>> queryExamples, List<T> queryLabels)
Parameters
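supportExamples (List<Vector<T>>): The feature vectors of the support-set examples.
supportLabels (List<T>): The class labels of the support-set examples, in the same order.
queryExamples (List<Vector<T>>): The feature vectors of the query-set examples.
queryLabels (List<T>): The class labels of the query-set examples, in the same order.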
Returns
- MetaLearningTask<T, TInput, TOutput>
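BuildMetaLearningTask has to turn the per-example lists into the task's TInput and TOutput representations. When TInput is a matrix and TOutput a vector, that amounts to stacking the example vectors as rows, as in the sketch below. It uses double[] and double[,] in place of Vector<T> and Matrix<T>, and StackRows is a hypothetical helper; how the library handles other TInput/TOutput types (e.g. Tensor<T>) is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Stack equally sized example vectors into a [numExamples, numFeatures] matrix.
static double[,] StackRows(List<double[]> examples)
{
    if (examples.Count == 0) throw new ArgumentException("At least one example is required.");
    int cols = examples[0].Length;
    var matrix = new double[examples.Count, cols];
    for (int r = 0; r < examples.Count; r++)
    {
        if (examples[r].Length != cols)
            throw new ArgumentException($"Example {r} has {examples[r].Length} features; expected {cols}.");
        for (int c = 0; c < cols; c++) matrix[r, c] = examples[r][c];
    }
    return matrix;
}

var supportExamples = new List<double[]> { new[] { 1.0, 2.0 }, new[] { 3.0, 4.0 }, new[] { 5.0, 6.0 } };
var supportLabels = new List<double> { 0.0, 0.0, 1.0 };

double[,] supportX = StackRows(supportExamples); // TInput-like: one row per example
double[] supportY = supportLabels.ToArray();     // TOutput-like: one label per row

Console.WriteLine($"supportX: {supportX.GetLength(0)}x{supportX.GetLength(1)}, supportY: {supportY.Length}");
// supportX: 3x2, supportY: 3
```

The same conversion would apply to the query lists; keeping examples and labels in matching order is what ties each row to its label.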
GetBatches(int?, bool, bool, int?)
Iterates through all batches in the dataset using lazy evaluation.
public virtual IEnumerable<MetaLearningTask<T, TInput, TOutput>> GetBatches(int? batchSize = null, bool shuffle = true, bool dropLast = false, int? seed = null)
Parameters
batchSize (int?): Optional batch size override. Uses the default BatchSize if null.
shuffle (bool): Whether to shuffle data before batching. Default is true.
dropLast (bool): Whether to drop the last incomplete batch. Default is false.
seed (int?): Optional random seed for reproducible shuffling.
Returns
- IEnumerable<MetaLearningTask<T, TInput, TOutput>>
An enumerable sequence of batches using yield return for memory efficiency.
Remarks
This method provides a PyTorch-style iteration pattern using IEnumerable and yield return for memory-efficient lazy evaluation. Each call creates a fresh iteration, automatically handling reset and shuffle operations.
For Beginners: This is the recommended way to iterate through your data:
foreach (var task in dataLoader.GetBatches(batchSize: 32, shuffle: true))
{
    // Each element is a MetaLearningTask: adapt on its support set, evaluate on its query set
}
Unlike GetNextBatch(), you don't need to call Reset(); each GetBatches() call starts fresh. The yield return pattern means batches are generated on demand rather than all loaded into memory at once.
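The deferred, yield return-based iteration described above can be illustrated with a standalone iterator. Each yielded item is just a number standing in for a sampled MetaLearningTask, and GetTasks is a hypothetical helper; the point is that nothing is produced until the consumer asks for the next item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// Each item is created only when the consumer requests it (deferred execution).
IEnumerable<int> GetTasks(int count)
{
    for (int i = 0; i < count; i++)
    {
        produced++;     // records how many items were actually generated
        yield return i; // stand-in for sampling one MetaLearningTask
    }
}

var sequence = GetTasks(1000);
Console.WriteLine(produced); // 0: calling the iterator does not run its body yet

foreach (int task in sequence.Take(3))
{
    // meta-train on the task here
}
Console.WriteLine(produced); // 3: only the consumed items were generated
```

Enumerating the sequence again reruns the iterator body from the start, which is why a fresh enumeration needs no explicit Reset().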
GetBatchesAsync(int?, bool, bool, int?, int, CancellationToken)
Asynchronously iterates through all batches with prefetching support.
public virtual IAsyncEnumerable<MetaLearningTask<T, TInput, TOutput>> GetBatchesAsync(int? batchSize = null, bool shuffle = true, bool dropLast = false, int? seed = null, int prefetchCount = 2, CancellationToken cancellationToken = default)
Parameters
batchSize (int?): Optional batch size override. Uses the default BatchSize if null.
shuffle (bool): Whether to shuffle data before batching. Default is true.
dropLast (bool): Whether to drop the last incomplete batch. Default is false.
seed (int?): Optional random seed for reproducible shuffling.
prefetchCount (int): Number of batches to prefetch ahead. Default is 2.
cancellationToken (CancellationToken): Token to cancel the iteration.
Returns
- IAsyncEnumerable<MetaLearningTask<T, TInput, TOutput>>
An async enumerable sequence of batches.
Remarks
This method enables async batch iteration with configurable prefetching, similar to PyTorch's num_workers or TensorFlow's prefetch(). Batches are prepared in the background while the current batch is being processed.
For Beginners: Use this for large datasets or when batch preparation is slow:
await foreach (var task in dataLoader.GetBatchesAsync(prefetchCount: 2))
{
    // While this task is being processed, the next 2 are being prepared in the background
}
Prefetching helps hide data-loading latency and is especially useful for:
- Large images that need decoding
- Data that requires preprocessing
- Slow storage (network drives, cloud storage)
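Prefetching of this kind can be approximated with a bounded producer/consumer buffer. The sketch below uses System.Threading.Channels (part of .NET since .NET Core 3.0) with a capacity of prefetchCount: a background producer prepares items ahead while the consumer awaits them. The int items standing in for prepared tasks, PrefetchAsync, and PrepareAsync are hypothetical; this is not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Wraps a slow producer so up to `prefetchCount` prepared items wait in a buffer.
static async IAsyncEnumerable<int> PrefetchAsync(
    Func<int, Task<int>> prepare, int count, int prefetchCount,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(prefetchCount)
    {
        SingleWriter = true,
        SingleReader = true,
        FullMode = BoundedChannelFullMode.Wait // producer pauses while the buffer is full
    });
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
    CancellationToken token = cts.Token;

    // Background producer: prepares items ahead of the consumer.
    _ = Task.Run(async () =>
    {
        try
        {
            for (int i = 0; i < count; i++)
                await channel.Writer.WriteAsync(await prepare(i), token);
            channel.Writer.TryComplete();
        }
        catch (Exception ex)
        {
            channel.Writer.TryComplete(ex); // surface producer failures to the consumer
        }
    });

    try
    {
        await foreach (int item in channel.Reader.ReadAllAsync(token))
            yield return item;
    }
    finally
    {
        cts.Cancel(); // stop the producer if the consumer exits early
    }
}

// Simulated slow preparation of one batch.
static async Task<int> PrepareAsync(int i)
{
    await Task.Delay(20);
    return i * 10;
}

var received = new List<int>();
await foreach (int batch in PrefetchAsync(PrepareAsync, count: 5, prefetchCount: 2))
{
    await Task.Delay(20); // simulated training step; the producer keeps preparing meanwhile
    received.Add(batch);
}
Console.WriteLine(string.Join(", ", received)); // 0, 10, 20, 30, 40
```

BoundedChannelFullMode.Wait gives back-pressure: however fast the producer is, memory stays bounded to roughly prefetchCount prepared items.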
GetNextBatch()
Gets the next batch of data.
public MetaLearningTask<T, TInput, TOutput> GetNextBatch()
Returns
- MetaLearningTask<T, TInput, TOutput>
The next batch of data.
Exceptions
- InvalidOperationException
Thrown when no more batches are available.
GetNextTask()
Gets the next meta-learning task (support set + query set).
public MetaLearningTask<T, TInput, TOutput> GetNextTask()
Returns
- MetaLearningTask<T, TInput, TOutput>
A MetaLearningTask with support and query sets.
Remarks
Each call returns a new randomly sampled task with:
- N randomly selected classes from the available classes
- K support examples per class
- QueryShots query examples per class
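As a worked example, write S for the support set, Q for the query set, N for nWay, K for kShot and Q_s for queryShots. With the constructor defaults (N = 5, K = 5, Q_s = 15), each task contains:

```latex
|S| = N \cdot K = 5 \cdot 5 = 25, \qquad |Q| = N \cdot Q_s = 5 \cdot 15 = 75, \qquad |S| + |Q| = 100
```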
GetNextTaskCore()
Core task sampling logic to be implemented by derived classes.
protected abstract MetaLearningTask<T, TInput, TOutput> GetNextTaskCore()
Returns
- MetaLearningTask<T, TInput, TOutput>
A MetaLearningTask with sampled support and query sets.
Remarks
For Implementers: This is where you implement your specific task sampling strategy.
You have access to:
- DatasetX and DatasetY: The full dataset
- ClassToIndices: Fast lookup of examples by class
- _availableClasses: Array of all class labels
- _nWay, _kShot, _queryShots: Task configuration
- RandomInstance: For randomized sampling
- NumOps: For numeric operations
Your implementation should:
- Select NWay classes
- Sample KShot + QueryShots examples per class
- Split into support and query sets
- Build and return a MetaLearningTask
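One way a derived class could implement these steps is sketched below on plain arrays instead of Matrix<T>, Vector<T> and MetaLearningTask. SampleTask and SampleWithoutReplacement are hypothetical helpers; drawing kShot + queryShots rows per class with a partial Fisher-Yates shuffle, so that support and query never overlap, is one reasonable strategy rather than the library's documented one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical dataset: 40 examples, 4 classes with 10 examples each (label = row % 4).
var classToIndices = new Dictionary<int, List<int>>();
for (int row = 0; row < 40; row++)
{
    int label = row % 4;
    if (!classToIndices.ContainsKey(label)) classToIndices[label] = new List<int>();
    classToIndices[label].Add(row);
}
int[] availableClasses = classToIndices.Keys.ToArray();

// Partial Fisher-Yates shuffle: returns `count` distinct items drawn uniformly without
// replacement. Assumes count <= items.Count (validated up front, as the constructor does).
static int[] SampleWithoutReplacement(IReadOnlyList<int> items, int count, Random rng)
{
    int[] pool = items.ToArray();
    for (int i = 0; i < count; i++)
    {
        int j = rng.Next(i, pool.Length); // choose among the items not selected yet
        (pool[i], pool[j]) = (pool[j], pool[i]);
    }
    return pool[..count];
}

static (List<int> SupportRows, List<int> SupportLabels, List<int> QueryRows, List<int> QueryLabels) SampleTask(
    Dictionary<int, List<int>> index, int[] classes, int nWay, int kShot, int queryShots, Random rng)
{
    var supportRows = new List<int>();
    var supportLabels = new List<int>();
    var queryRows = new List<int>();
    var queryLabels = new List<int>();

    // 1. Select nWay distinct classes.
    foreach (int cls in SampleWithoutReplacement(classes, nWay, rng))
    {
        // 2. Draw kShot + queryShots distinct examples from this class.
        int[] rows = SampleWithoutReplacement(index[cls], kShot + queryShots, rng);

        // 3. Split: the first kShot rows form the support set, the rest the query set.
        supportRows.AddRange(rows[..kShot]);
        supportLabels.AddRange(Enumerable.Repeat(cls, kShot));
        queryRows.AddRange(rows[kShot..]);
        queryLabels.AddRange(Enumerable.Repeat(cls, queryShots));
    }
    return (supportRows, supportLabels, queryRows, queryLabels);
}

var random = new Random(42);
var task = SampleTask(classToIndices, availableClasses, nWay: 3, kShot: 2, queryShots: 3, random);
Console.WriteLine($"support: {task.SupportRows.Count} examples, query: {task.QueryRows.Count} examples");
// support: 6 examples, query: 9 examples
```

A real implementation could then gather the sampled rows from DatasetX and DatasetY and pass them to BuildMetaLearningTask.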
GetTaskBatch(int)
Gets multiple meta-learning tasks as a batch.
public IReadOnlyList<MetaLearningTask<T, TInput, TOutput>> GetTaskBatch(int numTasks)
Parameters
numTasks (int): Number of tasks to sample.
Returns
- IReadOnlyList<MetaLearningTask<T, TInput, TOutput>>
A list of MetaLearningTasks.
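Conceptually, a task batch (often called a meta-batch) can be thought of as numTasks independent task samples collected into a read-only list. The sketch below assumes exactly that; GetNextTask here is a hypothetical stand-in that returns an integer ID, and rejecting numTasks <= 0 is an assumption, since the library's behaviour for that case is not documented.

```csharp
using System;
using System.Collections.Generic;

int nextTaskId = 0;

// Stand-in for GetNextTask(): every call returns a freshly "sampled" task (here just an ID).
int GetNextTask() => nextTaskId++;

// Hypothetical GetTaskBatch: numTasks independent samples in a read-only list.
IReadOnlyList<int> GetTaskBatch(int numTasks)
{
    if (numTasks <= 0)
        throw new ArgumentOutOfRangeException(nameof(numTasks), "numTasks must be positive.");

    var tasks = new List<int>(numTasks);
    for (int i = 0; i < numTasks; i++)
        tasks.Add(GetNextTask());
    return tasks.AsReadOnly();
}

IReadOnlyList<int> metaBatch = GetTaskBatch(4);
Console.WriteLine(string.Join(", ", metaBatch)); // 0, 1, 2, 3
```

Gradient-based meta-learners such as MAML typically average their meta-update over a batch of tasks like this.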
LoadDataCoreAsync(CancellationToken)
Core data loading implementation; derived classes can override it.
protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)
Parameters
cancellationToken (CancellationToken): Cancellation token for the async operation.
Returns
- Task
A task that completes when loading is finished.
Remarks
Derived classes can override this to perform actual data loading:
- Load from files, databases, or remote sources
- Parse and validate the data format
- Store the data in appropriate internal structures
OnReset()
Called after Reset() to allow derived classes to perform additional reset operations.
protected override void OnReset()
Remarks
Override this to reset any domain-specific state. The base indices are already reset when this is called.
SetSeed(int)
Sets the random seed for reproducible task sampling.
public void SetSeed(int seed)
Parameters
seed (int): Random seed value.
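Seeding makes the sequence of sampled tasks repeatable: two generators created with the same seed produce the same draws. The sketch below demonstrates this with System.Random and a hypothetical PickClasses helper (a simple shuffle by random sort keys). Note that the .NET documentation does not guarantee that System.Random produces the same sequence for a given seed across major .NET versions, so seeds reproduce results reliably only on the same runtime.

```csharp
using System;
using System.Linq;

int[] availableClasses = Enumerable.Range(0, 20).ToArray();

// Hypothetical helper: pick nWay distinct classes by sorting on random keys.
static int[] PickClasses(int[] classes, int nWay, Random rng) =>
    classes.OrderBy(_ => rng.Next()).Take(nWay).ToArray();

var a = PickClasses(availableClasses, 5, new Random(123));
var b = PickClasses(availableClasses, 5, new Random(123)); // same seed
var c = PickClasses(availableClasses, 5, new Random(456)); // different seed

Console.WriteLine(a.SequenceEqual(b)); // True: same seed, same classes in the same order
Console.WriteLine(string.Join(", ", c));
```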
TryGetNextBatch(out MetaLearningTask<T, TInput, TOutput>)
Attempts to get the next batch without throwing if unavailable.
public bool TryGetNextBatch(out MetaLearningTask<T, TInput, TOutput> batch)
Parameters
batch (MetaLearningTask<T, TInput, TOutput>): The batch if one is available; otherwise, the default value.
Returns
- bool
True if a batch was available, false if iteration is complete.
Remarks
When false is returned, batch contains the default value of MetaLearningTask<T, TInput, TOutput>, so callers should check the return value before using batch.
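The two iteration styles differ only in how exhaustion is reported: GetNextBatch throws InvalidOperationException, while TryGetNextBatch returns false. The standalone sketch below reproduces that contract on a simple queue; the queue of integer IDs and the local functions are hypothetical stand-ins for the loader's internal state.

```csharp
using System;
using System.Collections.Generic;

var pending = new Queue<int>(new[] { 1, 2, 3 }); // stand-in for the remaining batches

bool HasNext() => pending.Count > 0;

// Throwing variant: mirrors the documented GetNextBatch() contract.
int GetNextBatch() =>
    HasNext() ? pending.Dequeue() : throw new InvalidOperationException("No more batches are available.");

// Non-throwing variant: mirrors the documented TryGetNextBatch(out ...) contract.
bool TryGetNextBatch(out int batch)
{
    if (HasNext())
    {
        batch = pending.Dequeue();
        return true;
    }
    batch = default; // callers must not use this value when false is returned
    return false;
}

int first = GetNextBatch();
var rest = new List<int>();
while (TryGetNextBatch(out int b)) rest.Add(b);

bool threw = false;
try { GetNextBatch(); } catch (InvalidOperationException) { threw = true; }
Console.WriteLine($"first={first}, rest=[{string.Join(", ", rest)}], threwAtEnd={threw}");
// first=1, rest=[2, 3], threwAtEnd=True
```

The Try form suits loops that run until the data is exhausted, because reaching the end is an expected outcome there rather than an error.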
UnloadDataCore()
Core data unloading implementation; derived classes can override it.
protected override void UnloadDataCore()
Remarks
Derived classes can override this to release resources:
- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data