Class StratifiedEpisodicDataLoader<T, TInput, TOutput>
Provides stratified episodic task sampling that maintains dataset class proportions across tasks.
public class StratifiedEpisodicDataLoader<T, TInput, TOutput> : EpisodicDataLoaderBase<T, TInput, TOutput>, IEpisodicDataLoader<T, TInput, TOutput>, IDataLoader<T>, IResettable, ICountable, IBatchIterable<MetaLearningTask<T, TInput, TOutput>>
Type Parameters
T — The numeric data type used for features and labels (e.g., float, double).
TInput — The input data type for task examples.
TOutput — The output (label) data type for task examples.
- Inheritance
  - EpisodicDataLoaderBase<T, TInput, TOutput>
  - StratifiedEpisodicDataLoader<T, TInput, TOutput>
- Implements
  - IEpisodicDataLoader<T, TInput, TOutput>
  - IDataLoader<T>
  - IResettable
  - ICountable
  - IBatchIterable<MetaLearningTask<T, TInput, TOutput>>
Examples
// Dataset with imbalanced classes
// Class 0: 500 examples (50%)
// Class 1: 300 examples (30%)
// Class 2: 200 examples (20%)
var features = new Matrix<double>(1000, 784);
var labels = new Vector<double>(1000);

// Create stratified loader - over many episodes, classes are sampled
// in proportion to the 50%/30%/20% distribution
var loader = new StratifiedEpisodicDataLoader<double, Matrix<double>, Vector<double>>(
    datasetX: features,
    datasetY: labels,
    nWay: 2,          // 2 classes per task
    kShot: 5,         // 5 support examples per class
    queryShots: 15,   // 15 query examples per class
    seed: 42);

// Over many tasks, class 0 will appear in ~50% of tasks,
// class 1 in ~30%, and class 2 in ~20%
for (int episode = 0; episode < 1000; episode++)
{
    var task = loader.GetNextTask();
    // Train on a distribution matching real-world data
}
Remarks
The StratifiedEpisodicDataLoader extends the standard episodic loader by sampling classes proportionally to their representation in the dataset. If a class represents 30% of the dataset, it will appear in approximately 30% of tasks over many episodes, preserving the natural class distribution.
For Beginners: Real-world datasets often have imbalanced class distributions:
- Medical datasets might have 90% healthy cases, 10% disease cases
- E-commerce might have 80% browsing, 15% cart additions, 5% purchases
- Image datasets might have common objects (cars, trees) and rare ones (exotic animals)
Standard random sampling treats all classes equally, which doesn't reflect reality. Stratified sampling maintains these natural proportions:
- Common classes appear in more tasks
- Rare classes appear in fewer tasks
- The model learns to handle the real-world distribution
When to use this:
- When your dataset has natural class imbalance that you want to preserve
- When training for real-world deployment where class frequencies matter
- When you want meta-learning to reflect actual data distributions
When NOT to use this:
- When you want equal exposure to all classes (use BalancedEpisodicDataLoader)
- When evaluating few-shot learning fairly across all classes
- When class frequencies in deployment differ from training data
Thread Safety: This class is not thread-safe due to internal Random state. Create separate instances for concurrent task generation.
Performance: Similar to standard EpisodicDataLoader with O(nWay × (kShot + queryShots)) complexity. Slightly slower due to proportional weight calculation during initialization.
Constructors
StratifiedEpisodicDataLoader(Matrix<T>, Vector<T>, int, int, int, int?)
Initializes a new instance of the StratifiedEpisodicDataLoader for proportional N-way K-shot task sampling.
public StratifiedEpisodicDataLoader(Matrix<T> datasetX, Vector<T> datasetY, int nWay = 5, int kShot = 5, int queryShots = 15, int? seed = null)
Parameters
datasetX (Matrix<T>) — The feature matrix where each row is an example. Shape: [num_examples, num_features].
datasetY (Vector<T>) — The label vector containing class labels for each example. Length: num_examples.
nWay (int) — The number of unique classes per task. Must be at least 2.
kShot (int) — The number of support examples per class. Must be at least 1.
queryShots (int) — The number of query examples per class. Must be at least 1.
seed (int?) — Optional random seed for reproducible task sampling. If null, uses a time-based seed.
Remarks
For Beginners: This constructor calculates the proportion of each class in your dataset and stores these as weights for sampling.
For example, if you have:
- 1000 total examples
- Class A: 500 examples (50%)
- Class B: 300 examples (30%)
- Class C: 200 examples (20%)
The weights will be 0.5, 0.3, 0.2 respectively. When generating tasks, classes will be selected randomly but weighted by these proportions, so Class A appears most often.
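The weight calculation described above can be sketched as follows. This is a minimal, self-contained illustration using a plain double[] in place of the library's Vector<T>; the class name ComputeWeights is hypothetical, not part of the actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class WeightSketch
{
    // Compute per-class sampling weights as count / total, mirroring the
    // proportional weights the constructor stores. "labels" stands in for
    // datasetY (the real constructor takes Vector<T>).
    public static Dictionary<double, double> ComputeWeights(double[] labels)
    {
        return labels
            .GroupBy(l => l)
            .ToDictionary(g => g.Key, g => (double)g.Count() / labels.Length);
    }

    static void Main()
    {
        // 5 examples of class 0 (50%), 3 of class 1 (30%), 2 of class 2 (20%)
        double[] labels = { 0, 0, 0, 0, 0, 1, 1, 1, 2, 2 };
        foreach (var kv in ComputeWeights(labels).OrderBy(kv => kv.Key))
            Console.WriteLine($"Class {kv.Key}: weight {kv.Value:F2}");
        // prints: Class 0: weight 0.50 / Class 1: weight 0.30 / Class 2: weight 0.20
    }
}
```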
Exceptions
- ArgumentNullException
Thrown when datasetX or datasetY is null.
- ArgumentException
Thrown when dimensions are invalid or dataset is too small.
Methods
GetNextTaskCore()
Core implementation of stratified N-way K-shot task sampling with proportional class selection.
protected override MetaLearningTask<T, TInput, TOutput> GetNextTaskCore()
Returns
- MetaLearningTask<T, TInput, TOutput>
A MetaLearningTask with classes sampled proportionally to their dataset representation.
Remarks
This method implements proportional sampling:
1. Selects N classes using weighted random sampling (proportional to class frequency)
2. For each selected class, randomly samples (K + queryShots) examples
3. Shuffles and splits into support and query sets
4. Constructs and returns the MetaLearningTask
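Step 3 above (shuffle and split into support and query sets) can be sketched per class as below. This is an illustrative standalone snippet, not the library's internal code; SplitSupportQuery is a hypothetical helper name.

```csharp
using System;
using System.Linq;

class SplitSketch
{
    // Shuffle the (kShot + queryShots) indices sampled for one class, then
    // take the first kShot as support and the remainder as query.
    // OrderBy over random keys is a concise shuffle for a sketch;
    // a Fisher-Yates shuffle would be the more rigorous choice.
    public static (int[] support, int[] query) SplitSupportQuery(
        int[] indices, int kShot, Random rng)
    {
        var shuffled = indices.OrderBy(_ => rng.Next()).ToArray();
        return (shuffled.Take(kShot).ToArray(), shuffled.Skip(kShot).ToArray());
    }

    static void Main()
    {
        var rng = new Random(42);
        // Hypothetical: 20 example indices drawn for one class (kShot=5, queryShots=15)
        int[] sampled = Enumerable.Range(100, 20).ToArray();
        var (support, query) = SplitSupportQuery(sampled, kShot: 5, rng);
        Console.WriteLine($"{support.Length} support, {query.Length} query");
        // prints: 5 support, 15 query
    }
}
```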
For Beginners: Think of it like a lottery where each class has tickets proportional to how common it is:
- Class A (50% of data): Gets 500 lottery tickets
- Class B (30% of data): Gets 300 lottery tickets
- Class C (20% of data): Gets 200 lottery tickets
When selecting classes for a task, we draw from this lottery. Class A is most likely to be drawn, Class B moderately likely, and Class C least likely.
This means:
- Common real-world scenarios appear in more training tasks
- Rare scenarios appear less frequently, matching reality
- The model learns the true distribution it will see in deployment
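The lottery analogy corresponds to weighted (roulette-wheel) sampling over cumulative class weights. The sketch below simulates many draws with the 0.5/0.3/0.2 weights from the example to show the empirical frequencies converging; DrawClass is a hypothetical helper, not the library's actual method.

```csharp
using System;
using System.Linq;

class LotterySketch
{
    // Draw one class index by weighted random sampling: walk the cumulative
    // weights until the uniform random value falls inside a class's slot.
    public static int DrawClass(double[] weights, Random rng)
    {
        double r = rng.NextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < weights.Length; i++)
        {
            cumulative += weights[i];
            if (r < cumulative) return i;
        }
        return weights.Length - 1; // guard against floating-point round-off
    }

    static void Main()
    {
        var rng = new Random(42);
        double[] weights = { 0.5, 0.3, 0.2 }; // class proportions from the example
        var counts = new int[3];
        for (int i = 0; i < 10_000; i++)
            counts[DrawClass(weights, rng)]++;

        // Over many draws the empirical frequencies approach 0.50 / 0.30 / 0.20
        Console.WriteLine(string.Join(", ", counts.Select(c => c / 10_000.0)));
    }
}
```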