Class M4DatasetLoader<T>
- Namespace: AiDotNet.Data.TimeSeries
- Assembly: AiDotNet.dll
Loads time series datasets from the M4 Competition for benchmarking forecasting models.
public class M4DatasetLoader<T> : DataLoaderBase<T>, IDataLoader<T>, IResettable, ICountable
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- DataLoaderBase<T>
- M4DatasetLoader<T>
- Implements
- IDataLoader<T>
- IResettable
- ICountable
Remarks
The M4 Competition (the fourth Makridakis Competition) is an influential forecasting competition whose dataset provides 100,000 time series across six frequencies for benchmarking forecasting methods.
For Beginners: The M4 Competition is one of the most widely used benchmarks for evaluating time series forecasting models.
What is M4?
- A collection of 100,000 real-world time series
- Multiple frequencies: Yearly, Quarterly, Monthly, Weekly, Daily, Hourly
- Standardized train/test splits for fair comparison
- Established benchmark metrics (SMAPE, MASE, OWA)
Why M4 matters:
- Industry standard: Used by researchers and practitioners worldwide
- Diverse data: Business, economic, demographic, and financial series
- Published baselines: Compare your model against known benchmarks
- Academic recognition: Results published in major forecasting journals
M4 Dataset Statistics:
| Frequency | Series Count | Forecast Horizon (steps) | Training Length (min-max) |
|---|---|---|---|
| Yearly | 23,000 | 6 | 13-835 years |
| Quarterly | 24,000 | 8 | 16-866 quarters |
| Monthly | 48,000 | 18 | 42-2794 months |
| Weekly | 359 | 13 | 80-2597 weeks |
| Daily | 4,227 | 14 | 93-9919 days |
| Hourly | 414 | 48 | 700-960 hours |
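The horizons in the table can be captured in a small lookup, which is convenient when evaluating several frequencies in one run. The sketch below is a standalone illustration and not part of this class: the dictionary name is hypothetical, and the seasonal periods are the values the M4 organizers used when scaling MASE, which this page does not list. On a loaded instance, ForecastHorizon is the value to rely on.

```csharp
using System;
using System.Collections.Generic;

// Forecast horizon (from the table above) and seasonal period per M4 frequency.
// Assumption: seasonal periods follow the M4 organizers' convention
// (12 monthly, 4 quarterly, 24 hourly, 1 for yearly, weekly and daily data).
var m4Settings = new Dictionary<string, (int Horizon, int SeasonalPeriod)>
{
    ["Yearly"]    = (6, 1),
    ["Quarterly"] = (8, 4),
    ["Monthly"]   = (18, 12),
    ["Weekly"]    = (13, 1),
    ["Daily"]     = (14, 1),
    ["Hourly"]    = (48, 24),
};

var (horizon, period) = m4Settings["Monthly"];
Console.WriteLine($"Monthly: forecast {horizon} steps, seasonal period {period}");
```

The seasonal period is the kind of value that would be passed as seasonalPeriod to CalculateMASE.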
Constructors
M4DatasetLoader(M4Frequency, int, string?, bool)
Initializes a new instance of the M4DatasetLoader<T> class.
public M4DatasetLoader(M4Frequency frequency, int batchSize = 32, string? dataPath = null, bool autoDownload = true)
Parameters
frequency (M4Frequency): The frequency of time series to load.
batchSize (int): Batch size for loading series (default: 32).
dataPath (string): Path to download/cache datasets (optional; default: null).
autoDownload (bool): Whether to automatically download the dataset if not found locally (default: true).
Remarks
For Beginners: Using M4 datasets:
// Load monthly time series
var loader = new M4DatasetLoader<double>(
    M4Frequency.Monthly,
    batchSize: 32,
    autoDownload: true);

// Load the data
await loader.LoadAsync();

// Train your model on each series (`model` stands for your own forecaster)
foreach (var series in loader.TrainingSeries)
{
    model.Train(series.Values);
}

// Evaluate using test data
foreach (var (train, test) in loader.TrainingSeries.Zip(loader.TestSeries))
{
    var forecast = model.Forecast(train.Values, loader.ForecastHorizon);
    // CalculateSMAPE is static, so it is called on the closed generic type
    var smape = M4DatasetLoader<double>.CalculateSMAPE(forecast, test.Values);
}
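The example above leaves `model` undefined. As a rough stand-in, a seasonal naive forecaster is a common baseline for M4-style evaluation. The following is a minimal sketch, assuming plain double values; `SeasonalNaiveForecast` is a hypothetical helper written for illustration and is not part of AiDotNet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical baseline: repeat the last full season of the history until
// `horizon` values are produced. With seasonalPeriod = 1 this repeats the
// last observed value (the plain naive forecast).
static double[] SeasonalNaiveForecast(IReadOnlyList<double> history, int horizon, int seasonalPeriod = 1)
{
    if (seasonalPeriod < 1)
        throw new ArgumentOutOfRangeException(nameof(seasonalPeriod));
    if (history.Count < seasonalPeriod)
        throw new ArgumentException("History is shorter than one seasonal period.");

    var result = new double[horizon];
    int start = history.Count - seasonalPeriod; // index of the first value in the last season
    for (int h = 0; h < horizon; h++)
        result[h] = history[start + (h % seasonalPeriod)];
    return result;
}

// Two "quarterly" seasons of made-up data.
var observed = new double[] { 10, 20, 30, 40, 12, 22, 32, 42 };
var forecastValues = SeasonalNaiveForecast(observed, horizon: 6, seasonalPeriod: 4);
Console.WriteLine(string.Join(", ", forecastValues)); // 12, 22, 32, 42, 12, 22
```

A baseline like this gives a reference score to compare a trained model against before investing in anything more elaborate.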
Properties
Description
Gets a description of the dataset and its intended use.
public override string Description { get; }
Property Value
- string
ForecastHorizon
Gets the forecast horizon for this frequency.
public int ForecastHorizon { get; }
Property Value
- int
Frequency
Gets the frequency of the loaded time series.
public M4Frequency Frequency { get; }
Property Value
- M4Frequency
Name
Gets the human-readable name of this data loader.
public override string Name { get; }
Property Value
- string
Remarks
Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"
SeriesCount
Gets the number of time series in the dataset.
public int SeriesCount { get; }
Property Value
- int
TestSeries
Gets the test time series data (ground truth for evaluation).
public IReadOnlyList<M4TimeSeries<T>> TestSeries { get; }
Property Value
- IReadOnlyList<M4TimeSeries<T>>
TotalCount
Gets the total number of samples in the dataset.
public override int TotalCount { get; }
Property Value
- int
TrainingSeries
Gets the training time series data.
public IReadOnlyList<M4TimeSeries<T>> TrainingSeries { get; }
Property Value
- IReadOnlyList<M4TimeSeries<T>>
Methods
CalculateMASE(IReadOnlyList<T>, IReadOnlyList<T>, IReadOnlyList<T>, int)
Calculates the Mean Absolute Scaled Error (MASE) for M4 evaluation.
public static T CalculateMASE(IReadOnlyList<T> forecast, IReadOnlyList<T> actual, IReadOnlyList<T> trainingData, int seasonalPeriod = 1)
Parameters
forecast (IReadOnlyList<T>): The forecasted values.
actual (IReadOnlyList<T>): The actual test values.
trainingData (IReadOnlyList<T>): The training data used to compute the scaling factor.
seasonalPeriod (int): The seasonal period for the scaling factor (default: 1 for non-seasonal).
Returns
- T
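The MASE score (lower is better; 1.0 matches the in-sample naive forecast error).
Remarks
MASE is scale-independent: it divides the forecast's mean absolute error on the test data by the mean absolute error of a seasonal naive forecast on the training data. A MASE of 1.0 means the two errors are equal; lower values indicate better performance. With the default seasonalPeriod of 1, the scaling forecast is the plain (non-seasonal) naive forecast.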
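To make the scaling concrete, here is a minimal sketch of the standard MASE definition for double values. It illustrates the formula only; `Mase` is a hypothetical local function, and the library's CalculateMASE may handle edge cases (such as constant training data) differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical re-implementation of the textbook MASE formula.
static double Mase(IReadOnlyList<double> forecast, IReadOnlyList<double> actual,
                   IReadOnlyList<double> trainingData, int seasonalPeriod = 1)
{
    if (forecast.Count != actual.Count || forecast.Count == 0)
        throw new ArgumentException("forecast and actual must be non-empty and equally long.");
    if (seasonalPeriod < 1 || trainingData.Count <= seasonalPeriod)
        throw new ArgumentException("trainingData must be longer than seasonalPeriod.");

    // Out-of-sample mean absolute error of the forecast.
    double forecastError = 0;
    for (int i = 0; i < actual.Count; i++)
        forecastError += Math.Abs(actual[i] - forecast[i]);
    forecastError /= actual.Count;

    // In-sample mean absolute error of the seasonal naive forecast y[t - m].
    double naiveError = 0;
    for (int t = seasonalPeriod; t < trainingData.Count; t++)
        naiveError += Math.Abs(trainingData[t] - trainingData[t - seasonalPeriod]);
    naiveError /= trainingData.Count - seasonalPeriod;

    // NaN or Infinity if the training data never changes (naiveError == 0).
    return forecastError / naiveError;
}

var trainValues = new double[] { 10, 12, 14, 16, 18 }; // naive one-step errors are all 2
var testValues = new double[] { 20, 22 };
var predicted = new double[] { 19, 24 };               // absolute errors 1 and 2
double mase = Mase(predicted, testValues, trainValues); // 1.5 / 2 = 0.75
Console.WriteLine(mase);
```

A result below 1.0, as here, means the forecast beat the naive method's typical in-sample error.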
CalculateSMAPE(IReadOnlyList<T>, IReadOnlyList<T>)
Calculates the Symmetric Mean Absolute Percentage Error (SMAPE) for M4 evaluation.
public static T CalculateSMAPE(IReadOnlyList<T> forecast, IReadOnlyList<T> actual)
Parameters
forecast (IReadOnlyList<T>): The forecasted values.
actual (IReadOnlyList<T>): The actual test values.
Returns
- T
The SMAPE score (0-200, lower is better).
Remarks
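SMAPE is one of the two accuracy measures, together with MASE, that the M4 Competition combines into its official OWA ranking. Its denominator averages the absolute actual and forecast values, which bounds the score between 0% and 200%.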
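As an illustration of the 0-200 scale, the sketch below computes SMAPE for double values and shows how M4 combines SMAPE and MASE into OWA, relative to the competition's Naive 2 benchmark. Both `Smape` and `Owa` are hypothetical local functions, not members of this class, and the treatment of points where actual and forecast are both zero is an assumption that may differ from CalculateSMAPE.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical SMAPE: 100 * mean( 2|y - f| / (|y| + |f|) ).
static double Smape(IReadOnlyList<double> forecast, IReadOnlyList<double> actual)
{
    if (forecast.Count != actual.Count || forecast.Count == 0)
        throw new ArgumentException("forecast and actual must be non-empty and equally long.");

    double sum = 0;
    for (int i = 0; i < actual.Count; i++)
    {
        double denominator = Math.Abs(actual[i]) + Math.Abs(forecast[i]);
        // Assumed convention: a point where both values are 0 contributes no error.
        if (denominator > 0)
            sum += 2.0 * Math.Abs(actual[i] - forecast[i]) / denominator;
    }
    return 100.0 * sum / actual.Count;
}

// Hypothetical OWA: average of SMAPE and MASE, each divided by the Naive 2 score.
// In M4 the inputs are averages over all series; 1.0 means "as accurate as Naive 2".
static double Owa(double smape, double mase, double naive2Smape, double naive2Mase)
    => 0.5 * (smape / naive2Smape + mase / naive2Mase);

var observed = new double[] { 150, 100 };
var predicted = new double[] { 50, 100 };
double smapeScore = Smape(predicted, observed); // per-point terms 1 and 0, so 100 * 1 / 2 = 50
Console.WriteLine(smapeScore);

// Made-up scores, purely to show the arithmetic.
Console.WriteLine(Owa(smape: 12, mase: 1.5, naive2Smape: 15, naive2Mase: 2));
```

The upper bound of 200 is reached when one of the two values is zero and the other is not, for example a forecast of 0 against an actual of 100.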
GetNextBatch()
Gets the next batch of time series for iteration.
public List<M4TimeSeries<T>> GetNextBatch()
Returns
- List<M4TimeSeries<T>>
A list of training time series in the current batch.
GetSeries(int)
Gets a specific time series by index.
public (M4TimeSeries<T> train, M4TimeSeries<T> test) GetSeries(int index)
Parameters
index (int): The index of the series to retrieve.
Returns
- (M4TimeSeries<T> train, M4TimeSeries<T> test)
A tuple containing the training series and its corresponding test values.
LoadDataCoreAsync(CancellationToken)
Core data loading implementation to be provided by derived classes.
protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)
Parameters
cancellationToken (CancellationToken): Cancellation token for async operation.
Returns
- Task
A task that completes when loading is finished.
Remarks
Derived classes must implement this to perform actual data loading:
- Load from files, databases, or remote sources
- Parse and validate data format
- Store in appropriate internal structures
OnReset()
Called after Reset() to allow derived classes to perform additional reset operations.
protected override void OnReset()
Remarks
Override this to reset any domain-specific state. The base indices are already reset when this is called.
UnloadDataCore()
Core data unloading implementation to be provided by derived classes.
protected override void UnloadDataCore()
Remarks
Derived classes should implement this to release resources:
- Clear internal data structures
- Release file handles or connections
- Allow garbage collection of loaded data