Class M4DatasetLoader<T>

Namespace
AiDotNet.Data.TimeSeries
Assembly
AiDotNet.dll

Loads time series datasets from the M4 Competition for benchmarking forecasting models.

public class M4DatasetLoader<T> : DataLoaderBase<T>, IDataLoader<T>, IResettable, ICountable

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
M4DatasetLoader<T>
Implements
IDataLoader<T>, IResettable, ICountable

Remarks

The M4 Competition, the fourth of the Makridakis Competitions, provides 100,000 time series across six frequencies for benchmarking forecasting methods.

For Beginners: The M4 Competition is one of the most widely used benchmarks for evaluating time series forecasting models.

What is M4?

  • A collection of 100,000 real-world time series
  • Multiple frequencies: Yearly, Quarterly, Monthly, Weekly, Daily, Hourly
  • Standardized train/test splits for fair comparison
  • Established benchmark metrics (SMAPE, MASE, OWA)
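OWA (Overall Weighted Average), the last metric in the list above, is not defined elsewhere on this page and is not a method of this class. As a sketch of the standard M4 definition, it averages a method's sMAPE and MASE after dividing each by the corresponding score of the competition's Naive 2 benchmark, so a value below 1 means the method beats that benchmark:

```latex
\mathrm{OWA} = \frac{1}{2}\left(
    \frac{\mathrm{sMAPE}}{\mathrm{sMAPE}_{\mathrm{Naive2}}}
  + \frac{\mathrm{MASE}}{\mathrm{MASE}_{\mathrm{Naive2}}}
\right)
```

Here sMAPE and MASE are the scores averaged over all series being evaluated, and the Naive2 subscript marks the same averages for the benchmark forecasts.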

Why M4 matters:

  • Industry standard: Used by researchers and practitioners worldwide
  • Diverse data: Business, economic, demographic, and financial series
  • Published baselines: Compare your model against known benchmarks
  • Academic recognition: Results published in major forecasting journals

M4 Dataset Statistics:

Frequency   Series Count   Forecast Horizon (steps)   Training Length (min-max)
Yearly      23,000         6                          13-835 years
Quarterly   24,000         8                          16-866 quarters
Monthly     48,000         18                         42-2794 months
Weekly      359            13                         80-2597 weeks
Daily       4,227          14                         93-9919 days
Hourly      414            48                         700-960 hours

Constructors

M4DatasetLoader(M4Frequency, int, string?, bool)

Initializes a new instance of the M4DatasetLoader<T> class.

public M4DatasetLoader(M4Frequency frequency, int batchSize = 32, string? dataPath = null, bool autoDownload = true)

Parameters

frequency M4Frequency

The frequency of time series to load.

batchSize int

Batch size for loading series (default: 32).

dataPath string

Path to download/cache datasets (optional).

autoDownload bool

Whether to automatically download the dataset if not found locally.

Remarks

For Beginners: Using M4 datasets:

// Load monthly time series
var loader = new M4DatasetLoader<double>(
    M4Frequency.Monthly,
    batchSize: 32,
    autoDownload: true);

// Load the data
await loader.LoadAsync();

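// Train your model on each series
// (model is a placeholder for your own forecasting model)
foreach (var series in loader.TrainingSeries)
{
    model.Train(series.Values);
}

// Evaluate using test data: each training series is zipped with its test series
foreach (var (train, test) in loader.TrainingSeries.Zip(loader.TestSeries))
{
    var forecast = model.Forecast(train.Values, loader.ForecastHorizon);
    // CalculateSMAPE is static, so call it through the loader type
    var smape = M4DatasetLoader<double>.CalculateSMAPE(forecast, test.Values);
}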

Properties

Description

Gets a description of the dataset and its intended use.

public override string Description { get; }

Property Value

string

ForecastHorizon

Gets the forecast horizon for this frequency.

public int ForecastHorizon { get; }

Property Value

int

Frequency

Gets the frequency of the loaded time series.

public M4Frequency Frequency { get; }

Property Value

M4Frequency

Name

Gets the human-readable name of this data loader.

public override string Name { get; }

Property Value

string

Remarks

Examples: "MNIST", "Cora Citation Network", "IMDB Reviews"

SeriesCount

Gets the number of time series in the dataset.

public int SeriesCount { get; }

Property Value

int

TestSeries

Gets the test time series data (ground truth for evaluation).

public IReadOnlyList<M4TimeSeries<T>> TestSeries { get; }

Property Value

IReadOnlyList<M4TimeSeries<T>>

TotalCount

Gets the total number of samples in the dataset.

public override int TotalCount { get; }

Property Value

int

TrainingSeries

Gets the training time series data.

public IReadOnlyList<M4TimeSeries<T>> TrainingSeries { get; }

Property Value

IReadOnlyList<M4TimeSeries<T>>

Methods

CalculateMASE(IReadOnlyList<T>, IReadOnlyList<T>, IReadOnlyList<T>, int)

Calculates the Mean Absolute Scaled Error (MASE) for M4 evaluation.

public static T CalculateMASE(IReadOnlyList<T> forecast, IReadOnlyList<T> actual, IReadOnlyList<T> trainingData, int seasonalPeriod = 1)

Parameters

forecast IReadOnlyList<T>

The forecasted values.

actual IReadOnlyList<T>

The actual test values.

trainingData IReadOnlyList<T>

The training data used to compute the scaling factor.

seasonalPeriod int

The seasonal period for the scaling factor (default: 1 for non-seasonal).

Returns

T

The MASE score (lower is better, 1.0 equals naive forecast).

Remarks

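MASE is scale-independent: it divides the forecast's mean absolute error on the test values by the mean absolute error of a seasonal naive forecast computed on trainingData with the given seasonalPeriod. A MASE of 1.0 means the forecast's error equals that naive error; lower values indicate better performance.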
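The scaling described above can be sketched in a few lines. This is a minimal stand-alone illustration of the standard MASE formula using double, not the library's implementation; the class name MaseSketch and its behavior on constant training data (it throws) are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class MaseSketch
{
    public static double Mase(
        IReadOnlyList<double> forecast,
        IReadOnlyList<double> actual,
        IReadOnlyList<double> trainingData,
        int seasonalPeriod = 1)
    {
        if (forecast.Count != actual.Count || forecast.Count == 0)
            throw new ArgumentException("forecast and actual must be non-empty and the same length.");
        if (seasonalPeriod < 1 || trainingData.Count <= seasonalPeriod)
            throw new ArgumentException("trainingData must be longer than seasonalPeriod.");

        // Numerator: mean absolute error of the forecast on the test values.
        double forecastError = 0.0;
        for (int i = 0; i < forecast.Count; i++)
            forecastError += Math.Abs(actual[i] - forecast[i]);
        forecastError /= forecast.Count;

        // Denominator: mean absolute error of the seasonal naive forecast
        // (each value predicted by the value seasonalPeriod steps earlier)
        // measured on the training data.
        double naiveError = 0.0;
        for (int t = seasonalPeriod; t < trainingData.Count; t++)
            naiveError += Math.Abs(trainingData[t] - trainingData[t - seasonalPeriod]);
        naiveError /= trainingData.Count - seasonalPeriod;

        // Assumption: a constant training series has no usable scale, so fail loudly.
        if (naiveError == 0.0)
            throw new InvalidOperationException("Scaling factor is zero; MASE is undefined.");

        return forecastError / naiveError;
    }
}
```

For example, with training data {1, 2, 3, 4, 5}, actual values {7, 7} and forecast {6, 7}, the forecast error is 0.5 and the naive error with seasonalPeriod 1 is 1, giving a MASE of 0.5. In the M4 Competition the seasonal period was 4 for quarterly, 12 for monthly, and 24 for hourly data, and 1 for yearly, weekly, and daily data.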

CalculateSMAPE(IReadOnlyList<T>, IReadOnlyList<T>)

Calculates the Symmetric Mean Absolute Percentage Error (SMAPE) for M4 evaluation.

public static T CalculateSMAPE(IReadOnlyList<T> forecast, IReadOnlyList<T> actual)

Parameters

forecast IReadOnlyList<T>

The forecasted values.

actual IReadOnlyList<T>

The actual test values.

Returns

T

The SMAPE score (0-200, lower is better).

Remarks

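SMAPE is one of the two accuracy measures used in the M4 Competition, alongside MASE; the competition combines the two into OWA for its final ranking. Expressed as a percentage, SMAPE is bounded between 0% and 200%.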
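To make the 0-200 range concrete, here is a minimal stand-alone sketch of the sMAPE formula as defined for M4, written for double. It is not the library's implementation; the class name SmapeSketch and the handling of points where both values are zero (they contribute no error) are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class SmapeSketch
{
    // sMAPE in percent: (200 / h) * sum(|actual - forecast| / (|actual| + |forecast|)),
    // where h is the number of forecast points.
    public static double Smape(IReadOnlyList<double> forecast, IReadOnlyList<double> actual)
    {
        if (forecast.Count != actual.Count || forecast.Count == 0)
            throw new ArgumentException("forecast and actual must be non-empty and the same length.");

        double sum = 0.0;
        for (int i = 0; i < forecast.Count; i++)
        {
            double denominator = Math.Abs(actual[i]) + Math.Abs(forecast[i]);

            // Assumption: when both values are zero the term is skipped
            // instead of dividing zero by zero.
            if (denominator > 0.0)
                sum += Math.Abs(actual[i] - forecast[i]) / denominator;
        }

        return 200.0 * sum / forecast.Count;
    }
}
```

A perfect forecast scores 0, and a forecast of all zeros against positive actual values scores the maximum of 200. A forecast of {110, 90} against actual values {100, 100} scores about 10.03.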

GetNextBatch()

Gets the next batch of time series for iteration.

public List<M4TimeSeries<T>> GetNextBatch()

Returns

List<M4TimeSeries<T>>

A list of training time series in the current batch.

GetSeries(int)

Gets a specific time series by index.

public (M4TimeSeries<T> train, M4TimeSeries<T> test) GetSeries(int index)

Parameters

index int

The index of the series to retrieve.

Returns

(M4TimeSeries<T> train, M4TimeSeries<T> test)

A tuple containing the training series and its corresponding test values.

LoadDataCoreAsync(CancellationToken)

Core data loading implementation to be provided by derived classes.

protected override Task LoadDataCoreAsync(CancellationToken cancellationToken)

Parameters

cancellationToken CancellationToken

Cancellation token for async operation.

Returns

Task

A task that completes when loading is finished.

Remarks

Derived classes must implement this to perform actual data loading:

  • Load from files, databases, or remote sources
  • Parse and validate data format
  • Store in appropriate internal structures

OnReset()

Called after Reset() to allow derived classes to perform additional reset operations.

protected override void OnReset()

Remarks

Override this to reset any domain-specific state. The base indices are already reset when this is called.

UnloadDataCore()

Core data unloading implementation to be provided by derived classes.

protected override void UnloadDataCore()

Remarks

Derived classes should implement this to release resources:

  • Clear internal data structures
  • Release file handles or connections
  • Allow garbage collection of loaded data