Enum BenchmarkSuite

Namespace
AiDotNet.Enums
Assembly
AiDotNet.dll

Defines the supported benchmark suites available through the AiDotNet facade.

public enum BenchmarkSuite

Fields

ARCAGI = 13

ARC-AGI - abstract reasoning puzzles.

BoolQ = 15

BoolQ - yes/no question answering.

CIFAR10 = 6

CIFAR-10 - federated image classification (synthetic partitioning of standard CIFAR-10).

CIFAR100 = 7

CIFAR-100 - federated image classification (synthetic partitioning of standard CIFAR-100).

CommonsenseQA = 17

CommonsenseQA - commonsense multiple-choice QA.

DROP = 14

DROP - reading comprehension with discrete reasoning over paragraphs.

FEMNIST = 1

FEMNIST - LEAF federated handwritten character classification (per-writer partitioning).

GSM8K = 9

Grade School Math 8K (GSM8K) - multi-step math word problems.

HellaSwag = 19

HellaSwag - commonsense inference in narrative completion.

HumanEval = 20

HumanEval - code generation / program synthesis evaluation.

LEAF = 0

LEAF - federated benchmark suite (JSON-based train/test splits).

LogiQA = 22

LogiQA - logical reasoning benchmark.

MATH = 10

MATH - competition-style mathematics problems.

MBPP = 21

MBPP - Mostly Basic Programming Problems (entry-level Python code generation tasks).

MMLU = 11

MMLU - broad multi-subject multiple-choice benchmark.

PIQA = 16

PIQA - physical commonsense reasoning.

Reddit = 4

Reddit - federated next-token prediction benchmark (LEAF Reddit dataset).

Sent140 = 2

Sent140 - LEAF federated sentiment classification benchmark based on tweets.

Shakespeare = 3

Shakespeare - LEAF federated next-character prediction benchmark.

StackOverflow = 5

StackOverflow - federated next-token prediction benchmark (StackOverflow corpus).

TabularNonIID = 8

Generic tabular suite with synthetic non-IID client partitions.

TruthfulQA = 12

TruthfulQA - evaluates truthfulness and resistance to hallucination.

WinoGrande = 18

WinoGrande - pronoun resolution / commonsense reasoning.

Remarks

For Beginners: A benchmark suite is like a standardized test you can run to measure how well your model performs on a specific family of problems. You select a suite using this enum, and AiDotNet runs the benchmark and returns a structured report.
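The selection step described above can be sketched in C#. The enum and its values come from this page; the facade entry point (`RunBenchmark`) and its report type are hypothetical names used only for illustration, not part of the documented API. The `Enum` helper calls are standard .NET.

```csharp
using System;
using AiDotNet.Enums;

// Pick a suite by enum value (real, documented above).
var suite = BenchmarkSuite.GSM8K;

// Hypothetical facade usage -- method and report names are assumptions:
// var report = AiDotNetFacade.RunBenchmark(suite, model);
// Console.WriteLine(report.Summary);

// Standard .NET enum helpers work with BenchmarkSuite as usual,
// e.g. round-tripping a suite by its name or underlying value:
var parsed = Enum.Parse<BenchmarkSuite>("GSM8K");
Console.WriteLine((int)parsed); // 9, matching the field list above
```

Because the fields have explicit numeric values (LEAF = 0 through LogiQA = 22), casting an `int` from configuration back to `BenchmarkSuite` is stable across releases as long as existing values are not renumbered.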