Class BenchmarkResult<T>

Namespace
AiDotNet.Reasoning.Benchmarks.Models
Assembly
AiDotNet.dll

Results from evaluating a reasoning system on a benchmark.

public class BenchmarkResult<T>

Type Parameters

T

The numeric type used for metrics (e.g., double, float).

Inheritance
object ← BenchmarkResult<T>

Remarks

For Beginners: This is like a report card for your reasoning system's performance on a standardized test.

Key metrics:

  • Accuracy: Percentage of problems answered correctly (most important)
  • Total Evaluated: How many problems were tested
  • Correct Count: How many were answered correctly
  • Average Confidence: How confident the system was on average

Example:

Benchmark: GSM8K (Grade School Math)
Problems Evaluated: 100
Correct: 87
Accuracy: 87.0%
Average Confidence: 0.92
Average Time: 3.2 seconds per problem

This would indicate the system got 87 out of 100 math problems correct, with high confidence.
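The example above can be mirrored in code. This is an illustrative sketch only, assuming T = double and that you already have results from an evaluation run; populating the object by hand like this is not the library's evaluation workflow, just a way to see how the documented properties relate:

```csharp
using System;
using AiDotNet.Reasoning.Benchmarks.Models;

// Illustrative only: fill in a result by hand to mirror the example report above.
var result = new BenchmarkResult<double>
{
    BenchmarkName = "GSM8K",
    TotalEvaluated = 100,
    CorrectCount = 87,
    Accuracy = 0.87,            // CorrectCount / TotalEvaluated
    AverageConfidence = 0.92,
    TotalDuration = TimeSpan.FromSeconds(320)
};

// AverageTimePerProblem is read-only; presumably derived from the totals
// (320 s / 100 problems = 3.2 s per problem).
Console.WriteLine(result.GetSummary());
```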

Properties

Accuracy

Overall accuracy (correct / total) as a value between 0.0 and 1.0.

public T Accuracy { get; set; }

Property Value

T

Remarks

For Beginners: This is your "score" on the test. 0.87 means 87% correct, 1.0 means perfect score (100%).

AccuracyByCategory

Breakdown of accuracy by category (if applicable).

public Dictionary<string, T> AccuracyByCategory { get; set; }

Property Value

Dictionary<string, T>

Remarks

For Beginners: Shows performance in different areas. For example, in math: {"algebra": 0.92, "geometry": 0.78, "arithmetic": 0.95}. This helps identify strengths and weaknesses.
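Because AccuracyByCategory is an ordinary Dictionary<string, T>, standard LINQ works on it. A small sketch (assuming T = double; the dictionary contents are the made-up values from the example above) for finding the weakest area:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for result.AccuracyByCategory, using the example values above.
var byCategory = new Dictionary<string, double>
{
    ["algebra"] = 0.92,
    ["geometry"] = 0.78,
    ["arithmetic"] = 0.95
};

// Sort ascending by accuracy; the first entry is the weakest category.
var weakest = byCategory.OrderBy(kv => kv.Value).First();
Console.WriteLine($"Weakest area: {weakest.Key} ({weakest.Value:P0})");
```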

AverageConfidence

Average confidence across all evaluated problems.

public T AverageConfidence { get; set; }

Property Value

T

AverageTimePerProblem

Average time per problem.

public TimeSpan AverageTimePerProblem { get; }

Property Value

TimeSpan

BenchmarkName

Name of the benchmark that was evaluated.

public string BenchmarkName { get; set; }

Property Value

string

ConfidenceScores

Confidence scores for each evaluated problem (as a Vector).

public Vector<T>? ConfidenceScores { get; set; }

Property Value

Vector<T>

Remarks

For Beginners: Each problem gets a confidence score (0.0-1.0) indicating how sure the system was about its answer. High confidence with wrong answers indicates the system doesn't know what it doesn't know (bad calibration).
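One quick calibration smell test is to compare AverageConfidence against Accuracy: a large positive gap suggests the system is more confident than it is correct. A sketch assuming T = double; the 0.10 threshold is an illustrative choice, not a library default:

```csharp
using System;

// Stand-in values from the example report; in practice these come from
// result.AverageConfidence and result.Accuracy.
double accuracy = 0.87;
double averageConfidence = 0.92;

// Positive gap = confidence outruns correctness; threshold is arbitrary here.
double gap = averageConfidence - accuracy;
if (gap > 0.10)
    Console.WriteLine($"Possible overconfidence: gap of {gap:F2}");
else
    Console.WriteLine($"Confidence roughly tracks accuracy (gap {gap:F2})");
```

For a finer-grained view than this single-number comparison, per-problem scores in ConfidenceScores can be bucketed and compared against per-bucket accuracy.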

CorrectCount

Number of problems answered correctly.

public int CorrectCount { get; set; }

Property Value

int

Metrics

Additional benchmark-specific metrics.

public Dictionary<string, object> Metrics { get; set; }

Property Value

Dictionary<string, object>

ProblemResults

Detailed results for each evaluated problem.

public List<ProblemEvaluation<T>> ProblemResults { get; set; }

Property Value

List<ProblemEvaluation<T>>

TotalDuration

Total time spent evaluating all problems.

public TimeSpan TotalDuration { get; set; }

Property Value

TimeSpan

TotalEvaluated

Total number of problems evaluated.

public int TotalEvaluated { get; set; }

Property Value

int

Methods

GetSummary()

Gets a summary string of the benchmark results.

public string GetSummary()

Returns

string

ToString()

Returns a summary string.

public override string ToString()

Returns

string