Class BenchmarkResult<T>
- Namespace: AiDotNet.Reasoning.Benchmarks.Models
- Assembly: AiDotNet.dll
Results from evaluating a reasoning system on a benchmark.
public class BenchmarkResult<T>
Type Parameters
T
The numeric type used for metrics (e.g., double, float).
- Inheritance
- object → BenchmarkResult<T>
Remarks
For Beginners: This is like a report card for your reasoning system's performance on a standardized test.
Key metrics:
- Accuracy: Fraction of problems answered correctly (the most important metric)
- Total Evaluated: How many problems were tested
- Correct Count: How many were answered correctly
- Average Confidence: How confident the system was on average
Example:
Benchmark: GSM8K (Grade School Math)
Problems Evaluated: 100
Correct: 87
Accuracy: 87.0%
Average Confidence: 0.92
Average Time: 3.2 seconds per problem
This would indicate the system got 87 out of 100 math problems correct, with high confidence.
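To make the report-card analogy concrete, here is a minimal sketch of filling in and printing a result by hand. The property and method names come from this page; the specific values simply mirror the example above, and in practice a benchmark runner would populate these for you.
using System;
using AiDotNet.Reasoning.Benchmarks.Models;
// Minimal sketch: populating a BenchmarkResult<double> by hand.
var result = new BenchmarkResult<double>
{
    BenchmarkName = "GSM8K",
    TotalEvaluated = 100,
    CorrectCount = 87,
    Accuracy = 87.0 / 100.0,                   // correct / total = 0.87
    AverageConfidence = 0.92,
    TotalDuration = TimeSpan.FromSeconds(320)  // 3.2 s/problem x 100 problems
};
Console.WriteLine(result.GetSummary());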
Properties
Accuracy
Overall accuracy (correct / total) as a value between 0.0 and 1.0.
public T Accuracy { get; set; }
Property Value
- T
Remarks
For Beginners: This is your "score" on the test. 0.87 means 87% correct, 1.0 means perfect score (100%).
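Since the value is stored as a fraction, one line of standard .NET percent formatting turns it into the familiar score (using the result object from the sketch above):
// "P1" is the standard .NET percent format: 0.87 -> "87.0%" (culture-dependent).
Console.WriteLine($"Accuracy: {result.Accuracy:P1}");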
AccuracyByCategory
Breakdown of accuracy by category (if applicable).
public Dictionary<string, T> AccuracyByCategory { get; set; }
Property Value
- Dictionary<string, T>
Remarks
For Beginners: Shows performance in different areas. For example, on a math benchmark: {"algebra": 0.92, "geometry": 0.78, "arithmetic": 0.95}. This helps identify strengths and weaknesses.
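A small sketch of using this breakdown to find the weakest area, assuming the dictionary has been populated (continuing with T = double and the result object from the first sketch):
using System.Linq;
// Find the category with the lowest accuracy, e.g. "geometry" at 0.78.
if (result.AccuracyByCategory.Count > 0)
{
    var weakest = result.AccuracyByCategory.OrderBy(kv => kv.Value).First();
    Console.WriteLine($"Weakest category: {weakest.Key} ({weakest.Value:P1})");
}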
AverageConfidence
Average confidence across all evaluated problems.
public T AverageConfidence { get; set; }
Property Value
- T
AverageTimePerProblem
Average time per problem.
public TimeSpan AverageTimePerProblem { get; }
Property Value
- TimeSpan
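The property is read-only, so it is presumably derived from TotalDuration and TotalEvaluated; a hedged sketch of the equivalent calculation (the actual implementation inside AiDotNet may differ):
// Assumption: average time = total duration / number of problems evaluated.
TimeSpan average = result.TotalEvaluated > 0
    ? TimeSpan.FromTicks(result.TotalDuration.Ticks / result.TotalEvaluated)
    : TimeSpan.Zero;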
BenchmarkName
Name of the benchmark that was evaluated.
public string BenchmarkName { get; set; }
Property Value
- string
ConfidenceScores
Confidence scores for each evaluated problem (as a Vector).
public Vector<T>? ConfidenceScores { get; set; }
Property Value
- Vector<T>
Remarks
For Beginners: Each problem gets a confidence score (0.0-1.0) indicating how sure the system was about its answer. High confidence with wrong answers indicates the system doesn't know what it doesn't know (bad calibration).
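A calibration spot-check sketch built on this idea. Two assumptions not confirmed by this page: that Vector<T> exposes Length and an indexer, and that ProblemEvaluation<T> exposes an IsCorrect flag; adjust to the actual API if these differ.
// Flag answers that were wrong despite high confidence (poor calibration).
if (result.ConfidenceScores is not null)
{
    for (int i = 0; i < result.ConfidenceScores.Length; i++)
    {
        double confidence = result.ConfidenceScores[i];
        if (confidence > 0.9 && !result.ProblemResults[i].IsCorrect)
            Console.WriteLine($"Problem {i}: wrong answer at {confidence:P0} confidence.");
    }
}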
CorrectCount
Number of problems answered correctly.
public int CorrectCount { get; set; }
Property Value
- int
Metrics
Additional benchmark-specific metrics.
public Dictionary<string, object> Metrics { get; set; }
Property Value
- Dictionary<string, object>
ProblemResults
Detailed results for each evaluated problem.
public List<ProblemEvaluation<T>> ProblemResults { get; set; }
Property Value
- List<ProblemEvaluation<T>>
TotalDuration
Total time spent evaluating all problems.
public TimeSpan TotalDuration { get; set; }
Property Value
- TimeSpan
TotalEvaluated
Total number of problems evaluated.
public int TotalEvaluated { get; set; }
Property Value
- int
Methods
GetSummary()
Gets a summary string of the benchmark results.
public string GetSummary()
Returns
- string
ToString()
Returns a summary string.
public override string ToString()
Returns
- string
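Because ToString() is overridden to return a summary, a result can be dropped straight into string interpolation or logging without calling GetSummary() explicitly:
// ToString() returns a summary, so interpolation prints it directly.
Console.WriteLine($"Latest run: {result}");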