Class BenchmarkResult<T>
- Namespace: AiDotNet.Reasoning.Benchmarks.Models
- Assembly: AiDotNet.dll
Results from evaluating a reasoning system on a benchmark.
public class BenchmarkResult<T>
Type Parameters
T
The numeric type used for metrics (e.g., double, float).
- Inheritance
- object → BenchmarkResult<T>
Remarks
For Beginners: This is like a report card for your reasoning system's performance on a standardized test.
Key metrics:
- Accuracy: Fraction of problems answered correctly (the most important metric)
- Total Evaluated: How many problems were tested
- Correct Count: How many were answered correctly
- Average Confidence: How confident the system was on average
Example:
Benchmark: GSM8K (Grade School Math)
Problems Evaluated: 100
Correct: 87
Accuracy: 87.0%
Average Confidence: 0.92
Average Time: 3.2 seconds per problem
This would indicate the system got 87 out of 100 math problems correct, with high confidence.
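To make the report-card analogy concrete, here is a minimal sketch of filling in and printing a result by hand. The property and method names come from this page; the specific values simply mirror the example above, and in practice a benchmark runner would populate these for you.
using System;
using AiDotNet.Reasoning.Benchmarks.Models;
// Minimal sketch: populating a BenchmarkResult<double> by hand.
var result = new BenchmarkResult<double>
{
    BenchmarkName = "GSM8K",
    TotalEvaluated = 100,
    CorrectCount = 87,
    Accuracy = 87.0 / 100.0,                   // correct / total = 0.87
    AverageConfidence = 0.92,
    TotalDuration = TimeSpan.FromSeconds(320)  // 3.2 s/problem x 100 problems
};
Console.WriteLine(result.GetSummary());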
Properties
Accuracy
Overall accuracy (correct / total) as a value between 0.0 and 1.0.
public T Accuracy { get; set; }
Property Value
- T
Remarks
For Beginners: This is your "score" on the test. 0.87 means 87% correct, 1.0 means perfect score (100%).
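Since the value is stored as a fraction, one line of standard .NET percent formatting turns it into the familiar score (using the result object from the sketch above):
// "P1" is the standard .NET percent format: 0.87 -> "87.0%" (culture-dependent).
Console.WriteLine($"Accuracy: {result.Accuracy:P1}");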
AccuracyByCategory
Breakdown of accuracy by category (if applicable).
public Dictionary<string, T> AccuracyByCategory { get; set; }
Property Value
- Dictionary<string, T>
Remarks
For Beginners: Shows performance in different areas. For example, on a math benchmark: {"algebra": 0.92, "geometry": 0.78, "arithmetic": 0.95}. This helps identify strengths and weaknesses.
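A small sketch of using this breakdown to find the weakest area, assuming the dictionary has been populated (continuing with T = double and the result object from the first sketch):
using System.Linq;
// Find the category with the lowest accuracy, e.g. "geometry" at 0.78.
if (result.AccuracyByCategory.Count > 0)
{
    var weakest = result.AccuracyByCategory.OrderBy(kv => kv.Value).First();
    Console.WriteLine($"Weakest category: {weakest.Key} ({weakest.Value:P1})");
}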
AverageConfidence
Average confidence across all evaluated problems.
public T AverageConfidence { get; set; }
Property Value
- T
AverageTimePerProblem
Average time per problem.
public TimeSpan AverageTimePerProblem { get; }
Property Value
- TimeSpan
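The property is read-only, so it is presumably derived from TotalDuration and TotalEvaluated; a hedged sketch of the equivalent calculation (the actual implementation inside AiDotNet may differ):
// Assumption: average time = total duration / number of problems evaluated.
TimeSpan average = result.TotalEvaluated > 0
    ? TimeSpan.FromTicks(result.TotalDuration.Ticks / result.TotalEvaluated)
    : TimeSpan.Zero;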
BenchmarkName
Name of the benchmark that was evaluated.
public string BenchmarkName { get; set; }
Property Value
- string
ConfidenceScores
Confidence scores for each evaluated problem (as a Vector).
public Vector<T>? ConfidenceScores { get; set; }
Property Value
- Vector<T>
Remarks
For Beginners: Each problem gets a confidence score (0.0-1.0) indicating how sure the system was about its answer. High confidence with wrong answers indicates the system doesn't know what it doesn't know (bad calibration).
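A calibration spot-check sketch built on this idea. Two assumptions not confirmed by this page: that Vector<T> exposes Length and an indexer, and that ProblemEvaluation<T> exposes an IsCorrect flag; adjust to the actual API if these differ.
// Flag answers that were wrong despite high confidence (poor calibration).
if (result.ConfidenceScores is not null)
{
    for (int i = 0; i < result.ConfidenceScores.Length; i++)
    {
        double confidence = result.ConfidenceScores[i];
        if (confidence > 0.9 && !result.ProblemResults[i].IsCorrect)
            Console.WriteLine($"Problem {i}: wrong answer at {confidence:P0} confidence.");
    }
}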
CorrectCount
Number of problems answered correctly.
public int CorrectCount { get; set; }
Property Value
- int
Metrics
Additional benchmark-specific metrics.
public Dictionary<string, object> Metrics { get; set; }
Property Value
- Dictionary<string, object>
ProblemResults
Detailed results for each evaluated problem.
public List<ProblemEvaluation<T>> ProblemResults { get; set; }
Property Value
- List<ProblemEvaluation<T>>
TotalDuration
Total time spent evaluating all problems.
public TimeSpan TotalDuration { get; set; }
Property Value
- TimeSpan
TotalEvaluated
Total number of problems evaluated.
public int TotalEvaluated { get; set; }
Property Value
- int
Methods
GetSummary()
Gets a summary string of the benchmark results.
public string GetSummary()
Returns
- string
ToString()
Returns a summary string.
public override string ToString()
Returns
- string
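Because ToString() is overridden to return a summary, a result can be dropped straight into string interpolation or logging without calling GetSummary() explicitly:
// ToString() returns a summary, so interpolation prints it directly.
Console.WriteLine($"Latest run: {result}");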