Class ModelStats<T, TInput, TOutput>
- Namespace: AiDotNet.Statistics
- Assembly: AiDotNet.dll
Represents a collection of statistical metrics for evaluating and analyzing machine learning models.
public class ModelStats<T, TInput, TOutput>
Type Parameters
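- T: The numeric type used for calculations (typically float or double).
- TInput: The type of the input data (see the Features property).
- TOutput: The type of the actual and predicted output values (see the Actual and Predicted properties).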
- Inheritance: object → ModelStats<T, TInput, TOutput>
Remarks
This class calculates and stores various statistical measures that help assess the performance, fit, and characteristics of a machine learning model. It includes metrics for model accuracy, feature importance, model complexity, and various distance and similarity measures.
For Beginners: Think of ModelStats as a report card for your AI model.
Just like a school report card shows how well a student is doing in different subjects, ModelStats shows how well your AI model is performing in different areas. It helps you:
- Understand how accurate your model's predictions are
- See which features (inputs) are most important
- Check if your model is too simple or too complex
- Compare your model's performance to simpler alternatives
This information helps you improve your model and decide if it's ready to use in real-world situations.
Properties
Actual
Gets the actual (observed) values from the dataset.
public TOutput Actual { get; }
Property Value
- TOutput
AutoCorrelationFunction
Gets the Auto-Correlation Function, which measures the correlation between a time series and a lagged version of itself.
public Vector<T> AutoCorrelationFunction { get; }
Property Value
- Vector<T>
Remarks
For Beginners: This function helps you understand patterns in time-based data. It shows how similar your data is to itself at different time delays (lags). This can reveal:
- Repeating patterns (like seasonal effects)
- How long effects last in your data
- Whether your model is missing important time-based patterns
It's particularly useful for time series data, like stock prices or weather patterns.
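A common sample estimator for the auto-correlation function divides the lagged cross-products of the centered series by its total sum of squares. The sketch below is a standalone illustration over a plain `double[]` rather than AiDotNet's `Vector<T>`; the class `AcfSketch` is hypothetical, and the library's own normalization may differ.

```csharp
using System;

public static class AcfSketch
{
    // Sample autocorrelation for lags 0..maxLag:
    // r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    // A constant series has zero variance, so every r_k would be NaN.
    public static double[] Acf(double[] x, int maxLag)
    {
        int n = x.Length;
        double mean = 0;
        foreach (double v in x) mean += v;
        mean /= n;

        double denom = 0;
        foreach (double v in x) denom += (v - mean) * (v - mean);

        var acf = new double[maxLag + 1];
        for (int k = 0; k <= maxLag; k++)
        {
            double num = 0;
            for (int t = 0; t + k < n; t++)
                num += (x[t] - mean) * (x[t + k] - mean);
            acf[k] = num / denom;
        }
        return acf;
    }
}
```

For the alternating series 1, -1, 1, -1 this returns 1, -0.75 and 0.5 for lags 0 to 2: the negative value at lag 1 and the positive value at lag 2 reflect the period-2 pattern.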
CalinskiHarabaszIndex
Gets the Calinski-Harabasz index, a measure of cluster separation.
public T CalinskiHarabaszIndex { get; }
Property Value
- T
Remarks
For Beginners: This index tells you how well-separated your groups (clusters) are. Higher values mean your groups are more distinct from each other, which is generally better. It's useful when comparing different ways of grouping your data.
ConditionNumber
Gets the condition number, a measure of the model's numerical stability.
public T ConditionNumber { get; }
Property Value
- T
Remarks
For Beginners: The condition number tells you if small changes in your data might cause big changes in your model's predictions. A high condition number (typically above 30) suggests that your model might be unstable and sensitive to small data changes.
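The 2-norm condition number of a matrix is the ratio of its largest to its smallest singular value. As a hedged illustration, the sketch below computes it for a hypothetical 2x2 matrix from the closed-form eigenvalues of A^T A. Which matrix ModelStats analyses is not stated here, and real code would use a singular value decomposition, which is more accurate for nearly singular matrices. The class name is hypothetical.

```csharp
using System;

public static class ConditionNumberSketch
{
    // Condition number of the 2x2 matrix [[a, b], [c, d]] as sigma_max / sigma_min,
    // where the singular values sigma are square roots of the eigenvalues of A^T A.
    public static double ConditionNumber2x2(double a, double b, double c, double d)
    {
        // Entries of the symmetric matrix M = A^T A.
        double m11 = a * a + c * c;
        double m12 = a * b + c * d;
        double m22 = b * b + d * d;

        // Eigenvalues of a symmetric 2x2 matrix: mid +/- sqrt(mid^2 - det).
        double mid = (m11 + m22) / 2;
        double det = m11 * m22 - m12 * m12;
        double spread = Math.Sqrt(Math.Max(mid * mid - det, 0));
        double lambdaMax = mid + spread;
        double lambdaMin = mid - spread;

        // A singular matrix has an infinite condition number.
        if (lambdaMin <= 0) return double.PositiveInfinity;
        return Math.Sqrt(lambdaMax / lambdaMin);
    }
}
```

The diagonal matrix [[10, 0], [0, 1]] gives 10, the identity gives 1, and the singular matrix [[1, 2], [2, 4]] gives positive infinity.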
CorrelationMatrix
Gets the correlation matrix showing relationships between features.
public Matrix<T> CorrelationMatrix { get; }
Property Value
- Matrix<T>
Remarks
For Beginners: This matrix shows how closely related your features are to each other. Values close to 1 or -1 mean strong relationships, while values near 0 mean weak relationships. This helps you understand which features might be providing similar information.
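A Pearson correlation matrix divides each pair's centered cross-product sum by the square root of the two features' own sums of squares. The sketch below takes samples as rows and features as columns of a plain `double[,]`; it is a standalone illustration with a hypothetical class name, not the library's `Matrix<T>` code.

```csharp
using System;

public static class CorrelationSketch
{
    // Pearson correlation matrix: corr[a, b] = cov(a, b) / (sd(a) * sd(b)).
    // A feature with zero variance produces NaN entries.
    public static double[,] CorrelationMatrix(double[,] data)
    {
        int rows = data.GetLength(0), cols = data.GetLength(1);
        var means = new double[cols];
        for (int j = 0; j < cols; j++)
        {
            for (int i = 0; i < rows; i++) means[j] += data[i, j];
            means[j] /= rows;
        }

        // Sums of centered cross-products; the 1/(n - 1) factor cancels in the ratio.
        var s = new double[cols, cols];
        for (int a = 0; a < cols; a++)
            for (int b = 0; b < cols; b++)
                for (int i = 0; i < rows; i++)
                    s[a, b] += (data[i, a] - means[a]) * (data[i, b] - means[b]);

        var corr = new double[cols, cols];
        for (int a = 0; a < cols; a++)
            for (int b = 0; b < cols; b++)
                corr[a, b] = s[a, b] / Math.Sqrt(s[a, a] * s[b, b]);
        return corr;
    }
}
```

With the rows (1, 2, 6), (2, 4, 4) and (3, 6, 2), the second feature rises exactly with the first and the third falls exactly as the first rises, so the off-diagonal entries against the first feature come out as 1 and -1.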
CosineSimilarity
Gets the cosine similarity between actual and predicted values.
public T CosineSimilarity { get; }
Property Value
- T
Remarks
For Beginners: This measures how similar the direction of your predictions is to the actual values, ignoring their magnitude. Values closer to 1 indicate more similar directions.
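Cosine similarity is the dot product of the two vectors divided by the product of their lengths. A standalone sketch over `double[]` follows; the class name is hypothetical and this is not the library's implementation.

```csharp
using System;

public static class CosineSketch
{
    // Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    // The result lies in [-1, 1]; it is NaN if either vector is all zeros.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Predictions of 2, 4, 6 against actual values of 1, 2, 3 score 1 even though every prediction is twice too large, which is exactly the "ignoring magnitude" behaviour described above.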
CovarianceMatrix
Gets the covariance matrix showing how features vary together.
public Matrix<T> CovarianceMatrix { get; }
Property Value
- Matrix<T>
Remarks
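For Beginners: This matrix shows how features change together. It's similar to the correlation matrix, but its entries are not standardized: each value is expressed in the combined units of the two features involved, so its size depends on how those features are scaled. It helps identify patterns in how your features behave together.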
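A sample covariance matrix with the n - 1 denominator can be sketched as below, with samples as rows and features as columns. This is a standalone illustration with a hypothetical class name; ModelStats may use a different denominator or data layout.

```csharp
public static class CovarianceSketch
{
    // Sample covariance matrix (n - 1 denominator); rows are samples, columns are features.
    public static double[,] CovarianceMatrix(double[,] data)
    {
        int rows = data.GetLength(0), cols = data.GetLength(1);
        var means = new double[cols];
        for (int j = 0; j < cols; j++)
        {
            for (int i = 0; i < rows; i++) means[j] += data[i, j];
            means[j] /= rows;
        }

        var cov = new double[cols, cols];
        for (int a = 0; a < cols; a++)
            for (int b = 0; b < cols; b++)
            {
                double sum = 0;
                for (int i = 0; i < rows; i++)
                    sum += (data[i, a] - means[a]) * (data[i, b] - means[b]);
                cov[a, b] = sum / (rows - 1);
            }
        return cov;
    }
}
```

For the rows (1, 2), (2, 4) and (3, 6) the second feature is exactly twice the first, so their correlation is 1; the covariance entries, however, are 1, 2 and 4 because they carry the features' scales.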
DaviesBouldinIndex
Gets the Davies-Bouldin index, a measure of the average similarity between each cluster and its most similar cluster.
public T DaviesBouldinIndex { get; }
Property Value
- T
Remarks
For Beginners: This index helps you understand how well-separated your groups (clusters) are. Lower values are better, meaning your groups are more distinct from each other. It's particularly useful when you're not sure how many groups to divide your data into.
EffectiveNumberOfParameters
Gets the effective number of parameters in the model.
public T EffectiveNumberOfParameters { get; }
Property Value
- T
Remarks
For Beginners: This estimates how complex your model is in practice. It might be different from the actual number of parameters and helps identify if your model is overfitting (using more complexity than needed to explain the data).
EuclideanDistance
Gets the Euclidean distance between actual and predicted values.
public T EuclideanDistance { get; }
Property Value
- T
Remarks
For Beginners: This measures the straight-line distance between your actual and predicted values. Lower values indicate predictions that are closer to the actual values.
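For two equal-length vectors, the Euclidean distance is the square root of the sum of squared differences. A standalone sketch over `double[]` follows (not the library's own types; the class name is hypothetical).

```csharp
using System;

public static class EuclideanSketch
{
    // Straight-line distance: sqrt(sum_i (actual_i - predicted_i)^2).
    public static double EuclideanDistance(double[] actual, double[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double sum = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.Sqrt(sum);
    }
}
```

Errors of 3 and 4 on two predictions give a distance of 5.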
FeatureCount
Gets the number of features (input variables) used in the model.
public int FeatureCount { get; }
Property Value
- int
Remarks
For Beginners: This is the number of different pieces of information your model uses to make predictions. For example, if you're predicting house prices, features might include size, number of bedrooms, location, etc.
FeatureNames
Gets the names of the features used in the model.
public List<string> FeatureNames { get; }
Property Value
- List<string>
FeatureValues
Gets a dictionary mapping feature names to their values.
public Dictionary<string, TOutput> FeatureValues { get; }
Property Value
- Dictionary<string, TOutput>
Features
Gets the feature values used in the model.
public TInput Features { get; }
Property Value
- TInput
HammingDistance
Gets the Hamming distance between actual and predicted values.
public T HammingDistance { get; }
Property Value
- T
Remarks
For Beginners: This counts how many predictions are different from the actual values. It's most useful for categorical predictions. Lower values indicate fewer differences.
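Counting mismatched positions is all the Hamming distance needs. This sketch returns a raw count over any label type; ModelStats exposes the value as T and may report a count or a normalized fraction, which is not stated here. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class HammingSketch
{
    // Number of positions at which two equal-length label sequences differ.
    public static int HammingDistance<TLabel>(TLabel[] actual, TLabel[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Sequences must have the same length.");
        var comparer = EqualityComparer<TLabel>.Default;
        int count = 0;
        for (int i = 0; i < actual.Length; i++)
            if (!comparer.Equals(actual[i], predicted[i])) count++;
        return count;
    }
}
```

Actual labels "cat", "dog", "cat" against predictions "cat", "cat", "cat" give a distance of 1.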
JaccardSimilarity
Gets the Jaccard similarity between actual and predicted values.
public T JaccardSimilarity { get; }
Property Value
- T
Remarks
For Beginners: This measures the overlap between your actual and predicted values. It's especially useful for binary (yes/no) predictions. Values closer to 1 indicate more overlap.
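For binary predictions, the Jaccard similarity is the number of positions where both vectors are true divided by the number of positions where either is true. This standalone sketch uses a hypothetical class name; the choice to return 1 when both vectors are entirely false is a convention assumed here, not something the source specifies.

```csharp
using System;

public static class JaccardSketch
{
    // Jaccard similarity for binary labels: |A ∩ B| / |A ∪ B|, where A and B are
    // the positions labelled true in the actual and predicted vectors.
    public static double JaccardSimilarity(bool[] actual, bool[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Vectors must have the same length.");
        int intersection = 0, union = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            if (actual[i] && predicted[i]) intersection++;
            if (actual[i] || predicted[i]) union++;
        }
        // Assumed convention: two all-false vectors are treated as identical.
        return union == 0 ? 1.0 : (double)intersection / union;
    }
}
```

Actual (true, true, false, false) against predicted (true, false, true, false) share one positive out of three positions that are positive in either, giving 1/3.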
LeaveOneOutPredictiveDensities
Gets the leave-one-out predictive densities for each data point.
public List<T> LeaveOneOutPredictiveDensities { get; }
Property Value
- List<T>
Remarks
For Beginners: This shows how well the model predicts each data point when it's trained without that point. It helps identify which data points might be harder for the model to predict accurately.
LogLikelihood
Gets the log-likelihood of the model.
public T LogLikelihood { get; }
Property Value
- T
Remarks
For Beginners: This measures how probable your data is under your model. Higher values mean your model fits the data better. It is the basis of model-comparison criteria such as AIC and BIC.
LogPointwisePredictiveDensity
Gets the log pointwise predictive density, a measure of prediction accuracy.
public T LogPointwisePredictiveDensity { get; }
Property Value
- T
Remarks
For Beginners: This is a way to measure how well your model's predictions match the actual data. Higher values generally indicate better predictions. It's particularly useful when comparing different models.
MahalanobisDistance
Gets the Mahalanobis distance between actual and predicted values.
public T MahalanobisDistance { get; }
Property Value
- T
Remarks
For Beginners: This is an advanced distance measure that takes into account how your features are related to each other. It can be more meaningful than simpler distances when your features are correlated.
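In two dimensions, the Mahalanobis distance sqrt(d^T S^-1 d) can be computed with an explicit 2x2 inverse of the covariance matrix S. In this sketch S is supplied by the caller; which covariance ModelStats uses is not stated here, so treat that choice as an assumption. The class name is hypothetical.

```csharp
using System;

public static class MahalanobisSketch
{
    // Mahalanobis distance sqrt(d^T S^-1 d) for the 2-D difference d = x - y,
    // given a symmetric 2x2 covariance matrix S = [[s11, s12], [s12, s22]].
    public static double Mahalanobis2D(double[] x, double[] y, double s11, double s12, double s22)
    {
        if (x.Length != 2 || y.Length != 2)
            throw new ArgumentException("This sketch handles 2-D points only.");
        double det = s11 * s22 - s12 * s12;
        if (s11 <= 0 || det <= 0)
            throw new ArgumentException("Covariance matrix must be positive definite.");

        double d1 = x[0] - y[0];
        double d2 = x[1] - y[1];

        // S^-1 = (1 / det) * [[s22, -s12], [-s12, s11]], expanded into the quadratic form.
        double quadratic = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det;
        return Math.Sqrt(quadratic);
    }
}
```

With S equal to the identity this reduces to the Euclidean distance (5 for a difference of (3, 4)); with a variance of 4 along the first axis, a difference of 2 along that axis counts as a distance of only 1.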
ManhattanDistance
Gets the Manhattan distance between actual and predicted values.
public T ManhattanDistance { get; }
Property Value
- T
Remarks
For Beginners: This measures the distance between actual and predicted values as if you could only move horizontally or vertically (like navigating city blocks). Lower values indicate better predictions.
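The Manhattan distance sums the absolute differences along each coordinate. A standalone sketch over `double[]` with a hypothetical class name:

```csharp
using System;

public static class ManhattanSketch
{
    // City-block distance: sum_i |actual_i - predicted_i|.
    public static double ManhattanDistance(double[] actual, double[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double sum = 0;
        for (int i = 0; i < actual.Length; i++)
            sum += Math.Abs(actual[i] - predicted[i]);
        return sum;
    }
}
```

Errors of 3 and 4 give a Manhattan distance of 7, larger than the straight-line distance of 5 for the same errors.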
MarginalLikelihood
Gets the marginal likelihood of the model.
public T MarginalLikelihood { get; }
Property Value
- T
Remarks
For Beginners: This is a measure of how well your model fits the data, taking into account its complexity. It helps in comparing different models, with higher values generally indicating better models.
MeanAveragePrecision
Gets the Mean Average Precision, a measure of ranking quality.
public T MeanAveragePrecision { get; }
Property Value
- T
Remarks
For Beginners: This measures how well your model ranks items, especially in search or recommendation systems. It ranges from 0 to 1, where 1 is perfect. It considers both the order of your predictions and their accuracy. For example, in a search engine, it would measure how well the most relevant results are placed at the top.
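Average precision for one ranked list averages precision@k over the ranks k that hold a relevant item; MAP then averages that value over queries. This sketch assumes every relevant item appears somewhere in the list (so the denominator is the number of hits) and represents each query's ranking as a `bool[]` of relevance flags. The representation and class name are hypothetical, not the library's API.

```csharp
using System;
using System.Collections.Generic;

public static class MapSketch
{
    // Average precision for one ranked list: the mean of precision@k over the
    // positions k that hold a relevant item. Assumes all relevant items are in the list.
    public static double AveragePrecision(bool[] relevantAtRank)
    {
        int hits = 0;
        double sumPrecision = 0;
        for (int k = 0; k < relevantAtRank.Length; k++)
        {
            if (!relevantAtRank[k]) continue;
            hits++;
            sumPrecision += (double)hits / (k + 1); // precision at rank k + 1
        }
        return hits == 0 ? 0.0 : sumPrecision / hits;
    }

    // Mean Average Precision: the average of AveragePrecision over all queries.
    public static double MeanAveragePrecision(IReadOnlyList<bool[]> queries)
    {
        double total = 0;
        foreach (var ranking in queries) total += AveragePrecision(ranking);
        return total / queries.Count;
    }
}
```

A query ranked (relevant, not relevant, relevant) has an average precision of (1 + 2/3) / 2 = 5/6; paired with a query whose only relevant item is second (average precision 1/2), the MAP is 2/3.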
MeanReciprocalRank
Gets the Mean Reciprocal Rank, a statistic measuring the performance of a system that produces a list of possible responses to a query.
public T MeanReciprocalRank { get; }
Property Value
- T
Remarks
For Beginners: This measures how well your model places the first correct answer in a list of predictions. It ranges from 0 to 1, where 1 means the correct answer is always first. It's often used in question-answering systems or search engines to measure how quickly a user might find the right answer.
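Mean Reciprocal Rank averages 1 / rank of the first correct answer across queries, counting 0 for a query with no correct answer in its list. In this sketch each query is represented as a hypothetical `bool[]` marking which ranked positions are relevant; the class name is also hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MrrSketch
{
    // Reciprocal rank of the first relevant item in each ranked list (0 if none),
    // averaged over all queries.
    public static double MeanReciprocalRank(IReadOnlyList<bool[]> queries)
    {
        double total = 0;
        foreach (var ranking in queries)
        {
            int first = Array.IndexOf(ranking, true);
            if (first >= 0) total += 1.0 / (first + 1);
        }
        return total / queries.Count;
    }
}
```

Three queries whose first correct answers sit at rank 2, rank 1, and nowhere give (1/2 + 1 + 0) / 3 = 0.5.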
Model
Gets the full model being evaluated.
public IFullModel<T, TInput, TOutput>? Model { get; }
Property Value
- IFullModel<T, TInput, TOutput>
MutualInformation
Gets the mutual information between actual and predicted values.
public T MutualInformation { get; }
Property Value
- T
Remarks
For Beginners: This measures how much information your predictions provide about the actual values. Higher values mean your predictions are more informative and closely related to the actual values.
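For discrete labels, mutual information can be estimated from the empirical joint distribution as the sum over label pairs of p(x,y) log(p(x,y) / (p(x) p(y))). The sketch below uses natural logarithms (nats) and integer labels; whether ModelStats bins continuous predictions first, or uses a different log base, is not stated here. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MutualInformationSketch
{
    // Mutual information I(X; Y) in nats, estimated from the empirical joint
    // distribution of two equal-length sequences of discrete labels.
    public static double MutualInformation(int[] x, int[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Sequences must have the same length.");
        int n = x.Length;
        var joint = new Dictionary<(int, int), int>();
        var countX = new Dictionary<int, int>();
        var countY = new Dictionary<int, int>();
        for (int i = 0; i < n; i++)
        {
            joint.TryGetValue((x[i], y[i]), out int cxy);
            joint[(x[i], y[i])] = cxy + 1;
            countX.TryGetValue(x[i], out int cx);
            countX[x[i]] = cx + 1;
            countY.TryGetValue(y[i], out int cy);
            countY[y[i]] = cy + 1;
        }

        double mi = 0;
        foreach (var kv in joint)
        {
            double pxy = (double)kv.Value / n;
            double px = (double)countX[kv.Key.Item1] / n;
            double py = (double)countY[kv.Key.Item2] / n;
            mi += pxy * Math.Log(pxy / (px * py));
        }
        return mi;
    }
}
```

Predictions that match balanced binary labels exactly give ln 2 (about 0.693 nats), while predictions independent of the labels give 0.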
NormalizedDiscountedCumulativeGain
Gets the Normalized Discounted Cumulative Gain, a measure of ranking quality that takes the position of correct items into account.
public T NormalizedDiscountedCumulativeGain { get; }
Property Value
- T
Remarks
For Beginners: This measures how well your model ranks items, giving more importance to correct predictions at the top of the list. It ranges from 0 to 1, where 1 is perfect. It's often used in search engines or recommendation systems to ensure the most relevant items appear first.
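NDCG divides the discounted cumulative gain of the model's ranking by that of the ideal ranking. The sketch below uses the linear-gain form rel_k / log2(k + 2) with 0-based positions; another common variant uses 2^rel - 1 as the gain, and which one ModelStats uses is not stated here. The class name and input representation (graded relevance in ranked order) are hypothetical.

```csharp
using System;

public static class NdcgSketch
{
    // DCG with linear gain: sum over 0-based positions k of rel_k / log2(k + 2).
    private static double Dcg(double[] relevance)
    {
        double dcg = 0;
        for (int k = 0; k < relevance.Length; k++)
            dcg += relevance[k] / Math.Log(k + 2, 2);
        return dcg;
    }

    // NDCG = DCG of the model's ranking / DCG of the ideal (descending) ranking.
    // Returns 0 when no item is relevant, since the ideal DCG is then 0.
    public static double Ndcg(double[] relevanceInRankedOrder)
    {
        var ideal = (double[])relevanceInRankedOrder.Clone();
        Array.Sort(ideal);
        Array.Reverse(ideal);
        double idcg = Dcg(ideal);
        return idcg == 0 ? 0.0 : Dcg(relevanceInRankedOrder) / idcg;
    }
}
```

A ranking that is already in ideal order, such as relevances 3, 2, 1, scores 1; placing the only relevant item second instead of first lowers the score to 1 / log2(3), about 0.63.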
NormalizedMutualInformation
Gets the normalized mutual information between actual and predicted values.
public T NormalizedMutualInformation { get; }
Property Value
- T
Remarks
For Beginners: This is similar to mutual information, but scaled to be between 0 and 1. It's easier to interpret across different datasets. Values closer to 1 indicate better predictions.
ObservedTestStatistic
Gets the observed test statistic for model evaluation.
public T ObservedTestStatistic { get; }
Property Value
- T
Remarks
For Beginners: This is a single number that summarizes how well your model fits the data. It's used in statistical tests to determine if your model is significantly better than a simpler alternative.
PartialAutoCorrelationFunction
Gets the Partial Auto-Correlation Function, which measures the direct relationship between an observation and its lagged values after removing the influence of the intermediate lags.
public Vector<T> PartialAutoCorrelationFunction { get; }
Property Value
- Vector<T>
Remarks
For Beginners: This function is similar to the Auto-Correlation Function, but it focuses on the direct relationship between data points at different time delays. It helps you:
- Identify how many past time points directly influence the current point
- Decide how many past observations to use in time series models
- Understand the "memory" of your time series data
It's often used in more advanced time series analysis and forecasting.
PosteriorPredictiveSamples
Gets samples from the posterior predictive distribution.
public List<T> PosteriorPredictiveSamples { get; }
Property Value
- List<T>
Remarks
For Beginners: These are simulated predictions drawn from your model while accounting for its uncertainty about its parameters. Taken together, they show the range of predictions your model considers plausible and how uncertain those predictions are.
Predicted
Gets the predicted values from the model.
public TOutput Predicted { get; }
Property Value
- TOutput
ReferenceModelMarginalLikelihood
Gets the marginal likelihood of a reference (simpler) model.
public T ReferenceModelMarginalLikelihood { get; }
Property Value
- T
Remarks
For Beginners: This is the marginal likelihood for a basic, simple model. It's used as a comparison point to see how much better your more complex model performs.
SilhouetteScore
Gets the silhouette score, a measure of how similar an object is to its own cluster compared to other clusters.
public T SilhouetteScore { get; }
Property Value
- T
Remarks
For Beginners: This score helps you understand if your model is grouping similar things together well. It ranges from -1 to 1, where:
- Values close to 1 mean your groups (clusters) are well-defined
- Values close to 0 mean your groups overlap a lot
- Values close to -1 mean some data points might be in the wrong group
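The silhouette score compares each point's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), via s = (b - a) / max(a, b), and averages s over all points. This hedged sketch works on 1-D points with absolute difference as the distance; ModelStats may use another distance and data layout, and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SilhouetteSketch
{
    // Mean silhouette score for 1-D points with integer cluster labels.
    // For each point: a = mean distance to its own cluster, b = lowest mean distance
    // to another cluster, s = (b - a) / max(a, b). Points alone in their cluster get s = 0.
    public static double SilhouetteScore(double[] points, int[] labels)
    {
        int n = points.Length;
        if (new HashSet<int>(labels).Count < 2)
            throw new ArgumentException("At least two clusters are required.");

        double total = 0;
        for (int i = 0; i < n; i++)
        {
            // Sum and count of distances from point i to the points of each cluster.
            var sums = new Dictionary<int, double>();
            var counts = new Dictionary<int, int>();
            for (int j = 0; j < n; j++)
            {
                if (j == i) continue;
                sums.TryGetValue(labels[j], out double s);
                counts.TryGetValue(labels[j], out int c);
                sums[labels[j]] = s + Math.Abs(points[i] - points[j]);
                counts[labels[j]] = c + 1;
            }

            int own = labels[i];
            if (!counts.ContainsKey(own)) continue; // singleton cluster contributes 0

            double a = sums[own] / counts[own];
            double b = double.PositiveInfinity;
            foreach (var kv in sums)
                if (kv.Key != own) b = Math.Min(b, kv.Value / counts[kv.Key]);

            double denom = Math.Max(a, b);
            if (denom > 0) total += (b - a) / denom;
        }
        return total / n;
    }
}
```

Two tight, well-separated groups, {0, 1} and {10, 11}, score 359/399 (about 0.90), close to the "well-defined" end of the range.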
VIFList
Gets the Variance Inflation Factor (VIF) for each feature.
public List<T> VIFList { get; }
Property Value
- List<T>
Remarks
For Beginners: VIF helps identify if some features are too similar to others. High VIF values (usually above 5 or 10) suggest that a feature might be redundant, as its information is already captured by other features.
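In general, the VIF of feature j is 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features. With exactly two features that R^2 equals the squared Pearson correlation r^2, which the sketch below exploits. It covers the two-feature case only and uses a hypothetical class name; it is not the library's implementation.

```csharp
public static class VifSketch
{
    // VIF for either feature in a two-feature model: 1 / (1 - r^2), where r is the
    // Pearson correlation between the features. Perfectly collinear features give
    // infinity (or a very large value because of rounding).
    public static double TwoFeatureVif(double[] x1, double[] x2)
    {
        int n = x1.Length;
        double m1 = 0, m2 = 0;
        for (int i = 0; i < n; i++) { m1 += x1[i]; m2 += x2[i]; }
        m1 /= n;
        m2 /= n;

        double s12 = 0, s11 = 0, s22 = 0;
        for (int i = 0; i < n; i++)
        {
            double d1 = x1[i] - m1, d2 = x2[i] - m2;
            s12 += d1 * d2;
            s11 += d1 * d1;
            s22 += d2 * d2;
        }
        double r2 = s12 * s12 / (s11 * s22);
        return 1.0 / (1.0 - r2);
    }
}
```

Features (1, 2, 3) and (1, 3, 2) have r = 0.5, so the VIF is 1 / 0.75, about 1.33; uncorrelated features give exactly 1.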
VariationOfInformation
Gets the variation of information between actual and predicted values.
public T VariationOfInformation { get; }
Property Value
- T
Remarks
For Beginners: This measures how different your predictions are from the actual values. Lower values indicate that your predictions are more similar to the actual values. It's particularly useful when comparing different clustering results.
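Variation of information can be written as VI(X, Y) = H(X|Y) + H(Y|X) = H(X) + H(Y) - 2 I(X; Y). The sketch computes it in nats from the empirical joint distribution of two label sequences, such as two cluster assignments. It is a standalone illustration with a hypothetical class name, not the library's code.

```csharp
using System;
using System.Collections.Generic;

public static class VariationOfInformationSketch
{
    // VI(X, Y) = H(X|Y) + H(Y|X), in nats, from the empirical joint distribution
    // of two equal-length sequences of discrete labels.
    public static double VariationOfInformation(int[] x, int[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Sequences must have the same length.");
        int n = x.Length;
        var joint = new Dictionary<(int, int), int>();
        var countX = new Dictionary<int, int>();
        var countY = new Dictionary<int, int>();
        for (int i = 0; i < n; i++)
        {
            joint.TryGetValue((x[i], y[i]), out int cxy);
            joint[(x[i], y[i])] = cxy + 1;
            countX.TryGetValue(x[i], out int cx);
            countX[x[i]] = cx + 1;
            countY.TryGetValue(y[i], out int cy);
            countY[y[i]] = cy + 1;
        }

        double vi = 0;
        foreach (var kv in joint)
        {
            double pxy = (double)kv.Value / n;
            double px = (double)countX[kv.Key.Item1] / n;
            double py = (double)countY[kv.Key.Item2] / n;
            // -p(x,y) * [log p(x|y) + log p(y|x)]
            vi -= pxy * (Math.Log(pxy / py) + Math.Log(pxy / px));
        }
        return vi;
    }
}
```

Two clusterings that are identical up to relabelling, such as (0, 0, 1, 1) and (1, 1, 0, 0), give 0; the independent pair (0, 0, 1, 1) and (0, 1, 0, 1) gives 2 ln 2, about 1.386.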
Methods
Empty()
Creates an empty instance of the ModelStats<T, TInput, TOutput> class.
public static ModelStats<T, TInput, TOutput> Empty()
Returns
- ModelStats<T, TInput, TOutput>
An empty ModelStats object.
Remarks
This method creates a ModelStats object with all measures initialized to their default values. It's useful when you need a placeholder or when initializing a ModelStats object before populating it with data.
For Beginners: This is like getting a blank report card. You might use this when:
- You're just starting to set up your model evaluation
- You want to compare an actual model's stats to a "blank slate"
- You're creating a template for future model evaluations
GetMetric(MetricType)
Retrieves the value of a specific metric.
public T GetMetric(MetricType metricType)
Parameters
- metricType (MetricType): The type of metric to retrieve.
Returns
- T
The value of the specified metric.
Remarks
For Beginners: This method allows you to get the value of any metric calculated by ModelStats. You specify which metric you want using the MetricType enum, and the method returns its value.
For example, if you want to get the Euclidean Distance, you would call:
T euclideanDistance = modelStats.GetMetric(MetricType.EuclideanDistance);
This is useful when you want to programmatically access different metrics without needing to know the specific property names for each one.
Exceptions
- ArgumentException
Thrown when an unsupported metric type is requested.
HasMetric(MetricType)
Checks if a specific metric is available in this ModelStats instance.
public bool HasMetric(MetricType metricType)
Parameters
- metricType (MetricType): The type of metric to check for.
Returns
- bool
True if the metric is available, false otherwise.
Remarks
For Beginners: This method lets you check if a particular metric has been calculated for your model. It's useful when you're not sure if a specific metric is available, especially when working with different types of models or datasets.
For example, if you want to check if the Euclidean Distance is available, you would call:
if (modelStats.HasMetric(MetricType.EuclideanDistance))
{
    var distance = modelStats.GetMetric(MetricType.EuclideanDistance);
    // Use the distance...
}
This prevents errors that might occur if you try to access a metric that wasn't calculated for your particular model or dataset.