Class ModelStats<T, TInput, TOutput>

Namespace
AiDotNet.Statistics
Assembly
AiDotNet.dll

Represents a collection of statistical metrics for evaluating and analyzing machine learning models.

public class ModelStats<T, TInput, TOutput>

Type Parameters

T

The numeric type used for calculations (typically float or double).

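TInput

The type of the model's input data (the type of the Features property).

TOutput

The type of the model's output data (the type of the Actual and Predicted properties).
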
Inheritance
ModelStats<T, TInput, TOutput>

Remarks

This class calculates and stores various statistical measures that help assess the performance, fit, and characteristics of a machine learning model. It includes metrics for model accuracy, feature importance, model complexity, and various distance and similarity measures.

For Beginners: Think of ModelStats as a report card for your AI model.

Just like a school report card shows how well a student is doing in different subjects, ModelStats shows how well your AI model is performing in different areas. It helps you:

  • Understand how accurate your model's predictions are
  • See which features (inputs) are most important
  • Check if your model is too simple or too complex
  • Compare your model's performance to simpler alternatives

This information helps you improve your model and decide if it's ready to use in real-world situations.

Properties

Actual

Gets the actual (observed) values from the dataset.

public TOutput Actual { get; }

Property Value

TOutput

AutoCorrelationFunction

Gets the Auto-Correlation Function, which measures the correlation between a time series and a lagged version of itself.

public Vector<T> AutoCorrelationFunction { get; }

Property Value

Vector<T>

Remarks

For Beginners: This function helps you understand patterns in time-based data. It shows how similar your data is to itself at different time delays. This can reveal:

  • Repeating patterns (like seasonal effects)
  • How long effects last in your data
  • If your model is missing important time-based patterns

It's particularly useful for time series data, like stock prices or weather patterns.
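
To make "correlation with a lagged version of itself" concrete, here is a minimal sketch of the standard sample autocorrelation estimator. It does not use AiDotNet: the class and method names are illustrative assumptions, and the way ModelStats computes this vector internally may differ.

```csharp
using System;
using System.Linq;

public static class AcfSketch
{
    // Sample autocorrelation for lags 0..maxLag (textbook estimator).
    // Assumes a non-constant series; a constant series makes the denominator zero.
    public static double[] Autocorrelation(double[] series, int maxLag)
    {
        double mean = series.Average();
        double denom = series.Sum(x => (x - mean) * (x - mean));
        var acf = new double[maxLag + 1];
        for (int lag = 0; lag <= maxLag; lag++)
        {
            double num = 0.0;
            // Pair each point with the point `lag` steps later.
            for (int t = 0; t + lag < series.Length; t++)
                num += (series[t] - mean) * (series[t + lag] - mean);
            acf[lag] = num / denom;
        }
        return acf;
    }
}
```

Lag 0 is always 1, because a series is perfectly correlated with itself.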

CalinskiHarabaszIndex

Gets the Calinski-Harabasz index, a measure of cluster separation.

public T CalinskiHarabaszIndex { get; }

Property Value

T

Remarks

For Beginners: This index tells you how well-separated your groups (clusters) are. Higher values mean your groups are more distinct from each other, which is generally better. It's useful when comparing different ways of grouping your data.

ConditionNumber

Gets the condition number, a measure of the model's numerical stability.

public T ConditionNumber { get; }

Property Value

T

Remarks

For Beginners: The condition number tells you if small changes in your data might cause big changes in your model's predictions. A high condition number (typically above 30) suggests that your model might be unstable and sensitive to small data changes.

CorrelationMatrix

Gets the correlation matrix showing relationships between features.

public Matrix<T> CorrelationMatrix { get; }

Property Value

Matrix<T>

Remarks

For Beginners: This matrix shows how closely related your features are to each other. Values close to 1 or -1 mean strong relationships, while values near 0 mean weak relationships. This helps you understand which features might be providing similar information.

CosineSimilarity

Gets the cosine similarity between actual and predicted values.

public T CosineSimilarity { get; }

Property Value

T

Remarks

For Beginners: This measures how similar the direction of your predictions is to the actual values, ignoring their magnitude. Values closer to 1 indicate more similar directions.
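
As an illustration of "direction, ignoring magnitude", the following is a minimal standalone sketch of the usual cosine similarity formula (dot product divided by the product of the vector lengths). The names are hypothetical and not part of AiDotNet, and returning 0 for an all-zero vector is a convention chosen for this sketch.

```csharp
using System;

public static class CosineSketch
{
    // Cosine of the angle between two equal-length vectors.
    public static double CosineSimilarity(double[] actual, double[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0.0, normA = 0.0, normP = 0.0;
        for (int i = 0; i < actual.Length; i++)
        {
            dot += actual[i] * predicted[i];
            normA += actual[i] * actual[i];
            normP += predicted[i] * predicted[i];
        }

        // The ratio is undefined for an all-zero vector; this sketch returns 0.
        if (normA == 0.0 || normP == 0.0) return 0.0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normP));
    }
}
```

Predictions that are exactly twice the actual values still score 1 (up to rounding), since only the direction is compared.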

CovarianceMatrix

Gets the covariance matrix showing how features vary together.

public Matrix<T> CovarianceMatrix { get; }

Property Value

Matrix<T>

Remarks

For Beginners: This matrix shows how features change together. It's similar to the correlation matrix but uses a different scale. It helps identify patterns in how your features behave together.

DaviesBouldinIndex

Gets the Davies-Bouldin index, a measure of the average similarity between each cluster and its most similar cluster.

public T DaviesBouldinIndex { get; }

Property Value

T

Remarks

For Beginners: This index helps you understand how well-separated your groups (clusters) are. Lower values are better, meaning your groups are more distinct from each other. It's particularly useful when you're not sure how many groups to divide your data into.

EffectiveNumberOfParameters

Gets the effective number of parameters in the model.

public T EffectiveNumberOfParameters { get; }

Property Value

T

Remarks

For Beginners: This estimates how complex your model is in practice. It might be different from the actual number of parameters and helps identify if your model is overfitting (using more complexity than needed to explain the data).

EuclideanDistance

Gets the Euclidean distance between actual and predicted values.

public T EuclideanDistance { get; }

Property Value

T

Remarks

For Beginners: This measures the straight-line distance between your actual and predicted values. Lower values indicate predictions that are closer to the actual values.
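
The straight-line distance can be sketched in a few lines. This is a standalone illustration of the standard formula with hypothetical names, not the library's implementation.

```csharp
using System;

public static class EuclideanSketch
{
    // Square root of the sum of squared element-wise differences.
    public static double EuclideanDistance(double[] actual, double[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double sumSquares = 0.0;
        for (int i = 0; i < actual.Length; i++)
        {
            double diff = actual[i] - predicted[i];
            sumSquares += diff * diff;
        }
        return Math.Sqrt(sumSquares);
    }
}
```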

FeatureCount

Gets the number of features (input variables) used in the model.

public int FeatureCount { get; }

Property Value

int

Remarks

For Beginners: This is the number of different pieces of information your model uses to make predictions. For example, if you're predicting house prices, features might include size, number of bedrooms, location, etc.

FeatureNames

Gets the names of the features used in the model.

public List<string> FeatureNames { get; }

Property Value

List<string>

FeatureValues

Gets a dictionary mapping feature names to their values.

public Dictionary<string, TOutput> FeatureValues { get; }

Property Value

Dictionary<string, TOutput>

Features

Gets the feature values used in the model.

public TInput Features { get; }

Property Value

TInput

HammingDistance

Gets the Hamming distance between actual and predicted values.

public T HammingDistance { get; }

Property Value

T

Remarks

For Beginners: This counts how many predictions are different from the actual values. It's most useful for categorical predictions. Lower values indicate fewer differences.
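
For categorical labels, the count of mismatches can be sketched as follows. The integer label encoding and the names are assumptions made for this example; the property itself returns a value of type T.

```csharp
using System;

public static class HammingSketch
{
    // Number of positions where the predicted label differs from the actual label.
    public static int HammingDistance(int[] actual, int[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Label arrays must have the same length.");

        int mismatches = 0;
        for (int i = 0; i < actual.Length; i++)
            if (actual[i] != predicted[i]) mismatches++;
        return mismatches;
    }
}
```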

JaccardSimilarity

Gets the Jaccard similarity between actual and predicted values.

public T JaccardSimilarity { get; }

Property Value

T

Remarks

For Beginners: This measures the overlap between your actual and predicted values. It's especially useful for binary (yes/no) predictions. Values closer to 1 indicate more overlap.
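
For binary predictions, one common reading of "overlap" is the number of positions where both values are positive divided by the number where at least one is. The sketch below assumes that reading; the names are hypothetical, and treating two all-negative vectors as a perfect match (1.0) is a convention chosen here.

```csharp
using System;

public static class JaccardSketch
{
    // |A ∩ B| / |A ∪ B| over the positions marked true.
    public static double JaccardSimilarity(bool[] actual, bool[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int both = 0, either = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            if (actual[i] && predicted[i]) both++;
            if (actual[i] || predicted[i]) either++;
        }
        // No positives on either side: defined as 1.0 in this sketch.
        return either == 0 ? 1.0 : (double)both / either;
    }
}
```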

LeaveOneOutPredictiveDensities

Gets the leave-one-out predictive densities for each data point.

public List<T> LeaveOneOutPredictiveDensities { get; }

Property Value

List<T>

Remarks

For Beginners: This shows how well the model predicts each data point when it's trained without that point. It helps identify which data points might be harder for the model to predict accurately.

LogLikelihood

Gets the log-likelihood of the model.

public T LogLikelihood { get; }

Property Value

T

Remarks

For Beginners: This measures how probable your data is under your model. Higher values mean your model fits the data better. It's often used in more advanced statistical techniques.

LogPointwisePredictiveDensity

Gets the log pointwise predictive density, a measure of prediction accuracy.

public T LogPointwisePredictiveDensity { get; }

Property Value

T

Remarks

For Beginners: This is a way to measure how well your model's predictions match the actual data. Higher values generally indicate better predictions. It's particularly useful when comparing different models.

MahalanobisDistance

Gets the Mahalanobis distance between actual and predicted values.

public T MahalanobisDistance { get; }

Property Value

T

Remarks

For Beginners: This is an advanced distance measure that takes into account how your features are related to each other. It can be more meaningful than simpler distances when your features are correlated.

ManhattanDistance

Gets the Manhattan distance between actual and predicted values.

public T ManhattanDistance { get; }

Property Value

T

Remarks

For Beginners: This measures the distance between actual and predicted values as if you could only move horizontally or vertically (like navigating city blocks). Lower values indicate better predictions.
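
The city-block idea corresponds to summing absolute differences, as in this standalone sketch (hypothetical names, not the library's code). For the points (0, 0) and (3, 4) it gives 7, whereas the straight-line distance between the same points is 5.

```csharp
using System;

public static class ManhattanSketch
{
    // Sum of absolute element-wise differences.
    public static double ManhattanDistance(double[] actual, double[] predicted)
    {
        if (actual.Length != predicted.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double total = 0.0;
        for (int i = 0; i < actual.Length; i++)
            total += Math.Abs(actual[i] - predicted[i]);
        return total;
    }
}
```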

MarginalLikelihood

Gets the marginal likelihood of the model.

public T MarginalLikelihood { get; }

Property Value

T

Remarks

For Beginners: This is a measure of how well your model fits the data, taking into account its complexity. It helps in comparing different models, with higher values generally indicating better models.

MeanAveragePrecision

Gets the Mean Average Precision, a measure of ranking quality.

public T MeanAveragePrecision { get; }

Property Value

T

Remarks

For Beginners: This measures how well your model ranks items, especially in search or recommendation systems. It ranges from 0 to 1, where 1 is perfect. It considers both the order of your predictions and their accuracy. For example, in a search engine, it would measure how well the most relevant results are placed at the top.
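
A minimal sketch of how this is commonly computed with binary relevance: for each query, average the precision at every rank that holds a relevant item, then average across queries. The names are illustrative, and the sketch assumes every relevant item appears somewhere in the ranked list; the library's exact definition may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapSketch
{
    // relevant[i] is true when the item at rank i + 1 is relevant.
    public static double AveragePrecision(IReadOnlyList<bool> relevant)
    {
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < relevant.Count; i++)
        {
            if (!relevant[i]) continue;
            hits++;
            sum += (double)hits / (i + 1); // precision at this rank
        }
        return hits == 0 ? 0.0 : sum / hits;
    }

    // Mean of the per-query average precisions.
    public static double MeanAveragePrecision(IEnumerable<IReadOnlyList<bool>> queries)
        => queries.Select(q => AveragePrecision(q)).Average();
}
```

For a single list ranked relevant, irrelevant, relevant, the precisions at the two hits are 1/1 and 2/3, so the average precision is about 0.83.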

MeanReciprocalRank

Gets the Mean Reciprocal Rank, a statistic measuring the performance of a system that produces a list of possible responses to a query.

public T MeanReciprocalRank { get; }

Property Value

T

Remarks

For Beginners: This measures how well your model places the first correct answer in a list of predictions. It ranges from 0 to 1, where 1 means the correct answer is always first. It's often used in question-answering systems or search engines to measure how quickly a user might find the right answer.
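
As a sketch of the usual definition, take the 1-based rank of the first correct answer for each query, invert it, and average. The names are hypothetical, and treating a rank of 0 or less as "no correct answer found" (contributing 0) is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MrrSketch
{
    // firstCorrectRanks holds, per query, the 1-based rank of the first correct answer.
    public static double MeanReciprocalRank(IEnumerable<int> firstCorrectRanks)
        => firstCorrectRanks.Select(r => r > 0 ? 1.0 / r : 0.0).Average();
}
```

With first correct answers at ranks 1, 2, and 4, the result is (1 + 0.5 + 0.25) / 3, or about 0.58.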

Model

Gets the full model being evaluated.

public IFullModel<T, TInput, TOutput>? Model { get; }

Property Value

IFullModel<T, TInput, TOutput>

MutualInformation

Gets the mutual information between actual and predicted values.

public T MutualInformation { get; }

Property Value

T

Remarks

For Beginners: This measures how much information your predictions provide about the actual values. Higher values mean your predictions are more informative and closely related to the actual values.

NormalizedDiscountedCumulativeGain

Gets the Normalized Discounted Cumulative Gain, a measure of ranking quality that takes the position of correct items into account.

public T NormalizedDiscountedCumulativeGain { get; }

Property Value

T

Remarks

For Beginners: This measures how well your model ranks items, giving more importance to correct predictions at the top of the list. It ranges from 0 to 1, where 1 is perfect. It's often used in search engines or recommendation systems to ensure the most relevant items appear first.
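
The position weighting can be sketched with the widely used logarithmic discount: each item's relevance is divided by log2(rank + 1), and the total is normalized by the best achievable ordering of the same items. This is a standalone illustration with hypothetical names; AiDotNet may use a different gain or discount.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NdcgSketch
{
    // Discounted cumulative gain: gain at 0-based index i is divided by log2(i + 2).
    public static double Dcg(IEnumerable<double> gains)
        => gains.Select((g, i) => g / Math.Log(i + 2, 2.0)).Sum();

    // DCG of the given ordering divided by the DCG of the ideal (descending) ordering.
    public static double Ndcg(IReadOnlyList<double> gains)
    {
        double ideal = Dcg(gains.OrderByDescending(g => g));
        return ideal == 0.0 ? 0.0 : Dcg(gains) / ideal;
    }
}
```

A list that is already sorted from most to least relevant scores 1; pushing relevant items further down lowers the score.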

NormalizedMutualInformation

Gets the normalized mutual information between actual and predicted values.

public T NormalizedMutualInformation { get; }

Property Value

T

Remarks

For Beginners: This is similar to mutual information, but scaled to be between 0 and 1. It's easier to interpret across different datasets. Values closer to 1 indicate better predictions.

ObservedTestStatistic

Gets the observed test statistic for model evaluation.

public T ObservedTestStatistic { get; }

Property Value

T

Remarks

For Beginners: This is a single number that summarizes how well your model fits the data. It's used in statistical tests to determine if your model is significantly better than a simpler alternative.

PartialAutoCorrelationFunction

Gets the Partial Auto-Correlation Function, which measures the direct relationship between an observation and its lag.

public Vector<T> PartialAutoCorrelationFunction { get; }

Property Value

Vector<T>

Remarks

For Beginners: This function is similar to the Auto-Correlation Function, but it focuses on the direct relationship between data points at different time delays. It helps you:

  • Identify how many past time points directly influence the current point
  • Decide how many past observations to use in time series models
  • Understand the "memory" of your time series data

It's often used in more advanced time series analysis and forecasting.

PosteriorPredictiveSamples

Gets samples from the posterior predictive distribution.

public List<T> PosteriorPredictiveSamples { get; }

Property Value

List<T>

Remarks

For Beginners: These are possible predictions your model might make if you ran it multiple times. They help you understand the range and uncertainty of your model's predictions.

Predicted

Gets the predicted values from the model.

public TOutput Predicted { get; }

Property Value

TOutput

ReferenceModelMarginalLikelihood

Gets the marginal likelihood of a reference (simpler) model.

public T ReferenceModelMarginalLikelihood { get; }

Property Value

T

Remarks

For Beginners: This is the marginal likelihood for a basic, simple model. It's used as a comparison point to see how much better your more complex model performs.

SilhouetteScore

Gets the silhouette score, a measure of how similar an object is to its own cluster compared to other clusters.

public T SilhouetteScore { get; }

Property Value

T

Remarks

For Beginners: This score helps you understand if your model is grouping similar things together well. It ranges from -1 to 1, where:

  • Values close to 1 mean your groups (clusters) are well-defined
  • Values close to 0 mean your groups overlap a lot
  • Values close to -1 mean some data points might be in the wrong group
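
The -1 to 1 range follows from the standard per-point silhouette coefficient, sketched below. Here a is a point's mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster; the overall score is usually the average of this value over all points. The names are illustrative and not taken from AiDotNet.

```csharp
using System;

public static class SilhouetteSketch
{
    // (b - a) / max(a, b): positive when the point is closer to its own cluster.
    public static double SilhouetteCoefficient(double a, double b)
    {
        double max = Math.Max(a, b);
        // Both distances zero: defined as 0 in this sketch.
        return max == 0.0 ? 0.0 : (b - a) / max;
    }
}
```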

VIFList

Gets the Variance Inflation Factor (VIF) for each feature.

public List<T> VIFList { get; }

Property Value

List<T>

Remarks

For Beginners: VIF helps identify if some features are too similar to others. High VIF values (usually above 5 or 10) suggest that a feature might be redundant, as its information is already captured by other features.
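
The standard definition is VIF = 1 / (1 - R²), where R² comes from regressing one feature on all the others. The sketch below covers only the two-feature case, where R² equals the squared Pearson correlation between the two features. The names are hypothetical, and this is not necessarily how VIFList is populated.

```csharp
using System;
using System.Linq;

public static class VifSketch
{
    // Pearson correlation between two equal-length, non-constant feature columns.
    public static double Pearson(double[] a, double[] b)
    {
        double meanA = a.Average(), meanB = b.Average();
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (int i = 0; i < a.Length; i++)
        {
            cov += (a[i] - meanA) * (b[i] - meanB);
            varA += (a[i] - meanA) * (a[i] - meanA);
            varB += (b[i] - meanB) * (b[i] - meanB);
        }
        return cov / Math.Sqrt(varA * varB);
    }

    // VIF from a known R²; grows without bound as R² approaches 1.
    public static double VifFromRSquared(double rSquared) => 1.0 / (1.0 - rSquared);
}
```

Two features with a correlation of 0.9 give a VIF of 1 / (1 - 0.81), roughly 5.3, which is already at the lower of the two thresholds mentioned above.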

VariationOfInformation

Gets the variation of information between actual and predicted values.

public T VariationOfInformation { get; }

Property Value

T

Remarks

For Beginners: This measures how different your predictions are from the actual values. Lower values indicate that your predictions are more similar to the actual values. It's particularly useful when comparing different clustering results.

Methods

Empty()

Creates an empty instance of the ModelStats<T, TInput, TOutput> class.

public static ModelStats<T, TInput, TOutput> Empty()

Returns

ModelStats<T, TInput, TOutput>

An empty ModelStats object.

Remarks

This method creates a ModelStats object with all measures initialized to their default values. It's useful when you need a placeholder or when initializing a ModelStats object before populating it with data.

For Beginners: This is like getting a blank report card. You might use this when:

  • You're just starting to set up your model evaluation
  • You want to compare an actual model's stats to a "blank slate"
  • You're creating a template for future model evaluations

GetMetric(MetricType)

Retrieves the value of a specific metric.

public T GetMetric(MetricType metricType)

Parameters

metricType MetricType

The type of metric to retrieve.

Returns

T

The value of the specified metric.

Remarks

For Beginners: This method allows you to get the value of any metric calculated by ModelStats. You specify which metric you want using the MetricType enum, and the method returns its value.

For example, if you want to get the Euclidean Distance, you would call:

T euclideanDistance = modelStats.GetMetric(MetricType.EuclideanDistance);

This is useful when you want to programmatically access different metrics without needing to know the specific property names for each one.

Exceptions

ArgumentException

Thrown when an unsupported metric type is requested.

HasMetric(MetricType)

Checks if a specific metric is available in this ModelStats instance.

public bool HasMetric(MetricType metricType)

Parameters

metricType MetricType

The type of metric to check for.

Returns

bool

True if the metric is available, false otherwise.

Remarks

For Beginners: This method lets you check if a particular metric has been calculated for your model. It's useful when you're not sure if a specific metric is available, especially when working with different types of models or datasets.

For example, if you want to check if the Euclidean Distance is available, you would call:

if (modelStats.HasMetric(MetricType.EuclideanDistance))
{
    var distance = modelStats.GetMetric(MetricType.EuclideanDistance);
    // Use the distance...
}

This prevents errors that might occur if you try to access a metric that wasn't calculated for your particular model or dataset.