Class VIFFitDetectorOptions

Namespace: AiDotNet.Models.Options

Assembly: AiDotNet.dll

Configuration options for detecting multicollinearity in regression models using Variance Inflation Factor (VIF) analysis.

public class VIFFitDetectorOptions

Inheritance: object

VIFFitDetectorOptions

Inherited Members: object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

Variance Inflation Factor (VIF) is a statistical measure used to detect the severity of multicollinearity in regression analysis. Multicollinearity occurs when independent variables in a regression model are highly correlated with each other, which can lead to unstable and unreliable coefficient estimates. VIF quantifies how much the variance of an estimated regression coefficient is increased due to collinearity with other predictors. This class provides configuration options for thresholds used to interpret VIF values and detect problematic levels of multicollinearity in regression models. These thresholds help automate the process of model evaluation and variable selection.

For Beginners: This class helps you detect when predictor variables in your model are too closely related to each other.

When building regression models:

Multicollinearity occurs when predictor variables are highly correlated with each other
This can make your model unstable and difficult to interpret
Coefficients may change dramatically with small changes in the data
Standard errors of coefficients become inflated

VIF (Variance Inflation Factor):

Measures how much the variance of a coefficient is increased due to multicollinearity
Higher VIF values indicate more severe multicollinearity
VIF = 1 means the predictor is uncorrelated with the other predictors (no multicollinearity)
VIF > 1 indicates some degree of multicollinearity, and the value has no upper bound

This class provides thresholds to automatically detect problematic levels of multicollinearity in your models, helping you identify when you should consider removing or combining variables.
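To make the definition concrete: the VIF of a predictor is conventionally computed as 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The following is a minimal sketch of that formula, not AiDotNet's implementation. The class name VifSketch and its methods are hypothetical, and the two-predictor shortcut relies on the fact that with only two predictors R² equals their squared correlation.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class VifSketch
{
    // Pearson correlation between two equally sized samples.
    public static double Correlation(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Samples must have the same length (at least 2).");

        double meanX = x.Average();
        double meanY = y.Average();
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }

        if (sxx == 0 || syy == 0)
            throw new ArgumentException("Samples must not be constant.");

        return sxy / Math.Sqrt(sxx * syy);
    }

    // VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others.
    public static double VifFromRSquared(double rSquared)
    {
        if (rSquared < 0 || rSquared >= 1)
            throw new ArgumentOutOfRangeException(nameof(rSquared), "R^2 must be in [0, 1).");
        return 1.0 / (1.0 - rSquared);
    }

    // With exactly two predictors, R^2 is the squared correlation between them.
    public static double VifForTwoPredictors(double[] x1, double[] x2)
    {
        double r = Correlation(x1, x2);
        return VifFromRSquared(r * r);
    }
}
```

Under this formula, VifSketch.VifFromRSquared(0.8) is about 5 and VifSketch.VifFromRSquared(0.9) is about 10, so the default thresholds correspond to a predictor having roughly 80% and 90% of its variance explained by the other predictors.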

Properties

GoodFitThreshold

Gets or sets the threshold for determining a good fit in terms of the primary metric.

public double GoodFitThreshold { get; set; }

Property Value

double: A double value between 0 and 1, defaulting to 0.7.

Remarks

This property specifies the minimum value of the primary metric (typically R² or adjusted R²) required for the model to be considered a good fit. The primary metric is specified by the PrimaryMetric property. For R² and similar metrics, higher values indicate better fit: 1.0 is a perfect fit, and 0.0 means the model explains none of the variance. The default value of 0.7 means the model should explain at least 70% of the variance in the dependent variable to be considered a good fit. A higher threshold is stricter and requires better model performance, while a lower threshold is more lenient. The appropriate value depends on the application and on the range of values the primary metric typically takes in the field of study.

For Beginners: This setting determines how well your model must perform to be considered good.

The good fit threshold:

Defines the minimum acceptable value for your primary performance metric
For R², it represents how much variance your model explains
Helps you automatically evaluate if your model performs well enough

The default value of 0.7 means:

For R², the model should explain at least 70% of the variance
This is a moderate threshold suitable for many applications

Think of it like this:

Higher values (e.g., 0.8): More strict, requires better model performance
Lower values (e.g., 0.5): More lenient, accepts models with more unexplained variance

When to adjust this value:

Increase it in fields where high predictive accuracy is expected
Decrease it for problems where even modest predictive power is valuable
Adjust based on the typical R² values in your specific field

For example, in physical sciences where relationships are often well-defined, you might increase this to 0.8 or 0.9, while in social sciences or complex behavioral predictions, you might decrease it to 0.5 or 0.6.
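The check described above can be sketched as follows, assuming R² is the primary metric. This is an illustration only: GoodFitSketch and its methods are hypothetical names, and whether the library compares with "greater than or equal" or strictly "greater than" is an assumption here.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class GoodFitSketch
{
    // R^2 = 1 - SS_res / SS_tot
    public static double RSquared(double[] actual, double[] predicted)
    {
        if (actual.Length != predicted.Length || actual.Length == 0)
            throw new ArgumentException("Inputs must be non-empty and the same length.");

        double mean = actual.Average();
        double ssRes = 0, ssTot = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            double residual = actual[i] - predicted[i];
            double deviation = actual[i] - mean;
            ssRes += residual * residual;
            ssTot += deviation * deviation;
        }

        if (ssTot == 0)
            throw new ArgumentException("Actual values must not be constant.");

        return 1.0 - ssRes / ssTot;
    }

    // Assumption: a metric exactly at the threshold counts as a good fit.
    public static bool MeetsGoodFitThreshold(double metric, double goodFitThreshold = 0.7)
    {
        return metric >= goodFitThreshold;
    }
}
```

For instance, actual values { 1, 2, 3, 4 } with predictions { 1.5, 1.5, 3.5, 3.5 } give an R² of 0.8, which passes the default threshold of 0.7 but would fail a stricter threshold of 0.9.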

ModerateMulticollinearityThreshold

Gets or sets the threshold for detecting moderate multicollinearity.

public double ModerateMulticollinearityThreshold { get; set; }

Property Value

double: A positive double value, defaulting to 5.0.

Remarks

This property specifies the VIF threshold above which multicollinearity is considered moderate. Moderate multicollinearity indicates that there is some correlation among predictor variables that may affect the stability of the regression coefficients, but not necessarily to a degree that requires immediate action. When a predictor has a VIF value between this threshold and the SevereMulticollinearityThreshold, it might be worth monitoring or investigating further, but it may not require removal from the model. The default value of 5.0 is a commonly used threshold in statistical practice, representing a middle ground between no multicollinearity and severe multicollinearity. A lower threshold is more strict, flagging more variables as having moderate multicollinearity, while a higher threshold is more lenient.

For Beginners: This setting determines when multicollinearity should raise some concern.

Moderate multicollinearity:

Indicates predictor variables have enough correlation to potentially affect your model
May warrant further investigation but doesn't necessarily require immediate action
Can make it harder to determine the individual importance of correlated predictors

The default value of 5.0 means:

VIF values between 5 and 10 (the default SevereMulticollinearityThreshold) indicate moderate multicollinearity
This represents a level where you should be aware of potential issues

Think of it like this:

Lower values (e.g., 2.5): More strict, flags more variables as moderately multicollinear
Higher values (e.g., 7.5): More lenient, fewer variables will be flagged

When to adjust this value:

Decrease it when you want to be more cautious about potential multicollinearity
Increase it when you're more concerned about severe cases than moderate ones
Consider using different thresholds for exploratory versus confirmatory analyses

For example, in an exploratory data analysis where you want to be alerted to potential issues, you might decrease this to 2.5 to catch more potential multicollinearity problems early.
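The effect of lowering the threshold can be sketched with a small filter over precomputed VIF values. The class ModerateFlagSketch, the predictor names, and the VIF values in the usage note are all made up for illustration; the library's own detection logic may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class ModerateFlagSketch
{
    // Returns the names of predictors whose VIF is strictly above the threshold,
    // sorted by name so the result is deterministic.
    public static List<string> PredictorsAbove(
        IReadOnlyDictionary<string, double> vifByPredictor, double threshold)
    {
        return vifByPredictor
            .Where(pair => pair.Value > threshold)
            .Select(pair => pair.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```

With illustrative VIF values of 1.8 for "age", 3.1 for "income", and 6.4 for "education", the default threshold of 5.0 flags only "education", while a threshold of 2.5 flags both "education" and "income".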

PrimaryMetric

Gets or sets the primary metric used to evaluate model fit.

public MetricType PrimaryMetric { get; set; }

Property Value

MetricType: A value from the MetricType enumeration, defaulting to MetricType.R2.

Remarks

This property specifies which metric is used as the primary criterion for evaluating model fit. The most common metric is R² (coefficient of determination), which measures the proportion of variance in the dependent variable that is predictable from the independent variables. Other possible metrics might include adjusted R² (which adjusts for the number of predictors), mean squared error (MSE), or information criteria such as AIC or BIC. The default value of MetricType.R2 specifies R² as the primary metric, which is appropriate for many applications. The optimal choice depends on the specific goals of the analysis and the characteristics of the data.

For Beginners: This setting determines which statistical measure is used to evaluate your model's performance.

The primary metric:

Specifies which measure is used to evaluate how well your model fits the data
Different metrics emphasize different aspects of model performance
Works with GoodFitThreshold to determine if your model performs well enough

The default value of R2 means:

The coefficient of determination (R²) is used as the primary metric
R² measures the proportion of variance explained by your model
Values typically range from 0 (no explanation) to 1 (perfect explanation)

Common alternatives include:

AdjustedR2: Similar to R² but penalizes adding unnecessary predictors
MSE: Mean Squared Error, measures the average squared difference between predictions and actual values
AIC/BIC: Information criteria that balance fit and complexity

When to adjust this value:

Change to AdjustedR2 when comparing models with different numbers of predictors
Change to error-based metrics (like MSE) when prediction accuracy is more important than explanation, and revisit GoodFitThreshold when you do, because the default of 0.7 is expressed on the R² scale
Consider your specific goals (explanation vs. prediction) when choosing

For example, if you're comparing models with different numbers of variables, you might change this to MetricType.AdjustedR2 to account for model complexity.
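To show why adjusted R² suits comparisons across models with different numbers of predictors, here is a sketch of the standard formula, 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The class name is hypothetical, and it is an assumption that MetricType.AdjustedR2 corresponds to this exact formula.

```csharp
using System;

// Hypothetical helper for illustration; not part of AiDotNet.
public static class AdjustedRSquaredSketch
{
    // Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    public static double AdjustedRSquared(double rSquared, int sampleCount, int predictorCount)
    {
        int degreesOfFreedom = sampleCount - predictorCount - 1;
        if (predictorCount < 0 || degreesOfFreedom <= 0)
            throw new ArgumentException("Need sampleCount > predictorCount + 1 and predictorCount >= 0.");

        return 1.0 - (1.0 - rSquared) * (sampleCount - 1) / degreesOfFreedom;
    }
}
```

With 21 observations and an R² of 0.8, a model with 5 predictors has an adjusted R² of about 0.733, while a model with 10 predictors drops to 0.6. Against the default GoodFitThreshold of 0.7, the first would pass and the second would not, even though their raw R² values are identical.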

SevereMulticollinearityThreshold

Gets or sets the threshold for detecting severe multicollinearity.

public double SevereMulticollinearityThreshold { get; set; }

Property Value

double: A positive double value, defaulting to 10.0.

Remarks

This property specifies the VIF threshold above which multicollinearity is considered severe. Severe multicollinearity indicates that the predictor variables are so highly correlated that the regression coefficients are likely to be very unstable and the model may be unreliable. When a predictor has a VIF value exceeding this threshold, it is often recommended to remove it from the model or to combine it with other correlated predictors. The default value of 10.0 is a commonly used threshold in statistical practice, though some fields may use more or less conservative values depending on the specific application. A lower threshold is more strict, flagging more variables as having severe multicollinearity, while a higher threshold is more lenient.

For Beginners: This setting determines when multicollinearity is considered a serious problem.

Severe multicollinearity:

Indicates predictor variables are so highly correlated that the model is likely unreliable
Often requires intervention such as removing variables or using regularization techniques
Can lead to coefficients with incorrect signs or implausible magnitudes

The default value of 10.0 means:

VIF values above 10 indicate severe multicollinearity
This is a widely accepted threshold in statistical practice

Think of it like this:

Lower values (e.g., 5.0): More strict, flags more variables as severely multicollinear
Higher values (e.g., 20.0): More lenient, only flags the most extreme cases

When to adjust this value:

Decrease it in fields where precise coefficient estimates are critical
Increase it when working with naturally correlated predictors where some multicollinearity is expected
Consider domain-specific standards in your field

For example, in medical research where precise effect estimates are crucial, you might decrease this to 5.0 to be more conservative about multicollinearity.
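Taken together, the two VIF thresholds split values into three bands. The sketch below shows one way that classification could work. VifThresholds is a stand-in that mirrors only the two documented properties and their defaults so the snippet compiles without AiDotNet; MulticollinearityLevel and VifClassifierSketch are hypothetical names, and treating a value exactly at a threshold as the lower band is an assumption based on the "above which" wording.

```csharp
using System;

// Hypothetical result type for illustration; not part of AiDotNet.
public enum MulticollinearityLevel { Low, Moderate, Severe }

// Stand-in mirroring the documented threshold properties and defaults.
public sealed class VifThresholds
{
    public double ModerateMulticollinearityThreshold { get; set; } = 5.0;
    public double SevereMulticollinearityThreshold { get; set; } = 10.0;
}

public static class VifClassifierSketch
{
    public static MulticollinearityLevel Classify(double vif, VifThresholds thresholds)
    {
        if (thresholds.ModerateMulticollinearityThreshold > thresholds.SevereMulticollinearityThreshold)
            throw new ArgumentException("The moderate threshold must not exceed the severe threshold.");

        // Check the severe band first so it takes precedence over the moderate band.
        if (vif > thresholds.SevereMulticollinearityThreshold)
            return MulticollinearityLevel.Severe;
        if (vif > thresholds.ModerateMulticollinearityThreshold)
            return MulticollinearityLevel.Moderate;
        return MulticollinearityLevel.Low;
    }
}
```

With the defaults, a VIF of 7.2 is classified as Moderate. With a more conservative configuration such as new VifThresholds { ModerateMulticollinearityThreshold = 2.5, SevereMulticollinearityThreshold = 5.0 }, the same value is classified as Severe.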
