Class PrincipalComponentRegressionOptions<T>

Namespace: AiDotNet.Models.Options

Assembly: AiDotNet.dll

Configuration options for Principal Component Regression (PCR), which combines principal component analysis with linear regression to address multicollinearity and dimensionality issues in regression problems.

public class PrincipalComponentRegressionOptions<T> : RegressionOptions<T>

Type Parameters

T

Inheritance: object → ModelOptions → RegressionOptions<T> → PrincipalComponentRegressionOptions<T>

Inherited Members: RegressionOptions<T>.DecompositionMethod

RegressionOptions<T>.UseIntercept

ModelOptions.Seed

object.Equals(object)

object.Equals(object, object)

object.GetHashCode()

object.GetType()

object.MemberwiseClone()

object.ReferenceEquals(object, object)

object.ToString()

Remarks

Principal Component Regression (PCR) is a two-step technique: it first applies Principal Component Analysis (PCA) to reduce the dimensionality of the feature space, and then performs linear regression on the resulting principal components. This approach is particularly valuable when the predictor variables are highly correlated (multicollinearity) or when the number of predictors is large relative to the number of observations. Because the principal components are uncorrelated, regressing on them mitigates the coefficient instability and overfitting that can arise in standard regression on correlated predictors. Working with fewer components also reduces the cost of the regression step, although each component is a linear combination of all original features and can therefore be harder to interpret than the original variables. Note that PCA ranks components by how much predictor variance they explain, not by how well they predict the target, so a discarded low-variance component can occasionally carry useful signal. PCR is widely used in fields such as spectroscopy, chemometrics, bioinformatics, and econometrics, where high-dimensional, correlated data is common.

For Beginners: Principal Component Regression helps solve problems with complex, highly related data.

Imagine you're trying to predict house prices with 50 different variables:

Many of these variables are strongly related (like number of rooms, square footage, number of bathrooms)
Using all these related variables directly can confuse your model
The model might become unstable or "overfit" to your training data

What Principal Component Regression does:

Step 1: Principal Component Analysis (PCA)

It combines your original variables into new "super variables" called principal components
Each component captures a different pattern in your data
The first component captures the strongest pattern, the second component the next strongest, and so on
These components are uncorrelated with each other, so no component repeats the linear pattern captured by another

Step 2: Regression

Instead of using your original 50 variables, it uses the top principal components
This makes your model more stable and often more accurate

Think of it like cooking:

Your original variables are like individual spices
PCA combines these into a few special spice mixes (components)
Your recipe now uses these few special mixes instead of dozens of individual spices
This makes cooking (modeling) simpler and often gives better results

This class lets you configure how many components to use and how much information to retain.
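
The following is a minimal configuration sketch that uses only the two properties documented on this page. It assumes T is double and that the class can be created with an object initializer; how the options object is then passed to a regression model is outside the scope of this page and is not shown.

```csharp
using AiDotNet.Models.Options;

// Automatic selection: keep enough components to explain at least 90% of the variance.
var autoOptions = new PrincipalComponentRegressionOptions<double>
{
    NumComponents = 0,              // 0 = choose the number of components automatically
    ExplainedVarianceRatio = 0.90   // consulted only because NumComponents is 0
};

// Fixed selection: always use exactly 5 components.
var fixedOptions = new PrincipalComponentRegressionOptions<double>
{
    NumComponents = 5               // ExplainedVarianceRatio is ignored in this case
};
```

The inherited members listed above (for example UseIntercept and Seed) can be set in the same initializer.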

Properties

ExplainedVarianceRatio

Gets or sets the minimum ratio of variance to be explained by the selected principal components.

public double ExplainedVarianceRatio { get; set; }

Property Value

double: The explained variance ratio threshold, defaulting to 0.95 (95%).

Remarks

This parameter determines the cumulative proportion of variance that should be explained by the selected principal components when automatic component selection is used (NumComponents = 0). The algorithm will select the minimum number of components needed to reach this threshold. For example, a value of 0.95 means that enough components will be included to explain at least 95% of the total variance in the original features. This approach provides a data-driven method for dimensionality reduction that adapts to the characteristics of each dataset. Higher values preserve more information but reduce the dimensionality benefit, while lower values provide more aggressive dimensionality reduction but may lose important signal.
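
The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: ComponentSelectionSketch and SelectComponentCount are hypothetical names, and the input is assumed to be the PCA eigenvalues (the variance captured by each component).

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; it is not part of AiDotNet.
public static class ComponentSelectionSketch
{
    // Returns the smallest number of components whose cumulative share of the
    // total variance reaches explainedVarianceRatio.
    public static int SelectComponentCount(double[] eigenvalues, double explainedVarianceRatio)
    {
        if (eigenvalues == null || eigenvalues.Length == 0)
            throw new ArgumentException("At least one eigenvalue is required.", nameof(eigenvalues));
        if (explainedVarianceRatio <= 0.0 || explainedVarianceRatio > 1.0)
            throw new ArgumentOutOfRangeException(nameof(explainedVarianceRatio));

        // Components are ranked by the variance they capture, largest first.
        double[] sorted = eigenvalues.OrderByDescending(v => v).ToArray();
        double total = sorted.Sum();
        if (total <= 0.0)
            throw new ArgumentException("Total variance must be positive.", nameof(eigenvalues));

        double cumulative = 0.0;
        for (int k = 0; k < sorted.Length; k++)
        {
            cumulative += sorted[k];
            if (cumulative / total >= explainedVarianceRatio)
                return k + 1; // k is zero-based, so k + 1 components were needed
        }

        // Floating-point rounding can leave the sum marginally short of the threshold.
        return sorted.Length;
    }
}
```

For eigenvalues { 4.0, 2.5, 2.0, 1.0, 0.5 } the cumulative shares are 0.40, 0.65, 0.85, 0.95 and 1.00, so a threshold of 0.90 selects 4 components and a threshold of 0.60 selects 2.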

For Beginners: This setting determines how much of the original information should be retained when automatically selecting components.

The default value of 0.95 means:

The automatically selected components should capture at least 95% of the information in your original data
The remaining 5% is considered less important and can be discarded
This provides a good balance between simplification and information preservation

Think of it like compressing a photo:

A value of 1.0 would keep 100% of the details (no real compression)
A value of 0.95 might remove subtle details but keep the image looking very good
A value of 0.80 would compress more aggressively, losing some visible details
A value of 0.50 would lose significant details, leaving only the main elements

You might want a higher value (like 0.99):

When preserving maximum information is critical
When you want to ensure subtle patterns aren't lost
When you have plenty of computational resources
For exploratory analysis where you want to retain most of the data's complexity

You might want a lower value (like 0.90 or 0.80):

When you need more aggressive dimensionality reduction
When you suspect some of the variance in your data is just noise
When you're dealing with very high-dimensional data
When simpler models are preferred for interpretability or computational efficiency

Note: This parameter is only used when NumComponents = 0. If NumComponents is set to a positive value, ExplainedVarianceRatio is ignored.

NumComponents

Gets or sets the number of principal components to use in the regression model.

public int NumComponents { get; set; }

Property Value

int: The number of components, defaulting to 0 (auto-selection based on explained variance).

Remarks

This parameter specifies the exact number of principal components to retain for the regression step. When set to a positive integer, the algorithm will use exactly that many components, regardless of how much variance they explain. A value of 0 (the default) indicates that the number of components should be automatically determined based on the ExplainedVarianceRatio parameter. Setting a specific number of components gives precise control over model complexity but requires domain knowledge or cross-validation to determine the optimal value. Using too few components may lead to underfitting, while using too many may reintroduce the issues of multicollinearity and overfitting that PCR aims to address.
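
Because a fixed NumComponents is applied regardless of explained variance, it can be useful to check how much variance a given count actually retains. The sketch below is a hypothetical helper (ExplainedVarianceSketch and CumulativeRatio are not AiDotNet APIs) and assumes the PCA eigenvalues are available from a separate analysis.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; it is not part of AiDotNet.
public static class ExplainedVarianceSketch
{
    // Returns the fraction of total variance captured by the numComponents
    // largest eigenvalues.
    public static double CumulativeRatio(double[] eigenvalues, int numComponents)
    {
        if (eigenvalues == null || eigenvalues.Length == 0)
            throw new ArgumentException("At least one eigenvalue is required.", nameof(eigenvalues));
        if (numComponents < 1 || numComponents > eigenvalues.Length)
            throw new ArgumentOutOfRangeException(nameof(numComponents));

        double total = eigenvalues.Sum();
        if (total <= 0.0)
            throw new ArgumentException("Total variance must be positive.", nameof(eigenvalues));

        // Sum the largest eigenvalues and express them as a share of the total.
        double retained = eigenvalues.OrderByDescending(v => v).Take(numComponents).Sum();
        return retained / total;
    }
}
```

For eigenvalues { 4.0, 2.5, 2.0, 1.0, 0.5 }, CumulativeRatio(eigenvalues, 2) returns 0.65, meaning that a model fixed at two components would discard 35% of the predictor variance.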

For Beginners: This setting controls exactly how many principal components to use in your regression model.

The default value of 0 means:

The system will automatically choose the number of components
It will select enough components to explain the percentage of variance specified in ExplainedVarianceRatio
This automatic selection is usually a good starting point

Think of principal components like summarizing a long book:

The first component captures the main plot
The second component adds important subplots
Additional components add more and more details
At some point, additional components add very little meaningful information

You might want to specify a certain number (like 5 or 10):

When you have expert knowledge about how many underlying factors influence your data
When you've used cross-validation to determine the optimal number
When you want to ensure consistent model structure across different datasets
When you need to control model complexity precisely

You might want to keep it at 0 (automatic):

When you're not sure how many components you need
When you want the model to adapt to different datasets
When you prefer to specify how much variance to capture (via ExplainedVarianceRatio)

Note: When NumComponents is 0, the ExplainedVarianceRatio parameter determines how many components are used. When NumComponents is positive, ExplainedVarianceRatio is ignored.
