Class PrincipalComponentRegression<T>

Namespace
AiDotNet.Regression
Assembly
AiDotNet.dll

Implements Principal Component Regression (PCR), a technique that combines principal component analysis (PCA) with linear regression to handle multicollinearity in the predictor variables.

public class PrincipalComponentRegression<T> : RegressionBase<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>

Type Parameters

T

The numeric data type used for calculations (e.g., float, double).

Inheritance
PrincipalComponentRegression<T>
Implements
IFullModel<T, Matrix<T>, Vector<T>>
IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>
IParameterizable<T, Matrix<T>, Vector<T>>
ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>
IGradientComputable<T, Matrix<T>, Vector<T>>

Remarks

Principal Component Regression works by first performing principal component analysis (PCA) on the predictor variables to reduce their dimensionality, then using these principal components as predictors in a linear regression model. This approach is particularly useful when dealing with multicollinearity (high correlation among predictor variables) or when the number of predictors is large relative to the number of observations.

The algorithm first centers and scales the data, performs PCA to extract principal components, selects a subset of these components based on either a fixed number or explained variance ratio, and then performs linear regression using the selected components.
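The pipeline described here matches the standard textbook formulation of PCR, which can be sketched as follows. The notation is introduced only for illustration: X is the n-by-p predictor matrix, y the target vector, mu_j and sigma_j the column means and standard deviations, V_k the first k principal directions, and tau a hypothetical explained-variance threshold. The library's implementation may differ in details, for example in how y is scaled or where regularization is applied.

```latex
\begin{aligned}
\tilde{X}_{ij} &= \frac{X_{ij} - \mu_j}{\sigma_j}
  && \text{(center and scale each predictor)} \\
\tfrac{1}{n-1}\,\tilde{X}^{\top}\tilde{X} &= V \Lambda V^{\top},
  \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p
  && \text{(PCA via eigendecomposition)} \\
k &= \min\Bigl\{ m : \frac{\sum_{i=1}^{m}\lambda_i}{\sum_{i=1}^{p}\lambda_i} \ge \tau \Bigr\}
  && \text{(or a fixed } k\text{)} \\
Z &= \tilde{X} V_k
  && \text{(project onto the selected components)} \\
\hat{\gamma} &= (Z^{\top} Z)^{-1} Z^{\top} (y - \bar{y}\,\mathbf{1})
  && \text{(least squares on the component scores)} \\
\hat{\beta} &= V_k \hat{\gamma}
  && \text{(coefficients for the scaled predictors)} \\
b_j &= \frac{\hat{\beta}_j}{\sigma_j}, \qquad
b_0 = \bar{y} - \sum_{j=1}^{p} b_j \mu_j
  && \text{(coefficients and intercept on the original scale)}
\end{aligned}
```

Because the selected components are orthogonal, the matrix Z^T Z is diagonal and well conditioned, which is why the regression step stays stable even when the original predictors are strongly correlated.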

For Beginners: Think of PCR as a two-step process: first, it finds the most important patterns in your input data (principal components), then it uses these patterns instead of the original variables to build a regression model. This can help when your original variables are highly related to each other (multicollinear), which can cause problems in standard regression.

Constructors

PrincipalComponentRegression(PrincipalComponentRegressionOptions<T>?, IRegularization<T, Matrix<T>, Vector<T>>?)

Initializes a new instance of the PrincipalComponentRegression class with the specified options and regularization.

public PrincipalComponentRegression(PrincipalComponentRegressionOptions<T>? options = null, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)

Parameters

options PrincipalComponentRegressionOptions<T>

Configuration options for the PCR model. If null, default options will be used.

regularization IRegularization<T, Matrix<T>, Vector<T>>

Regularization method to prevent overfitting. If null, no regularization will be applied.

Remarks

The constructor initializes the model with either the provided options or default settings.

For Beginners: This constructor sets up the PCR model with your specified settings or uses default settings if none are provided. Regularization is an optional technique to prevent the model from becoming too complex and overfitting to the training data.
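A minimal usage sketch, assuming the AiDotNet package is referenced and that double is an acceptable choice for T. It relies only on the constructor signature shown above; the variable names are illustrative.

```csharp
using AiDotNet.Regression;

// Both constructor parameters are optional, so this call uses the default
// options and applies no regularization.
var pcr = new PrincipalComponentRegression<double>();

// The same call with both arguments spelled out by name.
var pcrExplicit = new PrincipalComponentRegression<double>(
    options: null,
    regularization: null);
```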

Methods

CalculateFeatureImportances()

Calculates the importance of each feature in the model.

protected override Vector<T> CalculateFeatureImportances()

Returns

Vector<T>

A vector containing the importance score for each feature.

Remarks

This method calculates feature importances based on the absolute values of the regression coefficients. Larger absolute values indicate more important features.

For Beginners: Feature importance tells you which input variables have the most influence on the predictions. In PCR, this is calculated based on the magnitude (absolute value) of each coefficient in the model. Features with larger coefficient magnitudes have a stronger effect on the predictions and are considered more important.
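The rule described above can be illustrated with a small standalone sketch. It does not call the library; FeatureImportanceSketch and FromCoefficients are hypothetical names, and the coefficient values are made up for the example.

```csharp
using System;
using System.Linq;

public static class FeatureImportanceSketch
{
    // Importance of each feature is taken as the absolute value of its
    // regression coefficient, so the sign of the effect is ignored.
    public static double[] FromCoefficients(double[] coefficients) =>
        coefficients.Select(c => Math.Abs(c)).ToArray();
}
```

For coefficients { 0.5, -2.0, 0.1 } this yields { 0.5, 2.0, 0.1 }, so the second feature ranks as the most important even though its coefficient is negative.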

CreateNewInstance()

Creates a new instance of the Principal Component Regression model with the same configuration.

protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()

Returns

IFullModel<T, Matrix<T>, Vector<T>>

A new instance of the Principal Component Regression model.

Remarks

This method creates a deep copy of the current Principal Component Regression model, including its options, principal components, coefficients, intercept, and preprocessing parameters (means and standard deviations). The new instance is completely independent of the original, allowing modifications without affecting the original model.

For Beginners: This method creates an exact copy of your trained model.

Think of it like making a perfect copy of your regression model:

  • It duplicates all the configuration settings (like how many components to use)
  • It copies the learned principal components (the patterns found in your data)
  • It preserves the coefficients and intercept (the actual formula for making predictions)
  • It maintains all the scaling information (means and standard deviations) needed to process new data

Creating a copy is useful when you want to:

  • Create a backup before further modifying the model
  • Create variations of the same model for different purposes
  • Share the model with others while keeping your original intact

Exceptions

InvalidOperationException

Thrown when the creation fails or required components are null.

Deserialize(byte[])

Deserializes the model from a byte array.

public override void Deserialize(byte[] modelData)

Parameters

modelData byte[]

The byte array containing the serialized model data.

Remarks

This method reconstructs the model's parameters from a serialized byte array, including base class data and PCR-specific data such as options, principal components, means, and standard deviations.

For Beginners: Deserialization is the opposite of serialization: it takes the saved model data and reconstructs the model's internal state. This allows you to load a previously trained model and use it to make predictions without having to retrain it. It's like loading a saved game to continue where you left off.

GetModelMetadata()

Gets metadata about the model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

A ModelMetadata object containing information about the model.

Remarks

This method returns metadata about the model, including its type, coefficients, principal components, number of components used, and feature importance.

For Beginners: Model metadata provides information about the model itself, rather than the predictions it makes. This includes details about how the model is configured (like how many components it uses) and information about the importance of different features. This can help you understand which input variables are most influential in making predictions.

GetModelType()

Gets the type of the model.

protected override ModelType GetModelType()

Returns

ModelType

The model type identifier for principal component regression.

Remarks

This method is used for model identification and serialization purposes.

For Beginners: This method simply returns an identifier that indicates this is a principal component regression model. It's used internally by the library to keep track of different types of models.

Predict(Matrix<T>)

Makes predictions for the given input data.

public override Vector<T> Predict(Matrix<T> input)

Parameters

input Matrix<T>

The input features matrix where each row is an example and each column is a feature.

Returns

Vector<T>

A vector of predicted values for each input example.

Remarks

This method scales the input data using the means and standard deviations from the training data, applies the regression coefficients, and adjusts the predictions back to the original scale.

For Beginners: After training, this method is used to make predictions on new data. It first scales your input data the same way the training data was scaled, then applies the learned model to calculate the predicted values. Finally, it transforms the predictions back to the original scale of your target variable.
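The scale-then-apply idea can be sketched for a single input row as follows. This is a standalone illustration and not the library's code: PredictSketch, PredictRow, and all parameter names are hypothetical, and it assumes the coefficients are expressed for standardized inputs and that the target was only centered during training.

```csharp
using System;

public static class PredictSketch
{
    // Standardizes one input row with the TRAINING means and standard
    // deviations, applies the coefficients, then shifts the result back
    // to the target's original scale by adding the training target mean.
    public static double PredictRow(
        double[] row,
        double[] means,
        double[] stdDevs,
        double[] scaledCoefficients,
        double targetMean)
    {
        double prediction = targetMean;
        for (int j = 0; j < row.Length; j++)
        {
            double z = (row[j] - means[j]) / stdDevs[j];
            prediction += scaledCoefficients[j] * z;
        }
        return prediction;
    }
}
```

The key point is that new data is never re-standardized with its own statistics; reusing the training means and standard deviations keeps the inputs in the same space the coefficients were fitted in.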

Serialize()

Serializes the model to a byte array.

public override byte[] Serialize()

Returns

byte[]

A byte array containing the serialized model data.

Remarks

This method serializes the model's parameters, including base class data and PCR-specific data such as options, principal components, means, and standard deviations.

For Beginners: Serialization converts the model's internal state into a format that can be saved to disk or transmitted over a network. This allows you to save a trained model and load it later without having to retrain it. Think of it like saving your progress in a video game.
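Serialize() pairs with Deserialize(byte[]) to persist a trained model. The sketch below writes the bytes to a file and reads them back, using only the two signatures documented on this page plus System.IO. PcrPersistenceSketch, Save, and Load are hypothetical helper names, and the sketch assumes a default-constructed model is a valid target for Deserialize.

```csharp
using System.IO;
using AiDotNet.Regression;

public static class PcrPersistenceSketch
{
    // Writes the serialized model bytes to the given file path.
    public static void Save(PrincipalComponentRegression<double> model, string path)
    {
        byte[] data = model.Serialize();
        File.WriteAllBytes(path, data);
    }

    // Reads the bytes back and restores them into a fresh model instance.
    public static PrincipalComponentRegression<double> Load(string path)
    {
        byte[] data = File.ReadAllBytes(path);
        var model = new PrincipalComponentRegression<double>();
        model.Deserialize(data);
        return model;
    }
}
```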

Train(Matrix<T>, Vector<T>)

Trains the principal component regression model on the provided data.

public override void Train(Matrix<T> x, Vector<T> y)

Parameters

x Matrix<T>

The input features matrix where each row is a training example and each column is a feature.

y Vector<T>

The target values vector corresponding to each training example.

Remarks

This method performs the following steps:

  1. Validates the input data
  2. Centers and scales the data
  3. Performs principal component analysis (PCA) on the predictor variables
  4. Selects the appropriate number of principal components
  5. Projects the data onto the selected principal components
  6. Performs linear regression on the projected data
  7. Transforms the coefficients back to the original space
  8. Applies regularization to the coefficients
  9. Adjusts for scaling and calculates the intercept

For Beginners: Training is the process where the model learns from your data. The PCR algorithm first centers and scales your data (makes all variables have similar ranges), then finds the most important patterns (principal components) in your input features. It selects a subset of these patterns based on your settings, and uses them to build a regression model. Finally, it converts the model back to work with your original variables.
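A typical train-then-predict flow might look like the sketch below. It uses only the constructor, Train, and Predict signatures documented on this page. PcrTrainingSketch and FitAndPredict are hypothetical names, and the namespace that provides Matrix<T> and Vector<T> is an assumption that may differ between library versions.

```csharp
using AiDotNet.LinearAlgebra; // assumed namespace for Matrix<T> and Vector<T>
using AiDotNet.Regression;

public static class PcrTrainingSketch
{
    // Fits a PCR model with default options on the training data,
    // then returns predictions for the rows of xNew.
    public static Vector<double> FitAndPredict(
        Matrix<double> xTrain,
        Vector<double> yTrain,
        Matrix<double> xNew)
    {
        var model = new PrincipalComponentRegression<double>();
        model.Train(xTrain, yTrain);   // rows are examples, columns are features
        return model.Predict(xNew);    // xNew must have the same columns as xTrain
    }
}
```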