Class PrincipalComponentRegression<T>
- Namespace
- AiDotNet.Regression
- Assembly
- AiDotNet.dll
Implements Principal Component Regression (PCR), a technique that combines principal component analysis (PCA) with linear regression to handle multicollinearity in the predictor variables.
public class PrincipalComponentRegression<T> : RegressionBase<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Type Parameters
T: The numeric data type used for calculations (e.g., float, double).
- Inheritance: RegressionBase<T> → PrincipalComponentRegression<T>
- Implements: IRegression<T>, plus the other interfaces listed in the declaration
Remarks
Principal Component Regression works by first performing principal component analysis (PCA) on the predictor variables to reduce their dimensionality, then using these principal components as predictors in a linear regression model. This approach is particularly useful when dealing with multicollinearity (high correlation among predictor variables) or when the number of predictors is large relative to the number of observations.
The algorithm first centers and scales the data, performs PCA to extract principal components, and keeps a subset of those components, chosen either as a fixed number of components or by a target explained-variance ratio. It then performs linear regression using the selected components.
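The component-selection step can be sketched in C# as follows. This is an illustrative sketch, not the library's code: it assumes the eigenvalues of the standardized predictors' covariance matrix are already sorted from largest to smallest and that the variance ratio is applied cumulatively. The helper `ChooseComponentCount` and its `fixedCount` and `varianceRatio` parameters are hypothetical and do not necessarily match the properties of `PrincipalComponentRegressionOptions<T>`.

```csharp
using System;

// Hypothetical helper: decide how many principal components to keep.
// Assumes eigenvalues are sorted in descending order (largest variance first).
// A fixed count, when given, takes precedence and is clamped to the valid range;
// otherwise keep the smallest k whose cumulative explained variance reaches varianceRatio.
int ChooseComponentCount(double[] eigenvalues, int? fixedCount, double varianceRatio)
{
    if (fixedCount is int count)
        return Math.Clamp(count, 1, eigenvalues.Length);

    double total = 0;
    foreach (double ev in eigenvalues) total += ev;

    double cumulative = 0;
    for (int i = 0; i < eigenvalues.Length; i++)
    {
        cumulative += eigenvalues[i];
        if (cumulative / total >= varianceRatio)
            return i + 1;
    }
    return eigenvalues.Length; // guard against rounding when varianceRatio is close to 1
}

// Eigenvalues 2.5, 1.0, 0.4, 0.1 explain 62.5%, 87.5%, 97.5%, 100% cumulatively.
double[] exampleEigenvalues = { 2.5, 1.0, 0.4, 0.1 };
Console.WriteLine(ChooseComponentCount(exampleEigenvalues, null, 0.85)); // 2
Console.WriteLine(ChooseComponentCount(exampleEigenvalues, 3, 0.85));    // 3
```

A ratio-based rule adapts to the data (strongly correlated inputs need fewer components to reach the threshold), while a fixed count gives a predictable model size.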
For Beginners: Think of PCR as a two-step process: first, it finds the most important patterns in your input data (principal components), then it uses these patterns instead of the original variables to build a regression model. This can help when your original variables are highly related to each other (multicollinear), which can cause problems in standard regression.
Constructors
PrincipalComponentRegression(PrincipalComponentRegressionOptions<T>?, IRegularization<T, Matrix<T>, Vector<T>>?)
Initializes a new instance of the PrincipalComponentRegression class with the specified options and regularization.
public PrincipalComponentRegression(PrincipalComponentRegressionOptions<T>? options = null, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)
Parameters
options (PrincipalComponentRegressionOptions<T>): Configuration options for the PCR model. If null, default options are used.
regularization (IRegularization<T, Matrix<T>, Vector<T>>): Regularization method to help prevent overfitting. If null, no regularization is applied.
Remarks
The constructor initializes the model with either the provided options or default settings.
For Beginners: This constructor sets up the PCR model with your specified settings or uses default settings if none are provided. Regularization is an optional technique to prevent the model from becoming too complex and overfitting to the training data.
Methods
CalculateFeatureImportances()
Calculates the importance of each feature in the model.
protected override Vector<T> CalculateFeatureImportances()
Returns
- Vector<T>
A vector containing the importance score for each feature.
Remarks
This method calculates feature importances based on the absolute values of the regression coefficients. Larger absolute values indicate more important features.
For Beginners: Feature importance tells you which input variables have the most influence on the predictions. In PCR, this is calculated based on the magnitude (absolute value) of each coefficient in the model. Features with larger coefficient magnitudes have a stronger effect on the predictions and are considered more important.
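A minimal sketch of this importance rule on plain `double` arrays follows. `FeatureImportances` is a hypothetical helper, and the coefficient values are made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: importance of each feature = |coefficient|, per the remarks above.
double[] FeatureImportances(double[] coefficients) =>
    coefficients.Select(c => Math.Abs(c)).ToArray();

double[] exampleCoefficients = { 0.8, -2.5, 0.1 }; // made-up values
double[] importances = FeatureImportances(exampleCoefficients);

// Rank feature indices from most to least important.
int[] ranking = Enumerable.Range(0, importances.Length)
    .OrderByDescending(i => importances[i])
    .ToArray();
Console.WriteLine(string.Join(" > ", ranking.Select(i => $"feature {i}"))); // feature 1 > feature 0 > feature 2
```

If the coefficients are expressed in the original feature units, these scores depend on those units: measuring a feature in kilometers instead of meters multiplies its coefficient by 1000. Importances are therefore easiest to compare when the features share a common scale.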
CreateNewInstance()
Creates a new instance of the Principal Component Regression model with the same configuration.
protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()
Returns
- IFullModel<T, Matrix<T>, Vector<T>>
A new instance of the Principal Component Regression model.
Remarks
This method creates a deep copy of the current Principal Component Regression model, including its options, principal components, coefficients, intercept, and preprocessing parameters (means and standard deviations). The new instance is completely independent of the original, allowing modifications without affecting the original model.
For Beginners: This method creates an exact copy of your trained model.
Think of it like making a perfect copy of your regression model:
- It duplicates all the configuration settings (like how many components to use)
- It copies the learned principal components (the patterns found in your data)
- It preserves the coefficients and intercept (the actual formula for making predictions)
- It maintains all the scaling information (means and standard deviations) needed to process new data
Creating a copy is useful when you want to:
- Create a backup before further modifying the model
- Create variations of the same model for different purposes
- Share the model with others while keeping your original intact
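What "completely independent" means can be illustrated with plain C# arrays. This sketch does not use the library's types; the arrays are hypothetical stand-ins for the coefficients and principal components that a deep copy must duplicate rather than share.

```csharp
using System;

// Stand-ins for trained state (made-up numbers, not library types).
double[] coefficients = { 1.5, -0.7 };
double[,] components = { { 0.6, 0.8 }, { -0.8, 0.6 } };

// Assignment only copies the reference: both names share one array.
double[] alias = coefficients;

// Array.Clone copies every element. For value-type elements such as double
// (including multi-dimensional arrays) this yields a fully independent copy;
// for arrays of objects it would copy only the references.
double[] coefficientsCopy = (double[])coefficients.Clone();
double[,] componentsCopy = (double[,])components.Clone();

coefficientsCopy[0] = 99.0;
componentsCopy[0, 0] = 99.0;
Console.WriteLine(coefficients[0] == 1.5 && components[0, 0] == 0.6); // True: originals untouched

alias[1] = 42.0;
Console.WriteLine(coefficients[1] == 42.0); // True: the change shows through the shared array
```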
Exceptions
- InvalidOperationException
Thrown when the creation fails or required components are null.
Deserialize(byte[])
Deserializes the model from a byte array.
public override void Deserialize(byte[] modelData)
Parameters
modelData (byte[]): The byte array containing the serialized model data.
Remarks
This method reconstructs the model's parameters from a serialized byte array, including base class data and PCR-specific data such as options, principal components, means, and standard deviations.
For Beginners: Deserialization is the opposite of serialization - it takes the saved model data and reconstructs the model's internal state. This allows you to load a previously trained model and use it to make predictions without having to retrain it. It's like loading a saved game to continue where you left off.
GetModelMetadata()
Gets metadata about the model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata object containing information about the model.
Remarks
This method returns metadata about the model, including its type, coefficients, principal components, number of components used, and feature importance.
For Beginners: Model metadata provides information about the model itself, rather than the predictions it makes. This includes details about how the model is configured (like how many components it uses) and information about the importance of different features. This can help you understand which input variables are most influential in making predictions.
GetModelType()
Gets the type of the model.
protected override ModelType GetModelType()
Returns
- ModelType
The model type identifier for principal component regression.
Remarks
This method is used for model identification and serialization purposes.
For Beginners: This method simply returns an identifier that indicates this is a principal component regression model. It's used internally by the library to keep track of different types of models.
Predict(Matrix<T>)
Makes predictions for the given input data.
public override Vector<T> Predict(Matrix<T> input)
Parameters
input (Matrix<T>): The input feature matrix, where each row is an example and each column is a feature.
Returns
- Vector<T>
A vector of predicted values for each input example.
Remarks
This method scales the input data using the means and standard deviations from the training data, applies the regression coefficients, and adjusts the predictions back to the original scale.
For Beginners: After training, this method is used to make predictions on new data. It first scales your input data the same way the training data was scaled, then applies the learned model to calculate the predicted values. Finally, it transforms the predictions back to the original scale of your target variable.
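The prediction path can be sketched as below, assuming that training standardized both the predictors and the target and stored the regression coefficients in that standardized space. The helpers `PredictStandardized` and `FoldScaling` and all parameter names are hypothetical. The second helper shows that the same predictions can be written as an ordinary linear model with an intercept, which is one way to read the "adjusts for scaling and calculates the intercept" step of `Train`.

```csharp
using System;

// Assumption: training standardized predictors (xMeans, xStds) and target (yMean, yStd),
// and betaStd holds the coefficients in that standardized space.
double[] PredictStandardized(double[,] x, double[] xMeans, double[] xStds,
                             double[] betaStd, double yMean, double yStd)
{
    int n = x.GetLength(0), p = x.GetLength(1);
    var predictions = new double[n];
    for (int i = 0; i < n; i++)
    {
        double zPred = 0;
        for (int j = 0; j < p; j++)
            zPred += betaStd[j] * (x[i, j] - xMeans[j]) / xStds[j]; // scale like the training data
        predictions[i] = yMean + yStd * zPred;                      // back to target units
    }
    return predictions;
}

// Same model folded into original units: b_j = yStd * betaStd_j / xStd_j,
// intercept = yMean - sum_j b_j * xMean_j.
(double[] B, double Intercept) FoldScaling(double[] xMeans, double[] xStds,
                                           double[] betaStd, double yMean, double yStd)
{
    var b = new double[betaStd.Length];
    double intercept = yMean;
    for (int j = 0; j < b.Length; j++)
    {
        b[j] = yStd * betaStd[j] / xStds[j];
        intercept -= b[j] * xMeans[j];
    }
    return (b, intercept);
}

// Made-up example values.
double[,] xNew = { { 1.0, 10.0 }, { 3.0, 30.0 } };
double[] means = { 2.0, 20.0 };
double[] stds = { 1.0, 10.0 };
double[] beta = { 0.5, 0.25 };
double[] yHat = PredictStandardized(xNew, means, stds, beta, 100.0, 4.0); // 97 and 103
var (coef, b0) = FoldScaling(means, stds, beta, 100.0, 4.0);             // b = (2, 0.1), intercept 94
double direct = b0 + coef[0] * xNew[0, 0] + coef[1] * xNew[0, 1];
Console.WriteLine(Math.Abs(direct - yHat[0]) < 1e-9); // True
```

Folding the scaling into the coefficients once means later predictions need only a dot product and an addition, with no per-call standardization.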
Serialize()
Serializes the model to a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized model data.
Remarks
This method serializes the model's parameters, including base class data and PCR-specific data such as options, principal components, means, and standard deviations.
For Beginners: Serialization converts the model's internal state into a format that can be saved to disk or transmitted over a network. This allows you to save a trained model and load it later without having to retrain it. Think of it like saving your progress in a video game.
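A byte-array round trip of this kind can be sketched with `BinaryWriter` and `BinaryReader` over a `MemoryStream`. The field order and the length-prefixed layout are assumptions made for illustration only. The actual format written by `Serialize()` (which also includes base-class data and options) is not described here, so bytes produced by this sketch are not compatible with `Deserialize(byte[])`. `SerializeState`, `DeserializeState`, `WriteArray`, and `ReadArray` are hypothetical names.

```csharp
using System;
using System.IO;

// Illustrative layout (an assumption, not the library's format):
// [len][means...][len][stds...][len][coefficients...][intercept]
byte[] SerializeState(double[] means, double[] stds, double[] coefficients, double intercept)
{
    var stream = new MemoryStream();
    using (var writer = new BinaryWriter(stream))
    {
        WriteArray(writer, means);
        WriteArray(writer, stds);
        WriteArray(writer, coefficients);
        writer.Write(intercept);
    }
    return stream.ToArray(); // ToArray still works after the stream is closed
}

(double[] Means, double[] Stds, double[] Coefficients, double Intercept) DeserializeState(byte[] data)
{
    using var reader = new BinaryReader(new MemoryStream(data));
    // Read in exactly the order written; tuple elements are evaluated left to right.
    return (ReadArray(reader), ReadArray(reader), ReadArray(reader), reader.ReadDouble());
}

void WriteArray(BinaryWriter writer, double[] values)
{
    writer.Write(values.Length);                          // length prefix: 4-byte int
    foreach (double value in values) writer.Write(value); // 8 bytes each
}

double[] ReadArray(BinaryReader reader)
{
    var values = new double[reader.ReadInt32()];
    for (int i = 0; i < values.Length; i++) values[i] = reader.ReadDouble();
    return values;
}

byte[] bytes = SerializeState(new[] { 2.0, 20.0 }, new[] { 1.0, 10.0 }, new[] { 2.0, 0.1 }, 94.0);
var restored = DeserializeState(bytes);
Console.WriteLine(bytes.Length); // 68
Console.WriteLine(restored.Coefficients[1] == 0.1 && restored.Intercept == 94.0); // True
```

The length prefixes let the reader reconstruct arrays of any size without knowing the feature count in advance; the reader must consume fields in exactly the order the writer produced them.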
Train(Matrix<T>, Vector<T>)
Trains the principal component regression model on the provided data.
public override void Train(Matrix<T> x, Vector<T> y)
Parameters
x (Matrix<T>): The input feature matrix, where each row is a training example and each column is a feature.
y (Vector<T>): The target values, one for each training example.
Remarks
This method performs the following steps:
1. Validates the input data.
2. Centers and scales the data.
3. Performs principal component analysis (PCA) on the predictor variables.
4. Selects the appropriate number of principal components.
5. Projects the data onto the selected principal components.
6. Performs linear regression on the projected data.
7. Transforms the coefficients back to the original space.
8. Applies regularization to the coefficients.
9. Adjusts for scaling and calculates the intercept.
For Beginners: Training is the process where the model learns from your data. The PCR algorithm first centers and scales your data (shifting each variable to a mean of zero and dividing it by its standard deviation, so all variables are on a comparable scale), then finds the most important patterns (principal components) in your input features. It selects a subset of these patterns based on your settings and uses them to build a regression model. Finally, it converts the model back so that it works with your original variables.
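The training steps can be sketched end to end on plain arrays. This is a simplified illustration under stated assumptions, not AiDotNet's implementation: it standardizes both predictors and target with sample standard deviations, finds principal components with the cyclic Jacobi eigenvalue method (chosen here only because it is short and robust for symmetric matrices), keeps the top `k` components, and skips input validation and the regularization step. `TrainPcr` and `JacobiEigen` are hypothetical names.

```csharp
using System;
using System.Linq;

// Minimal PCR training sketch (illustrative, not AiDotNet's code). Regularization is
// omitted; k must not exceed the number of components with nonzero variance, and
// every column of x must have nonzero variance.
(double[] Coefficients, double Intercept) TrainPcr(double[,] x, double[] y, int k)
{
    int n = x.GetLength(0), p = x.GetLength(1);
    var xMean = new double[p]; var xStd = new double[p]; var xs = new double[n, p];
    for (int j = 0; j < p; j++) // standardize each predictor column
    {
        for (int i = 0; i < n; i++) xMean[j] += x[i, j] / n;
        for (int i = 0; i < n; i++) xStd[j] += Math.Pow(x[i, j] - xMean[j], 2) / (n - 1);
        xStd[j] = Math.Sqrt(xStd[j]);
        for (int i = 0; i < n; i++) xs[i, j] = (x[i, j] - xMean[j]) / xStd[j];
    }
    double yMean = y.Average();
    double yStd = Math.Sqrt(y.Sum(v => (v - yMean) * (v - yMean)) / (n - 1));
    double[] ys = y.Select(v => (v - yMean) / yStd).ToArray();

    var cov = new double[p, p]; // covariance of standardized data = correlation matrix
    for (int r = 0; r < p; r++)
        for (int c = 0; c < p; c++)
            for (int i = 0; i < n; i++) cov[r, c] += xs[i, r] * xs[i, c] / (n - 1);

    var (vals, vecs) = JacobiEigen(cov); // columns of vecs are principal directions
    var kept = Enumerable.Range(0, p).OrderByDescending(idx => vals[idx]).Take(k);

    var betaStd = new double[p];
    foreach (int c in kept)
    {
        // Component scores are mutually orthogonal, so each gets its own 1-D least-squares fit.
        double zy = 0, zz = 0;
        for (int i = 0; i < n; i++)
        {
            double z = 0;
            for (int j = 0; j < p; j++) z += xs[i, j] * vecs[j, c];
            zy += z * ys[i]; zz += z * z;
        }
        for (int j = 0; j < p; j++) betaStd[j] += zy / zz * vecs[j, c]; // back to feature space
    }

    var coef = new double[p];
    double intercept = yMean;
    for (int j = 0; j < p; j++) // undo scaling, then solve for the intercept
    {
        coef[j] = yStd * betaStd[j] / xStd[j];
        intercept -= coef[j] * xMean[j];
    }
    return (coef, intercept);
}

// Cyclic Jacobi rotations for a symmetric matrix: eigenvalues plus eigenvectors as columns.
(double[] Values, double[,] Vectors) JacobiEigen(double[,] m)
{
    int p = m.GetLength(0);
    var a = (double[,])m.Clone();
    var v = new double[p, p];
    for (int i = 0; i < p; i++) v[i, i] = 1.0;
    for (int sweep = 0; sweep < 50; sweep++)
        for (int r = 0; r < p - 1; r++)
            for (int q = r + 1; q < p; q++)
            {
                if (Math.Abs(a[r, q]) < 1e-15) continue;
                double theta = (a[q, q] - a[r, r]) / (2 * a[r, q]);
                double t = (theta >= 0 ? 1.0 : -1.0) / (Math.Abs(theta) + Math.Sqrt(theta * theta + 1));
                double cs = 1 / Math.Sqrt(t * t + 1), sn = t * cs;
                for (int i = 0; i < p; i++) // A <- A J and V <- V J (rotate columns r and q)
                {
                    (a[i, r], a[i, q]) = (cs * a[i, r] - sn * a[i, q], sn * a[i, r] + cs * a[i, q]);
                    (v[i, r], v[i, q]) = (cs * v[i, r] - sn * v[i, q], sn * v[i, r] + cs * v[i, q]);
                }
                for (int i = 0; i < p; i++) // A <- J^T A (rotate rows r and q)
                    (a[r, i], a[q, i]) = (cs * a[r, i] - sn * a[q, i], sn * a[r, i] + cs * a[q, i]);
            }
    return (Enumerable.Range(0, p).Select(i => a[i, i]).ToArray(), v);
}

// Example: y = 1 + 2*x1 + 3*x2 exactly; keeping all components reproduces least squares.
double[,] trainX = { { 1, 2 }, { 2, 1 }, { 3, 5 }, { 4, 3 }, { 5, 6 } };
double[] trainY = { 9, 8, 22, 18, 29 };
var fit = TrainPcr(trainX, trainY, k: 2);
Console.WriteLine(Math.Abs(fit.Coefficients[0] - 2) < 1e-8 && Math.Abs(fit.Coefficients[1] - 3) < 1e-8); // True
```

With `k` equal to the number of features, PCR reduces to ordinary least squares, which is why the example recovers the generating coefficients. When columns are highly or even perfectly correlated, a smaller `k` drops the low-variance directions that make least squares unstable, and the fit remains well defined.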