Class PartialLeastSquaresRegression<T>
- Namespace: AiDotNet.Regression
- Assembly: AiDotNet.dll
Implements Partial Least Squares Regression (PLS), a technique that combines features from principal component analysis and multiple linear regression to handle situations with many correlated predictors.
public class PartialLeastSquaresRegression<T> : RegressionBase<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Type Parameters
T: The numeric data type used for calculations (e.g., float, double).
- Inheritance: RegressionBase<T> → PartialLeastSquaresRegression<T>
- Implements: IRegression<T>
Remarks
Partial Least Squares Regression is particularly useful when there are many predictor variables that may be highly correlated. It works by finding linear combinations of the predictors (components) whose scores have maximal covariance with the response variable.
Unlike Principal Component Regression, which considers only the variance in the predictor variables, PLS regression considers both the variance in the predictors and their relationship with the response variable. This often leads to models with better predictive power, especially when the predictors are highly correlated.
For Beginners: Think of PLS regression as a way to find the most important patterns in your input data that are also strongly related to what you're trying to predict. It's like finding the key ingredients in a recipe that most influence the taste, rather than just the most abundant ingredients.
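As a sketch of the covariance criterion described above, for a single response and assuming X and y have already been centered, the first weight vector and its component scores can be written as follows. The symbols w, t, X, and y are notation introduced here for illustration, not names from the library.

```latex
\mathbf{w}_1 = \arg\max_{\lVert \mathbf{w} \rVert = 1} \operatorname{Cov}(X\mathbf{w}, \mathbf{y})^2
\quad\Longrightarrow\quad
\mathbf{w}_1 = \frac{X^\top \mathbf{y}}{\lVert X^\top \mathbf{y} \rVert},
\qquad
\mathbf{t}_1 = X\mathbf{w}_1
```

Principal Component Regression would instead choose the direction that maximizes the variance of Xw alone, which is why its components can ignore the response entirely.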
Constructors
PartialLeastSquaresRegression(PartialLeastSquaresRegressionOptions<T>?, IRegularization<T, Matrix<T>, Vector<T>>?)
Initializes a new instance of the PartialLeastSquaresRegression class with the specified options and regularization.
public PartialLeastSquaresRegression(PartialLeastSquaresRegressionOptions<T>? options = null, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)
Parameters
options (PartialLeastSquaresRegressionOptions<T>): Configuration options for the PLS regression model. If null, default options will be used.
regularization (IRegularization<T, Matrix<T>, Vector<T>>): Regularization method to prevent overfitting. If null, no regularization will be applied.
Remarks
The constructor initializes the model with either the provided options or default settings.
For Beginners: This constructor sets up the PLS regression model with your specified settings or uses default settings if none are provided. Regularization is an optional technique to prevent the model from becoming too complex and overfitting to the training data.
Methods
CalculateFeatureImportances()
Calculates the importance of each feature in the model.
protected override Vector<T> CalculateFeatureImportances()
Returns
- Vector<T>
A vector containing the importance score for each feature.
Remarks
This method calculates the Variable Importance in Projection (VIP) scores, which measure the contribution of each variable to the model based on the variance explained by each PLS component and the weights of each variable in those components.
For Beginners: Feature importance tells you which input variables have the most influence on the predictions. In PLS regression, this is calculated using a measure called VIP (Variable Importance in Projection), which considers both how much each component explains the variation in the data and how much each variable contributes to those components. Higher values indicate more important variables.
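The VIP calculation described above can be sketched with plain arrays. This is a minimal illustration of the standard VIP formula, not the library's implementation: the class `VipSketch`, its parameter names, and the array layout (one inner array per component) are all assumptions made for this example.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
static class VipSketch
{
    // weights[a][j]: weight of feature j in component a
    // scores[a][i]:  score of sample i on component a
    // yLoadings[a]:  regression coefficient of y on component a
    public static double[] Compute(double[][] weights, double[][] scores, double[] yLoadings)
    {
        int components = weights.Length;
        int features = weights[0].Length;

        // Variance of y explained by each component: q_a^2 * (t_a . t_a)
        var explained = new double[components];
        double total = 0.0;
        for (int a = 0; a < components; a++)
        {
            double tt = 0.0;
            foreach (double t in scores[a]) tt += t * t;
            explained[a] = yLoadings[a] * yLoadings[a] * tt;
            total += explained[a];
        }

        var vip = new double[features];
        for (int j = 0; j < features; j++)
        {
            double sum = 0.0;
            for (int a = 0; a < components; a++)
            {
                // Normalize the weight vector so each component contributes on the same scale.
                double norm2 = 0.0;
                foreach (double w in weights[a]) norm2 += w * w;
                sum += explained[a] * weights[a][j] * weights[a][j] / norm2;
            }
            vip[j] = Math.Sqrt(features * sum / total);
        }
        return vip;
    }
}
```

With this formula the squared scores sum to the number of features, so a score above 1 marks a feature that contributes more than the average feature.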
CreateNewInstance()
Creates a new instance of the Partial Least Squares Regression model with the same configuration.
protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()
Returns
- IFullModel<T, Matrix<T>, Vector<T>>
A new instance of the Partial Least Squares Regression model.
Remarks
This method creates a deep copy of the current Partial Least Squares Regression model, including its options, coefficients, intercept, loadings, scores, weights, and data scaling parameters. The new instance is completely independent of the original, allowing modifications without affecting the original model.
For Beginners: This method creates an exact copy of your trained model.
Think of it like making a perfect duplicate:
- It copies all the configuration settings (like the number of components)
- It preserves the coefficients and intercept that define your regression model
- It duplicates all the internal matrices (loadings, scores, weights) that capture the patterns in your data
- It maintains the scaling information (means and standard deviations) needed to process new data
Creating a copy is useful when you want to:
- Create a backup before further modifying the model
- Create variations of the same model for different purposes
- Share the model with others while keeping your original intact
Exceptions
- InvalidOperationException
Thrown when the creation fails or required components are null.
Deserialize(byte[])
Deserializes the model from a byte array.
public override void Deserialize(byte[] modelData)
Parameters
modelData (byte[]): The byte array containing the serialized model data.
Remarks
This method reconstructs the model's parameters from a serialized byte array, including base class data and PLS-specific data such as loadings, scores, weights, means, and standard deviations.
For Beginners: Deserialization is the opposite of serialization - it takes the saved model data and reconstructs the model's internal state. This allows you to load a previously trained model and use it to make predictions without having to retrain it. It's like loading a saved game to continue where you left off.
GetModelMetadata()
Gets metadata about the model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata object containing information about the model.
Remarks
This method returns metadata about the model, including its type, coefficients, loadings, scores, weights, number of components, and feature importance.
For Beginners: Model metadata provides information about the model itself, rather than the predictions it makes. This includes details about how the model is configured (like how many components it uses) and information about the importance of different features. This can help you understand which input variables are most influential in making predictions.
GetModelType()
Gets the type of the model.
protected override ModelType GetModelType()
Returns
- ModelType
The model type identifier for partial least squares regression.
Remarks
This method is used for model identification and serialization purposes.
For Beginners: This method simply returns an identifier that indicates this is a partial least squares regression model. It's used internally by the library to keep track of different types of models.
Predict(Matrix<T>)
Makes predictions for the given input data.
public override Vector<T> Predict(Matrix<T> input)
Parameters
input (Matrix<T>): The input features matrix where each row is an example and each column is a feature.
Returns
- Vector<T>
A vector of predicted values for each input example.
Remarks
This method scales the input data using the means and standard deviations from the training data, applies the regression coefficients, and adds the intercept to produce predictions.
For Beginners: After training, this method is used to make predictions on new data. It first scales your input data the same way the training data was scaled, then applies the learned model to calculate the predicted values. This is the main purpose of building a regression model - to predict values for new examples.
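The prediction steps described above (scale, apply coefficients, add intercept) can be sketched with plain arrays. This is an illustration under assumptions, not the library's code: `PredictSketch` and its parameters are hypothetical, and the coefficients are assumed to be expressed on the scaled features.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
static class PredictSketch
{
    public static double[] Predict(double[][] input, double[] means, double[] stdDevs,
                                   double[] coefficients, double intercept)
    {
        var predictions = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            double sum = intercept;
            for (int j = 0; j < coefficients.Length; j++)
            {
                // Scale each feature with the statistics saved at training time.
                double scaled = (input[i][j] - means[j]) / stdDevs[j];
                sum += scaled * coefficients[j];
            }
            predictions[i] = sum;
        }
        return predictions;
    }
}
```

For example, a row (3, 10) with means (1, 4), standard deviations (2, 3), coefficients (0.5, 2), and intercept 1 scales to (1, 2) and yields 0.5 + 4 + 1 = 5.5.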
Serialize()
Serializes the model to a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized model data.
Remarks
This method serializes the model's parameters, including base class data and PLS-specific data such as loadings, scores, weights, means, and standard deviations.
For Beginners: Serialization converts the model's internal state into a format that can be saved to disk or transmitted over a network. This allows you to save a trained model and load it later without having to retrain it. Think of it like saving your progress in a video game.
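To make the save and load round trip concrete, here is a minimal sketch that writes a coefficient array and an intercept to a byte array and reads them back using the standard .NET `BinaryWriter` and `BinaryReader`. The class `SerializationSketch` and its byte layout are assumptions for illustration; the actual format used by `Serialize()` and `Deserialize(byte[])` is not documented here and likely contains more fields (loadings, scores, weights, means, standard deviations).

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not the library's byte format.
static class SerializationSketch
{
    public static byte[] Serialize(double[] coefficients, double intercept)
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            // Length prefix first, so the reader knows how many values follow.
            writer.Write(coefficients.Length);
            foreach (double c in coefficients) writer.Write(c);
            writer.Write(intercept);
        }
        return stream.ToArray();
    }

    public static (double[] Coefficients, double Intercept) Deserialize(byte[] data)
    {
        using var stream = new MemoryStream(data);
        using var reader = new BinaryReader(stream);
        // Read the fields back in exactly the order they were written.
        int count = reader.ReadInt32();
        var coefficients = new double[count];
        for (int i = 0; i < count; i++) coefficients[i] = reader.ReadDouble();
        double intercept = reader.ReadDouble();
        return (coefficients, intercept);
    }
}
```

The key design point is symmetry: the reader must consume fields in the same order and with the same types as the writer produced them.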
Train(Matrix<T>, Vector<T>)
Trains the partial least squares regression model on the provided data.
public override void Train(Matrix<T> x, Vector<T> y)
Parameters
x (Matrix<T>): The input features matrix where each row is a training example and each column is a feature.
y (Vector<T>): The target values vector corresponding to each training example.
Remarks
This method performs the following steps:
1. Validates the input data
2. Centers and scales the data
3. Extracts the specified number of components using the NIPALS algorithm
4. Calculates the regression coefficients
5. Adjusts the coefficients for the scaling
6. Calculates the intercept
7. Applies regularization to the model matrices
For Beginners: Training is the process where the model learns from your data. The PLS algorithm first centers and scales your data (makes all variables have similar ranges), then finds the most important patterns (components) that explain both the variation in your input features and their relationship with the target variable. These components are then used to build a regression model that can predict the target variable from new input features.
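The training steps can be sketched for a single response (PLS1) with plain arrays. This is a minimal illustration under stated assumptions, not the library's implementation: `Pls1Sketch` is a hypothetical class, features are scaled with the sample standard deviation while y is only centered, input validation and the regularization step are omitted, and coefficients are accumulated through rotated weights rather than an explicit matrix inverse.

```csharp
using System;

// Hypothetical helper for illustration only; not part of AiDotNet.
static class Pls1Sketch
{
    // Returns coefficients in original feature units plus an intercept.
    public static (double[] Coefficients, double Intercept) Fit(double[][] x, double[] y, int components)
    {
        int n = x.Length, p = x[0].Length;

        // Center and scale X, center y (assumes no feature is constant).
        var mean = new double[p];
        var std = new double[p];
        double yMean = 0.0;
        foreach (double v in y) yMean += v / n;
        for (int j = 0; j < p; j++)
        {
            for (int i = 0; i < n; i++) mean[j] += x[i][j] / n;
            for (int i = 0; i < n; i++) std[j] += Math.Pow(x[i][j] - mean[j], 2);
            std[j] = Math.Sqrt(std[j] / (n - 1));
        }
        var xs = new double[n][];
        var ys = new double[n];
        for (int i = 0; i < n; i++)
        {
            xs[i] = new double[p];
            for (int j = 0; j < p; j++) xs[i][j] = (x[i][j] - mean[j]) / std[j];
            ys[i] = y[i] - yMean;
        }

        var beta = new double[p];                  // coefficients on the scaled features
        var rotations = new double[components][];  // r_a, maps original scaled X to scores
        var loadings = new double[components][];   // p_a, X loadings

        for (int a = 0; a < components; a++)
        {
            // Weight vector: w = X^T y, normalized to unit length.
            var w = new double[p];
            double norm = 0.0;
            for (int j = 0; j < p; j++)
            {
                for (int i = 0; i < n; i++) w[j] += xs[i][j] * ys[i];
                norm += w[j] * w[j];
            }
            norm = Math.Sqrt(norm);
            if (norm < 1e-12) break;               // nothing left to explain
            for (int j = 0; j < p; j++) w[j] /= norm;

            // Scores: t = X w.
            var t = new double[n];
            double tt = 0.0;
            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < p; j++) t[i] += xs[i][j] * w[j];
                tt += t[i] * t[i];
            }

            // X loadings (X^T t / t^T t) and y loading (y^T t / t^T t).
            var load = new double[p];
            double q = 0.0;
            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < p; j++) load[j] += xs[i][j] * t[i] / tt;
                q += ys[i] * t[i] / tt;
            }

            // Rotated weights: r = w - sum_b (p_b . w) r_b.
            var r = (double[])w.Clone();
            for (int b = 0; b < a; b++)
            {
                double dot = 0.0;
                for (int j = 0; j < p; j++) dot += loadings[b][j] * w[j];
                for (int j = 0; j < p; j++) r[j] -= dot * rotations[b][j];
            }
            rotations[a] = r;
            loadings[a] = load;
            for (int j = 0; j < p; j++) beta[j] += q * r[j];

            // Deflate X and y by the part this component explains.
            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < p; j++) xs[i][j] -= t[i] * load[j];
                ys[i] -= q * t[i];
            }
        }

        // Undo the scaling and compute the intercept.
        var coefficients = new double[p];
        double intercept = yMean;
        for (int j = 0; j < p; j++)
        {
            coefficients[j] = beta[j] / std[j];
            intercept -= coefficients[j] * mean[j];
        }
        return (coefficients, intercept);
    }
}
```

When the number of components equals the number of linearly independent features, this procedure reproduces the ordinary least squares fit. Using fewer components is what gives PLS its stability with correlated predictors.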