Class QuantileRegressionForests<T>
- Namespace
- AiDotNet.Regression
- Assembly
- AiDotNet.dll
Implements Quantile Regression Forests, an extension of Random Forests that can predict conditional quantiles of the target variable, not just the conditional mean.
public class QuantileRegressionForests<T> : AsyncDecisionTreeRegressionBase<T>, IAsyncTreeBasedModel<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Type Parameters
- T: The numeric data type used for calculations (e.g., float, double).
- Inheritance
-
AsyncDecisionTreeRegressionBase<T>
QuantileRegressionForests<T>
- Implements
-
IAsyncTreeBasedModel<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Remarks
Quantile Regression Forests extend the Random Forests algorithm to estimate the full conditional distribution of the response variable, not just its mean. This allows for prediction of any quantile of the response variable, providing a more complete picture of the relationship between predictors and the response.
The algorithm works by building multiple decision trees on bootstrap samples of the training data, similar to Random Forests. However, instead of averaging the predictions, it uses the empirical distribution of the predictions from all trees to estimate quantiles.
For Beginners: While standard Random Forests tell you the average prediction, Quantile Regression Forests can tell you about the entire range of possible outcomes. For example, they can predict not just the expected value, but also the 10th percentile (a pessimistic scenario) or the 90th percentile (an optimistic scenario). This is particularly useful when you need to understand the uncertainty in your predictions or when the relationship between variables varies across different parts of the distribution.
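To make the contrast concrete, here is a language-agnostic sketch in Python, independent of AiDotNet, with made-up per-tree predictions. A standard Random Forest collapses the trees into one mean; a Quantile Regression Forest keeps the whole set of per-tree predictions and reads percentiles off the sorted values:

```python
# Hypothetical predictions from a 10-tree forest for a single example.
tree_predictions = [4.1, 3.8, 5.2, 4.4, 6.0, 3.9, 4.7, 5.5, 4.2, 4.9]

# A standard Random Forest reduces the trees to a single mean prediction.
mean_prediction = sum(tree_predictions) / len(tree_predictions)

# Quantile Regression Forests instead keep the empirical distribution,
# so any percentile can be read off the sorted predictions.
ordered = sorted(tree_predictions)
pessimistic = ordered[int(0.1 * len(ordered))]  # roughly the 10th percentile
optimistic = ordered[int(0.9 * len(ordered))]   # roughly the 90th percentile
```

The spread between `pessimistic` and `optimistic` is what standard Random Forests discard: a direct measure of how uncertain the forest is about this example.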
Constructors
QuantileRegressionForests(QuantileRegressionForestsOptions, IRegularization<T, Matrix<T>, Vector<T>>?)
Initializes a new instance of the QuantileRegressionForests class with the specified options and regularization.
public QuantileRegressionForests(QuantileRegressionForestsOptions options, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)
Parameters
- options QuantileRegressionForestsOptions: Configuration options for the Quantile Regression Forests model.
- regularization IRegularization<T, Matrix<T>, Vector<T>>: Regularization method to prevent overfitting. If null, no regularization will be applied.
Remarks
The constructor initializes the model with the provided options and sets up the random number generator.
For Beginners: This constructor sets up the Quantile Regression Forests model with your specified settings. The options control things like how many trees to build, how deep each tree can be, and how many features to consider at each split. Regularization is an optional technique to prevent the model from becoming too complex and overfitting to the training data.
Properties
MaxDepth
Gets the maximum depth of the trees in the forest.
public override int MaxDepth { get; }
Property Value
- int
The maximum depth specified in the options.
NumberOfTrees
Gets the number of trees in the forest.
public override int NumberOfTrees { get; }
Property Value
- int
The number of trees specified in the options.
Methods
CalculateFeatureImportancesAsync(int)
Asynchronously calculates the importance of each feature in the model.
protected override Task CalculateFeatureImportancesAsync(int numFeatures)
Parameters
- numFeatures int: The number of features in the input data.
Returns
- Task
A task that represents the asynchronous calculation operation.
Remarks
This method calculates feature importances by averaging the importances across all trees in the forest.
For Beginners: Feature importance tells you which input variables have the most influence on the predictions. In Quantile Regression Forests, this is calculated by averaging the feature importances from all the individual trees. Higher values indicate more important features.
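The averaging described above can be sketched in plain Python; this is an illustration of the idea, not the AiDotNet implementation, and the per-tree importance values are invented:

```python
# Each trained tree reports one importance value per feature.
# The forest's importance for a feature is the mean across all trees.

def average_importances(per_tree_importances):
    """Average feature importances across trees (a list of equal-length lists)."""
    num_trees = len(per_tree_importances)
    num_features = len(per_tree_importances[0])
    return [
        sum(tree[f] for tree in per_tree_importances) / num_trees
        for f in range(num_features)
    ]

# Hypothetical importances from three trees over two features.
per_tree = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]
forest_importances = average_importances(per_tree)  # roughly [0.7, 0.3]
```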
CreateNewInstance()
Creates a new instance of the Quantile Regression Forests model with the same configuration.
protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()
Returns
- IFullModel<T, Matrix<T>, Vector<T>>
A new instance of the Quantile Regression Forests model.
Remarks
This method creates a deep copy of the current model, including its configuration options, trained trees, feature importances, and regularization settings. The new instance is completely independent of the original, allowing modifications without affecting the original model.
For Beginners: This method creates an exact copy of your trained model.
Think of it like making a perfect clone of your forest model:
- It copies all the configuration settings (number of trees, max depth, etc.)
- It duplicates all the individual decision trees that make up the forest
- It preserves the feature importance values that show which inputs matter most
- It maintains all regularization settings that help prevent overfitting
Creating a copy is useful when you want to:
- Create a backup before further modifying the model
- Create variations of the same model for different purposes
- Share the model with others while keeping your original intact
Exceptions
- InvalidOperationException
Thrown when the creation fails or required components are null.
Deserialize(byte[])
Deserializes the model from a byte array.
public override void Deserialize(byte[] modelData)
Parameters
- modelData byte[]: The byte array containing the serialized model data.
Remarks
This method reconstructs the model's parameters from a serialized byte array, including options, feature importances, and all trees in the forest.
For Beginners: Deserialization is the opposite of serialization - it takes the saved model data and reconstructs the model's internal state. This allows you to load a previously trained model and use it to make predictions without having to retrain it. It's like loading a saved game to continue where you left off.
GetModelMetadata()
Gets metadata about the model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata object containing information about the model.
Remarks
This method returns metadata about the model, including its type, number of trees, maximum depth, and feature importances.
For Beginners: Model metadata provides information about the model itself, rather than the predictions it makes. This includes details about how the model is configured (like how many trees it uses and how deep they are) and information about the importance of different features. This can help you understand which input variables are most influential in making predictions.
PredictAsync(Matrix<T>)
Asynchronously makes predictions for the given input data.
public override Task<Vector<T>> PredictAsync(Matrix<T> input)
Parameters
- input Matrix<T>: The input features matrix where each row is an example and each column is a feature.
Returns
- Task<Vector<T>>
A task that represents the asynchronous prediction operation, containing a vector of predicted values.
Remarks
This method predicts the median (0.5 quantile) of the conditional distribution for each input example.
For Beginners: After training, this method is used to make predictions on new data. By default, it predicts the median value (the middle of the distribution), which is often a good central estimate. If you need a different percentile, you can use the PredictQuantileAsync method instead.
PredictQuantileAsync(Matrix<T>, double)
Asynchronously predicts a specific quantile of the target variable for the given input data.
public Task<Vector<T>> PredictQuantileAsync(Matrix<T> input, double quantile)
Parameters
- input Matrix<T>: The input features matrix where each row is an example and each column is a feature.
- quantile double: The quantile to predict, a value between 0 and 1.
Returns
- Task<Vector<T>>
A task that represents the asynchronous prediction operation, containing a vector of predicted quantile values.
Remarks
This method predicts the specified quantile of the conditional distribution for each input example. The steps are:
1. Validate that the quantile is between 0 and 1.
2. Apply regularization to the input matrix.
3. Get predictions from all trees in parallel.
4. For each input example: (a) sort the predictions from all trees, then (b) select the value at the position corresponding to the specified quantile.
5. Apply regularization to the quantile predictions.
For Beginners: This method predicts a specific percentile of the possible outcomes for each example in your input data. For instance, if you specify quantile=0.5, it predicts the median (middle value); if you specify quantile=0.9, it predicts the value below which 90% of the outcomes would fall. This is useful for understanding the range of possible outcomes and the uncertainty in your predictions.
Exceptions
- ArgumentException
Thrown when the quantile is not between 0 and 1.
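The sorting-and-selection at the heart of this method can be sketched in Python, independent of AiDotNet (the function name, the made-up predictions, and the validation step are illustrative only; the regularization steps are omitted):

```python
def predict_quantile(per_tree_predictions, quantile):
    """per_tree_predictions[i] holds every tree's prediction for example i."""
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be between 0 and 1")
    results = []
    for predictions in per_tree_predictions:
        ordered = sorted(predictions)  # sort this example's tree predictions
        # Pick the position in the sorted list that matches the quantile,
        # clamped so quantile=1.0 selects the last element.
        index = min(int(quantile * len(ordered)), len(ordered) - 1)
        results.append(ordered[index])
    return results

# Two examples, five trees each (made-up values).
preds = [[2.0, 1.5, 2.5, 1.8, 2.2],
         [7.0, 6.5, 8.0, 7.5, 6.8]]
medians = predict_quantile(preds, 0.5)
upper = predict_quantile(preds, 0.9)
```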
Serialize()
Serializes the model to a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized model data.
Remarks
This method serializes the model's parameters, including options, feature importances, and all trees in the forest.
For Beginners: Serialization converts the model's internal state into a format that can be saved to disk or transmitted over a network. This allows you to save a trained model and load it later without having to retrain it. Think of it like saving your progress in a video game.
TrainAsync(Matrix<T>, Vector<T>)
Asynchronously trains the Quantile Regression Forests model on the provided data.
public override Task TrainAsync(Matrix<T> x, Vector<T> y)
Parameters
- x Matrix<T>: The input features matrix where each row is a training example and each column is a feature.
- y Vector<T>: The target values vector corresponding to each training example.
Returns
- Task
A task that represents the asynchronous training operation.
Remarks
This method builds multiple decision trees in parallel, each trained on a bootstrap sample of the training data. The steps are:
1. Clear any existing trees.
2. For each tree: (a) create a new decision tree with the specified options, (b) generate a bootstrap sample of the training data, and (c) train the tree on the bootstrap sample.
3. Calculate feature importances by averaging across all trees.
For Beginners: Training is the process where the model learns from your data. The algorithm builds multiple decision trees, each on a slightly different version of your data (created by random sampling with replacement). Each tree learns to predict the target variable based on the features. By building many trees and combining their predictions, the model can capture complex relationships and provide estimates of different quantiles (percentiles) of the target variable.
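The bootstrap-and-train loop can be sketched in Python; this is a conceptual illustration, not the AiDotNet implementation, and the stand-in "tree" (which just memorizes the mean of its sample) is invented for the example:

```python
import random

def bootstrap_sample(x_rows, y_values, rng):
    """Draw len(x_rows) (row, target) pairs with replacement."""
    n = len(x_rows)
    indices = [rng.randrange(n) for _ in range(n)]
    return [x_rows[i] for i in indices], [y_values[i] for i in indices]

def train_forest(x_rows, y_values, num_trees, train_tree, seed=0):
    """Train num_trees models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    trees = []
    for _ in range(num_trees):
        xs, ys = bootstrap_sample(x_rows, y_values, rng)
        trees.append(train_tree(xs, ys))
    return trees

# A stand-in "tree" that just memorizes the mean of its bootstrap targets;
# a real forest would fit a full decision tree here.
def toy_tree(xs, ys):
    return sum(ys) / len(ys)

forest = train_forest([[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0],
                      num_trees=5, train_tree=toy_tree)
```

Because each tree sees a slightly different resampling of the data, the trained "trees" disagree with one another, and it is exactly this disagreement that the quantile prediction step turns into an empirical distribution.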