Class ExtremelyRandomizedTreesRegression<T>
- Namespace
- AiDotNet.Regression
- Assembly
- AiDotNet.dll
Implements an Extremely Randomized Trees regression model, which is an ensemble method that uses multiple decision trees with additional randomization for improved prediction accuracy and reduced overfitting.
public class ExtremelyRandomizedTreesRegression<T> : AsyncDecisionTreeRegressionBase<T>, IAsyncTreeBasedModel<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- AsyncDecisionTreeRegressionBase<T> → ExtremelyRandomizedTreesRegression<T>
- Implements
- IRegression<T>
Remarks
Extremely Randomized Trees (also known as Extra Trees) is an ensemble method that builds multiple decision trees and averages their predictions to improve accuracy and reduce overfitting. Unlike Random Forests, which search for the optimal split threshold for each candidate feature, Extra Trees draws a random threshold for each candidate feature and picks the best split among those random candidates, adding a further layer of randomization that can reduce variance even more.
For Beginners: This model works like a committee of decision trees that vote on predictions.
While a single decision tree might make mistakes due to its specific structure, a group of different trees can work together to make more reliable predictions:
- Each tree sees a random subset of the training data
- Each tree uses random thresholds for making decisions
- The final prediction is the average of all individual tree predictions
The key advantage is that by adding extra randomness in how the trees are built, the model avoids "memorizing" the training data and becomes better at generalizing to new data. This is similar to how asking many different people for their opinion often leads to better decisions than relying on just one person.
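As a rough end-to-end sketch (assuming features and newFeatures are prepared Matrix<double> instances and targets is a Vector<double>; the option names match the constructor example below):
// Configure the ensemble: 100 trees, each at most 10 levels deep
var options = new ExtremelyRandomizedTreesRegressionOptions {
    NumberOfTrees = 100,
    MaxDepth = 10
};
var extraTrees = new ExtremelyRandomizedTreesRegression<double>(options);
// Train the ensemble (trees are built in parallel)
await extraTrees.TrainAsync(features, targets);
// Each new sample's prediction is the average over all trees
var predictions = await extraTrees.PredictAsync(newFeatures);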
Constructors
ExtremelyRandomizedTreesRegression(ExtremelyRandomizedTreesRegressionOptions, IRegularization<T, Matrix<T>, Vector<T>>?)
Initializes a new instance of the ExtremelyRandomizedTreesRegression<T> class.
public ExtremelyRandomizedTreesRegression(ExtremelyRandomizedTreesRegressionOptions options, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)
Parameters
options (ExtremelyRandomizedTreesRegressionOptions): Configuration options for the Extremely Randomized Trees algorithm.
regularization (IRegularization<T, Matrix<T>, Vector<T>>): Optional regularization strategy to prevent overfitting.
Remarks
This constructor creates a new Extremely Randomized Trees regression model with the specified options and regularization strategy. The options control parameters such as the number of trees, maximum tree depth, and feature selection. If no regularization is specified, no regularization is applied.
For Beginners: This is how you create a new Extremely Randomized Trees model.
You need to provide:
- options: Controls how the ensemble works (like how many trees to use, how deep each tree can be)
- regularization: Optional setting to help prevent the model from "memorizing" the training data
Example:
// Create options for an Extra Trees model with 100 trees
var options = new ExtremelyRandomizedTreesRegressionOptions {
NumberOfTrees = 100,
MaxDepth = 10
};
// Create the model
var extraTrees = new ExtremelyRandomizedTreesRegression<double>(options);
Properties
MaxDepth
Gets the maximum depth of the decision trees in the ensemble.
public override int MaxDepth { get; }
Property Value
- int
The maximum number of levels in each tree, from the root to the deepest leaf.
Remarks
This property returns the maximum depth of the individual decision trees in the ensemble. This is one of the most important parameters for controlling the complexity of the model. Deeper trees can capture more complex patterns but are more prone to overfitting.
For Beginners: This property tells you how many levels of questions each tree in the ensemble can ask.
Just like with a single decision tree:
- A smaller MaxDepth (e.g., 3-5): Creates simpler trees that might miss some patterns but are less likely to memorize the training data
- A larger MaxDepth (e.g., 10-20): Creates more complex trees that can capture detailed patterns but might learn noise in the training data
One advantage of ensemble methods is that you can often use slightly deeper trees than you would with a single decision tree, because the averaging of multiple trees helps prevent overfitting.
NumberOfTrees
Gets the number of trees in the ensemble model.
public override int NumberOfTrees { get; }
Property Value
- int
The number of decision trees used in the ensemble.
Remarks
This property returns the number of individual decision trees that make up the Extremely Randomized Trees ensemble. A larger number of trees typically improves prediction accuracy but increases training and prediction time.
For Beginners: This tells you how many individual decision trees work together in this model.
Think of it as the size of your "committee of experts":
- A small number (10-50): Faster to train but might be less accurate
- A medium number (50-200): Good balance of accuracy and speed
- A large number (200+): More accurate but slower to train and use
Unlike a single decision tree model, which has just one tree, ensemble methods like Extremely Randomized Trees use multiple trees working together to make better predictions. The final prediction is the average of what all these trees predict.
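A short sketch (assuming a model configured and trained as in the examples above) showing how these two ensemble properties can be inspected:
// Inspect the ensemble's size and depth after training
Console.WriteLine($"Trees in ensemble: {extraTrees.NumberOfTrees}");
Console.WriteLine($"Maximum tree depth: {extraTrees.MaxDepth}");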
SupportsJitCompilation
Gets whether this Extremely Randomized Trees model supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
true when soft tree mode is enabled and trees have been trained; false otherwise.
Remarks
Extremely Randomized Trees supports JIT compilation when soft tree mode is enabled. In soft mode, each tree in the ensemble uses sigmoid-based soft gating instead of hard if-then splits, making the entire ensemble differentiable.
For Beginners: JIT compilation is available when soft tree mode is enabled.
In soft tree mode:
- Each tree in the Extra Trees ensemble uses smooth transitions
- All trees can be exported as a single computation graph
- The final prediction averages all tree outputs
This gives you the benefits of extra randomization with JIT-compiled speed.
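A minimal sketch of how this property might be checked before attempting JIT export (how soft tree mode is enabled depends on your configured options and is not shown here; see ExportComputationGraph below for the export call itself):
// Guard the export: only valid when soft tree mode is enabled and the model is trained
if (extraTrees.SupportsJitCompilation)
{
    // ... export the computation graph here (see ExportComputationGraph) ...
}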
Methods
CalculateFeatureImportancesAsync(int)
Calculates the average feature importances across all trees in the ensemble.
protected override Task CalculateFeatureImportancesAsync(int numFeatures)
Parameters
numFeatures (int): The number of features in the model.
Returns
- Task
A task representing the asynchronous calculation operation.
CreateNewInstance()
Creates a new instance of the extremely randomized trees regression model with the same configuration.
protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()
Returns
- IFullModel<T, Matrix<T>, Vector<T>>
A new instance of ExtremelyRandomizedTreesRegression<T> with the same configuration as the current instance.
Remarks
This method creates a new extremely randomized trees regression model that has the same configuration as the current instance. It's used for model persistence, cloning, and transferring the model's configuration to new instances.
For Beginners: This method makes a fresh copy of the current model with the same settings.
It's like making a blueprint copy of your model that can be used to:
- Save your model's settings
- Create a new identical model
- Transfer your model's configuration to another system
This is useful when you want to:
- Create multiple similar models
- Save a model's configuration for later use
- Reset a model while keeping its settings
Deserialize(byte[])
Loads a previously serialized Extremely Randomized Trees model from a byte array.
public override void Deserialize(byte[] modelData)
Parameters
modelData (byte[]): The byte array containing the serialized model.
Remarks
This method reconstructs an Extremely Randomized Trees model from a byte array that was previously created using the Serialize method. It restores the model's configuration options, feature importances, and all individual decision trees in the ensemble, allowing the model to be used for predictions without retraining.
For Beginners: This method loads a previously saved model from a sequence of bytes.
Deserialization allows you to:
- Load a model that was saved earlier
- Use a model without having to retrain it
- Share models between different applications
When you deserialize an Extremely Randomized Trees model:
- All settings are restored
- Feature importances are recovered
- All individual trees in the ensemble are reconstructed
- The model is ready to make predictions immediately
Example:
// Load from a file
byte[] modelData = File.ReadAllBytes("extraTrees.model");
// Create a model with the same options used at training time, then restore its saved state
var extraTrees = new ExtremelyRandomizedTreesRegression<double>(options);
extraTrees.Deserialize(modelData);
// Now you can use the model for predictions
var predictions = await extraTrees.PredictAsync(newFeatures);
ExportComputationGraph(List<ComputationNode<T>>)
Exports the Extremely Randomized Trees model's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>): List to populate with input computation nodes.
Returns
- ComputationNode<T>
The root node of the exported computation graph.
Remarks
When soft tree mode is enabled, this exports the entire Extra Trees ensemble as a differentiable computation graph. Each tree is exported individually, and their outputs are averaged to produce the final prediction.
For Beginners: This exports the Extra Trees ensemble as a computation graph.
Extra Trees, like Random Forest, averages predictions from all trees. The main difference is how trees are built (random thresholds instead of optimal), but for JIT compilation the averaging formula is the same.
Exceptions
- NotSupportedException
Thrown when soft tree mode is not enabled.
- InvalidOperationException
Thrown when the forest has not been trained (no trees).
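A hedged usage sketch (assuming a trained model with soft tree mode enabled and T = double):
var inputNodes = new List<ComputationNode<double>>();
try
{
    // The returned root node represents the averaged output of all trees;
    // inputNodes is populated with the graph's input nodes.
    var root = extraTrees.ExportComputationGraph(inputNodes);
}
catch (NotSupportedException)
{
    // Soft tree mode is not enabled for this model
}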
GetModelMetadata()
Gets metadata about the Extremely Randomized Trees model and its configuration.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata object containing information about the model.
Remarks
This method returns metadata about the model, including its type, number of trees, maximum tree depth, and feature importances if available. This information can be useful for model management, comparison, and documentation purposes.
For Beginners: This method provides information about your Extremely Randomized Trees model.
The metadata includes:
- The type of model (Extremely Randomized Trees)
- How many trees are in the ensemble
- Maximum depth of each tree
- How important each feature is for making predictions (if available)
This information is helpful when:
- Comparing different models
- Documenting your model's configuration
- Troubleshooting model performance
- Understanding which features have the biggest impact on predictions
Example:
var metadata = extraTrees.GetModelMetadata();
Console.WriteLine($"Model type: {metadata.ModelType}");
Console.WriteLine($"Number of trees: {metadata.AdditionalInfo["NumberOfTrees"]}");
PredictAsync(Matrix<T>)
Asynchronously predicts target values for the provided input features using the trained ensemble model.
public override Task<Vector<T>> PredictAsync(Matrix<T> input)
Parameters
input (Matrix<T>): A matrix where each row represents a sample to predict and each column represents a feature.
Returns
- Task<Vector<T>>
A task that returns a vector of predicted values corresponding to each input sample.
Remarks
This method predicts target values for new input data by averaging the predictions from all decision trees in the ensemble. Each tree's prediction is computed in parallel, and the results are then averaged to form the final prediction. Any specified regularization is applied to both the input data and the predictions.
For Beginners: This method uses your trained model to make predictions on new data.
The prediction process:
- Each individual tree in the ensemble makes its own prediction
- These predictions happen in parallel to save time
- The final prediction is the average of all the individual tree predictions
For example, if you're predicting house prices and have 100 trees:
- Tree 1 predicts: $250,000
- Tree 2 predicts: $275,000
- ...
- Tree 100 predicts: $260,000
- Final prediction: Average of all 100 predictions
The "Async" in the name means this method returns a Task, allowing your program to do other things while waiting for predictions to complete.
Example:
// Make predictions
var predictions = await extraTrees.PredictAsync(newFeatures);
Serialize()
Serializes the Extremely Randomized Trees model to a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized model.
Remarks
This method converts the Extremely Randomized Trees model into a byte array that can be stored in a file, database, or transmitted over a network. The serialized data includes the model's configuration options, feature importances, and all individual decision trees in the ensemble.
For Beginners: This method saves your trained model as a sequence of bytes.
Serialization allows you to:
- Save your model to a file
- Store your model in a database
- Send your model over a network
- Keep your model for later use without having to retrain it
The serialized data includes:
- All the model's settings (like number of trees and maximum depth)
- The importance of each feature
- Every individual decision tree in the ensemble
Because Extremely Randomized Trees models contain multiple trees, the serialized data can be quite large compared to a single decision tree model.
Example:
// Serialize the model
byte[] modelData = extraTrees.Serialize();
// Save to a file
File.WriteAllBytes("extraTrees.model", modelData);
TrainAsync(Matrix<T>, Vector<T>)
Asynchronously trains the Extremely Randomized Trees model using the provided input features and target values.
public override Task TrainAsync(Matrix<T> x, Vector<T> y)
Parameters
x (Matrix<T>): A matrix where each row represents a sample and each column represents a feature.
y (Vector<T>): A vector of target values corresponding to each sample in x.
Returns
- Task
A task representing the asynchronous training operation.
Remarks
This method builds the Extremely Randomized Trees ensemble by training multiple decision trees in parallel. Each tree is trained on a randomly sampled subset of the training data (bootstrap sampling). The trees are built with additional randomization in feature selection and threshold determination. The level of parallelism can be controlled through the options.
For Beginners: This method teaches the model how to make predictions using your data.
During training:
- The model creates multiple decision trees (as specified in NumberOfTrees)
- Each tree is given a random sample of your training data (some examples may be repeated, others left out)
- Each tree learns independently but with extra randomness in how it makes decisions
- The trees are trained in parallel to save time (using multiple CPU cores)
The "Async" in the name means this method can run without blocking other operations in your program, which is especially helpful when training large models that take significant time.
Example:
// Train the model
await extraTrees.TrainAsync(features, targets);