
Class ExtremelyRandomizedTreesRegression<T>

Namespace
AiDotNet.Regression
Assembly
AiDotNet.dll

Implements an Extremely Randomized Trees regression model, which is an ensemble method that uses multiple decision trees with additional randomization for improved prediction accuracy and reduced overfitting.

public class ExtremelyRandomizedTreesRegression<T> : AsyncDecisionTreeRegressionBase<T>, IAsyncTreeBasedModel<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
AsyncDecisionTreeRegressionBase<T>
ExtremelyRandomizedTreesRegression<T>
Implements
IAsyncTreeBasedModel<T>
ITreeBasedRegression<T>
INonLinearRegression<T>
IRegression<T>
IFullModel<T, Matrix<T>, Vector<T>>
IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Matrix<T>, Vector<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>
IGradientComputable<T, Matrix<T>, Vector<T>>
IJitCompilable<T>

Remarks

Extremely Randomized Trees (also known as Extra Trees) is an ensemble method that builds multiple decision trees and averages their predictions to improve accuracy and reduce overfitting. Unlike Random Forests, which search for the optimal split threshold for each candidate feature, Extra Trees draws a random threshold for each candidate feature and then picks the best feature/threshold pair among these random candidates. This extra layer of randomization can further reduce variance.
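
To make the split rule concrete, here is a minimal sketch in plain C#. It is illustrative only and does not mirror the library's internals: for each candidate feature, one threshold is drawn at random from the feature's observed range, and the best-scoring feature/threshold pair is kept. A Random Forest would instead evaluate many candidate thresholds per feature and keep the optimal one.

// Illustrative sketch of the Extra Trees split rule (not the library's implementation).
// Requires: using System; using System.Collections.Generic; using System.Linq;
static (int Feature, double Threshold) ChooseRandomizedSplit(double[][] samples, double[] targets, Random rng)
{
    var best = (Feature: -1, Threshold: 0.0);
    var bestScore = double.MaxValue;

    for (int f = 0; f < samples[0].Length; f++)
    {
        double min = samples.Min(row => row[f]);
        double max = samples.Max(row => row[f]);
        double threshold = min + rng.NextDouble() * (max - min); // random threshold, not the optimized one

        double score = WeightedChildVariance(samples, targets, f, threshold);
        if (score < bestScore) { bestScore = score; best = (f, threshold); }
    }

    return best;
}

// Weighted variance of the targets in the two child partitions (lower is better).
static double WeightedChildVariance(double[][] samples, double[] targets, int feature, double threshold)
{
    var left = new List<double>();
    var right = new List<double>();
    for (int i = 0; i < samples.Length; i++)
        (samples[i][feature] <= threshold ? left : right).Add(targets[i]);

    double Var(List<double> v)
    {
        if (v.Count == 0) return 0.0;
        double mean = v.Average();
        return v.Sum(x => (x - mean) * (x - mean)) / v.Count;
    }

    return (left.Count * Var(left) + right.Count * Var(right)) / samples.Length;
}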

For Beginners: This model works like a committee of decision trees that vote on predictions.

While a single decision tree might make mistakes due to its specific structure, a group of different trees can work together to make more reliable predictions:

  • Each tree sees a random subset of the training data
  • Each tree uses random thresholds for making decisions
  • The final prediction is the average of all individual tree predictions

The key advantage is that by adding extra randomness in how the trees are built, the model avoids "memorizing" the training data and becomes better at generalizing to new data. This is similar to how asking many different people for their opinion often leads to better decisions than relying on just one person.
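
Putting the pieces together, a minimal usage sketch looks like the following. The option values come from the constructor example further down; how the Matrix<double> and Vector<double> inputs (features, targets, newFeatures) are built from your data depends on the library's linear algebra types and is assumed here rather than shown.

// Configure and create the ensemble
var options = new ExtremelyRandomizedTreesRegressionOptions
{
    NumberOfTrees = 100,
    MaxDepth = 10
};
var extraTrees = new ExtremelyRandomizedTreesRegression<double>(options);

// Train on a feature matrix and a target vector
await extraTrees.TrainAsync(features, targets);

// Predict for new samples; the result averages all individual tree predictions
Vector<double> predictions = await extraTrees.PredictAsync(newFeatures);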

Constructors

ExtremelyRandomizedTreesRegression(ExtremelyRandomizedTreesRegressionOptions, IRegularization<T, Matrix<T>, Vector<T>>?)

Initializes a new instance of the ExtremelyRandomizedTreesRegression<T> class.

public ExtremelyRandomizedTreesRegression(ExtremelyRandomizedTreesRegressionOptions options, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)

Parameters

options ExtremelyRandomizedTreesRegressionOptions

Configuration options for the Extremely Randomized Trees algorithm.

regularization IRegularization<T, Matrix<T>, Vector<T>>

Optional regularization strategy to prevent overfitting.

Remarks

This constructor creates a new Extremely Randomized Trees regression model with the specified options and regularization strategy. The options control parameters such as the number of trees, maximum tree depth, and feature selection. If no regularization is specified, no regularization is applied.

For Beginners: This is how you create a new Extremely Randomized Trees model.

You need to provide:

  • options: Controls how the ensemble works (like how many trees to use, how deep each tree can be)
  • regularization: Optional setting to help prevent the model from "memorizing" the training data

Example:

// Create options for an Extra Trees model with 100 trees
var options = new ExtremelyRandomizedTreesRegressionOptions {
    NumberOfTrees = 100,
    MaxDepth = 10
};

// Create the model
var extraTrees = new ExtremelyRandomizedTreesRegression<double>(options);

Properties

MaxDepth

Gets the maximum depth of the decision trees in the ensemble.

public override int MaxDepth { get; }

Property Value

int

The maximum number of levels in each tree, from the root to the deepest leaf.

Remarks

This property returns the maximum depth of the individual decision trees in the ensemble. This is one of the most important parameters for controlling the complexity of the model. Deeper trees can capture more complex patterns but are more prone to overfitting.

For Beginners: This property tells you how many levels of questions each tree in the ensemble can ask.

Just like with a single decision tree:

  • A smaller MaxDepth (e.g., 3-5): Creates simpler trees that might miss some patterns but are less likely to memorize the training data
  • A larger MaxDepth (e.g., 10-20): Creates more complex trees that can capture detailed patterns but might learn noise in the training data

One advantage of ensemble methods is that you can often use slightly deeper trees than you would with a single decision tree, because the averaging of multiple trees helps prevent overfitting.
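
As a small sketch (assuming the property reports the depth configured through the options):

// MaxDepth is configured through the options and exposed read-only on the model
var shallowOptions = new ExtremelyRandomizedTreesRegressionOptions
{
    NumberOfTrees = 100,
    MaxDepth = 5
};
var shallowModel = new ExtremelyRandomizedTreesRegression<double>(shallowOptions);

Console.WriteLine(shallowModel.MaxDepth); // expected: 5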

NumberOfTrees

Gets the number of trees in the ensemble model.

public override int NumberOfTrees { get; }

Property Value

int

The number of decision trees used in the ensemble.

Remarks

This property returns the number of individual decision trees that make up the Extremely Randomized Trees ensemble. A larger number of trees typically improves prediction accuracy but increases training and prediction time.

For Beginners: This tells you how many individual decision trees work together in this model.

Think of it as the size of your "committee of experts":

  • A small number (10-50): Faster to train but might be less accurate
  • A medium number (50-200): Good balance of accuracy and speed
  • A large number (200+): More accurate but slower to train and use

Unlike a single decision tree model, which has just one tree, ensemble methods like Extremely Randomized Trees use multiple trees working together to make better predictions. The final prediction is the average of what all these trees predict.
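
Conceptually, that averaging is just a mean over the per-tree outputs, as in this sketch with illustrative numbers:

// Illustrative only: the ensemble prediction is the mean of the individual tree predictions
double[] perTreePredictions = { 250_000, 275_000, 260_000 };
double ensemblePrediction = perTreePredictions.Average(); // requires using System.Linq
Console.WriteLine(ensemblePrediction); // ≈ 261,666.67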

SupportsJitCompilation

Gets whether this Extremely Randomized Trees model supports JIT compilation.

public override bool SupportsJitCompilation { get; }

Property Value

bool

true when soft tree mode is enabled and trees have been trained; false otherwise.

Remarks

Extremely Randomized Trees supports JIT compilation when soft tree mode is enabled. In soft mode, each tree in the ensemble uses sigmoid-based soft gating instead of hard if-then splits, making the entire ensemble differentiable.

For Beginners: JIT compilation is available when soft tree mode is enabled.

In soft tree mode:

  • Each tree in the Extra Trees ensemble uses smooth transitions
  • All trees can be exported as a single computation graph
  • The final prediction averages all tree outputs

This gives you the benefits of extra randomization with JIT-compiled speed.
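
A quick check before attempting to export or compile (how soft tree mode is switched on depends on the options class and is not shown here):

// False until soft tree mode is enabled and the model has been trained
Console.WriteLine($"JIT compilation available: {extraTrees.SupportsJitCompilation}");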

Methods

CalculateFeatureImportancesAsync(int)

Calculates the average feature importances across all trees in the ensemble.

protected override Task CalculateFeatureImportancesAsync(int numFeatures)

Parameters

numFeatures int

The number of features in the model.

Returns

Task

A task representing the asynchronous calculation operation.

CreateNewInstance()

Creates a new instance of the extremely randomized trees regression model with the same configuration.

protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()

Returns

IFullModel<T, Matrix<T>, Vector<T>>

A new instance of ExtremelyRandomizedTreesRegression<T> with the same configuration as the current instance.

Remarks

This method creates a new extremely randomized trees regression model that has the same configuration as the current instance. It's used for model persistence, cloning, and transferring the model's configuration to new instances.

For Beginners: This method makes a fresh copy of the current model with the same settings.

It's like making a blueprint copy of your model that can be used to:

  • Save your model's settings
  • Create a new identical model
  • Transfer your model's configuration to another system

This is useful when you want to:

  • Create multiple similar models
  • Save a model's configuration for later use
  • Reset a model while keeping its settings

Deserialize(byte[])

Loads a previously serialized Extremely Randomized Trees model from a byte array.

public override void Deserialize(byte[] modelData)

Parameters

modelData byte[]

The byte array containing the serialized model.

Remarks

This method reconstructs an Extremely Randomized Trees model from a byte array that was previously created using the Serialize method. It restores the model's configuration options, feature importances, and all individual decision trees in the ensemble, allowing the model to be used for predictions without retraining.

For Beginners: This method loads a previously saved model from a sequence of bytes.

Deserialization allows you to:

  • Load a model that was saved earlier
  • Use a model without having to retrain it
  • Share models between different applications

When you deserialize an Extremely Randomized Trees model:

  • All settings are restored
  • Feature importances are recovered
  • All individual trees in the ensemble are reconstructed
  • The model is ready to make predictions immediately

Example:

// Load from a file
byte[] modelData = File.ReadAllBytes("extraTrees.model");

// Create an instance and deserialize into it
// (the configuration in 'options' is replaced by the deserialized settings)
var extraTrees = new ExtremelyRandomizedTreesRegression<double>(options);
extraTrees.Deserialize(modelData);

// Now you can use the model for predictions
var predictions = await extraTrees.PredictAsync(newFeatures);

ExportComputationGraph(List<ComputationNode<T>>)

Exports the Extremely Randomized Trees model's computation graph for JIT compilation.

public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)

Parameters

inputNodes List<ComputationNode<T>>

List to populate with input computation nodes.

Returns

ComputationNode<T>

The root node of the exported computation graph.

Remarks

When soft tree mode is enabled, this exports the entire Extra Trees ensemble as a differentiable computation graph. Each tree is exported individually, and their outputs are averaged to produce the final prediction.

For Beginners: This exports the Extra Trees ensemble as a computation graph.

Extra Trees, like Random Forest, averages predictions from all trees. The main difference is how trees are built (random thresholds instead of optimal), but for JIT compilation the averaging formula is the same.

Exceptions

NotSupportedException

Thrown when soft tree mode is not enabled.

InvalidOperationException

Thrown when the forest has not been trained (no trees).
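
A short sketch of calling the export and handling the documented exceptions:

var inputNodes = new List<ComputationNode<double>>();
try
{
    // The returned root node averages the soft outputs of every tree in the ensemble
    ComputationNode<double> root = extraTrees.ExportComputationGraph(inputNodes);
}
catch (NotSupportedException)
{
    // Soft tree mode is not enabled for this model
}
catch (InvalidOperationException)
{
    // The forest has not been trained yet, so there are no trees to export
}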

GetModelMetadata()

Gets metadata about the Extremely Randomized Trees model and its configuration.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

A ModelMetadata object containing information about the model.

Remarks

This method returns metadata about the model, including its type, number of trees, maximum tree depth, and feature importances if available. This information can be useful for model management, comparison, and documentation purposes.

For Beginners: This method provides information about your Extremely Randomized Trees model.

The metadata includes:

  • The type of model (Extremely Randomized Trees)
  • How many trees are in the ensemble
  • Maximum depth of each tree
  • How important each feature is for making predictions (if available)

This information is helpful when:

  • Comparing different models
  • Documenting your model's configuration
  • Troubleshooting model performance
  • Understanding which features have the biggest impact on predictions

Example:

var metadata = extraTrees.GetModelMetadata();
Console.WriteLine($"Model type: {metadata.ModelType}");
Console.WriteLine($"Number of trees: {metadata.AdditionalInfo["NumberOfTrees"]}");

PredictAsync(Matrix<T>)

Asynchronously predicts target values for the provided input features using the trained ensemble model.

public override Task<Vector<T>> PredictAsync(Matrix<T> input)

Parameters

input Matrix<T>

A matrix where each row represents a sample to predict and each column represents a feature.

Returns

Task<Vector<T>>

A task that returns a vector of predicted values corresponding to each input sample.

Remarks

This method predicts target values for new input data by averaging the predictions from all decision trees in the ensemble. Each tree's prediction is computed in parallel, and the results are then averaged to form the final prediction. Any specified regularization is applied to both the input data and the predictions.

For Beginners: This method uses your trained model to make predictions on new data.

The prediction process:

  1. Each individual tree in the ensemble makes its own prediction
  2. These predictions happen in parallel to save time
  3. The final prediction is the average of all the individual tree predictions

For example, if you're predicting house prices and have 100 trees:

  • Tree 1 predicts: $250,000
  • Tree 2 predicts: $275,000
  • ...
  • Tree 100 predicts: $260,000
  • Final prediction: Average of all 100 predictions

The "Async" in the name means this method returns a Task, allowing your program to do other things while waiting for predictions to complete.

Example:

// Make predictions
var predictions = await extraTrees.PredictAsync(newFeatures);

Serialize()

Serializes the Extremely Randomized Trees model to a byte array for storage or transmission.

public override byte[] Serialize()

Returns

byte[]

A byte array containing the serialized model.

Remarks

This method converts the Extremely Randomized Trees model into a byte array that can be stored in a file, database, or transmitted over a network. The serialized data includes the model's configuration options, feature importances, and all individual decision trees in the ensemble.

For Beginners: This method saves your trained model as a sequence of bytes.

Serialization allows you to:

  • Save your model to a file
  • Store your model in a database
  • Send your model over a network
  • Keep your model for later use without having to retrain it

The serialized data includes:

  • All the model's settings (like number of trees and maximum depth)
  • The importance of each feature
  • Every individual decision tree in the ensemble

Because Extremely Randomized Trees models contain multiple trees, the serialized data can be quite large compared to a single decision tree model.

Example:

// Serialize the model
byte[] modelData = extraTrees.Serialize();

// Save to a file
File.WriteAllBytes("extraTrees.model", modelData);

TrainAsync(Matrix<T>, Vector<T>)

Asynchronously trains the Extremely Randomized Trees model using the provided input features and target values.

public override Task TrainAsync(Matrix<T> x, Vector<T> y)

Parameters

x Matrix<T>

A matrix where each row represents a sample and each column represents a feature.

y Vector<T>

A vector of target values corresponding to each sample in x.

Returns

Task

A task representing the asynchronous training operation.

Remarks

This method builds the Extremely Randomized Trees ensemble by training multiple decision trees in parallel. Each tree is trained on a randomly sampled subset of the training data (bootstrap sampling). The trees are built with additional randomization in feature selection and threshold determination. The level of parallelism can be controlled through the options.

For Beginners: This method teaches the model how to make predictions using your data.

During training:

  1. The model creates multiple decision trees (as specified in NumberOfTrees)
  2. Each tree is given a random sample of your training data (some examples may be repeated, others left out)
  3. Each tree learns independently but with extra randomness in how it makes decisions
  4. The trees are trained in parallel to save time (using multiple CPU cores)

The "Async" in the name means this method can run without blocking other operations in your program, which is especially helpful when training large models that take significant time.

Example:

// Train the model
await extraTrees.TrainAsync(features, targets);