Class AdaBoostR2Regression<T>
- Namespace: AiDotNet.Regression
- Assembly: AiDotNet.dll
Implements the AdaBoost.R2 algorithm for regression problems, an ensemble learning method that combines multiple decision tree regressors to improve prediction accuracy.
public class AdaBoostR2Regression<T> : AsyncDecisionTreeRegressionBase<T>, IAsyncTreeBasedModel<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Type Parameters
T
The numeric type used for calculations, typically float or double.
- Inheritance
- AsyncDecisionTreeRegressionBase<T> → AdaBoostR2Regression<T>
- Implements
- IAsyncTreeBasedModel<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Remarks
AdaBoost.R2 (Adaptive Boosting for Regression) is an extension of the AdaBoost algorithm for regression tasks. It works by training a sequence of weak regressors (decision trees) on repeatedly modified versions of the data. The predictions from all regressors are then combined through a weighted average to produce the final prediction.
In AdaBoost.R2, each training sample is assigned a weight that determines its importance during training. Initially, all weights are equal. At each iteration, the weights of poorly predicted samples are increased so that subsequent weak regressors focus more on difficult cases. The algorithm stops when the specified number of estimators is reached or when the weighted error reaches or exceeds 0.5.
For Beginners: AdaBoost.R2 is a powerful machine learning technique for predicting numeric values (like prices, temperatures, or ages) rather than categories.
Think of AdaBoost.R2 as a team of experts (decision trees) working together to make predictions:
- The first "expert" makes predictions on all the training data
- The algorithm identifies which samples were predicted poorly
- The next expert pays special attention to those difficult samples
- This process repeats, creating a team of experts that each specialize in different aspects of the problem
- When making predictions, all experts "vote" on the final answer, but experts who performed better get more voting power
This approach is particularly effective because:
- It can turn a collection of "weak" learners (simple decision trees) into a "strong" learner
- It automatically focuses on the hardest parts of the problem
- It's less prone to overfitting than a single, complex model
AdaBoost.R2 is ideal for problems where you need high prediction accuracy and have enough training data to build multiple models.
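Example
A minimal end-to-end sketch. The constructor, TrainAsync, and PredictAsync signatures come from this page; building Matrix<double> and Vector<double> directly from arrays is an assumption and may differ in the actual library.
using AiDotNet.Regression;
// Hypothetical training data: four samples with two features each
// (array-based Matrix/Vector constructors are assumed).
var x = new Matrix<double>(new double[,]
{
    { 1.0, 2.0 },
    { 2.0, 3.0 },
    { 3.0, 4.0 },
    { 4.0, 5.0 }
});
var y = new Vector<double>(new[] { 3.0, 5.0, 7.0, 9.0 });
// Default options are typically a good starting point.
var model = new AdaBoostR2Regression<double>(new AdaBoostR2RegressionOptions());
await model.TrainAsync(x, y);
Vector<double> predictions = await model.PredictAsync(x);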
Constructors
AdaBoostR2Regression(AdaBoostR2RegressionOptions, IRegularization<T, Matrix<T>, Vector<T>>?)
Initializes a new instance of the AdaBoostR2Regression<T> class with specified options and regularization.
public AdaBoostR2Regression(AdaBoostR2RegressionOptions options, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)
Parameters
options (AdaBoostR2RegressionOptions)
The options for configuring the AdaBoost.R2 algorithm.
regularization (IRegularization<T, Matrix<T>, Vector<T>>?)
Optional regularization to prevent overfitting.
Remarks
The constructor initializes the AdaBoost.R2 regression model with the specified configuration options and regularization. The options control parameters such as the number of estimators (trees) to use, the maximum depth of each tree, and the minimum number of samples required to split a node.
For Beginners: This creates a new AdaBoost.R2 regression model with specific settings.
The options parameter controls important settings like:
- How many decision trees to create (NumberOfEstimators)
- How complex each tree can be (MaxDepth)
- How much data is needed to make decisions in the trees (MinSamplesSplit)
The regularization parameter helps prevent "overfitting" - a situation where the model works well on training data but poorly on new data because it's too closely tailored to the specific examples it was trained on.
If you're not sure what values to use, the default options typically provide a good starting point for many regression problems.
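Example
A configuration sketch, assuming the options class exposes the NumberOfEstimators, MaxDepth, and MinSamplesSplit properties named above; check the AdaBoostR2RegressionOptions documentation for the exact names.
// Assumed property names, matching the settings described above.
var options = new AdaBoostR2RegressionOptions
{
    NumberOfEstimators = 100, // build 100 trees
    MaxDepth = 3,             // keep each tree shallow (a "weak" learner)
    MinSamplesSplit = 5       // require at least 5 samples to split a node
};
var model = new AdaBoostR2Regression<double>(options);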
Properties
MaxDepth
Gets the maximum depth of each decision tree in the ensemble.
public override int MaxDepth { get; }
Property Value
- int
NumberOfTrees
Gets the number of decision trees in the ensemble.
public override int NumberOfTrees { get; }
Property Value
- int
SupportsJitCompilation
Gets whether this AdaBoost.R2 model supports JIT compilation.
public override bool SupportsJitCompilation { get; }
Property Value
- bool
true when soft tree mode is enabled and the ensemble has been trained; false otherwise.
Remarks
AdaBoost.R2 supports JIT compilation when soft tree mode is enabled. In soft mode, each tree in the ensemble uses sigmoid-based soft gating instead of hard if-then splits, making the weighted ensemble differentiable.
The computation graph follows the weighted averaging formula:
prediction = Σ(weight_i × tree_i(input)) / Σ(weight_i)
For Beginners: JIT compilation is available when soft tree mode is enabled.
In soft tree mode:
- Each tree in the AdaBoost ensemble uses smooth transitions
- Tree weights (based on training error) are embedded in the computation graph
- The weighted average is computed just like regular AdaBoost
This gives you adaptive boosting benefits with JIT-compiled speed.
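Example
The weighted-averaging formula above, written out as a small standalone sketch (illustrative only, not the library's internal code):
// prediction = Σ(weight_i × tree_i(input)) / Σ(weight_i)
static double WeightedEnsemblePrediction(double[] treePredictions, double[] treeWeights)
{
    double weightedSum = 0.0, weightTotal = 0.0;
    for (int i = 0; i < treePredictions.Length; i++)
    {
        weightedSum += treeWeights[i] * treePredictions[i];
        weightTotal += treeWeights[i];
    }
    return weightedSum / weightTotal;
}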
Methods
CalculateFeatureImportancesAsync(int)
Calculates the feature importances across all trees in the ensemble asynchronously.
protected override Task CalculateFeatureImportancesAsync(int numFeatures)
Parameters
numFeatures (int)
The number of features in the input data.
Returns
- Task
A task representing the asynchronous operation.
Remarks
This method calculates the importance of each feature in making predictions, based on the trained ensemble of decision trees. The feature importance for each tree is weighted by the tree's weight in the ensemble, and then the weighted importances are summed across all trees. This provides insight into which features are most influential in the model's predictions.
For Beginners: This method calculates how important each input feature is for making accurate predictions.
Feature importance tells you which input variables have the most influence on the model's predictions. For example, if you're predicting house prices:
- High feature importance for "square footage" would indicate that size strongly affects price
- Low feature importance for "house color" would suggest color doesn't matter much for price
In AdaBoost.R2, the feature importance calculation:
- Gets the importance of each feature from each decision tree
- Weights these importances by how much influence each tree has in the ensemble
- Combines them to get an overall importance score for each feature
This information is valuable for:
- Understanding which factors drive your predictions
- Simplifying your model by potentially removing unimportant features
- Gaining insights into the underlying patterns in your data
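Example
An illustrative sketch of the weighted aggregation described above (not the library's internal code):
// perTreeImportances[t][f] = importance of feature f in tree t;
// treeWeights[t] = that tree's weight in the ensemble.
static double[] AggregateImportances(double[][] perTreeImportances, double[] treeWeights)
{
    int numFeatures = perTreeImportances[0].Length;
    var totals = new double[numFeatures];
    for (int t = 0; t < perTreeImportances.Length; t++)
        for (int f = 0; f < numFeatures; f++)
            totals[f] += treeWeights[t] * perTreeImportances[t][f];
    return totals;
}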
CreateNewInstance()
Creates a new instance of the AdaBoostR2Regression with the same configuration as the current instance.
protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()
Returns
- IFullModel<T, Matrix<T>, Vector<T>>
A new AdaBoostR2Regression instance with the same options and regularization as the current instance.
Remarks
This method creates a new instance of the AdaBoostR2Regression model with the same configuration options and regularization settings as the current instance. This is useful for model cloning, ensemble methods, or cross-validation scenarios where multiple instances of the same model with identical configurations are needed.
For Beginners: This method creates a fresh copy of the model's blueprint.
When you need multiple versions of the same type of model with identical settings:
- This method creates a new, empty model with the same configuration
- It's like making a copy of a recipe before you start cooking
- The new model has the same settings but no trained data
- This is useful for techniques that need multiple models, like cross-validation
For example, when testing your model on different subsets of data, you'd want each test to use a model with identical settings.
Deserialize(byte[])
Deserializes the model from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[])
A byte array containing the serialized model.
Remarks
This method deserializes an AdaBoost.R2 regression model from a byte array, restoring the configuration options, the ensemble of trees with their weights, and initializing the random number generator. The deserialization is performed using JSON, with the decision trees deserialized from Base64 strings.
For Beginners: This method restores a previously saved model from its serialized format.
Deserializing allows you to:
- Load a previously trained model without having to retrain it
- Use models trained by others
- Deploy pre-trained models to new environments
The process reconstructs:
- All configuration settings
- The entire ensemble of decision trees and their weights
- The appropriate random number generator state
After deserialization, the model is ready to use for making predictions, just as if you had just finished training it.
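Example
A loading sketch. The file path is hypothetical, and constructing the model with default options before calling Deserialize is an assumption (the restored settings replace them).
using System.IO;
// Restore a model previously saved with Serialize().
byte[] data = File.ReadAllBytes("adaboost-r2.model");
var model = new AdaBoostR2Regression<double>(new AdaBoostR2RegressionOptions());
model.Deserialize(data);
// The model is now ready for PredictAsync without retraining.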
ExportComputationGraph(List<ComputationNode<T>>)
Exports the AdaBoost.R2 model's computation graph for JIT compilation.
public override ComputationNode<T> ExportComputationGraph(List<ComputationNode<T>> inputNodes)
Parameters
inputNodes (List<ComputationNode<T>>)
The list to populate with input computation nodes.
Returns
- ComputationNode<T>
The root node of the exported computation graph.
Remarks
When soft tree mode is enabled, this exports the entire AdaBoost.R2 ensemble as a differentiable computation graph. The graph implements weighted averaging:
output = Σ(weight_i × tree_i(input)) / Σ(weight_i)
where each tree uses soft split operations.
For Beginners: This exports the AdaBoost ensemble as a computation graph.
AdaBoost uses weighted trees where:
- Each tree has a weight based on how well it performed during training
- Better-performing trees get higher weights
- The final prediction is a weighted average of all tree predictions
The exported graph includes these weights for optimized inference.
Exceptions
- NotSupportedException
Thrown when soft tree mode is not enabled.
- InvalidOperationException
Thrown when the ensemble has not been trained.
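Example
A guarded usage sketch, assuming a trained model with soft tree mode enabled; checking SupportsJitCompilation first avoids the exceptions listed above.
if (model.SupportsJitCompilation)
{
    var inputNodes = new List<ComputationNode<double>>();
    ComputationNode<double> root = model.ExportComputationGraph(inputNodes);
    // 'root' and 'inputNodes' can now be handed to the JIT compiler.
}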
GetModelMetadata()
Gets metadata about the trained model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing information about the model.
Remarks
This method returns metadata about the trained AdaBoost.R2 regression model, including the model type, configuration options, feature importances, and regularization type. This information can be useful for model management, comparison, and documentation.
For Beginners: This method provides information about the trained model, which can be useful for documentation or comparison with other models.
The metadata includes:
- The type of model (AdaBoost.R2)
- Configuration settings like the number of trees and their maximum depth
- Feature importance scores
- The type of regularization used (if any)
This information helps you keep track of different models you've trained and understand their characteristics without having to retrain or examine the internal structure.
PredictAsync(Matrix<T>)
Makes predictions on new data using the trained ensemble of decision trees asynchronously.
public override Task<Vector<T>> PredictAsync(Matrix<T> input)
Parameters
input (Matrix<T>)
The input features matrix where each row is a sample to predict.
Returns
- Task<Vector<T>>
A task representing the asynchronous operation, containing the predicted values.
Remarks
This method makes predictions for new data points using the trained AdaBoost.R2 ensemble. The prediction process consists of the following steps:
1. Regularize the input data (if regularization is enabled).
2. For each decision tree in the ensemble:
   a. Generate predictions for all input samples.
   b. Multiply the predictions by the tree's weight.
3. Compute the weighted average of all tree predictions for each sample.
4. Apply regularization to the final predictions (if regularization is enabled).
The predictions are processed in parallel to improve performance on multi-core systems.
For Beginners: This method uses the trained model to make predictions on new data.
Here's how the prediction works:
- Each decision tree in the ensemble makes its own prediction for each input sample
- These predictions are weighted by how well each tree performed during training (better trees have more influence on the final result)
- The weighted predictions are averaged to produce the final prediction for each sample
The method uses parallel processing to make predictions faster on computers with multiple processing cores. This means that multiple trees can make their predictions simultaneously, speeding up the overall prediction process.
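Example
A short prediction sketch, assuming a trained model and the same assumed array-based Matrix<double> constructor used in the training example above.
// Two new samples with two features each.
var newSamples = new Matrix<double>(new double[,]
{
    { 5.0, 6.0 },
    { 6.0, 7.0 }
});
Vector<double> predicted = await model.PredictAsync(newSamples);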
Serialize()
Serializes the model to a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized model.
Remarks
This method serializes the AdaBoost.R2 regression model to a byte array, including the configuration options, the ensemble of trees with their weights, and the regularization type. The serialization is performed using JSON, with the decision trees serialized to Base64 strings.
For Beginners: This method converts the trained model into a format that can be saved to a file or database.
Serializing a model allows you to:
- Save it for later use without having to retrain
- Share it with others
- Deploy it to production environments
The serialized data includes everything needed to recreate the model:
- All configuration settings
- The entire ensemble of decision trees and their weights
- Information about the regularization used
After serializing, you can store the resulting byte array in a file or database, and later restore the model using the Deserialize method.
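Example
A saving sketch, assuming model is a trained AdaBoostR2Regression<double>; the file path is hypothetical.
using System.IO;
byte[] data = model.Serialize();
File.WriteAllBytes("adaboost-r2.model", data);
// Later: restore with Deserialize (see above).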
TrainAsync(Matrix<T>, Vector<T>)
Trains the AdaBoost.R2 regression model on the provided input data and target values asynchronously.
public override Task TrainAsync(Matrix<T> x, Vector<T> y)
Parameters
x (Matrix<T>)
The input features matrix where each row is a sample and each column is a feature.
y (Vector<T>)
The target values vector corresponding to the input samples.
Returns
- Task
A task representing the asynchronous training operation.
Remarks
This method implements the AdaBoost.R2 algorithm for regression. It trains multiple decision trees sequentially, where each tree focuses more on samples that previous trees predicted poorly. The training process consists of the following steps:
1. Initialize sample weights equally for all training samples.
2. For the specified number of estimators:
   a. Train a decision tree on the weighted data.
   b. Calculate prediction errors for each sample.
   c. Compute the weighted average error.
   d. If the average error is ≥ 0.5, stop the training (the learner is too weak).
   e. Calculate the weight for the current tree based on its error.
   f. Update sample weights to focus more on poorly predicted samples.
3. Calculate feature importances across all trees in the ensemble.
For Beginners: This method teaches the model to make predictions based on your training data.
Here's what happens during training:
- The method starts by giving equal importance to all training examples
- For each new tree to be added to the ensemble:
- It trains a decision tree that pays attention to the importance weights
- It checks how well the tree performed on each example
- It calculates an overall error rate for the tree
- If the tree is too inaccurate (error ≥ 0.5), it stops adding more trees
- Otherwise, it calculates how much voting power this tree should get
- It updates the importance weights to focus more on examples that were predicted poorly
- Finally, it calculates how important each feature (input variable) is for making predictions
This iterative process creates a diverse ensemble of trees that work together to make accurate predictions, with each tree specializing in different aspects of the problem.
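Example
An illustrative sketch of a single boosting round, following Drucker's AdaBoost.R2 formulation with a linear loss; the library's exact loss function and update details may differ.
using System;
using System.Linq;
// errors[i] = absolute prediction error for sample i from the newly trained tree;
// sampleWeights is assumed normalized (sums to 1) and is updated in place.
static (double treeWeight, bool stop) BoostingRound(double[] errors, double[] sampleWeights)
{
    // Normalize errors to [0, 1] by the largest error (linear loss).
    double maxError = errors.Max();
    double[] loss = errors.Select(e => maxError > 0 ? e / maxError : 0.0).ToArray();
    // Weighted average loss; stop if the learner is too weak.
    double avgLoss = 0.0;
    for (int i = 0; i < loss.Length; i++) avgLoss += sampleWeights[i] * loss[i];
    if (avgLoss >= 0.5) return (0.0, true);
    // beta < 1; this tree's voting power is log(1 / beta).
    double beta = avgLoss / (1.0 - avgLoss);
    double treeWeight = Math.Log(1.0 / beta);
    // Well-predicted samples (low loss) are down-weighted more, then renormalized.
    double total = 0.0;
    for (int i = 0; i < sampleWeights.Length; i++)
    {
        sampleWeights[i] *= Math.Pow(beta, 1.0 - loss[i]);
        total += sampleWeights[i];
    }
    for (int i = 0; i < sampleWeights.Length; i++) sampleWeights[i] /= total;
    return (treeWeight, false);
}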