Class DecisionTreeRegression<T>
- Namespace: AiDotNet.Regression
- Assembly: AiDotNet.dll
Represents a decision tree regression model that predicts continuous values based on input features.
public class DecisionTreeRegression<T> : DecisionTreeRegressionBase<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- DecisionTreeRegressionBase<T> → DecisionTreeRegression<T>
- Implements
- IRegression<T>
- Inherited Members
- Extension Methods
Remarks
Decision tree regression builds a model in the form of a tree structure where each internal node represents a decision based on a feature, each branch represents an outcome of that decision, and each leaf node represents a predicted value. The model is trained by recursively splitting the data based on the optimal feature and threshold that minimizes the prediction error.
For Beginners: A decision tree regression is like a flowchart that helps predict numerical values.
Think of it like answering a series of yes/no questions to reach a prediction:
- "Is the temperature above 75°F?"
- "Is the humidity below 50%?"
- "Is it a weekend?"
Each question splits the data into two groups, and the tree learns which questions to ask to make the most accurate predictions. For example, a decision tree might predict house prices based on features like square footage, number of bedrooms, and neighborhood.
The model is called a "tree" because it resembles an upside-down tree, with a single starting point (root) that branches out into multiple endpoints (leaves) where the final predictions are made.
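The split search at the heart of training can be sketched in miniature. The standalone example below is illustrative only, not AiDotNet's actual implementation (the names Variance and BestThreshold are invented for this sketch): it scores every candidate threshold on a single feature by the weighted variance of the two groups the split produces, which is the criterion the trainer applies recursively.

```csharp
using System;
using System.Linq;

// Variance of a group of target values; an empty group contributes nothing.
double Variance(double[] v) =>
    v.Length == 0 ? 0 : v.Select(x => Math.Pow(x - v.Average(), 2)).Average();

// Try each observed feature value as a threshold and keep the one whose
// split produces the lowest weighted variance (i.e. lowest prediction error).
double BestThreshold(double[] feature, double[] target)
{
    double best = double.NaN, bestScore = double.PositiveInfinity;
    foreach (var t in feature.Distinct())
    {
        var left  = target.Where((_, i) => feature[i] <= t).ToArray();
        var right = target.Where((_, i) => feature[i] > t).ToArray();
        if (left.Length == 0 || right.Length == 0) continue;
        // Weighted variance after the split; lower means a cleaner split.
        double score = (left.Length * Variance(left) + right.Length * Variance(right))
                       / target.Length;
        if (score < bestScore) { bestScore = score; best = t; }
    }
    return best;
}

// Targets jump from 5 to 50 once the feature passes 3, so 3 is the best cut.
Console.WriteLine(BestThreshold(
    new[] { 1.0, 2, 3, 10, 11, 12 },
    new[] { 5.0, 5, 5, 50, 50, 50 })); // 3
```

A real trainer repeats this search over every feature, picks the best (feature, threshold) pair, then recurses into each side until a stopping rule such as MaxDepth or MinSamplesSplit is hit.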
Constructors
DecisionTreeRegression(DecisionTreeOptions?, IRegularization<T, Matrix<T>, Vector<T>>?)
Initializes a new instance of the DecisionTreeRegression<T> class with optional configuration.
public DecisionTreeRegression(DecisionTreeOptions? options = null, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)
Parameters
options (DecisionTreeOptions): Optional configuration options for the decision tree algorithm.
regularization (IRegularization<T, Matrix<T>, Vector<T>>): Optional regularization strategy to prevent overfitting.
Remarks
This constructor creates a new decision tree regression model with the specified options and regularization strategy. If no options are provided, default values are used. If no regularization is specified, no regularization is applied.
For Beginners: This is how you create a new decision tree prediction model.
When creating a decision tree, you can specify two main things:
- Options: Controls how the tree grows (like its maximum depth or how many samples are needed to split)
- Regularization: Helps prevent the model from becoming too complex and "memorizing" the training data
If you don't specify these parameters, the model will use reasonable default settings.
Example:
// Create a decision tree with default settings
var tree = new DecisionTreeRegression<double>();
// Create a decision tree with custom options
var options = new DecisionTreeOptions { MaxDepth = 5 };
var customTree = new DecisionTreeRegression<double>(options);
Properties
NumberOfTrees
Gets the number of trees in this model, which is always 1 for a single decision tree.
public override int NumberOfTrees { get; }
Property Value
- int
The number of trees in the model, which is 1 for this implementation.
Remarks
This property returns the number of decision trees used in the model. For the DecisionTreeRegression class, this is always 1, as it implements a single decision tree. This property is provided for compatibility with ensemble methods that may use multiple trees.
For Beginners: This property simply tells you how many trees are in the model.
A single decision tree model (like this one) always returns 1.
Other algorithms like Random Forests or Gradient Boosting use multiple trees (sometimes hundreds or thousands) to make better predictions, but a basic decision tree uses just one tree structure.
Methods
CalculateFeatureImportances(int)
Calculates feature importances based on the number of features.
protected override void CalculateFeatureImportances(int featureCount)
Parameters
featureCount (int): The total number of features.
CreateNewInstance()
Creates a new instance of the decision tree regression model with the same options.
protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()
Returns
- IFullModel<T, Matrix<T>, Vector<T>>
A new instance of the model with the same configuration but no trained parameters.
Remarks
This method creates a new instance of the decision tree regression model with the same configuration options and regularization method as the current instance, but without copying the trained parameters.
For Beginners: This method creates a fresh copy of the model configuration without any learned parameters.
Think of it like getting a blank notepad with the same paper quality and size, but without any writing on it yet. The new model has the same:
- Maximum depth setting
- Minimum samples split setting
- Split criterion (how nodes decide which feature to split on)
- Random seed (if specified)
- Regularization method
But it doesn't have any of the actual tree structure that was learned from data.
This is mainly used internally when doing things like cross-validation or creating ensembles of similar models with different training data.
Deserialize(byte[])
Loads a previously serialized decision tree model from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized model.
Remarks
This method reconstructs a decision tree model from a byte array that was previously created using the Serialize method. It restores the model's configuration options and tree structure, allowing the model to be used for predictions without retraining.
For Beginners: This method loads a previously saved model from a sequence of bytes.
Deserialization allows you to:
- Load a model that was saved earlier
- Use a model without having to retrain it
- Share models between different applications
When you deserialize a model:
- All settings are restored
- The entire tree structure is reconstructed
- The model is ready to make predictions immediately
Example:
// Load from a file
byte[] modelData = File.ReadAllBytes("decisionTree.model");
// Deserialize the model
var decisionTree = new DecisionTreeRegression<double>();
decisionTree.Deserialize(modelData);
// Now you can use the model for predictions
var predictions = decisionTree.Predict(newFeatures);
GetFeatureImportance(int)
Gets the importance score of a specific feature in the decision tree model.
public T GetFeatureImportance(int featureIndex)
Parameters
featureIndex (int): The index of the feature to get the importance score for.
Returns
- T
The importance score of the specified feature.
Remarks
This method returns the importance score of the specified feature in the trained decision tree model. Feature importance scores indicate how useful each feature was in building the tree, with higher values indicating more important features. The scores are normalized to sum to 1.
For Beginners: This method tells you how important each feature is for making predictions.
Feature importance:
- Measures how much each feature contributes to the model's predictions
- Higher values mean the feature has more influence on the predictions
- Values range from 0 to 1, and all feature importances sum to 1
For example, when predicting house prices:
- Square footage might have importance 0.6 (very important)
- Number of bedrooms might have importance 0.3 (somewhat important)
- Year built might have importance 0.1 (less important)
This helps you understand which features matter most for your predictions.
Example:
// Get importance of the first feature (index 0)
var importance = decisionTree.GetFeatureImportance(0);
Console.WriteLine($"Feature 0 importance: {importance}");
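The normalization behind these scores can also be sketched directly. The snippet below is a standalone illustration (the raw numbers are hypothetical, not produced by AiDotNet): dividing each feature's raw score by the total yields importances that sum to 1, matching the 0.6 / 0.3 / 0.1 split in the house-price example above.

```csharp
using System;
using System.Linq;

// Hypothetical raw importance totals, e.g. the error reduction attributed to
// square footage, bedrooms, and year built -- illustrative numbers only.
double[] raw = { 12.0, 6.0, 2.0 };

// Dividing by the grand total normalizes the scores so they sum to 1.
double total = raw.Sum();
double[] normalized = raw.Select(r => r / total).ToArray();

Console.WriteLine(string.Join(", ", normalized)); // 0.6, 0.3, 0.1
```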
Exceptions
- InvalidOperationException
Thrown when the model hasn't been trained yet.
- ArgumentOutOfRangeException
Thrown when the feature index is invalid.
GetModelMetadata()
Gets metadata about the decision tree model and its configuration.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata object containing information about the model.
Remarks
This method returns metadata about the model, including its type and configuration options. This information can be useful for model management, comparison, and documentation purposes.
For Beginners: This method provides information about your decision tree model.
The metadata includes:
- The type of model (Decision Tree)
- Maximum depth of the tree (how many questions it can ask)
- Minimum samples required to split a node (how much data is needed to create a new decision point)
- Maximum features considered at each split (how many features the model looks at when deciding how to split)
This information is helpful when:
- Comparing different models
- Documenting your model's configuration
- Troubleshooting model performance
Example:
var metadata = decisionTree.GetModelMetadata();
Console.WriteLine($"Model type: {metadata.ModelType}");
Console.WriteLine($"Max depth: {metadata.AdditionalInfo["MaxDepth"]}");
Predict(Matrix<T>)
Predicts target values for the provided input features using the trained decision tree model.
public override Vector<T> Predict(Matrix<T> input)
Parameters
input (Matrix<T>): A matrix where each row represents a sample to predict and each column represents a feature.
Returns
- Vector<T>
A vector of predicted values corresponding to each input sample.
Remarks
This method traverses the decision tree for each input sample to find the leaf node that corresponds to the sample's features. The prediction stored in the leaf node is then returned as the predicted value for that sample. Any specified regularization is applied to both the input data and the predictions.
For Beginners: This method uses your trained model to make predictions on new data.
How it works:
- For each row of input data, the model starts at the top of the decision tree
- At each decision point (node), it checks the value of a specific feature
- Based on that value, it follows the appropriate branch
- It continues until it reaches a leaf node (endpoint)
- The value stored in that leaf node becomes the prediction
For example, if predicting house prices:
- "Is square footage > 2000?" If yes, go left; if no, go right
- "Is number of bedrooms > 3?" If yes, go left; if no, go right
- Reach leaf node: Predict price = $350,000
Example:
// Create test data
var newFeatures = new Matrix<double>(...);
// Make predictions
var predictions = decisionTree.Predict(newFeatures);
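The traversal steps above can be sketched with a toy tree. The Node record below is purely illustrative, not AiDotNet's internal node type: it hand-builds the two-question house-price tree from the walkthrough and follows one sample down to a leaf.

```csharp
using System;

// Hand-built two-level tree mirroring the walkthrough above.
var leafA = new Node(0, 0, null, null, 350_000); // sqft > 2000, bedrooms > 3
var leafB = new Node(0, 0, null, null, 280_000); // sqft > 2000, bedrooms <= 3
var leafC = new Node(0, 0, null, null, 180_000); // sqft <= 2000
var bedroomSplit = new Node(1, 3, leafA, leafB, 0);   // feature 1: bedrooms
var root = new Node(0, 2000, bedroomSplit, leafC, 0); // feature 0: square footage

// Walk from the root: at each internal node, compare the sample's feature
// value to the threshold and follow the matching branch until a leaf.
double Predict(Node node, double[] features) =>
    node.IsLeaf
        ? node.LeafValue
        : Predict(features[node.FeatureIndex] > node.Threshold
                      ? node.Left! : node.Right!,
                  features);

Console.WriteLine(Predict(root, new[] { 2400.0, 4 })); // 350000

// Illustrative node type: internal nodes test one feature against a
// threshold; leaves (no children) hold the predicted value.
record Node(int FeatureIndex, double Threshold, Node? Left, Node? Right, double LeafValue)
{
    public bool IsLeaf => Left is null && Right is null;
}
```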
Serialize()
Serializes the decision tree model to a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized model.
Remarks
This method converts the decision tree model into a byte array that can be stored in a file, database, or transmitted over a network. The serialized data includes the model's configuration options and the complete tree structure.
For Beginners: This method saves your trained model as a sequence of bytes.
Serialization allows you to:
- Save your model to a file
- Store your model in a database
- Send your model over a network
- Keep your model for later use without having to retrain it
The serialized data includes:
- All the model's settings (like maximum depth)
- The entire tree structure with all its decision rules
Example:
// Serialize the model
byte[] modelData = decisionTree.Serialize();
// Save to a file
File.WriteAllBytes("decisionTree.model", modelData);
Train(Matrix<T>, Vector<T>)
Trains the decision tree model using the provided input features and target values.
public override void Train(Matrix<T> x, Vector<T> y)
Parameters
x (Matrix<T>): A matrix where each row represents a sample and each column represents a feature.
y (Vector<T>): A vector of target values corresponding to each sample in x.
Remarks
This method builds the decision tree model by recursively splitting the data based on features and thresholds that best reduce the prediction error. Unlike traditional regression models, decision trees do not apply data regularization transformations. Instead, they control model complexity through structural parameters such as MaxDepth, MinSamplesSplit, and MinSamplesLeaf. After building the tree, feature importances are calculated.
For Beginners: This method teaches the decision tree how to make predictions using your data.
You provide:
- x: Your input data (features) - like house size, number of bedrooms, location, etc.
- y: The values you want to predict - like house prices
The training process:
- Looks at your data to find patterns
- Decides which features are most useful for predictions
- Creates a tree structure with decision rules
- Figures out how important each feature is
After training, the model is ready to make predictions on new data.
Example:
// Create training data
var features = new Matrix<double>(...); // Input features
var targets = new Vector<double>(...); // Target values
// Train the model
decisionTree.Train(features, targets);
TrainWithWeights(Matrix<T>, Vector<T>, Vector<T>)
Trains the decision tree model using the provided input features, target values, and sample weights.
public void TrainWithWeights(Matrix<T> x, Vector<T> y, Vector<T> sampleWeights)
Parameters
x (Matrix<T>): A matrix where each row represents a sample and each column represents a feature.
y (Vector<T>): A vector of target values corresponding to each sample in x.
sampleWeights (Vector<T>): A vector of weights for each sample, indicating their importance during training.
Remarks
This method builds the decision tree model similar to the Train method, but allows specifying different weights for each training sample. Samples with higher weights have more influence on the training process, which can be useful for handling imbalanced datasets or for boosting algorithms.
For Beginners: This method is similar to the regular Train method, but lets you specify how important each training example is.
Sample weights allow you to:
- Give more importance to certain examples during training
- Make the model pay more attention to rare cases
- Balance uneven datasets (where some outcomes are much more common than others)
For example, when predicting house prices:
- You might give higher weights to recent sales (more relevant)
- You might give lower weights to unusual properties (potential outliers)
- You might give higher weights to properties similar to the ones you'll make predictions for
Example:
// Create training data
var features = new Matrix<double>(...); // Input features
var targets = new Vector<double>(...); // Target values
var weights = new Vector<double>(...); // Sample weights
// Train the model with weights
decisionTree.TrainWithWeights(features, targets, weights);
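The effect of weights can be sketched at the level of a single leaf. In an unweighted tree a leaf predicts the plain average of its samples' targets; with sample weights it predicts a weighted average, so a downweighted outlier pulls the prediction far less. This is a standalone sketch with made-up numbers, not AiDotNet internals.

```csharp
using System;
using System.Linq;

// Three sale prices landing in the same leaf; the third sale is unusual.
double[] targets = { 300_000, 310_000, 500_000 };
double[] weights = { 1.0, 1.0, 0.1 }; // downweight the suspected outlier

// Unweighted leaf: plain average of the targets.
double unweighted = targets.Average();

// Weighted leaf: weighted average, so low-weight samples barely move it.
double weighted = targets.Zip(weights, (t, w) => t * w).Sum() / weights.Sum();

Console.WriteLine(unweighted); // 370000
Console.WriteLine(weighted);   // ~314286, much closer to the typical sales
```

The same weighting logic also enters the split search itself (weighted variances instead of plain ones), which is how boosting algorithms steer later trees toward the samples earlier trees got wrong.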
Exceptions
- ArgumentException
Thrown when input dimensions don't match.