Class DecisionTreeRegression<T>
- Namespace: AiDotNet.Regression
- Assembly: AiDotNet.dll
Represents a decision tree regression model that predicts continuous values based on input features.
public class DecisionTreeRegression<T> : DecisionTreeRegressionBase<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- DecisionTreeRegressionBase<T> → DecisionTreeRegression<T>
- Implements
- IRegression<T>
- Inherited Members
- Extension Methods
Remarks
Decision tree regression builds a model in the form of a tree structure where each internal node represents a decision based on a feature, each branch represents an outcome of that decision, and each leaf node represents a predicted value. The model is trained by recursively splitting the data based on the optimal feature and threshold that minimizes the prediction error.
For Beginners: A decision tree regression is like a flowchart that helps predict numerical values.
Think of it like answering a series of yes/no questions to reach a prediction:
- "Is the temperature above 75°F?"
- "Is the humidity below 50%?"
- "Is it a weekend?"
Each question splits the data into two groups, and the tree learns which questions to ask to make the most accurate predictions. For example, a decision tree might predict house prices based on features like square footage, number of bedrooms, and neighborhood.
The model is called a "tree" because it resembles an upside-down tree, with a single starting point (root) that branches out into multiple endpoints (leaves) where the final predictions are made.
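The split search at the heart of training can be sketched in miniature. The standalone example below is illustrative only, not AiDotNet's actual implementation (the names Variance and BestThreshold are invented for this sketch): it scores every candidate threshold on a single feature by the weighted variance of the two groups the split produces, which is the criterion the trainer applies recursively.

```csharp
using System;
using System.Linq;

// Variance of a group of target values; an empty group contributes nothing.
double Variance(double[] v) =>
    v.Length == 0 ? 0 : v.Select(x => Math.Pow(x - v.Average(), 2)).Average();

// Try each observed feature value as a threshold and keep the one whose
// split produces the lowest weighted variance (i.e. lowest prediction error).
double BestThreshold(double[] feature, double[] target)
{
    double best = double.NaN, bestScore = double.PositiveInfinity;
    foreach (var t in feature.Distinct())
    {
        var left  = target.Where((_, i) => feature[i] <= t).ToArray();
        var right = target.Where((_, i) => feature[i] > t).ToArray();
        if (left.Length == 0 || right.Length == 0) continue;
        // Weighted variance after the split; lower means a cleaner split.
        double score = (left.Length * Variance(left) + right.Length * Variance(right))
                       / target.Length;
        if (score < bestScore) { bestScore = score; best = t; }
    }
    return best;
}

// Targets jump from 5 to 50 once the feature passes 3, so 3 is the best cut.
Console.WriteLine(BestThreshold(
    new[] { 1.0, 2, 3, 10, 11, 12 },
    new[] { 5.0, 5, 5, 50, 50, 50 })); // 3
```

A real trainer repeats this search over every feature, picks the best (feature, threshold) pair, then recurses into each side until a stopping rule such as MaxDepth or MinSamplesSplit is hit.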
Constructors
DecisionTreeRegression(DecisionTreeOptions?, IRegularization<T, Matrix<T>, Vector<T>>?)
Initializes a new instance of the DecisionTreeRegression<T> class with optional configuration.
public DecisionTreeRegression(DecisionTreeOptions? options = null, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)
Parameters
options (DecisionTreeOptions): Optional configuration options for the decision tree algorithm.
regularization (IRegularization<T, Matrix<T>, Vector<T>>): Optional regularization strategy to prevent overfitting.
Remarks
This constructor creates a new decision tree regression model with the specified options and regularization strategy. If no options are provided, default values are used. If no regularization is specified, no regularization is applied.
For Beginners: This is how you create a new decision tree prediction model.
When creating a decision tree, you can specify two main things:
- Options: Controls how the tree grows (like its maximum depth or how many samples are needed to split)
- Regularization: Helps prevent the model from becoming too complex and "memorizing" the training data
If you don't specify these parameters, the model will use reasonable default settings.
Example:
// Create a decision tree with default settings
var tree = new DecisionTreeRegression<double>();
// Create a decision tree with custom options
var options = new DecisionTreeOptions { MaxDepth = 5 };
var customTree = new DecisionTreeRegression<double>(options);
Properties
NumberOfTrees
Gets the number of trees in this model, which is always 1 for a single decision tree.
public override int NumberOfTrees { get; }
Property Value
- int
The number of trees in the model, which is 1 for this implementation.
Remarks
This property returns the number of decision trees used in the model. For the DecisionTreeRegression class, this is always 1, as it implements a single decision tree. This property is provided for compatibility with ensemble methods that may use multiple trees.
For Beginners: This property simply tells you how many trees are in the model.
A single decision tree model (like this one) always returns 1.
Other algorithms like Random Forests or Gradient Boosting use multiple trees (sometimes hundreds or thousands) to make better predictions, but a basic decision tree uses just one tree structure.
Methods
CalculateFeatureImportances(int)
Calculates feature importances based on the number of features.
protected override void CalculateFeatureImportances(int featureCount)
Parameters
featureCount (int): The total number of features.
CreateNewInstance()
Creates a new instance of the decision tree regression model with the same options.
protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()
Returns
- IFullModel<T, Matrix<T>, Vector<T>>
A new instance of the model with the same configuration but no trained parameters.
Remarks
This method creates a new instance of the decision tree regression model with the same configuration options and regularization method as the current instance, but without copying the trained parameters.
For Beginners: This method creates a fresh copy of the model configuration without any learned parameters.
Think of it like getting a blank notepad with the same paper quality and size, but without any writing on it yet. The new model has the same:
- Maximum depth setting
- Minimum samples split setting
- Split criterion (how nodes decide which feature to split on)
- Random seed (if specified)
- Regularization method
But it doesn't have any of the actual tree structure that was learned from data.
This is mainly used internally when doing things like cross-validation or creating ensembles of similar models with different training data.
Deserialize(byte[])
Loads a previously serialized decision tree model from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized model.
Remarks
This method reconstructs a decision tree model from a byte array that was previously created using the Serialize method. It restores the model's configuration options and tree structure, allowing the model to be used for predictions without retraining.
For Beginners: This method loads a previously saved model from a sequence of bytes.
Deserialization allows you to:
- Load a model that was saved earlier
- Use a model without having to retrain it
- Share models between different applications
When you deserialize a model:
- All settings are restored
- The entire tree structure is reconstructed
- The model is ready to make predictions immediately
Example:
// Load from a file
byte[] modelData = File.ReadAllBytes("decisionTree.model");
// Deserialize the model
var decisionTree = new DecisionTreeRegression<double>();
decisionTree.Deserialize(modelData);
// Now you can use the model for predictions
var predictions = decisionTree.Predict(newFeatures);
GetFeatureImportance(int)
Gets the importance score of a specific feature in the decision tree model.
public T GetFeatureImportance(int featureIndex)
Parameters
featureIndex (int): The index of the feature to get the importance score for.
Returns
- T
The importance score of the specified feature.
Remarks
This method returns the importance score of the specified feature in the trained decision tree model. Feature importance scores indicate how useful each feature was in building the tree, with higher values indicating more important features. The scores are normalized to sum to 1.
For Beginners: This method tells you how important each feature is for making predictions.
Feature importance:
- Measures how much each feature contributes to the model's predictions
- Higher values mean the feature has more influence on the predictions
- Values range from 0 to 1, and all feature importances sum to 1
For example, when predicting house prices:
- Square footage might have importance 0.6 (very important)
- Number of bedrooms might have importance 0.3 (somewhat important)
- Year built might have importance 0.1 (less important)
This helps you understand which features matter most for your predictions.
Example:
// Get importance of the first feature (index 0)
var importance = decisionTree.GetFeatureImportance(0);
Console.WriteLine($"Feature 0 importance: {importance}");
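The normalization behind these scores can also be sketched directly. The snippet below is a standalone illustration (the raw numbers are hypothetical, not produced by AiDotNet): dividing each feature's raw score by the total yields importances that sum to 1, matching the 0.6 / 0.3 / 0.1 split in the house-price example above.

```csharp
using System;
using System.Linq;

// Hypothetical raw importance totals, e.g. the error reduction attributed to
// square footage, bedrooms, and year built -- illustrative numbers only.
double[] raw = { 12.0, 6.0, 2.0 };

// Dividing by the grand total normalizes the scores so they sum to 1.
double total = raw.Sum();
double[] normalized = raw.Select(r => r / total).ToArray();

Console.WriteLine(string.Join(", ", normalized)); // 0.6, 0.3, 0.1
```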
Exceptions
- InvalidOperationException
Thrown when the model hasn't been trained yet.
- ArgumentOutOfRangeException
Thrown when the feature index is invalid.
GetModelMetadata()
Gets metadata about the decision tree model and its configuration.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata object containing information about the model.
Remarks
This method returns metadata about the model, including its type and configuration options. This information can be useful for model management, comparison, and documentation purposes.
For Beginners: This method provides information about your decision tree model.
The metadata includes:
- The type of model (Decision Tree)
- Maximum depth of the tree (how many questions it can ask)
- Minimum samples required to split a node (how much data is needed to create a new decision point)
- Maximum features considered at each split (how many features the model looks at when deciding how to split)
This information is helpful when:
- Comparing different models
- Documenting your model's configuration
- Troubleshooting model performance
Example:
var metadata = decisionTree.GetModelMetadata();
Console.WriteLine($"Model type: {metadata.ModelType}");
Console.WriteLine($"Max depth: {metadata.AdditionalInfo["MaxDepth"]}");
Predict(Matrix<T>)
Predicts target values for the provided input features using the trained decision tree model.
public override Vector<T> Predict(Matrix<T> input)
Parameters
input (Matrix<T>): A matrix where each row represents a sample to predict and each column represents a feature.
Returns
- Vector<T>
A vector of predicted values corresponding to each input sample.
Remarks
This method traverses the decision tree for each input sample to find the leaf node that corresponds to the sample's features. The prediction stored in the leaf node is then returned as the predicted value for that sample. Any specified regularization is applied to both the input data and the predictions.
For Beginners: This method uses your trained model to make predictions on new data.
How it works:
- For each row of input data, the model starts at the top of the decision tree
- At each decision point (node), it checks the value of a specific feature
- Based on that value, it follows the appropriate branch
- It continues until it reaches a leaf node (endpoint)
- The value stored in that leaf node becomes the prediction
For example, if predicting house prices:
- "Is square footage > 2000?" If yes, go left; if no, go right
- "Is number of bedrooms > 3?" If yes, go left; if no, go right
- Reach leaf node: Predict price = $350,000
Example:
// Create test data
var newFeatures = new Matrix<double>(...);
// Make predictions
var predictions = decisionTree.Predict(newFeatures);
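The traversal steps above can be sketched with a toy tree. The Node record below is purely illustrative, not AiDotNet's internal node type: it hand-builds the two-question house-price tree from the walkthrough and follows one sample down to a leaf.

```csharp
using System;

// Hand-built two-level tree mirroring the walkthrough above.
var leafA = new Node(0, 0, null, null, 350_000); // sqft > 2000, bedrooms > 3
var leafB = new Node(0, 0, null, null, 280_000); // sqft > 2000, bedrooms <= 3
var leafC = new Node(0, 0, null, null, 180_000); // sqft <= 2000
var bedroomSplit = new Node(1, 3, leafA, leafB, 0);   // feature 1: bedrooms
var root = new Node(0, 2000, bedroomSplit, leafC, 0); // feature 0: square footage

// Walk from the root: at each internal node, compare the sample's feature
// value to the threshold and follow the matching branch until a leaf.
double Predict(Node node, double[] features) =>
    node.IsLeaf
        ? node.LeafValue
        : Predict(features[node.FeatureIndex] > node.Threshold
                      ? node.Left! : node.Right!,
                  features);

Console.WriteLine(Predict(root, new[] { 2400.0, 4 })); // 350000

// Illustrative node type: internal nodes test one feature against a
// threshold; leaves (no children) hold the predicted value.
record Node(int FeatureIndex, double Threshold, Node? Left, Node? Right, double LeafValue)
{
    public bool IsLeaf => Left is null && Right is null;
}
```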
Serialize()
Serializes the decision tree model to a byte array for storage or transmission.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized model.
Remarks
This method converts the decision tree model into a byte array that can be stored in a file, database, or transmitted over a network. The serialized data includes the model's configuration options and the complete tree structure.
For Beginners: This method saves your trained model as a sequence of bytes.
Serialization allows you to:
- Save your model to a file
- Store your model in a database
- Send your model over a network
- Keep your model for later use without having to retrain it
The serialized data includes:
- All the model's settings (like maximum depth)
- The entire tree structure with all its decision rules
Example:
// Serialize the model
byte[] modelData = decisionTree.Serialize();
// Save to a file
File.WriteAllBytes("decisionTree.model", modelData);
Train(Matrix<T>, Vector<T>)
Trains the decision tree model using the provided input features and target values.
public override void Train(Matrix<T> x, Vector<T> y)
Parameters
x (Matrix<T>): A matrix where each row represents a sample and each column represents a feature.
y (Vector<T>): A vector of target values corresponding to each sample in x.
Remarks
This method builds the decision tree model by recursively splitting the data based on features and thresholds that best reduce the prediction error. Unlike traditional regression models, decision trees do not apply data regularization transformations. Instead, they control model complexity through structural parameters such as MaxDepth, MinSamplesSplit, and MinSamplesLeaf. After building the tree, feature importances are calculated.
For Beginners: This method teaches the decision tree how to make predictions using your data.
You provide:
- x: Your input data (features) - like house size, number of bedrooms, location, etc.
- y: The values you want to predict - like house prices
The training process:
- Looks at your data to find patterns
- Decides which features are most useful for predictions
- Creates a tree structure with decision rules
- Figures out how important each feature is
After training, the model is ready to make predictions on new data.
Example:
// Create training data
var features = new Matrix<double>(...); // Input features
var targets = new Vector<double>(...); // Target values
// Train the model
decisionTree.Train(features, targets);
TrainWithWeights(Matrix<T>, Vector<T>, Vector<T>)
Trains the decision tree model using the provided input features, target values, and sample weights.
public void TrainWithWeights(Matrix<T> x, Vector<T> y, Vector<T> sampleWeights)
Parameters
x (Matrix<T>): A matrix where each row represents a sample and each column represents a feature.
y (Vector<T>): A vector of target values corresponding to each sample in x.
sampleWeights (Vector<T>): A vector of weights for each sample, indicating their importance during training.
Remarks
This method builds the decision tree model similar to the Train method, but allows specifying different weights for each training sample. Samples with higher weights have more influence on the training process, which can be useful for handling imbalanced datasets or for boosting algorithms.
For Beginners: This method is similar to the regular Train method, but lets you specify how important each training example is.
Sample weights allow you to:
- Give more importance to certain examples during training
- Make the model pay more attention to rare cases
- Balance uneven datasets (where some outcomes are much more common than others)
For example, when predicting house prices:
- You might give higher weights to recent sales (more relevant)
- You might give lower weights to unusual properties (potential outliers)
- You might give higher weights to properties similar to the ones you'll make predictions for
Example:
// Create training data
var features = new Matrix<double>(...); // Input features
var targets = new Vector<double>(...); // Target values
var weights = new Vector<double>(...); // Sample weights
// Train the model with weights
decisionTree.TrainWithWeights(features, targets, weights);
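The effect of weights can be sketched at the level of a single leaf. In an unweighted tree a leaf predicts the plain average of its samples' targets; with sample weights it predicts a weighted average, so a downweighted outlier pulls the prediction far less. This is a standalone sketch with made-up numbers, not AiDotNet internals.

```csharp
using System;
using System.Linq;

// Three sale prices landing in the same leaf; the third sale is unusual.
double[] targets = { 300_000, 310_000, 500_000 };
double[] weights = { 1.0, 1.0, 0.1 }; // downweight the suspected outlier

// Unweighted leaf: plain average of the targets.
double unweighted = targets.Average();

// Weighted leaf: weighted average, so low-weight samples barely move it.
double weighted = targets.Zip(weights, (t, w) => t * w).Sum() / weights.Sum();

Console.WriteLine(unweighted); // 370000
Console.WriteLine(weighted);   // ~314286, much closer to the typical sales
```

The same weighting logic also enters the split search itself (weighted variances instead of plain ones), which is how boosting algorithms steer later trees toward the samples earlier trees got wrong.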
Exceptions
- ArgumentException
Thrown when input dimensions don't match.