Class ConditionalInferenceTreeRegression<T>
- Namespace
- AiDotNet.Regression
- Assembly
- AiDotNet.dll
Represents a conditional inference tree regression model that builds decision trees based on statistical tests.
public class ConditionalInferenceTreeRegression<T> : AsyncDecisionTreeRegressionBase<T>, IAsyncTreeBasedModel<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>
Type Parameters
T: The numeric type used for calculations, typically float or double.
- Inheritance
- AsyncDecisionTreeRegressionBase<T> → ConditionalInferenceTreeRegression<T>
- Implements
- IRegression<T>
- Inherited Members
- Extension Methods
Remarks
A conditional inference tree is a type of decision tree that uses statistical tests to determine optimal splits in the data. Unlike traditional decision trees that use measures like Gini impurity or information gain, conditional inference trees use statistical significance testing to create unbiased trees that don't favor features with many possible split points.
For Beginners: This class creates a special type of decision tree for predicting numerical values.
Think of a decision tree like a flowchart of yes/no questions that helps you make predictions:
- The tree starts with a question (like "Is temperature > 70°F?")
- Based on the answer, it follows different branches
- It continues asking questions until it reaches a final prediction
What makes this tree special is how it chooses the questions:
- It uses statistical tests to find the most meaningful questions to ask
- It avoids favoring certain types of data unfairly
- It provides a measurement of confidence (p-value) for each split
This approach tends to create more reliable and fair prediction models.
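Putting those ideas together, here is a hedged sketch of typical usage. The class, constructor, TrainAsync, and PredictAsync signatures come from this page; the option property names are taken from the remarks below, while the construction of the feature matrix and target vector is an assumption and may differ in the actual AiDotNet API.

```csharp
using AiDotNet.Regression;

// Configure the tree (MaxDepth and MinSamplesSplit are named in this
// documentation; other option members may exist).
var options = new ConditionalInferenceTreeOptions
{
    MaxDepth = 5,
    MinSamplesSplit = 10
};

var model = new ConditionalInferenceTreeRegression<double>(options);

// x: rows are samples, columns are features; y: one target per sample.
await model.TrainAsync(x, y);

// One prediction per input row.
Vector<double> predictions = await model.PredictAsync(newSamples);
```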
Constructors
ConditionalInferenceTreeRegression(ConditionalInferenceTreeOptions, IRegularization<T, Matrix<T>, Vector<T>>?)
Initializes a new instance of the ConditionalInferenceTreeRegression<T> class.
public ConditionalInferenceTreeRegression(ConditionalInferenceTreeOptions options, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)
Parameters
options (ConditionalInferenceTreeOptions): The options that control the tree building process.
regularization (IRegularization<T, Matrix<T>, Vector<T>>): Optional regularization to apply to input data to prevent overfitting.
Remarks
The constructor initializes a new conditional inference tree regression model with the specified options and optional regularization. The options control various aspects of tree building, such as the maximum depth, minimum number of samples required to split a node, and the significance level for statistical tests.
For Beginners: This creates a new prediction model with your chosen settings.
The options parameter controls things like:
- How deep (complex) the tree can grow
- How many data points are needed before making a split
- How confident the model must be before creating a split
The regularization parameter is optional and helps prevent "overfitting" - a problem where the model learns the training data too perfectly and performs poorly on new data. Think of it like teaching a student the principles rather than just memorizing specific examples.
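As an illustrative sketch, the settings above might be configured like this. MaxDepth and MinSamplesSplit are property names mentioned elsewhere in this documentation; SignificanceLevel is an assumed name for the statistical-test confidence setting and may differ in the real options class.

```csharp
var options = new ConditionalInferenceTreeOptions
{
    MaxDepth = 6,            // how deep (complex) the tree can grow
    MinSamplesSplit = 20,    // data points needed before making a split
    SignificanceLevel = 0.05 // assumed name: confidence required for a split
};

// The regularization argument is optional; omitting it (or passing null)
// means no regularization is applied.
var model = new ConditionalInferenceTreeRegression<double>(options);
```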
Methods
CalculateFeatureImportancesAsync(int)
Asynchronously calculates the importance of each feature in the model.
protected override Task CalculateFeatureImportancesAsync(int featureCount)
Parameters
featureCount (int): The total number of features.
Returns
- Task
A task representing the asynchronous operation.
Remarks
This protected method asynchronously calculates the importance of each feature in the model by traversing the decision tree. Feature importance is based on the statistical significance of splits using that feature, where features used in splits with lower p-values (higher significance) receive higher importance scores.
For Beginners: This method figures out which features are most important for predictions.
Feature importance:
- Tells you which inputs have the biggest impact on the predictions
- Is calculated by looking at all the splits in the tree that use each feature
- Gives higher scores to features used in more statistically significant splits
- Helps you understand what factors most influence the outcome you're predicting
For example, if "temperature" has a high importance score, it means that knowing the temperature gives you a lot of information about what the prediction will be.
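Since this method is protected, callers do not invoke it directly; feature importances are computed during training and surface through the model metadata. A hedged sketch (the exact member on ModelMetadata<T> that exposes the importance scores is an assumption):

```csharp
// Train, then inspect the metadata for per-feature importance scores.
await model.TrainAsync(x, y);
var metadata = model.GetModelMetadata();
// Higher scores indicate features used in more statistically significant
// splits (lower p-values); how the scores are exposed on ModelMetadata<T>
// may differ from this sketch.
```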
CreateNewInstance()
Creates a new instance of the conditional inference tree regression model with the same configuration.
protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()
Returns
- IFullModel<T, Matrix<T>, Vector<T>>
A new instance of ConditionalInferenceTreeRegression<T> with the same configuration as the current instance.
Remarks
This method creates a new conditional inference tree regression model that has the same configuration as the current instance. It's used for model persistence, cloning, and transferring the model's configuration to new instances.
For Beginners: This method makes a fresh copy of the current model with the same settings.
It's like creating a blueprint copy of your model that can be used to:
- Save your model's settings
- Create a new identical model
- Transfer your model's configuration to another system
This is useful when you want to:
- Create multiple similar models
- Save a model's configuration for later use
- Reset a model while keeping its settings
Deserialize(byte[])
Deserializes the model from a byte array.
public override void Deserialize(byte[] data)
Parameters
data (byte[]): The byte array containing the serialized model.
Remarks
This method reconstructs the model from a serialized byte array. It reads the options, tree structure, and feature importances from the byte array and rebuilds the model.
For Beginners: This method loads a previously saved model from binary data.
Deserialization converts the binary data back into a working model:
- It loads all the model's settings
- It reconstructs the entire decision tree
- It restores the feature importance scores
This allows you to:
- Use a model that was trained earlier
- Share models between different applications
- Deploy models to production environments
It's like restoring the model from a snapshot so you can use it again without retraining.
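A hedged loading sketch: the file I/O is standard .NET, and only Deserialize(byte[]) comes from this class. Constructing the instance with options first is an assumption; the serialized data may overwrite those options on load.

```csharp
using System.IO;
using AiDotNet.Regression;

// Read the bytes produced earlier by Serialize().
byte[] data = File.ReadAllBytes("tree-model.bin");

var model = new ConditionalInferenceTreeRegression<double>(
    new ConditionalInferenceTreeOptions());
model.Deserialize(data);

// The restored model can predict immediately, without retraining.
var predictions = await model.PredictAsync(newSamples);
```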
GetModelMetadata()
Gets metadata about the regression model.
public override ModelMetadata<T> GetModelMetadata()
Returns
- ModelMetadata<T>
A ModelMetadata<T> object containing model information.
Remarks
This method returns metadata about the regression model, including its type, hyperparameters, and feature importances. This information can be useful for model comparison, logging, and generating reports about model performance.
For Beginners: This method provides a summary of the model's settings and characteristics.
The metadata includes:
- The type of model (Conditional Inference Tree)
- The maximum depth of the tree
- The minimum number of samples required to create a split
- The significance level used for statistical tests
- The importance scores for each feature
This information is useful for:
- Comparing different models
- Documenting what settings were used
- Understanding the model's behavior
- Generating reports about the model's performance
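A short sketch of retrieving the metadata after training (how the individual settings and importance scores are read off ModelMetadata<T> is an assumption not covered by this page):

```csharp
// Summarize the trained model's type, hyperparameters, and feature
// importances, e.g. for logging or comparing candidate models.
ModelMetadata<double> metadata = model.GetModelMetadata();
```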
PredictAsync(Matrix<T>)
Asynchronously predicts target values for new input data.
public override Task<Vector<T>> PredictAsync(Matrix<T> input)
Parameters
input (Matrix<T>): The input features matrix for prediction.
Returns
- Task<Vector<T>>
A task representing the asynchronous operation, returning a vector of predicted values.
Remarks
This method asynchronously predicts target values for new input data by traversing the decision tree for each input sample. It first applies regularization to the input data if specified, then uses parallel processing to make predictions for multiple samples simultaneously.
For Beginners: This method uses the trained model to make predictions on new data.
When making predictions:
- The new data is first prepared using the same process as during training
- For each data point, the model follows the decision tree from top to bottom
- At each node, it answers a yes/no question and follows the appropriate branch
- When it reaches a leaf node, it returns the prediction stored there
The method can process multiple data points at the same time to make predictions faster. The result is a set of predicted values, one for each input row.
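A minimal prediction sketch, assuming a trained model and a feature matrix whose columns match the training layout (the Length member on Vector<T> is an assumed name):

```csharp
// One prediction per input row, computed in parallel internally.
Vector<double> predictions = await model.PredictAsync(newSamples);

for (int i = 0; i < predictions.Length; i++) // Length: assumed member name
{
    Console.WriteLine($"Row {i}: predicted {predictions[i]}");
}
```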
Serialize()
Serializes the model to a byte array.
public override byte[] Serialize()
Returns
- byte[]
A byte array containing the serialized model.
Remarks
This method serializes the model to a byte array for storage or transmission. It includes all necessary information to reconstruct the model, including options, the tree structure, and feature importances.
For Beginners: This method saves the model to binary data that can be stored or shared.
Serialization converts the model into a compact format that:
- Can be saved to a file
- Can be sent over a network
- Can be stored in a database
- Can be loaded later to make predictions without retraining
The saved data includes:
- All the model's settings (like max depth)
- The entire structure of the decision tree
- The feature importance scores
This is like taking a snapshot of the model for future use.
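Saving the model is a two-step sketch: Serialize() comes from this class, and writing the bytes to disk is standard .NET.

```csharp
using System.IO;

// Capture the model's settings, tree structure, and feature importances
// as a byte array, then persist it for later reloading via Deserialize.
byte[] data = model.Serialize();
File.WriteAllBytes("tree-model.bin", data);
```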
TrainAsync(Matrix<T>, Vector<T>)
Asynchronously trains the regression model on the provided training data.
public override Task TrainAsync(Matrix<T> x, Vector<T> y)
Parameters
x (Matrix<T>): The input features matrix where rows represent samples and columns represent features.
y (Vector<T>): The target values vector corresponding to each sample in the input matrix.
Returns
- Task
A task representing the asynchronous training operation.
Remarks
This method trains the conditional inference tree regression model using the provided input features and target values. Unlike traditional regression models, tree-based methods do not apply data regularization transformations. Instead, they control model complexity through structural parameters such as MaxDepth, MinSamplesSplit, and MinSamplesLeaf. This method builds the tree recursively starting from the root node and calculates feature importances to identify which features have the most impact on predictions.
For Beginners: This method teaches the model to make predictions based on your data.
During training:
- A decision tree is built by finding the best questions to ask at each step
- The model looks for patterns that connect your input features to the values you want to predict
- The importance of each feature is calculated, showing which inputs matter most for predictions
Note: Unlike linear regression models, tree-based methods control overfitting through tree structure parameters (like max depth) rather than through data regularization.
The "Async" in the method name means it can run in the background while your program does other things, which is helpful for large datasets.
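The training call itself is a single awaitable method; a brief sketch (how Matrix<T> and Vector<T> are populated is an assumption not shown here):

```csharp
// x: one row per sample, one column per feature.
// y: the target value for each row of x.
// The await completes once the tree is fully built and feature
// importances have been calculated.
await model.TrainAsync(x, y);
```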