Class ConditionalInferenceTreeRegression<T>

Namespace
AiDotNet.Regression
Assembly
AiDotNet.dll

Represents a conditional inference tree regression model that builds decision trees based on statistical tests.

public class ConditionalInferenceTreeRegression<T> : AsyncDecisionTreeRegressionBase<T>, IAsyncTreeBasedModel<T>, ITreeBasedRegression<T>, INonLinearRegression<T>, IRegression<T>, IFullModel<T, Matrix<T>, Vector<T>>, IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>, IModelSerializer, ICheckpointableModel, IParameterizable<T, Matrix<T>, Vector<T>>, IFeatureAware, IFeatureImportance<T>, ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>, IGradientComputable<T, Matrix<T>, Vector<T>>, IJitCompilable<T>

Type Parameters

T

The numeric type used for calculations, typically float or double.

Inheritance
AsyncDecisionTreeRegressionBase<T>
ConditionalInferenceTreeRegression<T>
Implements
IAsyncTreeBasedModel<T>
ITreeBasedRegression<T>
INonLinearRegression<T>
IRegression<T>
IFullModel<T, Matrix<T>, Vector<T>>
IModel<Matrix<T>, Vector<T>, ModelMetadata<T>>
IModelSerializer
ICheckpointableModel
IParameterizable<T, Matrix<T>, Vector<T>>
IFeatureAware
IFeatureImportance<T>
ICloneable<IFullModel<T, Matrix<T>, Vector<T>>>
IGradientComputable<T, Matrix<T>, Vector<T>>
IJitCompilable<T>

Remarks

A conditional inference tree is a type of decision tree that uses statistical tests to determine optimal splits in the data. Unlike traditional decision trees that use measures like Gini impurity or information gain, conditional inference trees use statistical significance testing to create unbiased trees that don't favor features with many possible split points.

For Beginners: This class creates a special type of decision tree for predicting numerical values.

Think of a decision tree like a flowchart of yes/no questions that helps you make predictions:

  • The tree starts with a question (like "Is temperature > 70°F?")
  • Based on the answer, it follows different branches
  • It continues asking questions until it reaches a final prediction

What makes this tree special is how it chooses the questions:

  • It uses statistical tests to find the most meaningful questions to ask
  • It avoids favoring certain types of data unfairly
  • It provides a measurement of confidence (p-value) for each split

This approach tends to create more reliable and fair prediction models.
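A minimal usage sketch is below. The option names MaxDepth and MinSamplesSplit appear in the remarks for TrainAsync; how the Matrix<double> and Vector<double> inputs are constructed, and the variables x, y, and newData, are assumptions for illustration only:

```csharp
using System.Threading.Tasks;
using AiDotNet.Regression;

// MaxDepth and MinSamplesSplit are documented option names; any other
// members of ConditionalInferenceTreeOptions should be checked against the API.
var options = new ConditionalInferenceTreeOptions
{
    MaxDepth = 5,
    MinSamplesSplit = 10
};

var model = new ConditionalInferenceTreeRegression<double>(options);

// x: rows = samples, columns = features; y: one target value per row.
await model.TrainAsync(x, y);
var predictions = await model.PredictAsync(newData);
```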

Constructors

ConditionalInferenceTreeRegression(ConditionalInferenceTreeOptions, IRegularization<T, Matrix<T>, Vector<T>>?)

Initializes a new instance of the ConditionalInferenceTreeRegression<T> class.

public ConditionalInferenceTreeRegression(ConditionalInferenceTreeOptions options, IRegularization<T, Matrix<T>, Vector<T>>? regularization = null)

Parameters

options ConditionalInferenceTreeOptions

The options that control the tree building process.

regularization IRegularization<T, Matrix<T>, Vector<T>>?

Optional regularization to apply to input data to prevent overfitting.

Remarks

The constructor initializes a new conditional inference tree regression model with the specified options and optional regularization. The options control various aspects of tree building, such as the maximum depth, minimum number of samples required to split a node, and the significance level for statistical tests.

For Beginners: This creates a new prediction model with your chosen settings.

The options parameter controls things like:

  • How deep (complex) the tree can grow
  • How many data points are needed before making a split
  • How confident the model must be before creating a split

The regularization parameter is optional and helps prevent "overfitting" - a problem where the model learns the training data too perfectly and performs poorly on new data. Think of it like teaching a student general principles rather than having them memorize specific examples.
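A sketch of both constructor forms; myRegularization stands in for any existing IRegularization<double, Matrix<double>, Vector<double>> implementation and is an assumption, not a type from this library:

```csharp
// Without regularization: the second parameter defaults to null.
var plain = new ConditionalInferenceTreeRegression<double>(options);

// With optional regularization applied to the input data.
var regularized = new ConditionalInferenceTreeRegression<double>(options, myRegularization);
```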

Methods

CalculateFeatureImportancesAsync(int)

Asynchronously calculates the importance of each feature in the model.

protected override Task CalculateFeatureImportancesAsync(int featureCount)

Parameters

featureCount int

The total number of features.

Returns

Task

A task representing the asynchronous operation.

Remarks

This protected method asynchronously calculates the importance of each feature in the model by traversing the decision tree. Feature importance is based on the statistical significance of splits using that feature, where features used in splits with lower p-values (higher significance) receive higher importance scores.

For Beginners: This method figures out which features are most important for predictions.

Feature importance:

  • Tells you which inputs have the biggest impact on the predictions
  • Is calculated by looking at all the splits in the tree that use each feature
  • Gives higher scores to features used in more statistically significant splits
  • Helps you understand what factors most influence the outcome you're predicting

For example, if "temperature" has a high importance score, it means that knowing the temperature gives you a lot of information about what the prediction will be.
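Because this member is protected, it is only reachable from a derived class. A hypothetical subclass hook, for illustration only:

```csharp
using System;
using System.Threading.Tasks;
using AiDotNet.Regression;

public class LoggingTreeRegression<T> : ConditionalInferenceTreeRegression<T>
{
    public LoggingTreeRegression(ConditionalInferenceTreeOptions options)
        : base(options) { }

    protected override async Task CalculateFeatureImportancesAsync(int featureCount)
    {
        // Let the base class traverse the tree and score each feature,
        // then log that the computation finished.
        await base.CalculateFeatureImportancesAsync(featureCount);
        Console.WriteLine($"Importances calculated for {featureCount} features.");
    }
}
```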

CreateNewInstance()

Creates a new instance of the conditional inference tree regression model with the same configuration.

protected override IFullModel<T, Matrix<T>, Vector<T>> CreateNewInstance()

Returns

IFullModel<T, Matrix<T>, Vector<T>>

A new instance of ConditionalInferenceTreeRegression<T> with the same configuration as the current instance.

Remarks

This method creates a new conditional inference tree regression model that has the same configuration as the current instance. It's used for model persistence, cloning, and transferring the model's configuration to new instances.

For Beginners: This method makes a fresh copy of the current model with the same settings.

It's like creating a blueprint copy of your model that can be used to:

  • Save your model's settings
  • Create a new identical model
  • Transfer your model's configuration to another system

This is useful when you want to:

  • Create multiple similar models
  • Save a model's configuration for later use
  • Reset a model while keeping its settings

Deserialize(byte[])

Deserializes the model from a byte array.

public override void Deserialize(byte[] data)

Parameters

data byte[]

The byte array containing the serialized model.

Remarks

This method reconstructs the model from a serialized byte array. It reads the options, tree structure, and feature importances from the byte array and rebuilds the model.

For Beginners: This method loads a previously saved model from binary data.

Deserialization converts the binary data back into a working model:

  • It loads all the model's settings
  • It reconstructs the entire decision tree
  • It restores the feature importance scores

This allows you to:

  • Use a model that was trained earlier
  • Share models between different applications
  • Deploy models to production environments

It's like restoring the model from a snapshot so you can use it again without retraining.
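A loading sketch. It is assumed here that Deserialize restores the saved options over whatever the instance was constructed with; verify this against the actual behavior before relying on it:

```csharp
using System.IO;

// Restore a model that was previously saved with Serialize().
var restored = new ConditionalInferenceTreeRegression<double>(
    new ConditionalInferenceTreeOptions());   // placeholder settings
restored.Deserialize(File.ReadAllBytes("tree-model.bin"));

// The restored model can predict without retraining.
var predictions = await restored.PredictAsync(newData);
```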

GetModelMetadata()

Gets metadata about the regression model.

public override ModelMetadata<T> GetModelMetadata()

Returns

ModelMetadata<T>

A ModelMetadata<T> object containing model information.

Remarks

This method returns metadata about the regression model, including its type, hyperparameters, and feature importances. This information can be useful for model comparison, logging, and generating reports about model performance.

For Beginners: This method provides a summary of the model's settings and characteristics.

The metadata includes:

  • The type of model (Conditional Inference Tree)
  • The maximum depth of the tree
  • The minimum number of samples required to create a split
  • The significance level used for statistical tests
  • The importance scores for each feature

This information is useful for:

  • Comparing different models
  • Documenting what settings were used
  • Understanding the model's behavior
  • Generating reports about the model's performance
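A sketch of retrieving the metadata; the members of ModelMetadata<T> are not documented on this page, so only the call itself is shown:

```csharp
ModelMetadata<double> metadata = model.GetModelMetadata();
// Inspect ModelMetadata<T> for the members exposing the model type,
// hyperparameters, and feature importance scores listed above.
```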

PredictAsync(Matrix<T>)

Asynchronously predicts target values for new input data.

public override Task<Vector<T>> PredictAsync(Matrix<T> input)

Parameters

input Matrix<T>

The input features matrix for prediction.

Returns

Task<Vector<T>>

A task representing the asynchronous operation, returning a vector of predicted values.

Remarks

This method asynchronously predicts target values for new input data by traversing the decision tree for each input sample. It first applies regularization to the input data if specified, then uses parallel processing to make predictions for multiple samples simultaneously.

For Beginners: This method uses the trained model to make predictions on new data.

When making predictions:

  • The new data is first prepared using the same process as during training
  • For each data point, the model follows the decision tree from top to bottom
  • At each node, it answers a yes/no question and follows the appropriate branch
  • When it reaches a leaf node, it returns the prediction stored there

The method can process multiple data points at the same time to make predictions faster. The result is a set of predicted values, one for each input row.
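A prediction sketch; the indexer and Length property on Vector<T> are assumptions about the linear-algebra types:

```csharp
using System;

// One prediction per input row; rows are processed in parallel internally.
Vector<double> predictions = await model.PredictAsync(inputs);

for (int i = 0; i < predictions.Length; i++)
{
    Console.WriteLine($"Row {i}: predicted value {predictions[i]}");
}
```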

Serialize()

Serializes the model to a byte array.

public override byte[] Serialize()

Returns

byte[]

A byte array containing the serialized model.

Remarks

This method serializes the model to a byte array for storage or transmission. It includes all necessary information to reconstruct the model, including options, the tree structure, and feature importances.

For Beginners: This method saves the model to binary data that can be stored or shared.

Serialization converts the model into a compact format that:

  • Can be saved to a file
  • Can be sent over a network
  • Can be stored in a database
  • Can be loaded later to make predictions without retraining

The saved data includes:

  • All the model's settings (like max depth)
  • The entire structure of the decision tree
  • The feature importance scores

This is like taking a snapshot of the model for future use.
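A saving sketch; writing to a file is one option, and the same bytes could equally be stored in a database or sent over a network:

```csharp
using System.IO;

// Snapshot the trained model: settings, tree structure, and importances.
byte[] data = trainedModel.Serialize();
File.WriteAllBytes("tree-model.bin", data);
```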

TrainAsync(Matrix<T>, Vector<T>)

Asynchronously trains the regression model on the provided training data.

public override Task TrainAsync(Matrix<T> x, Vector<T> y)

Parameters

x Matrix<T>

The input features matrix where rows represent samples and columns represent features.

y Vector<T>

The target values vector corresponding to each sample in the input matrix.

Returns

Task

A task representing the asynchronous training operation.

Remarks

This method trains the conditional inference tree regression model using the provided input features and target values. Unlike traditional regression models, tree-based methods do not apply data regularization transformations. Instead, they control model complexity through structural parameters such as MaxDepth, MinSamplesSplit, and MinSamplesLeaf. This method builds the tree recursively starting from the root node and calculates feature importances to identify which features have the most impact on predictions.

For Beginners: This method teaches the model to make predictions based on your data.

During training:

  • A decision tree is built by finding the best questions to ask at each step
  • The model looks for patterns that connect your input features to the values you want to predict
  • The importance of each feature is calculated, showing which inputs matter most for predictions

Note: Unlike linear regression models, tree-based methods control overfitting through tree structure parameters (like max depth) rather than through data regularization.

The "Async" in the method name means it can run in the background while your program does other things, which is helpful for large datasets.
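A training sketch. The Matrix<double>(double[,]) and Vector<double>(double[]) constructors shown are assumptions about the AiDotNet linear-algebra types, and the feature values are invented for illustration:

```csharp
// Two features per sample (e.g. temperature, humidity), one target per row.
var x = new Matrix<double>(new double[,]
{
    { 68.0, 0.30 },
    { 75.0, 0.45 },
    { 81.0, 0.50 },
});
var y = new Vector<double>(new[] { 12.1, 14.8, 16.3 });

var model = new ConditionalInferenceTreeRegression<double>(options);
await model.TrainAsync(x, y);   // builds the tree and computes feature importances
```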