Class RandomForestClassifierOptions<T>
Configuration options for Random Forest classifiers.
public class RandomForestClassifierOptions<T> : ClassifierOptions<T>
Type Parameters
T
The data type used for calculations.
- Inheritance
ClassifierOptions<T> → RandomForestClassifierOptions<T>
Remarks
Random Forest is an ensemble learning method that constructs multiple decision trees during training and outputs the class that is the mode (most frequent) of the classes predicted by individual trees.
For Beginners: Random Forest is like asking many decision trees for their opinion!
Imagine you're trying to classify a flower species:
- Create many decision trees, each trained on a random subset of your data
- Each tree considers only a random subset of features at each split
- To classify a new flower, ask all trees and take a vote
Why does this work so well?
- Each tree is slightly different due to random sampling
- Averaging many "weak" trees often creates a "strong" classifier
- It's much harder to overfit than a single deep tree
Random Forest is one of the most popular algorithms because it:
- Works well with default settings
- Handles both classification and regression
- Is robust to outliers and noise
- Provides feature importance scores
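A minimal configuration sketch using the properties documented on this page. The options type and property names come from this page; how the options object is consumed (e.g. passed to a classifier's constructor) is assumed and will depend on the rest of the library.

```csharp
var options = new RandomForestClassifierOptions<double>
{
    NEstimators = 200,     // more trees: better accuracy, slower training
    MaxDepth = 15,         // cap tree depth to speed up training
    MaxFeatures = "sqrt",  // consider sqrt(total features) at each split
    NJobs = -1,            // train trees in parallel on all processors
    RandomState = 42       // fixed seed for reproducible forests
};
```

With the defaults (100 trees, unlimited depth, "sqrt" features), the class is usable without setting anything; the sketch above only overrides values the Remarks sections below discuss tuning.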
Properties
Bootstrap
Gets or sets whether to use bootstrap sampling.
public bool Bootstrap { get; set; }
Property Value
- bool
True to use bootstrap sampling (default), false to use the whole dataset.
Remarks
Bootstrap sampling means each tree is trained on a random sample of the data (with replacement). This is a key part of the Random Forest algorithm.
For Beginners: Should each tree see a different random sample of data?
With Bootstrap = true (default):
- Each tree trains on a random sample (with some duplicates)
- About 63% of the data is used by each tree
- The remaining 37% (out-of-bag) can be used for validation
With Bootstrap = false:
- Each tree trains on the full dataset
- Less diversity between trees
- Generally not recommended
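The "about 63%" figure follows from the bootstrap draw itself. A quick sketch of the arithmetic in plain C# (no dependency on this library):

```csharp
// Probability that a given sample appears at least once in a bootstrap
// sample of size n drawn with replacement: 1 - (1 - 1/n)^n.
// As n grows this approaches 1 - 1/e, roughly 0.632 -- hence the
// "about 63% in-bag / 37% out-of-bag" split quoted above.
int n = 10_000;
double inBag = 1.0 - Math.Pow(1.0 - 1.0 / n, n);
Console.WriteLine($"{inBag:F3}");  // ≈ 0.632
```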
Criterion
Gets or sets the criterion used to measure the quality of a split.
public ClassificationSplitCriterion Criterion { get; set; }
Property Value
- ClassificationSplitCriterion
The split criterion. Default is Gini impurity.
MaxDepth
Gets or sets the maximum depth of each tree.
public int? MaxDepth { get; set; }
Property Value
- int?
The maximum depth, or null for unlimited depth. Default is null.
Remarks
Limiting depth prevents overfitting. With many trees, shallow trees often work well because the ensemble can still capture complex patterns.
For Beginners: How deep each tree can grow.
In a Random Forest, you often don't need to limit depth because:
- Averaging many trees reduces overfitting
- Random feature selection at each split adds diversity
However, limiting depth can speed up training significantly. Try MaxDepth = 10-20 if training is too slow.
MaxFeatures
Gets or sets the number of features to consider when looking for the best split.
public string MaxFeatures { get; set; }
Property Value
- string
A string specifier ("sqrt", "log2", "all", or a numeric string), or null for auto. Default is "sqrt" (square root of the total number of features).
Remarks
Using a subset of features at each split is key to Random Forest's success. It introduces randomness that decorrelates the trees.
For Beginners: How many features each tree considers at each decision point.
Common settings:
- "sqrt": Square root of features (default for classification)
- "log2": Log base 2 of features
- null or "all": All features (but this loses some randomness!)
- A number: Exactly that many features
Using fewer features = more random, more different trees.
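To make the specifiers concrete, here is what they resolve to for a dataset with 64 features. The arithmetic is standard; the mapping of strings to formulas restates the list above.

```csharp
int totalFeatures = 64;
int sqrtFeatures = (int)Math.Sqrt(totalFeatures);  // "sqrt" -> 8 features per split
int log2Features = (int)Math.Log2(totalFeatures);  // "log2" -> 6 features per split
// "all" or null -> 64 features per split (every feature, less tree diversity)

var options = new RandomForestClassifierOptions<double>
{
    MaxFeatures = "sqrt"  // the default for classification
};
```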
MinImpurityDecrease
Gets or sets the minimum impurity decrease required for a split.
public double MinImpurityDecrease { get; set; }
Property Value
- double
The minimum impurity decrease. Default is 0.0.
MinSamplesLeaf
Gets or sets the minimum number of samples required at a leaf node.
public int MinSamplesLeaf { get; set; }
Property Value
- int
The minimum number of samples. Default is 1.
MinSamplesSplit
Gets or sets the minimum number of samples required to split an internal node.
public int MinSamplesSplit { get; set; }
Property Value
- int
The minimum number of samples. Default is 2.
NEstimators
Gets or sets the number of trees in the forest.
public int NEstimators { get; set; }
Property Value
- int
The number of decision trees. Default is 100.
Remarks
More trees generally improve performance but increase training time and memory usage. The relationship between number of trees and accuracy has diminishing returns.
For Beginners: How many trees to grow in your forest.
- 10-50 trees: Quick training, may not be fully stable
- 100 trees: Good default, usually sufficient
- 200-500 trees: Better accuracy, slower training
- 1000+ trees: Rarely needed, diminishing returns
Start with 100 and increase if you need more accuracy.
NJobs
Gets or sets the number of jobs for parallel training.
public int NJobs { get; set; }
Property Value
- int
The number of parallel jobs. -1 means use all processors. Default is 1.
Remarks
Since trees in a Random Forest are independent, they can be trained in parallel. This can significantly speed up training on multi-core machines.
OobScore
Gets or sets whether to compute out-of-bag score during training.
public bool OobScore { get; set; }
Property Value
- bool
True to compute OOB score. Default is false.
Remarks
OOB score provides a validation estimate without needing a separate validation set. Only available when Bootstrap is true.
For Beginners: Get a "free" accuracy estimate during training!
Because each tree only sees about 63% of the data, the remaining 37% can be used to test that tree. Averaging these gives the OOB score.
It's like cross-validation but comes "for free" during training.
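How the two flags combine, using only the properties documented on this page (how the resulting score is read back after training is not shown here, since that depends on the classifier type consuming these options):

```csharp
// OOB scoring requires bootstrap sampling, since the "out-of-bag" rows
// are exactly the ones a tree's bootstrap sample skipped.
var options = new RandomForestClassifierOptions<double>
{
    Bootstrap = true,  // default; must be true for OobScore to be available
    OobScore = true    // compute the "free" validation estimate during training
};
```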
RandomState
Gets or sets the random state for reproducibility.
public int? RandomState { get; set; }
Property Value
- int?
The random seed, or null for non-deterministic behavior. Default is null.