
Class RandomForestClassifierOptions<T>

Namespace
AiDotNet.Models.Options
Assembly
AiDotNet.dll

Configuration options for Random Forest classifiers.

public class RandomForestClassifierOptions<T> : ClassifierOptions<T>

Type Parameters

T

The data type used for calculations.

Inheritance
ClassifierOptions<T>
RandomForestClassifierOptions<T>

Remarks

Random Forest is an ensemble learning method that constructs multiple decision trees during training and outputs the class that is the mode (most frequent) of the classes predicted by individual trees.

For Beginners: Random Forest is like asking many decision trees for their opinion!

Imagine you're trying to classify a flower species:

  1. Create many decision trees, each trained on a random subset of your data
  2. Each tree considers only a random subset of features at each split
  3. To classify a new flower, ask all trees and take a vote
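The voting in step 3 can be sketched in plain C#. This is a minimal illustration under stated assumptions: each tree's prediction is an `int` class label, and ties go to the lowest label. `MajorityVoteSketch` is a hypothetical helper, not part of AiDotNet, and the library's own tie-breaking is not documented here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the voting step only (not AiDotNet's implementation).
// Ties go to the lowest class label, which is an assumption of this sketch.
public static class MajorityVoteSketch
{
    public static int Vote(IEnumerable<int> treePredictions)
    {
        // Count votes per class label; SortedDictionary iterates labels in ascending order.
        var counts = new SortedDictionary<int, int>();
        foreach (int label in treePredictions)
        {
            counts.TryGetValue(label, out int current);
            counts[label] = current + 1;
        }

        if (counts.Count == 0)
            throw new ArgumentException("At least one tree prediction is required.", nameof(treePredictions));

        // Pick the label with the most votes; strict '>' keeps the lowest label on ties.
        int winner = 0;
        int winnerVotes = -1;
        foreach (KeyValuePair<int, int> entry in counts)
        {
            if (entry.Value > winnerVotes)
            {
                winner = entry.Key;
                winnerVotes = entry.Value;
            }
        }
        return winner;
    }
}
```

For example, votes of 2, 0, 2, 1, 2 from five trees produce class 2.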

Why does this work so well?

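  • Each tree is slightly different because it trains on a different random sample and considers different random features at each split
  • Combining the votes of many imperfect trees cancels out much of their individual error, so the forest is usually more accurate than any single tree
  • It is much less prone to overfitting than a single deep tree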

Random Forest is one of the most popular algorithms because it:

  • Often works well with default settings
  • Handles both classification and regression (this class configures the classification variant)
  • Is relatively robust to outliers and noisy features
  • Provides feature importance scores

Properties

Bootstrap

Gets or sets whether to use bootstrap sampling.

public bool Bootstrap { get; set; }

Property Value

bool

True to use bootstrap sampling (default), false to use the whole dataset.

Remarks

Bootstrap sampling means each tree is trained on a random sample of the data (with replacement). This is a key part of the Random Forest algorithm.

For Beginners: Should each tree see a different random sample of data?

With Bootstrap = true (default):

  • Each tree trains on a random sample (with some duplicates)
  • About 63% of the distinct samples end up in each tree's sample (the expected fraction approaches 1 − 1/e ≈ 0.632)
  • The remaining ~37% (the out-of-bag samples) can be used for validation

With Bootstrap = false:

  • Each tree trains on the full dataset
  • Less diversity between trees
  • Generally not recommended
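The 63%/37% split follows from sampling with replacement: one draw misses a given row with probability 1 − 1/n, so all n draws miss it with probability (1 − 1/n)^n, which approaches 1/e ≈ 0.368. The sketch below illustrates this with `System.Random`; `BootstrapSketch` is a hypothetical helper, not AiDotNet's internal sampler.

```csharp
using System;

// Hypothetical sketch of bootstrap sampling (not AiDotNet's internal sampler).
public static class BootstrapSketch
{
    // Draws n row indices uniformly at random, with replacement, from [0, n).
    public static int[] SampleIndices(int n, Random rng)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));
        var indices = new int[n];
        for (int i = 0; i < n; i++)
            indices[i] = rng.Next(n); // Next(n) returns a value in [0, n)
        return indices;
    }

    // Fraction of the n rows that appear at least once in the sample ("in-bag").
    public static double InBagFraction(int[] indices, int n)
    {
        var seen = new bool[n];
        int distinct = 0;
        foreach (int idx in indices)
        {
            if (!seen[idx])
            {
                seen[idx] = true;
                distinct++;
            }
        }
        return (double)distinct / n;
    }

    // Expected in-bag fraction: 1 - (1 - 1/n)^n, which approaches 1 - 1/e (about 0.632).
    public static double ExpectedInBagFraction(int n) => 1.0 - Math.Pow(1.0 - 1.0 / n, n);
}
```

With n = 10,000 rows, the observed in-bag fraction of one sample typically lands very close to 0.632.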

Criterion

Gets or sets the criterion used to measure the quality of a split.

public ClassificationSplitCriterion Criterion { get; set; }

Property Value

ClassificationSplitCriterion

The split criterion. Default is Gini impurity.
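The default criterion, Gini impurity, is defined for a node as 1 − Σ p_k², where p_k is the fraction of the node's samples belonging to class k. It is 0 for a pure node and grows as classes become more mixed. The sketch below computes it for integer labels; `GiniSketch` is a hypothetical helper, and AiDotNet computes with the generic type `T` rather than `double`.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of Gini impurity for one node (labels as ints).
public static class GiniSketch
{
    // Gini = 1 - sum over classes k of p_k^2, where p_k is the fraction of the
    // node's samples in class k. 0 means the node is pure (a single class).
    public static double Gini(IReadOnlyList<int> labels)
    {
        if (labels.Count == 0) return 0.0;

        var counts = new Dictionary<int, int>();
        foreach (int label in labels)
        {
            counts.TryGetValue(label, out int current);
            counts[label] = current + 1;
        }

        double sumOfSquares = 0.0;
        foreach (int classCount in counts.Values)
        {
            double p = (double)classCount / labels.Count;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }
}
```

A node with labels {0, 1} has impurity 0.5; a node with three equally frequent classes has impurity 2/3.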

MaxDepth

Gets or sets the maximum depth of each tree.

public int? MaxDepth { get; set; }

Property Value

int?

The maximum depth, or null for unlimited depth. Default is null.

Remarks

Limiting depth helps prevent overfitting. With many trees, shallower trees often work well because the ensemble as a whole can still capture complex patterns.

For Beginners: How deep each tree can grow.

In a Random Forest, you often don't need to limit depth because:

  • Averaging many trees reduces overfitting
  • Random feature selection at each split adds diversity

However, limiting depth can speed up training significantly. If training is too slow, try a MaxDepth between 10 and 20.
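The speed-up comes from bounding tree size: each extra level can at most double the number of leaves, so a binary tree has at most 2^depth leaves. The sketch assumes the root is at depth 0 (AiDotNet's depth convention is not stated here), and `DepthBoundSketch` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: upper bound on the number of leaves in a binary tree.
// Assumes the root node is at depth 0.
public static class DepthBoundSketch
{
    public static long MaxLeaves(int maxDepth)
    {
        if (maxDepth < 0 || maxDepth > 62)
            throw new ArgumentOutOfRangeException(nameof(maxDepth));
        return 1L << maxDepth; // 2^maxDepth
    }
}
```

Under this convention, MaxDepth = 10 allows at most 1,024 leaves per tree, while MaxDepth = 20 allows over a million.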

MaxFeatures

Gets or sets the number of features to consider when looking for the best split.

public string MaxFeatures { get; set; }

Property Value

string

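The number of features to consider, given as a numeric string (for example "5"), a named specifier such as "sqrt" or "log2", or null (see Remarks). Default is "sqrt" (square root of the total number of features).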

Remarks

Using a subset of features at each split is key to Random Forest's success. It introduces randomness that decorrelates the trees.

For Beginners: How many features each tree considers at each decision point.

Common settings:

  • "sqrt": Square root of features (default for classification)
  • "log2": Log base 2 of features
  • null or "all": All features (but this loses some randomness!)
  • A number: Exactly that many features

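Using fewer features makes each split more random, so the trees become more different from one another (less correlated), although each individual tree may be somewhat less accurate.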
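One way to translate a MaxFeatures string into a concrete count is sketched below. `MaxFeaturesSketch` is hypothetical; it treats null like "all" (following the settings listed above), and the rounding rule (round down, never fewer than one feature) is an assumption of this sketch, not documented AiDotNet behavior.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: resolves a MaxFeatures-style string to a feature count.
// Rounding down and clamping to [1, totalFeatures] are assumptions of this sketch.
public static class MaxFeaturesSketch
{
    public static int Resolve(string maxFeatures, int totalFeatures)
    {
        if (totalFeatures < 1)
            throw new ArgumentOutOfRangeException(nameof(totalFeatures));

        int k;
        switch (maxFeatures)
        {
            case null:
            case "all":
                k = totalFeatures; // consider every feature (least randomness)
                break;
            case "sqrt":
                k = (int)Math.Floor(Math.Sqrt(totalFeatures));
                break;
            case "log2":
                k = (int)Math.Floor(Math.Log2(totalFeatures));
                break;
            default:
                // Any other value must be an exact integer count, e.g. "5".
                if (!int.TryParse(maxFeatures, NumberStyles.Integer, CultureInfo.InvariantCulture, out k))
                    throw new ArgumentException($"Unrecognized MaxFeatures value: '{maxFeatures}'.");
                break;
        }

        // Always consider at least one feature and never more than exist.
        return Math.Clamp(k, 1, totalFeatures);
    }
}
```

With 100 features, "sqrt" gives 10 candidates per split, "log2" gives 6, and "all" gives 100.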

MinImpurityDecrease

Gets or sets the minimum impurity decrease required for a split.

public double MinImpurityDecrease { get; set; }

Property Value

double

The minimum impurity decrease. Default is 0.0.
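A split's impurity decrease is commonly computed as the parent's impurity minus the size-weighted average impurity of its two children, and the split is kept only if that decrease reaches the threshold. This is a sketch under that assumption: some implementations (scikit-learn, for example) additionally weight the decrease by the fraction of all training samples that reach the node, and which convention AiDotNet uses is not stated here. `ImpurityDecreaseSketch` is hypothetical.

```csharp
using System;

// Hypothetical sketch of the impurity-decrease check for one candidate split.
public static class ImpurityDecreaseSketch
{
    // Decrease = parent impurity - size-weighted average impurity of the two children.
    public static double Decrease(double parentImpurity,
                                  int leftCount, double leftImpurity,
                                  int rightCount, double rightImpurity)
    {
        if (leftCount <= 0 || rightCount <= 0)
            throw new ArgumentException("Both children must contain at least one sample.");
        int total = leftCount + rightCount;
        double weightedChildren = (leftCount * leftImpurity + rightCount * rightImpurity) / total;
        return parentImpurity - weightedChildren;
    }

    // The split is kept only if it reduces impurity by at least the threshold.
    public static bool IsSplitAllowed(double decrease, double minImpurityDecrease)
        => decrease >= minImpurityDecrease;
}
```

For example, splitting a balanced two-class node (Gini 0.5) into two pure children gives a decrease of 0.5, which passes the default threshold of 0.0.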

MinSamplesLeaf

Gets or sets the minimum number of samples required at a leaf node.

public int MinSamplesLeaf { get; set; }

Property Value

int

The minimum number of samples. Default is 1.

MinSamplesSplit

Gets or sets the minimum number of samples required to split an internal node.

public int MinSamplesSplit { get; set; }

Property Value

int

The minimum number of samples. Default is 2.
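Taken together, MaxDepth, MinSamplesSplit, and MinSamplesLeaf act as stopping rules while a tree grows. The sketch below shows one plausible way to check them; it assumes the root is at depth 0, and `SplitRulesSketch` is a hypothetical helper rather than AiDotNet's tree builder.

```csharp
// Hypothetical sketch of the stopping and validity rules for growing one tree.
// Assumes the root node is at depth 0.
public static class SplitRulesSketch
{
    // May this node be split at all?
    public static bool CanSplitNode(int nodeSamples, int depth, int? maxDepth, int minSamplesSplit)
    {
        if (maxDepth.HasValue && depth >= maxDepth.Value)
            return false; // depth limit reached: this node becomes a leaf
        return nodeSamples >= minSamplesSplit;
    }

    // Would a candidate split leave enough samples in both children?
    public static bool IsValidSplit(int leftSamples, int rightSamples, int minSamplesLeaf)
        => leftSamples >= minSamplesLeaf && rightSamples >= minSamplesLeaf;
}
```

With the defaults (MinSamplesSplit = 2, MinSamplesLeaf = 1, no depth limit), any node with at least two samples may be split, and a split that puts a single sample in one child is allowed.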

NEstimators

Gets or sets the number of trees in the forest.

public int NEstimators { get; set; }

Property Value

int

The number of decision trees. Default is 100.

Remarks

More trees generally improve performance but increase training time and memory usage. The accuracy gains diminish as trees are added, so beyond a certain point additional trees add cost but little accuracy.
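A standard variance argument shows why the returns diminish. Suppose each of the B trees produces an output \(\hat f_b(x)\) (for example, a class-probability estimate) with variance \(\sigma^2\), and any two trees' outputs have correlation \(\rho\). These are simplifying assumptions, not properties guaranteed by this class. The variance of the ensemble average is then:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

As B grows, the second term shrinks toward zero but the first term \(\rho\sigma^2\) does not, so extra trees eventually stop helping. Lowering the correlation \(\rho\), for example through bootstrap sampling and a smaller MaxFeatures, is what lowers that floor.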

For Beginners: How many trees to grow in your forest.

  • 10-50 trees: Quick training, may not be fully stable
  • 100 trees: Good default, usually sufficient
  • 200-500 trees: Better accuracy, slower training
  • 1000+ trees: Rarely needed, diminishing returns

Start with 100 and increase if you need more accuracy.

NJobs

Gets or sets the number of jobs for parallel training.

public int NJobs { get; set; }

Property Value

int

The number of parallel jobs. -1 means use all processors. Default is 1.

Remarks

Since trees in a Random Forest are independent, they can be trained in parallel. This can significantly speed up training on multi-core machines.
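Because the trees are independent, per-tree training maps naturally onto `Parallel.For` from the .NET Task Parallel Library. In this sketch, `trainTree` is a hypothetical callback standing in for training one tree, and rejecting NJobs values other than -1 or a positive count is an assumption of the sketch. Since `System.Random` is not thread-safe, a real implementation would give each tree its own `Random` instance rather than share one.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of parallel tree training (not AiDotNet's implementation).
public static class ParallelTrainingSketch
{
    // Maps an NJobs-style value to a degree of parallelism: -1 means all processors.
    public static int ResolveDegreeOfParallelism(int nJobs)
    {
        if (nJobs == -1) return Environment.ProcessorCount;
        if (nJobs < 1) throw new ArgumentOutOfRangeException(nameof(nJobs));
        return nJobs;
    }

    // Calls trainTree once per tree index, in parallel. Each call writes only to
    // its own slot in the result array, so no locking is needed.
    public static TTree[] TrainAll<TTree>(int nEstimators, int nJobs, Func<int, TTree> trainTree)
    {
        var trees = new TTree[nEstimators];
        var options = new ParallelOptions { MaxDegreeOfParallelism = ResolveDegreeOfParallelism(nJobs) };
        Parallel.For(0, nEstimators, options, i => { trees[i] = trainTree(i); });
        return trees;
    }
}
```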

OobScore

Gets or sets whether to compute out-of-bag score during training.

public bool OobScore { get; set; }

Property Value

bool

True to compute OOB score. Default is false.

Remarks

OOB score provides a validation estimate without needing a separate validation set. Only available when Bootstrap is true.

For Beginners: Get a "free" accuracy estimate during training!

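Because each tree sees only about 63% of the data, the remaining ~37% (its out-of-bag samples) can be used to test that tree. Each sample is predicted using only the trees that did not train on it, and the accuracy of these predictions is the OOB score.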

It's like cross-validation but comes "for free" during training.
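The OOB calculation can be sketched as follows, assuming the per-tree in-bag masks and per-tree predictions are already available. Each sample gets a majority vote from only the trees that left it out, with ties going to the lowest label. `OobScoreSketch` is hypothetical; AiDotNet's exact aggregation (for example, averaging class probabilities instead of counting votes) may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of an out-of-bag accuracy score.
public static class OobScoreSketch
{
    // inBag[t, i]       : true if sample i was in tree t's bootstrap sample.
    // predictions[t, i] : class that tree t predicts for sample i.
    // labels[i]         : true class of sample i.
    // Returns accuracy over samples that at least one tree left out,
    // or NaN if every sample was in every tree's bootstrap sample.
    public static double Compute(bool[,] inBag, int[,] predictions, int[] labels)
    {
        int nTrees = inBag.GetLength(0);
        int nSamples = inBag.GetLength(1);
        int scored = 0;
        int correct = 0;

        for (int i = 0; i < nSamples; i++)
        {
            // Collect votes only from trees that did NOT train on sample i.
            var votes = new SortedDictionary<int, int>();
            for (int t = 0; t < nTrees; t++)
            {
                if (inBag[t, i]) continue;
                int predicted = predictions[t, i];
                votes.TryGetValue(predicted, out int count);
                votes[predicted] = count + 1;
            }
            if (votes.Count == 0) continue; // no out-of-bag tree for this sample

            // Majority vote over the out-of-bag trees (ties go to the lowest label).
            int winner = 0;
            int winnerVotes = -1;
            foreach (KeyValuePair<int, int> entry in votes)
            {
                if (entry.Value > winnerVotes)
                {
                    winner = entry.Key;
                    winnerVotes = entry.Value;
                }
            }

            scored++;
            if (winner == labels[i]) correct++;
        }

        return scored == 0 ? double.NaN : (double)correct / scored;
    }
}
```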

RandomState

Gets or sets the random state for reproducibility.

public int? RandomState { get; set; }

Property Value

int?

The random seed, or null for non-deterministic behavior. Default is null.
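A common way to make a forest reproducible is to seed one master generator and derive a separate seed for each tree, so each tree's randomness is fixed even when trees are trained in parallel. This sketch only illustrates that general technique with `System.Random`; how AiDotNet derives per-tree randomness from RandomState is not documented here, and `SeedSketch` is hypothetical.

```csharp
using System;

// Hypothetical sketch: derive one reproducible seed per tree from a master seed.
public static class SeedSketch
{
    public static int[] DeriveTreeSeeds(int? randomState, int nEstimators)
    {
        // A fixed seed yields the same sequence on the same .NET runtime;
        // without one (null), the sequence differs from run to run.
        Random master = randomState.HasValue ? new Random(randomState.Value) : new Random();
        var seeds = new int[nEstimators];
        for (int t = 0; t < nEstimators; t++)
            seeds[t] = master.Next();
        return seeds;
    }
}
```

Each tree t would then use `new Random(seeds[t])` for its own bootstrap sample and feature choices.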