Class RandomForestClassifierOptions<T>
Configuration options for Random Forest classifiers.
public class RandomForestClassifierOptions<T> : ClassifierOptions<T>
Type Parameters
T
The data type used for calculations.
- Inheritance
ClassifierOptions<T> → RandomForestClassifierOptions<T>
Remarks
Random Forest is an ensemble learning method that constructs multiple decision trees during training and outputs the class that is the mode (most frequent) of the classes predicted by individual trees.
For Beginners: Random Forest is like asking many decision trees for their opinion!
Imagine you're trying to classify a flower species:
- Create many decision trees, each trained on a random subset of your data
- Each tree considers only a random subset of features at each split
- To classify a new flower, ask all trees and take a vote
Why does this work so well?
- Each tree is slightly different due to random sampling
- Averaging many "weak" trees often creates a "strong" classifier
- It's much harder to overfit than a single deep tree
Random Forest is one of the most popular algorithms because it:
- Works well with default settings
- Handles both classification and regression
- Is robust to outliers and noise
- Provides feature importance scores
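A minimal configuration sketch using the properties documented on this page. The options type and property names come from this page; how the options object is consumed (e.g. passed to a classifier's constructor) is assumed and will depend on the rest of the library.

```csharp
var options = new RandomForestClassifierOptions<double>
{
    NEstimators = 200,     // more trees: better accuracy, slower training
    MaxDepth = 15,         // cap tree depth to speed up training
    MaxFeatures = "sqrt",  // consider sqrt(total features) at each split
    NJobs = -1,            // train trees in parallel on all processors
    RandomState = 42       // fixed seed for reproducible forests
};
```

With the defaults (100 trees, unlimited depth, "sqrt" features), the class is usable without setting anything; the sketch above only overrides values the Remarks sections below discuss tuning.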
Properties
Bootstrap
Gets or sets whether to use bootstrap sampling.
public bool Bootstrap { get; set; }
Property Value
- bool
True to use bootstrap sampling (default), false to use the whole dataset.
Remarks
Bootstrap sampling means each tree is trained on a random sample of the data (with replacement). This is a key part of the Random Forest algorithm.
For Beginners: Should each tree see a different random sample of data?
With Bootstrap = true (default):
- Each tree trains on a random sample (with some duplicates)
- About 63% of the data is used by each tree
- The remaining 37% (out-of-bag) can be used for validation
With Bootstrap = false:
- Each tree trains on the full dataset
- Less diversity between trees
- Generally not recommended
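The "about 63%" figure follows from the bootstrap draw itself. A quick sketch of the arithmetic in plain C# (no dependency on this library):

```csharp
// Probability that a given sample appears at least once in a bootstrap
// sample of size n drawn with replacement: 1 - (1 - 1/n)^n.
// As n grows this approaches 1 - 1/e, roughly 0.632 -- hence the
// "about 63% in-bag / 37% out-of-bag" split quoted above.
int n = 10_000;
double inBag = 1.0 - Math.Pow(1.0 - 1.0 / n, n);
Console.WriteLine($"{inBag:F3}");  // ≈ 0.632
```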
Criterion
Gets or sets the criterion used to measure the quality of a split.
public ClassificationSplitCriterion Criterion { get; set; }
Property Value
- ClassificationSplitCriterion
The split criterion. Default is Gini impurity.
MaxDepth
Gets or sets the maximum depth of each tree.
public int? MaxDepth { get; set; }
Property Value
- int?
The maximum depth, or null for unlimited depth. Default is null.
Remarks
Limiting depth prevents overfitting. With many trees, shallow trees often work well because the ensemble can still capture complex patterns.
For Beginners: How deep each tree can grow.
In a Random Forest, you often don't need to limit depth because:
- Averaging many trees reduces overfitting
- Random feature selection at each split adds diversity
However, limiting depth can speed up training significantly. Try MaxDepth = 10-20 if training is too slow.
MaxFeatures
Gets or sets the number of features to consider when looking for the best split.
public string MaxFeatures { get; set; }
Property Value
- string
A string specifier ("sqrt", "log2", "all", or a numeric string), or null for auto. Default is "sqrt" (square root of the total number of features).
Remarks
Using a subset of features at each split is key to Random Forest's success. It introduces randomness that decorrelates the trees.
For Beginners: How many features each tree considers at each decision point.
Common settings:
- "sqrt": Square root of features (default for classification)
- "log2": Log base 2 of features
- null or "all": All features (but this loses some randomness!)
- A number: Exactly that many features
Using fewer features = more random, more different trees.
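To make the specifiers concrete, here is what they resolve to for a dataset with 64 features. The arithmetic is standard; the mapping of strings to formulas restates the list above.

```csharp
int totalFeatures = 64;
int sqrtFeatures = (int)Math.Sqrt(totalFeatures);  // "sqrt" -> 8 features per split
int log2Features = (int)Math.Log2(totalFeatures);  // "log2" -> 6 features per split
// "all" or null -> 64 features per split (every feature, less tree diversity)

var options = new RandomForestClassifierOptions<double>
{
    MaxFeatures = "sqrt"  // the default for classification
};
```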
MinImpurityDecrease
Gets or sets the minimum impurity decrease required for a split.
public double MinImpurityDecrease { get; set; }
Property Value
- double
The minimum impurity decrease. Default is 0.0.
MinSamplesLeaf
Gets or sets the minimum number of samples required at a leaf node.
public int MinSamplesLeaf { get; set; }
Property Value
- int
The minimum number of samples. Default is 1.
MinSamplesSplit
Gets or sets the minimum number of samples required to split an internal node.
public int MinSamplesSplit { get; set; }
Property Value
- int
The minimum number of samples. Default is 2.
NEstimators
Gets or sets the number of trees in the forest.
public int NEstimators { get; set; }
Property Value
- int
The number of decision trees. Default is 100.
Remarks
More trees generally improve performance but increase training time and memory usage. The relationship between number of trees and accuracy has diminishing returns.
For Beginners: How many trees to grow in your forest.
- 10-50 trees: Quick training, may not be fully stable
- 100 trees: Good default, usually sufficient
- 200-500 trees: Better accuracy, slower training
- 1000+ trees: Rarely needed, diminishing returns
Start with 100 and increase if you need more accuracy.
NJobs
Gets or sets the number of jobs for parallel training.
public int NJobs { get; set; }
Property Value
- int
The number of parallel jobs. -1 means use all processors. Default is 1.
Remarks
Since trees in a Random Forest are independent, they can be trained in parallel. This can significantly speed up training on multi-core machines.
OobScore
Gets or sets whether to compute out-of-bag score during training.
public bool OobScore { get; set; }
Property Value
- bool
True to compute OOB score. Default is false.
Remarks
OOB score provides a validation estimate without needing a separate validation set. Only available when Bootstrap is true.
For Beginners: Get a "free" accuracy estimate during training!
Because each tree only sees about 63% of the data, the remaining 37% can be used to test that tree. Averaging these gives the OOB score.
It's like cross-validation but comes "for free" during training.
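How the two flags combine, using only the properties documented on this page (how the resulting score is read back after training is not shown here, since that depends on the classifier type consuming these options):

```csharp
// OOB scoring requires bootstrap sampling, since the "out-of-bag" rows
// are exactly the ones a tree's bootstrap sample skipped.
var options = new RandomForestClassifierOptions<double>
{
    Bootstrap = true,  // default; must be true for OobScore to be available
    OobScore = true    // compute the "free" validation estimate during training
};
```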
RandomState
Gets or sets the random state for reproducibility.
public int? RandomState { get; set; }
Property Value
- int?
The random seed, or null for non-deterministic behavior. Default is null.