Enum ImportanceThresholdStrategy
Defines strategies for setting the importance threshold in feature selection.
public enum ImportanceThresholdStrategy
Fields
Mean = 0
Keep features with importance greater than or equal to the mean importance.
For Beginners: This strategy calculates the average importance of all features and keeps only the features that are at or above that average.
For example, if you have 10 features with importances [0.01, 0.05, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.30], the mean is 0.158, so this strategy keeps the 5 features with importance >= 0.158 (0.18, 0.20, 0.22, 0.25, and 0.30).
Advantages:
- Simple and intuitive
- Automatically adapts to your data
- Tends to keep roughly half the features (assuming a symmetric distribution of importances)
Disadvantages:
- May keep too many or too few features if importances are skewed (a few very large values pull the mean up, a few very small values pull it down)
- Doesn't guarantee a specific number of features
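The Mean rule described above can be sketched in a few lines of C#. This is an illustrative helper, not the library's implementation: the class name `MeanThresholdExample` and the method `SelectAtOrAboveMean` are hypothetical names introduced here.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of the documented library.
public static class MeanThresholdExample
{
    // Returns the indices of features whose importance is >= the mean importance.
    public static int[] SelectAtOrAboveMean(double[] importances)
    {
        if (importances == null || importances.Length == 0)
            throw new ArgumentException("At least one importance score is required.", nameof(importances));

        // The threshold is the arithmetic mean of all importance scores.
        double threshold = importances.Average();

        return Enumerable.Range(0, importances.Length)
            .Where(i => importances[i] >= threshold)
            .ToArray();
    }
}
```

For the ten importances in the example above, the threshold is approximately 0.158 and the method returns indices 5 through 9, that is, five features.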
Median = 1
Keep features with importance greater than or equal to the median importance.
For Beginners: This strategy keeps the top 50% of features by importance. The median is the middle value when all importances are sorted.
This is like a class where exactly half the students are above the median score.
Advantages:
- Always keeps approximately 50% of features
- More robust to outliers than mean
- Predictable number of features
Disadvantages:
- May keep too many features if most are unimportant
- May keep too few features if many are equally important
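A minimal sketch of the Median rule follows. It assumes the common convention of averaging the two middle values when the number of features is even; the library may compute the median differently. `MedianThresholdExample`, `Median`, and `SelectAtOrAboveMedian` are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not part of the documented library.
public static class MedianThresholdExample
{
    // Median of the scores: the middle value after sorting, or the average of
    // the two middle values when the count is even (an assumed convention).
    public static double Median(double[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));

        double[] sorted = values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Returns the indices of features whose importance is >= the median importance.
    public static int[] SelectAtOrAboveMedian(double[] importances)
    {
        double threshold = Median(importances);
        return Enumerable.Range(0, importances.Length)
            .Where(i => importances[i] >= threshold)
            .ToArray();
    }
}
```

With made-up, heavily skewed scores such as [0.01, 0.01, 0.02, 0.02, 0.94], the median is 0.02 and three features are kept (indices 2, 3, and 4), whereas the mean of 0.2 would keep only the last one. The example also shows that ties at the median can push the kept share above 50%.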
Remarks
For Beginners: When selecting features based on importance scores from a model, you need to decide which features are "important enough" to keep. This enum provides different strategies for making that decision.
Think of it like deciding which students make the honor roll:
- Mean: Keep students who score at or above the class average
- Median: Keep the top 50% of students
Note: You can also use a custom threshold by calling the SelectFromModel constructor that accepts a specific threshold value instead of a strategy.
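To make the difference between a strategy and a fixed threshold concrete, the sketch below resolves a strategy to a numeric threshold and then applies the same `>=` filter either way. It does not use the library's types: `ThresholdStrategySketch` is a local stand-in that mirrors the two documented members, and `ThresholdResolver`, `Resolve`, and `Select` are hypothetical names. The even-count median convention is again an assumption.

```csharp
using System;
using System.Linq;

// Local stand-in mirroring the two documented members; not the library's enum.
public enum ThresholdStrategySketch
{
    Mean = 0,
    Median = 1
}

// Hypothetical helper for illustration only.
public static class ThresholdResolver
{
    // Turns a strategy into a concrete threshold for the given importances.
    public static double Resolve(double[] importances, ThresholdStrategySketch strategy)
    {
        if (importances == null || importances.Length == 0)
            throw new ArgumentException("At least one importance score is required.", nameof(importances));

        switch (strategy)
        {
            case ThresholdStrategySketch.Mean:
                return importances.Average();
            case ThresholdStrategySketch.Median:
                double[] sorted = importances.OrderBy(v => v).ToArray();
                int mid = sorted.Length / 2;
                // Assumed convention: average the two middle values for an even count.
                return sorted.Length % 2 == 1
                    ? sorted[mid]
                    : (sorted[mid - 1] + sorted[mid]) / 2.0;
            default:
                throw new ArgumentOutOfRangeException(nameof(strategy));
        }
    }

    // Keeps the indices of features whose importance is >= the given threshold.
    // The threshold can come from Resolve or be a fixed value chosen by the caller.
    public static int[] Select(double[] importances, double threshold)
    {
        return Enumerable.Range(0, importances.Length)
            .Where(i => importances[i] >= threshold)
            .ToArray();
    }
}
```

Calling `Select(importances, Resolve(importances, ThresholdStrategySketch.Mean))` follows a strategy, while `Select(importances, 0.2)` corresponds to the custom-threshold route mentioned in the note.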