Class FeatureImportanceFitDetectorOptions
Configuration options for the Feature Importance Fit Detector, which analyzes how different input features contribute to a model's predictions and evaluates potential issues with model fit.
public class FeatureImportanceFitDetectorOptions
- Inheritance
- object
FeatureImportanceFitDetectorOptions
Remarks
Feature importance analysis helps identify which input variables have the strongest influence on model predictions. This detector uses permutation importance (randomly shuffling feature values and measuring the impact on predictions) to assess feature relevance and detect potential issues like overfitting, underfitting, or redundant features.
For Beginners: Think of this as a tool that helps you understand which of your input features (the columns of your data) actually matter for making predictions. For example, if you're predicting house prices, it would tell you whether square footage, number of bedrooms, or neighborhood has the biggest impact on the predicted price. It also helps identify potential problems with your model, such as relying too heavily on unimportant details or failing to capture important patterns. The options below let you adjust how sensitive this analysis is.
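As a rough illustration of how these options might be set, the sketch below uses an object initializer. The stand-in class at the bottom only mirrors the property names and defaults documented on this page so that the snippet compiles on its own; in real code you would reference the library's class instead. That the class exposes a public parameterless constructor is an assumption.

```csharp
using System;

// Adjust a few settings and leave the rest at their documented defaults.
var options = new FeatureImportanceFitDetectorOptions
{
    NumPermutations = 10,        // more shuffles per feature: steadier estimates, longer run
    CorrelationThreshold = 0.8,  // flag only the more strongly correlated pairs
    RandomSeed = 42              // fixed seed so the shuffles are reproducible
};

Console.WriteLine($"Permutations: {options.NumPermutations}");
Console.WriteLine($"High importance threshold: {options.HighImportanceThreshold}");

// Stand-in that mirrors the documented properties and defaults (illustration only).
public class FeatureImportanceFitDetectorOptions
{
    public double CorrelationThreshold { get; set; } = 0.7;
    public double HighImportanceThreshold { get; set; } = 0.1;
    public double HighVarianceThreshold { get; set; } = 0.2;
    public double LowImportanceThreshold { get; set; } = 0.01;
    public double LowVarianceThreshold { get; set; } = 0.05;
    public int NumPermutations { get; set; } = 5;
    public int RandomSeed { get; set; } = 42;
    public double UncorrelatedRatioThreshold { get; set; } = 0.8;
}
```

Properties that are not set in the initializer keep the defaults listed in the sections below.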
Properties
CorrelationThreshold
Gets or sets the threshold for considering features as correlated.
public double CorrelationThreshold { get; set; }
Property Value
- double
The correlation threshold, defaulting to 0.7 (70%).
Remarks
Features with correlation coefficients above this threshold (in absolute value) are considered strongly correlated. Highly correlated features often provide redundant information, and removing some of them might simplify the model without significant loss of performance.
For Beginners: This setting determines when two features are considered to contain similar information. With the default value of 0.7, two features whose correlation coefficient is above 0.7 in absolute value are considered strongly related, so a strong negative correlation such as -0.8 also counts. For example, in housing data, square footage and number of rooms might be highly correlated because larger houses tend to have more rooms. Including both features might not add much value compared to just using one of them. Identifying correlated features can help you simplify your model by removing redundant inputs. A higher threshold (like 0.8 or 0.9) flags only the most strongly correlated pairs.
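The threshold check can be sketched as follows. The housing numbers are made up, and the use of the Pearson correlation coefficient is an assumption: this page does not say which correlation measure the detector computes.

```csharp
using System;
using System.Linq;

// Hypothetical housing data: square footage and room count for five houses.
double[] squareFootage = { 850, 1200, 1500, 2100, 2600 };
double[] roomCount = { 3, 4, 5, 7, 8 };

const double correlationThreshold = 0.7; // documented default

double r = Pearson(squareFootage, roomCount);
bool stronglyCorrelated = Math.Abs(r) > correlationThreshold;

Console.WriteLine($"r = {r:F3}, strongly correlated: {stronglyCorrelated}");

// Pearson correlation coefficient of two equally sized samples.
static double Pearson(double[] x, double[] y)
{
    double meanX = x.Average();
    double meanY = y.Average();
    double covariance = 0, varX = 0, varY = 0;
    for (int i = 0; i < x.Length; i++)
    {
        double dx = x[i] - meanX;
        double dy = y[i] - meanY;
        covariance += dx * dy;
        varX += dx * dx;
        varY += dy * dy;
    }
    return covariance / Math.Sqrt(varX * varY);
}
```

Taking the absolute value before comparing means that strongly negative correlations are flagged as well, matching the remark above.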
HighImportanceThreshold
Gets or sets the threshold for considering feature importance as high.
public double HighImportanceThreshold { get; set; }
Property Value
- double
The high importance threshold, defaulting to 0.1 (10%).
Remarks
Features with importance scores above this threshold are considered highly influential on the model's predictions. The importance score represents how much the model's performance decreases when the feature values are randomly shuffled.
For Beginners: This setting determines when a feature is considered "very important" to your model. With the default value of 0.1, any feature whose random shuffling makes your model's performance score drop by more than 0.1 (10%) is considered highly important. Think of it like identifying the key players on a sports team - these are the features that your model relies on heavily to make good predictions. If you want to be more selective about what counts as important, you could increase this value (e.g., to 0.15 or 0.2).
HighVarianceThreshold
Gets or sets the threshold for considering importance variance as high.
public double HighVarianceThreshold { get; set; }
Property Value
- double
The high variance threshold, defaulting to 0.2 (20%).
Remarks
This threshold determines when the variance in a feature's importance scores across multiple permutations is considered high. High variance suggests that the feature's importance is unstable and might indicate complex interactions or potential overfitting.
For Beginners: This setting helps identify features whose importance is inconsistent or unreliable. With the default value of 0.2, if a feature's importance scores vary by more than 20% across calculations, its importance is considered unstable. Think of a runner whose race times are all over the place - sometimes fast, sometimes slow. Features with high variance might indicate complex relationships in your data or potential problems with your model. They warrant closer investigation since the model's reliance on them is unpredictable.
LowImportanceThreshold
Gets or sets the threshold for considering feature importance as low.
public double LowImportanceThreshold { get; set; }
Property Value
- double
The low importance threshold, defaulting to 0.01 (1%).
Remarks
Features with importance scores below this threshold are considered to have minimal influence on the model's predictions. These features might be candidates for removal to simplify the model without significant loss of performance.
For Beginners: This setting determines when a feature is considered "not important" to your model. With the default value of 0.01, any feature whose random shuffling makes your model's performance score drop by less than 0.01 (1%) is considered unimportant. These are like bench players who rarely affect the outcome of a game. Identifying these features can help you simplify your model by removing inputs that don't contribute much. If you want to be more aggressive about removing features, you could increase this threshold (e.g., to 0.02 or 0.03).
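A minimal sketch of how the low and high importance thresholds could be used together to bucket features. The feature names and scores are invented, and the "Moderate" label for scores between the two thresholds is a hypothetical name used only here; the detector's actual categories are not described on this page.

```csharp
using System;
using System.Collections.Generic;

const double highImportanceThreshold = 0.1;  // documented default
const double lowImportanceThreshold = 0.01;  // documented default

// Hypothetical importance scores (performance drop after shuffling each feature).
var importances = new Dictionary<string, double>
{
    ["SquareFootage"] = 0.32,
    ["Neighborhood"] = 0.05,
    ["DoorColor"] = 0.002
};

foreach (var pair in importances)
{
    string label = Classify(pair.Value, lowImportanceThreshold, highImportanceThreshold);
    Console.WriteLine($"{pair.Key}: {label}");
}

// Buckets a score using the two thresholds.
static string Classify(double score, double low, double high)
{
    if (score > high) return "High";
    if (score < low) return "Low";
    return "Moderate";
}
```

Raising the low threshold moves more features into the "Low" bucket, which is what makes feature removal more aggressive.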
LowVarianceThreshold
Gets or sets the threshold for considering importance variance as low.
public double LowVarianceThreshold { get; set; }
Property Value
- double
The low variance threshold, defaulting to 0.05 (5%).
Remarks
This threshold determines when the variance (inconsistency) in a feature's importance scores across multiple permutations is considered low. Low variance suggests that the feature's importance is stable and reliable.
For Beginners: This setting helps determine when a feature's importance is consistent and reliable. The system calculates importance multiple times (see NumPermutations), and this threshold checks how much the results vary. With the default value of 0.05, if the importance scores vary by less than 5% across calculations, the feature's importance is considered stable. Think of it like measuring a runner's race times - if they always finish within a few seconds of the same time, their performance is consistent. Features with low variance are ones you can confidently say are either important or unimportant to your model.
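To make the stability check concrete, the sketch below computes the variance of one feature's importance scores across permutation runs and compares it with the low and high variance thresholds. The scores are made up, and the use of population variance (rather than sample variance or some scaled measure) is an assumption, as is the "Intermediate" label.

```csharp
using System;
using System.Linq;

const double lowVarianceThreshold = 0.05;  // documented default
const double highVarianceThreshold = 0.2;  // documented default

// Hypothetical importance scores for one feature, one per permutation run.
double[] scores = { 0.30, 0.31, 0.29, 0.32, 0.28 };

double variance = Variance(scores);
string stability = variance < lowVarianceThreshold ? "Stable"
    : variance > highVarianceThreshold ? "Unstable"
    : "Intermediate";

Console.WriteLine($"variance = {variance}, stability: {stability}");

// Population variance: mean squared deviation from the mean.
static double Variance(double[] values)
{
    double mean = values.Average();
    return values.Select(v => (v - mean) * (v - mean)).Average();
}
```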
NumPermutations
Gets or sets the number of permutations to perform for each feature when calculating importance.
public int NumPermutations { get; set; }
Property Value
- int
The number of permutations, defaulting to 5.
Remarks
This parameter determines how many times each feature is randomly shuffled to calculate its importance. More permutations provide more stable importance estimates but increase computation time.
For Beginners: This setting determines how many times the system calculates each feature's importance. With the default value of 5, the system will shuffle each feature's values 5 different times and measure the impact on predictions each time, then average the results. More permutations (like 10 or 20) give more reliable importance scores but take longer to calculate. Think of it like taking multiple measurements to get a more accurate average - more measurements generally mean more confidence in the result, but at the cost of more time and effort.
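The shuffle-and-average procedure described above can be sketched as follows. This is not the library's implementation: the toy data, the stand-in model, and the choice of mean squared error as the performance measure are all assumptions made for illustration.

```csharp
using System;
using System.Linq;

const int numPermutations = 5; // documented default
var random = new Random(42);   // documented default seed

// Toy data: the target depends on feature 0 only; feature 1 is ignored by the model.
double[][] rows =
{
    new[] { 1.0, 7.0 }, new[] { 2.0, 3.0 }, new[] { 3.0, 9.0 },
    new[] { 4.0, 1.0 }, new[] { 5.0, 4.0 }, new[] { 6.0, 8.0 }
};
double[] targets = rows.Select(row => 2.0 * row[0]).ToArray();

// Stand-in for a trained model.
Func<double[], double> predict = row => 2.0 * row[0];

for (int feature = 0; feature < 2; feature++)
{
    double importance = PermutationImportance(rows, targets, predict, feature, numPermutations, random);
    Console.WriteLine($"Feature {feature}: mean error increase = {importance:F3}");
}

// Average increase in mean squared error over several shuffles of one feature column.
static double PermutationImportance(double[][] data, double[] y, Func<double[], double> model,
    int feature, int permutations, Random rng)
{
    double baseline = MeanSquaredError(data, y, model);
    double total = 0;
    for (int p = 0; p < permutations; p++)
    {
        double[][] shuffled = ShuffleColumn(data, feature, rng);
        total += MeanSquaredError(shuffled, y, model) - baseline;
    }
    return total / permutations;
}

static double MeanSquaredError(double[][] data, double[] y, Func<double[], double> model) =>
    data.Select((row, i) => Math.Pow(model(row) - y[i], 2)).Average();

// Returns a copy of the data with one column shuffled (Fisher-Yates); the input is untouched.
static double[][] ShuffleColumn(double[][] data, int column, Random rng)
{
    double[][] copy = data.Select(row => (double[])row.Clone()).ToArray();
    for (int i = copy.Length - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);
        (copy[i][column], copy[j][column]) = (copy[j][column], copy[i][column]);
    }
    return copy;
}
```

Because the stand-in model never reads feature 1, shuffling it leaves the error unchanged and its importance is zero, while shuffling feature 0 increases the error. Each extra permutation adds one more full pass over the data per feature, which is the computation cost mentioned above.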
RandomSeed
Gets or sets the random seed for feature permutation.
public int RandomSeed { get; set; }
Property Value
- int
The random seed, defaulting to 42.
Remarks
The random seed ensures reproducibility when shuffling feature values during permutation importance calculation. Using the same seed value will produce the same random shuffles each time the analysis is run.
For Beginners: This setting controls the randomization process used when calculating feature importance. The specific value (42) doesn't matter much, but keeping it constant ensures you get the same results each time you run the analysis. This is important for reproducibility: the seed is the starting point of the random number generator, so the same seed always yields the same sequence of shuffles. You generally don't need to change this unless you want to verify that your results are stable across different randomizations, in which case you might run the analysis multiple times with different seed values.
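The effect of a fixed seed can be demonstrated with .NET's `System.Random`. The helper `ShuffledIndices` is a hypothetical name for this sketch, and whether the detector shuffles with `System.Random` internally is an assumption.

```csharp
using System;
using System.Linq;

int[] first = ShuffledIndices(10, seed: 42);
int[] second = ShuffledIndices(10, seed: 42);
int[] other = ShuffledIndices(10, seed: 7);

Console.WriteLine(first.SequenceEqual(second)); // prints True: same seed, same order
Console.WriteLine(string.Join(", ", first));
Console.WriteLine(string.Join(", ", other));

// Fisher-Yates shuffle of 0..count-1 driven by a seeded generator.
static int[] ShuffledIndices(int count, int seed)
{
    var rng = new Random(seed);
    int[] indices = Enumerable.Range(0, count).ToArray();
    for (int i = count - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);
        (indices[i], indices[j]) = (indices[j], indices[i]);
    }
    return indices;
}
```

Running the analysis with several different seeds and comparing the resulting importance rankings is one way to check that conclusions do not hinge on a particular shuffle.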
UncorrelatedRatioThreshold
Gets or sets the threshold for the ratio of uncorrelated feature pairs to consider features as mostly uncorrelated.
public double UncorrelatedRatioThreshold { get; set; }
Property Value
- double
The uncorrelated ratio threshold, defaulting to 0.8 (80%).
Remarks
This threshold determines when the overall feature set is considered to have low correlation. If the proportion of feature pairs with correlation below the CorrelationThreshold exceeds this value, the feature set is considered mostly uncorrelated, which is generally desirable.
For Beginners: This setting helps evaluate whether your overall set of features is diverse or redundant. With the default value of 0.8, if more than 80% of all possible feature pairs have low correlation (below the CorrelationThreshold), then your feature set is considered diverse with minimal redundancy. This is generally good because it means each feature is contributing unique information. If this threshold isn't met, it suggests many of your features contain overlapping information, and you might benefit from feature selection or dimensionality reduction techniques to simplify your model.
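The ratio check can be sketched like this, assuming a precomputed symmetric correlation matrix. The matrix values are invented, and counting each unordered pair once is an assumption about how "feature pairs" are enumerated.

```csharp
using System;

const double correlationThreshold = 0.7;        // documented default
const double uncorrelatedRatioThreshold = 0.8;  // documented default

// Hypothetical symmetric correlation matrix for four features.
double[,] correlations =
{
    { 1.00, 0.92, 0.10, -0.05 },
    { 0.92, 1.00, 0.15, 0.02 },
    { 0.10, 0.15, 1.00, 0.30 },
    { -0.05, 0.02, 0.30, 1.00 }
};

double ratio = UncorrelatedRatio(correlations, correlationThreshold);
bool mostlyUncorrelated = ratio > uncorrelatedRatioThreshold;

Console.WriteLine($"Uncorrelated ratio: {ratio:F3}, mostly uncorrelated: {mostlyUncorrelated}");

// Fraction of distinct feature pairs whose |correlation| is below the threshold.
static double UncorrelatedRatio(double[,] matrix, double threshold)
{
    int n = matrix.GetLength(0);
    int pairs = 0, uncorrelated = 0;
    for (int i = 0; i < n; i++)
    {
        for (int j = i + 1; j < n; j++)
        {
            pairs++;
            if (Math.Abs(matrix[i, j]) < threshold) uncorrelated++;
        }
    }
    // With fewer than two features there are no pairs, so nothing is correlated.
    return pairs == 0 ? 1.0 : (double)uncorrelated / pairs;
}
```

In this example five of the six pairs fall below 0.7, giving a ratio of about 0.833, which exceeds the 0.8 default, so the feature set would count as mostly uncorrelated.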