Description
I was testing a model in the same vein as zoish and observed that many of my predictors had zero feature importance, i.e. no contribution from the predictor for any training data point. The model I'm using is GPBoost, but the principle is the same for others. I see that in zoish the default is threshold=None, and in this case zoish can retain features that have no effect if num_features is sufficiently high (and, additionally, which of those features are selected would be arbitrary).
To deal with this possible issue, I suggest changing the default threshold to 0 and requiring the feature importance to be strictly greater than the threshold. The threshold is currently applied at
zoish/zoish/feature_selectors/shap_selectors.py
Lines 755 to 770 in 956b03c
```python
# select features based on number or threshold
if self.num_features is None and self.threshold is not None:
    self.selected_feature_idx = np.where(
        self.feature_importances_ >= self.threshold
    )[0]
    self.selected_feature_idx = list(
        set(self.selected_feature_idx).union(set(obligatory_feature_idx))
    )
elif self.num_features is not None:
    self.selected_feature_idx = list(
        set(self.importance_order[: self.num_features]).union(
            set(obligatory_feature_idx)
        )
    )
else:
    self.selected_feature_idx = []
```