Consider Adjusting Usage of threshold for ShapFeatureSelector #84

@brcopeland

I was testing a model in the same vein as zoish and observed that many of my predictors, $p$, had

$$\overline{\mathrm{SHAP}(p)} = 0,$$

i.e. no contribution from the predictor for any training data point. The model I'm using is GPBoost, but the principle is the same for others. I see that in zoish the default is `threshold=None`. In this case zoish could retain features that have no effect if `num_features` is sufficiently high, and which of those features are selected would be arbitrary.
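To make the concern concrete, here is a minimal sketch in plain Python. The importance values and the `top_k` helper are hypothetical and are not taken from zoish; the sketch only assumes that features are ranked by importance and the first `k` are kept.

```python
# Hypothetical per-feature importances; three features contribute nothing.
importances = [0.42, 0.0, 0.17, 0.0, 0.0]


def top_k(values, k):
    """Return the indices of the k largest values.

    Ties keep their original relative order because Python's sort is stable,
    so which zero-importance features survive depends only on column order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


selected = top_k(importances, k=4)
zero_selected = [i for i in selected if importances[i] == 0.0]

print("selected:", selected)
print("selected with zero importance:", zero_selected)
```

With only two informative features and `k=4`, two of the selected features necessarily have zero importance, and which two are kept is decided by tie-breaking rather than by any signal in the data.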

To deal with this, I would suggest changing the default threshold to 0 and changing the threshold comparison to require that the feature importance be strictly greater than the threshold. The comparison is currently made here:

    # select features based on number or threshold
    if self.num_features is None and self.threshold is not None:
        self.selected_feature_idx = np.where(
            self.feature_importances_ >= self.threshold
        )[0]
        self.selected_feature_idx = list(
            set(self.selected_feature_idx).union(set(obligatory_feature_idx))
        )
    elif self.num_features is not None:
        self.selected_feature_idx = list(
            set(self.importance_order[: self.num_features]).union(
                set(obligatory_feature_idx)
            )
        )
    else:
        self.selected_feature_idx = []
This way, by default, features with zero contribution would never be included unless they are forced into the model. If desired, the current functionality would still be available by passing a negative threshold or None.
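The proposed rule could look roughly like the following. This is a sketch under stated assumptions, not zoish's implementation: `select_features` and its parameter names are hypothetical, and plain lists stand in for the NumPy arrays used in the snippet above.

```python
def select_features(importances, threshold=0.0, num_features=None, obligatory=()):
    """Illustrative version of the proposed selection rule (hypothetical helper)."""
    if num_features is None and threshold is not None:
        # Strict inequality: an importance equal to the threshold is dropped,
        # so the default of 0.0 excludes features with zero contribution.
        chosen = {i for i, value in enumerate(importances) if value > threshold}
    elif num_features is not None:
        order = sorted(
            range(len(importances)), key=lambda i: importances[i], reverse=True
        )
        chosen = set(order[:num_features])
    else:
        chosen = set()
    # Obligatory features are always kept, whatever their importance.
    return sorted(chosen | set(obligatory))


importances = [0.42, 0.0, 0.17, 0.0, 0.0]  # hypothetical values

default_selection = select_features(importances)
forced_selection = select_features(importances, obligatory=[4])
negative_threshold = select_features(importances, threshold=-1.0)

print(default_selection)
print(forced_selection)
print(negative_threshold)
```

As in the quoted snippet, the threshold in this sketch is only consulted when `num_features` is None. A negative threshold keeps every feature with non-negative importance, which matches the behavior of `>= 0` in the current comparison.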
