Consider Adjusting Usage of `threshold` for `ShapFeatureSelector`

I was testing a model in the same vein as zoish and observed that many of my predictors, $p$, had
$$\overline{\mathrm{SHAP}(p)} = 0,$$
i.e. the predictor makes no contribution for any training data point. The model I'm using is GPBoost, but the principle is the same for others. I see that zoish defaults to [`threshold=None`](https://github.com/TorkamaniLab/zoish/blob/956b03c49e987369d51b0d2c3124a869f48233e9/zoish/feature_selectors/shap_selectors.py#L426-L481), and in this case zoish could retain features that have no effect if `num_features` is sufficiently high (and which of those zero-effect features get selected would be arbitrary).
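To illustrate the concern, here is a minimal sketch with made-up importances. It assumes the ranking is a plain descending sort of the importances; I have not checked how zoish actually builds `importance_order`, so treat the variable names as stand-ins rather than the library's internals.

```python
import numpy as np

# Made-up importances: features 1, 3 and 4 contribute nothing.
feature_importances = np.array([0.30, 0.0, 0.12, 0.0, 0.0, 0.05])
num_features = 4

# Assumed ranking: descending importance (stand-in for importance_order).
importance_order = np.argsort(feature_importances)[::-1]
top_k = importance_order[:num_features]

# Only three features have non-zero importance, so a top-4 cut must
# include one zero-importance feature. Which one depends on how the
# sort happens to break ties.
zero_importance_selected = [
    int(i) for i in top_k if feature_importances[i] == 0.0
]
print(zero_importance_selected)
```

With only three informative features and `num_features = 4`, one zero-importance feature is always pulled in, and the tie-breaking of the sort decides which.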

To deal with this, I would suggest changing the default `threshold` to 0 and changing the comparison where `threshold` is applied (https://github.com/TorkamaniLab/zoish/blob/956b03c49e987369d51b0d2c3124a869f48233e9/zoish/feature_selectors/shap_selectors.py#L755-L770) to require that the feature importance be strictly greater than the threshold. This way, by default, features with zero contribution would never be included unless they are forced into the model. The current behavior would remain available by passing a negative threshold or `None`.

	# select features based on number or threshold
	if self.num_features is None and self.threshold is not None:
	    self.selected_feature_idx = np.where(
	        self.feature_importances_ >= self.threshold
	    )[0]
	    self.selected_feature_idx = list(
	        set(self.selected_feature_idx).union(set(obligatory_feature_idx))
	    )
	elif self.num_features is not None:
	    self.selected_feature_idx = list(
	        set(self.importance_order[: self.num_features]).union(
	            set(obligatory_feature_idx)
	        )
	    )
	else:
	    self.selected_feature_idx = []
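As a rough sketch of what I have in mind, here is a standalone version of the selection logic with a default `threshold` of 0 and a strict comparison. `select_feature_idx` is a hypothetical helper written for this issue, not part of zoish, and applying the threshold before the `num_features` cut is my own interpretation of how the two options could combine.

```python
import numpy as np


def select_feature_idx(
    feature_importances,
    threshold=0.0,
    num_features=None,
    obligatory_feature_idx=(),
):
    """Hypothetical helper (not zoish API): return sorted selected indices."""
    if threshold is None and num_features is None:
        # Mirrors the final `else` branch of the snippet quoted above.
        return []

    importances = np.asarray(feature_importances, dtype=float)
    # Rank features from most to least important.
    order = np.argsort(-importances, kind="stable")

    if threshold is not None:
        # Strict comparison: a feature exactly at the threshold is dropped,
        # so with threshold=0 zero-importance features never qualify.
        order = order[importances[order] > threshold]

    if num_features is not None:
        # Keep at most num_features of the remaining features.
        order = order[:num_features]

    # Obligatory features are always kept, as in the quoted snippet.
    selected = set(order.tolist()) | set(obligatory_feature_idx)
    return sorted(selected)


importances = [0.30, 0.0, 0.12, 0.0, 0.05]
default_selection = select_feature_idx(importances)
capped_selection = select_feature_idx(importances, num_features=4)
forced_selection = select_feature_idx(importances, obligatory_feature_idx=[3])
print(default_selection, capped_selection, forced_selection)
```

In this sketch, asking for four features when only three have non-zero importance returns three features, while an obligatory feature is still kept regardless of its importance. Passing `threshold=None` together with `num_features` falls back to a plain top-k cut.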
