Potential error of method in mimic evaluation #9

Open
josephenguehard opened this issue Oct 10, 2022 · 0 comments

Hi,

There seems to be a methodological error in the MIMIC evaluation. For each saliency method, the top-k most important features are replaced by an average. However, this top-k is selected over the whole test set, which means that, in principle, some patients' data could be replaced entirely by the average. This is a problem because such an evaluation rewards not only important features but also important patients.
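
To make the distinction concrete, here is a minimal sketch (not taken from the repository; the tensor name saliency, the assumed shape (N_exp, T, N_features), and the helper names are mine) contrasting a top-k selected over the whole test set with a top-k selected separately for each patient:

import torch

def global_topk_mask(saliency, fraction):
    # Whole-test-set scheme: rank all entries of the test set together, so the
    # most salient patients can absorb most of the perturbation budget.
    n_drop = int(fraction * saliency.numel())
    idx = torch.topk(saliency.flatten(), n_drop).indices
    mask = torch.zeros(saliency.numel(), device=saliency.device)
    mask[idx] = 1.0
    return mask.reshape_as(saliency)

def per_patient_topk_mask(saliency, fraction):
    # Per-patient scheme: rank entries within each patient, so every patient
    # has the same fraction of its inputs perturbed.
    n_exp = saliency.shape[0]
    n_drop = int(fraction * saliency[0].numel())
    flat = saliency.reshape(n_exp, -1)
    idx = torch.topk(flat, n_drop, dim=1).indices
    mask = torch.zeros_like(flat)
    mask.scatter_(1, idx, 1.0)
    return mask.reshape_as(saliency)

With the first scheme, nothing prevents the selected indices from concentrating on a handful of patients; with the second, every patient has exactly the given fraction of its inputs perturbed.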

To corroborate this, I ran a simple baseline: select the top-k patients with the highest model predictions and replace all of their data with the average. This baseline performs "better" than DeepLift without explaining any feature, which is concerning. The plots and code are included below:

N.B.: The DeepLift results were obtained with the model in eval mode; please see #8.


In experiments/results/mimic/plot_benchmarks.py:

...
name_dict = {
        "fit": "FIT",
        "deep_lift": "DL",
        "afo": "AFO",
        "fo": "FO",
        "retain": "RT",
        "integrated_gradient": "IG",
        "gradient_shap": "GS",
        "lime": "LIME",
        "dynamask": "MASK",
        "top_pred": "TOPK PREDS"
    }
...
        # Load the model:
        model = StateClassifier(
            feature_size=N_features, n_state=2, hidden_size=200, rnn="GRU", device=device, return_all=True
        )
        model.load_state_dict(torch.load(os.path.join(path, f"model_{cv}.pt")))
        model.eval()

        # For each mask area, we compute the CE and the ACC for each attribution method:
        for i, fraction in enumerate(areas):
            N_drop = int(fraction * N_exp * N_features * T)  # The number of inputs to perturb
            Y = model(X.transpose(1, 2))
            Y = Y[:, -1]
            Y = Y.reshape(-1, 2)
            Y_s = torch.softmax(Y, dim=-1)
            Y = torch.argmax(Y_s, dim=-1).detach().cpu().numpy()  # This is the predicted class for the unperturbed input

            # For each attribution method, use the saliency map to construct a perturbed input:
            for k, explainer in enumerate(explainers):
                if explainer == "dynamask":
...
                elif explainer == "top_pred":
                        idx = torch.topk(Y_s[:, 1], int(len(Y_s) * fraction)).indices
                        mask_tensor = torch.zeros_like(X)
                        mask_tensor[idx] = 1.
                        # Perturb the most relevant inputs and compute the associated output:
                        X_pert = (1 - mask_tensor) * X + mask_tensor * X_avg
                        Y_pert = model(X_pert.transpose(1, 2))
                        Y_pert = Y_pert[:, -1]
                        Y_pert = Y_pert.reshape(-1, 2)
                        Y_pert = torch.softmax(Y_pert, dim=-1)
                        proba_pert = Y_pert.detach().cpu().numpy()
                        Y_pert = torch.argmax(Y_pert, dim=-1).detach().cpu().numpy()
                        metrics_array[k, i, 0, cv] = metrics.log_loss(Y, proba_pert)
                        metrics_array[k, i, 1, cv] = metrics.accuracy_score(Y, Y_pert)  # This is ACC
                        
                else:
...
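
Along the same lines, a small diagnostic (again hypothetical: saliency stands for the attribution tensor of a given method, assumed to have shape (N_exp, T, N_features)) would count how many patients end up with their entire record replaced under the global top-k selection:

n_drop = int(fraction * N_exp * N_features * T)
idx = torch.topk(saliency.flatten(), n_drop).indices
selected = torch.zeros(N_exp * T * N_features, device=saliency.device)
selected[idx] = 1.0
selected = selected.reshape(N_exp, T, N_features)
fully_replaced = int((selected.sum(dim=(1, 2)) == T * N_features).sum().item())
print(f"{fully_replaced} patients entirely replaced by the average at area {fraction}")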

Attached plots: acc.pdf (accuracy), ce.pdf (cross-entropy)
