
Interpretability-Guided-Defense

  • This is the official repository for the ECCV 2024 paper: "Interpretability-Guided Test-Time Adversarial Defense"
    • We propose the first neuron-interpretability-guided test-time defense (IG-Defense) utilizing neuron importance ranking to improve adversarial robustness. IG-Defense is training-free, efficient, and effective.
    • We uncover novel insights into improving adversarial robustness by analyzing adversarial attacks through the lens of neuron-level interpretability.
    • Our proposed IG-Defense consistently improves robustness on the standard CIFAR10, CIFAR100, and ImageNet-1k benchmarks.
    • We also demonstrate robustness improvements of up to 3.4%, 3.8%, and 1.5% against a wide range of white-box, black-box, and adaptive attacks, respectively, with the lowest inference time (4x faster) among existing test-time defenses.
  • We illustrate an overview of IG-Defense below. For more information, please check out our project page.

Requirements

Python 3.6+ (the version requirement is not strict; anything from 3.6 onwards should work)

Install the required packages:

pip install -r requirements.txt
pip install git+https://github.com/RobustBench/robustbench.git

Pretrained weights
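
The experiments use publicly available adversarially trained checkpoints (e.g., DAJAT, TRADES-AWP, FAT). As a minimal sketch, such a model can be loaded from the RobustBench model zoo installed above; the model identifier below is an assumption, so check the RobustBench leaderboard for the exact checkpoint name.

    # Minimal sketch: load a pretrained robust model from the RobustBench zoo.
    # The model identifier is an assumption; check the RobustBench leaderboard
    # for the exact checkpoint name you need.
    from robustbench.utils import load_model

    model = load_model(model_name='Addepalli2022Efficient_RN18',  # assumed DAJAT RN18 ID
                       dataset='cifar10', threat_model='Linf')
    model.eval()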

Neuron Importance Ranking Methods

  • The scripts below can be used to obtain the CLIP-Dissect (CD-IR) and Leave-One-Out (LO-IR) neuron importance rankings; a conceptual sketch of the leave-one-out idea is given after the commands below.

  • By default, they are for the DAJAT ResNet18 pretrained weights, but commented-out examples are given for the TRADES-AWP and FAT ResNet50 (ImageNet) models.

    • For ImageNet, modify lines 130-131 of utils.py and line 19 of clip-dissect/utils.py with the path to the ImageNet dataset. We did not use the entire ImageNet training set since it takes too long; instead, we created a random 10% train subset using this code repo.
    bash scripts/get_cdir_rankings.sh
    bash scripts/get_loir_rankings.sh
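
  • For intuition, the leave-one-out ranking scores each neuron (channel) by how much ablating it hurts accuracy. Below is a minimal, hypothetical sketch of that idea for a single convolutional layer; it illustrates the principle, not the repository's exact implementation.

    # Hypothetical leave-one-out sketch: rank the channels of one conv layer
    # by the accuracy drop observed when each channel's activations are zeroed.
    import torch

    @torch.no_grad()
    def leave_one_out_ranking(model, layer, loader, device='cuda'):
        def accuracy(mask_channel=None):
            def hook(module, inputs, output):
                if mask_channel is not None:
                    output[:, mask_channel] = 0.0  # ablate one channel
                return output
            handle = layer.register_forward_hook(hook)
            correct = total = 0
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
            handle.remove()
            return correct / total

        base = accuracy()  # unmasked baseline accuracy
        # A neuron's importance = the accuracy drop caused by removing it.
        drops = [base - accuracy(c) for c in range(layer.out_channels)]
        return sorted(range(layer.out_channels), key=drops.__getitem__, reverse=True)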
    

Analysis Experiment

  • The analysis experiment (Fig. 2 in the paper) uses the LO-IR neuron importance rankings, so please run bash scripts/get_loir_rankings.sh first.

  • After this, we can run the analysis experiment (by default for DAJAT ResNet18 pretrained model):

    bash scripts/analysis.sh
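
  • As a rough, assumed illustration of the kind of intervention such an analysis performs, one can mask the top-k most important neurons of a layer at inference time with a forward hook; this is an illustration only, not the exact analysis code.

    # Hypothetical sketch: zero out the k highest-ranked channels of a layer
    # at inference time and re-evaluate the model.
    def mask_top_k(layer, ranking, k):
        top = list(ranking[:k])
        def hook(module, inputs, output):
            output[:, top] = 0.0  # ablate the k most important channels
            return output
        return layer.register_forward_hook(hook)

    # Usage (layer name assumed):
    # handle = mask_top_k(model.layer4[1].conv2, ranking, k=10)
    # ... measure clean and adversarial accuracy ...
    # handle.remove()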
    

AutoAttack Evaluation

  • Standard AutoAttack evaluation can be run for the base model, the CD-IR-defended model, and the LO-IR-defended model (by default, for the pretrained DAJAT RN18 CIFAR10 model) using the command below; a minimal sketch of the underlying AutoAttack call follows this list.

    bash scripts/eval.sh
    
  • Adaptive attack evaluation will be released soon.
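
  • For reference, a minimal sketch of a standard AutoAttack evaluation is below; the checkpoint ID, sample count, and epsilon are assumptions, and scripts/eval.sh sets the paper's actual configuration.

    # Minimal standard AutoAttack sketch; the checkpoint ID, sample count, and
    # epsilon are assumptions, scripts/eval.sh fixes the actual settings.
    from autoattack import AutoAttack
    from robustbench.data import load_cifar10
    from robustbench.utils import load_model

    model = load_model(model_name='Addepalli2022Efficient_RN18',  # assumed ID
                       dataset='cifar10', threat_model='Linf').eval()
    x_test, y_test = load_cifar10(n_examples=1000)  # assumed evaluation subset
    adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard')
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)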

Sources

Cite this work

A. Kulkarni and T.-W. Weng, Interpretability-Guided Test-Time Adversarial Defense, ECCV 2024.

@inproceedings{kulkarni2024igdefense,
    title={Interpretability-Guided Test-Time Adversarial Defense},
    author={Kulkarni, Akshay and Weng, Tsui-Wei},
    booktitle={European Conference on Computer Vision},
    year={2024}
}
