- This is the official repository for the ECCV 2024 paper: "Interpretability-Guided Test-Time Adversarial Defense"
- We propose the first neuron-interpretability-guided test-time defense (IG-Defense) utilizing neuron importance ranking to improve adversarial robustness. IG-Defense is training-free, efficient, and effective.
- We uncover novel insights into improving adversarial robustness by analyzing adversarial attacks through the lens of neuron-level interpretability.
- Our proposed IG-Defense consistently improves the robustness on standard CIFAR10, CIFAR100, and ImageNet-1k benchmarks.
- We also demonstrate improved robustness of up to 3.4%, 3.8%, and 1.5% against a wide range of white-box, black-box, and adaptive attacks respectively, with the lowest inference time (4x faster) among existing test-time defenses.
- We illustrate the overview of our IG-Defense below. For more information about IG-Defense, please check out our project page.
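To make the test-time idea concrete, here is a toy sketch of how a neuron importance ranking can drive a training-free defense via forward hooks. This illustrates only the general recipe, not IG-Defense's actual masking rule; the layer, ranking, and keep-fraction below are placeholders.

```python
import torch

def apply_neuron_mask(layer, keep_channels):
    """Toy illustration: at test time, zero out all channels of `layer` except
    `keep_channels`, with no retraining. This sketches the general idea of a
    ranking-guided test-time defense, not IG-Defense's actual rule."""
    def mask_hook(module, inputs, output):
        masked = torch.zeros_like(output)
        masked[:, keep_channels] = output[:, keep_channels]
        return masked  # returned tensor replaces the layer's activation
    return layer.register_forward_hook(mask_hook)

# Hypothetical usage: `ranking` lists channels from most to least important;
# keeping the top 90% is a made-up fraction for illustration.
# handle = apply_neuron_mask(model.layer4[-1].conv2, ranking[:int(0.9 * len(ranking))])
```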
- Python 3.6 (not very strict; anything 3.6+ should work)
- Install the required packages:

```bash
pip install -r requirements.txt
pip install git+https://github.com/RobustBench/robustbench.git
```
- Download the DAJAT ResNet18 CIFAR10 pretrained weights, TRADES-AWP WideResNet-34-10 CIFAR10 pretrained weights, and FAT ResNet50 ImageNet-1k pretrained weights to the `checkpoints/` directory.
- Other pretrained weights can be used from the RobustBench model zoo (a minimal loading sketch is given below). The corresponding model code needs to be added to the `models/` directory and modified similarly to the given example models.
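A minimal sketch of pulling a model from the RobustBench zoo with `robustbench.utils.load_model` (the model ID below is our best guess for the DAJAT ResNet18 entry on the RobustBench leaderboard; verify it before use):

```python
from robustbench.utils import load_model

# Downloads the checkpoint into `checkpoints/` if not already present.
# Model ID assumed to be the DAJAT RN18 CIFAR10 entry; check the leaderboard.
model = load_model(model_name='Addepalli2022Efficient_RN18',
                   dataset='cifar10', threat_model='Linf',
                   model_dir='checkpoints/')
model.eval()
```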
- The scripts below can be used to obtain the CLIP-Dissect (CD-IR) and Leave-one-Out (LO-IR) neuron importance rankings.
- By default, they are set up for the DAJAT ResNet18 pretrained weights, but commented-out examples are given for the TRADES-AWP and FAT ResNet50 (ImageNet) models.
- For ImageNet, modify L130-131 of `utils.py` and L19 of `clip-dissect/utils.py` with the path to the ImageNet dataset. We did not use the entire ImageNet training set since it takes too long; instead, we created a random 10% train subset using this code repo (a rough illustrative sketch is given after the commands below).
```bash
bash scripts/get_cdir_rankings.sh
bash scripts/get_loir_rankings.sh
```
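For the 10% ImageNet subset mentioned above, the exact code is in the linked repo; purely as an illustration, a random subset can be built along these lines (the path and seed are placeholders):

```python
import random
from torch.utils.data import Subset
from torchvision.datasets import ImageFolder

# Illustrative sketch only; the paper's subset was created with the repo
# linked above. Path and seed are placeholders.
train_set = ImageFolder('/path/to/imagenet/train')
random.seed(0)  # fix the seed so the subset is reproducible
indices = random.sample(range(len(train_set)), k=len(train_set) // 10)
train_subset = Subset(train_set, indices)
```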
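To give intuition for what the LO-IR script computes, here is a simplified leave-one-out sketch (not the repository's actual implementation; `model`, `layer`, and `loader` are placeholders): rank each channel by the clean-accuracy drop when it is zeroed out.

```python
import torch

@torch.no_grad()
def clean_accuracy(model, loader, device='cuda'):
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(1) == y).sum().item()
        total += y.numel()
    return correct / total

@torch.no_grad()
def loo_ranking(model, layer, num_channels, loader, device='cuda'):
    """Simplified leave-one-out importance: a channel counts as more important
    the more clean accuracy drops when its activations are zeroed out."""
    base = clean_accuracy(model, loader, device)
    drops = []
    for c in range(num_channels):
        def zero_channel(module, inputs, output, c=c):
            output = output.clone()
            output[:, c] = 0.0   # ablate channel c
            return output        # returned tensor replaces the activation
        handle = layer.register_forward_hook(zero_channel)
        drops.append(base - clean_accuracy(model, loader, device))
        handle.remove()
    # Sort channels from most to least important (largest drop first).
    return sorted(range(num_channels), key=drops.__getitem__, reverse=True)
```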
- The analysis experiment (Fig. 2 in the paper) uses the LO-IR neuron importance rankings, so please run `bash scripts/get_loir_rankings.sh` first.
- After this, we can run the analysis experiment (by default for the DAJAT ResNet18 pretrained model):

```bash
bash scripts/analysis.sh
```
- Standard AutoAttack evaluation can be run for the base model, the CD-IR defended model, and the LO-IR defended model (by default for the pretrained DAJAT RN18 CIFAR10 model) using the script below; a minimal standalone sketch follows it.

```bash
bash scripts/eval.sh
```

- Adaptive attack evaluation will be released soon.
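For reference, a minimal standalone AutoAttack evaluation with the `autoattack` and `robustbench` packages (this is not `scripts/eval.sh` itself; the model ID, sample count, and batch size are assumptions, and eps = 8/255 is the standard CIFAR10 Linf budget):

```python
from autoattack import AutoAttack
from robustbench.data import load_cifar10
from robustbench.utils import load_model

# Standalone sketch, not scripts/eval.sh. Model ID assumed to be the
# DAJAT RN18 CIFAR10 entry on RobustBench; verify before use.
model = load_model(model_name='Addepalli2022Efficient_RN18',
                   dataset='cifar10', threat_model='Linf',
                   model_dir='checkpoints/').cuda().eval()
x_test, y_test = load_cifar10(n_examples=1000)
adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test.cuda(), y_test.cuda(), bs=128)
```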
- CLIP-Dissect: https://github.com/Trustworthy-ML-Lab/CLIP-dissect
- RobustBench: https://robustbench.github.io/
A. Kulkarni and T.-W. Weng, Interpretability-Guided Test-Time Adversarial Defense, ECCV 2024.
```bibtex
@inproceedings{kulkarni2024igdefense,
  title={Interpretability-Guided Test-Time Adversarial Defense},
  author={Kulkarni, Akshay and Weng, Tsui-Wei},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
```