
GRAB: Uncovering Gradient Inversion Risks in Practical Language Model Training


Environment setup

Set up the conda environment by running the following commands one by one:

conda create -n GRAB python=3.9.4
conda activate GRAB
pip install torch==2.0.1+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.34.1 joblib==1.3.2 numpy==1.26.4 datasets==1.16.1 torchmetrics==1.4.1 nltk==3.9.1

Download the required nltk data in a Python console:

python
import nltk
nltk.download('punkt_tab')

Experiments

Benchmark Settings (Figure 2 in paper)

GRAB

To run our attack, first go to the main_attack folder from the root directory.

cd main_attack

Create a symbolic link to utils.

ln -s ../utils utils

Run the attack with the following command; an example invocation is given after the parameter list.

python bert_benchmark_attack.py --device DEVICE --model MODEL --dataset DATASET --batch_size BATCH_SIZE --parallel --run RUN

--device: the device to run experiments on, e.g., cuda:0
--model: the model to be attacked. Only use bert-base-uncased for this experiment.
--dataset: the dataset for experiments. Only use cola, sst2, or rotten_tomatoes.
--batch_size: the batch size for experiments. Choose from 1 to 32.
--parallel: whether to use parallel computing in discrete optimization. Remove this flag if you do not want to use parallel computing.
--run: the label for this run of the experiment. Use first, second, or third.
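
For example, an illustrative invocation attacking bert-base-uncased on sst2 with a batch size of 4 and parallel discrete optimization (the device and run label are placeholders you may change):

python bert_benchmark_attack.py --device cuda:0 --model bert-base-uncased --dataset sst2 --batch_size 4 --parallel --run first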

The results will be saved in results/benchmark/DATASET under the root directory.

In our experiments, we do not set a fixed random seed; instead, we run each experiment three times, changing the --run parameter among first, second, and third.

To evaluate the results, go back to the root directory and run the following command.

python evaluation.py --model MODEL --dataset DATASET --batch_size BATCH_SIZE --setting SETTING

--setting: the setting for evaluation. Only use benchmark here.
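
For example, to evaluate the batch-size-4 sst2 run from above (values are illustrative):

python evaluation.py --model bert-base-uncased --dataset sst2 --batch_size 4 --setting benchmark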

As mentioned above, we run the experiments three times. If you only wish to run the experiment once and evaluate the result, you can change line 35 in the evaluation script to keep "first" only.

Baselines

To run the baseline attacks, go to the baselines/lamp folder from the root directory.

cd baselines/lamp

Create the environment and download the required files provided by LAMP.

conda env create -f environment.yml
conda activate lamp
wget -r -np -R "index.html*" https://files.sri.inf.ethz.ch/lamp/
mv files.sri.inf.ethz.ch/lamp/* ./
rm -rf files.sri.inf.ethz.ch

We modified some of the original implementation, such as the dataset loader, to make it compatible with our evaluation.

To run the DLG attack, run the following command; an example invocation is given after the parameter list.

python attack.py --baseline --dataset DATASET --split test --loss dlg --n_inputs N_INPUTS -b BATCH_SIZE --lr 0.1 --lr_decay 1 --bert_path MODEL --n_steps 2500 --run RUN

--dataset: the dataset for experiments. Only use cola, sst2, or rotten_tomatoes.
--n_inputs: the number of batches. Our selected datasets have 64 samples, so this should be 64/batch_size.
-b: the batch size for experiments. Choose from 1 to 32.
--bert_path: the model to be attacked. Only use bert-base-uncased for this experiment.
--run: the label for this run of the experiment. Use first, second, or third.
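
For example, with a batch size of 4, --n_inputs is 64/4 = 16; an illustrative DLG invocation on sst2 would be:

python attack.py --baseline --dataset sst2 --split test --loss dlg --n_inputs 16 -b 4 --lr 0.1 --lr_decay 1 --bert_path bert-base-uncased --n_steps 2500 --run first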

To run the TAG attack, run the following command.

python attack.py --baseline --dataset DATASET --split test --loss tag --n_inputs N_INPUTS -b BATCH_SIZE --lr 0.1 --lr_decay 1 --tag_factor 0.01 --bert_path MODEL --n_steps 2500 --run RUN

To run the LAMP_COS attack, run the following command.

python attack.py --dataset DATASET --split test --loss cos --n_inputs N_INPUTS -b BATCH_SIZE --coeff_perplexity 0.2 --coeff_reg 1 --lr 0.01 --lr_decay 0.89 --bert_path MODEL --n_steps 2000 --run RUN

To run the LAMP_L1L2 attack, run the following command.

python attack.py --dataset DATASET --split test --loss tag --n_inputs N_INPUTS -b BATCH_SIZE --coeff_perplexity 60 --coeff_reg 25 --lr 0.01 --lr_decay 0.89 --tag_factor 0.01 --bert_path MODEL --n_steps 2000 --run RUN

The results will be saved to results/benchmark/METHOD/DATASET under the lamp folder.

To evaluate the results, go back to the lamp folder and run the following command; an example is given after the parameter list.

python evaluation.py --model MODEL --dataset DATASET --batch_size BATCH_SIZE --setting SETTING --method METHOD

--model: the model to be attacked. Only use bert-base-uncased for this experiment.
--dataset: the dataset for experiments. Only use cola, sst2, or rotten_tomatoes.
--batch_size: the batch size for experiments. Choose from 1 to 32.
--setting: the setting for evaluation. Only use benchmark here.
--method: the method being evaluated. Choose from dlg, tag, lamp_cos, and lamp_l1l2.
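
For example, to evaluate the DLG results of the batch-size-4 sst2 run (values are illustrative):

python evaluation.py --model bert-base-uncased --dataset sst2 --batch_size 4 --setting benchmark --method dlg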

Similar to our attack, we run the experiments three times. If you only wish to run the experiment once and evaluate the result, you can change line 25 in the evaluation script to keep "first" only.

For more information on the baseline attacks, check out the LAMP repository at https://github.com/eth-sri/lamp

Practical Settings (Figure 3 in paper)

GRAB

From the root directory, go to the main_attack folder, activate the GRAB environment, and run the attack with the following command.

python bert_practical_attack.py --device DEVICE --model MODEL --dataset DATASET --batch_size BATCH_SIZE --parallel

The results will be saved in results/practical/DATASET under the root directory. Make sure you create these folders before running the experiments.
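
For example, assuming the sst2 dataset, the folder can be created from the root directory with:

mkdir -p results/practical/sst2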

To evaluate the results, go back to the root directory and run the following command.

python evaluation.py --model MODEL --dataset DATASET --batch_size BATCH_SIZE --setting SETTING

--setting: the setting for evaluation. Only use practical here.

To run our attack without dropout mask learning, from the root directory, go to the main_attack folder, activate the GRAB environment and run the attack with the following command.

python bert_practical_attack.py --device DEVICE --model MODEL --dataset DATASET --batch_size BATCH_SIZE --parallel

The results will be saved in results/practical_no_DL/DATASET under the root directory. Make sure you create these folders before running the experiments.

To evaluate the results, go back to the root directory and run the following command.

python evaluation.py --model MODEL --dataset DATASET --batch_size BATCH_SIZE --setting SETTING

--setting: the setting for evaluation. Only use practical_no_DL here.

Baselines

From the root directory, go to the baselines/lamp folder and activate the LAMP environment.

We modified some of the original implementation to turn off embedding-layer learning and enable dropout, making it compatible with the practical settings.

To run the DLG attack, run the following command.

python attack_practical.py --baseline --dataset DATASET --split test --loss dlg --n_inputs N_INPUTS -b BATCH_SIZE --lr 0.1 --lr_decay 1 --bert_path MODEL --n_steps 2500 --run RUN

--n_inputs: the number of batches. Our selected datasets have 64 samples, so this should be 64/batch_size.
--run: the label for this run of the experiment. Use first, second, or third.

To run the TAG attack, run the following command.

python attack_practical.py --baseline --dataset DATASET --split test --loss tag --n_inputs N_INPUTS -b BATCH_SIZE --lr 0.1 --lr_decay 1 --tag_factor 0.01 --bert_path MODEL --n_steps 2500 --run RUN

To run the LAMP_COS attack, run the following command.

python attack_practical.py --dataset DATASET --split test --loss cos --n_inputs N_INPUTS -b BATCH_SIZE --coeff_perplexity 0.2 --coeff_reg 1 --lr 0.01 --lr_decay 0.89 --bert_path MODEL --n_steps 2000 --run RUN

To run the LAMP_L1L2 attack, run the following command.

python attack_practical.py --dataset DATASET --split test --loss tag --n_inputs N_INPUTS -b BATCH_SIZE --coeff_perplexity 60 --coeff_reg 25 --lr 0.01 --lr_decay 0.89 --tag_factor 0.01 --bert_path MODEL --n_steps 2000 --run RUN

The results will be saved to results/practical/METHOD/DATASET under the lamp folder. Make sure you create these folders before running the experiments.

To evaluate the results, go back to the lamp folder and run the following command.

python evaluation.py --model MODEL --dataset DATASET --batch_size BATCH_SIZE --setting SETTING --method METHOD

--setting: the setting for evaluation. Only use practical here.
--method: the method being evaluated. Choose from dlg, tag, lamp_cos, and lamp_l1l2.

Similar to our attack, we run the experiments three times. If you only wish to run the experiment once and evaluate the result, you can change line 25 in the evaluation script to keep "first" only.

For more information on the baseline attacks, check out the LAMP repository: https://github.com/eth-sri/lamp

Ablation Studies

Model Sizes/Types (Table 3 in paper)

To run GRAB on bert-tiny, from the root directory, go to the ablation folder, activate the GRAB environment, and run the following commands.

python bert_tiny_practical_attack.py --run RUN

python bert_tiny_practical_no_DL_attack.py --run RUN

The results will be saved in results/ablation/bert_tiny/practical and results/ablation/bert_tiny/practical_no_DL from root directory. Make sure you create these folders before running the experiments.

To run the evaluation, go back to the root directory and run the following command.

python evaluation.py --ablation bert_tiny --setting SETTING --run RUN

--setting: the setting for evaluation. Use either practical or practical_no_DL.
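
For example, to evaluate the first practical-setting bert-tiny run (values are illustrative):

python evaluation.py --ablation bert_tiny --setting practical --run first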

To run GRAB on bert-large, from the root directory, go to the ablation folder, activate the GRAB environment, and run the following commands.

python bert_large_practical_attack.py --run RUN

python bert_large_practical_no_DL_attack.py --run RUN

The results will be saved in results/ablation/bert_large/practical and results/ablation/bert_large/practical_no_DL from root directory. Make sure you create these folders before running the experiments.

To run the evaluation, go back to the root directory and run the following command.

python evaluation.py --ablation bert_large --setting SETTING --run RUN

--setting: the setting for evaluation. Use either practical or practical_no_DL.

To run GRAB on RoBERTa_base, RoBERTa_tiny, and RoBERTa_large, from the root directory, go to the ablation folder, activate the GRAB environment, and run the following commands.

python roberta_base_practical_attack.py --run RUN

python roberta_tiny_practical_attack.py --run RUN

python roberta_large_practical_attack.py --run RUN

python roberta_base_practical_no_DL_attack.py --run RUN

python roberta_tiny_practical_no_DL_attack.py --run RUN

python roberta_large_practical_no_DL_attack.py --run RUN

The results and the evaluation will follow similar patterns. Please refer to previous sections.

To run the baselines on bert-tiny and bert-large, from the root directory, go to the baselines/lamp folder and activate the LAMP environment.

To run the DLG attack, run the following command.

python attack_tiny_practical.py --baseline --dataset DATASET --split test --loss dlg --n_inputs N_INPUTS -b BATCH_SIZE --lr 0.1 --lr_decay 1 --bert_path MODEL --n_steps 2500 --run RUN

--bert_path: use huawei-noah/TinyBERT_General_6L_768D here

python attack_large_practical_no_clip.py --baseline --dataset DATASET --split test --loss dlg --n_inputs N_INPUTS -b BATCH_SIZE --lr 0.1 --lr_decay_type LambdaLR --grad_clip 1.0 --bert_path bert-large-uncased --n_steps 10000 --opt_alg bert-adam --lr_max_it 10000 --run RUN

To run the TAG attack, run the following command.

python attack_tiny_practical.py --baseline --dataset DATASET --split test --loss tag --n_inputs N_INPUTS -b BATCH_SIZE --lr 0.1 --lr_decay 1 --tag_factor 0.01 --bert_path MODEL --n_steps 2500 --run RUN

--bert_path: use huawei-noah/TinyBERT_General_6L_768D here

python attack_large_practical_no_clip.py --baseline --dataset DATASET --split test --loss tag --n_inputs N_INPUTS -b BATCH_SIZE --tag_factor 0.01 --lr 0.03 --lr_decay_type LambdaLR --grad_clip 1.0 --bert_path bert-large-uncased --n_steps 10000 --opt_alg bert-adam --lr_max_it 10000 --run RUN

To run the LAMP_COS attack, run the following command.

python attack_tiny_practical.py --dataset DATASET --split test --loss cos --n_inputs N_INPUTS -b BATCH_SIZE --coeff_perplexity 0.2 --coeff_reg 1 --lr 0.01 --lr_decay 0.89 --bert_path MODEL --n_steps 2000 --run RUN

--bert_path: use huawei-noah/TinyBERT_General_6L_768D here

python attack_large_practical_no_clip.py --dataset DATASET --split test --loss cos --n_inputs N_INPUTS -b BATCH_SIZE --swap_burnin 0.1 --swap_every 200 --coeff_perplexity 0.2 --coeff_reg 1 --lr 0.01 --lr_decay_type LambdaLR --grad_clip 0.5 --bert_path bert-large-uncased --n_steps 5000 --opt_alg bert-adam --lr_max_it 10000 --run RUN

To run the LAMP_L1L2 attack, run the following command.

python attack_tiny_practical.py --dataset DATASET --split test --loss tag --n_inputs N_INPUTS -b BATCH_SIZE --coeff_perplexity 60 --coeff_reg 25 --lr 0.01 --lr_decay 0.89 --tag_factor 0.01 --bert_path MODEL --n_steps 2000 --run RUN

--bert_path: use huawei-noah/TinyBERT_General_6L_768D here

python attack_large_practical_no_clip.py --dataset DATASET --split test --loss tag --n_inputs N_INPUTS -b BATCH_SIZE --swap_burnin 0.1 --swap_every 200 --coeff_perplexity 60 --coeff_reg 25 --tag_factor 0.01 --lr 0.01 --lr_decay_type LambdaLR --grad_clip 0.5 --bert_path bert-large-uncased --n_steps 5000 --opt_alg bert-adam --lr_max_it 10000 --run RUN

The results and the evaluation will follow similar patterns. Please refer to previous sections.

Gradient clipping (Table 4 in paper)

To run GRAB on bert-large and roberta-large with gradient clipping, from the root directory, go to the ablation folder, activate the GRAB environment, and run the following commands.

python bert_large_practical_grad_clip_attack.py --run RUN

python bert_large_practical_no_DL_grad_clip_attack.py --run RUN

python roberta_large_practical_grad_clip_attack.py --run RUN

python roberta_large_practical_no_DL_grad_clip_attack.py --run RUN

To run the baselines, run the commands from the section above for the large models, but replace the script name with attack_large_practical.py.

Dropout Rates (Figure 4 in paper)

To run GRAB on different dropout rates, from the root directory, go to the ablation folder, activate the GRAB environment, and run the following commands.

python bert_practical_dropout_attack.py --run RUN --dropout DROPOUT

--dropout: the dropout rate to use. Choose from 0.1 to 0.4.

python bert_practical_no_DL_dropout_attack.py --run RUN --dropout DROPOUT
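
For example, to launch the first run of both variants with a dropout rate of 0.2 (values are illustrative):

python bert_practical_dropout_attack.py --run first --dropout 0.2
python bert_practical_no_DL_dropout_attack.py --run first --dropout 0.2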

The results and evaluation will follow similar patterns. Please refer to previous sections.

To run the baselines with different dropout rates, follow the practical settings section, but replace the script name with attack_practical_dropout.py and pass the dropout rate with the --dropout parameter.

Assumption relaxation (Table 6 in paper)

To run GRAB on relaxed assumptions, from the root directory, go to the ablation folder, activate the GRAB environment, and run the following commands.

python bert_practical_label_attack.py --run RUN

python bert_practical_longest_length_attack.py --run RUN

python bert_practical_label_longest_length_attack.py --run RUN

Defenses

Gradient noise (Figure 5 in paper)

To run GRAB with gradient noise, from the root directory, go to the defense folder, activate the GRAB environment, and run the following commands.

python bert_practical_attack_noise.py --run RUN --noise NOISE

--noise: the noise level to use. Choose from 0.001 to 0.05.

python bert_practical_no_DL_attack_noise.py --run RUN --noise NOISE

python bert_practical_no_NN_attack_noise.py --run RUN --noise NOISE

python bert_practical_no_DL_no_NN_attack_noise.py --run RUN --noise NOISE
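
For example, to launch the first run of the full attack with a noise level of 0.01 (values are illustrative):

python bert_practical_attack_noise.py --run first --noise 0.01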

To run the baselines with gradient noise, follow the practical settings section, but replace the script name with attack_practical_noise.py and pass the noise level with the --defense_noise parameter.

Gradient pruning (Table 5 in paper)

To run GRAB with gradient pruning, from the root directory, go to the defense folder, activate the GRAB environment, and run the following commands.

python bert_practical_attack_prune.py --run RUN --prune PRUNE

--prune: the pruning level to use. Choose from 0.75 to 0.99.

python bert_practical_no_DL_attack_prune.py --run RUN --prune PRUNE

python bert_practical_no_PM_attack_prune.py --run RUN --prune PRUNE

python bert_practical_no_DL_no_PM_attack_prune.py --run RUN --prune PRUNE
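
For example, to launch the first run of the full attack with a pruning level of 0.9 (values are illustrative):

python bert_practical_attack_prune.py --run first --prune 0.9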

To run the baselines with gradient pruning, follow the practical settings section, but replace the script name with attack_practical_prune.py and pass the pruning level with the --defense_pct_mask parameter.