This repo contains the official code for the paper "Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals". The datasets used are also available on HuggingFace.
## Run the Experiments

You can run the experiments with the `notebooks/experiments.ipynb` notebook, which contains the code for the logit lens, logit attribution, and attention pattern experiments.
You can also run the experiments from the command line:

    cd Script
    python run_all.py

with the following arguments:
- `--model-name`: the name of the model to run the experiments on. It can be `gpt2` or `EleutherAI/pythia-6.9b`.
- `--batch N`: the batch size to use for the experiments (suggested: 40 for gpt2, 10 for pythia).
- `--experiment copyVSfact`: the experiment to run.
- `--logit-attribution`: run the logit attribution experiment.
- `--logit-len`: run the logit lens (fig 2) experiment.
- `--pattern`: retrieve the attention pattern.
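
As an illustration of how the flags above fit together, here is a minimal `argparse` sketch. It is not the repo's actual `run_all.py`: the function name `build_parser`, the default values, and the help strings are assumptions made for this example only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the repo's code)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the run_all.py command-line interface"
    )
    # Defaults below are assumptions chosen for illustration.
    parser.add_argument("--model-name", default="gpt2",
                        help="gpt2 or EleutherAI/pythia-6.9b")
    parser.add_argument("--batch", type=int, default=40,
                        help="batch size (suggested: 40 for gpt2, 10 for pythia)")
    parser.add_argument("--experiment", default="copyVSfact",
                        help="the experiment to run")
    # Each boolean flag switches on one experiment when present.
    parser.add_argument("--logit-attribution", action="store_true")
    parser.add_argument("--logit-len", action="store_true")
    parser.add_argument("--pattern", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    ["--model-name", "gpt2", "--batch", "40", "--logit-attribution"]
)
print(args)
```

Note that `argparse` converts dashes in flag names to underscores, so `--logit-attribution` is read back as `args.logit_attribution`.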
The script will create a folder in the `Results/copyVSfact` directory with the name of the model.
Example:

    cd Script
    python run_all.py --model-name gpt2 --batch 40 --experiment copyVSfact --logit-attribution
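
To launch the same experiment for both supported models with their suggested batch sizes, a small wrapper along these lines could be used. This is only a sketch: `build_command` and `SUGGESTED_BATCH` are hypothetical names introduced here, and the script assumes it is started from the repository root so that the `Script` directory is reachable.

```python
import subprocess
import sys

# Suggested batch sizes taken from the argument description above.
SUGGESTED_BATCH = {"gpt2": 40, "EleutherAI/pythia-6.9b": 10}


def build_command(model_name, flags=("--logit-attribution",)):
    """Build the run_all.py invocation for one model (hypothetical helper)."""
    return [
        sys.executable, "run_all.py",
        "--model-name", model_name,
        "--batch", str(SUGGESTED_BATCH[model_name]),
        "--experiment", "copyVSfact",
        *flags,
    ]


if __name__ == "__main__":
    for model in SUGGESTED_BATCH:
        # cwd="Script" replaces the `cd Script` step; check=True stops on failure.
        subprocess.run(build_command(model), cwd="Script", check=True)
```

Passing a different `flags` tuple, for example `("--pattern",)`, selects another experiment without changing the rest of the command.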
To run the attention modification experiments, see the `notebooks/attention_modification.ipynb` notebook, which contains the code for those experiments.
You can generate the plots using `src_figure/PaperPlot_multiple_subject.Rmd`.