This repo contains the SynthWiki dataset and code to evaluate LLMs on long-context SynthWiki RAG tasks, with and without attention sorting, as described in ["Attention Sorting Combats Recency Bias In Long Context Language Models"](https://arxiv.org/abs/2310.01427).
SynthWiki is a collection of "mad-lib" snippets generated by prompting LLMs to fill in incomplete templates based on Wikipedia articles. The code evaluates how well different LLMs understand and complete these snippets in long contexts, focusing on their tendency toward recency bias. It also implements "attention sorting," which re-orders documents by the average attention weight they receive, ameliorating recency bias.
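The sketch below shows what one attention-sorting step looks like, assuming a HuggingFace causal LM. The model name (`gpt2` as a stand-in), the prompt layout, and the span bookkeeping are illustrative assumptions, not the repo's exact implementation:

```python
# A minimal sketch of one attention-sorting step (assumptions noted above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper targets larger long-context models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
model.eval()

def attention_sort(docs, question):
    # Tokenize each document separately so we know its exact span
    # in the concatenated prompt.
    doc_ids = [tok(d + "\n", return_tensors="pt").input_ids[0] for d in docs]
    q_ids = tok("\nQuestion: " + question + "\nAnswer:",
                return_tensors="pt").input_ids[0]
    input_ids = torch.cat(doc_ids + [q_ids]).unsqueeze(0)

    with torch.no_grad():
        out = model(input_ids)

    # Attention paid by the final (next-token) position to every prompt
    # token, averaged over layers and heads: shape (seq_len,).
    att = torch.stack(out.attentions)   # (layers, batch, heads, seq, seq)
    att = att.mean(dim=(0, 2))[0, -1]   # -> (seq,)

    # Average attention mass each document's span receives.
    scores, start = [], 0
    for ids in doc_ids:
        end = start + len(ids)
        scores.append(att[start:end].mean().item())
        start = end

    # Sort ascending by attention, so the most-attended documents land
    # last in the context (closest to the question).
    order = sorted(range(len(docs)), key=lambda i: scores[i])
    return [docs[i] for i in order]
```

Placing the highest-attention documents last is the point of the method: recency-biased models attend most readily to the end of the context, so the re-ordering moves the documents the model already finds relevant into that favored position.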
```bash
pip install -r requirements.txt
python setup.py install

# optional: for the OpenAI/Anthropic evaluations, set your API keys
# and install the client libraries
pip install openai
pip install anthropic
```
- A pre-generated SynthWiki dataset is available at `data/madlibs/madlibs1.csv` (see the loading sketch after this list)
- Alternatively, generate new datasets with `generate_madlibs.py` (requires an OpenAI API key)
- Evaluate the different LLMs using the following scripts:
  - GPT-3: `eval_gpt3.py`
  - Claude: `eval_claude.py`
  - Llama: `eval_llama.py`
- The analysis behind the plots presented in the paper is located in the `analysis` sub-directory.
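For a quick look at the pre-generated data, the CSV can be loaded with pandas. The snippet deliberately only inspects the schema rather than assuming column names, since the actual columns are defined by `generate_madlibs.py`:

```python
import pandas as pd

# Load the pre-generated SynthWiki mad-libs.
df = pd.read_csv("data/madlibs/madlibs1.csv")

# Inspect the schema before relying on specific columns.
print(df.shape)
print(df.columns.tolist())
print(df.head())
```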
```bibtex
@misc{peysakhovich2023attention,
      title={Attention Sorting Combats Recency Bias In Long Context Language Models},
      author={Alexander Peysakhovich and Adam Lerer},
      year={2023},
      eprint={2310.01427},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```