The official repository of our [Paper] at NAACL 2024.
To reproduce the results of AdaRefiner, please follow the instructions below.
- Clone this repository and navigate to the AdaRefiner folder
```bash
git clone https://github.com/PKU-RL/AdaRefiner.git
cd AdaRefiner
```
- Install packages
```bash
conda create -n AdaRefiner python=3.10 -y
conda activate AdaRefiner
pip install --upgrade pip
pip install -r requirements.txt
```
- Prepare models and configs
- First, download the Llama-2-7b-chat-hf model weights by following the instructions at https://huggingface.co/meta-llama/Llama-2-7b-chat-hf.
- After this, modify the following variables in `utils.py`:
```python
API_KEY = ''          # Your OpenAI API key
LLM_PATH = ''         # Path to the LLM model (Llama-2-7b-chat-hf)
GPT_MODEL = 'gpt-4'   # GPT model name
QUERY_INTERVAL = 100  # Query interval for LLMs
DATA_PATH = './data'  # Data path for SFT
```
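For reference, a filled-in configuration might look like the sketch below. The key and local path are placeholders for illustration, not values from the repository:
```python
# Hypothetical example values -- substitute your own API key and local paths.
API_KEY = 'sk-...'                        # Your OpenAI API key
LLM_PATH = '/models/Llama-2-7b-chat-hf'   # Path to the downloaded Llama-2-7b-chat-hf weights
GPT_MODEL = 'gpt-4'                       # GPT model used for feedback queries
QUERY_INTERVAL = 100                      # Steps between LLM queries
DATA_PATH = './data'                      # Where collected SFT data is stored
```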
To train AdaRefiner from scratch, first modify the following paths in `sft.py`:
```python
model_path = ''  # path to the LLM to be finetuned
save_path = ''   # path to save the finetuned model
```
In the first stage, `model_path` should point to the original Llama-2-7b-chat-hf model. In the following stages, `model_path` should be changed to the finetuned model from the previous stage (i.e., the model saved at `save_path`).
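One simple way to manage this is to keep one checkpoint directory per stage. The snippet below is only an illustrative sketch; the directory names and the `STAGE` variable are assumptions, not repository defaults:
```python
# Illustrative stage-wise path scheme -- directory names are assumptions, not repo defaults.
STAGE = 2  # current training stage (1-based)

if STAGE == 1:
    model_path = '/models/Llama-2-7b-chat-hf'                # original base model
else:
    model_path = f'./checkpoints/adapter_stage{STAGE - 1}'   # finetuned model from the previous stage

save_path = f'./checkpoints/adapter_stage{STAGE}'            # where this stage's finetuned model is saved
```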
Then repeat the following steps to complete the multi-stage training:
- Train the policy and collect feedback data. (You may change `max_train_steps` to a larger number for better performance. In our paper, we use 5 stages; see the sketch after this list.)
```bash
python train.py --max_train_steps=1000000
```
- Modify the `model_path` and `save_path` in `sft.py` as described above.
- Finetune the Adapter LM with the collected feedback data.
```bash
python sft.py
```
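Putting the stages together, a driver for the multi-stage loop might look like the following sketch. It is not a script shipped with the repository, and it assumes you edit `model_path` and `save_path` in `sft.py` between stages as described above:
```bash
#!/usr/bin/env bash
# Illustrative multi-stage training loop -- a sketch, not part of the repo.
# Before each sft.py run, point model_path in sft.py at the previous stage's
# checkpoint (or the base Llama-2-7b-chat-hf model for stage 1) and set save_path.

NUM_STAGES=5   # the paper uses 5 stages
for STAGE in $(seq 1 "$NUM_STAGES"); do
    echo "=== Stage ${STAGE}: train policy and collect feedback data ==="
    python train.py --max_train_steps=1000000

    echo "=== Stage ${STAGE}: finetune the Adapter LM on collected feedback ==="
    python sft.py
done
```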
To test with the final models, run `python test.py`.
Parts of the code are based on the crafter and stable-baselines3 repositories.
If you find our work useful in your research and would like to cite our project, please use the following citation:
```bibtex
@inproceedings{zhang2024adarefiner,
  title={{A}da{R}efiner: Refining Decisions of Language Models with Adaptive Feedback},
  author={Zhang, Wanpeng and Lu, Zongqing},
  booktitle={Findings of the Association for Computational Linguistics: NAACL 2024},
  year={2024},
  publisher={Association for Computational Linguistics},
  pages={782--799}
}
```