📄 AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback

This is the official repository of our [Paper] at NAACL 2024.

To reproduce the results of AdaRefiner, please follow the instructions below.

Installation & Preparation

Clone this repository and navigate to the AdaRefiner folder

git clone https://github.com/PKU-RL/AdaRefiner.git
cd AdaRefiner

Install packages

conda create -n AdaRefiner python=3.10 -y
conda activate AdaRefiner
pip install --upgrade pip
pip install -r requirements.txt

Prepare models and configs

First, download the Llama-2-7b-chat-hf model weights by following the instructions at https://huggingface.co/meta-llama/Llama-2-7b-chat-hf.
Then set the following variables in utils.py:

API_KEY = '' # Your OpenAI API key
LLM_PATH = '' # Path to the LLM model (Llama-2-7b-chat-hf)
GPT_MODEL = 'gpt-4' # GPT model name
QUERY_INTERVAL = 100 # Query interval for LLMs
DATA_PATH = './data' # Data path for SFT
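Before launching training, it can help to fail fast if any of these variables is left unset. The sketch below is a hypothetical helper (check_config is not part of the repository) that inspects the values you put in utils.py; treating QUERY_INTERVAL as a positive step count is an assumption based on its comment.

```python
from pathlib import Path


def check_config(api_key: str, llm_path: str, query_interval: int) -> list[str]:
    """Hypothetical sanity check for the utils.py settings.

    Returns a list of human-readable problems; an empty list means the
    values look usable. This helper is illustrative, not part of AdaRefiner.
    """
    problems = []
    # API_KEY must be filled in for the GPT queries to work.
    if not api_key.strip():
        problems.append("API_KEY is empty")
    # LLM_PATH should point at the downloaded Llama-2-7b-chat-hf directory.
    if not llm_path or not Path(llm_path).is_dir():
        problems.append(f"LLM_PATH is not an existing directory: {llm_path!r}")
    # QUERY_INTERVAL is assumed to be a number of steps, so it must be positive.
    if query_interval <= 0:
        problems.append("QUERY_INTERVAL must be a positive integer")
    return problems
```

For example, once utils.py is filled in, check_config(API_KEY, LLM_PATH, QUERY_INTERVAL) should return an empty list; any returned messages point at the variable that still needs attention.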

Training

To train AdaRefiner from scratch, first set the following paths in sft.py:

model_path = '' # path to the LLM to be finetuned
save_path = '' # path to save the finetuned model

In the first stage, model_path should be the path to the original Llama-2-7b-chat-hf model. In each subsequent stage, change model_path to the finetuned model from the previous stage (i.e., the model saved at save_path).
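The hand-off of paths between stages can be summarized as a small function. This is only an illustration: stage_paths and the per-stage directory naming (save_root/stage_k) are hypothetical choices, not something the repository prescribes.

```python
def stage_paths(stage: int, base_model: str, save_root: str) -> tuple[str, str]:
    """Return (model_path, save_path) for a 0-indexed training stage.

    Hypothetical helper: stage 0 finetunes the original Llama-2-7b-chat-hf
    weights, and every later stage finetunes the model saved by the
    previous stage. The "stage_{k}" directory layout is an assumption.
    """
    if stage < 0:
        raise ValueError("stage must be non-negative")
    save_path = f"{save_root}/stage_{stage}"
    # Later stages start from the previous stage's output directory.
    model_path = base_model if stage == 0 else f"{save_root}/stage_{stage - 1}"
    return model_path, save_path
```

With five stages, stage_paths(k, LLM_PATH, "ckpt") for k = 0..4 gives the two values to copy into sft.py before each finetuning run.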

Then repeat the following steps for each stage of the multi-stage training (our paper uses 5 stages):

1. Train the policy and collect feedback data. (You may increase max_train_steps for better performance.)

python train.py --max_train_steps=1000000

2. Modify model_path and save_path in sft.py as described above.
3. Finetune the Adapter LM with the collected feedback data.

python sft.py
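The per-stage loop can also be scripted. The sketch below is a hypothetical driver, not part of the repository: it runs train.py and then sft.py for a chosen number of stages and, because model_path and save_path are hard-coded in sft.py, pauses before each finetuning step so you can edit them by hand. With dry_run=True (the default) it only returns the planned commands without executing anything.

```python
import subprocess
import sys


def stage_commands(max_train_steps: int = 1_000_000) -> list[list[str]]:
    """Commands for one stage: train the policy / collect feedback, then finetune."""
    return [
        [sys.executable, "train.py", f"--max_train_steps={max_train_steps}"],
        [sys.executable, "sft.py"],
    ]


def run_stages(
    num_stages: int = 5, max_train_steps: int = 1_000_000, dry_run: bool = True
) -> list[list[str]]:
    """Hypothetical multi-stage driver; returns every command it planned."""
    plan = []
    for stage in range(num_stages):
        train_cmd, sft_cmd = stage_commands(max_train_steps)
        plan.extend([train_cmd, sft_cmd])
        if dry_run:
            continue  # only record the commands
        subprocess.run(train_cmd, check=True)
        # sft.py reads model_path/save_path from the file itself, so update them manually.
        input(f"[stage {stage + 1}/{num_stages}] Update model_path and save_path "
              "in sft.py, then press Enter...")
        subprocess.run(sft_cmd, check=True)
    return plan


if __name__ == "__main__":
    for cmd in run_stages(dry_run=True):
        print(" ".join(cmd))
```

Using sys.executable keeps every step on the interpreter of the activated AdaRefiner environment; check=True stops the loop as soon as a stage fails instead of finetuning on incomplete data.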

Test with the trained models

To test with the final models, run python test.py.

Acknowledgements

Parts of the code are based on the crafter and stable-baselines3 repositories.

Citation

If you find our work useful in your research and would like to cite our project, please use the following citation:

@inproceedings{zhang2024adarefiner,
    title={{A}da{R}efiner: Refining Decisions of Language Models with Adaptive Feedback},
    author={Zhang, Wanpeng and Lu, Zongqing},
    booktitle={Findings of the Association for Computational Linguistics: NAACL 2024},
    year={2024},
    publisher={Association for Computational Linguistics},
    pages={782--799}
}
