On the reproduction of the 8B-DPO model #588
I am not sure why there are lower MMLU / MATH flex scores in the screenshot you shared. We recently evaluated the model again and found the numbers to be consistent. Maybe it's the eval setup? We use https://github.com/allenai/olmes for evaluation (though we invoke it via an internal fork, using the following script: open-instruct/scripts/eval/oe-eval.sh, lines 173 to 190 at commit 5ba9f0b).
Here is what the prompts look like: [prompt screenshots]
Actually, yeah, it's a prompt issue. Few-shot prompts with MMLU give lower results (consistent with yours):
DPO (mmlu:cot:summarize)
DPO (mmlu:mc:tulu)
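To make the comparison above concrete, here is a minimal sketch of the two prompt styles. The templates below are illustrative assumptions only; they are not the actual OLMES `mmlu:mc:tulu` or `mmlu:cot:summarize` templates, which live in the olmes repository.

```python
# Sketch of two MMLU prompt styles: few-shot multiple-choice vs.
# chain-of-thought. Exact wording is an assumption, not the OLMES template.

def format_mc(question, choices):
    """Render a question with lettered answer choices."""
    lines = [f"Question: {question}"]
    for letter, choice in zip("ABCD", choices):
        lines.append(f"{letter}. {choice}")
    return "\n".join(lines)

def mc_fewshot_prompt(shots, question, choices):
    """Few-shot MC style: k solved examples, then the target question."""
    parts = [format_mc(q, c) + f"\nAnswer: {a}\n" for q, c, a in shots]
    parts.append(format_mc(question, choices) + "\nAnswer:")
    return "\n".join(parts)

def cot_prompt(question, choices):
    """CoT style: ask the model to reason before committing to a letter."""
    return (format_mc(question, choices)
            + "\nThink step by step, then answer with a single letter.")

shots = [("2 + 2 = ?", ["3", "4", "5", "6"], "B")]
print(mc_fewshot_prompt(shots, "3 * 3 = ?", ["6", "9", "12", "15"]))
print(cot_prompt("3 * 3 = ?", ["6", "9", "12", "15"]))
```

The practical point is that a DPO checkpoint can score quite differently under these two formats, so reproductions should match the evaluation template, not just the benchmark name.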
Thank you very much for your work. I am reproducing the 8B-DPO model, and I found a big difference between my reproduced results and the results in your paper. Could you please help me check whether my training script is correct?
This is the comparison between our model and the official model: [comparison screenshot]
Here is our training script; we used 4 machines with 32 GPUs in total.
Second, I noticed that in the training script you provided for the 8B model, GPUs × gradient_accumulation_steps = 8 × 16 = 128. But the effective batch size given in your paper is 32. Which should I follow?

This is the script you gave:https://github.com/allenai/open-instruct/blob/main/docs/tulu3.md
This is a screenshot from your paper: