
Cutoff Length only followed for chosen response in Pairwise Data for DPO #4402

Open · 1 task done
niravlg opened this issue Jun 20, 2024 · 1 comment
Labels: pending (This problem is yet to be addressed)

niravlg commented Jun 20, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

I encountered an inconsistency in how truncation is implemented for DPO in LLAMA-Factory versus HuggingFace's DPOTrainer.

In LLAMA-Factory, the cutoff length appears to be enforced only for the chosen response. infer_max_len is applied individually to both the (prompt + chosen) and (prompt + rejected) sequences (see the pairwise dataset implementation here).

Note the way infer_max_len is used there; the definition of infer_max_len is here.

However, to keep the prompt identical for the chosen and rejected responses, the prompt obtained by truncating against the chosen sequence is prepended to the rejected response. As a result, the rejected sequence can exceed the cutoff limit, as the sketch below illustrates.
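
To make the mismatch concrete, here is a minimal Python sketch of the scheme as I understand it. This is not LLaMA-Factory's actual code: infer_max_len's real signature may differ, and encode_pair is a hypothetical stand-in for the pairwise preprocessing function.

```python
# Minimal sketch of the truncation scheme described above. NOT LLaMA-Factory's
# actual code: infer_max_len's real signature may differ, and encode_pair is a
# hypothetical stand-in for the pairwise preprocessing function.

def infer_max_len(source_len: int, target_len: int, cutoff_len: int) -> tuple[int, int]:
    # Assumed heuristic: split the cutoff budget between prompt (source) and
    # response (target) in proportion to their original lengths.
    max_target_len = max(int(cutoff_len * target_len / (source_len + target_len)), 1)
    max_source_len = cutoff_len - max_target_len
    return max_source_len, max_target_len


def encode_pair(prompt_ids, chosen_ids, rejected_ids, cutoff_len=2048):
    # Budgets are computed independently for each (prompt, response) pair ...
    src_c, tgt_c = infer_max_len(len(prompt_ids), len(chosen_ids), cutoff_len)
    _, tgt_r = infer_max_len(len(prompt_ids), len(rejected_ids), cutoff_len)
    # ... but the prompt is truncated against the CHOSEN budget only and then
    # reused verbatim in front of the rejected response:
    prompt_ids = prompt_ids[:src_c]
    chosen_input = prompt_ids + chosen_ids[:tgt_c]      # len <= cutoff_len
    rejected_input = prompt_ids + rejected_ids[:tgt_r]  # can reach src_c + tgt_r > cutoff_len
    return chosen_input, rejected_input
```

Whenever the rejected response is longer than the chosen one, its own budget allots it a shorter prompt, so prepending the (longer) chosen-derived prompt pushes the rejected sequence past cutoff_len.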

Reproduction

I printed out the maximum sequence lengths for both the chosen and rejected sides and noticed this discrepancy (cutoff is set to 2048; the chosen sequences respect it, the rejected sequences do not):

{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2543, 'chosen_labels_length': 2048, 'rejected_labels_length': 2543}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2487, 'chosen_labels_length': 2048, 'rejected_labels_length': 2487}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2400, 'chosen_labels_length': 2048, 'rejected_labels_length': 2400}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2334, 'chosen_labels_length': 2048, 'rejected_labels_length': 2334}
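
A check along these lines produces the numbers above (a sketch; the feature names are assumed from the printed keys, and the dataset is the preprocessed pairwise dataset):

```python
def report_pair_lengths(tokenized_dataset):
    # Sketch of the length check; feature names are assumed from the keys
    # printed above, and tokenized_dataset is the preprocessed pairwise dataset.
    for example in tokenized_dataset:
        print({
            "chosen_input_ids_length": len(example["chosen_input_ids"]),
            "rejected_input_ids_length": len(example["rejected_input_ids"]),
            "chosen_labels_length": len(example["chosen_labels"]),
            "rejected_labels_length": len(example["rejected_labels"]),
        })
```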

Expected behavior

This behavior causes two major problems:

  1. The rejected sequences may exceed the cutoff limit, which can trigger out-of-memory (OOM) errors in the middle of a run.
  2. It is inconsistent with how HuggingFace's DPOTrainer is implemented.

In HuggingFace, DPOTrainer uses the longer of the chosen and rejected responses to decide how much of the prompt and the responses to truncate, and it limits both the chosen and rejected sequences to max_length. Check out the exact implementation here.
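
Roughly, that scheme works like this (a paraphrase, not the exact DPOTrainer code; the parameter values are illustrative):

```python
def encode_pair_hf_style(prompt_ids, chosen_ids, rejected_ids,
                         max_length=2048, max_prompt_length=512):
    # Paraphrase of the HuggingFace-style scheme described above, not the exact
    # DPOTrainer code; max_length / max_prompt_length values are illustrative.
    longer_response_len = max(len(chosen_ids), len(rejected_ids))
    # If prompt + longer response overflows, truncate the prompt first,
    # keeping its end:
    if len(prompt_ids) + longer_response_len > max_length:
        prompt_ids = prompt_ids[-max_prompt_length:]
    # If it still overflows, clip BOTH responses to the remaining budget:
    if len(prompt_ids) + longer_response_len > max_length:
        budget = max_length - len(prompt_ids)
        chosen_ids = chosen_ids[:budget]
        rejected_ids = rejected_ids[:budget]
    # Both concatenated sequences now respect max_length:
    return prompt_ids + chosen_ids, prompt_ids + rejected_ids
```

Because the longer response drives the truncation, both sequences come out at or below max_length, at the cost of clipping the chosen response more aggressively when the rejected one is long.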

Could you also let us know why the cutoff length has been implemented this way? Is this a commonly used method for DPO?

Others

No response

github-actions bot added the pending (This problem is yet to be addressed) label on Jun 20, 2024
niravlg changed the title from "Cutoff Length only followed for chosen response in LLAMA-Factory DPO" to "Cutoff Length only followed for chosen response in Pairwise Data for DPO" on Jun 20, 2024

FangLi1 commented Jun 21, 2024

Following; +1
