
Cutoff Length only followed for chosen response in Pairwise Data for DPO #4402

Open · 1 task done
niravlg opened this issue Jun 20, 2024 · 1 comment
Labels: pending (This problem is yet to be addressed)

niravlg commented Jun 20, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

I encountered an inconsistency in how truncation is implemented for DPO in LLAMA-Factory versus HuggingFace's DPOTrainer.

In LLAMA-Factory, the cutoff length appears to be enforced only for the chosen response. infer_max_len is applied individually to both the (prompt + chosen) and (prompt + rejected) sequences (see the pairwise dataset implementation here).

Note the way infer_max_len is used there; the definition of infer_max_len is here.

However, to keep the prompt identical for the chosen and rejected responses, the prompt obtained by truncating against the chosen sequence is prepended to the rejected response. As a result, the rejected sequence can exceed the cutoff limit, as the sketch below illustrates.
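
To make the mismatch concrete, here is a minimal Python sketch of the scheme as I understand it. This is not LLaMA-Factory's actual code: infer_max_len's real signature may differ, and encode_pair is a hypothetical stand-in for the pairwise preprocessing function.

```python
# Minimal sketch of the truncation scheme described above. NOT LLaMA-Factory's
# actual code: infer_max_len's real signature may differ, and encode_pair is a
# hypothetical stand-in for the pairwise preprocessing function.

def infer_max_len(source_len: int, target_len: int, cutoff_len: int) -> tuple[int, int]:
    # Assumed heuristic: split the cutoff budget between prompt (source) and
    # response (target) in proportion to their original lengths.
    max_target_len = max(int(cutoff_len * target_len / (source_len + target_len)), 1)
    max_source_len = cutoff_len - max_target_len
    return max_source_len, max_target_len


def encode_pair(prompt_ids, chosen_ids, rejected_ids, cutoff_len=2048):
    # Budgets are computed independently for each (prompt, response) pair ...
    src_c, tgt_c = infer_max_len(len(prompt_ids), len(chosen_ids), cutoff_len)
    _, tgt_r = infer_max_len(len(prompt_ids), len(rejected_ids), cutoff_len)
    # ... but the prompt is truncated against the CHOSEN budget only and then
    # reused verbatim in front of the rejected response:
    prompt_ids = prompt_ids[:src_c]
    chosen_input = prompt_ids + chosen_ids[:tgt_c]      # len <= cutoff_len
    rejected_input = prompt_ids + rejected_ids[:tgt_r]  # can reach src_c + tgt_r > cutoff_len
    return chosen_input, rejected_input
```

Whenever the rejected response is longer than the chosen one, its own budget allots it a shorter prompt, so prepending the (longer) chosen-derived prompt pushes the rejected sequence past cutoff_len.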

Reproduction

I printed out the maximum sequence lengths for both the chosen and rejected sides and noticed this discrepancy (cutoff is set to 2048; the chosen sequences respect it, the rejected sequences do not):

{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2543, 'chosen_labels_length': 2048, 'rejected_labels_length': 2543}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2487, 'chosen_labels_length': 2048, 'rejected_labels_length': 2487}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2400, 'chosen_labels_length': 2048, 'rejected_labels_length': 2400}
{'chosen_input_ids_length': 2048, 'rejected_input_ids_length': 2334, 'chosen_labels_length': 2048, 'rejected_labels_length': 2334}
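
A check along these lines produces the numbers above (a sketch; the feature names are assumed from the printed keys, and the dataset is the preprocessed pairwise dataset):

```python
def report_pair_lengths(tokenized_dataset):
    # Sketch of the length check; feature names are assumed from the keys
    # printed above, and tokenized_dataset is the preprocessed pairwise dataset.
    for example in tokenized_dataset:
        print({
            "chosen_input_ids_length": len(example["chosen_input_ids"]),
            "rejected_input_ids_length": len(example["rejected_input_ids"]),
            "chosen_labels_length": len(example["chosen_labels"]),
            "rejected_labels_length": len(example["rejected_labels"]),
        })
```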

Expected behavior

This behavior causes two major problems:

  1. The rejected sequences may exceed the cutoff limit, which can trigger out-of-memory (OOM) errors in the middle of a run.
  2. It is inconsistent with how HuggingFace's DPOTrainer is implemented.

In HuggingFace, DPOTrainer uses the longer of the chosen and rejected responses to decide how much of the prompt and the responses to truncate, and it limits both the chosen and rejected sequences to max_length. Check out the exact implementation here.
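
Roughly, that scheme works like this (a paraphrase, not the exact DPOTrainer code; the parameter values are illustrative):

```python
def encode_pair_hf_style(prompt_ids, chosen_ids, rejected_ids,
                         max_length=2048, max_prompt_length=512):
    # Paraphrase of the HuggingFace-style scheme described above, not the exact
    # DPOTrainer code; max_length / max_prompt_length values are illustrative.
    longer_response_len = max(len(chosen_ids), len(rejected_ids))
    # If prompt + longer response overflows, truncate the prompt first,
    # keeping its end:
    if len(prompt_ids) + longer_response_len > max_length:
        prompt_ids = prompt_ids[-max_prompt_length:]
    # If it still overflows, clip BOTH responses to the remaining budget:
    if len(prompt_ids) + longer_response_len > max_length:
        budget = max_length - len(prompt_ids)
        chosen_ids = chosen_ids[:budget]
        rejected_ids = rejected_ids[:budget]
    # Both concatenated sequences now respect max_length:
    return prompt_ids + chosen_ids, prompt_ids + rejected_ids
```

Because the longer response drives the truncation, both sequences come out at or below max_length, at the cost of clipping the chosen response more aggressively when the rejected one is long.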

Could you also let us know why the cutoff length has been implemented this way? Is this a commonly used method for DPO?

Others

No response

github-actions bot added the pending (This problem is yet to be addressed) label on Jun 20, 2024
niravlg changed the title from "Cutoff Length only followed for chosen response in LLAMA-Factory DPO" to "Cutoff Length only followed for chosen response in Pairwise Data for DPO" on Jun 20, 2024

FangLi1 commented Jun 21, 2024

Following; +1
