Open
Labels
❓ question (Seeking clarification or more information), 🏋 GRPO (Related to GRPO)
Description
Reproduction
Is there a reason why the RepeatSampler in the GRPOTrainer code also repeats each prompt in the eval dataset num_generations times? In the train dataset, multiple generations per prompt are needed to compute advantages. The eval dataset, however, is only there to evaluate the model's performance at various intervals, so I don't see why it needs the same number of generations per prompt as the train dataset. Why exactly does the sampler repeat each eval prompt num_generations times?
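To make the behavior I'm asking about concrete, here is a minimal sketch of the repetition pattern. It is not the actual TRL implementation: `repeat_indices` is a hypothetical helper written only for illustration, and it assumes the sampler simply yields each dataset index num_generations times in a row.

```python
from typing import Iterator, Sequence


def repeat_indices(indices: Sequence[int], num_generations: int) -> Iterator[int]:
    """Hypothetical stand-in for the sampler: yield each index num_generations times."""
    for idx in indices:
        for _ in range(num_generations):
            yield idx


# Three eval prompts, each repeated 4 times, so 12 completions are generated
# even though evaluation only involves 3 distinct prompts.
eval_indices = [0, 1, 2]
repeated = list(repeat_indices(eval_indices, num_generations=4))
print(repeated)
```

Under this assumption, the number of completions generated during evaluation grows linearly with num_generations, which is the cost I'm asking about.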

System Info
Current trl version
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
- Any traceback provided is complete