GRPOTrainer - Repeat Sampler - _get_eval_sampler

### Reproduction

Is there a reason why `GRPOTrainer`'s `RepeatSampler` (as returned by `_get_eval_sampler`) also repeats each eval prompt `num_generations` times? I don't see why the evaluation dataset needs the same number of generations per prompt as the training dataset. During training, the purpose of `num_generations` is to produce a group of completions per prompt so that advantages can be computed. The eval dataset, by contrast, is only there to measure the model's performance at various intervals. So why does the eval sampler repeat each prompt `num_generations` times?
<img width="1017" alt="Image" src="https://github.com/user-attachments/assets/a7ceb2ea-34bf-4798-a469-6dbdb00982a3" />
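
To make the question concrete, here is a minimal sketch of what I understand the repetition to be for. It is not the TRL implementation: `repeat_indices` and `group_advantages` are hypothetical helper names, and the normalization (population standard deviation plus a small `eps`) is an assumption made for illustration. It shows each prompt index being repeated `num_generations` times, and the group-relative advantages that the repetition makes possible during training.

```python
from statistics import mean, pstdev


def repeat_indices(num_prompts, num_generations):
    """Return each prompt index repeated `num_generations` times in a row."""
    return [i for i in range(num_prompts) for _ in range(num_generations)]


def group_advantages(rewards, num_generations, eps=1e-4):
    """Normalize rewards within each consecutive group of `num_generations`.

    Assumes `rewards` is ordered like the output of `repeat_indices`, so
    completions for the same prompt sit next to each other.
    """
    advantages = []
    for start in range(0, len(rewards), num_generations):
        group = rewards[start:start + num_generations]
        mu = mean(group)
        sigma = pstdev(group)  # population std; an assumption of this sketch
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


# Three prompts, two generations each: every prompt appears twice in a row.
order = repeat_indices(num_prompts=3, num_generations=2)

# One reward per sampled completion, grouped per prompt.
advs = group_advantages([1.0, 0.0, 2.0, 2.0], num_generations=2)
```

Advantages only exist relative to the other completions of the same prompt, which is why the repetition makes sense for training. For evaluation nothing in this sketch needs the grouping, which is the reason for my question.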

### System Info

Current trl version

### Checklist

- [x] I have checked that my issue isn't already filed (see [open issues](https://github.com/huggingface/trl/issues?q=is%3Aissue))
- [x] I have included my system information
- [x] Any code provided is minimal, complete, and reproducible ([more on MREs](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any code provided is properly formatted in code blocks, (no screenshot, [more on code blocks](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any traceback provided is complete

GRPOTrainer - Repeat Sampler - _get_eval_sampler #3539

Reproduction

System Info

Checklist

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

GRPOTrainer - Repeat Sampler - _get_eval_sampler #3539

Description

Reproduction

System Info

Checklist

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions