GRPOTrainer - Repeat Sampler - _get_eval_sampler #3539

@SnorkelerVigi

Description

Reproduction

Is there a reason why the RepeatSampler in GRPOTrainer (see _get_eval_sampler) also repeats each prompt in the eval dataset num_generations times? During training, multiple generations per prompt are needed to compute advantages. The eval dataset, however, is only there to measure the model's performance at various intervals, so I don't see why it needs the same number of generations per prompt as the training dataset. Why exactly does the repeat sampler repeat each eval prompt num_generations times?
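
To make the behavior I'm asking about concrete, here is a minimal sketch of the repetition as I understand it. This is a simplified stand-in, not the actual TRL RepeatSampler; the helper name repeat_indices and the example sizes are hypothetical and only meant to show the index order.

```python
from typing import Iterator, List


def repeat_indices(num_samples: int, num_generations: int) -> Iterator[int]:
    """Yield each dataset index num_generations times in a row.

    Hypothetical helper for illustration only; it ignores shuffling,
    batching, and any other logic the real sampler may apply.
    """
    for idx in range(num_samples):
        for _ in range(num_generations):
            yield idx


# Assumed example: 3 eval prompts, num_generations = 4.
eval_order: List[int] = list(repeat_indices(num_samples=3, num_generations=4))
print(eval_order)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Under these assumptions, evaluating 3 prompts produces 12 generations instead of 3, which is the extra eval cost I'm asking about.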

System Info

Current trl version

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
  • Any traceback provided is complete

Labels

question, GRPO