-
Notifications
You must be signed in to change notification settings - Fork 2.2k
😷 Refactor GRPO/RLOO to isolate _generate
#4114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…_thw` in GRPO and RLOO trainers; update `split_pixel_values_by_grid` to use `image_grid_thw`
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
cc @Peter-Chou, customization is made easier with this one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the tests pass and a question about whether we should split _generate
from scoring entirely
@qgallouedec Yes. The original |
_generate
_generate
_generate
_generate
_generate
_generate
_generate
_generate
This PR belongs to a sequence of PR that aims to refactor the generation part of GRPO/RLOO to allow for easier customization and ultimately tool calling
Previous:
image_split_sizes
in favour ofimage_grid_thw
#4111Next:
_generate
in GRPO/RLOO: list of ints instead of tensors #4146_generate
in GRPO/RLOO: Useprompt_ids
from generation #4152_generate
in GRPO/RLOO: Rely on generator for prompt truncation #4153_generate
in GRPO/RLOO: Moveforward_kwargs
outside generation method #4154_generate
in GRPO/RLOO: Insert images in the prompt #4155