-
Notifications
You must be signed in to change notification settings - Fork 2.2k
🟩 Drop image_split_sizes in favour of image_grid_thw
#4111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…_thw` in GRPO and RLOO trainers; update `split_pixel_values_by_grid` to use `image_grid_thw`
|
|
||
| if hasattr(self.processing_class, "_get_num_multimodal_tokens"): | ||
| image_sizes = [(image.height, image.width) for image in images] | ||
| multimodal_extra_data = self.processing_class._get_num_multimodal_tokens(image_sizes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it also avoids calling a private method
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean up - LGTM
| **kwargs, | ||
| ) | ||
| prompt_inputs = super(_GRPOTrainer, self)._prepare_inputs(prompt_inputs) | ||
| prompt_inputs = super()._prepare_inputs(prompt_inputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much cleaner!
| if "image_grid_thw" not in batch or "pixel_values" not in batch: | ||
| return batch | ||
|
|
||
| lengths = batch["image_split_sizes"] # [batch_size] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly, that image_split_sizes was only really used in this helper method to extract the image lengths? If so, then I agree it's redundant with image_grid_thw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, precisely
image_split_sizes in favour of image_grid_thwimage_split_sizes in favour of image_grid_thw
This PR belongs to a sequence of PR that aims to refactor the generation part of GRPO/RLOO to allow for easier customization and ultimately tool calling
Next:
_generate#4114_generatein GRPO/RLOO: list of ints instead of tensors #4146_generatein GRPO/RLOO: Useprompt_idsfrom generation #4152_generatein GRPO/RLOO: Rely on generator for prompt truncation #4153_generatein GRPO/RLOO: Moveforward_kwargsoutside generation method #4154_generatein GRPO/RLOO: Insert images in the prompt #4155While refactoring, I realized that having
image_split_sizesactually doesn't help much, and is redundant withimage_grid_thw. Consequently, I suggest to revert #4032