Reproduction
Thank you for your hard work.
I observed a couple of issues with the current TRL release when training VLMs with GRPO:
- multi-image inputs are not supported (a sketch of a multi-image example follows this list)
- if, for some reason (I am still investigating why), the model generates an image token id in the completion, a mismatch error is raised between the number of image tokens and the image feature size
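For context, this is roughly what a multi-image example looks like in the conversational dataset format; the field names follow the Hugging Face conversational convention, and the exact schema of my custom dataset is assumed here:

    example = {
        "prompt": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "image"},
                    {"type": "text", "text": "Compare these two images."},
                ],
            }
        ],
        # Two PIL images attached to a single prompt
        "images": [image_0, image_1],
    }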
Using code similar to the following, where the dataset is a custom dataset with multiple images per prompt:
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Custom dataset whose prompts each contain multiple images
dataset = load_dataset("custom_dataset_with_multiple_images_in_prompt", split="train")

# Define the reward function, which rewards completions that are close to 20 characters
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="Llava7b-GRPO")
trainer = GRPOTrainer(
    model="llava-hf/llava-1.5-7b-hf",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
Some minor code changes were required at:

trl/trl/trainer/grpo_trainer.py, line 1369 (commit 5d914a4):
    kwargs = {"images": [[img] for img in images]}

trl/trl/trainer/grpo_trainer.py, line 1105 (commit 5d914a4):
    model_inputs["pixel_values"] = pixel_values[start : start + batch_size]
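A minimal sketch of the two changes, assuming each element of `images` is itself the list of images for one prompt, and that rows of `pixel_values` correspond to individual images rather than to examples (both are my assumptions about the multi-image layout, not TRL's actual fix):

    from itertools import accumulate

    # Line 1369: pass each example's image list through unchanged instead of
    # wrapping a single image per example (assumes `images` is a list of lists).
    kwargs = {"images": images}

    # Line 1105: slice pixel_values by cumulative image counts, so each batch
    # gets every row belonging to its examples.
    bounds = [0, *accumulate(len(imgs) for imgs in images)]
    model_inputs["pixel_values"] = pixel_values[bounds[start] : bounds[start + batch_size]]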
For the image token generation issue, replacing any generated image tokens is required at:

trl/trl/trainer/grpo_trainer.py, line 1587 (commit 5d914a4):
    completion_ids = prompt_completion_ids[:, prompt_length:]
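A minimal sketch of the replacement, assuming the model config exposes the image token id as `image_token_index` (as LLaVA-style configs do) and that substituting the pad token is acceptable; the variable names follow the surrounding trainer code, and this is my workaround rather than an official fix:

    completion_ids = prompt_completion_ids[:, prompt_length:]
    # Replace any image token ids the model generated with the pad token, so the
    # processor's image-token count keeps matching the image feature size.
    image_token_id = model.config.image_token_index
    pad_token_id = processing_class.tokenizer.pad_token_id
    completion_ids = completion_ids.masked_fill(completion_ids == image_token_id, pad_token_id)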
System Info
Ubuntu 24
TRL 0.20.0
Transformers 4.54.1
PEFT 0.17.0
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
- Any traceback provided is complete