
Issues at GRPO with VLM #3847

@Fhrozen

Reproduction

Thank you for your hard work.

I observed a couple of issues with the currently released TRL version when training VLMs with GRPO:

  • Inputs with multiple images per prompt are not supported.
  • For some reason (I am still investigating), the model can generate an image token ID in the completion, which raises a mismatch error between the number of image tokens and the image feature size.
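The second point can be illustrated with a small sketch. It assumes the model expects a fixed number of placeholder tokens per image, so one extra image token in the completion makes the placeholder count disagree with the number of image features. `IMAGE_TOKEN_ID`, `FEATURES_PER_IMAGE`, and `check_image_tokens` are hypothetical names and values chosen for illustration, not taken from TRL or Transformers.

```python
# Hypothetical values for illustration only; read the real ones from the
# config / processor of the checkpoint being trained.
IMAGE_TOKEN_ID = 32000      # assumed placeholder token id
FEATURES_PER_IMAGE = 4      # assumed; real models use far more per image


def check_image_tokens(input_ids, num_images):
    """Return (found, expected) counts of image placeholder tokens."""
    found = sum(1 for tok in input_ids if tok == IMAGE_TOKEN_ID)
    expected = num_images * FEATURES_PER_IMAGE
    return found, expected


# One image in the prompt: four placeholders, as expected.
prompt_ids = [1, 32000, 32000, 32000, 32000, 50, 51]
# The completion contains one stray image token.
completion_ids = [60, 32000, 61]

found, expected = check_image_tokens(prompt_ids + completion_ids, num_images=1)
mismatch = found != expected  # the stray token makes the counts disagree
```

A check like this, run on the concatenated prompt and completion before the forward pass, may help confirm whether a stray image token is the cause of the error.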

I am using code similar to the following, where the dataset is a custom dataset with multiple images per prompt:

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("custom_with multiple images in prompt ", split="train")

# Define the reward function, which rewards completions that are close to 20 characters
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]


training_args = GRPOConfig(output_dir="Llava7b-GRPO")
trainer = GRPOTrainer(
    model="llava-hf/llava-1.5-7b-hf",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()

Some minor code changes were required at the following two lines to handle batches with multiple images:

kwargs = {"images": [[img] for img in images]}

and

model_inputs["pixel_values"] = pixel_values[start : start + batch_size]
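To make the multi-image problem concrete, here is a minimal sketch of one way those two lines could be generalized. It assumes the processor concatenates the features of all images along the first axis, so a sample index no longer equals a row index once samples carry different numbers of images. The helper names (`normalize_images`, `image_offsets`, `slice_flat_images`) are hypothetical and not part of TRL, and plain strings stand in for images and feature rows.

```python
def normalize_images(images):
    """Make every sample a list of images, wrapping single images."""
    return [imgs if isinstance(imgs, (list, tuple)) else [imgs] for imgs in images]


def image_offsets(images_per_sample):
    """Cumulative start offsets into the flat, concatenated image axis."""
    offsets = [0]
    for n in images_per_sample:
        offsets.append(offsets[-1] + n)
    return offsets


def slice_flat_images(flat_images, images_per_sample, start, batch_size):
    """Select the flat rows that belong to samples [start, start + batch_size)."""
    offsets = image_offsets(images_per_sample)
    end = min(start + batch_size, len(images_per_sample))
    return flat_images[offsets[start]:offsets[end]]


# Four samples with 1, 2, 3 and 1 images respectively.
samples = ["a", ["b", "c"], ["d", "e", "f"], "g"]
nested = normalize_images(samples)
counts = [len(imgs) for imgs in nested]
flat = [img for imgs in nested for img in imgs]

# Samples 1 and 2 own five flat rows, not two.
chunk = slice_flat_images(flat, counts, start=1, batch_size=2)
```

The same offset arithmetic would apply to a tensor, since slicing along the first axis works the same way. Whether `pixel_values` is actually flattened like this depends on the processor in use.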

For the image token generation issue, the stray image token needs to be replaced with another token at:

completion_ids = prompt_completion_ids[:, prompt_length:]
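A minimal sketch of such a replacement is shown below, using nested Python lists in place of the tensor. `mask_image_tokens` is a hypothetical helper, and the token ids are illustrative assumptions; the pad token is used as the replacement here, but that choice is also an assumption.

```python
def mask_image_tokens(completion_ids, image_token_id, replacement_id):
    """Replace every image token in a batch of completions with another id."""
    return [
        [replacement_id if tok == image_token_id else tok for tok in row]
        for row in completion_ids
    ]


IMAGE_TOKEN_ID = 32000  # assumed image placeholder id
PAD_TOKEN_ID = 0        # assumed replacement id

batch = [
    [11, 32000, 12],    # contains a stray image token
    [21, 22, 23],       # clean completion
]
cleaned = mask_image_tokens(batch, IMAGE_TOKEN_ID, PAD_TOKEN_ID)
```

On a PyTorch tensor the same effect should be achievable with `completion_ids.masked_fill(completion_ids == image_token_id, replacement_id)`, applied right after the slice above.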

System Info

Ubuntu 24
TRL 0.20.0
Transformers 4.54.1
PEFT 0.17.0

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
  • Any traceback provided is complete

Labels: 🏋 GRPO (Related to GRPO), 🐛 bug (Something isn't working)
