GRPO on InternVL3.5 does not support dynamic vision tokens

### Description
The error likely occurs because InternVL produces a dynamic number of vision tokens per image, which are represented in the prompt by `<IMG_CONTEXT>` placeholders.
Resizing every image to a fixed resolution avoids the error, but this workaround is less than ideal.
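
For reference, the fixed-resolution workaround can be sketched roughly as follows. This is only an illustration: `resize_to_fixed` and `resize_example` are hypothetical helpers (not part of TRL or transformers), the `"image"` column name is an assumption, and the 448x448 size is an assumed value rather than the resolution actually used here. The helpers expect PIL-style images that provide `.convert()` and `.resize()`.

```python
# Hypothetical helpers for the fixed-resolution workaround; not part of TRL.
FIXED_SIZE = (448, 448)  # assumed value, (width, height)


def resize_to_fixed(image, size=FIXED_SIZE):
    """Return an RGB copy of a PIL-style image resized to `size`."""
    return image.convert("RGB").resize(size)


def resize_example(example, image_key="image", size=FIXED_SIZE):
    """Resize the image stored under `image_key` in one dataset row.

    All other fields of the row are passed through unchanged.
    """
    return {**example, image_key: resize_to_fixed(example[image_key], size)}
```

With a Hugging Face `datasets.Dataset`, something like `grpo_train_dataset.map(resize_example)` would apply this to every row before training. It discards the original aspect ratio and resolution, which is why it is only a stopgap.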

### Reproduction

```python
from trl import GRPOConfig, GRPOTrainer

training_args = GRPOConfig(
    output_dir="test",
    bf16=True,
    remove_unused_columns=False,
    per_device_train_batch_size=4,
    num_train_epochs=4,
    logging_steps=50,
    max_prompt_length=4096,
    eval_strategy="steps",
    eval_steps=500,
    max_completion_length=512,
    num_generations=4,
    learning_rate=2e-7,
)

trainer = GRPOTrainer(
    model=model,  # InternVL3.5 model
    args=training_args,
    reward_funcs=[...],  # reward functions elided
    train_dataset=grpo_train_dataset,
    eval_dataset=grpo_eval_dataset,
    processing_class=processor,
)

trainer.train()
```

Output:

```
Traceback (most recent call last):
  ...
[rank0]:   File "/mnt/home/../miniconda3/envs/../lib/python3.10/site-packages/transformers/models/internvl/modeling_internvl.py", line 654, in get_placeholder_mask
[rank0]:     raise ValueError(
[rank0]: ValueError: Image features and image tokens do not match: tokens: 1536, features 512
```
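
To narrow down where the counts diverge, a check along these lines could be run on a single processed sample before it reaches the model. It is a simplified plain-Python stand-in for the consistency check that raises the error above, not the actual `get_placeholder_mask` implementation; `count_image_tokens` and `check_placeholder_count` are hypothetical names, and the token id used in the example is made up.

```python
# Simplified sketch of the placeholder/feature consistency check.
# Hypothetical helpers; the real check lives in transformers' InternVL model code.

def count_image_tokens(input_ids, image_token_id):
    """Count how many positions in `input_ids` hold the image placeholder id."""
    return sum(1 for token_id in input_ids if token_id == image_token_id)


def check_placeholder_count(input_ids, image_token_id, num_features):
    """Raise ValueError if placeholder tokens and image features disagree."""
    num_tokens = count_image_tokens(input_ids, image_token_id)
    if num_tokens != num_features:
        raise ValueError(
            "Image features and image tokens do not match: "
            f"tokens: {num_tokens}, features {num_features}"
        )
    return num_tokens


# Example with a made-up placeholder id of 9: three placeholders, three features.
check_placeholder_count([1, 9, 9, 9, 2], image_token_id=9, num_features=3)
```

In the traceback above the prompt carries 1536 placeholder tokens while only 512 feature vectors were produced, so the two sides were evidently built from a different number of image patches.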

### System Info

trl==0.22

### Checklist

- [x] I have checked that my issue isn't already filed (see [open issues](https://github.com/huggingface/trl/issues?q=is%3Aissue))
- [x] I have included my system information
- [x] Any code provided is minimal, complete, and reproducible ([more on MREs](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any code provided is properly formatted in code blocks, (no screenshot, [more on code blocks](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any traceback provided is complete
