Description
The issue likely occurs because InternVL uses a dynamic number of vision tokens per image, which are represented in the prompt by <IMG_CONTEXT> placeholders.
Resizing every image to a fixed resolution avoids the error, but this is less than ideal.
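A minimal sketch of the fixed-resolution workaround mentioned above, assuming the dataset stores PIL-style images (objects with `convert` and `resize`) under a column named `image`. The column name, the helper names, and the 448x448 size are illustrative assumptions, not values taken from this report.

```python
# Hypothetical workaround sketch: force every image to one fixed resolution
# so that each sample should yield the same number of vision tokens.
FIXED_SIZE = (448, 448)  # assumption: choose a size that suits your checkpoint


def to_fixed_resolution(image, size=FIXED_SIZE):
    """Return an RGB copy of a PIL-style image resized to `size` (width, height)."""
    return image.convert("RGB").resize(size)


def resize_example(example, image_key="image"):
    """Resize the image(s) stored under `image_key` (hypothetical column name)."""
    value = example[image_key]
    if isinstance(value, (list, tuple)):
        # Multi-image sample: resize each image independently.
        resized = [to_fixed_resolution(img) for img in value]
    else:
        resized = to_fixed_resolution(value)
    # Copy the example so the input dict is not mutated.
    return {**example, image_key: resized}
```

With a Hugging Face `datasets` dataset this could be applied as `grpo_train_dataset = grpo_train_dataset.map(resize_example)` before building the trainer. The drawback is the one noted above: every image loses its native resolution and aspect ratio.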
Reproduction
from trl import GRPOConfig, GRPOTrainer

# model, processor, and the two datasets are defined elsewhere (not shown in this report).
training_args = GRPOConfig(
    output_dir="test",
    bf16=True,
    remove_unused_columns=False,
    per_device_train_batch_size=4,
    num_train_epochs=4,
    logging_steps=50,
    max_prompt_length=4096,
    eval_strategy="steps",
    eval_steps=500,
    max_completion_length=512,
    num_generations=4,
    learning_rate=2e-7,
)
trainer = GRPOTrainer(
    model=model,  # internvl3.5 type
    args=training_args,
    reward_funcs=[..],  # elided in the report
    train_dataset=grpo_train_dataset,
    eval_dataset=grpo_eval_dataset,
    processing_class=processor,
)
trainer.train()
outputs:
Traceback (most recent call last):
...
[rank0]: File "/mnt/home/../miniconda3/envs/../lib/python3.10/site-packages/transformers/models/internvl/modeling_internvl.py", line 654, in get_placeholder_mask
[rank0]: raise ValueError(
[rank0]: ValueError: Image features and image tokens do not match: tokens: 1536, features 512
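The error above reports a count mismatch between placeholder tokens in the prompt and the feature vectors produced by the vision encoder. The following is an illustrative sketch of that kind of check, not the actual `get_placeholder_mask` implementation in transformers. The function name and the placeholder id are hypothetical. Only the counts (1536 and 512) come from the traceback.

```python
# Illustrative sketch only: mimics the consistency check behind the error,
# using plain Python lists instead of tensors.
PLACEHOLDER_ID = 9999  # hypothetical id standing in for <IMG_CONTEXT>


def check_placeholder_match(input_ids, num_image_features, placeholder_id=PLACEHOLDER_ID):
    """Raise if the number of placeholder tokens differs from the number of features."""
    n_tokens = sum(1 for token in input_ids if token == placeholder_id)
    if n_tokens != num_image_features:
        raise ValueError(
            "Image features and image tokens do not match: "
            f"tokens: {n_tokens}, features {num_image_features}"
        )
    return n_tokens


# Reproduce the counts from the traceback: 1536 placeholders, 512 features.
input_ids = [1, 2] + [PLACEHOLDER_ID] * 1536 + [3]
try:
    check_placeholder_match(input_ids, num_image_features=512)
except ValueError as err:
    message = str(err)
    print(message)
```

Logging the placeholder count per sample alongside the shape of the image features just before the forward pass may help locate where the two diverge.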
System Info
trl==0.22
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks, (no screenshot, more on code blocks)
- Any traceback provided is complete