[Question] Image padding features removal in LLaVA-1.5-HD #1833

0xnakul · 2025-02-10T00:49:08Z

Question

Hello @haotian-liu! In Section A.1 of the "Improved Baselines with Visual Instruction Tuning" paper, it is mentioned that:

Padding removal. Features corresponding exclusively to the paddings are discarded.

How is this done? I'm having trouble finding this block in the LLaVA and LLaVA-NeXT codebases. Could you please point me to it so that I can look into the implementation?
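For concreteness, here is a minimal sketch of what discarding padding-only features could look like. It is not taken from either codebase, and the function names are hypothetical. It assumes that the image is pasted centered on a square canvas of side max(width, height), and that the canvas is then resized and split into an n x n grid of patch features (e.g. 24 x 24 for a 336-pixel input with 14-pixel patches). Because the resize scales both axes uniformly, the sketch works in padded-canvas pixel units and drops a grid row or column only if it lies entirely outside the image region.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def kept_range(extent: int, side: int, n: int) -> Tuple[int, int]:
    """Return the [start, stop) indices of grid cells along one axis that overlap the image.

    extent: image size along this axis, in pixels, before padding
    side:   side length of the square padded canvas, in pixels
    n:      number of patches along this axis
    """
    # Assumption: the image is pasted centered on the padded canvas.
    offset = (side - extent) // 2
    # Cell j spans [j * side / n, (j + 1) * side / n); multiplying through by n
    # keeps every comparison in integer arithmetic.
    lo, hi = offset * n, (offset + extent) * n
    keep = [j for j in range(n) if (j + 1) * side > lo and j * side < hi]
    return keep[0], keep[-1] + 1


def remove_padding_features(
    grid: Sequence[Sequence[T]], width: int, height: int
) -> List[List[T]]:
    """Drop the rows/columns of a patch-feature grid that cover only padding.

    grid:          n_rows x n_cols features computed on the padded square image
    width, height: size of the original (unpadded) image in pixels
    """
    side = max(width, height)
    r0, r1 = kept_range(height, side, len(grid))
    c0, c1 = kept_range(width, side, len(grid[0]))
    return [list(row[c0:c1]) for row in grid[r0:r1]]


# Example: a 672x336 (landscape) image on a 24x24 feature grid.
grid = [[(r, c) for c in range(24)] for r in range(24)]
unpadded = remove_padding_features(grid, width=672, height=336)
print(len(unpadded), len(unpadded[0]))  # 12 24
print(unpadded[0][0])  # (6, 0)
```

Cells that are only partly covered by padding are kept, which matches the paper's wording that only features corresponding exclusively to the paddings are discarded.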

This raises a further question: the bounding boxes in the instruction-tuning datasets are normalized assuming a square, padded image, as mentioned in #606. If the padding tokens are cut off, won't that affect the normalization of the bounding boxes and, ultimately, training?
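To make the normalization concern concrete, here is a small sketch that normalizes a box relative to the padded square and maps it back to coordinates normalized to the original image. The helper names are hypothetical, and it assumes (x1, y1, x2, y2) pixel boxes and centered square padding; it does not claim to reproduce how the datasets were actually built.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def pad_offsets(width: int, height: int) -> Tuple[int, int, int]:
    """Side of the padded square and the (x, y) offset of the centered image."""
    side = max(width, height)
    return side, (side - width) // 2, (side - height) // 2


def to_padded_normalized(box: Box, width: int, height: int) -> Box:
    """Pixel box in the original image -> box normalized to the padded square."""
    side, off_x, off_y = pad_offsets(width, height)
    x1, y1, x2, y2 = box
    return ((x1 + off_x) / side, (y1 + off_y) / side,
            (x2 + off_x) / side, (y2 + off_y) / side)


def padded_to_image_normalized(nbox: Box, width: int, height: int) -> Box:
    """Box normalized to the padded square -> box normalized to the original image."""
    side, off_x, off_y = pad_offsets(width, height)
    x1, y1, x2, y2 = nbox
    return ((x1 * side - off_x) / width, (y1 * side - off_y) / height,
            (x2 * side - off_x) / width, (y2 * side - off_y) / height)


# A box covering the whole 672x336 image:
padded = to_padded_normalized((0, 0, 672, 336), 672, 336)
print(padded)  # (0.0, 0.25, 1.0, 0.75)
print(padded_to_image_normalized(padded, 672, 336))  # (0.0, 0.0, 1.0, 1.0)
```

In the padded frame, a box covering the full landscape image spans only y from 0.25 to 0.75; that padded-square frame is the coordinate system the question refers to.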

Am I missing something here?
