Hello @haotian-liu! Section A.1 of the "Improved Baselines with Visual Instruction Tuning" paper states:
Padding removal. Features corresponding exclusively to the paddings are discarded.
How is this done? I'm having trouble finding this step in the LLaVA and LLaVA-NeXT codebases. Could you please point me to it so that I can look into the implementation?
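To make the question concrete, here is a minimal sketch of what such a step might look like. It is an assumption, not code taken from the repository: it supposes the image was resized to fit a square canvas and padded symmetrically, that the features form a `(C, H, W)` grid, and the name `unpad_features` is hypothetical. Only grid rows or columns that lie entirely inside the padding are dropped.

```python
import math
import numpy as np

def unpad_features(features, original_size):
    """Drop grid rows/columns that cover only padding.

    features: array of shape (C, H, W), the feature grid of the padded image.
    original_size: (width, height) of the image before padding.
    Assumes symmetric (centered) padding; this is an illustrative guess.
    """
    orig_w, orig_h = original_size
    _, grid_h, grid_w = features.shape

    if orig_w / orig_h > grid_w / grid_h:
        # Image is wider than the grid: padding sits above and below.
        content_h = orig_h * grid_w / orig_w      # content height in grid cells
        pad = math.floor((grid_h - content_h) / 2)  # rows that are pure padding
        return features[:, pad:grid_h - pad, :]

    # Image is taller than (or matches) the grid: padding sits left and right.
    content_w = orig_w * grid_h / orig_h
    pad = math.floor((grid_w - content_w) / 2)
    return features[:, :, pad:grid_w - pad]

# A 24x24 grid for a 640x320 image keeps the 12 middle rows.
grid = np.zeros((8, 24, 24))
print(unpad_features(grid, (640, 320)).shape)
```

Is something along these lines what the paper refers to, or is the padding handled differently?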
This raises a further question: the bounding boxes in the instruction tuning datasets are normalized assuming a square padded image, as mentioned in #606. If the padding tokens are cut off, won't that affect the normalization of the bounding boxes and, ultimately, the training?
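A small worked example of the concern, assuming centered padding to a square whose side is the longer image edge (the helper name `padded_to_unpadded_box` is hypothetical and not from the codebase). The same box has different normalized coordinates depending on whether the reference frame is the padded square or the original image:

```python
def padded_to_unpadded_box(box, original_size):
    """Convert a normalized [x1, y1, x2, y2] box from the padded-square
    frame to the original-image frame.

    Assumes the image was centered on a square canvas of side max(w, h);
    this is an illustrative assumption, not the datasets' documented format.
    """
    w, h = original_size
    side = max(w, h)
    pad_x = (side - w) / 2  # padding on the left, in pixels
    pad_y = (side - h) / 2  # padding on the top, in pixels
    x1, y1, x2, y2 = box
    return [
        (x1 * side - pad_x) / w,
        (y1 * side - pad_y) / h,
        (x2 * side - pad_x) / w,
        (y2 * side - pad_y) / h,
    ]

# 640x320 image: y-coordinates shift once the padding is no longer counted.
print(padded_to_unpadded_box([0.25, 0.5, 0.75, 0.75], (640, 320)))
# -> [0.25, 0.5, 0.75, 1.0]
```

If the visual tokens for the padded rows are removed while the text targets keep the padded-frame coordinates, the model would see a grid that no longer spans the frame the coordinates refer to.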
Am I missing something here?