Incorrect docstring of `get_anyres_image_grid_shape` #31588

DarkLight1337 · 2024-06-25T12:25:46Z

Upon inspecting the source code, I believe the `image_size` tuple should be in the form (height, width), not (width, height) as the docstring states.

`transformers/src/transformers/models/llava_next/modeling_llava_next.py`, line 52 in aab0829:

> The size of the input image in the format (width, height).
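To make the claim concrete, here is a minimal pure-Python sketch of the behavior as I read it, assuming `image_size` and every grid pinpoint are (height, width). The `_sketch` functions are hypothetical simplifications written for illustration, not the transformers source, and the selection rule (maximize the effective resolution, break ties by the least wasted area) is an assumption about how anyres resolution selection works.

```python
from typing import Optional, Sequence, Tuple

Size = Tuple[int, int]  # assumed to be (height, width) everywhere below


def select_best_resolution_sketch(
    original_size: Size, possible_resolutions: Sequence[Size]
) -> Optional[Size]:
    """Hypothetical helper: pick the (height, width) pinpoint that keeps the most detail."""
    original_height, original_width = original_size
    best_fit = None
    max_effective = 0
    min_wasted = float("inf")
    for height, width in possible_resolutions:
        # Scale the image to fit inside the candidate without changing its aspect ratio.
        scale = min(width / original_width, height / original_height)
        down_w = int(original_width * scale)
        down_h = int(original_height * scale)
        # Pixels of real image content we keep, capped at the original pixel count.
        effective = min(down_w * down_h, original_width * original_height)
        # Padding area in the candidate that carries no image content.
        wasted = width * height - effective
        if effective > max_effective or (effective == max_effective and wasted < min_wasted):
            max_effective, min_wasted = effective, wasted
            best_fit = (height, width)
    return best_fit


def get_anyres_image_grid_shape_sketch(
    image_size: Size, grid_pinpoints: Sequence[Size], patch_size: int
) -> Size:
    """Hypothetical helper: patch grid as (rows, cols), with image_size given as (height, width)."""
    height, width = select_best_resolution_sketch(image_size, grid_pinpoints)
    return height // patch_size, width // patch_size


pinpoints = [(336, 672), (672, 336), (672, 672)]
tall = get_anyres_image_grid_shape_sketch((672, 336), pinpoints, 336)     # (height, width) order
swapped = get_anyres_image_grid_shape_sketch((336, 672), pinpoints, 336)  # same image, (width, height) order
print(tall, swapped)
```

Under these assumptions, a tall 672×336 image passed as (height, width) selects the (672, 336) pinpoint and yields a 2×1 grid, while the same pair in (width, height) order is treated as a wide image and yields a 1×2 grid, which is why the documented order matters.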


amyeroberts · 2024-06-25T12:29:54Z

@DarkLight1337 Would you like to open a PR to fix this?

cc @zucchini-nlp to confirm, as I think this was raised elsewhere and there's a double inversion happening (?)

DarkLight1337 · 2024-06-25T12:40:08Z

After looking at the code a bit more, I am now more confused. It seems that the LLaVA-NeXT model treats it as (width, height) but still works correctly. Or is that just incorrect variable naming?

zucchini-nlp · 2024-06-25T12:51:06Z

Hey!

Yes, this issue has been noticed by several people, and I can confirm that our implementation matches the original LLaVA-NeXT one exactly. There are naming discrepancies between the two, which is confusing, but it all comes from the way it's done in the original repo.

But if we go by what the order should be, as I understand it, then there is a "bug" in both implementations: LLaVA-NeXT treats it as (width, height) up to some point in the modeling code, where the order is swapped back to (height, width) (they permute the image to "height, width", not "width, height").
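A toy illustration of why such a double inversion can still produce correct results: a grid carried around as (width, height) is harmless as long as it is swapped back to (rows, cols) before the patches are laid out, while swapping only once scrambles the layout. This is a pure-Python sketch with hypothetical `patchify`/`unpatchify` helpers, not the LLaVA-NeXT or transformers code.

```python
def patchify(image, p):
    """Split an H x W image (list of rows) into p x p patches, in row-major patch order."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(h // p):
        for c in range(w // p):
            patches.append([row[c * p:(c + 1) * p] for row in image[r * p:(r + 1) * p]])
    return patches


def unpatchify(patches, grid, p):
    """Reassemble patches, interpreting `grid` as (rows, cols)."""
    rows, cols = grid
    image = []
    for r in range(rows):
        for a in range(p):  # pixel row inside the current patch row
            line = []
            for c in range(cols):
                line.extend(patches[r * cols + c][a])
            image.append(line)
    return image


p = 2
image = [[r * 6 + c for c in range(6)] for r in range(4)]  # H=4, W=6
patches = patchify(image, p)

# Grid labeled (width, height), as in the first half of the pipeline.
grid_wh = (len(image[0]) // p, len(image) // p)  # (3, 2)

# Double inversion: swap back to (rows, cols) before reassembling.
restored = unpatchify(patches, grid_wh[::-1], p)
# Single inversion: use the (width, height) grid directly.
wrong = unpatchify(patches, grid_wh, p)
print(restored == image, len(wrong), len(wrong[0]))
```

With the swap applied twice the image round-trips exactly; with a single swap the 4×6 image comes back as 6×4 with its patches misplaced. For a square grid both paths would agree, which can hide this kind of naming mix-up.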

I raised the question with the LLaVA authors a week ago and haven't received a reply yet, so I wouldn't change anything in transformers until the authors confirm whether it's a bug or intended. The only thing I can do for now is add a small comment in the code clarifying the point. I could align the naming with the LLaVA-NeXT repo by using the (width, height) order in processing, but that would raise more questions about why we use an incorrect order during image processing.

Hope this clarifies it a bit ;)

DarkLight1337 · 2024-06-25T13:16:47Z

Thanks for the clarification! Let's wait until the authors respond then.
