fix HunyuanOCR crash in vLLM #39

souvikchand · 2025-11-27T09:37:04Z

This PR fixes a runtime error in the vLLM multimodal pipeline when running HunyuanOCR.
Issue #35 was caused by sending images with the wrong message schema, which led vLLM to misinterpret the image input and produce a tensor with an invalid shape:

ValueError: image_grid_thw has rank 3 but expected 2.
Expected shape: ('ni', 3), but got torch.Size([2, 1, 3])

What I changed

Updated the request format from

{ "type": "image_url", "image_url": { "url": "data:image/jpeg;base64,..." } }

to

{
    "type": "image_url",
    "image_url": f"data:{mime};base64,{encode_image(image_path)}"
}

Added automatic MIME-type detection so that images are sent with the correct type (png, jpeg, webp, etc.).
Ensured image_url is a plain string rather than a nested object, which aligns with the schema vLLM expects for HuggingFace vision models.
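
For reference, here is a minimal sketch of how the MIME detection and the string-valued image_url could fit together. It is not the exact code from this PR: `guess_mime` and `build_image_part` are hypothetical helper names, the `image/jpeg` fallback is an assumption, and only `encode_image` and the message shape come from the description above.

```python
import base64
import mimetypes


def guess_mime(image_path: str, default: str = "image/jpeg") -> str:
    """Guess the MIME type from the file extension (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(image_path)
    # Assumption: fall back to JPEG when the extension is unknown
    # or does not map to an image type.
    if mime is None or not mime.startswith("image/"):
        return default
    return mime


def encode_image(image_path: str) -> str:
    """Return the file contents as a base64-encoded string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def build_image_part(image_path: str) -> dict:
    """Build the image part of a chat message (hypothetical helper).

    image_url is a plain data-URL string, not a nested {"url": ...} object.
    """
    mime = guess_mime(image_path)
    return {
        "type": "image_url",
        "image_url": f"data:{mime};base64,{encode_image(image_path)}",
    }
```

The extension-based lookup only reads the file name, so a mislabeled file would still be sent with the wrong type; inspecting the file header would be a stricter alternative.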

diegocarturan-debug · 2025-11-30T00:17:47Z

souvikchand · 2025-12-02T04:20:08Z

@diegocarturan-debug
Sorry, but could you explain your comment above? It just redirects to this same page.

changed create_chat_messages()

4669efb
