i4TsU commented Nov 20, 2025

TL;DR: fixes a bug where n_ubatch < n_batch causes llama_decode to split batches internally during image/audio chunk processing, which breaks M-RoPE positional embeddings. This PR works around the issue by capping n_batch to n_ubatch before the batch splits are calculated, so the M-RoPE data stays intact.

Context:
I had been getting terrible results across multiple Qwen3-VL models, with many different combinations of llama-server and llama-mtmd-cli parameters, when trying to extract 2D bbox coordinates from single images. Most of the time the models would describe the general contents of the image okay, but then provide bboxes that weren't even close, despite the exact same models working fine via llama.cpp. I had even started trying to port across the ~20 multimodal/qwen3vl-related commits from mainline llama.cpp, but had a random thought to check this after the first few hours lol. Anyway, until I (or whoever) actually brings over the real fixes for the Qwen3-VL implementation, this PR at least helps anyone using the default ubatch size with image inputs - especially considering the Qwen team have said that grounding tasks suffer with fewer than ~1024 image tokens.


I tested it pretty much all morning with many different combinations of settings, but the screenshots below hopefully show the main idea - all generated with the same command save for -ub (e.g. & .\llama-server.exe -m "X:\Models\unsloth\Qwen3-VL-8B-Instruct\Qwen3-VL-8B-Instruct-Q8_0.gguf" --mmproj "X:\Models\unsloth\Qwen3-VL-8B-Instruct\mmproj-F32.gguf" -ngl 999 -t 10 -tb 12 -c 8192 --jinja --temp 0 --top-k 1 --presence-penalty 1.5 --samplers 'top_k;temperature' --mlock -b 2048 -ub 512):

Main branch:

- -b 2048 -ub 512 (default): [screenshot: ub512main]
- -b 2048 -ub 2048: [screenshot: ub2048main]

PR:

- -b 2048 -ub 512 (default): [screenshot: ub512pr]
- -b 2048 -ub 2048 (exact same output as main branch): [screenshot: ub2048pr]

llama.cpp main branch (for reference): [screenshot: llamacpp]

Ensure n_batch does not exceed n_ubatch to prevent llama_decode from
splitting the batch, which would break M-RoPE positional embeddings
during image chunk decoding.
firecoperana (Collaborator) commented Nov 20, 2025

Is it the same issue ggml-org/llama.cpp#13694?

i4TsU (Author) commented Nov 20, 2025

> Is it the same issue ggml-org/llama.cpp#13694?

Yeah, same underlying issue with the extra dimensions from M-RoPE. This PR gets qwen3vl models working, but I'm not sure if/what it might break for other models using M-RoPE. I'm yet to finish going through the exact timeline of commits from the squashed d261223 merge on mainline that added initial qwen3vl support, but I would honestly be pretty out of my depth on most of it anyway, so if you/anyone wants to point me in the right direction, I'll gladly accept the guidance :) There were 3-4+ different directions they tried for the M-RoPE implementation, partly so that support for other mtmd models didn't break (I think they ended up storing positions in KV cells for proper 2D causal masking). But I have no idea how much harder/easier it would be to implement in ik_llama.cpp.

ikawrakow (Owner)

Wouldn't it be better to put the n_batch change in mtmd-cli, and issue a warning that this is being done?

Using n_batch = n_ubatch seems to improve things, but it does not look like it is working very well (based on the screenshots).

I guess we can merge this as a temporary workaround, but then one needs to pick up all other changes that have been added in mainline.

firecoperana (Collaborator)

I have a branch that merges most of these changes from mainline, but it lacks one commit that I don't know how to port. I can push it as a draft.

ikawrakow (Owner)

> I have a branch that merges most changes of this from mainline, but it lacks one commit that I don't know how to port

Which is the missing commit and what is the problem porting it?

firecoperana (Collaborator)

ggml-org/llama.cpp#16825. I'm not familiar with the KV cache and there were a lot of refactors done to it. Also, flash attention in clip is not working, but I just disable it for now.

firecoperana (Collaborator)

It has #988, which improves M-RoPE too.

i4TsU (Author) commented Nov 21, 2025

Yeah, this is definitely only a slight improvement, and we'd need to figure out how best to translate the other mainline changes to the ik_llama.cpp implementation - assuming there is enough interest in supporting the Qwen3 vision models here. I will share the rough list of relevant llama.cpp commits I had identified so far in #993, in case that's of any help - I just wasn't sure how much time/interest you guys had spare to work on it, and I'm not confident enough to do it myself :)

i4TsU added a commit to i4TsU/ik_llama.cpp that referenced this pull request Nov 25, 2025