Conversation

@jiminha jiminha commented Oct 14, 2025

This PR optimizes Gemma3 multimodal memory usage and performance.

  • Bucket the vision tower based on the batch bucket to reduce recompilation overhead.
  • Modify merge_multimodal to use torch.where instead of masked_scatter to address a performance issue.
  • Add a multimodal bucket warmup to precompile the vision tower.
  • Port the PT_HPU_SDPA_QKV_SLICE_MODE_FWD feature from vllm-fork v0; this is necessary to reduce memory for longer sequence lengths.
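The torch.where-based merge in the second bullet can be sketched as follows. This is a minimal illustration, not the actual vllm-fork code: the function signature and variable names are assumptions, and `image_embeds` is assumed to be pre-expanded to the full sequence shape.

```python
import torch

def merge_multimodal(inputs_embeds: torch.Tensor,
                     image_embeds: torch.Tensor,
                     is_image_token: torch.Tensor) -> torch.Tensor:
    # Elementwise select: image features at image-token positions,
    # text features everywhere else. Unlike masked_scatter, torch.where
    # keeps shapes static, which avoids data-dependent scatter kernels
    # that are slow on accelerators such as HPU.
    return torch.where(is_image_token.unsqueeze(-1), image_embeds, inputs_embeds)

# Toy usage: a 6-token sequence with hidden size 4; tokens 1-2 are image tokens.
text_embeds = torch.zeros(6, 4)
image_embeds = torch.ones(6, 4)  # assumed pre-expanded to the full sequence shape
mask = torch.tensor([False, True, True, False, False, False])
merged = merge_multimodal(text_embeds, image_embeds, mask)
```

Because torch.where is a plain elementwise select over fixed shapes, it also composes cleanly with the bucketed warmup: every bucket shape can be precompiled once and reused.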

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@jiminha jiminha marked this pull request as draft October 14, 2025 14:16
jiminha and others added 6 commits October 15, 2025 11:54
Signed-off-by: Jimin Ha <[email protected]>
Reduces memory usage for long sequences by eliminating dual attention
mask creation. Improves capacity from 150 to 400 images with 8K prompts
by avoiding OOM issues.
Limitation: Only available when block_list is None.
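The memory saving for long sequences comes from the general technique of slicing attention along the query dimension, which can be sketched as below. This is a hedged sketch of the idea only: `sliced_sdpa` and its parameters are illustrative, not the actual PT_HPU_SDPA_QKV_SLICE_MODE_FWD implementation, and it handles only the non-causal, unmasked case.

```python
import torch
import torch.nn.functional as F

def sliced_sdpa(q, k, v, slice_size=128):
    # Process query chunks one at a time against the full K/V so the
    # [seq_len, seq_len] attention-score matrix is never materialized
    # in full; peak memory scales with slice_size rather than seq_len.
    outputs = []
    for start in range(0, q.shape[-2], slice_size):
        q_chunk = q[..., start:start + slice_size, :]
        outputs.append(F.scaled_dot_product_attention(q_chunk, k, v))
    return torch.cat(outputs, dim=-2)

# Usage: matches unsliced SDPA for the non-causal, unmasked case.
torch.manual_seed(0)
q = torch.randn(1, 2, 256, 64)
k = torch.randn(1, 2, 256, 64)
v = torch.randn(1, 2, 256, 64)
out = sliced_sdpa(q, k, v, slice_size=64)
ref = F.scaled_dot_product_attention(q, k, v)
```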

Signed-off-by: Jimin Ha <[email protected]>

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.

@jiminha jiminha marked this pull request as ready for review October 15, 2025 20:17
jiminha commented Oct 15, 2025

@xuechendi Could you review this? It includes model-file and utils changes that are necessary for the Gemma3 model optimization.
@adobrzyn Could you review this? It includes the multimodal warmup and adds the vision bucket to bucketing; it also ports the PT_HPU_SDPA_QKV_SLICE_MODE_FWD feature from V0.

✅ CI Passed

All checks passed successfully against the following vllm commit:
f57438338d819c8e3e7e70293281c575ebd77411

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.
