upgrade flashinfer to v0.4.0rc1 #25315
Conversation
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>
Code Review
This pull request upgrades FlashInfer to version v0.4.0rc1. The version updates in the Dockerfiles and `setup.py` are consistent with this goal. However, there is a critical issue in `vllm/v1/attention/backends/flashinfer.py` where an API call for the new FlashInfer version appears to be only partially updated. This is likely to cause a runtime error. Please see the specific comment for details.
```diff
 try:
-    # Make sure we pass exactly 15 arguments for tensor core version
+    # Make sure we pass exactly 18 arguments for tensor core version
```
While you've correctly updated the internal `_cached_module.plan` call for the new FlashInfer version, the corresponding public `self.plan` method call (at lines 1025-1043) seems to have been missed. This public method is used for the initial warm-up when CUDA graphs are enabled.
If the public `plan` API also changed (which is highly likely given the internal API change), this will cause a `TypeError` at runtime during the warm-up. Please update the call at lines 1025-1043 to include any new arguments. Based on the changes to the internal call, it's likely that arguments such as `window_right` and `allow_fp16_qk_reduction` need to be added.
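The failure mode described here can be sketched with a toy example. The names `plan`, `window_right`, and `allow_fp16_qk_reduction` are taken from this thread, but the signatures below are hypothetical stand-ins, not the real FlashInfer API:

```python
# Hypothetical sketch: when an upgraded API grows new required
# parameters, a call site that was not updated fails with a
# TypeError at runtime (here, during a simulated warm-up).

def plan_v4(batch_size, window_left, window_right,
            allow_fp16_qk_reduction):
    # Stand-in for the new plan signature with two extra arguments.
    return (batch_size, window_left, window_right,
            allow_fp16_qk_reduction)

def warmup(plan_fn, **kwargs):
    # A warm-up path that simply forwards its arguments to plan_fn.
    # If the caller still passes the old argument set, this raises.
    return plan_fn(**kwargs)

# Old-style call against the new signature: missing arguments.
try:
    warmup(plan_v4, batch_size=8, window_left=-1)
except TypeError as exc:
    print("warm-up failed:", exc)

# Updated call forwards the new arguments and succeeds.
result = warmup(plan_v4, batch_size=8, window_left=-1,
                window_right=-1, allow_fp16_qk_reduction=False)
print(result)
```

This is why updating only the internal call path is insufficient: any public entry point that forwards to the changed function must pass the new arguments as well.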
Resolved by #26326
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.