[Kernel] Raise an exception in MoE kernel if the batch size is larger than 65k #5939
Conversation
@@ -392,6 +392,11 @@ def fused_experts(hidden_states: torch.Tensor,
     M, _ = hidden_states.shape
     E, N, _ = w1.shape

+    if M > 65536:
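The diff is truncated here at the new condition. A minimal standalone sketch of what such a guard might look like (the function name and exact error message are assumptions, not the PR's actual code):

```python
import torch


def check_moe_batch_size(hidden_states: torch.Tensor) -> None:
    # Hypothetical standalone version of the guard this PR adds: fail fast
    # with a Python exception instead of letting the Triton kernel hit an
    # illegal memory access on oversized batches.
    M = hidden_states.shape[0]
    if M > 65536:
        raise ValueError(f"The MoE kernel does not support more than "
                         f"65536 tokens, but got {M}.")
```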
We need to catch the second invocation, which is 2x the first, right?
Do we happen to have any clue why this happens? Is it a known limitation in Triton?
The weird thing is that when the batch size is too high, it happens in the first invocation. This is also a puzzle to me...
It's possibly a Triton limitation, but I don't have more time to dive into it...
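For context, the "2x" remark likely refers to the top-k expansion inside fused_experts: the second grouped-GEMM invocation runs over the top-k-expanded intermediate (M * top_k rows), so with top_k = 2 it processes twice as many rows as the first. A rough sketch of a guard covering both invocations under that reading (the names and the 65536 limit are assumptions, not vLLM's exact code):

```python
import torch

ASSUMED_MAX_ROWS = 65536  # assumed per-invocation row limit in the Triton kernel


def guard_both_invocations(hidden_states: torch.Tensor, top_k: int) -> None:
    M = hidden_states.shape[0]
    # The first invocation runs over the M input tokens; the second runs
    # over the top_k-expanded intermediate, i.e. M * top_k rows, so with
    # top_k = 2 it sees twice as many rows as the first.
    for rows, which in ((M, "first"), (M * top_k, "second")):
        if rows > ASSUMED_MAX_ROWS:
            raise ValueError(
                f"The {which} MoE kernel invocation would process {rows} "
                f"rows, exceeding the assumed limit of {ASSUMED_MAX_ROWS}.")
```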
To speed up the CI queue, I've cancelled the distributed tests for the latest CI run in this PR, since they won't pass anyway until #5905 has been merged. Now that it has been merged, please merge.
See #5938 for details.
This PR raises an exception in the MoE kernel when the batch size is too large, since such oversized batches likely cause the illegal memory access.
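For illustration, a hypothetical reproduction of the intended behavior change (building on the check_moe_batch_size sketch above, which is not the PR's exact code): an oversized batch now fails fast with a ValueError rather than crashing later with a CUDA illegal-memory-access error.

```python
import torch

# 70,000 tokens exceeds the 65536-token limit; hidden size 4096 is arbitrary.
hidden_states = torch.randn(70_000, 4096)
try:
    check_moe_batch_size(hidden_states)  # guard sketched earlier
except ValueError as e:
    print(f"Caught before the Triton kernel could crash: {e}")
```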