[Kernel] Raise an exception in MoE kernel if the batch size is larger than 65k #5939

Merged 1 commit into vllm-project:main on Jun 29, 2024

Conversation

comaniac (Collaborator)

See #5938 for details.
This PR raises an exception in the MoE kernel when the batch size is too large, since such large batches likely cause an illegal memory access.
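
For readers skimming the diff below, a minimal standalone sketch of this kind of fail-fast check. The helper name `check_moe_batch_size`, the constant name, and the error message are illustrative; only the 65536 threshold comes from the PR itself:

```python
import torch

# Illustrative threshold taken from this PR; the helper itself is hypothetical.
_MAX_MOE_BATCH = 65536


def check_moe_batch_size(hidden_states: torch.Tensor) -> None:
    """Fail fast if the token count exceeds what the Triton MoE kernel
    is known to handle, instead of hitting an illegal memory access later."""
    num_tokens = hidden_states.shape[0]
    if num_tokens > _MAX_MOE_BATCH:
        raise ValueError(
            f"MoE kernel received {num_tokens} tokens, which exceeds the "
            f"supported maximum of {_MAX_MOE_BATCH}.")


# Example: a small batch of 4 tokens with hidden size 8 passes the check.
check_moe_batch_size(torch.zeros(4, 8))
```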

@@ -392,6 +392,11 @@ def fused_experts(hidden_states: torch.Tensor,
M, _ = hidden_states.shape
E, N, _ = w1.shape

if M > 65536:
Collaborator (inline review comment on the diff):
We need to catch the second invocation which is 2x the first, right?

@WoosukKwon (Collaborator) left a comment:
Do we happen to have any clue why this happens? Is it a known limitation in Triton?

@comaniac (Collaborator, Author) replied:

> We need to catch the second invocation which is 2x the first, right?

The weird thing is that when the batch size is too high, it happens in the first invocation. This is also a puzzle to me.

> Do we happen to have any clue why this happens? Is it a known limitation in Triton?

It's possibly a Triton limitation, but I don't have time to dive into it further.
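
For context on the "2x" above: in the fused MoE path the second kernel invocation presumably operates on one row per (token, selected expert) pair, so its row count is num_tokens * top_k; with top_k = 2 that is twice the first invocation. A back-of-the-envelope sketch under that assumption (all values are made up):

```python
top_k = 2            # assumed top-k routing (e.g. a Mixtral-style MoE)
num_tokens = 40_000  # hypothetical batch size

first_invocation_rows = num_tokens           # one row per token
second_invocation_rows = num_tokens * top_k  # one row per (token, expert) pair

# Only the second invocation crosses the 65536 threshold in this example,
# which is why a check on the first invocation alone could miss it.
print(first_invocation_rows, second_invocation_rows)  # 40000 80000
```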

@DarkLight1337 (Collaborator) commented on Jun 28, 2024:

To speed up the CI queue, I've cancelled the distributed tests for the latest CI run in this PR since they won't pass anyway until #5905 has been merged. Now that it has been merged, please merge main into your branch so that the CI can pass once again.

@DarkLight1337 DarkLight1337 merged commit f7dac83 into vllm-project:main Jun 29, 2024
68 checks passed
llmpros pushed a commit to llmpros/vllm that referenced this pull request Jun 30, 2024
robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request Jul 1, 2024