[Draft] Update benchmark_moe.py to use block wise quant for Deepseek V3 #11574

Draft
wants to merge 1 commit into
base: main

simon-mo
Collaborator

@simon-mo simon-mo commented Dec 28, 2024

Trying to adapt the benchmark script to use block-wise quant fused MoE, based on the test in

def test_w8a8_block_fp8_fused_moe(M, N, K, E, topk, block_size, dtype, seed):

Unfortunately it is not working yet; it fails with an opaque error:

  File "/data/xmo/vllm/benchmarks/kernels/benchmark_moe.py", line 239, in benchmark
    kernel_time = benchmark_config(config, num_tokens, num_experts,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/xmo/vllm/benchmarks/kernels/benchmark_moe.py", line 150, in benchmark_config
    with torch.cuda.graph(graph):
  File "/home/eecs/xmo/miniconda3/envs/vllm-brewster/lib/python3.11/site-packages/torch/cuda/graphs.py", line 186, in __exit__
    self.cuda_graph.capture_end()
  File "/home/eecs/xmo/miniconda3/envs/vllm-brewster/lib/python3.11/site-packages/torch/cuda/graphs.py", line 84, in capture_end
    super().capture_end()
RuntimeError: CUDA error: operation failed due to a previous error during capture


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@simon-mo
Collaborator Author

I actually don't have time to work on this right now, so any help is appreciated. I'm happy to run it once it's working.

@robertgshaw2-redhat
Collaborator

Your issue is that the current script is running the profiling inside a CUDAGraph, but the DeepSeek kernel does not support CUDAGraphs yet.
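
One way to sidestep the capture error is to time the kernel eagerly instead of replaying it from a captured graph. The sketch below is not the code in benchmark_moe.py: the helper name `benchmark_eager` and its parameters are hypothetical, and it uses host wall-clock time around an explicit synchronization point rather than CUDA graph replay.

```python
import time
from typing import Callable


def benchmark_eager(
    run: Callable[[], None],
    sync: Callable[[], None] = lambda: None,
    num_warmup: int = 3,
    num_iters: int = 10,
) -> float:
    """Return the mean latency of `run` in microseconds, without graph capture.

    `run` launches the kernel once; `sync` blocks until all queued device
    work has finished. Both names are hypothetical, for illustration only.
    """
    # Warm up so one-time costs (JIT compilation, autotuning) are not timed.
    for _ in range(num_warmup):
        run()
    sync()

    # Launch all timed iterations, then wait once for the device to drain.
    start = time.perf_counter()
    for _ in range(num_iters):
        run()
    sync()
    elapsed = time.perf_counter() - start

    return elapsed / num_iters * 1e6
```

With PyTorch on a GPU, `sync` would typically be `torch.cuda.synchronize` and `run` a closure around the fused MoE call. Eager timing includes kernel launch overhead that graph replay hides, so the numbers are not directly comparable to ones measured under a CUDA graph.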
