[Draft] Update benchmark_moe.py to use block wise quant for Deepseek V3 #11574

simon-mo · 2024-12-28T01:11:01Z

Trying to adapt the benchmark script based on the block-wise quant fused MoE test in

Line 214 in a607312

def test_w8a8_block_fp8_fused_moe(M, N, K, E, topk, block_size, dtype, seed):
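For readers unfamiliar with the block-wise scheme that test exercises, here is a minimal pure-Python illustration, assuming per-tile absmax scaling onto the float8 e4m3 range (largest finite value 448). Each `[block_n, block_k]` tile of a 2-D weight gets one scale, so an `N x K` weight needs `ceil(N / block_n) x ceil(K / block_k)` scales. `block_scales` and `scale_to_fp8_range` are hypothetical helpers written for this sketch; they are not vLLM's kernel code, and the real path also casts to fp8 (with rounding), which is omitted here.

```python
# Illustration only: per-block absmax scaling in pure Python (not vLLM's kernel code).
FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3fn value


def block_scales(w, block_n, block_k):
    """One scale per [block_n, block_k] tile of the 2-D weight `w` (a list of rows)."""
    n, k = len(w), len(w[0])
    n_tiles = (n + block_n - 1) // block_n  # ceil(N / block_n); last tile may be partial
    k_tiles = (k + block_k - 1) // block_k  # ceil(K / block_k)
    scales = []
    for i in range(n_tiles):
        row = []
        for j in range(k_tiles):
            amax = max(
                abs(w[r][c])
                for r in range(i * block_n, min((i + 1) * block_n, n))
                for c in range(j * block_k, min((j + 1) * block_k, k))
            )
            # Map the tile's largest magnitude onto the fp8 range; guard all-zero tiles.
            row.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
        scales.append(row)
    return scales


def scale_to_fp8_range(w, scales, block_n, block_k):
    """Divide each element by its tile's scale (the step before the actual fp8 cast)."""
    return [
        [w[r][c] / scales[r // block_n][c // block_k] for c in range(len(w[0]))]
        for r in range(len(w))
    ]


if __name__ == "__main__":
    w = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
    s = block_scales(w, block_n=2, block_k=4)
    print(len(s), len(s[0]))  # 2 2  (a 3 x 5 weight with [2, 4] blocks has 2 x 2 tiles)
```

Partial tiles at the edges still get their own scale, which is why the tile counts use ceiling division.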

Unfortunately, it is not working yet; it fails with an opaque error:

  File "/data/xmo/vllm/benchmarks/kernels/benchmark_moe.py", line 239, in benchmark
    kernel_time = benchmark_config(config, num_tokens, num_experts,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/xmo/vllm/benchmarks/kernels/benchmark_moe.py", line 150, in benchmark_config
    with torch.cuda.graph(graph):
  File "/home/eecs/xmo/miniconda3/envs/vllm-brewster/lib/python3.11/site-packages/torch/cuda/graphs.py", line 186, in __exit__
    self.cuda_graph.capture_end()
  File "/home/eecs/xmo/miniconda3/envs/vllm-brewster/lib/python3.11/site-packages/torch/cuda/graphs.py", line 84, in capture_end
    super().capture_end()
RuntimeError: CUDA error: operation failed due to a previous error during capture

github-actions · 2024-12-28T01:11:15Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of the following:

Add the ready label to the PR.
Enable auto-merge.

🚀

simon-mo · 2024-12-28T01:11:20Z

I actually don't have time to work on this right now, so any help is appreciated. I'm happy to run it once it's working.

robertgshaw2-redhat · 2024-12-28T15:09:40Z

Your issue is that the current script runs the profiling inside a CUDA graph, but the DeepSeek kernel does not support CUDA graphs yet.
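A minimal sketch of that workaround, assuming the benchmark decides up front whether to replay a captured graph or run the kernel eagerly, instead of always capturing. `time_kernel`, `use_graph`, and `capture` are hypothetical names, not part of benchmark_moe.py. In the real script, `capture` would record the kernel with `torch.cuda.graph` and return the graph's `replay`, and timing would use CUDA events plus synchronization rather than `time.perf_counter`.

```python
import time
from typing import Callable, Optional, Tuple

Kernel = Callable[[], None]


def time_kernel(
    run: Kernel,
    use_graph: bool,
    capture: Optional[Callable[[Kernel], Kernel]] = None,
    iters: int = 10,
) -> Tuple[float, str]:
    """Return (average seconds per call, mode) for `run`.

    `capture` is a stand-in for recording `run` into a CUDA graph and
    returning a replay function; it is only used when `use_graph` is True.
    """
    if use_graph:
        if capture is None:
            raise ValueError("use_graph=True requires a capture function")
        replay, mode = capture(run), "graph"
    else:
        # Kernels that cannot be captured (the block-wise path here) run eagerly.
        replay, mode = run, "eager"

    replay()  # warm-up call, not timed
    start = time.perf_counter()
    for _ in range(iters):
        replay()
    elapsed = time.perf_counter() - start
    return elapsed / iters, mode
```

Choosing the mode before capture, rather than catching the error afterwards, is deliberate: the failure in the traceback only surfaces at `capture_end()`, and a failed capture may leave the CUDA context in an error state, so an eager retry in the same process is not guaranteed to work.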

wip

bf2b0ac
