benchmark performance #5

BaiStone2017 · 2024-11-28T08:25:10Z

In https://github.com/kvcache-ai/mooncake/blob/main/doc/en/vllm_benchmark_results.md:

Was the "Non-disaggregated" performance measured using 2 A10 GPUs?

ShangmingCai · 2024-11-29T02:42:39Z

> In https://github.com/kvcache-ai/mooncake/blob/main/doc/en/vllm_benchmark_results.md:
>
> Was the "Non-disaggregated" performance measured using 2 A10 GPUs?

Currently, the non-disaggregated baseline is run on 1 A10. The goal is to compare TTFT latency and verify the feasibility of inter-node disaggregated designs. To fairly compare the total throughput of non-disaggregated and disaggregated designs, we need to run experiments under specific prefill/decode workloads that fully utilize the prefill node. However, we have not yet found a way to fairly compare 2 non-disaggregated instances against 1 prefill + 1 decode instance without running into OOM.

According to the author of PR 8498:

"for disagg prefill it will have lower throughput compared to chunked prefill if the prefill workload / decode workload doesn’t match # of prefill GPUs / # of decode GPUs. In my current implementation, the # of prefill GPU / # of decode GPU is 1:1, but the prefill workload / decode workload is typically a really small number (roughly 0.1 IIRC)."

After we solve the tensor-parallel (TP) problem, we will run a series of experiments with different prefill-to-decode GPU ratios. If you are interested, you can also join vLLM's Slack channel on prefill disaggregation to get the latest updates.
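
The mismatch described in the quote can be illustrated with a toy model. The sketch below is not taken from vLLM or Mooncake; the function names are hypothetical, and it assumes work is perfectly divisible across GPUs, ignores KV cache transfer cost, and treats the slower stage as the bottleneck. It only shows why a 1:1 GPU split leaves the prefill GPU mostly idle when the prefill/decode workload ratio is around 0.1.

```python
def stage_utilization(prefill_work, decode_work, prefill_gpus, decode_gpus):
    """Return (prefill_util, decode_util), each in (0, 1].

    Toy model (an assumption, not vLLM's scheduler): each stage's load is
    its work divided by its GPU count, and the more loaded stage is
    saturated while the other one waits.
    """
    if min(prefill_work, decode_work) <= 0:
        raise ValueError("workloads must be positive")
    if min(prefill_gpus, decode_gpus) <= 0:
        raise ValueError("GPU counts must be positive")
    prefill_load = prefill_work / prefill_gpus
    decode_load = decode_work / decode_gpus
    bottleneck = max(prefill_load, decode_load)
    return prefill_load / bottleneck, decode_load / bottleneck


def relative_throughput(prefill_work, decode_work, prefill_gpus, decode_gpus):
    """Throughput of the split relative to an idealized setup in which
    every GPU shares all work evenly (also an assumption of this sketch)."""
    total_gpus = prefill_gpus + decode_gpus
    ideal_time = (prefill_work + decode_work) / total_gpus
    split_time = max(prefill_work / prefill_gpus, decode_work / decode_gpus)
    return ideal_time / split_time


if __name__ == "__main__":
    # Workload ratio of roughly 0.1 with a 1:1 GPU split, as in the quote.
    p_util, d_util = stage_utilization(0.1, 1.0, 1, 1)
    print(f"prefill utilization: {p_util:.2f}, decode utilization: {d_util:.2f}")
    print(f"relative throughput: {relative_throughput(0.1, 1.0, 1, 1):.2f}")
```

Under these assumptions, the prefill GPU is busy about 10% of the time, and the split reaches a little over half of the idealized throughput. Both stages are fully used only when the GPU ratio matches the workload ratio, which is why experiments with different GPU ratios are needed for a fair throughput comparison.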

Edenzzzz · 2024-11-29T05:02:12Z

Are there benchmark comparisons against NCCL?

ShangmingCai · 2024-11-29T08:42:10Z

> Are there benchmark comparisons against NCCL?

We currently cannot obtain inter-node disaggregated results with NCCL based on PR 8498, because its parallel_state initialization of disagg_group conflicts with vLLM's process_group. This could be fixed with the help of PR 10072, which has already been merged. We will provide more results once we finish integrating mooncake_transfer_engine with PR 10502.
