benchmark performance #5
Comments
Currently, the non-disaggregated benchmark is run on 1 A10 to measure and compare TTFT latency and to verify the feasibility of the inter-node disaggregated design. To fairly compare the total throughput of the non-disaggregated and disaggregated designs, we need to run experiments under specific prefill/decode workloads that fully utilize the prefill node. However, we have not yet found a way to fairly compare 2 non-disaggregated instances against 1 prefill + 1 decode instance without running into OOM.
After we solve the TP problem, we will run a series of experiments with different GPU ratios. If you are interested, you can also join vLLM's Slack channel on prefill disaggregation to get the latest updates.
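To make the fairness concern concrete, here is a minimal sketch of how the two metrics discussed here (TTFT and total throughput) could be normalized by GPU count before comparing setups. The `RequestTrace` record and both helper functions are hypothetical and are not part of Mooncake or vLLM; the sketch assumes you have already collected per-request timestamps yourself.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestTrace:
    """Per-request timings in seconds (hypothetical record format)."""
    sent_at: float          # when the request was submitted
    first_token_at: float   # when the first output token arrived
    finished_at: float      # when the last output token arrived
    output_tokens: int      # number of generated tokens


def mean_ttft(traces):
    """Average time to first token across all requests."""
    return mean(t.first_token_at - t.sent_at for t in traces)


def throughput_per_gpu(traces, num_gpus):
    """Output tokens per second over the whole run, divided by GPU count."""
    start = min(t.sent_at for t in traces)
    end = max(t.finished_at for t in traces)
    total_tokens = sum(t.output_tokens for t in traces)
    return total_tokens / (end - start) / num_gpus
```

Dividing by `num_gpus` is one simple way to put a 1-GPU baseline and a 1 prefill + 1 decode setup on the same scale. It does not address the workload-balance issue described above, so the result is only meaningful when the prefill node is kept busy.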
Are there benchmark comparisons against NCCL?
We are currently unable to obtain inter-node disaggregated results with NCCL based on PR 8498 due to its
In https://github.com/kvcache-ai/mooncake/blob/main/doc/en/vllm_benchmark_results.md, does the "Non-disaggregated" performance result use 2 A10 GPUs?