Has anyone compared inference performance against vLLM? #72

Open
white-wolf-tech opened this issue Feb 2, 2024 · 2 comments

@white-wolf-tech

What I found in my experiments:
Without concurrency, at tp=1, tp=2, and tp=4, TensorRT-LLM inference is consistently faster than vLLM.

With concurrency enabled, vLLM runs with async IO and continuous batching, and the TensorRT-LLM engine was likewise built with inflight_batching.
Compared head-to-head, TensorRT-LLM is absurdly slow.
I opened an issue with the detailed comparison data:
NVIDIA/TensorRT-LLM#965
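
For reference, a minimal sketch (not from the benchmark in this thread) of the vLLM side of such a concurrent-throughput measurement, using vLLM's offline API. The model name, prompt, request count, and sampling settings are placeholder assumptions; submitting many prompts in one call lets vLLM's scheduler interleave them via continuous batching, and tensor_parallel_size mirrors the tp settings above.

```python
# Minimal vLLM throughput sketch; the model name, prompt, and request
# count are placeholders, not values from the benchmark in this thread.
import time

from vllm import LLM, SamplingParams

# tensor_parallel_size corresponds to the tp=1/2/4 settings compared above.
llm = LLM(model="meta-llama/Llama-2-7b-hf", tensor_parallel_size=2)

# Submitting many prompts at once simulates concurrent requests;
# vLLM schedules them internally with continuous batching.
prompts = ["Explain continuous batching in one paragraph."] * 256
params = SamplingParams(temperature=0.8, max_tokens=128)

start = time.time()
outputs = llm.generate(prompts, params)
elapsed = time.time() - start

total_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{total_tokens / elapsed:.1f} generated tokens/s over {len(prompts)} requests")
```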

@liyunhan

liyunhan commented Apr 8, 2024

@x-transformers Which one is better in terms of output quality? Some TensorRT-LLM evaluations I've seen show a drop of 1–2 points.

@white-wolf-tech
Author

> @x-transformers Which one is better in terms of output quality? Some TensorRT-LLM evaluations I've seen show a drop of 1–2 points.

You could try the latest TensorRT-LLM; this seems to have been fixed.
