Benchmark torchao and torch.compile (need torch 2.5) #1874
Comments
@jerryzh168 When this configuration is used for …
When I tried to follow your instructions, I got this error: `[2024-11-03 00:41:08 TP0] Init torch distributed begin.` …
@CortexEdgeUser that seems reasonable, since https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/model_loader/loader.py does not have …
I used a commit from before the changes and it worked well; however, I'm encountering issues with tp > 1, though this also happens with vLLM.
vLLM recently updated to PyTorch 2.5, so we can now benchmark torchao together with torch.compile (this was previously blocked by the 2.5 requirement).
Install the most recent vLLM nightly:
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
Make some small modifications to sglang, as described in this gist:
https://gist.github.com/jerryzh168/bd65f122f24d5c92525f2504a1ff5870
Then install sglang from source.
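Since the whole setup hinges on the PyTorch 2.5 requirement, it may be worth verifying the installed version before benchmarking. A small helper sketch (not part of the issue's instructions; the version-parsing logic is my own):

```python
def meets_min_version(version: str, minimum=(2, 5)) -> bool:
    """Check a version string like '2.5.0+cu121' against a minimum (major, minor)."""
    core = version.split("+")[0]  # drop local build suffix, e.g. '+cu121'
    parts = []
    for piece in core.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break  # stop at non-numeric segments such as 'dev20241001'
    return tuple(parts) >= minimum

# Usage (requires torch installed):
# import torch
# assert meets_min_version(torch.__version__), "torchao + torch.compile needs torch >= 2.5"
```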
Benchmark
int8wo no compile:
python3 -m sglang.bench_latency --model meta-llama/Meta-Llama-3-8B --batch-size 1 --input 128 --output 8 --torchao-config int8wo
int8wo with compile:
python3 -m sglang.bench_latency --model meta-llama/Meta-Llama-3-8B --batch-size 1 --input 128 --output 8 --torchao-config int8wo --enable-torch-compile
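For readers unfamiliar with the `int8wo` config: it applies int8 weight-only quantization, where weights are stored as int8 with a per-channel scale and dequantized on the fly. A rough pure-Python illustration of the idea (this is my own sketch, not torchao's actual implementation, which uses fused CUDA kernels):

```python
def quantize_int8_per_channel(weights):
    """Symmetric per-channel int8 quantization.

    weights: list of rows (one row per output channel).
    Returns (quantized int8 rows, per-row scales).
    """
    q_rows, scales = [], []
    for row in weights:
        amax = max(abs(w) for w in row)
        scale = amax / 127 if amax else 1.0  # map [-amax, amax] onto [-127, 127]
        q_rows.append([max(-128, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Recover approximate float weights: w ≈ q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
qw, s = quantize_int8_per_channel(w)
dw = dequantize(qw, s)
# Per-channel extrema round-trip exactly; other values are off by at most scale/2.
```

Weight-only schemes like this keep activations in the original dtype, so accuracy loss is typically small while the weight memory footprint drops 2-4x.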