
Triton's GPU memory usage is twice that of TensorRT-LLM #51

Open

lyc728 opened this issue Dec 22, 2023 · 20 comments

lyc728 commented Dec 22, 2023

I'm testing Qwen-72B built with --weight_only_precision int4, loaded across 4 GPUs at about 12 GB each. But when Triton runs inference, each GPU uses around 28 GB. Why is the gap so large?

Tlntin (Owner) commented Dec 22, 2023

Once in-flight batching is enabled on the Triton side, it deliberately grabs as much GPU memory as it can to support higher concurrency.
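
(For reference: newer tensorrtllm_backend versions expose a config.pbtxt parameter that caps how much free GPU memory the KV-cache pool may claim. A hedged sketch in this thread's config.pbtxt style; the exact key and its availability depend on your backend version, so verify against your config.pbtxt template:)

parameters: {
  key: "kv_cache_free_gpu_mem_fraction"
  value: {
    # Fraction of free GPU memory the KV-cache pool may claim;
    # lowering it trades peak concurrency for a smaller footprint.
    string_value: "0.5"
  }
}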

lyc728 (Author) commented Dec 22, 2023

But I've set it to V1 here; does that still use in-flight batching?
parameters: {
key: "gpt_model_type"
value: {
#string_value: "inflight_fused_batching"
string_value: "V1"
}
}

Tlntin (Owner) commented Dec 22, 2023

Did you enable in-flight batching when you built the engine? If so, try turning it off, and try setting batch-size to 1 at build time. If that still doesn't help, you can ask on the official Triton repo.
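
(For illustration, a rebuild along these lines; the flag names follow the TensorRT-LLM qwen example's build.py of that era and should be checked against your local script:)

# Sketch: rebuild without in-flight batching support and with batch size 1.
# Drop --use_inflight_batching if you had added it, and cap the batch size.
python build.py --hf_model_dir /data/llms/Qwen-72B-Chat/ \
  --dtype float16 \
  --use_weight_only \
  --weight_only_precision int4 \
  --max_batch_size 1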

lyc728 (Author) commented Dec 23, 2023

I'm using two GPUs, but the port ends up already in use:
root@4f2bb9ebf657:/models/tensorrtllm_backend/Qwen-TensorRT/qwen# mpirun -n 2 --allow-run-as-root python api.py
Loading engine from /models/tensorrtllm_backend/tensorrt_llm/examples/Qwen-72B_16k/trt_engines/qwen_float16_tp2_rank0.engine
Loading engine from /models/tensorrtllm_backend/tensorrt_llm/examples/Qwen-72B_16k/trt_engines/qwen_float16_tp2_rank1.engine
INFO: Started server process [22325]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: Started server process [22326]
INFO: Waiting for application startup.
INFO: Application startup complete.
ERROR: [Errno 98] error while attempting to bind on address ('0.0.0.0', 8000): address already in use
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.

Tlntin (Owner) commented Dec 23, 2023

api.py doesn't support multi-GPU 😅
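
(A minimal sketch of how a multi-rank entry point could avoid the port clash, assuming mpi4py, fastapi, and uvicorn are installed; this is not the repo's actual api.py, just the shape of a fix:)

# Sketch: only MPI rank 0 binds the HTTP port, so `mpirun -n 2` no longer
# starts two uvicorn servers on the same address.
from mpi4py import MPI
import uvicorn
from fastapi import FastAPI

app = FastAPI()

if MPI.COMM_WORLD.Get_rank() == 0:
    # Rank 0 serves HTTP; it would also need to broadcast requests to peers.
    uvicorn.run(app, host="0.0.0.0", port=8000)
else:
    # Non-zero ranks stay alive for tensor-parallel inference work.
    MPI.COMM_WORLD.Barrier()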

lyc728 (Author) commented Dec 25, 2023

Have you run into this problem? It runs fine from the terminal, but errors out when launched from the IDE.
I also added the environment variables LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/mpi/lib;OPAL_PREFIX=/opt/hpcx/ompi


Sorry! You were supposed to get help about:
mpi_init:startup:internal-failure
But I couldn't open the help file:
/build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/help-mpi-runtime.txt: No such file or directory. Sorry!

*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[4f2bb9ebf657:67032] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

Tlntin (Owner) commented Dec 25, 2023

Did you build trt-llm manually?

lyc728 (Author) commented Dec 25, 2023

No, I just followed your blog post: pip install git+https://github.com/NVIDIA/TensorRT-LLM.git@release/0.5.0

Tlntin (Owner) commented Dec 25, 2023

Oh, I see. This is probably caused by an MPI library upgrade. You can try the following:

  • Install the MPI library manually:
apt update
apt install libopenmpi-dev
pip install https://github.com/Shixiaowei02/mpi4py/tarball/fix-setuptools-version
  • Then install TensorRT-LLM as above:
pip install git+https://github.com/NVIDIA/TensorRT-LLM.git@release/0.5.0

lyc728 (Author) commented Dec 25, 2023

But it works from my terminal; it only fails in VS Code, so now I suspect it's the environment variables.
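
(If it is the environment, one hedged workaround, assuming a bash shell inside the container, is to persist the variables so the shell VS Code spawns inherits them:)

# Sketch: persist the MPI paths quoted earlier so every new shell,
# including VS Code's integrated terminal, picks them up.
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/mpi/lib' >> ~/.bashrc
echo 'export OPAL_PREFIX=/opt/hpcx/ompi' >> ~/.bashrc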

Tlntin (Owner) commented Dec 25, 2023

Ah, OK. Then I'm not sure either.

lyc728 (Author) commented Dec 26, 2023

trt-llm 0.6.1 with the main-branch qwen-trt-llm. For the build I used:

python build.py --hf_model_dir /data/llms/Qwen-72B-Chat/ \
  --dtype float16 \
  --remove_input_padding \
  --use_gpt_attention_plugin float16 \
  --use_gemm_plugin float16 \
  --enable_context_fmha \
  --use_weight_only \
  --rotary_base 1000000 \
  --weight_only_precision int4 \
  --output_dir /data/qwen_test/examples/qwen/out_engine_72B_2gpu_12/ \
  --world_size 2 \
  --tp_size 2

I ran run.py unmodified: mpirun -n 2 --allow-run-as-root python3 run.py
Now it fails with the errors below. Have you seen this?
(screenshots: 企业微信截图_17035551832733, 企业微信截图_17035552172418)

Tlntin (Owner) commented Dec 26, 2023

You can try 0.5.0; 0.6.1 hasn't been tested thoroughly yet and has some hidden issues.

lyc728 (Author) commented Dec 26, 2023

Hi, could you please confirm that these two branches have been aligned and fully tested?
(screenshots: 企业微信截图_17035562317727, 企业微信截图_17035562502026)

Tlntin (Owner) commented Dec 26, 2023

Yes, they're aligned. 0.6.1 doesn't look like a stable release yet, and there is no matching Triton build for it. The latest Triton has jumped trt-llm straight to 0.7.0, which frankly has some pitfalls, so I'd suggest staying on 0.5.0 for now.

lyc728 (Author) commented Dec 26, 2023

Hi, one more thing to confirm: with NVIDIA/TensorRT-LLM v0.6.0 and above versus your 0.5.0 version, the TRT inference results differ. Do you know what's different between them?

Tlntin (Owner) commented Dec 26, 2023

It's most likely a parameter-configuration issue. I changed the default parameters once on my side and aligned them with the original model. See the corresponding commit.
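
(To make "aligned defaults" concrete, a hedged Python sketch of the sampling parameters that usually have to match before outputs from two engine versions can be compared; the field names mirror common TensorRT-LLM SamplingConfig fields and are assumptions, not this repo's exact defaults:)

# Sketch: with greedy decoding and neutral penalties, any remaining output
# difference points to the build itself (weights, plugins, rotary_base)
# rather than sampling randomness.
aligned_sampling = dict(
    temperature=1.0,         # 1.0 = no temperature scaling
    top_k=1,                 # greedy, deterministic decoding
    top_p=0.0,               # disabled while top_k=1
    repetition_penalty=1.0,  # neutral
    max_new_tokens=512,      # keep identical across both versions
)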

lyc728 (Author) commented Dec 26, 2023

OK, I'll give it another try.

Tlntin (Owner) commented Dec 29, 2023

I took another look at your run command. Your engine output path is /data/qwen_test/examples/qwen/out_engine_72B_2gpu_12/ and your model path is /data/llms/Qwen-72B-Chat/, both different from the defaults I set. So you can't run run.py as-is; you need to pass run.py the HF tokenizer path and the TRT engine path, otherwise it falls back to the defaults and fails.
I just tested the latest main branch locally and it works, so this is a usage issue on your side.
I suggest changing the run.py command to:

mpirun -n 2 --allow-run-as-root python3 run.py --engine_dir /data/qwen_test/examples/qwen/out_engine_72B_2gpu_12/ \
  --tokenizer_dir /data/llms/Qwen-72B-Chat/

Tlntin closed this as completed Dec 29, 2023
Tlntin (Owner) commented Jun 12, 2024

@lyc728 After testing the latest trt-llm 0.10.0 with its matching tritonserver, the excessive GPU memory usage problem is fixed. You can try it; usage is basically the same as the current 0.8.0.

Tlntin reopened this Jun 12, 2024