build trtllm very slow and raise an error #2469

Open · 2 of 4 tasks
anaivebird opened this issue Nov 20, 2024 · 4 comments
Labels: bug (Something isn't working) · triaged (Issue has been triaged by maintainers)

anaivebird commented Nov 20, 2024

System Info

  • GPU: NVIDIA H100 80G
  • TensorRT-LLM branch: main
  • TensorRT-LLM commit: 535c9cc

Who can help?

@byshiue @Superjomn

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

apt-get update && apt-get -y install git git-lfs
git lfs install

git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs pull
BUILD_WHEEL_ARGS="--trt_root /usr/local/tensorrt --python_bindings --benchmarks --cuda_architectures 90 -j8" python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}

Expected behavior

The build should complete in less than one hour.

actual behavior

The build is slow, taking more than two hours.


[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8GroupwiseColumnMajorFalse.cu.o
[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8GroupwiseColumnMajorInterleavedTrue.cu.o
[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8PerChannelColumnMajorFalse.cu.o
[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8PerChannelColumnMajorInterleavedTrue.cu.o
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/quantization.cu(280): warning #780-D: reference is to variable "i" (declared at line 260) -- under old for-init scoping rules it would have been variable "i" (declared at line 265)
              smemBuffer[i] = vec;
                         ^
          detected during instantiation of "void tensorrt_llm::kernels::invokePerTokenQuantization(QuantT *, const T *, int64_t, int64_t, const float *, float *, float *, tensorrt_llm::common::QuantMode, cudaStream_t) [with T=float, QuantT=int8_t]" at line 354

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8PerChannelColumnMajorTrue.cu.o
[100%] Built target context_attention_src
[100%] Linking CUDA device code CMakeFiles/cutlass_src.dir/cmake_device_link.o
[100%] Linking CXX static library libcutlass_src.a
[100%] Built target cutlass_src
[100%] Linking CUDA device code CMakeFiles/gemm_swiglu_sm90_src.dir/cmake_device_link.o
[100%] Linking CUDA static library libgemm_swiglu_sm90_src.a
[100%] Built target gemm_swiglu_sm90_src
[100%] Built target selective_scan_src

additional notes

There are lots of compiler processes running ptxas -arch sm_80, which is unrelated to sm90, even though I passed --cuda_architectures 90.

ps.txt (attached process list)
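
As a sanity check, here is a minimal sketch for confirming which GPU architectures the build is actually generating code for. It assumes the default build directory used by build_wheel.py (cpp/build) and that the architecture variable is cached; adjust the paths if your build lands elsewhere.

# Show the architectures CMake was configured with, if the variable is in the cache.
grep CMAKE_CUDA_ARCHITECTURES cpp/build/CMakeCache.txt

# While the build is running, count the architectures ptxas is targeting,
# to confirm whether sm_80 invocations really appear despite --cuda_architectures 90.
ps -eo args | grep -o -- '-arch[= ]sm_[0-9]*' | sort | uniq -c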

anaivebird added the bug (Something isn't working) label on Nov 20, 2024
anaivebird (Author)

It also raised an error:

[  0%] Generating .check_symbol
[  0%] Generating .check_symbol_executor
[  0%] Generating .check_symbol_internal_cutlass_kernels
[  0%] Built target gemm_swiglu_sm90_src
[  0%] Built target fb_gemm_src
[  0%] Built target check_symbol
[  0%] Built target check_symbol_executor
[  0%] Built target check_symbol_internal_cutlass_kernels
[  0%] Built target cutlass_src
[  1%] Built target selective_scan_src
[  2%] Built target common_src
[  2%] Built target layers_src
[  3%] Built target moe_gemm_src
[  4%] Built target fpA_intB_gemm_src
[  4%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/tllmRuntime.cpp.o
[  5%] Built target decoder_attention
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp: In function ‘void {anonymous}::setWeightStreaming(nvinfer1::ICudaEngine&, float)’:
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:113:16: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘setWeightStreamingBudgetV2’; did you mean ‘setWeightStreamingBudget’?
  113 |         engine.setWeightStreamingBudgetV2(budget);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                setWeightStreamingBudget
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp: In constructor ‘tensorrt_llm::runtime::TllmRuntime::TllmRuntime(const tensorrt_llm::runtime::RawEngine&, nvinfer1::ILogger*, float, bool)’:
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:242:41: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getDeviceMemorySizeV2’; did you mean ‘getDeviceMemorySize’?
  242 |     auto const devMemorySize = mEngine->getDeviceMemorySizeV2();
      |                                         ^~~~~~~~~~~~~~~~~~~~~
      |                                         getDeviceMemorySize
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp: In member function ‘nvinfer1::IExecutionContext& tensorrt_llm::runtime::TllmRuntime::addContext(int32_t)’:
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:284:13: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setDeviceMemoryV2’; did you mean ‘setDeviceMemory’?
  284 |     context.setDeviceMemoryV2(mEngineBuffer->data(), static_cast<int64_t>(mEngineBuffer->getCapacity()));
      |             ^~~~~~~~~~~~~~~~~
      |             setDeviceMemory
gmake[3]: *** [tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/build.make:527: tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/tllmRuntime.cpp.o] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:1935: tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/all] Error 2
gmake[2]: *** Waiting for unfinished jobs....
[ 23%] Built target decoder_attention_src
[ 63%] Built target kernels_src
[ 98%] Built target context_attention_src
gmake[1]: *** [CMakeFiles/Makefile2:1537: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2
gmake: *** [Makefile:218: tensorrt_llm] Error 2
Traceback (most recent call last):
  File "/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/scripts/build_wheel.py", line 434, in <module>
    main(**vars(args))
  File "/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/scripts/build_wheel.py", line 208, in main
    build_run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 192 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings   executorWorker  ' returned non-zero exit status 2.
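
The missing setWeightStreamingBudgetV2 / getDeviceMemorySizeV2 / setDeviceMemoryV2 members appear to be newer TensorRT 10.x additions, so this error usually points at the headers under --trt_root being older than what the main branch expects. A quick way to check which TensorRT the build is picking up, under the assumption that /usr/local/tensorrt matches the --trt_root value in the reproduction above (adjust if yours differs):

# Print the TensorRT version macros from the headers passed via --trt_root.
grep -E '#define NV_TENSORRT_(MAJOR|MINOR|PATCH)' /usr/local/tensorrt/include/NvInferVersion.h

# Also check the installed Python package, if any.
python3 -c "import tensorrt; print(tensorrt.__version__)"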

anaivebird changed the title from "build trtllm very slow" to "build trtllm very slow and raise an error" on Nov 20, 2024
byshiue (Collaborator) commented Nov 20, 2024

Thank you for reporting the issue. It is a bug at https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/CMakeLists.txt#L25. We should replace SRC_CU with SRC_CPP. We will fix it ASAP.
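
For anyone who wants to try the change locally before the fix lands, a minimal sketch based purely on the description above (check what line 25 contains in your checkout first, since line numbers drift on main):

# Inspect the area around line 25 of the kernels CMakeLists.txt.
sed -n '20,30p' cpp/tensorrt_llm/kernels/CMakeLists.txt

# Apply the replacement described above (SRC_CU -> SRC_CPP on line 25), then rebuild.
sed -i '25s/SRC_CU/SRC_CPP/' cpp/tensorrt_llm/kernels/CMakeLists.txt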

anaivebird (Author)

Which problem will that change fix: the slow compilation or the build error?

hello-11 added the triaged (Issue has been triaged by maintainers) label on Nov 20, 2024
byshiue (Collaborator) commented Nov 20, 2024

It fixes the slow compilation.

For the error, please create another issue if it is not related to the one above.
