build trtllm very slow and raise an error #2469

Open · 2 of 4 tasks
anaivebird opened this issue Nov 20, 2024 · 4 comments
Labels: bug (Something isn't working) · triaged (Issue has been triaged by maintainers)

anaivebird commented Nov 20, 2024

System Info

  • GPU: NVIDIA H100 80G
  • TensorRT-LLM branch: main
  • TensorRT-LLM commit: 535c9cc

Who can help?

@byshiue @Superjomn

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

apt-get update && apt-get -y install git git-lfs
git lfs install

git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs pull
BUILD_WHEEL_ARGS="--trt_root /usr/local/tensorrt --python_bindings --benchmarks --cuda_architectures 90 -j8" python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}

Expected behavior

The build should complete in less than one hour.

actual behavior

The build is slow, taking more than two hours.


[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8GroupwiseColumnMajorFalse.cu.o
[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8GroupwiseColumnMajorInterleavedTrue.cu.o
[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8PerChannelColumnMajorFalse.cu.o
[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8PerChannelColumnMajorInterleavedTrue.cu.o
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/kernels/quantization.cu(280): warning #780-D: reference is to variable "i" (declared at line 260) -- under old for-init scoping rules it would have been variable "i" (declared at line 265)
              smemBuffer[i] = vec;
                         ^
          detected during instantiation of "void tensorrt_llm::kernels::invokePerTokenQuantization(QuantT *, const T *, int64_t, int64_t, const float *, float *, float *, tensorrt_llm::common::QuantMode, cudaStream_t) [with T=float, QuantT=int8_t]" at line 354

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

[100%] Building CUDA object tensorrt_llm/kernels/CMakeFiles/kernels_src.dir/weightOnlyBatchedGemv/kernelDispatcherFp16Int8PerChannelColumnMajorTrue.cu.o
[100%] Built target context_attention_src
[100%] Linking CUDA device code CMakeFiles/cutlass_src.dir/cmake_device_link.o
[100%] Linking CXX static library libcutlass_src.a
[100%] Built target cutlass_src
[100%] Linking CUDA device code CMakeFiles/gemm_swiglu_sm90_src.dir/cmake_device_link.o
[100%] Linking CUDA static library libgemm_swiglu_sm90_src.a
[100%] Built target gemm_swiglu_sm90_src
[100%] Built target selective_scan_src

additional notes

There are lots of compiler processes running ptxas -arch sm_80, which is unrelated to sm90, even though I passed --cuda_architectures 90.

ps.txt (attached process list)
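
As a sanity check, here is a minimal sketch for confirming which GPU architectures the build is actually generating code for. It assumes the default build directory used by build_wheel.py (cpp/build) and that the architecture variable is cached; adjust the paths if your build lands elsewhere.

# Show the architectures CMake was configured with, if the variable is in the cache.
grep CMAKE_CUDA_ARCHITECTURES cpp/build/CMakeCache.txt

# While the build is running, count the architectures ptxas is targeting,
# to confirm whether sm_80 invocations really appear despite --cuda_architectures 90.
ps -eo args | grep -o -- '-arch[= ]sm_[0-9]*' | sort | uniq -c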

anaivebird added the bug (Something isn't working) label on Nov 20, 2024
anaivebird (Author)

It also raised an error:

[  0%] Generating .check_symbol
[  0%] Generating .check_symbol_executor
[  0%] Generating .check_symbol_internal_cutlass_kernels
[  0%] Built target gemm_swiglu_sm90_src
[  0%] Built target fb_gemm_src
[  0%] Built target check_symbol
[  0%] Built target check_symbol_executor
[  0%] Built target check_symbol_internal_cutlass_kernels
[  0%] Built target cutlass_src
[  1%] Built target selective_scan_src
[  2%] Built target common_src
[  2%] Built target layers_src
[  3%] Built target moe_gemm_src
[  4%] Built target fpA_intB_gemm_src
[  4%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/tllmRuntime.cpp.o
[  5%] Built target decoder_attention
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp: In function ‘void {anonymous}::setWeightStreaming(nvinfer1::ICudaEngine&, float)’:
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:113:16: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘setWeightStreamingBudgetV2’; did you mean ‘setWeightStreamingBudget’?
  113 |         engine.setWeightStreamingBudgetV2(budget);
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                setWeightStreamingBudget
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp: In constructor ‘tensorrt_llm::runtime::TllmRuntime::TllmRuntime(const tensorrt_llm::runtime::RawEngine&, nvinfer1::ILogger*, float, bool)’:
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:242:41: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getDeviceMemorySizeV2’; did you mean ‘getDeviceMemorySize’?
  242 |     auto const devMemorySize = mEngine->getDeviceMemorySizeV2();
      |                                         ^~~~~~~~~~~~~~~~~~~~~
      |                                         getDeviceMemorySize
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp: In member function ‘nvinfer1::IExecutionContext& tensorrt_llm::runtime::TllmRuntime::addContext(int32_t)’:
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:284:13: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setDeviceMemoryV2’; did you mean ‘setDeviceMemory’?
  284 |     context.setDeviceMemoryV2(mEngineBuffer->data(), static_cast<int64_t>(mEngineBuffer->getCapacity()));
      |             ^~~~~~~~~~~~~~~~~
      |             setDeviceMemory
gmake[3]: *** [tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/build.make:527: tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/tllmRuntime.cpp.o] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:1935: tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/all] Error 2
gmake[2]: *** Waiting for unfinished jobs....
[ 23%] Built target decoder_attention_src
[ 63%] Built target kernels_src
[ 98%] Built target context_attention_src
gmake[1]: *** [CMakeFiles/Makefile2:1537: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2
gmake: *** [Makefile:218: tensorrt_llm] Error 2
Traceback (most recent call last):
  File "/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/scripts/build_wheel.py", line 434, in <module>
    main(**vars(args))
  File "/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/scripts/build_wheel.py", line 208, in main
    build_run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 192 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings   executorWorker  ' returned non-zero exit status 2.
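
The missing setWeightStreamingBudgetV2 / getDeviceMemorySizeV2 / setDeviceMemoryV2 members appear to be newer TensorRT 10.x additions, so this error usually points at the headers under --trt_root being older than what the main branch expects. A quick way to check which TensorRT the build is picking up, under the assumption that /usr/local/tensorrt matches the --trt_root value in the reproduction above (adjust if yours differs):

# Print the TensorRT version macros from the headers passed via --trt_root.
grep -E '#define NV_TENSORRT_(MAJOR|MINOR|PATCH)' /usr/local/tensorrt/include/NvInferVersion.h

# Also check the installed Python package, if any.
python3 -c "import tensorrt; print(tensorrt.__version__)"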

anaivebird changed the title from "build trtllm very slow" to "build trtllm very slow and raise an error" on Nov 20, 2024
byshiue (Collaborator) commented Nov 20, 2024

Thank you for reporting the issue. It is a bug at https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/CMakeLists.txt#L25. We should replace SRC_CU with SRC_CPP. We will fix it ASAP.
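
For anyone who wants to try the change locally before the fix lands, a minimal sketch based purely on the description above (check what line 25 contains in your checkout first, since line numbers drift on main):

# Inspect the area around line 25 of the kernels CMakeLists.txt.
sed -n '20,30p' cpp/tensorrt_llm/kernels/CMakeLists.txt

# Apply the replacement described above (SRC_CU -> SRC_CPP on line 25), then rebuild.
sed -i '25s/SRC_CU/SRC_CPP/' cpp/tensorrt_llm/kernels/CMakeLists.txt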

anaivebird (Author)

Which problem will that change fix: the slow compilation or the build error?

hello-11 added the triaged (Issue has been triaged by maintainers) label on Nov 20, 2024
byshiue (Collaborator) commented Nov 20, 2024

It fixes the slow compilation.

For the error, please create another issue if it is not related to the one above.
