Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error: make -C docker release_build : Command 'git submodule update --init --recursive' returned non-zero exit status 128 #2479

Open
1 of 4 tasks
xddun opened this issue Nov 21, 2024 · 3 comments
Labels
installation triaged Issue has been triaged by maintainers

Comments

@xddun
Copy link

xddun commented Nov 21, 2024

System Info

env:

ubuntu22
RTX3090
Linux euler-MS-7D30 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

I wanted to build an image, but unexpectedly encountered an error. My process was as follows in 4steps:

  1. git clone https://github.com/NVIDIA/TensorRT-LLM.git
  2. cd TensorRT-LLM
  3. git lfs pull
  4. make -C docker release_build

error log:

TensorRT-LLM$ make -C docker release_build
make: Entering directory '/data/xiedong/TensorRT-LLM/docker'
Building docker image: tensorrt_llm/release:latest
DOCKER_BUILDKIT=1 docker build --pull  \
        --progress auto \
         --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
         --build-arg BASE_TAG=24.10-py3 \
         --build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks" \
         --build-arg TORCH_INSTALL_TYPE="skip" \
         \
         \
         \
         \
         \
         --build-arg TRT_LLM_VER="0.16.0.dev2024111900" \
         \
         --build-arg GIT_COMMIT="535c9cc6730f5ac999e4b1cb621402b58138f819" \
         --target release \
        --file Dockerfile.multi \
        --tag tensorrt_llm/release:latest \
        ..
[+] Building 3.5s (33/44)                                                                                                                            docker:default
 => [internal] load build definition from Dockerfile.multi                                                                                                     0.0s
 => => transferring dockerfile: 3.98kB                                                                                                                         0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 6)                                                                                 0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 14)                                                                                0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 57)                                                                                0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 75)                                                                                0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:24.10-py3                                                                                              2.4s
 => [internal] load .dockerignore                                                                                                                              0.1s
 => => transferring context: 257B                                                                                                                              0.0s
 => [internal] load build context                                                                                                                              0.2s
 => => transferring context: 342.26kB                                                                                                                          0.1s
 => [base 1/1] FROM nvcr.io/nvidia/pytorch:24.10-py3@sha256:36555b43d382425a4281ecfbcb41de2f95fb542ca8e531c5486be10df8026f9d                                   0.0s
 => CACHED [devel  1/16] COPY docker/common/install_base.sh install_base.sh                                                                                    0.0s
 => CACHED [devel  2/16] RUN bash ./install_base.sh && rm install_base.sh                                                                                      0.0s
 => CACHED [devel  3/16] COPY docker/common/install_cmake.sh install_cmake.sh                                                                                  0.0s
 => CACHED [devel  4/16] RUN bash ./install_cmake.sh && rm install_cmake.sh                                                                                    0.0s
 => CACHED [devel  5/16] COPY docker/common/install_ccache.sh install_ccache.sh                                                                                0.0s
 => CACHED [devel  6/16] RUN bash ./install_ccache.sh && rm install_ccache.sh                                                                                  0.0s
 => CACHED [devel  7/16] COPY docker/common/install_cuda_toolkit.sh install_cuda_toolkit.sh                                                                    0.0s
 => CACHED [devel  8/16] RUN bash ./install_cuda_toolkit.sh && rm install_cuda_toolkit.sh                                                                      0.0s
 => CACHED [devel  9/16] COPY docker/common/install_tensorrt.sh install_tensorrt.sh                                                                            0.0s
 => CACHED [devel 10/16] RUN bash ./install_tensorrt.sh     --TRT_VER=${TRT_VER}     --CUDA_VER=${CUDA_VER}     --CUDNN_VER=${CUDNN_VER}     --NCCL_VER=${NCC  0.0s
 => CACHED [devel 11/16] COPY docker/common/install_polygraphy.sh install_polygraphy.sh                                                                        0.0s
 => CACHED [devel 12/16] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh                                                                          0.0s
 => CACHED [devel 13/16] COPY docker/common/install_mpi4py.sh install_mpi4py.sh                                                                                0.0s
 => CACHED [devel 14/16] RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh                                                                                  0.0s
 => CACHED [devel 15/16] COPY docker/common/install_pytorch.sh install_pytorch.sh                                                                              0.0s
 => CACHED [devel 16/16] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh                                                                           0.0s
 => CACHED [release  1/13] RUN mkdir -p /root/.cache/pip                                                                                                       0.0s
 => CACHED [release  2/13] WORKDIR /app/tensorrt_llm                                                                                                           0.0s
 => CACHED [wheel  1/10] WORKDIR /src/tensorrt_llm                                                                                                             0.0s
 => CACHED [wheel  2/10] COPY benchmarks benchmarks                                                                                                            0.0s
 => CACHED [wheel  3/10] COPY cpp cpp                                                                                                                          0.0s
 => CACHED [wheel  4/10] COPY benchmarks benchmarks                                                                                                            0.0s
 => CACHED [wheel  5/10] COPY scripts scripts                                                                                                                  0.0s
 => CACHED [wheel  6/10] COPY tensorrt_llm tensorrt_llm                                                                                                        0.0s
 => CACHED [wheel  7/10] COPY 3rdparty 3rdparty                                                                                                                0.0s
 => CACHED [wheel  8/10] COPY .gitmodules setup.py requirements.txt requirements-dev.txt ./                                                                    0.0s
 => CACHED [wheel  9/10] RUN mkdir -p /root/.cache/pip /root/.cache/ccache                                                                                     0.0s
 => ERROR [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache     python3 scripts/build_wheel.py --cle  0.6s
------
 > [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache     python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks:
0.460 fatal: not a git repository (or any of the parent directories): .git
0.460 Traceback (most recent call last):
0.460   File "/src/tensorrt_llm/scripts/build_wheel.py", line 434, in <module>
0.460     main(**vars(args))
0.460   File "/src/tensorrt_llm/scripts/build_wheel.py", line 107, in main
0.460     build_run('git submodule update --init --recursive')
0.460   File "/usr/lib/python3.10/subprocess.py", line 526, in run
0.469     raise CalledProcessError(retcode, process.args,
0.469 subprocess.CalledProcessError: Command 'git submodule update --init --recursive' returned non-zero exit status 128.
------
Dockerfile.multi:72
--------------------
  71 |     ARG BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks"
  72 | >>> RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache \
  73 | >>>     python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}
  74 |
--------------------
ERROR: failed to solve: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1
make: *** [Makefile:64: release_build] Error 1
make: Leaving directory '/data/xiedong/TensorRT-LLM/docker'

Who can help?

Is it possible to provide a pre-configured image with the environment already set up? Compiling the image is really challenging!

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

env:

ubuntu22
RTX3090
Linux euler-MS-7D30 6.8.0-45-generic #45~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Sep 11 15:25:05 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

I wanted to build an image, but unexpectedly encountered an error. My process was as follows in 4steps:

  1. git clone https://github.com/NVIDIA/TensorRT-LLM.git
  2. cd TensorRT-LLM
  3. git lfs pull
  4. make -C docker release_build

error log:

TensorRT-LLM$ make -C docker release_build
make: Entering directory '/data/xiedong/TensorRT-LLM/docker'
Building docker image: tensorrt_llm/release:latest
DOCKER_BUILDKIT=1 docker build --pull  \
        --progress auto \
         --build-arg BASE_IMAGE=nvcr.io/nvidia/pytorch \
         --build-arg BASE_TAG=24.10-py3 \
         --build-arg BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks" \
         --build-arg TORCH_INSTALL_TYPE="skip" \
         \
         \
         \
         \
         \
         --build-arg TRT_LLM_VER="0.16.0.dev2024111900" \
         \
         --build-arg GIT_COMMIT="535c9cc6730f5ac999e4b1cb621402b58138f819" \
         --target release \
        --file Dockerfile.multi \
        --tag tensorrt_llm/release:latest \
        ..
[+] Building 3.5s (33/44)                                                                                                                            docker:default
 => [internal] load build definition from Dockerfile.multi                                                                                                     0.0s
 => => transferring dockerfile: 3.98kB                                                                                                                         0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 6)                                                                                 0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 14)                                                                                0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 57)                                                                                0.0s
 => WARN: FromAsCasing: 'as' and 'FROM' keywords' casing do not match (line 75)                                                                                0.0s
 => [internal] load metadata for nvcr.io/nvidia/pytorch:24.10-py3                                                                                              2.4s
 => [internal] load .dockerignore                                                                                                                              0.1s
 => => transferring context: 257B                                                                                                                              0.0s
 => [internal] load build context                                                                                                                              0.2s
 => => transferring context: 342.26kB                                                                                                                          0.1s
 => [base 1/1] FROM nvcr.io/nvidia/pytorch:24.10-py3@sha256:36555b43d382425a4281ecfbcb41de2f95fb542ca8e531c5486be10df8026f9d                                   0.0s
 => CACHED [devel  1/16] COPY docker/common/install_base.sh install_base.sh                                                                                    0.0s
 => CACHED [devel  2/16] RUN bash ./install_base.sh && rm install_base.sh                                                                                      0.0s
 => CACHED [devel  3/16] COPY docker/common/install_cmake.sh install_cmake.sh                                                                                  0.0s
 => CACHED [devel  4/16] RUN bash ./install_cmake.sh && rm install_cmake.sh                                                                                    0.0s
 => CACHED [devel  5/16] COPY docker/common/install_ccache.sh install_ccache.sh                                                                                0.0s
 => CACHED [devel  6/16] RUN bash ./install_ccache.sh && rm install_ccache.sh                                                                                  0.0s
 => CACHED [devel  7/16] COPY docker/common/install_cuda_toolkit.sh install_cuda_toolkit.sh                                                                    0.0s
 => CACHED [devel  8/16] RUN bash ./install_cuda_toolkit.sh && rm install_cuda_toolkit.sh                                                                      0.0s
 => CACHED [devel  9/16] COPY docker/common/install_tensorrt.sh install_tensorrt.sh                                                                            0.0s
 => CACHED [devel 10/16] RUN bash ./install_tensorrt.sh     --TRT_VER=${TRT_VER}     --CUDA_VER=${CUDA_VER}     --CUDNN_VER=${CUDNN_VER}     --NCCL_VER=${NCC  0.0s
 => CACHED [devel 11/16] COPY docker/common/install_polygraphy.sh install_polygraphy.sh                                                                        0.0s
 => CACHED [devel 12/16] RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh                                                                          0.0s
 => CACHED [devel 13/16] COPY docker/common/install_mpi4py.sh install_mpi4py.sh                                                                                0.0s
 => CACHED [devel 14/16] RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh                                                                                  0.0s
 => CACHED [devel 15/16] COPY docker/common/install_pytorch.sh install_pytorch.sh                                                                              0.0s
 => CACHED [devel 16/16] RUN bash ./install_pytorch.sh skip && rm install_pytorch.sh                                                                           0.0s
 => CACHED [release  1/13] RUN mkdir -p /root/.cache/pip                                                                                                       0.0s
 => CACHED [release  2/13] WORKDIR /app/tensorrt_llm                                                                                                           0.0s
 => CACHED [wheel  1/10] WORKDIR /src/tensorrt_llm                                                                                                             0.0s
 => CACHED [wheel  2/10] COPY benchmarks benchmarks                                                                                                            0.0s
 => CACHED [wheel  3/10] COPY cpp cpp                                                                                                                          0.0s
 => CACHED [wheel  4/10] COPY benchmarks benchmarks                                                                                                            0.0s
 => CACHED [wheel  5/10] COPY scripts scripts                                                                                                                  0.0s
 => CACHED [wheel  6/10] COPY tensorrt_llm tensorrt_llm                                                                                                        0.0s
 => CACHED [wheel  7/10] COPY 3rdparty 3rdparty                                                                                                                0.0s
 => CACHED [wheel  8/10] COPY .gitmodules setup.py requirements.txt requirements-dev.txt ./                                                                    0.0s
 => CACHED [wheel  9/10] RUN mkdir -p /root/.cache/pip /root/.cache/ccache                                                                                     0.0s
 => ERROR [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache     python3 scripts/build_wheel.py --cle  0.6s
------
 > [wheel 10/10] RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache     python3 scripts/build_wheel.py --clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks:
0.460 fatal: not a git repository (or any of the parent directories): .git
0.460 Traceback (most recent call last):
0.460   File "/src/tensorrt_llm/scripts/build_wheel.py", line 434, in <module>
0.460     main(**vars(args))
0.460   File "/src/tensorrt_llm/scripts/build_wheel.py", line 107, in main
0.460     build_run('git submodule update --init --recursive')
0.460   File "/usr/lib/python3.10/subprocess.py", line 526, in run
0.469     raise CalledProcessError(retcode, process.args,
0.469 subprocess.CalledProcessError: Command 'git submodule update --init --recursive' returned non-zero exit status 128.
------
Dockerfile.multi:72
--------------------
  71 |     ARG BUILD_WHEEL_ARGS="--clean --trt_root /usr/local/tensorrt --python_bindings --benchmarks"
  72 | >>> RUN --mount=type=cache,target=/root/.cache/pip --mount=type=cache,target=/root/.cache/ccache \
  73 | >>>     python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}
  74 |
--------------------
ERROR: failed to solve: process "/bin/bash -c python3 scripts/build_wheel.py ${BUILD_WHEEL_ARGS}" did not complete successfully: exit code: 1
make: *** [Makefile:64: release_build] Error 1
make: Leaving directory '/data/xiedong/TensorRT-LLM/docker'

Expected behavior

actual behavior

additional notes

@xddun xddun added the bug Something isn't working label Nov 21, 2024
@hello-11
Copy link
Collaborator

@xddun You can follow this guide.

@hello-11 hello-11 added triaged Issue has been triaged by maintainers installation and removed bug Something isn't working labels Nov 22, 2024
@xddun
Copy link
Author

xddun commented Nov 22, 2024

I follow this page, it works:

https://www.dong-blog.fun/post/1863

By the way, I have an additional question. I noticed that the interface for accessing this Triton deployment is quite stiff. Is there a question-and-answer interface similar to OpenAI's available?

When I access it this way, the model's responses seem to be completing my sentences rather than the usual question-and-answer format.

# curl -X POST http://101.136.8.66:8000/v2/models/ensemble/generate -d '{"text_input": "Who are you?", "max_tokens": 200, "bad_words": "", "stop_words": ""}'


{"model_name":"ensemble","model_version":"1","sequence_end":false,"sequence_id":0,"sequence_index":0,"sequence_start":false,"text_output":"Who are you? Where do you come from? Where are you going? These are the questions that philosophers ponder. For businesses, these three questions are equally important. Where a business comes from determines its genes; where a business is going determines its strategy; and who a business is determines its culture. Corporate culture is the soul of a business and its intrinsic driving force for development. Corporate culture is the sum of the business's values, spirit, system, and code of conduct, and it forms a unique, stable, and distinctive corporate culture system over the course of long-term development.\nCorporate culture is the soul of a business and its intrinsic driving force for development. Corporate culture is the sum of the business's values, spirit, system, and code of conduct, and it forms a unique, stable, and distinctive corporate culture system over the course of long-term development.\nCorporate culture is the intrinsic driving force for a business's development. Corporate culture is the sum of the business's values, spirit, system, and code of conduct, and it forms a unique, stable, and distinctive corporate culture system over the course of long-term development. Corporate culture"}


@lucasavila00
Copy link

To fix the Command 'git submodule update --init --recursive' returned non-zero exit status 128 issue, fetch the submodules before running the make command

git submodule update --init --recursive

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
installation triaged Issue has been triaged by maintainers
Projects
None yet
Development

No branches or pull requests

3 participants