llama2-70b cmake issue #2009
Hi @Xi0131, can you please share the command you have tried? It looks like you are running the CM commands for the NVIDIA implementation without the …
Hi, I am using just the command provided on the official website:

```shell
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
   --model=llama2-70b-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda \
   --docker --quiet \
   --test_query_count=50 \
   --tp_size=2 \
   --nvidia_llama2_dataset_file_path=<PATH_TO_PICKLE_FILE>
```
Yes, it looks like the …
… and retry the same command. But a minimum of 8×H100 GPUs might be needed to generate the quantized model here.
Thanks for the potential solution. I think it should be fine for me to download the model; I will try it later. Now I am concerned about the GPU requirements. You mentioned that 8×H100 GPUs are needed, but I only have an A6000. Is it still possible for me to run through the whole process, even if it takes significantly longer?
Unfortunately, no. llama2-70b is a big model: it needs 70B × 4 bytes (roughly 280 GB) just to store the weights in fp32, which is required to do the quantization. Even if you generate the quantized model on a different system, running it in fp8 will itself need at least 80 GB of GPU memory.
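To put those numbers in perspective, here is a back-of-the-envelope sketch (assuming roughly 70B parameters and counting only the weights; activations, KV cache, and runtime overhead are not included, so real requirements are higher):

```python
# Rough weight-memory estimate for a 70B-parameter model.
# The 70e9 parameter count is an approximation used for illustration.

PARAMS = 70e9  # approximate parameter count (assumption)

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes."""
    return params * bytes_per_param / 1e9

fp32_gb = weight_memory_gb(PARAMS, 4)  # fp32 = 4 bytes per weight
fp8_gb = weight_memory_gb(PARAMS, 1)   # fp8 = 1 byte per weight

print(f"fp32 weights: ~{fp32_gb:.0f} GB")  # ~280 GB
print(f"fp8 weights:  ~{fp8_gb:.0f} GB")   # ~70 GB
```

This is why a single A6000 (48 GB) cannot hold even the fp8 weights, let alone perform the fp32-based quantization step.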
Thank you for your clarifications! I double-checked the website, and I had indeed missed the hardware requirements.
Hi, I am currently facing this issue but have no idea how to solve it. I cannot tell whether it is a CMake, Git LFS, or configuration problem. Please help me figure this out.
Here are the error messages: