
llama2-70b cmake issue #2009

Closed
Xi0131 opened this issue Jan 2, 2025 · 6 comments
@Xi0131

Xi0131 commented Jan 2, 2025

Hi, I am currently facing this issue but have no idea how to solve it. I can't tell whether the problem is with CMake, Git LFS, or the configuration. Please help me figure this out.

Here are the error messages:

CMake Error at tensorrt_llm/CMakeLists.txt:101 (message):
  The batch manager library is truncated or incomplete.  This is usually
  caused by using Git LFS (Large File Storage) incorrectly.  Please try
  running command `git lfs install && git lfs pull`.


-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
  File "/code/tensorrt_llm/scripts/build_wheel.py", line 319, in <module>
    main(**vars(args))
  File "/code/tensorrt_llm/scripts/build_wheel.py", line 160, in main
    build_run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cmake -DCMAKE_BUILD_TYPE="Release" -DBUILD_PYT="ON" -DBUILD_PYBIND="ON" "-DCMAKE_CUDA_ARCHITECTURES=86" -DTRT_LIB_DIR=/usr/local/tensorrt//targets/x86_64-linux-gnu/lib -DTRT_INCLUDE_DIR=/usr/local/tensorrt//include  -S "/code/tensorrt_llm/cpp"' returned non-zero exit status 1.
make: *** [Makefile:102: devel_run] Error 1
make: Leaving directory '/home/mhw/CM/repos/local/cache/0ddf209540f44cbc/repo/docker'

CM error: Portable CM script failed (name = get-ml-model-llama2, return code = 256)
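The CMake error above usually means the batch manager library was checked out as a Git LFS pointer stub rather than the real binary. A pointer stub is a tiny text file beginning with `version https://git-lfs.github.com/spec/v1`. The following is a minimal sketch of how to detect that; the demo file path is synthetic, and the real library path inside the TensorRT-LLM checkout will differ:

```shell
# Create a synthetic LFS pointer stub for demonstration (a real one is
# produced by git when LFS objects have not been pulled).
printf 'version https://git-lfs.github.com/spec/v1\n' > /tmp/demo_pointer.a

# A genuine static library is hundreds of MB; a pointer stub is a few
# dozen bytes of text starting with "version".
if head -c 7 /tmp/demo_pointer.a | grep -q '^version'; then
  echo "LFS pointer stub detected - run: git lfs install && git lfs pull"
else
  echo "looks like a real binary"
fi
```

Running this against the actual `.a` file in the repository (instead of `/tmp/demo_pointer.a`) tells you whether `git lfs pull` actually fetched the object.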
@arjunsuresh
Contributor

Hi @Xi0131, can you please share the command you tried? It looks like you are running the CM commands for the Nvidia implementation without the `--docker` flag.

@Xi0131
Author

Xi0131 commented Jan 2, 2025

Hi, I am just using the command provided on the official website:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
   --model=llama2-70b-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda  \
   --docker --quiet \
   --test_query_count=50 \
   --tp_size=2 \
   --nvidia_llama2_dataset_file_path=<PATH_TO_PICKLE_FILE>

@arjunsuresh
Contributor

Yes, `--docker` is there, so that part is fine.

It looks like the git checkout failed. The git clone of the model can take many hours here; it is about 500 GB. You can do

rm -rf /home/mhw/CM/repos/local/cache/0ddf209540f44cbc/repo

and retry the same command. Note, however, that a minimum of 8x H100 GPUs might be needed to generate the quantized model here.

@Xi0131
Author

Xi0131 commented Jan 2, 2025

Thanks for the potential solution. It should be fine for me to download the model; I will try it later.

Now I am concerned about the GPU requirements. You mentioned that 8x H100 GPUs are needed, but I only have an A6000 on hand. Is it still possible for me to run through the whole process, even if it takes significantly longer?

@arjunsuresh
Contributor

Unfortunately, no. llama2-70b is a big model: just storing the weights in fp32, which is required as input for quantization, takes roughly 70B parameters × 4 bytes ≈ 280 GB. Even if you generate the quantized model on a different system, running it in fp8 will itself need at least 80 GB of GPU memory.
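The memory figures above can be reproduced with back-of-the-envelope arithmetic. This sketch only counts raw weight storage; activations, KV cache, and framework overhead add more on top:

```shell
# Estimate weight-only GPU memory for a 70B-parameter model.
# fp32 = 4 bytes/param (needed for quantization input),
# fp8  = 1 byte/param (inference after quantization).
awk 'BEGIN {
  params = 70e9
  printf "fp32 weights: %.0f GB\n", params * 4 / 1e9   # 280 GB
  printf "fp8 weights:  %.0f GB\n", params * 1 / 1e9   # 70 GB
}'
```

This is why a single 48 GB A6000 cannot hold even the fp8 weights, let alone the fp32 copy used during quantization.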

@Xi0131
Author

Xi0131 commented Jan 2, 2025

Thank you for the clarification! I double-checked the website, and I had indeed missed the hardware requirements.

@Xi0131 Xi0131 closed this as completed Jan 2, 2025