llama2-70b cmake issue #2009
Hi @Xi0131, can you please share the command you have tried? It looks like you are running the CM commands for the NVIDIA implementation without the …
Hi, I am using just the command provided on the official website:

```shell
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
   --model=llama2-70b-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda \
   --docker --quiet \
   --test_query_count=50 \
   --tp_size=2 \
   --nvidia_llama2_dataset_file_path=<PATH_TO_PICKLE_FILE>
```
Yes, it looks like the …
… and retry the same command. But a minimum of 8×H100 GPUs might be needed to generate the quantized model here.
Thanks for the potential solution. I think it should be fine for me to download the model; I will try it later. Now I am concerned about the GPU requirements. You mentioned that 8×H100 GPUs are needed, but I only have an A6000. Is it still possible for me to run through the whole process, even if it takes significantly longer?
Unfortunately, no. llama2-70b is a big model: it needs 70B × 4 bytes (roughly 280 GB) just to store the weights in fp32, which is required to do the quantization. Even if you generate the quantized model on a different system, running it in fp8 will itself need at least 80 GB of GPU memory.
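To put those numbers in perspective, here is a back-of-the-envelope sketch (assuming roughly 70B parameters and counting only the weights; activations, KV cache, and runtime overhead are not included, so real requirements are higher):

```python
# Rough weight-memory estimate for a 70B-parameter model.
# The 70e9 parameter count is an approximation used for illustration.

PARAMS = 70e9  # approximate parameter count (assumption)

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes."""
    return params * bytes_per_param / 1e9

fp32_gb = weight_memory_gb(PARAMS, 4)  # fp32 = 4 bytes per weight
fp8_gb = weight_memory_gb(PARAMS, 1)   # fp8 = 1 byte per weight

print(f"fp32 weights: ~{fp32_gb:.0f} GB")  # ~280 GB
print(f"fp8 weights:  ~{fp8_gb:.0f} GB")   # ~70 GB
```

This is why a single A6000 (48 GB) cannot hold even the fp8 weights, let alone perform the fp32-based quantization step.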
Thank you for your clarifications! I double-checked the website, and I had indeed missed the hardware requirements.
Hi, I am currently facing this issue but have no idea how to solve it. I cannot tell whether it is a CMake, Git LFS, or configuration problem. Please help me figure this out.
Here are the error messages: