# Update CM run commands for llama3_1-405b #2019

Merged 3 commits on Jan 7, 2025
`language/llama3.1-405b/README.md` (37 changes: 34 additions & 3 deletions)

## Automated command to run the benchmark via MLCommons CM

Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/llama3_1-405b/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.

You can also `pip install cm4mlops` and then use the `cm` commands given in the later sections to download the model and datasets.
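
A minimal setup sketch, assuming a working Python 3 environment with `pip` available:

```
# Install the MLCommons CM automation framework and its MLOps scripts
pip install cm4mlops
```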

## Prepare environment

```
git clone https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct ${CHECKPOINT_PATH}
cd ${CHECKPOINT_PATH} && git checkout be673f326cab4cd22ccfef76109faf68e41aa5f1
```

### Download model through CM (Collective Mind)

```
cm run script --tags=get,ml-model,llama3 --outdirname=<path_to_download> --hf_token=<huggingface access token> -j
```

**Note:**
Downloading the llama3.1-405B model from Hugging Face requires an [**access token**](https://huggingface.co/settings/tokens), which can be generated for your account. Additionally, ensure that your account has access to the [llama3.1-405B](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) model.
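
For example, a hypothetical invocation with the token kept in an environment variable (the token value and download directory below are placeholders):

```
# Placeholder token; substitute the access token generated for your account
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
cm run script --tags=get,ml-model,llama3 --outdirname=/data/models/llama3.1-405b --hf_token=${HF_TOKEN} -j
```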

## Get Dataset

### Preprocessed
You can also download the calibration dataset from the Cloudflare R2 bucket by running the following command:
```
rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_405b/mlperf_llama3.1_405b_calibration_dataset_512_processed_fp16_eval.pkl ./ -P
```

**CM Commands**

Validation Dataset:
```
cm run script --tags=get,dataset,mlperf,inference,llama3,_validation --outdirname=<path to download> -j
```

Calibration Dataset:
```
cm run script --tags=get,dataset,mlperf,inference,llama3,_calibration --outdirname=<path to download> -j
```


## Run Performance Benchmarks

### Offline
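
A minimal sketch of an Offline run, assuming the same `main.py` invocation pattern as the accuracy scripts later in this README (the `GPU_COUNT` value and the reduced flag set are assumptions; a Server run swaps in `--scenario Server`):

```
GPU_COUNT=8   # assumption: number of GPUs to use for tensor parallelism

python -u main.py --scenario Offline \
        --tensor-parallel-size ${GPU_COUNT} \
        --vllm
```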

### Server

The ServerSUT was not tested for GPU runs.


## Run Accuracy Benchmarks

### Offline
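
A minimal sketch of the Offline accuracy flow, mirroring the Server script below; the log-directory name, the `--output-log-dir` flag, and the reduced flag set are assumptions:

```
OUTPUT_LOG_DIR=offline-accuracy-logs   # assumed name for the Offline log directory

# Assumed flag set; the full command takes additional model and dataset flags
python -u main.py --scenario Offline \
        --output-log-dir ${OUTPUT_LOG_DIR} \
        --tensor-parallel-size ${GPU_COUNT} \
        --vllm

# Evaluate only if the accuracy log was produced
ACCURACY_LOG_FILE=${OUTPUT_LOG_DIR}/mlperf_log_accuracy.json
if [ -e ${ACCURACY_LOG_FILE} ]; then
  python evaluate-accuracy.py --checkpoint-path ${CHECKPOINT_PATH}
fi
```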
For the GPU run, the above steps have been automated in `run_accuracy.sh`. You can also modify this script to use `--device cpu` to adapt it to a CPU-only run.


### Server
```
OUTPUT_LOG_DIR=server-accuracy-logs
...
python -u main.py --scenario Server \
        ...
        --tensor-parallel-size ${GPU_COUNT} \
        --vllm

ACCURACY_LOG_FILE=${OUTPUT_LOG_DIR}/mlperf_log_accuracy.json
if [ -e ${ACCURACY_LOG_FILE} ]; then
  python evaluate-accuracy.py --checkpoint-path ${CHECKPOINT_PATH} \
          ...
fi
```

The ServerSUT was not tested for GPU runs.

### Evaluate the accuracy

```
cm run script --tags=process,mlperf,accuracy,_dataset_llama3 --result_dir=<Path to directory where files are generated after the benchmark run>
```
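
`--result_dir` should point at the directory containing the MLPerf log files produced by the benchmark run, such as `mlperf_log_accuracy.json`.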

> **Review comment (Contributor):** Will be good to mention mlperf log files.

Please click [here](https://github.com/anandhu-eng/inference/blob/patch-14/language/llama3.1-405b/evaluate-accuracy.py) to view the Python script for evaluating accuracy for the Llama3 dataset.

> **Review comment (Contributor):** Main repo link?


## Accuracy Target
Running the GPU implementation in FP16 precision resulted in the following FP16 accuracy targets: