Added explicit greedy and updated readme
Maxusmusti committed Jan 22, 2024
1 parent 40c851b commit 86e3b72
Showing 3 changed files with 11 additions and 1 deletion.
2 changes: 2 additions & 0 deletions language/llama2-70b/SUT.py
```diff
@@ -169,6 +169,7 @@ def query_api(self, input):
         'parameters': {
             'max_new_tokens': 1024,
             'min_new_tokens': 1,
+            'decoding_method': "GREEDY"
         },
     }
```

```diff
@@ -390,6 +391,7 @@ def stream_api(self, input, response_ids):
         'parameters': {
             'max_new_tokens': 1024,
             'min_new_tokens': 1,
+            'decoding_method': "GREEDY"
         },
     }
```

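The lines added above pin the decoding strategy explicitly, making generation deterministic rather than sampled. A minimal sketch of how such a request payload might be assembled (the helper function and prompt are illustrative; the parameter names and values mirror the diff):

```python
def build_generation_payload(text: str) -> dict:
    # Illustrative helper mirroring the 'parameters' block added in SUT.py.
    # "GREEDY" decoding always picks the highest-probability token, so
    # repeated runs over the same input produce identical outputs.
    return {
        "inputs": text,
        "parameters": {
            "max_new_tokens": 1024,
            "min_new_tokens": 1,
            "decoding_method": "GREEDY",
        },
    }

payload = build_generation_payload("What is MLPerf?")
```

A payload like this would then be serialized and sent to the serving endpoint by `query_api` / `stream_api`.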
8 changes: 8 additions & 0 deletions language/llama2-70b/api-endpoint-artifacts/README.md
Prerequisites:
- Apply `secret.yaml`, `sa.yaml`, `serving-runtime.yaml`, then finally `model.yaml`
- Create a benchmark pod using `benchmark.yaml`

In the pod, before any benchmark, first run `cd inference/language/llama2-70b`

For the full accuracy benchmark (offline), run in the pod:
```
python3 -u main.py --scenario Offline --model-path ${CHECKPOINT_PATH} --api-serv
```

For the performance benchmark (offline), run the same command with `--accuracy` removed.


For the performance benchmark (server), run in the pod:
```
python3 -u main.py --scenario Server --model-path ${CHECKPOINT_PATH} --api-server <INSERT SERVER STREAM API CALL ENDPOINT> --api-model-name Llama-2-70b-chat-hf-caikit --mlperf-conf mlperf.conf --user-conf user.conf --total-sample-count 24576 --dataset-path ${DATASET_PATH} --output-log-dir server-logs --dtype float32 --device cpu 2>&1 | tee server_performance_log.log
```
(Configure the target QPS in `user.conf`)


NOTE: Hyperparams are currently configured for 8xH100
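For scripting repeated runs, the Server command above can be assembled programmatically. A sketch, assuming placeholder paths and a hypothetical endpoint supplied by the caller (the flag names and fixed values are taken verbatim from the command above):

```python
import shlex


def server_perf_cmd(checkpoint_path: str, dataset_path: str, endpoint: str) -> list[str]:
    # Build the Server-scenario benchmark invocation as an argv list.
    # Flags mirror the README command; the path and endpoint arguments
    # are placeholders provided by the caller.
    return shlex.split(
        "python3 -u main.py --scenario Server"
        f" --model-path {checkpoint_path}"
        f" --api-server {endpoint}"
        " --api-model-name Llama-2-70b-chat-hf-caikit"
        " --mlperf-conf mlperf.conf --user-conf user.conf"
        " --total-sample-count 24576"
        f" --dataset-path {dataset_path}"
        " --output-log-dir server-logs --dtype float32 --device cpu"
    )


cmd = server_perf_cmd(
    "/models/llama-2-70b",          # placeholder for ${CHECKPOINT_PATH}
    "/data/open_orca.pkl",          # placeholder for ${DATASET_PATH}
    "https://host/api/v1/stream",   # placeholder stream endpoint
)
```

The resulting list could be passed to `subprocess.run(cmd)` inside the benchmark pod.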
2 changes: 1 addition & 1 deletion language/llama2-70b/api-endpoint-artifacts/benchmark.yaml
```diff
@@ -6,7 +6,7 @@ spec:
   restartPolicy: Never
   containers:
     - name: mlperf-env
-      image: quay.io/meyceoz/mlperf-inference:v3-base
+      image: quay.io/meyceoz/mlperf-inference:v3-greedy
       resources:
         requests:
           memory: 20000Mi
```
