
Commit 207a29b
Fully functional, updated README
Maxusmusti committed Jan 25, 2024
1 parent 079f7da commit 207a29b
Showing 3 changed files with 31 additions and 3 deletions.
2 changes: 1 addition & 1 deletion language/llama2-70b/SUT.py
@@ -376,7 +376,7 @@ def __del__(self):


class SUTServer(SUT):
-    def __init__(self, model_path=None, api_server=None, api_model_name=None, grpc=False, dtype="bfloat16", device="cpu", total_sample_count=24576, dataset_path=None, workers=1):
+    def __init__(self, model_path=None, api_server=None, api_model_name=None, grpc=False, batch_grpc=False, dtype="bfloat16", device="cpu", total_sample_count=24576, dataset_path=None, workers=1):

super().__init__(model_path=model_path, api_server=api_server, api_model_name=api_model_name, grpc=grpc, dtype=dtype, device=device, total_sample_count=total_sample_count, dataset_path=dataset_path, workers=workers)

30 changes: 29 additions & 1 deletion language/llama2-70b/api-endpoint-artifacts/README.md
@@ -3,11 +3,39 @@
Prerequisites:
- Install the OpenShift AI model serving stack
- Add your AWS credentials to `secret.yaml` to access the model files
-- Apply `secret.yaml`, `sa.yaml`, `serving-runtime.yaml`, then finally `model.yaml`
+- Apply `secret.yaml`, `sa.yaml`
+- FOR CAIKIT: Apply `serving-runtime.yaml`, then finally `model.yaml`
+- FOR TGIS STANDALONE: Apply `serving-tgis.yaml`, then finally `model-tgis.yaml`
- Create a benchmark pod using `benchmark.yaml`

In the pod, before any benchmark, first run `cd inference/language/llama2-70b`
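As an illustrative end-to-end setup for the standalone TGIS path (a minimal sketch, assuming the `oc` CLI is logged in to the cluster; the pod name is a placeholder):
```
# Apply the shared artifacts, then the TGIS-standalone runtime and model
oc apply -f secret.yaml
oc apply -f sa.yaml
oc apply -f serving-tgis.yaml
oc apply -f model-tgis.yaml

# Create the benchmark pod, then open a shell in it
oc apply -f benchmark.yaml
oc exec -it <benchmark-pod-name> -- /bin/bash   # <benchmark-pod-name> is a placeholder

# Inside the pod shell:
cd inference/language/llama2-70b
```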

## STANDALONE TGIS INSTRUCTIONS
For the full accuracy benchmark (offline), run in the pod:
```
python3 -u main.py --scenario Offline --model-path ${CHECKPOINT_PATH} --api-server <INSERT API HOST> --api-model-name Llama-2-70b-chat-hf --mlperf-conf mlperf.conf --accuracy --grpc --batch-grpc --user-conf user.conf --total-sample-count 24576 --dataset-path ${DATASET_PATH} --output-log-dir offline-logs --dtype float32 --device cpu 2>&1 | tee offline_performance_log.log
```
You can then run the same evaluation/consolidation scripts as for the regular benchmark.
Example API host: `https://llama-2-70b-chat-isvc-predictor-llama-service.apps.h100serving.perf.lab.eng.bos.redhat.com`
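A sketch of that evaluation step (assuming the `evaluate-accuracy.py` script and flag names from the surrounding repo; adjust the log path to your run):
```
python3 evaluate-accuracy.py --checkpoint-path ${CHECKPOINT_PATH} \
    --mlperf-accuracy-file offline-logs/mlperf_log_accuracy.json \
    --dataset-file ${DATASET_PATH} --dtype int32
```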


For the performance benchmark (offline), run in the pod:
```
python3 -u main.py --scenario Offline --model-path ${CHECKPOINT_PATH} --api-server <INSERT API HOST> --api-model-name Llama-2-70b-chat-hf --mlperf-conf mlperf.conf --grpc --batch-grpc --user-conf user.conf --total-sample-count 24576 --dataset-path ${DATASET_PATH} --output-log-dir offline-logs --dtype float32 --device cpu 2>&1 | tee offline_performance_log.log
```
(It is the same, just with `--accuracy` removed)


For the performance benchmark (server), run in the pod:
```
python3 -u main.py --scenario Server --model-path ${CHECKPOINT_PATH} --api-server <INSERT API HOST> --api-model-name Llama-2-70b-chat-hf --mlperf-conf mlperf.conf --grpc --user-conf user.conf --total-sample-count 24576 --dataset-path ${DATASET_PATH} --output-log-dir server-logs --dtype float32 --device cpu 2>&1 | tee server_performance_log.log
```
(Configure the target QPS in `user.conf`)
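For example, a `user.conf` entry for the Server scenario might look like the following (illustrative value, assuming standard MLPerf LoadGen config syntax):
```
# Illustrative only -- tune the target QPS to your hardware
llama2-70b.Server.target_qps = 0.5
```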


NOTE: Hyperparameters are currently configured for 8x H100 GPUs

## CAIKIT INSTRUCTIONS
For the full accuracy benchmark (offline), run in the pod:
```
python3 -u main.py --scenario Offline --model-path ${CHECKPOINT_PATH} --api-server <INSERT SERVER API CALL ENDPOINT> --api-model-name Llama-2-70b-chat-hf-caikit --accuracy --mlperf-conf mlperf.conf --user-conf user.conf --total-sample-count 24576 --dataset-path ${DATASET_PATH} --output-log-dir offline-logs --dtype float32 --device cpu 2>&1 | tee offline_performance_log.log
```
2 changes: 1 addition & 1 deletion language/llama2-70b/api-endpoint-artifacts/benchmark.yaml
@@ -6,7 +6,7 @@ spec:
  restartPolicy: Never
  containers:
    - name: mlperf-env
-     image: quay.io/meyceoz/mlperf-inference:v5
+     image: quay.io/meyceoz/mlperf-inference:v6
      resources:
        requests:
          memory: 20000Mi
