
Commit 9a9fc11

Suggest removing the docker, and instead installing the CLI. (#15)
1 parent: d91a016 · commit: 9a9fc11

File tree: 1 file changed (+16, −24 lines)

README.md (16 additions, 24 deletions)
````diff
@@ -49,6 +49,17 @@ It can be used to benchmark any text generation server that exposes an OpenAI-co
 
 ## Get started
 
+### Install
+
+If you have [cargo](https://rustup.rs/) already installed:
+```bash
+cargo install --git https://github.com/huggingface/inference-benchmarker/
+```
+
+Or download the [latest released binary](https://github.com/huggingface/inference-benchmarker/releases/latest)
+
+Or you can run docker images.
+
 ### Run a benchmark
 
 #### 1. Start an inference server
````
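For a reader following the new instructions, the install step plus a quick check that the binary landed on PATH might look like the sketch below; the `which` check is illustrative and not part of the README.

```bash
# Install the CLI straight from the repository, as the new README suggests
# (requires a Rust toolchain, e.g. via https://rustup.rs/).
cargo install --git https://github.com/huggingface/inference-benchmarker/

# Illustrative sanity check (not in the README): confirm the binary is on PATH.
which inference-benchmarker
```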
````diff
@@ -76,22 +87,12 @@ docker run --runtime nvidia --gpus all \
 --model $MODEL
 ```
 
-#### 2. Run a benchmark using Docker image
+
+#### 2. Run a benchmark
 
 ```shell
-MODEL=meta-llama/Llama-3.1-8B-Instruct
-HF_TOKEN=<your HF READ token>
-# run a benchmark to evaluate the performance of the model for chat use case
-# we mount results to the current directory
-$ docker run \
---rm \
--it \
---net host \
--v $(pwd):/opt/inference-benchmarker/results \
--e "HF_TOKEN=$HF_TOKEN" \
-ghcr.io/huggingface/inference-benchmarker:latest \
-inference-benchmarker \
---tokenizer-name "$MODEL" \
+inference-benchmarker
+--tokenizer-name "meta-llama/Llama-3.1-8B-Instruct" \
 --url http://localhost:8080 \
 --profile chat
 ```
````
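Put together, the Docker-free invocation this hunk introduces would look roughly like the sketch below. The `HF_TOKEN` export is an assumption carried over from the removed Docker example (it was passed there with `-e`), and the URL assumes the inference server from step 1 is listening on localhost:8080.

```bash
# Assumption carried over from the removed Docker example: a Hugging Face
# read token is still needed to fetch the tokenizer.
export HF_TOKEN=<your HF READ token>

# Benchmark the chat profile against a local OpenAI-compatible server.
inference-benchmarker \
    --tokenizer-name "meta-llama/Llama-3.1-8B-Instruct" \
    --url http://localhost:8080 \
    --profile chat
```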
````diff
@@ -132,16 +133,7 @@ Available modes:
 Example running a benchmark at a fixed request rates:
 
 ```shell
-MODEL=meta-llama/Llama-3.1-8B-Instruct
-HF_TOKEN=<your HF READ token>
-$ docker run \
---rm \
--it \
---net host \
--v $(pwd):/opt/inference-benchmarker/results \
--e "HF_TOKEN=$HF_TOKEN" \
-ghcr.io/huggingface/inference-benchmarker:latest \
-inference-benchmarker \
+inference-benchmarker \
 --tokenizer-name "meta-llama/Llama-3.1-8B-Instruct" \
 --max-vus 800 \
 --duration 120s \
````
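Reassembled from the visible part of this hunk, the fixed-rate example now starts as shown below; the hunk is cut off after `--duration`, so any further flags from the full README are not reproduced here.

```bash
# Visible portion of the fixed-rate example only; the diff hunk above is
# truncated after --duration, so the remaining flags are omitted.
inference-benchmarker \
    --tokenizer-name "meta-llama/Llama-3.1-8B-Instruct" \
    --max-vus 800 \
    --duration 120s
```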
