Why does the tool, on some occasions, seem to fail at the end of the entire test run and not generate the JSON output file?
Here is an example of my launcher command:

```bash
docker run --rm -it --net host \
  -v $(pwd):/opt/inference-benchmarker/results \
  -e "HF_TOKEN=$HF_TOKEN" \
  ghcr.io/huggingface/inference-benchmarker:latest \
  inference-benchmarker --tokenizer-name "$MODEL" \
  --url http://localhost:8080/ --profile chat
```
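To make the end-of-run failure visible, here is a sketch of the same run that captures the tool's full log and exit code (`benchmark.log` is an arbitrary name I chose; `-it` is dropped so the output pipes cleanly):

```bash
# Sketch: same command as above, but keeping the full log and exit code
# so the failure is diagnosable even when no JSON file is written.
# (-it is dropped because the output is piped to tee.)
docker run --rm --net host \
  -v $(pwd):/opt/inference-benchmarker/results \
  -e "HF_TOKEN=$HF_TOKEN" \
  ghcr.io/huggingface/inference-benchmarker:latest \
  inference-benchmarker --tokenizer-name "$MODEL" \
  --url http://localhost:8080/ --profile chat \
  2>&1 | tee benchmark.log
echo "benchmarker exit code: ${PIPESTATUS[0]}"
```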
I run distributed vLLM on a 4-node Ray cluster:
```bash
VLLM_HOST_IP=${HEAD_NODE_IP} python3 -m vllm.entrypoints.openai.api_server \
  --model ${MODEL_NAME} \
  --host 0.0.0.0 \
  --port ${VLLM_PORT} \
  --tensor-parallel-size 4 \
  --pipeline-parallel-size 8 \
  --max-model-len 4096 \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 256 \
  --disable-custom-all-reduce \
  --distributed-executor-backend ray
```
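Before launching the benchmarker I can confirm the server is actually reachable; a minimal check against vLLM's OpenAI-compatible endpoint (assuming `VLLM_PORT` is the same 8080 the benchmarker targets):

```bash
# Sketch: verify the vLLM OpenAI-compatible server responds before
# benchmarking (assumes it listens on the localhost:8080 the
# benchmarker is pointed at).
curl -sf http://localhost:8080/v1/models || echo "vLLM server not responding"
```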