Skip to content

Commit

Permalink
Add new runtime image with chat-template for vllm (#1798)
Browse files Browse the repository at this point in the history
Add new image

Signed-off-by: Tarun Kumar <[email protected]>
  • Loading branch information
tarukumar authored Sep 11, 2024
1 parent 91a5a74 commit 4930cd4
Show file tree
Hide file tree
Showing 3 changed files with 6 additions and 4 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@ spec:
- '--model=/mnt/models'
- '--served-model-name={{.Name}}'
- '--distributed-executor-backend=mp'
image: quay.io/modh/vllm@sha256:a2593489ee20b8e5f01358a9aa984fc90618c6335f4c8e138e94ce635ffb112a
- '--chat-template=/app/data/template/template_chatml.jinja'
image: quay.io/modh/vllm@sha256:2e7f97b69d6e0aa7366ee6a841a7e709829136a143608bee859b1fe700c36d31
name: kserve-container
command:
- python3
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,8 @@ spec:
- '--model=/mnt/models'
- '--served-model-name={{.Name}}'
- '--distributed-executor-backend=mp'
image: quay.io/modh/vllm@sha256:a2593489ee20b8e5f01358a9aa984fc90618c6335f4c8e138e94ce635ffb112a
- '--chat-template=/app/data/template/template_chatml.jinja'
image: quay.io/modh/vllm@sha256:2e7f97b69d6e0aa7366ee6a841a7e709829136a143608bee859b1fe700c36d31
name: kserve-container
command:
- python3
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -217,10 +217,10 @@ Verify User Can Serve And Query A elyza/elyza-japanese-llama-2-7b-instruct Model
... port_forwarding=${use_port_forwarding}
ELSE IF "${RUNTIME_NAME}" == "vllm-runtime" and "${KSERVE_MODE}" == "Serverless"
Query Model Multiple Times model_name=${model_name} runtime=${RUNTIME_NAME} protocol=http
... inference_type=chat-completions n_times=1 query_idx=9
... inference_type=completions n_times=1 query_idx=10
... namespace=${test_namespace} string_check_only=${TRUE}
Query Model Multiple Times model_name=${model_name} runtime=${RUNTIME_NAME} protocol=http
... inference_type=completions n_times=1 query_idx=10
... inference_type=chat-completions n_times=1 query_idx=9
... namespace=${test_namespace} string_check_only=${TRUE}
END
[Teardown] Run Keywords
Expand Down

0 comments on commit 4930cd4

Please sign in to comment.