This document outlines the deployment process for an AudioQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on an Intel Xeon server.

The default pipeline deploys with vLLM as the LLM serving component. It also provides the option of using a TGI backend for the LLM microservice; refer to the [Start the MegaService](#-start-the-megaservice) section of this page.

Note: The default LLM is `meta-llama/Meta-Llama-3-8B-Instruct`. Before deploying the application, please make sure you have either requested and been granted access to it on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or downloaded the model locally from [ModelScope](https://www.modelscope.cn/models).

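If you use the gated model from Hugging Face, the serving container needs your access token at startup. A minimal sketch, assuming the compose environment reads the token from `HUGGINGFACEHUB_API_TOKEN` (verify the exact variable name in the `set_env.sh` / `compose.yaml` of this directory):

```bash
# Assumption: the compose setup in this directory reads the Hugging Face
# access token from this variable; confirm the name in set_env.sh / compose.yaml.
export HUGGINGFACEHUB_API_TOKEN="<your-hf-access-token>"
```
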
An Intel Xeon optimized image hosted in the Hugging Face repository will be used for the TGI service: `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu` ([text-generation-inference](https://github.com/huggingface/text-generation-inference)).

On the first startup, this service will take more time to download, load, and warm up the model. After that finishes, the service will be ready, and the container (`vllm-service` or `tgi-service`) status shown via `docker ps` will be `healthy`. Before that, the status will be `health: starting`.

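To block on the warm-up from a script instead of re-running `docker ps`, you can poll Docker's built-in health status. A minimal sketch, assuming the default container name `vllm-service` (swap in `tgi-service` for the TGI backend):

```bash
# Poll the container's health status until Docker reports it as healthy.
until [ "$(docker inspect --format '{{.State.Health.Status}}' vllm-service)" = "healthy" ]; do
  echo "vllm-service is still starting..."
  sleep 10
done
echo "vllm-service is ready."
```
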
Alternatively, try the commands below to check whether the LLM serving is ready.

```bash
# vLLM service
docker logs vllm-service 2>&1 | grep complete
# If the service is ready, you will see a log line like the one below.
INFO: Application startup complete.
```
```bash
# TGI service
docker logs tgi-service | grep Connected
# If the service is ready, you will see a log line like the one below.
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
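Equivalently, you can wait on the log marker itself rather than the health status. A small sketch using the TGI marker shown above; for the vLLM backend, swap in `vllm-service` and `complete`:

```bash
# Re-check the logs every 10 seconds until the readiness marker appears.
until docker logs tgi-service 2>&1 | grep -q "Connected"; do
  echo "waiting for tgi-service to finish warm-up..."
  sleep 10
done
```
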
Then try the `curl` commands below to validate the services.

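As a quick illustration, a minimal smoke test of the LLM serving endpoint could look like the sketch below. The host port (`8008`) is an assumption and must match the port published in your compose file; the route follows vLLM's OpenAI-compatible API.

```bash
# Hypothetical smoke test against vLLM's OpenAI-compatible chat endpoint.
# Adjust the host port to the one published in compose.yaml.
curl http://localhost:8008/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 16
  }'
```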