Commit 617e119

Remove useless vllm ray (opea-project#859)
Signed-off-by: Xinyao Wang <[email protected]>
1 parent 3401db2 · commit 617e119

18 files changed: +10 / -915 lines

.github/workflows/docker/compose/llms-compose.yaml

Lines changed: 0 additions & 8 deletions
@@ -24,11 +24,3 @@ services:
     build:
       dockerfile: comps/llms/text-generation/vllm/langchain/Dockerfile
     image: ${REGISTRY:-opea}/llm-vllm:${TAG:-latest}
-  llm-vllm-ray:
-    build:
-      dockerfile: comps/llms/text-generation/vllm/ray/Dockerfile
-    image: ${REGISTRY:-opea}/llm-vllm-ray:${TAG:-latest}
-  llm-vllm-ray-hpu:
-    build:
-      dockerfile: comps/llms/text-generation/vllm/ray/dependency/Dockerfile
-    image: ${REGISTRY:-opea}/llm-vllm-ray-hpu:${TAG:-latest}
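
A usage note (not part of the diff): with the two Ray entries gone, the retained vLLM image can still be built from this compose file. A minimal sketch, assuming the remaining service is named `llm-vllm` to match its image tag:

```bash
# Build only the retained vLLM microservice image from the trimmed compose file
# (the service name llm-vllm is an assumption inferred from the image tag above).
docker compose -f .github/workflows/docker/compose/llms-compose.yaml build llm-vllm
```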

comps/llms/text-generation/README.md

Lines changed: 10 additions & 128 deletions
@@ -2,20 +2,20 @@
 
 This microservice, designed for Language Model Inference (LLM), processes input consisting of a query string and associated reranked documents. It constructs a prompt based on the query and documents, which is then used to perform inference with a large language model. The service delivers the inference results as output.
 
-A prerequisite for using this microservice is that users must have a LLM text generation service (etc., TGI, vLLM and Ray) already running. Users need to set the LLM service's endpoint into an environment variable. The microservice utilizes this endpoint to create an LLM object, enabling it to communicate with the LLM service for executing language model operations.
+A prerequisite for using this microservice is that users must have a LLM text generation service (etc., TGI, vLLM) already running. Users need to set the LLM service's endpoint into an environment variable. The microservice utilizes this endpoint to create an LLM object, enabling it to communicate with the LLM service for executing language model operations.
 
-Overall, this microservice offers a streamlined way to integrate large language model inference into applications, requiring minimal setup from the user beyond initiating a TGI/vLLM/Ray service and configuring the necessary environment variables. This allows for the seamless processing of queries and documents to generate intelligent, context-aware responses.
+Overall, this microservice offers a streamlined way to integrate large language model inference into applications, requiring minimal setup from the user beyond initiating a TGI/vLLM service and configuring the necessary environment variables. This allows for the seamless processing of queries and documents to generate intelligent, context-aware responses.
 
 ## Validated LLM Models
 
-| Model                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi | Ray |
-| --------------------------- | --------- | -------- | ---------- | --- |
-| [Intel/neural-chat-7b-v3-3] | ✓         | ✓        | ✓          | ✓   |
-| [Llama-2-7b-chat-hf]        | ✓         | ✓        | ✓          | ✓   |
-| [Llama-2-70b-chat-hf]       | ✓         | -        | ✓          | x   |
-| [Meta-Llama-3-8B-Instruct]  | ✓         | ✓        | ✓          | ✓   |
-| [Meta-Llama-3-70B-Instruct] | ✓         | -        | ✓          | x   |
-| [Phi-3]                     | x         | Limit 4K | Limit 4K   | ✓   |
+| Model                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi |
+| --------------------------- | --------- | -------- | ---------- |
+| [Intel/neural-chat-7b-v3-3] | ✓         | ✓        | ✓          |
+| [Llama-2-7b-chat-hf]        | ✓         | ✓        | ✓          |
+| [Llama-2-70b-chat-hf]       | ✓         | -        | ✓          |
+| [Meta-Llama-3-8B-Instruct]  | ✓         | ✓        | ✓          |
+| [Meta-Llama-3-70B-Instruct] | ✓         | -        | ✓          |
+| [Phi-3]                     | x         | Limit 4K | Limit 4K   |
 
 ## Clone OPEA GenAIComps
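
To make the endpoint wiring described in the paragraph above concrete, here is a minimal environment sketch for the retained vLLM path, using only variable names that appear elsewhere in this diff (values are placeholders; TGI is configured analogously with its own endpoint variable):

```bash
export vLLM_HOST_IP=$(hostname -I | awk '{print $1}')   # IP of the machine running the vLLM backend
export vLLM_ENDPOINT="http://${vLLM_HOST_IP}:8008"       # endpoint the microservice (llm.py) talks to
export LLM_MODEL=${your_hf_llm_model}                    # e.g. Intel/neural-chat-7b-v3-3
export HF_TOKEN=${your_hf_api_token}                     # Hugging Face token for gated models
```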

@@ -121,53 +121,6 @@ export vLLM_ENDPOINT="http://${vLLM_HOST_IP}:8008"
 python llm.py
 ```
 
-#### 1.2.3 Start the Ray Service
-
-Install the requirements for Ray Service
-
-```bash
-cd ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/ray
-
-pip install -r requirements.txt
-```
-
-Execute the docker run command to initiate the backend, along with the Python script that launches the microservice.
-
-```bash
-export vLLM_RAY_HOST_IP=$(hostname -I | awk '{print $1}') # This sets IP of the current machine
-export LLM_MODEL=${your_hf_llm_model}
-export DATA_DIR=$HOME/data # Location to download the model
-export HF_TOKEN=${your_hf_api_token}
-
-# Build the image first as opea/vllm:cpu
-bash ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/ray/dependency/build_docker_vllmray.sh
-
-# Initiate the backend
-docker run \
-  --name="vllm-ray-service" \
-  --runtime=habana \
-  -v $DATA_DIR:/data \
-  -e HABANA_VISIBLE_DEVICES=all \
-  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
-  --cap-add=sys_nice \
-  --ipc=host \
-  -p 8006:8000 \
-  -e HF_TOKEN=$HF_TOKEN \
-  opea/vllm_ray:habana \
-  /bin/bash -c " \
-    ray start --head && \
-    python vllm_ray_openai.py \
-    --port_number 8000 \
-    --model_id_or_path $LLM_MODEL \
-    --tensor_parallel_size 2 \
-    --enforce_eager False"
-
-# Start the microservice with an endpoint as the above docker run command
-export vLLM_RAY_ENDPOINT="http://${vLLM_RAY_HOST_IP}:8006"
-
-python llm.py
-```
-
 ## 🚀2. Start Microservice with Docker (Option 2)
 
 In order to start the microservices with docker, you need to build the docker images first for the microservice.
@@ -203,22 +156,6 @@ docker build \
   -f comps/llms/text-generation/vllm/langchain/Dockerfile .
 ```
 
-#### 2.1.3 Ray
-
-```bash
-# Build the Ray Serve docker
-bash ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/ray/dependency/build_docker_vllmray.sh
-
-# Build the microservice docker
-cd ${OPEA_GENAICOMPS_ROOT}
-
-docker build \
-  --build-arg https_proxy=$https_proxy \
-  --build-arg http_proxy=$http_proxy \
-  -t opea/llm-vllm-ray:latest \
-  -f comps/llms/text-generation/vllm/ray/Dockerfile .
-```
-
 ### 2.2 Start LLM Service with the built image
 
 To start a docker container, you have two options:
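
For comparison with the removed Ray build, the retained vLLM microservice image (README section 2.1.2, whose tail is visible in the context lines above) is presumably built along these lines; treat the exact flags as illustrative rather than authoritative:

```bash
cd ${OPEA_GENAICOMPS_ROOT}

# Build the llm-vllm microservice image (mirrors the removed Ray build, minus the Ray pieces)
docker build \
  --build-arg https_proxy=$https_proxy \
  --build-arg http_proxy=$http_proxy \
  -t opea/llm-vllm:latest \
  -f comps/llms/text-generation/vllm/langchain/Dockerfile .
```
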
@@ -247,15 +184,6 @@ export vLLM_LLM_ENDPOINT="http://${your_ip}:8008"
 export LLM_MODEL=${your_hf_llm_model}
 ```
 
-In order to start Ray serve and LLM services, you need to setup the following environment variables first.
-
-```bash
-export HF_TOKEN=${your_hf_api_token}
-export RAY_Serve_ENDPOINT="http://${your_ip}:8008"
-export LLM_MODEL=${your_hf_llm_model}
-export CHAT_PROCESSOR="ChatModelLlama"
-```
-
 ### 2.3 Run Docker with CLI (Option A)
 
 #### 2.3.1 TGI
@@ -311,29 +239,6 @@ docker run \
   opea/llm-vllm:latest
 ```
 
-#### 2.3.3 Ray Serve
-
-Start Ray Serve endpoint.
-
-```bash
-bash ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/ray/dependency/launch_vllmray.sh
-```
-
-Start Ray Serve microservice.
-
-```bash
-docker run -d \
-  --name="llm-ray-server" \
-  -p 9000:9000 \
-  --ipc=host \
-  -e http_proxy=$http_proxy \
-  -e https_proxy=$https_proxy \
-  -e RAY_Serve_ENDPOINT=$RAY_Serve_ENDPOINT \
-  -e HF_TOKEN=$HF_TOKEN \
-  -e LLM_MODEL=$LLM_MODEL \
-  opea/llm-ray:latest
-```
-
 ### 2.4 Run Docker with Docker Compose (Option B)
 
 #### 2.4.1 TGI
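
The retained CLI launch of the microservice against vLLM (section 2.3.2, of which only the last lines appear as context above) presumably mirrors the removed Ray Serve command with the endpoint variable and image swapped; the container name and exact flag set below are assumptions, not part of this diff:

```bash
# Hypothetical sketch: container name and flags mirror the removed Ray Serve command
docker run -d \
  --name="llm-vllm-server" \
  -p 9000:9000 \
  --ipc=host \
  -e http_proxy=$http_proxy \
  -e https_proxy=$https_proxy \
  -e vLLM_LLM_ENDPOINT=$vLLM_LLM_ENDPOINT \
  -e HF_TOKEN=$HF_TOKEN \
  -e LLM_MODEL=$LLM_MODEL \
  opea/llm-vllm:latest
```
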
@@ -350,13 +255,6 @@ cd ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/langchain
 docker compose -f docker_compose_llm.yaml up -d
 ```
 
-#### 2.4.3 Ray Serve
-
-```bash
-cd ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/ray
-docker compose -f docker_compose_llm.yaml up -d
-```
-
 ## 🚀3. Consume LLM Service
 
 ### 3.1 Check Service Status
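
The retained Docker Compose path for vLLM (section 2.4.2) can be read directly off the hunk header and context above; only the Ray variant goes away:

```bash
cd ${OPEA_GENAICOMPS_ROOT}/comps/llms/text-generation/vllm/langchain
docker compose -f docker_compose_llm.yaml up -d
```
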
@@ -391,22 +289,6 @@ curl http://${your_ip}:8008/v1/completions \
   }'
 ```
 
-#### 3.2.3 Verify the Ray Service
-
-```bash
-curl http://${your_ip}:8008/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "model": ${your_hf_llm_model},
-    "messages": [
-      {"role": "assistant", "content": "You are a helpful assistant."},
-      {"role": "user", "content": "What is Deep Learning?"}
-    ],
-    "max_tokens": 32,
-    "stream": true
-  }'
-```
-
 ### 3.3 Consume LLM Service
 
 You can set the following model parameters according to your actual needs, such as `max_tokens`, `streaming`.
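
For completeness, the retained vLLM verification (section 3.2.2), whose opening line appears in the hunk header above, presumably exercises vLLM's OpenAI-compatible completions route; the request body shown here is illustrative:

```bash
curl http://${your_ip}:8008/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": ${your_hf_llm_model},
    "prompt": "What is Deep Learning?",
    "max_tokens": 32,
    "temperature": 0
  }'
```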

comps/llms/text-generation/ray_serve/llm.py

Lines changed: 0 additions & 82 deletions
This file was deleted.

comps/llms/text-generation/ray_serve/requirements.txt

Lines changed: 0 additions & 14 deletions
This file was deleted.

comps/llms/text-generation/vllm/ray/Dockerfile

Lines changed: 0 additions & 25 deletions
This file was deleted.
