
Commit a7de400

update README about using vLLM to deploy and run local models
1 parent 36413f5 commit a7de400

1 file changed: README.md (+69 -2 lines changed)
@@ -117,9 +117,76 @@ export EGO_NUM=2
```

### Using Locally Deployed Models
We use vLLM to deploy and run local models. Here are the detailed deployment steps:

#### Step 1: Environment Setup
First, create and configure the vLLM environment:
```bash
conda create -n vllm python=3.12 -y
conda activate vllm
git clone https://github.com/vllm-project/vllm.git
cd vllm
VLLM_USE_PRECOMPILED=1 pip install --editable .
```
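
To confirm the editable install worked, you can (for example) import vLLM and print its version; this is just an optional sanity check, not part of the original instructions:

```bash
# Optional sanity check: the import should succeed and print the installed vLLM version
python -c "import vllm; print(vllm.__version__)"
```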

#### Step 2: Download Models
Create a model storage directory and download the required models:

```bash
mkdir vlm_models
huggingface-cli download Qwen/Qwen2.5-VL-3B-Instruct-AWQ --local-dir vlm_models/Qwen/Qwen2.5-VL-3B-Instruct-AWQ
```
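
If you also plan to run the 7B example in Step 3, you can fetch that checkpoint the same way; the 7B AWQ model id below is assumed to follow the same naming pattern, so adjust it if your setup differs:

```bash
# Optional: also fetch the 7B AWQ checkpoint used in the second Step 3 example
huggingface-cli download Qwen/Qwen2.5-VL-7B-Instruct-AWQ --local-dir vlm_models/Qwen/Qwen2.5-VL-7B-Instruct-AWQ
```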

#### Step 3: Start Services
You can start the service with models of different sizes. Here are two examples:

Start the 3B model:
```bash
CUDA_VISIBLE_DEVICES=2 python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-VL-3B-Instruct-AWQ \
    --download-dir /other/vlm_models \
    --host 0.0.0.0 \
    --port 8000 \
    --dtype float16 \
    --gpu-memory-utilization 0.7 \
    --max-model-len 8192 \
    --trust-remote-code
```

Start the 7B model:
```bash
CUDA_VISIBLE_DEVICES=3 python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-VL-7B-Instruct-AWQ \
    --download-dir /other/vlm_models \
    --host 0.0.0.0 \
    --port 8001 \
    --dtype float16 \
    --gpu-memory-utilization 0.7 \
    --max-model-len 8192 \
    --trust-remote-code
```
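
Once a server has finished loading the model, you can do a quick liveness check; recent vLLM versions expose a `/health` endpoint on the OpenAI-compatible server (the ports below match the examples above):

```bash
# Expect an HTTP 200 once the model has finished loading
curl -i http://localhost:8000/health   # 3B server
curl -i http://localhost:8001/health   # 7B server
```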

#### Step 4: Modify the Configuration File
When using locally deployed models, modify the configuration file `LangCoop/vlmdrive/vlm/hypes_yaml/api_vlm_drive_speed_curvature_qwen2.5-3b-awq.yaml`:

```yaml
api_model_name: Qwen/Qwen2.5-VL-3B-Instruct-AWQ
api_base_url: http://localhost:8000/v1
api_key: dummy_key
```
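
As a quick consistency check, you can list the models the configured endpoint actually serves; the returned model id should match `api_model_name` above. This is a minimal sketch using the standard OpenAI-compatible `/v1/models` route:

```bash
# The "id" field in the response should be Qwen/Qwen2.5-VL-3B-Instruct-AWQ
curl -s http://localhost:8000/v1/models
```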

#### Step 5: Test the Service
You can check whether the service is running properly with the following command:

```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-VL-3B-Instruct-AWQ",
        "messages": [
            {"role": "user", "content": "Hello, how are you?"}
        ],
        "max_tokens": 100
    }'
```
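
Since these are vision-language models, you may also want to test an image input. The following is a minimal sketch using the OpenAI-style multimodal message format; the image URL is only a placeholder, so substitute one your server can reach:

```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-VL-3B-Instruct-AWQ",
        "messages": [
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
                {"type": "text", "text": "Describe this image."}
            ]}
        ],
        "max_tokens": 100
    }'
```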

### Controller Config
We support three controllers:
