Commit ef35e8c

ilana-n (Ilana Nguyen) authored and committed
fix: fixed container stopping problem and added tags for vllm in gpu telemetry documentation

1 parent 1d031b8

2 files changed: +22 −3 lines

docs/tutorials/gpu-telemetry.md (6 additions, 0 deletions)

````diff
@@ -141,6 +141,7 @@ This path works with **vLLM, SGLang, TRT-LLM, or any inference server**. We'll u
 
 The setup includes three steps: creating a custom metrics configuration, starting the DCGM Exporter, and launching the vLLM server.
 
+<!-- setup-vllm-gpu-telemetry-default-openai-endpoint-server -->
 ```bash
 # Step 1: Create a custom metrics configuration
 cat > custom_gpu_metrics.csv << 'EOF'
@@ -204,6 +205,7 @@ docker run -d --name vllm-server \
   --host 0.0.0.0 \
   --port 8000
 ```
+<!-- /setup-vllm-gpu-telemetry-default-openai-endpoint-server -->
 
 > [!TIP]
 > You can customize the `custom_gpu_metrics.csv` file by commenting out metrics you don't need. Lines starting with `#` are ignored.
@@ -246,6 +248,7 @@ uv pip install ./aiperf
 
 ## Verify Everything is Running
 
+<!-- health-check-vllm-gpu-telemetry-default-openai-endpoint-server -->
 ```bash
 # Wait for vLLM inference server to be ready (up to 15 minutes)
 timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"Qwen/Qwen3-0.6B\",\"messages\":[{\"role\":\"user\",\"content\":\"test\"}],\"max_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "vLLM not ready after 15min"; exit 1; }
@@ -255,9 +258,11 @@ echo "vLLM ready, waiting for DCGM metrics to be available..."
 timeout 120 bash -c 'while true; do OUTPUT=$(curl -s localhost:9401/metrics); if echo "$OUTPUT" | grep -q "DCGM_FI_DEV_GPU_UTIL"; then break; fi; echo "Waiting for DCGM metrics..."; sleep 5; done' || { echo "GPU utilization metrics not found after 2min"; exit 1; }
 echo "DCGM GPU metrics are now available"
 ```
+<!-- /health-check-vllm-gpu-telemetry-default-openai-endpoint-server -->
 
 ## Run AIPerf Benchmark
 
+<!-- aiperf-run-vllm-gpu-telemetry-default-openai-endpoint-server -->
 ```bash
 aiperf profile \
   --model Qwen/Qwen3-0.6B \
@@ -278,6 +283,7 @@ aiperf profile \
   --random-seed 100 \
   --gpu-telemetry
 ```
+<!-- /aiperf-run-vllm-gpu-telemetry-default-openai-endpoint-server -->
 
 ## Multi-Node GPU Telemetry Example
````
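The paired HTML comment tags added to the tutorial (e.g. `<!-- setup-... -->` / `<!-- /setup-... -->`) suggest that the docs CI extracts the enclosed bash blocks and runs them end to end. A minimal sketch of how such extraction could work in Python; the function name and regexes are assumptions for illustration, not the actual test-runner code:

```python
import re

FENCE = "`" * 3  # build the markdown fence string to keep this example self-contained


def extract_tagged_bash(markdown: str, tag: str) -> list[str]:
    """Return bash snippets wrapped between <!-- tag --> and <!-- /tag --> markers.

    Hypothetical helper: the real test runner's extraction logic may differ.
    """
    # Capture everything between the opening and closing marker pair.
    section_re = re.compile(
        rf"<!--\s*{re.escape(tag)}\s*-->(.*?)<!--\s*/{re.escape(tag)}\s*-->",
        re.DOTALL,
    )
    # Then pull the bodies of any ```bash fences inside that section.
    fence_re = re.compile(rf"{FENCE}bash\n(.*?){FENCE}", re.DOTALL)
    snippets = []
    for section in section_re.findall(markdown):
        snippets.extend(body.strip() for body in fence_re.findall(section))
    return snippets


tag = "setup-vllm-gpu-telemetry-default-openai-endpoint-server"
doc = "\n".join([
    f"<!-- {tag} -->",
    f"{FENCE}bash",
    'echo "step 1"',
    FENCE,
    f"<!-- /{tag} -->",
])
print(extract_tagged_bash(doc, tag))  # → ['echo "step 1"']
```

Pairing an explicit closing marker with each opening marker keeps the extraction robust when a page contains several unrelated code blocks between sections.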

tests/ci/test_docs_end_to_end/test_runner.py (16 additions, 3 deletions)

```diff
@@ -325,6 +325,7 @@ def _graceful_server_shutdown(self, server_name: str):
 timeout 30 bash -c '
 echo "Stopping Docker Compose services..."
 docker compose -f docker-compose.yml down 2>/dev/null || true
+sleep 3
 
 echo "Stopping Dynamo containers..."
 # Stop containers by Dynamo image
@@ -344,12 +345,19 @@ def _graceful_server_shutdown(self, server_name: str):
 logger.info("Executing vLLM graceful shutdown...")
 shutdown_cmd = """
 timeout 30 bash -c '
+echo "Stopping DCGM exporter containers..."
+# Stop DCGM exporter containers explicitly since they are brought up separately
+docker stop dcgm-exporter 2>/dev/null || true
+docker rm dcgm-exporter 2>/dev/null || true
+docker ps --filter ancestor=*dcgm-exporter* --format "{{.ID}}" | xargs -r docker stop 2>/dev/null || true
+docker ps -aq --filter ancestor=*dcgm-exporter* | xargs -r docker rm 2>/dev/null || true
+
 echo "Stopping vLLM containers..."
-# Stop containers by vLLM image
-docker ps --filter ancestor=*vllm* --format "{{.ID}}" | xargs -r docker stop 2>/dev/null || true
+# Stop containers with vllm in image name
+docker ps --format "{{.ID}} {{.Image}}" | grep vllm | awk "{print \$1}" | xargs -r docker stop 2>/dev/null || true
 
 # Remove containers
-docker ps -aq --filter ancestor=*vllm* | xargs -r docker rm 2>/dev/null || true
+docker ps -aq --format "{{.ID}} {{.Image}}" | grep vllm | awk "{print \$1}" | xargs -r docker rm 2>/dev/null || true
 
 echo "vLLM graceful shutdown completed"
 '
@@ -363,6 +371,11 @@ def _graceful_server_shutdown(self, server_name: str):
 echo "Stopping containers for {server_name}..."
 docker ps --filter name={server_name} --format "{{.ID}}" | xargs -r docker stop 2>/dev/null || true
 docker ps -aq --filter name={server_name} | xargs -r docker rm 2>/dev/null || true
+
+echo "Stopping DCGM containers..."
+docker stop dcgm-exporter 2>/dev/null || true
+docker rm dcgm-exporter 2>/dev/null || true
+
 echo "Generic server shutdown completed"
 '
 """
```
