curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
  -H "Content-Type: application/json"
```


### Profile Microservices

To further analyze microservice performance, users can follow the instructions below to profile the microservices.

#### 1. vLLM backend Service

Users can follow the previous section to test the vLLM microservice or the ChatQnA MegaService.
By default, vLLM profiling is not enabled. Users can start and stop profiling with the following commands.

##### Start vLLM profiling

```bash
curl http://${host_ip}:9009/start_profile \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3"}'
```
If profiling starts correctly, logs like the following appear in the vllm-service docker container:
```bash
INFO api_server.py:361] Starting profiler...
INFO api_server.py:363] Profiler started.
INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
```
After vLLM profiling is started, users can ask questions and get responses from the vLLM MicroService
or the ChatQnA MegaService.
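Even a single completion request gives the profiler something to record. The snippet below is a sketch that assumes the OpenAI-compatible `/v1/completions` endpoint that vLLM serves on port 9009; the model and prompt are example values, and the request is only constructed here, since sending it requires a live deployment:

```python
import json
from urllib import request

# Build (but do not send) a completion request against the vLLM service.
# "localhost" stands in for ${host_ip}; model/prompt values are examples.
payload = {
    "model": "Intel/neural-chat-7b-v3-3",
    "prompt": "What is deep learning?",
    "max_tokens": 32,
}
req = request.Request(
    "http://localhost:9009/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# A request carrying a body defaults to POST; send it with
# request.urlopen(req) once the service is up.
print(req.get_method(), req.full_url)
```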

##### Stop vLLM profiling
With the following command, users can stop vLLM profiling and generate a *.pt.trace.json.gz file as the profiling
result under the /mnt folder in the vllm-service docker instance.
```bash
# vLLM Service
curl http://${host_ip}:9009/stop_profile \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3"}'
```
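The start and stop calls above differ only in the endpoint path, so automating them (for example, around a batch of benchmark requests) can be sketched with a small helper. This is a hypothetical convenience wrapper, not part of the deployment; it only assembles the URL, headers, and body used by the two curl commands:

```python
import json

def profile_call(action: str, host_ip: str, model: str, port: int = 9009):
    """Return the (url, headers, body) triple for /start_profile or /stop_profile."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    url = f"http://{host_ip}:{port}/{action}_profile"
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"model": model})
    return url, headers, body

url, headers, body = profile_call("start", "192.168.0.10", "Intel/neural-chat-7b-v3-3")
print(url)   # http://192.168.0.10:9009/start_profile
print(body)  # {"model": "Intel/neural-chat-7b-v3-3"}
```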
If profiling stops correctly, logs like the following appear in the vllm-service docker container:
```bash
INFO api_server.py:368] Stopping profiler...
INFO api_server.py:370] Profiler stopped.
INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
```
After vLLM profiling is stopped, users can use the command below to copy the *.pt.trace.json.gz file out of the /mnt folder:
```bash
docker cp vllm-service:/mnt/ .
```
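The trace can also be inspected offline: PyTorch's profiler writes the Chrome trace event format, so a short script can, for example, count the most frequent events. This is a sketch only; the exact event fields can vary by PyTorch/vLLM version, and the sample file generated here is synthetic, standing in for a real *.pt.trace.json.gz:

```python
import collections
import gzip
import json
import os
import tempfile

def summarize_trace(path):
    """Return the five most frequent complete ("ph": "X") events in a trace."""
    with gzip.open(path, "rt") as f:
        trace = json.load(f)
    # Chrome traces are either {"traceEvents": [...]} or a bare event list.
    events = trace.get("traceEvents") if isinstance(trace, dict) else trace
    counts = collections.Counter(
        e.get("name", "?") for e in events if e.get("ph") == "X"
    )
    return counts.most_common(5)

# Synthetic stand-in for the real profiler output:
sample = {"traceEvents": [
    {"ph": "X", "name": "aten::matmul", "ts": 0, "dur": 10},
    {"ph": "X", "name": "aten::matmul", "ts": 12, "dur": 9},
    {"ph": "X", "name": "aten::softmax", "ts": 22, "dur": 3},
]}
path = os.path.join(tempfile.mkdtemp(), "sample.pt.trace.json.gz")
with gzip.open(path, "wt") as f:
    json.dump(sample, f)

print(summarize_trace(path))  # [('aten::matmul', 2), ('aten::softmax', 1)]
```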

##### Check profiling result

Open a web browser, go to `chrome://tracing` or `ui.perfetto.dev`, and load the json.gz file to view the vLLM
profiling result.

## 🚀 Launch the UI

### Launch with origin port