generate lws based yaml #219
base: main
Signed-off-by: Michael Kalantar <[email protected]>
| This PR adds the ability to use a LeaderWorkerSet as an alternative to a Deployment for the P/D pods. It supports a simple expression of tensor and data parallelism. Currently, only a data-local parallelism of 1 is supported. The base lws configuration comes from https://github.com/tlrmchlsmth/vllm-dp-lws/blob/main/lws.yaml. It was slightly modified to (a) create explicit  | 
| 
 go run main.go \
--epp-cluster-role pod-read generate \
-m samples/deepseek/deepseek-1t1d.yaml \
-b samples/deepseek/lws-base.yaml \
| sed 's/^[a-zA-Z]*:/  ---/' \
| sed 's/^  //' \
> samples/deepseek/deepseek-1t1d-manifest.yaml | 
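The two `sed` passes in the command above can be tried in isolation. The sketch below assumes the generator prints a map of top-level keys, each followed by a manifest indented by two spaces; that output shape, the key names `decode` and `prefill`, and the manifests themselves are hypothetical and chosen only for illustration. The first pass turns each unindented `key:` line into an indented `---` separator, and the second strips the two-space indent, which leaves a multi-document YAML stream.

```shell
# Hypothetical generator output: key names and manifests are made up for this sketch.
generated='decode:
  apiVersion: v1
  kind: Service
prefill:
  apiVersion: v1
  kind: ConfigMap'

# Same two sed passes as in the command above:
# 1) replace each unindented "key:" line with an indented document separator,
# 2) strip the two-space indent from every line.
out=$(printf '%s\n' "$generated" \
  | sed 's/^[a-zA-Z]*:/  ---/' \
  | sed 's/^  //')

printf '%s\n' "$out"
```

Note that the pattern `[a-zA-Z]*` matches letters only, so a top-level key containing digits or hyphens would not be converted into a separator.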
| Serving inference requests in llm-d where each P/D node is deployed over multiple pods. The project at https://github.com/tlrmchlsmth/vllm-dp-lws/tree/main shows how to host a model with multiple pods per P/D node. To do the same with llm-d, we show the steps to deploy the llm-d inference scheduler. The instructions use kgateway. | 
Generate the P/D pods as LeaderWorkerSets instead of Deployments.
Includes sample msvc and baseconfig files.