Supporting Multi-LoRA inferencing via JetStream server #221
Conversation
Looked at it at a high level and left some comments. Will take a deeper look again.
Thanks for the PR. It's a bit long, and I'd have preferred that you send adapter_tensorstore.py and the related code (along with its unit tests) as a separate PR, since it's isolated enough, before sending the PR that integrates it into the orchestrator.
I've left some initial comments.
Looks good for an initial version.
Supporting Multi-LoRA inferencing via JetStream server, following the [LLM Inference gateway API protocols](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#inference-api-protocol).
- Implemented an adapter_tensorstore to load, store, manage, and unload adapter weights (a rough sketch of the idea follows this list).
- Added and exposed the [required metrics](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#metrics-reporting) at the Prometheus endpoint (see the metrics sketch further below).
- Added the multi_lora_decoding service with corresponding APIs as per the [requirement](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#inference-api-protocol).
- Implemented single-LoRA functionality support.
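For context, here is a minimal sketch of what an in-memory adapter store along these lines might look like. The class name, method names, and LRU eviction policy are illustrative assumptions, not the actual adapter_tensorstore.py implementation in this PR.

```python
# Illustrative sketch only -- names and eviction policy are assumptions,
# not the PR's actual adapter_tensorstore implementation.
import threading
from collections import OrderedDict

import numpy as np


class AdapterTensorStore:
  """Keeps LoRA adapter weights in memory, evicting least-recently-used ones."""

  def __init__(self, max_adapters: int = 8):
    self._max_adapters = max_adapters
    self._adapters: OrderedDict[str, dict[str, np.ndarray]] = OrderedDict()
    self._lock = threading.Lock()

  def load(self, adapter_id: str, weights: dict[str, np.ndarray]) -> None:
    """Registers an adapter's weights, evicting the LRU entry if full."""
    with self._lock:
      if adapter_id in self._adapters:
        self._adapters.move_to_end(adapter_id)
        return
      if len(self._adapters) >= self._max_adapters:
        self._adapters.popitem(last=False)  # Evict the least recently used adapter.
      self._adapters[adapter_id] = weights

  def get(self, adapter_id: str) -> dict[str, np.ndarray]:
    """Returns an adapter's weights and marks it as recently used."""
    with self._lock:
      weights = self._adapters[adapter_id]
      self._adapters.move_to_end(adapter_id)
      return weights

  def unload(self, adapter_id: str) -> None:
    """Removes an adapter from the store if present."""
    with self._lock:
      self._adapters.pop(adapter_id, None)

  def list_adapters(self) -> list[str]:
    """Returns the IDs of all currently loaded adapters."""
    with self._lock:
      return list(self._adapters)
```

Under these assumptions, the orchestrator would look up `store.get("my-lora")` per request and apply the returned low-rank deltas during decoding; the actual PR may manage device placement and async loading differently.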
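And a rough sketch of how per-adapter metrics could be surfaced with the `prometheus_client` library. The metric name, labels, and port here are placeholders, not necessarily what the gateway protocol or this PR defines.

```python
# Illustrative sketch only -- metric names, labels, and port are placeholders.
from prometheus_client import Gauge, start_http_server

# Gauge reporting which LoRA adapters are currently loaded on the server.
LORA_INFO = Gauge(
    "jetstream_lora_request_info",
    "Currently running and waiting LoRA adapters",
    ["running_lora_adapters", "waiting_lora_adapters"],
)


def report_lora_state(running: list[str], waiting: list[str]) -> None:
  """Publishes the current adapter state as label values on the gauge."""
  LORA_INFO.labels(
      running_lora_adapters=",".join(running),
      waiting_lora_adapters=",".join(waiting),
  ).set(1)


if __name__ == "__main__":
  start_http_server(9100)  # Expose /metrics on port 9100.
  report_lora_state(running=["lora-a"], waiting=["lora-b"])
```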