
Conversation

@sanggusti

@sanggusti sanggusti commented Dec 23, 2025

What does this PR do?

Adds the feature requested in #572: the ability to control workers_per_device per API instance.

Interface change: workers_per_device now accepts one of:

  • int (previous behavior, applied to all APIs),
  • list[int] (per-API, in connector order), or
  • dict[str, int] mapping api_path -> workers_per_device (per-route).

Example:

server = LitServer(
    [sentiment_api, generate_api],
    accelerator="cuda",
    devices=[0, 1],
    workers_per_device={"/sentiment": 2, "/generate": 3},
)

With devices=[0, 1], this starts 2 * 2 = 4 inference workers for /sentiment and 3 * 2 = 6 for /generate.

This directly addresses #572 by allowing a single endpoint (e.g. /generate) to be backed by multiple worker processes without duplicating API instances into multiple routes.
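
For clarity, a tiny standalone snippet (illustration only, not LitServe internals) showing how those totals follow from the configuration:

devices = [0, 1]
workers_per_device = {"/sentiment": 2, "/generate": 3}

# each route gets workers_per_device[route] workers on every device
totals = {path: n * len(devices) for path, n in workers_per_device.items()}
print(totals)  # {'/sentiment': 4, '/generate': 6}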

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

Tests pass locally. The PR adds unit coverage in tests/unit/test_lit_server.py:

test_workers_per_device_can_be_configured_per_route validates:

  • the dict config and the list config both produce the expected total worker counts per api_path
  • example expectation in the test:
    • {"/sentiment": 2, "/generate": 3} with devices=[0,1]
    • expects totals {"/sentiment": 4, "/generate": 6}

test_workers_per_device_per_route_raises_on_unknown_route validates:

  • mapping keys must correspond to a known api_path; otherwise, a ValueError is raised

On my machine (4 × T4) the test run gives:

⚡ feat/workers-per-api-instance ~/LitServe pytest -q tests/unit/test_lit_server.py -k workers_per_device_can_be_configured_per_route
..                                                                                                                    [100%]
2 passed, 49 deselected, 7 warnings in 2.10s

Repro (4× T4):

pytest -q tests/unit/test_lit_server.py -k workers_per_device_can_be_configured_per_route

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Definitely!

- `_resolve_workers_per_device_config` normalizes workers_per_device into a dict[api_path, workers_per_device_int] (see the sketch below)
- `_inference_workers_config_for_api` instantiates the inference workers per device for each API
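
For reviewers, here is a minimal standalone sketch of the normalization these helpers perform. The function name, signature, and default handling are hypothetical; it only illustrates the int / list / dict cases and the ValueError on unknown routes described above, not the actual implementation in the diff.

from typing import Dict, List, Union

def resolve_workers_per_device(
    workers_per_device: Union[int, List[int], Dict[str, int]],
    api_paths: List[str],
) -> Dict[str, int]:
    """Normalize the user-facing argument into {api_path: workers_per_device}."""
    if isinstance(workers_per_device, int):
        # previous behavior: the same count applies to every API
        return {path: workers_per_device for path in api_paths}
    if isinstance(workers_per_device, list):
        # positional form: one entry per API, in connector order
        if len(workers_per_device) != len(api_paths):
            raise ValueError("list length must match the number of APIs")
        return dict(zip(api_paths, workers_per_device))
    if isinstance(workers_per_device, dict):
        # per-route form: every key must be a known api_path
        # (routes missing from the dict fall back to the PR's default handling, not shown here)
        unknown = set(workers_per_device) - set(api_paths)
        if unknown:
            raise ValueError(f"unknown api_path(s): {sorted(unknown)}")
        return dict(workers_per_device)
    raise TypeError("workers_per_device must be an int, list[int], or dict[str, int]")

# With devices=[0, 1]:
resolved = resolve_workers_per_device({"/sentiment": 2, "/generate": 3}, ["/sentiment", "/generate"])
totals = {path: n * 2 for path, n in resolved.items()}  # {'/sentiment': 4, '/generate': 6}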
@sanggusti
Author

The CI and e2e tests got stuck at test_worker_restart_and_server_shutdown and timed out. Any suggestions?

sanggusti and others added 3 commits December 23, 2025 14:43
@sanggusti
Author

Pre-commit and tests are successful. I hope this PR can get reviewed.

@codecov

codecov bot commented Dec 23, 2025

Codecov Report

❌ Patch coverage is 85.71429% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 85%. Comparing base (d244f2c) to head (fe0b2e6).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #646   +/-   ##
===================================
- Coverage    85%    85%   -0%     
===================================
  Files        39     39           
  Lines      3212   3261   +49     
===================================
+ Hits       2721   2762   +41     
- Misses      491    499    +8     

@sanggusti
Author

Hmm, it failed on macOS. Any suggestions?

@sanggusti
Author

Requesting a review. The pending checks need approval, cc @bhimrazy @aniketmaurya.

@sanggusti
Author

I've seen that all the tests (automatic and dispatch) and Codecov have passed. Looking forward to a review of this PR.

@sanggusti
Author

Weird, it passed before on 3.12. Can you elaborate, @bhimrazy?

@bhimrazy
Collaborator

bhimrazy commented Jan 6, 2026

Weird, it passed before on 3.12. Can you elaborate, @bhimrazy?

yeah, it looks like some flaky tests are reappearing after the merge of #644. I’ll investigate this shortly. It’s failing on the main branch as well.

@sanggusti
Author

@bhimrazy looks like it passes on main now, but somehow it only fails on Ubuntu 3.10.

@bhimrazy
Collaborator

bhimrazy commented Jan 8, 2026

@bhimrazy looks like it passes on main now, but somehow it only fails on Ubuntu 3.10.

yeah, it could be another flaky test 😅

@sanggusti
Author

I actually have no clue here. Should I make some change, or something else? @bhimrazy

@bhimrazy
Collaborator

bhimrazy commented Jan 8, 2026

I actually have no clue here. Should I make some change, or something else? @bhimrazy

Maybe try triggering the CI again with an empty commit.

btw, we’re still waiting on a review/decision from @andyland @aniketmaurya.

@sanggusti
Author

Alright, I've added a minimal comment-only commit to trigger the tests. Can you retrigger the pending checks? @bhimrazy

@sanggusti
Author

Seems like it passed all the required checks; looking for a review from the code owner. cc @bhimrazy @aniketmaurya

Collaborator

@andyland andyland left a comment


The PR description is inadequate; could you at least detail how the interface changes, with some example usage of the new functionality?

@sanggusti sanggusti requested a review from andyland January 21, 2026 03:54
@sanggusti
Author

Hi @andyland, I've updated the details in the PR message. More examples of what this PR covers are as follows:

  1. Backward Compatible
server = ls.LitServer(
    [sentiment_api, generate_api],
    accelerator="cuda",
    devices=[0, 1],
    workers_per_device=2,   # same for all routes
)
  2. Per-route using a dict (most closely matches the #572 request, "How can I initialize multiple instances of different model classes on different GPUs?")
server = ls.LitServer(
    [sentiment_api, generate_api],
    accelerator="cuda",
    devices=[0, 1],
    workers_per_device={
        "/sentiment": 2,  # 2 workers per GPU for sentiment
        "/generate": 3,   # 3 workers per GPU for generation
    },
)

What this means in worker counts:

  • /sentiment: 2 workers per device × 2 devices = 4 workers total
  • /generate: 3 workers per device × 2 devices = 6 workers total
  3. Per-API position (order matters)
server = ls.LitServer(
    [sentiment_api, generate_api],
    accelerator="cuda",
    devices=[0, 1],
    workers_per_device=[2, 3],  # sentiment then generate (same order as API list)
)
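
As an illustration (not LitServe internals), the positional form pairs each entry with the APIs in the order they were passed to LitServer:

api_paths = ["/sentiment", "/generate"]  # connector order, matching the API list above
workers = [2, 3]
per_route = dict(zip(api_paths, workers))  # {'/sentiment': 2, '/generate': 3}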
