
Conversation

@sanggusti

@sanggusti sanggusti commented Dec 23, 2025

What does this PR do?

Adds the feature requested in #572: the ability to control workers_per_device per API instance.

Interface change: workers_per_device now accepts one of:

  • int (previous behavior, applied to all APIs),
  • list[int] (per-API, in connector order), or
  • dict[str, int] mapping api_path -> workers_per_device (per-route).

Example:

server = LitServer(
    [sentiment_api, generate_api],
    accelerator="cuda",
    devices=[0, 1],
    workers_per_device={"/sentiment": 2, "/generate": 3},
)

With devices=[0, 1], this starts 2 * 2 = 4 inference workers for /sentiment and 3 * 2 = 6 for /generate.

This directly addresses #572 by allowing a single endpoint (e.g. /generate) to be backed by multiple worker processes without duplicating API instances into multiple routes.
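
For clarity, a tiny standalone snippet (illustration only, not LitServe internals) showing how those totals follow from the configuration:

devices = [0, 1]
workers_per_device = {"/sentiment": 2, "/generate": 3}

# each route gets workers_per_device[route] workers on every device
totals = {path: n * len(devices) for path, n in workers_per_device.items()}
print(totals)  # {'/sentiment': 4, '/generate': 6}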

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

Tests pass locally. The PR adds unit coverage in tests/unit/test_lit_server.py:

test_workers_per_device_can_be_configured_per_route validates:

  • the dict config and the list config both produce the expected total worker counts per api_path
  • example expectation in the test:
    • {"/sentiment": 2, "/generate": 3} with devices=[0,1]
    • expects totals {"/sentiment": 4, "/generate": 6}

test_workers_per_device_per_route_raises_on_unknown_route validates:

  • mapping keys must correspond to a known api_path; otherwise, a ValueError is raised

On my machine (4 × T4) the test run gives:

⚡ feat/workers-per-api-instance ~/LitServe pytest -q tests/unit/test_lit_server.py -k workers_per_device_can_be_configured_per_route
..                                                                                                                    [100%]
2 passed, 49 deselected, 7 warnings in 2.10s

Repro (4× T4):

pytest -q tests/unit/test_lit_server.py -k workers_per_device_can_be_configured_per_route

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Definitely!

- `_resolve_workers_per_device_config` normalizes workers_per_device into a dict[api_path, workers_per_device_int] (see the sketch below)
- `_inference_workers_config_for_api` instantiates the inference workers per device for each API
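
For reviewers, here is a minimal standalone sketch of the normalization these helpers perform. The function name, signature, and default handling are hypothetical; it only illustrates the int / list / dict cases and the ValueError on unknown routes described above, not the actual implementation in the diff.

from typing import Dict, List, Union

def resolve_workers_per_device(
    workers_per_device: Union[int, List[int], Dict[str, int]],
    api_paths: List[str],
) -> Dict[str, int]:
    """Normalize the user-facing argument into {api_path: workers_per_device}."""
    if isinstance(workers_per_device, int):
        # previous behavior: the same count applies to every API
        return {path: workers_per_device for path in api_paths}
    if isinstance(workers_per_device, list):
        # positional form: one entry per API, in connector order
        if len(workers_per_device) != len(api_paths):
            raise ValueError("list length must match the number of APIs")
        return dict(zip(api_paths, workers_per_device))
    if isinstance(workers_per_device, dict):
        # per-route form: every key must be a known api_path
        # (routes missing from the dict fall back to the PR's default handling, not shown here)
        unknown = set(workers_per_device) - set(api_paths)
        if unknown:
            raise ValueError(f"unknown api_path(s): {sorted(unknown)}")
        return dict(workers_per_device)
    raise TypeError("workers_per_device must be an int, list[int], or dict[str, int]")

# With devices=[0, 1]:
resolved = resolve_workers_per_device({"/sentiment": 2, "/generate": 3}, ["/sentiment", "/generate"])
totals = {path: n * 2 for path, n in resolved.items()}  # {'/sentiment': 4, '/generate': 6}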
@sanggusti
Author

The CI and e2e tests got stuck at test_worker_restart_and_server_shutdown and timed out. Any suggestions?

sanggusti and others added 3 commits December 23, 2025 14:43
@sanggusti
Author

Pre-commit and tests are successful. I hope this PR can get reviewed.

@codecov

codecov bot commented Dec 23, 2025

Codecov Report

❌ Patch coverage is 85.71429% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 85%. Comparing base (d244f2c) to head (fe0b2e6).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #646   +/-   ##
===================================
- Coverage    85%    85%   -0%     
===================================
  Files        39     39           
  Lines      3212   3261   +49     
===================================
+ Hits       2721   2762   +41     
- Misses      491    499    +8     

@sanggusti
Author

Hmm, it failed on macOS. Any suggestions?

@sanggusti
Author

Requesting a review. The pending checks need approval, cc @bhimrazy @aniketmaurya.

@sanggusti
Author

I've seen that all the tests (automatic and dispatch) and Codecov have passed. Looking forward to a review of this PR.

@sanggusti
Author

Weird, it passed before on 3.12. Can you elaborate, @bhimrazy?

@bhimrazy
Collaborator

bhimrazy commented Jan 6, 2026

Weird, it passed before on 3.12. Can you elaborate, @bhimrazy?

yeah, it looks like some flaky tests are reappearing after the merge of #644. I’ll investigate this shortly. It’s failing on the main branch as well.

@sanggusti
Author

@bhimrazy looks like it passes on main now, but somehow it only fails on Ubuntu 3.10.

@bhimrazy
Collaborator

bhimrazy commented Jan 8, 2026

@bhimrazy looks like it passes on main now, but somehow it only fails on Ubuntu 3.10.

yeah, it could be another flaky test 😅

@sanggusti
Author

I actually have no clue here. Should I make some change, or something else? @bhimrazy

@bhimrazy
Collaborator

bhimrazy commented Jan 8, 2026

I actually have no clue here. Should I make some change, or something else? @bhimrazy

Maybe try triggering the CI again with an empty commit.

btw, we’re still waiting on a review/decision from @andyland @aniketmaurya.

@sanggusti
Author

Alright, I've added a minimal comment-only commit to trigger the tests. Can you retrigger the pending checks? @bhimrazy

@sanggusti
Author

Seems like it passed all the required checks; looking for a review from the code owner. cc @bhimrazy @aniketmaurya

Collaborator

@andyland andyland left a comment


The PR description is inadequate; could you at least detail how the interface changes, with some example usage of the new functionality?

@sanggusti sanggusti requested a review from andyland January 21, 2026 03:54
@sanggusti
Author

Hi @andyland, I've updated the details in the PR message. More examples of what this PR covers are as follows:

  1. Backward Compatible
server = ls.LitServer(
    [sentiment_api, generate_api],
    accelerator="cuda",
    devices=[0, 1],
    workers_per_device=2,   # same for all routes
)
  2. Per-route using a dict (most closely matches the #572 request, "How can I initialize multiple instances of different model classes on different GPUs?")
server = ls.LitServer(
    [sentiment_api, generate_api],
    accelerator="cuda",
    devices=[0, 1],
    workers_per_device={
        "/sentiment": 2,  # 2 workers per GPU for sentiment
        "/generate": 3,   # 3 workers per GPU for generation
    },
)

What this means in worker counts:

  • /sentiment: 2 workers per device × 2 devices = 4 workers total
  • /generate: 3 workers per device × 2 devices = 6 workers total
  3. Per-API position (order matters)
server = ls.LitServer(
    [sentiment_api, generate_api],
    accelerator="cuda",
    devices=[0, 1],
    workers_per_device=[2, 3],  # sentiment then generate (same order as API list)
)
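
As an illustration (not LitServe internals), the positional form pairs each entry with the APIs in the order they were passed to LitServer:

api_paths = ["/sentiment", "/generate"]  # connector order, matching the API list above
workers = [2, 3]
per_route = dict(zip(api_paths, workers))  # {'/sentiment': 2, '/generate': 3}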
