Implement runai model streamer for MODEL_IMPL_TYPE=flax_nnx #955
Conversation
Thanks for wanting to enable this feature! We need tests (I assume across different TP values for different model sizes) that show this is both 1) correct/accurate and 2) performant. @py4 @vipannalla @manojkris @jcyang43 to comment if there's any guidance we can share.
We don't need anything for different TP values yet; that will come once distributed streaming is added in a follow-up PR. I have confirmed it works to load the model (by pushing the image and deploying the inference server on GKE), and I'm still trying to run the tests I added.
Description
We recently made changes to support GCS for the RunAI model streamer in vLLM. The last step of that work was to install the RunAI model streamer within the vLLM image. This was done for GPU in PR 26464, but we forgot to add the installation of the runai-model-streamer module to the TPU Dockerfile. Without the RunAI model streamer installed in the TPU image, customers have to make this change locally and build a custom TPU vLLM image in order to use the RunAI model streamer.
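A minimal sketch of the kind of Dockerfile change this describes, assuming the TPU image installs Python packages with pip; the exact package set (for example, whether a separate GCS plugin or extra is also required) is an assumption, not the actual diff in this PR:

```dockerfile
# Hypothetical sketch only -- not the actual change in this PR.
# Install the RunAI model streamer into the TPU image so customers do not
# have to build a custom image to use it as vLLM's weight loader.
# A GCS-specific plugin or extra may also be needed; check the streamer's
# and vLLM's documentation for the exact package set.
RUN pip install runai-model-streamer
```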
No bug or GitHub issue has been created for this, as RunAI model streamer support for GCS is still pending release for GPU.
Tests
Tested that building the image still succeeds and that the RunAI model streamer can be used to load the model for a vLLM inference server. I then used this image as the vLLM image to serve a Qwen3 model from a GCS bucket with the RunAI model streamer, deployed it on a GKE TPU cluster, and confirmed the inference server starts up successfully.
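For reference, a hedged sketch of the kind of command this implies; the bucket path and model name are placeholders, and only the `--load-format runai_streamer` flag is taken from vLLM's existing CLI:

```bash
# Hypothetical example -- the GCS path and model are placeholders.
# --load-format runai_streamer makes vLLM load weights via the RunAI model streamer.
vllm serve gs://my-bucket/qwen3-8b --load-format runai_streamer
```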
I also ensured my correctness test passed. Full test setup here.
Checklist
Before submitting this PR, please make sure:
[x] I have performed a self-review of my code.
[x] I have necessary comments in my code, particularly in hard-to-understand areas.
[x] I have made or will make corresponding changes to any relevant documentation.