
Conversation

@harshit-anyscale (Contributor) commented Dec 18, 2025

Summary

This PR adds a new combined_workload_autoscaling_policy function that enables Ray Serve deployments to scale based on the combined workload from both message queue depth and HTTP requests. It is the second part of the queue-based autoscaling feature for TaskConsumer deployments.

Related PRs:

  • PR 1 (Prerequisite): #59430 - QueueMonitor Actor
  • PR 3 (Follow-up): Integration with TaskConsumer

Changes

New Components

  • combined_workload_autoscaling_policy() (autoscaling_policy.py) - Main policy function that scales based on the combined workload (queue + HTTP requests)
  • _apply_scaling_decision_smoothing() (autoscaling_policy.py) - Reusable helper for delay-based smoothing
  • get_queue_monitor_actor_name() (queue_monitor.py) - Helper to generate consistent actor names
  • DEFAULT_COMBINED_WORKLOAD_AUTOSCALING_POLICY (constants.py) - Import-path constant for the policy

Files Modified

  • python/ray/serve/autoscaling_policy.py - Added new policy and helper functions
  • python/ray/serve/_private/constants.py - Added policy constant
  • python/ray/serve/_private/queue_monitor.py - Added get_queue_monitor_actor_name() helper

Files Added

  • python/ray/serve/tests/unit/test_queue_autoscaling_policy.py - 22 unit tests

How It Works

  ┌─────────────────────────────────────────────────────────────────┐
  │              combined_workload_autoscaling_policy               │
  ├─────────────────────────────────────────────────────────────────┤
  │                                                                 │
  │  1. Get QueueMonitor Actor                                      │
  │     ├── ray.get_actor("QUEUE_MONITOR::")                        │
  │     └── If unavailable, maintain current replicas               │
  │                                                                 │
  │  2. Query Queue Length                                          │
  │     └── ray.get(actor.get_queue_length.remote())                │
  │                                                                 │
  │  3. Calculate Total Workload                                    │
  │     └── total_workload = queue_length + total_num_requests      │
  │                                                                 │
  │  4. Calculate Desired Replicas                                  │
  │     ├── Formula: ceil(total_workload / target_ongoing_requests) │
  │     ├── Scale from zero: immediate scale up if workload > 0     │
  │     └── Clamp to [min_replicas, max_replicas]                   │
  │                                                                 │
  │  5. Apply Smoothing                                             │
  │     ├── Scale up: wait upscale_delay_s                          │
  │     ├── Scale down: wait downscale_delay_s                      │
  │     └── Scale to zero: wait downscale_to_zero_delay_s           │
  │                                                                 │
  └─────────────────────────────────────────────────────────────────┘
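
As a rough illustration of steps 1-2, here is a minimal sketch of the actor lookup and queue query with a graceful fallback when the monitor is unavailable. The helper name, parameter, and fallback convention are assumptions for illustration; in the PR the actor name comes from get_queue_monitor_actor_name() and this logic lives inside the policy itself.

import logging
from typing import Optional

import ray

logger = logging.getLogger("ray.serve")


def _query_queue_length(queue_monitor_actor_name: str) -> Optional[int]:
    """Illustrative sketch of steps 1-2 above; not the PR's exact code."""
    try:
        # Actor name uses the "QUEUE_MONITOR::" prefix shown in the diagram.
        actor = ray.get_actor(queue_monitor_actor_name)
        return ray.get(actor.get_queue_length.remote(), timeout=5.0)
    except Exception:
        # QueueMonitor unavailable or slow to respond: signal the caller to
        # keep the current replica count rather than scale on missing data.
        logger.warning("QueueMonitor unavailable; keeping current replicas.")
        return None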

Scaling Formula

total_workload = queue_length + total_num_requests
desired_replicas = ceil(total_workload / target_ongoing_requests)

Example:

  • Queue has 100 pending tasks
  • HTTP requests: 50 ongoing requests
  • Total workload = 100 + 50 = 150
  • target_ongoing_requests = 10 (each replica handles 10 concurrent tasks)
  • Desired replicas = ceil(150 / 10) = 15 replicas
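
A minimal, self-contained sketch of this calculation follows. Parameter names mirror the fields referenced in the diagram; the PR's actual policy additionally applies the smoothing delays from step 5, so its signature differs.

import math


def compute_desired_replicas(
    queue_length: int,
    total_num_requests: int,
    target_ongoing_requests: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    total_workload = queue_length + total_num_requests
    if total_workload == 0:
        # Nothing queued and no requests: scale down as far as allowed.
        return min_replicas
    # Any positive workload yields at least one replica (scale from zero).
    desired = math.ceil(total_workload / target_ongoing_requests)
    # Clamp to the configured bounds.
    return max(min_replicas, min(desired, max_replicas))


# Worked example above: 100 queued tasks + 50 HTTP requests, 10 per replica -> 15.
assert compute_desired_replicas(100, 50, 10, min_replicas=1, max_replicas=100) == 15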

🤖 Generated with https://claude.com/claude-code

@harshit-anyscale self-assigned this Dec 18, 2025
@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new queue-based autoscaling policy, which is a great addition for TaskConsumer deployments. The implementation is well-structured, with a dedicated QueueMonitor actor and comprehensive unit tests. I've identified a critical bug in the Redis connection handling and a high-severity logic issue in the scaling-to-zero implementation. Addressing these will ensure the new feature is robust and behaves as expected.

@harshit-anyscale added the 'go' label (add ONLY when ready to merge, run all tests) Dec 19, 2025
@github-actions bot commented Jan 2, 2026

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions bot added the 'stale' label Jan 2, 2026
@harshit-anyscale force-pushed the queue-based-autoscaling-part-2 branch from 86223e0 to 26d4bd7 on January 9, 2026 07:05
Signed-off-by: harshit <[email protected]>
@harshit-anyscale removed the 'stale' label Jan 9, 2026
Signed-off-by: harshit <[email protected]>
@harshit-anyscale force-pushed the queue-based-autoscaling-part-2 branch from 29f8b0b to 0c7cf30 on January 12, 2026 18:08
@harshit-anyscale marked this pull request as ready for review January 12, 2026 19:07
@harshit-anyscale requested a review from a team as a code owner January 12, 2026 19:07
@ray-gardener bot added the 'serve' label (Ray Serve Related Issue) Jan 13, 2026
@abrarsheikh (Contributor) left a comment

I would base this PR on top of #58857.

Comment on lines +428 to +430
DEFAULT_COMBINED_WORKLOAD_AUTOSCALING_POLICY = (
"ray.serve.autoscaling_policy:default_combined_workload_autoscaling_policy"
)
Contributor

Can be named better; combined_workload is an overloaded term.

Contributor Author

@abrarsheikh how about DEFAULT_QUEUE_AWARE_AUTOSCALING_POLICY? It seems okay to me and not that overloaded. Let me know your view on it and I will amend the code accordingly.

Contributor

I would choose default_async_inference_autoscaling_policy.

default_combined_workload_autoscaling_policy: to the reader, it is not clear which workloads we are combining.
DEFAULT_QUEUE_AWARE_AUTOSCALING_POLICY: which queue? The other autoscaling policy is also queue-aware, but the queue there refers to ongoing requests.

@@ -1,12 +1,15 @@
import logging
import math
Contributor

Many of the changes in this file overlap with the changes in PR #58857. I suggest reviewing the other PR and making sure it's compatible with what we need to do here.

Contributor Author

ack, will review the other PR

Comment on lines +185 to +187
queue_length = ray.get(
queue_monitor_actor.get_queue_length.remote(), timeout=5.0
)
Contributor

This would make one RPC call on every controller iteration. What is the impact of this on the scalability of the controller? Do we need to query the queue length in every iteration?
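
One way to bound that cost, shown here only as an illustration of the trade-off being raised and not as part of this PR, is to reuse the last queue-length reading for a short TTL so the controller skips the RPC on most iterations. The TTL value and names below are hypothetical.

import time

import ray

_last_reading = {"value": 0, "ts": 0.0}
_QUEUE_LENGTH_TTL_S = 10.0  # illustrative value


def cached_queue_length(queue_monitor_actor) -> int:
    """Refresh the cached queue length only after the TTL has elapsed."""
    now = time.monotonic()
    if now - _last_reading["ts"] >= _QUEUE_LENGTH_TTL_S:
        _last_reading["value"] = ray.get(
            queue_monitor_actor.get_queue_length.remote(), timeout=5.0
        )
        _last_reading["ts"] = now
    return _last_reading["value"]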



@pytest.fixture
def queue_monitor_mock():
Contributor

I would avoid such heavy mocking. Instead, figure out a way to write the tests using dependency injection; study how the existing tests are written in test_deployment_state.
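
Purely as an illustration of that suggestion (not the PR's current test structure), the policy logic could accept a queue-length provider so unit tests pass a plain stub instead of patching Ray actors:

import math
from typing import Callable


def scale_decision(
    total_num_requests: int,
    target_ongoing_requests: int,
    get_queue_length: Callable[[], int],
) -> int:
    # The queue source is injected, so tests never have to mock ray.get_actor.
    total_workload = get_queue_length() + total_num_requests
    if total_workload == 0:
        return 0
    return math.ceil(total_workload / target_ongoing_requests)


def test_scale_decision_counts_queue_and_http_requests():
    # A lambda stands in for the QueueMonitor actor.
    assert scale_decision(50, 10, get_queue_length=lambda: 100) == 15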

@harshit-anyscale (Contributor Author)

"I would base this PR on top of #58857."

I am not sure of the timeline we are targeting for the #58857 PR, but since we want to get the queue-based autoscaling feature out ASAP, I thought of merging this PR as it is.

Then, once the #58857 PR is merged, I will create a new one that uses the changes from #58857 to refactor the queue-aware autoscaling policy.

@abrarsheikh let me know your thoughts on it.

