add queue length based autoscaling #59351
Conversation
Signed-off-by: harshit <[email protected]>
Code Review
This pull request introduces queue-based autoscaling for TaskConsumer deployments in Ray Serve. A new QueueMonitor component, implemented as a Ray actor, is added to directly query message broker queue lengths (supporting Redis and RabbitMQ). The task_consumer decorator is updated to mark deployments as TaskConsumer and provide a QueueMonitorConfig for their associated queue.

During application deployment, ApplicationState now identifies TaskConsumer deployments with default autoscaling policies, creates a QueueMonitor actor for them, and switches their policy to the new queue_based_autoscaling_policy. This policy calculates desired replicas based on queue depth (ceil(queue_length / target_ongoing_requests)), respects min_replicas and max_replicas, handles scaling from zero, and incorporates the existing smoothing logic for scaling decisions. The AutoscalingState is updated to clean up QueueMonitor actors when deployments are deregistered.

Unit and integration tests were added to validate the QueueMonitor functionality, the new autoscaling policy, and its interaction with TaskConsumer deployments, including scenarios for scaling up, respecting replica bounds, and actor recovery.

Review comments suggest improving exception handling by catching more specific exceptions or logging full tracebacks, and updating the docstring for _apply_scaling_decision_smoothing to accurately reflect its parameters. Additionally, it was noted that relying on __del__ for resource cleanup in QueueMonitor is discouraged, recommending explicit shutdown() methods or context managers instead.
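A minimal sketch of the explicit-cleanup pattern the last point recommends (the class shape, broker_client, and close() are illustrative assumptions, not the PR's actual QueueMonitor API):

```python
class QueueMonitor:
    """Sketch: release broker connections via an explicit shutdown() or a
    context manager rather than relying on __del__."""

    def __init__(self, broker_client):
        self._broker_client = broker_client
        self._closed = False

    def shutdown(self) -> None:
        # Idempotent explicit cleanup the controller can call when the
        # deployment is deleted.
        if not self._closed:
            self._broker_client.close()
            self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.shutdown()
        return False
```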
Signed-off-by: harshit <[email protected]>
Signed-off-by: harshit <[email protected]>
…ray into add-queue-based-autoscaling
abrarsheikh left a comment
I suggest breaking this PR down into smaller chunks; it's OK to keep it as is if you want to iterate on the implementation.
def _is_task_consumer_deployment(deployment_info: DeploymentInfo) -> bool:
    """Check if a deployment is a TaskConsumer."""
    try:
        deployment_def = deployment_info.replica_config.deployment_def
This access is unsafe because it deserializes user code in the controller, and the controller does not have the user's runtime env to execute that code.
For example, if the user code depends on PyTorch and torch is listed as a pip dependency in the deployment's runtime_env, this call will fail in cluster mode.
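For illustration, one hedged way to avoid this (not from the PR): check a plain flag stamped on the deployment's already-deserialized config instead of touching deployment_def. The attribute name and access path below are hypothetical.

```python
def _is_task_consumer_deployment(deployment_info) -> bool:
    # Read a metadata flag that the @task_consumer decorator could set at
    # deploy time, instead of deserializing user code in the controller.
    # `is_task_consumer` is a hypothetical attribute, not the PR's actual field.
    return bool(getattr(deployment_info.deployment_config, "is_task_consumer", False))
```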
# Configure queue-based autoscaling for TaskConsumers
_configure_queue_based_autoscaling_for_task_consumers(deployment_infos)
This only takes effect in imperative mode. If the user deploys through a config, this code path is not executed.
except Exception as e:
    logger.error(
        f"Failed to create QueueMonitor actor for '{deployment_name}': {e}"
    )
    continue
Why fail silently? Should we fail the deployment here?
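For illustration, a minimal sketch of the fail-loud alternative (the surrounding loop and the create_actor callable are assumptions, not the PR's actual code; whether to hard-fail the deployment is the open question above):

```python
import logging

logger = logging.getLogger(__name__)


def create_queue_monitors(queue_monitor_configs: dict, create_actor) -> None:
    """Sketch: create one QueueMonitor per deployment, failing loudly on error."""
    for deployment_name, monitor_config in queue_monitor_configs.items():
        try:
            create_actor(deployment_name, monitor_config)
        except Exception:
            # logger.exception records the full traceback, not just str(e).
            logger.exception(
                f"Failed to create QueueMonitor actor for '{deployment_name}'"
            )
            # One option: surface the failure instead of `continue`, so the
            # deployment does not silently run without queue-based autoscaling.
            raise
```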
actor = QueueMonitorActor.options(
    name=full_actor_name,
    namespace=namespace,
    lifetime="detached",
This needs a comment explaining why a detached actor was chosen.
Since we recreate the actor when the controller goes down, a detached lifetime is no longer required; removed it in #59430.
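A minimal sketch of the non-detached variant discussed here (constructor arguments are assumptions; get_if_exists is a standard Ray option for named actors, but its use here is illustrative):

```python
# Without lifetime="detached", the actor is fate-shared with the controller,
# which recreates it after a restart; get_if_exists=True reattaches to the
# named actor if it still exists instead of raising on a name collision.
actor = QueueMonitorActor.options(
    name=full_actor_name,
    namespace=namespace,
    get_if_exists=True,
).remote(queue_monitor_config)  # constructor args are illustrative
```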
| return "unknown" | ||
|
|
||
|
|
||
| class QueueMonitor: |
Why did we not go with Flower here? Doesn't that abstract away all of this logic?
Yes, it does abstract away all of this logic, but with Flower we would have to make a remote HTTP call; Flower would intercept it, read it, and then make the call to the broker, and all of that combined is more time-consuming than calling the broker directly. Apart from that, we would have to do port management to run N instances of Flower inside the Serve controller. It is doable, but given these details, writing plain-vanilla code to get the queue length seems more efficient and simpler.
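For reference, a minimal sketch of the kind of direct broker queries this refers to (Redis via LLEN, RabbitMQ via a passive queue declaration, as in the PR description below); connection setup and queue names are illustrative:

```python
import pika
import redis


def redis_queue_length(url: str, queue_name: str) -> int:
    # The Redis broker keeps pending tasks in a list, so LLEN returns the
    # number of queued messages.
    client = redis.Redis.from_url(url)
    try:
        return int(client.llen(queue_name))
    finally:
        client.close()


def rabbitmq_queue_length(url: str, queue_name: str) -> int:
    # A passive queue declaration returns the existing queue's message count
    # without creating or modifying the queue.
    connection = pika.BlockingConnection(pika.URLParameters(url))
    try:
        channel = connection.channel()
        result = channel.queue_declare(queue=queue_name, passive=True)
        return result.method.message_count
    finally:
        connection.close()
```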
Creating new small PRs for this bigger change.
This PR adds queue-based autoscaling for Ray Serve TaskConsumer deployments. TaskConsumers are workloads that consume tasks from message queues (Redis, RabbitMQ), and their scaling needs are fundamentally different from HTTP-based deployments.
Instead of scaling based on HTTP request load, TaskConsumers should scale based on the number of pending tasks in the message queue.
Key Features
- queue_based_autoscaling_policy that scales replicas based on ceil(queue_length / target_ongoing_requests)

Architecture
Components
QueueMonitor (queue_monitor.py)
- Lightweight Ray actor that queries queue length from the message broker
- Supports Redis (using LLEN command) and RabbitMQ (using passive queue declaration)
- Named actor format: QUEUE_MONITOR::<deployment_name>
- Detached lifecycle with automatic cleanup on deployment deletion
Autoscaling policy (autoscaling_policy.py)
- New policy function: queue_based_autoscaling_policy
- Formula: desired_replicas = ceil(queue_length / target_ongoing_requests) (see the sketch after this list)
- Reuses existing smoothing/delay logic to prevent oscillation
- Stores QueueMonitor config in policy_state for actor recovery
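A minimal sketch of the core replica calculation (function name and signature are illustrative; the real policy additionally applies the smoothing/delay logic noted above):

```python
import math


def desired_replicas(
    queue_length: int,
    target_ongoing_requests: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    # ceil(queue_length / target_ongoing_requests); any non-empty queue asks
    # for at least one replica, which is what enables scaling from zero.
    raw = math.ceil(queue_length / target_ongoing_requests) if queue_length > 0 else 0
    # Clamp to the deployment's configured bounds.
    return max(min_replicas, min(raw, max_replicas))
```

For example, queue_length=25 with target_ongoing_requests=10 gives ceil(2.5) = 3 replicas, clamped to [min_replicas, max_replicas].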
ApplicationState (application_state.py)
- Detects TaskConsumer deployments via _is_task_consumer marker
- Automatically creates QueueMonitor actor and switches to queue-based policy
- Only applies when user hasn't specified a custom autoscaling policy
AutoscalingState (autoscaling_state.py)
- QueueMonitor actors are cleaned up when deployments are deleted
- Prevents actor leaks and ensures test isolation
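A minimal sketch of what that cleanup can look like, using the QUEUE_MONITOR::<deployment_name> naming convention listed above (the helper itself is hypothetical):

```python
import ray


def cleanup_queue_monitor(deployment_name: str, namespace: str) -> None:
    # Look up the deployment's QueueMonitor by its well-known name and
    # terminate it so it does not leak after the deployment is deleted.
    try:
        actor = ray.get_actor(
            f"QUEUE_MONITOR::{deployment_name}", namespace=namespace
        )
    except ValueError:
        return  # Already gone; nothing to clean up.
    ray.kill(actor)
```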
Files Changed
Test Plan
Follow-up PRs