
@nadongjun
Contributor

Description

Add application-level autoscaling snapshot support for observability.

This PR extends the existing deployment-level autoscaling snapshot feature (PR #56225) to application-level autoscaling. When an application-level autoscaling policy is configured, the controller now emits ApplicationSnapshot logs that aggregate metrics across all deployments in the application: the number of deployments, total current and target replicas, scaling status, policy name, and any errors.
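To make the aggregation concrete, here is a minimal, hypothetical sketch of how an application-level record could be derived from per-deployment replica counts. It only mirrors the field names visible in the sample logs; `DeploymentCounts`, `scaling_status`, and `build_application_snapshot` are illustrative names, not Ray Serve APIs, and the rule for `scaling_status` (compare total target against total current) plus the "scaling down" label are assumptions inferred from the "scaling up"/"stable" values in the logs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class DeploymentCounts:
    """Hypothetical per-deployment input (not a Ray Serve type)."""
    current_replicas: int  # replicas currently running
    target_replicas: int   # replicas the autoscaling policy wants


def scaling_status(current: int, target: int) -> str:
    """Assumed rule: compare the app-wide target against the app-wide current count."""
    if target > current:
        return "scaling up"
    if target < current:
        return "scaling down"  # label assumed; not shown in the sample logs
    return "stable"


def build_application_snapshot(
    app: str,
    deployments: Dict[str, DeploymentCounts],
    policy_name: str,
    errors: Optional[List[str]] = None,
) -> dict:
    """Aggregate per-deployment counts into one app-level record shaped like the sample log."""
    total_current = sum(d.current_replicas for d in deployments.values())
    total_target = sum(d.target_replicas for d in deployments.values())
    return {
        "snapshot_type": "application",
        "timestamp_str": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "app": app,
        "num_deployments": len(deployments),
        "total_current_replicas": total_current,
        "total_target_replicas": total_target,
        "scaling_status": scaling_status(total_current, total_target),
        "policy_name": policy_name,
        "errors": list(errors or []),
    }


if __name__ == "__main__":
    snap = build_application_snapshot(
        "my_app",
        {"ingress": DeploymentCounts(0, 1), "model": DeploymentCounts(0, 1)},
        "my_module.my_app_policy",
    )
    print(snap["num_deployments"], snap["total_target_replicas"], snap["scaling_status"])
    # prints: 2 2 scaling up
```

One consequence of summing across deployments in this sketch: if one deployment scales up while another scales down by the same amount, the app-level status reads "stable", so the deployment-level snapshots remain the place to see per-deployment movement.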

Related issues

Related to #55833

Additional information

bash % cat /tmp/ray/session_latest/logs/serve/autoscaling_snapshot_6668.log 
{"asctime": "2026-01-09 13:56:19,481", "levelname": "INFO", "message": "{'snapshots': [{'snapshot_type': 'application', 'timestamp_str': '2026-01-09T04:56:19Z', 'app': 'app_snap_1767934578', 'num_deployments': 2, 'total_current_replicas': 0, 'total_target_replicas': 2, 'scaling_status': 'scaling up', 'policy_name': 'ray.serve.tests.test_controller.simple_app_policy_for_test', 'errors': []}]}", "filename": "controller.py", "lineno": 511, "process": 6668, "timestamp_ns": 1767934579481838000}
{"asctime": "2026-01-09 13:56:19,999", "levelname": "INFO", "message": "{'snapshots': [{'snapshot_type': 'application', 'timestamp_str': '2026-01-09T04:56:19Z', 'app': 'app_snap_1767934578', 'num_deployments': 2, 'total_current_replicas': 2, 'total_target_replicas': 2, 'scaling_status': 'stable', 'policy_name': 'ray.serve.tests.test_controller.simple_app_policy_for_test', 'errors': []}]}", "filename": "controller.py", "lineno": 511, "process": 6668, "timestamp_ns": 1767934579999085000}
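In the sample above, each line is a JSON record, but its `message` field holds a Python dict repr (single-quoted), so `json.loads` alone cannot decode the payload. The following is a minimal sketch for extracting application snapshots from such a file, assuming every line follows the format shown; `iter_app_snapshots` is a hypothetical helper, not part of Ray.

```python
import ast
import json
from typing import Iterable, Iterator


def iter_app_snapshots(lines: Iterable[str]) -> Iterator[dict]:
    """Yield application-level snapshot dicts from autoscaling snapshot log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)  # outer record is JSON (assumed one per line)
            # "message" is a Python dict repr with single quotes, which json.loads
            # rejects, so parse it as a Python literal instead.
            payload = ast.literal_eval(record["message"])
        except (ValueError, SyntaxError, KeyError, TypeError):
            continue  # skip lines that don't match the sample format
        if not isinstance(payload, dict):
            continue
        for snap in payload.get("snapshots", []):
            if snap.get("snapshot_type") == "application":
                yield snap
```

For example, opening the log file from the command above and passing the file object to `iter_app_snapshots` would yield one dict per application snapshot, from which fields such as `scaling_status` or `total_current_replicas` can be read directly.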

@nadongjun requested a review from a team as a code owner on January 9, 2026, 05:39
@nadongjun changed the title from "Serve obsv application" to "[Serve][3/N] Add application-level autoscaling snapshot" on Jan 9, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request extends autoscaling observability to the application level by introducing ApplicationSnapshot logs. The changes are well-structured, reusing existing patterns from deployment-level snapshots, and include comprehensive tests. I've identified a couple of areas for improvement to enhance code clarity and correctness. Overall, this is a solid addition to Ray Serve's observability features.

Signed-off-by: Dongjun Na <[email protected]>
@ray-gardener (bot) added the serve, observability, and community-contribution labels on Jan 9, 2026

Labels

community-contribution: Contributed by the community
observability: Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
serve: Ray Serve Related Issue
