Open
Changes from 54 commits
67 commits
ab6cd99
Add label_selector and bundle_label_selector to Serve API
ryanaoleary Oct 14, 2025
0a449e1
Fix argument order
ryanaoleary Oct 16, 2025
a900e41
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Nov 3, 2025
6862eee
Merge branch 'master' into add-label-selector-serve-option
zcin Nov 5, 2025
a0c2246
Add fallback strategy to serve options and consider labels during rep…
ryanaoleary Dec 18, 2025
c74a0e8
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Dec 18, 2025
6a4d241
Add new fields to DeploymentVersion proto
ryanaoleary Dec 19, 2025
a1e5fd1
Add validation check that bundles are provided with bundle_label_sele…
ryanaoleary Dec 19, 2025
0ba9ed7
Correctly handle label constraint for replica actor compaction
ryanaoleary Dec 19, 2025
181156b
Fix fallback strategy type
ryanaoleary Dec 19, 2025
be6fdf1
mock correct flag in test
ryanaoleary Dec 19, 2025
4558c47
Fix test, move some to unit, and remove NodeLabelSchedulingStrategy c…
ryanaoleary Dec 19, 2025
b0eae3a
Update python/ray/serve/_private/config.py
ryanaoleary Dec 22, 2025
8b72f0c
Fix argument names, refactor test fixture, and improve readability
ryanaoleary Dec 30, 2025
248be33
Remove fallback_strategy from Deployment API
ryanaoleary Dec 30, 2025
3ddb0a1
Add new fields to to_dict
ryanaoleary Dec 30, 2025
a9dbec0
Fully remove fallback strategy from deployment API
ryanaoleary Dec 30, 2025
d96f858
Fix var names and tests
ryanaoleary Dec 30, 2025
db162af
Move appropriate tests to deployment_scheduler and remove duplicate test
ryanaoleary Dec 30, 2025
935e084
Handle Label selector set to None
ryanaoleary Dec 30, 2025
d080190
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Dec 30, 2025
d87f838
Move back actor_options copy to where it was
ryanaoleary Dec 31, 2025
31bd7d8
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Dec 31, 2025
fe6a1a7
Fix invalid parameter being passed as placement_group_version
ryanaoleary Jan 6, 2026
40c02e0
Add string type check to _filter_nodes_by_labels
ryanaoleary Jan 6, 2026
4305e20
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Jan 6, 2026
7b13cc5
Fix None check for fallback strategy
ryanaoleary Jan 7, 2026
fe1eb88
pass bundle_label_selector through config override path
ryanaoleary Jan 7, 2026
f4e90a4
Update python/ray/serve/schema.py
ryanaoleary Jan 7, 2026
fc52d16
Add TODO comments
ryanaoleary Jan 7, 2026
b844884
Fix field name that gets popped
ryanaoleary Jan 7, 2026
47ed24d
Refactor schedule function to be more clear
ryanaoleary Jan 7, 2026
4f438f4
Add unit test coverage for bundle_label_selector and fallback_strateg…
ryanaoleary Jan 8, 2026
485dde1
Refactor is_scaled_copy_of for readability and add attribute checks
ryanaoleary Jan 8, 2026
38d4d30
make variable assignments more readable
ryanaoleary Jan 8, 2026
06c23f5
Prevent fallback_strategy from getting overwritten
ryanaoleary Jan 8, 2026
4a771c1
Fix explicit null bundles in fallback causes crash
ryanaoleary Jan 8, 2026
5fc10e7
Add validation test for validate_bundle_label_selector
ryanaoleary Jan 8, 2026
0b294cc
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Jan 8, 2026
cb78cc2
Update python/ray/serve/_private/deployment_scheduler.py
ryanaoleary Jan 8, 2026
a55af8e
Update python/ray/serve/_private/deployment_scheduler.py
ryanaoleary Jan 8, 2026
5e8f85e
Update python/ray/serve/_private/deployment_scheduler.py
ryanaoleary Jan 8, 2026
3128161
Update python/ray/serve/_private/deployment_scheduler.py
ryanaoleary Jan 8, 2026
51dd705
Update python/ray/serve/_private/deployment_scheduler.py
ryanaoleary Jan 8, 2026
45dd5aa
Update python/ray/serve/_private/deployment_scheduler.py
ryanaoleary Jan 8, 2026
c213408
Update python/ray/serve/_private/deployment_scheduler.py
ryanaoleary Jan 8, 2026
dc59988
Update python/ray/serve/_private/deployment_scheduler.py
ryanaoleary Jan 8, 2026
fc79d5a
Update python/ray/serve/_private/deployment_scheduler.py
ryanaoleary Jan 8, 2026
802bd5e
Fix naming from suggested comments
ryanaoleary Jan 9, 2026
cde9d08
Fix version truthiness check
ryanaoleary Jan 9, 2026
656986d
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Jan 9, 2026
010827f
Move filter by label selector logic to label_utils
ryanaoleary Jan 9, 2026
1e3a074
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Jan 9, 2026
89d4390
Fix key names and add NotImplementedError
ryanaoleary Jan 9, 2026
bc76515
Fix test key
ryanaoleary Jan 9, 2026
3c7c07f
Update python/ray/tests/test_label_utils.py
ryanaoleary Jan 13, 2026
206a89d
Trim whitespace around selector and fix invalid test case
ryanaoleary Jan 13, 2026
8cd0b0b
Add more test cases and re-structure under relevant classes
ryanaoleary Jan 14, 2026
21bd155
Enable single bundle_label_selector to apply to all bundles and add t…
ryanaoleary Jan 14, 2026
9f8d570
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Jan 14, 2026
56f8bdf
Fix validate_bundle_label_selector for single selector case
ryanaoleary Jan 15, 2026
a9f1954
Fix undescriptive normalize function name
ryanaoleary Jan 15, 2026
0a150b8
Update tests and add unhappy path coverage
ryanaoleary Jan 15, 2026
331e8cd
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Jan 15, 2026
d21d001
Fix test_schema.py for validate bundle_label_selector
ryanaoleary Jan 15, 2026
bc26060
Import match label selector logic from C++ instead
ryanaoleary Jan 16, 2026
523de76
Merge branch 'master' into add-label-selector-serve-option
ryanaoleary Jan 16, 2026
42 changes: 42 additions & 0 deletions python/ray/_private/label_utils.py
@@ -228,3 +228,45 @@ def validate_fallback_strategy(
return error_message

return None


+def match_label_selector_value(node_value: Optional[str], selector_value: str) -> bool:
+    """Evaluates if a node's label value matches a selector expression.
+
+    Supports:
+    - Equality: "value" matches if node_value == "value"
+    - Not Equal: "!value" matches if node_value != "value"
+    - In: "in(v1, v2)" matches if node_value in ["v1", "v2"]
+    - Not In: "!in(v1, v2)" matches if node_value not in ["v1", "v2"]
+    """
+    if not isinstance(selector_value, str):
+        return False
+
+    # !in operator
+    if selector_value.startswith("!in(") and selector_value.endswith(")"):
+        content = selector_value[4:-1]
+        values = [v.strip() for v in content.split(",")]
+        return node_value is None or node_value not in values
+
+    # in operator
+    if selector_value.startswith("in(") and selector_value.endswith(")"):
+        content = selector_value[3:-1]
+        values = [v.strip() for v in content.split(",")]
+        return node_value in values
+
+    # not equal operator
+    if selector_value.startswith("!"):
+        target_val = selector_value[1:]
+        return node_value != target_val
+
+    # equals operator
+    return node_value == selector_value
+
+
+def match_label_selector(node_labels: Dict[str, str], selector: Dict[str, str]) -> bool:
+    """Returns True if node_labels satisfy all selector constraints."""
+    for key, selector_value in selector.items():
+        node_value = node_labels.get(key)
+        if not match_label_selector_value(node_value, selector_value):
+            return False
+    return True
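The four selector operators in this hunk are easiest to check against concrete inputs. The block below is a standalone sketch that restates the same matching rules outside Ray. The names `match_value` and `matches` and the sample labels are illustrative and are not part of the PR.

```python
from typing import Dict, Optional


def match_value(node_value: Optional[str], selector_value: str) -> bool:
    """Standalone restatement of the matching rules from the diff (illustrative)."""
    if not isinstance(selector_value, str):
        return False
    # "!in(a, b)": true when the label is missing or not one of the listed values.
    if selector_value.startswith("!in(") and selector_value.endswith(")"):
        values = [v.strip() for v in selector_value[4:-1].split(",")]
        return node_value not in values
    # "in(a, b)": true only when the label is present and listed.
    if selector_value.startswith("in(") and selector_value.endswith(")"):
        values = [v.strip() for v in selector_value[3:-1].split(",")]
        return node_value in values
    # "!value": true when the label differs from the value or is missing.
    if selector_value.startswith("!"):
        return node_value != selector_value[1:]
    # Plain "value": exact equality.
    return node_value == selector_value


def matches(node_labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """A node matches only if every key in the selector is satisfied."""
    return all(match_value(node_labels.get(k), v) for k, v in selector.items())


node = {"accelerator": "A100", "zone": "us-west1-a"}  # hypothetical node labels
results = {
    "equal": matches(node, {"accelerator": "A100"}),
    "not_equal": matches(node, {"accelerator": "!T4"}),
    "in": matches(node, {"zone": "in(us-west1-a, us-west1-b)"}),
    "not_in": matches(node, {"zone": "!in(us-east1-a)"}),
    "missing_key_equal": matches(node, {"market": "spot"}),
    "missing_key_not_equal": matches(node, {"market": "!spot"}),
}
```

One consequence worth noticing: a node that lacks a label entirely still satisfies the negative operators (`!value` and `!in(...)`), because a missing label is read as `None`.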
10 changes: 10 additions & 0 deletions python/ray/serve/_private/application_state.py
@@ -1707,6 +1707,14 @@ def override_deployment_info(
override_max_replicas_per_node = options.pop(
"max_replicas_per_node", replica_config.max_replicas_per_node
)
override_bundle_label_selector = options.pop(
"placement_group_bundle_label_selector",
replica_config.placement_group_bundle_label_selector,
)
override_fallback_strategy = options.pop(
"placement_group_fallback_strategy",
replica_config.placement_group_fallback_strategy,
)

# Record telemetry for container runtime env feature at deployment level
if override_actor_options.get("runtime_env") and (
@@ -1725,6 +1733,8 @@ def override_deployment_info(
placement_group_bundles=override_placement_group_bundles,
placement_group_strategy=override_placement_group_strategy,
max_replicas_per_node=override_max_replicas_per_node,
placement_group_bundle_label_selector=override_bundle_label_selector,
placement_group_fallback_strategy=override_fallback_strategy,
)
override_options["replica_config"] = replica_config

4 changes: 4 additions & 0 deletions python/ray/serve/_private/cluster_node_info_cache.py
@@ -89,6 +89,10 @@ def get_available_resources_per_node(self) -> Dict[str, Union[float, Dict]]:

return self._cached_available_resources_per_node

def get_node_labels(self, node_id: str) -> Dict[str, str]:
"""Get the labels for a specific node from the cache."""
return self._cached_node_labels.get(node_id, {})


class DefaultClusterNodeInfoCache(ClusterNodeInfoCache):
def __init__(self, gcs_client: GcsClient):
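The new accessor returns an empty dictionary for a node it has no labels for, so callers can chain `.get(...)` without a `None` check. A small sketch of that behavior is below, assuming a hypothetical in-memory stand-in (`FakeNodeInfoCache`) rather than the real cache class and a hypothetical helper `nodes_with_label`.

```python
from typing import Dict, List


class FakeNodeInfoCache:
    """Hypothetical stand-in for the cache; only the label lookup is modeled."""

    def __init__(self, labels_by_node: Dict[str, Dict[str, str]]):
        self._cached_node_labels = labels_by_node

    def get_node_labels(self, node_id: str) -> Dict[str, str]:
        # Unknown nodes yield {} rather than raising KeyError.
        return self._cached_node_labels.get(node_id, {})


def nodes_with_label(
    cache: FakeNodeInfoCache, node_ids: List[str], key: str, value: str
) -> List[str]:
    """Keep the node IDs whose cached labels contain key == value."""
    return [n for n in node_ids if cache.get_node_labels(n).get(key) == value]


cache = FakeNodeInfoCache({"node-a": {"zone": "us-west1-a"}, "node-b": {}})
selected = nodes_with_label(cache, ["node-a", "node-b", "node-c"], "zone", "us-west1-a")
```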
2 changes: 2 additions & 0 deletions python/ray/serve/_private/common.py
@@ -805,6 +805,8 @@ class CreatePlacementGroupRequest:
target_node_id: str
name: str
runtime_env: Optional[str] = None
bundle_label_selector: Optional[List[Dict[str, str]]] = None
fallback_strategy: Optional[List[Dict[str, Any]]] = None


# This error is used to raise when a by-value DeploymentResponse is converted to an
142 changes: 115 additions & 27 deletions python/ray/serve/_private/config.py
@@ -480,6 +480,8 @@ def __init__(
ray_actor_options: Dict,
placement_group_bundles: Optional[List[Dict[str, float]]] = None,
placement_group_strategy: Optional[str] = None,
placement_group_bundle_label_selector: Optional[List[Dict[str, str]]] = None,
placement_group_fallback_strategy: Optional[List[Dict[str, Any]]] = None,
max_replicas_per_node: Optional[int] = None,
needs_pickle: bool = True,
):
@@ -505,6 +507,10 @@ def __init__(

self.placement_group_bundles = placement_group_bundles
self.placement_group_strategy = placement_group_strategy
self.placement_group_bundle_label_selector = (
placement_group_bundle_label_selector
)
self.placement_group_fallback_strategy = placement_group_fallback_strategy

self.max_replicas_per_node = max_replicas_per_node

@@ -535,12 +541,18 @@ def update(
ray_actor_options: dict,
placement_group_bundles: Optional[List[Dict[str, float]]] = None,
placement_group_strategy: Optional[str] = None,
placement_group_bundle_label_selector: Optional[List[Dict[str, str]]] = None,
placement_group_fallback_strategy: Optional[List[Dict[str, Any]]] = None,
max_replicas_per_node: Optional[int] = None,
):
self.ray_actor_options = ray_actor_options

self.placement_group_bundles = placement_group_bundles
self.placement_group_strategy = placement_group_strategy
self.placement_group_bundle_label_selector = (
placement_group_bundle_label_selector
)
self.placement_group_fallback_strategy = placement_group_fallback_strategy

self.max_replicas_per_node = max_replicas_per_node

@@ -557,6 +569,8 @@ def create(
ray_actor_options: Optional[Dict] = None,
placement_group_bundles: Optional[List[Dict[str, float]]] = None,
placement_group_strategy: Optional[str] = None,
placement_group_bundle_label_selector: Optional[List[Dict[str, str]]] = None,
placement_group_fallback_strategy: Optional[List[Dict[str, Any]]] = None,
max_replicas_per_node: Optional[int] = None,
deployment_def_name: Optional[str] = None,
):
@@ -597,17 +611,23 @@ def create(
deployment_def_name = deployment_def.__name__

         config = cls(
-            deployment_def_name,
-            pickle_dumps(
+            deployment_def_name=deployment_def_name,
+            serialized_deployment_def=pickle_dumps(
                 deployment_def,
                 f"Could not serialize the deployment {repr(deployment_def)}",
             ),
-            pickle_dumps(init_args, "Could not serialize the deployment init args"),
-            pickle_dumps(init_kwargs, "Could not serialize the deployment init kwargs"),
-            ray_actor_options,
-            placement_group_bundles,
-            placement_group_strategy,
-            max_replicas_per_node,
+            serialized_init_args=pickle_dumps(
+                init_args, "Could not serialize the deployment init args"
+            ),
+            serialized_init_kwargs=pickle_dumps(
+                init_kwargs, "Could not serialize the deployment init kwargs"
+            ),
+            ray_actor_options=ray_actor_options,
+            placement_group_bundles=placement_group_bundles,
+            placement_group_strategy=placement_group_strategy,
+            placement_group_bundle_label_selector=placement_group_bundle_label_selector,
+            placement_group_fallback_strategy=placement_group_fallback_strategy,
+            max_replicas_per_node=max_replicas_per_node,
         )

config._deployment_def = deployment_def
@@ -633,6 +653,8 @@ def _validate_ray_actor_options(self):
"resources",
# Other options
"runtime_env",
"label_selector",
"fallback_strategy",
}

for option in self.ray_actor_options:
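This hunk extends the allow-list that `ray_actor_options` keys are checked against, so `label_selector` and `fallback_strategy` are no longer rejected. The sketch below shows the general shape of such a check. The set is an illustrative subset: only `resources`, `runtime_env`, `label_selector`, and `fallback_strategy` appear in the diff, and the remaining names and the helper `disallowed_options` are assumptions.

```python
from typing import Any, Dict, List

# Illustrative subset only. The diff shows "resources", "runtime_env",
# "label_selector", and "fallback_strategy"; the other names are assumed.
ALLOWED_ACTOR_OPTIONS = {
    "num_cpus",
    "num_gpus",
    "memory",
    "resources",
    "runtime_env",
    "label_selector",
    "fallback_strategy",
}


def disallowed_options(ray_actor_options: Dict[str, Any]) -> List[str]:
    """Return the option names that are not in the allow-list, sorted."""
    return sorted(set(ray_actor_options) - ALLOWED_ACTOR_OPTIONS)


accepted = disallowed_options(
    {"num_cpus": 1, "label_selector": {"accelerator": "in(A100, H100)"}}
)
rejected = disallowed_options({"num_cpus": 1, "max_restarts": 3})
```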
@@ -674,11 +696,37 @@ def _validate_placement_group_options(self) -> None:
"`placement_group_bundles` must also be provided."
)

+        if self.placement_group_fallback_strategy is not None:
+            if self.placement_group_bundles is None:
+                raise ValueError(
+                    "If `placement_group_fallback_strategy` is provided, "
+                    "`placement_group_bundles` must also be provided."
+                )
+            if not isinstance(self.placement_group_fallback_strategy, list):
+                raise TypeError(
+                    "placement_group_fallback_strategy must be a list of dictionaries. "
+                    f"Got: {type(self.placement_group_fallback_strategy)}."
+                )
+            for i, strategy in enumerate(self.placement_group_fallback_strategy):
+                if not isinstance(strategy, dict):
+                    raise TypeError(
+                        f"placement_group_fallback_strategy entry at index {i} must be a dictionary. "
+                        f"Got: {type(strategy)}."
+                    )
+
+        if self.placement_group_bundle_label_selector is not None:
+            if self.placement_group_bundles is None:
+                raise ValueError(
+                    "If `placement_group_bundle_label_selector` is provided, "
+                    "`placement_group_bundles` must also be provided."
+                )
+
         if self.placement_group_bundles is not None:
             validate_placement_group(
                 bundles=self.placement_group_bundles,
                 strategy=self.placement_group_strategy or "PACK",
                 lifetime="detached",
+                bundle_label_selector=self.placement_group_bundle_label_selector,
             )

resource_error_prefix = (
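The validation added in this hunk enforces two rules: both new options require `placement_group_bundles`, and the fallback strategy must be a list of dictionaries. The following sketch reproduces those rules as a free function so the error cases can be exercised without Ray. The function name, the shortened messages, and the sample fallback entry shape are assumptions, and the final `validate_placement_group` call from the diff is not modeled.

```python
from typing import Any, Dict, List, Optional


def check_placement_group_options(
    bundles: Optional[List[Dict[str, float]]],
    bundle_label_selector: Optional[List[Dict[str, str]]] = None,
    fallback_strategy: Optional[Any] = None,
) -> None:
    """Raise ValueError or TypeError using the same rules as the diff."""
    if fallback_strategy is not None:
        if bundles is None:
            raise ValueError("fallback strategy requires placement_group_bundles")
        if not isinstance(fallback_strategy, list):
            raise TypeError(f"expected a list, got {type(fallback_strategy)}")
        for i, entry in enumerate(fallback_strategy):
            if not isinstance(entry, dict):
                raise TypeError(f"entry {i} must be a dict, got {type(entry)}")
    if bundle_label_selector is not None and bundles is None:
        raise ValueError("bundle label selector requires placement_group_bundles")


def error_type(**kwargs: Any) -> Optional[str]:
    """Return the name of the raised exception type, or None if valid."""
    try:
        check_placement_group_options(**kwargs)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


outcomes = [
    error_type(bundles=None, bundle_label_selector=[{"zone": "a"}]),
    error_type(bundles=[{"CPU": 1}], fallback_strategy={"bundles": []}),
    error_type(bundles=[{"CPU": 1}], fallback_strategy=["not-a-dict"]),
    # Hypothetical fallback entry shape, used only to pass the type checks.
    error_type(bundles=[{"CPU": 1}], fallback_strategy=[{"bundles": [{"CPU": 1}]}]),
]
```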
@@ -772,19 +820,37 @@ def init_kwargs(self) -> Optional[Tuple[Any]]:
@classmethod
def from_proto(cls, proto: ReplicaConfigProto, needs_pickle: bool = True):
         return ReplicaConfig(
-            proto.deployment_def_name,
-            proto.deployment_def,
-            proto.init_args if proto.init_args != b"" else None,
-            proto.init_kwargs if proto.init_kwargs != b"" else None,
-            json.loads(proto.ray_actor_options),
-            json.loads(proto.placement_group_bundles)
-            if proto.placement_group_bundles
-            else None,
-            proto.placement_group_strategy
-            if proto.placement_group_strategy != ""
-            else None,
-            proto.max_replicas_per_node if proto.max_replicas_per_node else None,
-            needs_pickle,
+            deployment_def_name=proto.deployment_def_name,
+            serialized_deployment_def=proto.deployment_def,
+            serialized_init_args=(proto.init_args if proto.init_args != b"" else None),
+            serialized_init_kwargs=(
+                proto.init_kwargs if proto.init_kwargs != b"" else None
+            ),
+            ray_actor_options=json.loads(proto.ray_actor_options),
+            placement_group_bundles=(
+                json.loads(proto.placement_group_bundles)
+                if proto.placement_group_bundles
+                else None
+            ),
+            placement_group_strategy=(
+                proto.placement_group_strategy
+                if proto.placement_group_strategy != ""
+                else None
+            ),
+            placement_group_bundle_label_selector=(
+                json.loads(proto.placement_group_bundle_label_selector)
+                if proto.placement_group_bundle_label_selector
+                else None
+            ),
+            placement_group_fallback_strategy=(
+                json.loads(proto.placement_group_fallback_strategy)
+                if proto.placement_group_fallback_strategy
+                else None
+            ),
+            max_replicas_per_node=(
+                proto.max_replicas_per_node if proto.max_replicas_per_node else None
+            ),
+            needs_pickle=needs_pickle,
         )

@classmethod
@@ -793,19 +859,39 @@ def from_proto_bytes(cls, proto_bytes: bytes, needs_pickle: bool = True):
return cls.from_proto(proto, needs_pickle)

     def to_proto(self):
+        placement_group_bundles = (
+            json.dumps(self.placement_group_bundles)
+            if self.placement_group_bundles is not None
+            else ""
+        )
+
+        bundle_label_selector = (
+            json.dumps(self.placement_group_bundle_label_selector)
+            if self.placement_group_bundle_label_selector is not None
+            else ""
+        )
+
+        fallback_strategy = (
+            json.dumps(self.placement_group_fallback_strategy)
+            if self.placement_group_fallback_strategy is not None
+            else ""
+        )
+
+        max_replicas_per_node = (
+            self.max_replicas_per_node if self.max_replicas_per_node is not None else 0
+        )
+
         return ReplicaConfigProto(
             deployment_def_name=self.deployment_def_name,
             deployment_def=self.serialized_deployment_def,
             init_args=self.serialized_init_args,
             init_kwargs=self.serialized_init_kwargs,
             ray_actor_options=json.dumps(self.ray_actor_options),
-            placement_group_bundles=json.dumps(self.placement_group_bundles)
-            if self.placement_group_bundles is not None
-            else "",
+            placement_group_bundles=placement_group_bundles,
             placement_group_strategy=self.placement_group_strategy,
-            max_replicas_per_node=self.max_replicas_per_node
-            if self.max_replicas_per_node is not None
-            else 0,
+            placement_group_bundle_label_selector=bundle_label_selector,
+            placement_group_fallback_strategy=fallback_strategy,
+            max_replicas_per_node=max_replicas_per_node,
         )

def to_proto_bytes(self):
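Both `to_proto` and `from_proto` use the same convention for the new fields: the value is stored as a JSON string, and an empty string stands in for `None`. The round trip can be sketched with two small helpers. The names `encode_optional` and `decode_optional` are hypothetical, and the real code inlines these expressions.

```python
import json
from typing import Any, Optional


def encode_optional(value: Optional[Any]) -> str:
    """Serialize to JSON, using "" as the sentinel for None."""
    return json.dumps(value) if value is not None else ""


def decode_optional(text: str) -> Optional[Any]:
    """Inverse of encode_optional: an empty string decodes to None."""
    return json.loads(text) if text else None


selector = [{"accelerator": "A100"}, {"zone": "in(us-west1-a, us-west1-b)"}]
round_tripped = decode_optional(encode_optional(selector))
none_round_tripped = decode_optional(encode_optional(None))
empty_list_round_tripped = decode_optional(encode_optional([]))
```

The sentinel keeps `None` and an empty list distinct: `[]` serializes to the non-empty string `"[]"`, so it decodes back to `[]` rather than `None`.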
@@ -818,6 +904,8 @@ def to_dict(self):
"ray_actor_options": self.ray_actor_options,
"placement_group_bundles": self.placement_group_bundles,
"placement_group_strategy": self.placement_group_strategy,
"placement_group_bundle_label_selector": self.placement_group_bundle_label_selector,
"placement_group_fallback_strategy": self.placement_group_fallback_strategy,
"max_replicas_per_node": self.max_replicas_per_node,
}

1 change: 1 addition & 0 deletions python/ray/serve/_private/default_impl.py
@@ -63,6 +63,7 @@ def _default_create_placement_group(
_soft_target_node_id=request.target_node_id,
name=request.name,
lifetime="detached",
bundle_label_selector=request.bundle_label_selector,
)
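In the lines visible here, the default implementation forwards `bundle_label_selector` from the request to the placement group call, while `fallback_strategy` exists on the request but does not appear among the forwarded arguments. The sketch below models only that forwarding step by building the keyword arguments. `PlacementGroupRequest` is a trimmed, hypothetical stand-in for `CreatePlacementGroupRequest`, its `bundles` and `strategy` fields are assumed rather than shown in the diff, and no Ray call is made.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PlacementGroupRequest:
    """Hypothetical, trimmed stand-in for CreatePlacementGroupRequest."""

    bundles: List[Dict[str, float]]  # assumed field, not shown in the diff
    strategy: str  # assumed field, not shown in the diff
    target_node_id: str
    name: str
    bundle_label_selector: Optional[List[Dict[str, str]]] = None
    fallback_strategy: Optional[List[Dict[str, Any]]] = None


def build_placement_group_kwargs(request: PlacementGroupRequest) -> Dict[str, Any]:
    """Collect the arguments the visible hunk passes on, selector included."""
    return {
        "bundles": request.bundles,
        "strategy": request.strategy,
        "_soft_target_node_id": request.target_node_id,
        "name": request.name,
        "lifetime": "detached",
        "bundle_label_selector": request.bundle_label_selector,
    }


request = PlacementGroupRequest(
    bundles=[{"CPU": 1}, {"GPU": 1}],
    strategy="PACK",
    target_node_id="node-a",
    name="replica-pg",
    bundle_label_selector=[{"zone": "us-west1-a"}, {"accelerator": "A100"}],
)
kwargs = build_placement_group_kwargs(request)
```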

