[DOC-127] MVP for OSS Ray labels #54254
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,175 @@ | ||
--- | ||
description: "Learn about using labels to control how Ray schedules tasks, actors, and placement groups to nodes in your Kubernetes cluster." | ||
--- | ||
|
||
(labels)= | ||
# Use labels to control scheduling | ||
|
||
In Ray version 2.49.0 and above, you can use labels to control scheduling for KubeRay. Labels are a beta feature. | ||
|
||
This page provides a conceptual overview and usage instructions for labels. Labels are human-readable key-value pairs that let you control how Ray schedules tasks, actors, and placement group bundles to specific nodes. | ||
|
||
|
||
```{note} | ||
Ray labels share the same syntax and formatting restrictions as Kubernetes labels, but are conceptually distinct. See the [Kubernetes docs on labels and selectors](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set). | ||
``` | ||
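Because Ray labels follow the Kubernetes label syntax, it can help to check keys and values before you pass them to `ray start`. The following is a minimal sketch that encodes the Kubernetes rules from the linked page (63-character name and value limit, alphanumeric first and last character, optional DNS-subdomain prefix of up to 253 characters). The helper names `is_valid_label_key` and `is_valid_label_value` are hypothetical and aren't part of Ray; Ray's own validation might differ in details.

```python
import re

# Name segment or value: alphanumeric at both ends; '-', '_', '.' allowed inside.
_NAME_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9_.-]*[A-Za-z0-9])?")
# Optional key prefix: a lowercase DNS subdomain such as "ray.io".
_DNS_LABEL = r"[a-z0-9]([a-z0-9-]*[a-z0-9])?"
_PREFIX_RE = re.compile(rf"{_DNS_LABEL}(\.{_DNS_LABEL})*")


def is_valid_label_value(value: str) -> bool:
    """Return True if value follows the Kubernetes label value rules."""
    if value == "":
        return True  # Empty values are allowed.
    return len(value) <= 63 and _NAME_RE.fullmatch(value) is not None


def is_valid_label_key(key: str) -> bool:
    """Return True if key is "name" or "prefix/name" in Kubernetes syntax."""
    prefix, slash, name = key.rpartition("/")
    if slash:
        # A slash is present, so the prefix must be a non-empty DNS subdomain.
        if not prefix or len(prefix) > 253 or _PREFIX_RE.fullmatch(prefix) is None:
            return False
    return 0 < len(name) <= 63 and _NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_label_key("ray.io/accelerator-type")` and `is_valid_label_value("")` both return `True`, while a value that starts with a dash fails the check.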
|
||
|
||
## How do labels work? | ||
|
||
The following is a high-level overview of how you use labels to control scheduling: | ||
|
||
- Ray sets default labels that describe the underlying compute. See [](defaults). | ||
- You define custom labels as key-value pairs. See [](custom). | ||
- You specify *label selectors* in your Ray code to define label requirements. You can specify these requirements at the task, actor, or placement group bundle level. See [](label-selectors). | ||
- Ray schedules tasks, actors, or placement group bundles based on the specified label selectors. | ||
- In Ray 2.50.0 and above, if you're using a dynamic cluster with autoscaler V2 enabled, the cluster scales up to add new nodes from a designated worker group to fulfill label requirements. | ||
|
||
(defaults)= | ||
## Default node labels | ||
```{note} | ||
Ray reserves all labels under the `ray.io` namespace. | ||
``` | ||
During cluster initialization or as autoscaling events add nodes to your cluster, Ray assigns the following default labels to each node: | ||
|
||
| Label | Description | | ||
| --- | --- | | ||
| `ray.io/node-id` | A unique ID generated for the node. | | ||
| `ray.io/accelerator-type` | The accelerator type of the node, for example `L4`. CPU-only machines have an empty string. See {ref}`accelerator types <accelerator-types>` for a mapping of values. | | ||
|
||
```{note} | ||
You can override default values using `ray start` parameters. | ||
``` | ||
|
||
The following are examples of default labels: | ||
|
||
```python | ||
"ray.io/accelerator-type": "" # Default label indicating the machine is CPU-only. | ||
``` | ||
|
||
(custom)= | ||
## Define custom labels | ||
|
||
You can add custom labels to your nodes using the `--labels` or `--labels-file` parameter when running `ray start`. | ||
|
||
```bash | ||
# Example 1: Start a head node with cpu-family and test-label labels | ||
ray start --head --labels="cpu-family=amd,test-label=test-value" | ||
|
||
# Example 2: Start a head node with labels from a label file | ||
ray start --head --labels-file='./test-labels-file' | ||
|
||
# The file content can be the following (should be a valid YAML file): | ||
# "test-label": "test-value" | ||
# "test-label-2": "test-value-2" | ||
``` | ||
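The `--labels` value is a comma-separated list of `key=value` pairs. The following sketch shows how such a string maps to a dict of labels. The function `parse_labels_string` is a hypothetical helper for illustration, not Ray's parser, and it assumes that keys and values contain no commas or equals signs.

```python
from typing import Dict


def parse_labels_string(labels: str) -> Dict[str, str]:
    """Turn "k1=v1,k2=v2" into {"k1": "v1", "k2": "v2"}."""
    result: Dict[str, str] = {}
    if not labels:
        return result
    for pair in labels.split(","):
        # Split on the first "=" only; everything after it is the value.
        key, sep, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key] = value
    return result


labels = parse_labels_string("cpu-family=amd,test-label=test-value")
```

Here `labels` holds the two labels from Example 1: `cpu-family` mapped to `amd` and `test-label` mapped to `test-value`.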
|
||
```{note} | ||
You can't set labels using `ray.init()`. Local Ray clusters don't support labels. | ||
``` | ||
|
||
(label-selectors)= | ||
## Specify label selectors | ||
|
||
You add label selector logic to your Ray code when defining Ray tasks, actors, or placement group bundles. Label selectors define the label requirements for matching your Ray code to a node in your Ray cluster. | ||
|
||
Label selectors specify the following: | ||
|
||
- The key of the label. | ||
- Operator logic for matching. | ||
- The value or values to match on. | ||
|
||
The following table shows the basic syntax for label selector operator logic: | ||
|
||
| Operator | Description | Example syntax | | ||
| --- | --- | --- | | ||
| Equals | Label matches exactly one value. | `{"key": "value"}` | | ||
| Not equal | Label matches anything but one value. | `{"key": "!value"}` | | ||
| In | Label matches one of the provided values. | `{"key": "in(val1,val2)"}` | | ||
| Not in | Label matches none of the provided values. | `{"key": "!in(val1,val2)"}` | | ||
|
||
You can specify one or more label selectors as a dict. When you specify multiple label selectors, the candidate node must meet all requirements. The following example configuration uses a custom label to require an `m5.16xlarge` EC2 instance and a default label to require the node ID to be 123: | ||
|
||
```python | ||
label_selector={"instance_type": "m5.16xlarge", "ray.io/node-id": "123"} | ||
``` | ||
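To make the operator semantics concrete, the following is a minimal pure-Python sketch of how a selector dict could be evaluated against a node's labels. It isn't Ray's scheduler code: `condition_matches` and `node_matches` are hypothetical names, the `region` label is an illustrative custom label, and the handling of a missing label (it satisfies the negated operators but not the positive ones) is an assumption borrowed from Kubernetes selector behavior.

```python
from typing import Mapping, Optional


def condition_matches(node_value: Optional[str], condition: str) -> bool:
    """Evaluate one selector value such as "v", "!v", "in(a,b)", or "!in(a,b)"."""
    negate = condition.startswith("!")
    body = condition[1:] if negate else condition
    if body.startswith("in(") and body.endswith(")"):
        allowed = {v.strip() for v in body[3:-1].split(",")}
        hit = node_value in allowed
    else:
        hit = node_value == body
    # Assumption: a node without the label (node_value is None) never "hits",
    # so it fails Equals/In and passes Not equal/Not in.
    return not hit if negate else hit


def node_matches(node_labels: Mapping[str, str], selector: Mapping[str, str]) -> bool:
    """A node is a candidate only if it satisfies every selector entry."""
    return all(
        condition_matches(node_labels.get(key), condition)
        for key, condition in selector.items()
    )


node = {"instance_type": "m5.16xlarge", "ray.io/node-id": "123", "region": "us-west"}
matches_both = node_matches(node, {"instance_type": "m5.16xlarge", "ray.io/node-id": "123"})
matches_in = node_matches(node, {"region": "in(us-west,us-east)"})
```

Both `matches_both` and `matches_in` evaluate to `True` for this node. Changing the required node ID to any other value makes the first selector fail, because every entry must match.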
|
||
## Specify label requirements for tasks and actors | ||
|
||
Use the following syntax to add label selectors to tasks and actors: | ||
|
||
```python | ||
import ray | ||
|
||
# Specify label_selector for a task in the @ray.remote decorator. | ||
@ray.remote(label_selector={"label_name": "label_value"}) | ||
def f(): | ||
    pass | ||
|
||
# Specify label_selector for an actor in the @ray.remote decorator. | ||
@ray.remote(label_selector={"ray.io/accelerator-type": "nvidia-h100"}) | ||
class Actor: | ||
    pass | ||
|
||
# Specify label_selector for a task using options(). | ||
@ray.remote | ||
def test_task_label_in_options(): | ||
    pass | ||
|
||
test_task_label_in_options.options(label_selector={"test-label-key": "test-label-value"}).remote() | ||
|
||
# Specify label_selector for an actor using options(). | ||
@ray.remote | ||
class Actor: | ||
    pass | ||
|
||
actor_1 = Actor.options( | ||
    label_selector={"ray.io/accelerator-type": "nvidia-h100"}, | ||
).remote() | ||
``` | ||
|
||
## Specify label requirements for placement group bundles | ||
|
||
Use the `bundle_label_selector` option to add label selectors to placement group bundles. See the following examples: | ||
|
||
```python | ||
import ray | ||
|
||
# All bundles require the same labels: | ||
ray.util.placement_group( | ||
    bundles=[{"GPU": 1}, {"GPU": 1}], | ||
    bundle_label_selector=[{"ray.io/accelerator-type": "H100"}] * 2, | ||
) | ||
|
||
# Bundles require different labels: | ||
ray.util.placement_group( | ||
    bundles=[{"CPU": 1}] + [{"GPU": 1}] * 2, | ||
    bundle_label_selector=[{"ray.io/market-type": "spot"}] + [{"ray.io/accelerator-type": "H100"}] * 2, | ||
) | ||
``` | ||
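The examples above pair one selector with each bundle, so the two lists need to stay the same length. The following sketch builds both lists from a single specification so they can't drift apart. `expand_bundle_specs` is a hypothetical helper, not a Ray API, and the one-selector-per-bundle pairing is an assumption based on the examples. It also copies each dict, because repeating a list with `*` reuses the same dict object for every entry.

```python
from typing import Dict, List, Sequence, Tuple

# (resources for one bundle, label selector for that bundle, number of copies)
BundleSpec = Tuple[Dict[str, float], Dict[str, str], int]


def expand_bundle_specs(
    specs: Sequence[BundleSpec],
) -> Tuple[List[Dict[str, float]], List[Dict[str, str]]]:
    """Expand specs into parallel bundles and bundle_label_selector lists."""
    bundles: List[Dict[str, float]] = []
    selectors: List[Dict[str, str]] = []
    for resources, selector, count in specs:
        for _ in range(count):
            # Copy so that each bundle and selector is an independent dict.
            bundles.append(dict(resources))
            selectors.append(dict(selector))
    return bundles, selectors


bundles, selectors = expand_bundle_specs(
    [
        ({"CPU": 1}, {"ray.io/market-type": "spot"}, 1),
        ({"GPU": 1}, {"ray.io/accelerator-type": "H100"}, 2),
    ]
)
```

You could then pass the results as `ray.util.placement_group(bundles=bundles, bundle_label_selector=selectors)`.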
## Using labels with autoscaler | ||
|
||
Autoscaler V2 supports label-based scheduling. For the autoscaler to add nodes that fulfill label requirements, create a worker group for each combination of label requirements and specify the corresponding labels in the `rayStartParams` field in the Ray cluster configuration. For example: | ||
|
||
```yaml | ||
rayStartParams: { | ||
  labels: "region=me-central1,ray.io/accelerator-type=nvidia-h100" | ||
} | ||
``` | ||
|
||
## Monitor nodes using labels | ||
|
||
The Ray dashboard automatically shows the following information: | ||
- Labels for each node. See {py:attr}`ray.util.state.common.NodeState.labels`. | ||
- Label selectors set for each task, actor, or placement group bundle. See {py:attr}`ray.util.state.common.TaskState.label_selector` and {py:attr}`ray.util.state.common.ActorState.label_selector`. | ||
|
||
Within a task, you can programmatically obtain the labels of the node the task runs on from the `RuntimeContext` API using `ray.get_runtime_context().get_node_labels()`. This returns a Python dict. See the following example: | ||
|
||
```python | ||
@ray.remote | ||
def test_task_label(): | ||
    node_labels = ray.get_runtime_context().get_node_labels() | ||
    print(f"[test_task_label] node labels: {node_labels}") | ||
|
||
""" | ||
Example output: | ||
(test_task_label pid=68487) [test_task_label] node labels: {'test-label-1': 'test-value-1', 'test-label-key': 'test-label-value', 'test-label-2': 'test-value-2'} | ||
""" | ||
``` | ||
You can also access node labels and label selectors using the state API and state CLI. | ||