
Conversation

ryanaoleary
Collaborator

Add default Ray node label info to Ray Pod environment

Why are these changes needed?

This PR adds market-type information for different cloud providers to the Ray pod environment based on the provided nodeSelector value. This PR also adds environment variables to pass region and zone information using downward API (kubernetes/kubernetes#127092). These environment variables will be used in Ray core to set default Ray node labels.
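For illustration, the change is roughly shaped like the Go sketch below. This is a simplified sketch, not the exact code in this PR: it only shows the market-type env var, hard-codes the GKE spot selector, assumes "spot"/"on-demand" as the values, and uses container index 0 in place of the operator's Ray container index.

package pod

import corev1 "k8s.io/api/core/v1"

// addDefaultRayNodeLabels appends env vars that Ray core can read to derive
// default Ray node labels such as ray.io/market-type.
func addDefaultRayNodeLabels(pod *corev1.Pod) {
	// Sketch: infer the market type from the GKE spot node selector only;
	// the real helper would cover other providers as well.
	marketType := "on-demand" // assumed default value
	if pod.Spec.NodeSelector["cloud.google.com/gke-spot"] == "true" {
		marketType = "spot"
	}
	rayContainer := &pod.Spec.Containers[0] // simplified Ray container lookup
	rayContainer.Env = append(rayContainer.Env,
		// used to set the ray.io/market-type node label
		corev1.EnvVar{Name: "RAY_NODE_MARKET_TYPE", Value: marketType},
	)
	// Region and zone env vars are populated via the Kubernetes downward API
	// (kubernetes/kubernetes#127092); the exact field paths are omitted here.
}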

I'll add a comment below with my manual test results with propagating topology.k8s.io/region and topology.k8s.io/zone on a GKE v1.33 alpha cluster.

Related issue number

ray-project/ray#51564

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@ryanaoleary ryanaoleary self-assigned this May 27, 2025
@ryanaoleary
Collaborator Author

cc: @MengjinYan

}

// add downward API environment variables for Ray default node labels
addDefaultRayNodeLabels(&pod)
Member

Should we guard this logic with a Ray version check?

Collaborator Author

This code doesn't rely on any API change in Ray; it just sets some env vars, and the actual node labels get set in Ray core using those vars. But I can add a version guard here (I guess for whatever version ray-project/ray#53360 is included in) if we don't want to set unused vars for users on older versions of Ray.

Collaborator Author

From offline discussion with @MengjinYan, we were leaning towards not including a version guard, since users are not required to specify the Ray version they're using in the CR spec.

@MengjinYan MengjinYan left a comment

LGTM. @kevin85421 Can you take a look from KubeRay's perspective?

pod.Spec.Containers[utils.RayContainerIndex].Env,
// used to set the ray.io/market-type node label
corev1.EnvVar{
Name: "RAY_NODE_MARKET_TYPE",
Member

Is the plan for Ray Core to check these env vars? Can you link the PR that includes this change?

Collaborator Author

Yeah that's correct - this is the related PR: ray-project/ray#53360

@ryanaoleary ryanaoleary requested a review from andrewsykim June 3, 2025 22:03
@ryanaoleary
Collaborator Author

LGTM. @kevin85421 Can you take a look from KubeRay's perspective?

@kevin85421 Bumping this to see if we can include it in 1.4

@kevin85421
Member

@ryanaoleary I chatted with @MengjinYan, and my understanding is that this doesn’t need to be included in v1.4.0. Could you sync with @MengjinYan and let me know if I’m mistaken? Thanks!

@ryanaoleary
Collaborator Author

@ryanaoleary I chatted with @MengjinYan, and my understanding is that this doesn't need to be included in v1.4.0. Could you sync with @MengjinYan and let me know if I'm mistaken? Thanks!

Synced offline with @MengjinYan, and yeah, there's no urgency to include this in v1.4.0; we can wait to include it in the next release. My thought was just that it'd be useful to have this functionality in the soonest stable release for testing, but I can just use the nightly image.

@kevin85421
Member

My thought was just that it'd be useful to have this functionality in the soonest stable release for testing, but I can just use the nightly image.

This makes sense. I’ll review it for now. We’ll make a best-effort attempt to include this PR, but there’s no guarantee.


// getRayMarketTypeFromNodeSelector is a helper function to determine the ray.io/market-type label
// based on user-provided Kubernetes nodeSelector values.
func getRayMarketTypeFromNodeSelector(pod *corev1.Pod) string {
Member

I guess some schedulers or webhooks may update the node selector after the Pod is created. We should take it into consideration.


Do you mean that the value of the default labels might change after the Ray node has started?

Member

might change after the Ray node has started?

No, I mean it may be changed after KubeRay constructs the Pod spec but before the Pod is created or scheduled.

Collaborator Author

Are there any webhooks that modify the value of cloud.google.com/gke-spot on a Pod? I'm having difficulty finding a reference. I think we should default to adding Ray node labels based on what the user specifies in their Pod spec.
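For reference, here is a rough sketch of what a nodeSelector-based market-type helper could look like. Only the cloud.google.com/gke-spot key comes from this thread; the other selector keys and the "spot"/"on-demand" return values are illustrative assumptions, not necessarily what this PR checks.

package pod

import corev1 "k8s.io/api/core/v1"

// spotNodeSelectors maps nodeSelector keys to the values that commonly indicate
// spot or preemptible capacity on different providers (assumed list).
var spotNodeSelectors = map[string]string{
	"cloud.google.com/gke-spot":             "true",
	"eks.amazonaws.com/capacityType":        "SPOT",
	"karpenter.sh/capacity-type":            "spot",
	"kubernetes.azure.com/scalesetpriority": "spot",
}

// getRayMarketTypeFromNodeSelector returns "spot" if the Pod's nodeSelector
// matches a known spot selector and "on-demand" otherwise.
func getRayMarketTypeFromNodeSelector(pod *corev1.Pod) string {
	for key, val := range spotNodeSelectors {
		if pod.Spec.NodeSelector[key] == val {
			return "spot"
		}
	}
	return "on-demand"
}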

ryanaoleary and others added 7 commits June 6, 2025 05:55
Signed-off-by: Ryan O'Leary <[email protected]>

Add default Ray node label info to Ray Pod environment

Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]>
@ryanaoleary ryanaoleary force-pushed the pass-worker-group-ray-label branch from daebac4 to 604c162 Compare June 6, 2025 05:56
@ryanaoleary ryanaoleary requested a review from andrewsykim June 6, 2025 06:46
@kevin85421
Member

cc @andrewsykim, do you have the bandwidth to review this PR? I will give it a final pass after you approve the PR.

@MengjinYan

@andrewsykim @kevin85421 Follow up on this PR.

cc: @ryanaoleary

@kevin85421
Member

If we want to automate features in the future, I suggest avoiding mechanisms that automatically pass multiple labels into Ray, as this could cause issues. For example:

  • In KubeRay v1.5, we pass labels A, B, and C to Ray.
  • In KubeRay v1.6, we also want to pass label D to Ray.
  • This could lead to unexpected issues for Ray/KubeRay users, making debugging difficult. Additionally, it would require the Ray Autoscaler to implement different logic for handling autoscaling with labels across various KubeRay versions. This would introduce complexity in maintaining compatibility between Ray and KubeRay.

Does the issue still exist?

@andrewsykim
Member

Hmmm.. I don't see why that issue would exist.

Let's wait for @ryanaoleary to update the PR based on our latest discussion, and hopefully reviewing the code will make it more obvious whether future compatibility issues will be a problem.

@kevin85421
Member

#3699 (comment)

Let me add more details for this one.

The Ray Autoscaler must identify the Ray labels associated with a worker group based on the RayCluster CR to scale pods correctly with the associated labels. I will use an example to explain the compatibility issue I am referring to:

  • In KubeRay v1.5, we pass labels A, B, and C to Ray.
    • To support autoscaling, Ray v2.N.0 checks whether the RayCluster CR specifies labels A, B, and C.
  • In KubeRay v1.6, we also pass label D to Ray.
    • To support autoscaling, Ray v2.M.0 checks whether the RayCluster CR specifies labels A, B, C, and D.

If KubeRay introduces changes to how Ray labels are populated, the Ray Autoscaler will require corresponding updates.

Would you mind helping me understand why you don't like asking users to specify rayStartParams directly? It is much safer and a better UX.

@andrewsykim
Member

Would you mind helping me understand why you don't like asking users to specify rayStartParams directly? It is much safer and a better UX.

We are still documenting how to use rayStartParams directly to update labels. This change is only simplifying behavior for well-known default labels in Ray. It's not mutually exclusive.

We discussed the autoscaling behavior as well. We don't intend to change autoscaler behavior for labels just yet, and this PR will not change anything for autoscaling. But Ryan has a doc for future changes. @ryanaoleary can you share the doc with Kai-Hsun as well please?

@Future-Outlier
Member

Would you mind helping me understand why you don't like asking users to specify rayStartParams directly? It is much safer and a better UX.

We are still documenting how to use rayStartParams directly to update labels. This change is only simplifying behavior for well-known default labels in Ray. It's not mutually exclusive.

We discussed the autoscaling behavior as well. We don't intend to change autoscaler behavior for labels just yet, and this PR will not change anything for autoscaling. But Ryan has a doc for future changes. @ryanaoleary can you share the doc with Kai-Hsun as well please?

Hi @ryanaoleary, can you also share the doc with @rueian and me?

@Future-Outlier
Member

Future-Outlier commented Sep 30, 2025


I agree with what @kevin85421 said.
In the future, if we add a new label in KubeRay 1.6.0, we’ll also have to update Ray’s code to read that environment variable, right?
That will greatly increase complexity.

@ryanaoleary
Collaborator Author

ryanaoleary commented Sep 30, 2025

can you also share the doc with @rueian and me?

@Future-Outlier Sorry for the delay, just shared.

@andrewsykim
Member

andrewsykim commented Sep 30, 2025

From discussion with @Future-Outlier and @rueian:

Using rayStartParams is not a foolproof solution for autoscaling, because labels from rayStartParams can be read from a file (--labels-file) or label values can be rendered from env vars (--labels=var1=$var1_value). In addition, Ray already supports env vars like RAY_NODE_ZONE and RAY_NODE_REGION that users can set, which the autoscaler will have to check.

I don't object to deferring this PR, but to be clear, asking users to set labels in rayStartParams will not solve the autoscaling gaps. We should view label-based autoscaling and default labelling as two separate problems to solve.

@kevin85421
Member

kevin85421 commented Sep 30, 2025

because labels from rayStartParams can be read from a file (--labels-file) or label values can be rendered from env vars (--labels=var1=$var1_value). In addition, Ray already supports env vars like RAY_NODE_ZONE and RAY_NODE_REGION that users can set

This means that --labels-file and env vars are incorrect APIs (especially --labels-file). We should make sure the Ray Autoscaler can easily determine which Pod will have which Ray labels from the RayCluster CR.

cc @edoakes @jjyao please revisit the APIs so that we can avoid adding too much complexity in Ray Autoscaler.

@edoakes
Contributor

edoakes commented Sep 30, 2025

cc @edoakes @jjyao please revisit the APIs so that we can avoid adding too much complexity in Ray Autoscaler.

This is the core reason why we need to make some form of change in the KubeRay CR and can't do this entirely transparently based on rayStartParams. The proposal above to auto-populate the builtin set of ray.io/ labels from the pod spec labels is the most basic form of it that doesn't require adding a new API/field to the CR. The autoscaler will just scan the pod spec for those whitelisted labels.

This won't solve the problem for custom/user-specified labels though. For that, we'd need to introduce a more explicit API such as a labels field in the head/worker group spec itself. The idea is to start with the minimal thing listed above without requiring a CR change, and take it from there.

The amount of complexity being added for any of the options discussed in this thread is very minor, and the feature will be alpha in both Ray and KubeRay before stabilizing, so there is no "one way door" and I see no harm in starting with the proposed option. If we want to jump directly to a top-level labels field (or comparable alternative), that's fine with me too. It has the benefit of being very explicit.
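To make the "scan the pod spec for whitelisted labels" idea concrete, here is a minimal sketch. The helper name and the exact whitelist/env-var mapping are assumptions for illustration; RAY_NODE_ZONE and RAY_NODE_REGION are mentioned elsewhere in this thread as env vars Ray already supports, and RAY_NODE_MARKET_TYPE appears in this PR's diff.

package pod

import corev1 "k8s.io/api/core/v1"

// defaultLabelEnvVars is an assumed mapping from well-known ray.io/ pod labels
// to the env vars that Ray core reads for default node labels.
var defaultLabelEnvVars = map[string]string{
	"ray.io/zone":        "RAY_NODE_ZONE",
	"ray.io/region":      "RAY_NODE_REGION",
	"ray.io/market-type": "RAY_NODE_MARKET_TYPE",
}

// populateDefaultLabelEnv copies whitelisted ray.io/ labels from the Pod's
// metadata into the Ray container's environment.
func populateDefaultLabelEnv(pod *corev1.Pod, rayContainerIdx int) {
	c := &pod.Spec.Containers[rayContainerIdx]
	for label, envName := range defaultLabelEnvVars {
		if v, ok := pod.Labels[label]; ok {
			c.Env = append(c.Env, corev1.EnvVar{Name: envName, Value: v})
		}
	}
}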

@andrewsykim
Member

andrewsykim commented Sep 30, 2025

Agreed with everything @edoakes said; the level of complexity is pretty low and it's not a one-way door.

If it helps address some concerns from others, we can also guard the default labeling with a feature gate so we can remove it / break it in the future, but I don't see that being likely or necessary since the feature is already opt-in using pod labels.

@kevin85421
Member

kevin85421 commented Sep 30, 2025

@edoakes thank you for the reply!

This won't solve the problem for custom/user-specified labels though.

What's "this" referring to?

For that, we'd need to introduce a more explicit API such as a labels field in the head/worker group spec itself.

Why can't we reuse rayStartParams? That wouldn't require a CRD change.

The idea is to start with the minimal thing listed above without requiring a CR change, and taking it from there.

Why does it require a CRD change? Can we reuse rayStartParams?

The amount of complexity being added for any of the options discussed in this thread is very minor

Have you read #3699 (comment)? The contract inconsistency between KubeRay and the Ray Autoscaler was a real pain before. That's why I am very uncomfortable with the potentially frequent contract changes from this PR if we incrementally auto-populate more and more labels.

Some solutions that would provide a stable contract between KubeRay and the Ray Autoscaler, which I could accept:

@edoakes
Contributor

edoakes commented Sep 30, 2025

  • rayStartParams isn't structured, so the autoscaler doesn't know what labels are present on the nodes in the node group (without some very hacky parsing logic to reverse engineer them). So that is not a viable solution on its own.

I'm now remembering after reading the KubeRay CR spec that the parsing logic is actually what we do for resources... my suggestion would be to:

  1. use this opportunity to lift both resources and labels into explicit structured fields
  2. revisit auto-population later

@kevin85421
Member

I'm now remembering after reading the KubeRay CR spec that the parsing logic is actually what we do for resources... my suggestion would be to:

  • use this opportunity to lift both resources and labels into explicit structured fields
  • revisit auto-population later

This sounds good to me.

For resources, it would be great if we can lift it into an explicit structured field, because it is currently a JSON string for reasons I don't know, and it causes a lot of UX friction.

For labels, I guess the format should be labels=$KEY1:$VAL1,$KEY2:$VAL2,.... It is not hard to handle, but I don't object to adding a structured field for it.

@edoakes
Contributor

edoakes commented Sep 30, 2025

For labels, I guess the format should be labels=$KEY1:$VAL1,$KEY2:$VAL2,.... It is not hard to handle, but I don't object to adding a structured field for it.

The parsing might get a little messy because labels can have non-alphanumeric characters (-, /, etc.). A structured field is also probably just a little cleaner in the YAML.

@ryanaoleary
Collaborator Author

ryanaoleary commented Sep 30, 2025

I'm now remembering after reading the KubeRay CR spec that the parsing logic is actually what we do for resources... my suggestion would be to:

Yeah, we already have some logic to set the labels in the Ray autoscaling config based on the --labels argument in rayStartParams here, but like you said it's not very explicit to users or structured.
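As a rough illustration of that parsing step, assuming a comma-separated key=value format for --labels (the exact format accepted by ray start is not pinned down in this thread), a sketch might look like this. It also shows why unstructured params are brittle: values rendered from env vars or read from a --labels-file are opaque at this point.

package pod

import "strings"

// parseRayLabels splits a value like "ray.io/zone=us-west-2a,ray.io/region=us-west-2"
// into a map. Entries whose values come from env vars (e.g. "$MY_ZONE") or from a
// --labels-file cannot be resolved here, which is the gap discussed above.
func parseRayLabels(raw string) map[string]string {
	labels := map[string]string{}
	for _, pair := range strings.Split(raw, ",") {
		kv := strings.SplitN(strings.TrimSpace(pair), "=", 2)
		if len(kv) != 2 || kv[0] == "" {
			continue // skip malformed entries rather than guessing
		}
		labels[kv[0]] = kv[1]
	}
	return labels
}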

For the two separate issues:

  1. It should be explicit to users what node types will be scaled by the autoscaler given a resource request; one way to ensure this is to require users to explicitly set labels for each worker group. I think lifting both resources and labels into explicit structured fields sounds good for that purpose.

  2. However, for the set of default ray.io/ labels we've discussed (region, zone, market-type, node group), it's possible to automatically infer these labels deterministically from the KubeRay CR, since users are required to configure fields like nodeSelectors or podAffinity (or workerGroup.Name for the node-group) if they want to guarantee that their Pod schedules on a certain type of node. Even though they haven't explicitly set a labels field, it's very clear from the Pod spec what the expected Ray node labels would be. Now that we're introducing label-based scheduling, users writing application logic will have to go back and edit all of their CR specs to explicitly set labels based on the values they've previously configured for fields like nodeSelectors. A fairly simple ease-of-use change we could make here is the logic to set the environment variables that are already checked by Ray core here, which would populate the default labels without additional manual user intervention. We can put this logic behind a feature gate so that users not using label_selectors do not unexpectedly populate env vars and Ray node labels. I'm fine deferring this change though if we want to see user feedback first.

Neither of the above fully addresses label-based autoscaling though, since we're still requiring users to configure their KubeRay CR specs in order to determine what label_selectors it is possible to schedule (i.e. for a given application they will need to edit the worker groups with labels and/or nodeSelectors). This doc goes over potential changes that could fully decouple users writing application code from the infrastructure/KubeRay CR code, but that proposal would require a much larger refactor.

@ryanaoleary
Collaborator Author

ryanaoleary commented Sep 30, 2025

cc @edoakes @jjyao please revisit the APIs so that we can avoid adding too much complexity in Ray Autoscaler.

This is the core reason why we need to make some form of change in the KubeRay CR and can't do this entirely transparently based on rayStartParams. The proposal above to auto-populate the builtin set of ray.io/ labels from the pod spec labels is the most basic form of it that doesn't require adding a new API/field to the CR. The autoscaler will just scan the pod spec for those whitelisted labels.

This won't solve the problem for custom/user-specified labels though. For that, we'd need to introduce a more explicit API such as a labels field in the head/worker group spec itself. The idea is to start with the minimal thing listed above without requiring a CR change, and take it from there.

The amount of complexity being added for any of the options discussed in this thread is very minor, and the feature will be alpha in both Ray and KubeRay before stabilizing, so there is no "one way door" and I see no harm in starting with the proposed option. If we want to jump directly to a top-level labels field (or comparable alternative), that's fine with me too. It has the benefit of being very explicit.

It seems like the support we should implement is as follows:

  1. Add some logic to scan the Ray CR for ray.io/ Pod labels and set env vars like Andrew suggested in this comment. These vars are already checked for in Ray core here, and this change wouldn't require a CR change. We could promise not to increase this set of default labels, since we'd support them instead through step 3.
  2. Add top-level, explicit resources and labels fields to workerGroupSpecs to avoid parsing rayStartParams in Ray Core.
  3. Revisit adding logic to automatically set whitelisted labels in the new labels field of a worker group based on what we can infer from the Pod spec.

@kevin85421
Member

The parsing might get a little messy because labels can have non-alphanumeric characters -, /, etc. Structured field is also probably just a little cleaner in the YAML.

We can consistently use a comma (,) as the separator for labels. I guess kubectl uses similar parsing logic (#3699 (comment)) for its label selectors. The syntax and character set of K8s labels are documented here: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set.

It's not necessary to use the CRD to validate at the OpenAPI level. KubeRay also has its own validation logic, and nothing will be created if validation fails. The validation mechanism works quite well.

Anyway, I don't object to adding a structured field for labels.

@kevin85421
Member

kevin85421 commented Sep 30, 2025

it's possible to automatically infer these labels deterministically from the KubeRay CR, since users are required to configure fields like nodeSelectors or podAffinity (or workerGroup.Name for the node-group) if they want to guarantee that their Pod schedules on a certain type of node.

I believe we are no longer considering automatic inference of labels. It is the user's responsibility to set the labels ray.io/zone, ray.io/region, and ray.io/market-type (#3699 (comment)). Am I misunderstanding something? Additionally, I don't believe we can infer these labels accurately.

Add some logic to scan Ray CR for ray.io/ Pod labels and set env vars like Andrew suggested in #3699 (comment). These vars are already checked for in Ray core here and this change wouldn't require a CR change. We could promise not to increase this set of default labels, since we'd support them instead through step 3.

I proposed the environment variables approach to @MengjinYan several months ago. At that time, we overlooked the Ray Autoscaler and decided to use the Kubernetes Downward API to set environment variables with label values injected by cloud providers at runtime.

However, we no longer rely on dynamic label values. We should avoid reading env vars and make ray start --labels the source of truth. Although I believe using env vars is not ideal in the current context, I do not have a strong opinion, as I have not been responsible for it. cc @edoakes

We could promise not to increase this set of default labels, since we'd support them instead through step 3.

If we can ensure this, the contract between KubeRay and the Ray Autoscaler will remain stable. I am OK with #3699 (comment), but it is still better to avoid reading env vars because it is tech debt that we can still avoid at this moment.

@andrewsykim
Member

andrewsykim commented Sep 30, 2025

I'm good with lifting resources and labels into fields; I initially assumed we wanted to avoid API changes until we had user feedback :) In this case, I suggest doing the opposite for pod labels, where we set pod labels based on the labels field. This keeps labels as the source of truth for the autoscaler but still provides better visibility of labels at the Pod level. This doesn't have to be in this PR.

@ryanaoleary we're cutting the release branch for v1.5 in a week, do we want to land this for v1.5 given the new scope?

@kevin85421
Member

This keeps labels as source of truth for autoscaler but still provides better visibility of labels at the Pod level.

SG

we're cutting the release branch for v1.5 in a week, do we want to land this for v1.5 given the new scope?

It will be helpful if this can be landed in v1.5 so that we can mention it in Ray Summit.

@ryanaoleary
Collaborator Author


Yeah, I think I can still try to land this for v1.5; I'll try to put out the change quickly and then maybe we can prioritize reviews before the branch cut. If it can't be merged in time, we can still use rayStartParams in the meantime, even though it's not ideal.

Just so I'm clear on the required change, we'd be adding structured fields to workerGroupSpecs for resources and labels at the top level. I.e. like this:

workerGroupSpecs:
- groupName: worker-group-1
  replicas: 1
  resources:
    GPU: 8
    CPU: 16
  labels:
    ray.io/zone: us-west-2a
    ray.io/region: us-west-2

And then, in either the same PR or a follow-up, labels specified under labels will be added as K8s labels to the Pod spec for observability.

@andrewsykim
Member

That sounds right. To preserve compatibility, you may need to check if either resources or labels are already set in rayStartParams.
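A minimal sketch of that compatibility check, assuming the new structured field takes effect only when rayStartParams does not already set labels (the helper name and the override-vs-merge choice are assumptions, not a settled design); labels are joined as key=value pairs separated by commas, as in the earlier parsing sketch.

package pod

import (
	"sort"
	"strings"
)

// buildLabelsParam returns the value to use for rayStartParams["labels"],
// preferring whatever the user already specified for compatibility.
func buildLabelsParam(rayStartParams map[string]string, groupLabels map[string]string) string {
	if existing, ok := rayStartParams["labels"]; ok && existing != "" {
		return existing // user-provided rayStartParams win
	}
	keys := make([]string, 0, len(groupLabels))
	for k := range groupLabels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output keeps reconciliation idempotent
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+groupLabels[k])
	}
	return strings.Join(pairs, ",")
}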

@Future-Outlier
Member

Future-Outlier commented Oct 1, 2025


should we also add labels and resources to HeadGroupSpec? or just WorkerGroupSpec?

cc @ryanaoleary @andrewsykim @kevin85421

@ryanaoleary
Collaborator Author

should we also add labels and resources to HeadGroupSpec? or just WorkerGroupSpec?

I think to both HeadGroupSpec and WorkerGroupSpec since both can specify resources / labels.

@Future-Outlier
Member

I think to both HeadGroupSpec and WorkerGroupSpec since both can specify resources / labels.

I think we can start the implementation

@ryanaoleary
Collaborator Author

ray-project/ray#51564

Implemented in #4106, I'll close this PR now since we're no longer inferring --labels based on the K8s Pod spec.

@ryanaoleary ryanaoleary closed this Oct 2, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in K8s and Ray (go/k8s-ray-oss) Oct 2, 2025