
@ryanaoleary (Collaborator) commented Oct 2, 2025

Why are these changes needed?

This PR promotes Ray resources and labels to explicit, structured fields (Resources and Labels) on HeadGroupSpec and WorkerGroupSpec. When these optional fields are set, they override the corresponding values in the rayStartCommand for Pods that KubeRay creates for that group. In addition, labels specified in the top-level Labels field are merged into the Pod's Kubernetes labels for observability.
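A minimal Go sketch of the override-and-merge behavior described above. It assumes rayStartParams is a map[string]string whose "labels" key becomes the --labels flag, and that labels are encoded as comma-separated key=value pairs (as in the formatting loop quoted later in this thread). applyGroupLabels and formatRayLabels are hypothetical names, not KubeRay's actual functions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatRayLabels renders labels as "k1=v1,k2=v2" with keys sorted, so the
// generated value is stable across reconciles (Go map iteration is random).
func formatRayLabels(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%s", k, labels[k]))
	}
	return strings.Join(pairs, ",")
}

// applyGroupLabels (hypothetical) lets the structured Labels field win over
// any "labels" entry in rayStartParams and merges the same labels into the
// Pod's Kubernetes labels; on a key conflict, the group label wins.
func applyGroupLabels(rayStartParams, podLabels, groupLabels map[string]string) {
	if len(groupLabels) == 0 {
		return // field unset: keep whatever rayStartParams already has
	}
	rayStartParams["labels"] = formatRayLabels(groupLabels)
	for k, v := range groupLabels {
		podLabels[k] = v
	}
}

func main() {
	params := map[string]string{"labels": "old=1"}
	pod := map[string]string{"app": "ray"}
	applyGroupLabels(params, pod, map[string]string{"zone": "us-a", "accelerator": "tpu"})
	fmt.Println(params["labels"]) // accelerator=tpu,zone=us-a
	fmt.Println(len(pod))         // 3
}
```

Sorting the keys keeps the generated Pod spec deterministic, which avoids spurious diffs between reconciles. In the review below, the maintainers prefer rejecting labels in rayStartParams over silently overriding them.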

The rationale for this change is discussed in more detail in #3699. The labels portion of this change will help enable the autoscaling use case with label_selectors in Ray core.

Related issue number

Contributes to ray-project/ray#51564

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@ryanaoleary (Collaborator Author)

Running make api-docs fails to add the Resources field to the api.md file. I think this is because I use:

Resources corev1.ResourceList

which isn't a top-level API object. I'm wondering if there's a preference for changing it to a map[string]string and then converting it to a ResourceList internally.
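If Resources were declared as map[string]string, the controller would have to translate it internally. The sketch below shows one hypothetical translation into rayStartParams-style entries: CPU maps to num-cpus, GPU to num-gpus, memory to memory, and every other name is collected into a JSON resources value. The key mapping and the requirement that values be plain numbers are assumptions. Real Kubernetes quantities such as "500m" or "4Gi" would need resource.ParseQuantity from k8s.io/apimachinery, which is omitted here to keep the example stdlib-only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// rayStartKeyFor maps well-known resource names to assumed rayStartParams
// keys; any other name is treated as a custom Ray resource.
var rayStartKeyFor = map[string]string{
	"CPU":    "num-cpus",
	"GPU":    "num-gpus",
	"memory": "memory", // assumed to be a plain byte count in this sketch
}

// resourcesToRayStartParams (hypothetical) converts a map[string]string
// Resources field into rayStartParams entries. Values must be plain numbers.
func resourcesToRayStartParams(resources map[string]string) (map[string]string, error) {
	params := map[string]string{}
	custom := map[string]float64{}
	for name, value := range resources {
		f, err := strconv.ParseFloat(value, 64)
		if err != nil {
			return nil, fmt.Errorf("resource %q: value %q is not a plain number: %w", name, value, err)
		}
		if key, ok := rayStartKeyFor[name]; ok {
			params[key] = value
			continue
		}
		custom[name] = f
	}
	if len(custom) > 0 {
		// encoding/json writes map keys in sorted order, so output is stable.
		b, err := json.Marshal(custom)
		if err != nil {
			return nil, err
		}
		params["resources"] = string(b)
	}
	return params, nil
}

func main() {
	params, err := resourcesToRayStartParams(map[string]string{"CPU": "4", "TPU": "8"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(params["num-cpus"], params["resources"]) // 4 {"TPU":8}
	_, err = resourcesToRayStartParams(map[string]string{"memory": "4Gi"})
	fmt.Println(err != nil) // true
}
```

Keeping corev1.ResourceList avoids this hand-written parsing, at the cost of the api-docs generation issue described above.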

// +optional
Resources corev1.ResourceList `json:"resources,omitempty"`
// Labels specifies the Ray node labels for the head group.
// These labels will also be added to the Pods of this head group and override the `--labels`
Member

Should this mention that labels are ignored if they are already specified in rayStartParams?

Collaborator Author

As implemented, if --labels is present in rayStartParams, it is ignored and overridden by the values set in the group's Labels field. Should it be the other way around?

My thinking was that since the top-level Labels and Resources fields are the most explicit, they should take precedence.

Signed-off-by: Ryan O'Leary <[email protected]>
@kevin85421 (Member) left a comment

We should try to avoid handling the logic of overriding or merging user configurations. It’s hard to guarantee correct behavior, and it makes Ray Autoscaler more complex. My suggestions:

Add validations in ValidateRayClusterSpec:

  • Resources
    • If users specify both (1) num-cpus / num-gpus / memory / resources in rayStartParams and (2) {Head|Worker}GroupSpec.Resources, we should fail validation and avoid reconciling anything. Users should use either (1) or (2), not both.
  • Labels
    • If users specify labels in rayStartParams, we should fail validation because, as @edoakes said, we don't plan to handle the string parsing in Ray Autoscaler. Only {Head|Worker}GroupSpec.Labels is allowed.
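The two checks suggested above could look roughly like this. groupSpec is a hypothetical, trimmed-down stand-in for {Head|Worker}GroupSpec (Resources is simplified to a string map rather than corev1.ResourceList), and validateGroupResourcesAndLabels is an illustrative helper, not the code that went into ValidateRayClusterSpec.

```go
package main

import "fmt"

// groupSpec is a hypothetical stand-in for {Head|Worker}GroupSpec, trimmed
// to the fields these checks need.
type groupSpec struct {
	Name           string
	RayStartParams map[string]string
	Resources      map[string]string
	Labels         map[string]string
}

// resourceParamKeys are the rayStartParams keys that overlap with Resources.
var resourceParamKeys = []string{"num-cpus", "num-gpus", "memory", "resources"}

// validateGroupResourcesAndLabels sketches the proposed rules: Resources and
// resource-related rayStartParams are mutually exclusive, and node labels may
// only come from the structured Labels field.
func validateGroupResourcesAndLabels(g groupSpec) error {
	if len(g.Resources) > 0 {
		for _, k := range resourceParamKeys {
			if _, ok := g.RayStartParams[k]; ok {
				return fmt.Errorf("group %q: rayStartParams[%q] and Resources are mutually exclusive", g.Name, k)
			}
		}
	}
	if _, ok := g.RayStartParams["labels"]; ok {
		return fmt.Errorf("group %q: set node labels via Labels, not rayStartParams[\"labels\"]", g.Name)
	}
	return nil
}

func main() {
	both := groupSpec{
		Name:           "workers",
		RayStartParams: map[string]string{"num-cpus": "2"},
		Resources:      map[string]string{"CPU": "2"},
	}
	fmt.Println(validateGroupResourcesAndLabels(both))
}
```

Rejecting the conflicting configuration up front means neither KubeRay nor Ray Autoscaler has to decide which source of truth wins.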

cc @Future-Outlier @rueian Could one of you open an issue to track updating the compatible Ray versions (because of Ray Autoscaler)? And @rueian, could you work on adding support in Ray Autoscaler for Resources / Labels?

sort.Strings(keys)

for _, k := range keys {
	labels = append(labels, fmt.Sprintf("%s=%s", k, groupLabels[k]))
}
Collaborator

Do we need to validate that neither k nor groupLabels[k] contains a comma (,)?
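A check along these lines would answer the question. validateRayLabels is a hypothetical helper, and it assumes the consumer splits the label string on ',' and then splits each pair on its first '=', so '=' is rejected in keys but tolerated in values.

```go
package main

import (
	"fmt"
	"strings"
)

// validateRayLabels rejects labels that would corrupt the comma-separated
// "k=v" encoding: ',' in a key or value splits the pair, and '=' in a key
// makes the key/value boundary ambiguous. Empty keys are rejected as well.
func validateRayLabels(labels map[string]string) error {
	for k, v := range labels {
		switch {
		case k == "":
			return fmt.Errorf("label key must not be empty")
		case strings.ContainsAny(k, ",="):
			return fmt.Errorf("label key %q must not contain ',' or '='", k)
		case strings.Contains(v, ","):
			return fmt.Errorf("label %q: value %q must not contain ','", k, v)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateRayLabels(map[string]string{"zone": "us-a"}))      // <nil>
	fmt.Println(validateRayLabels(map[string]string{"zone": "a,b"}) != nil) // true
}
```

Because these labels are also merged into the Pod's Kubernetes labels, another option is to reuse Kubernetes label validation (validation.IsQualifiedName for keys and validation.IsValidLabelValue for values, in k8s.io/apimachinery/pkg/util/validation), which already disallows commas.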

Member

opened here, thank you!
#4113

Signed-off-by: Ryan O'Leary <[email protected]>
@ryanaoleary (Collaborator Author)

Quoting @kevin85421: "…updating the compatible Ray versions (because of Ray Autoscaler)? And @rueian, could you work on adding support in Ray Autoscaler for Resources / Labels?"

Added the validation logic in f7f85dd


5 participants