-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Add wanda ray image builds for Docker Hub #59936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: andrew/revup/master/ray-cpp-wheel
Are you sure you want to change the base?
Add wanda ray image builds for Docker Hub #59936
Conversation
|
02a8921 to
fe68786
Compare
f104bf5 to
288cadb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully migrates the Ray Docker image builds to use wanda, which simplifies the CI configuration. The new wanda.yaml files and the shared Dockerfile are well-structured. However, I've identified a few areas for improvement, primarily in the new push_ray_image.py script. There's a critical issue where the script only supports pushing the main ray images, which is a regression from the previous system that also handled ray-extra and ray-llm variants. Additionally, the script duplicates significant portions of the image tagging logic from another file, which will create maintenance challenges. I've also included a suggestion to reduce duplication in the .buildkite/build.rayci.yml file using YAML anchors.
| def _generate_image_tags( | ||
| commit: str, | ||
| python_version: str, | ||
| platform: str, | ||
| architecture: str = "x86_64", | ||
| ) -> List[str]: | ||
| """ | ||
| Generate destination tags matching the original ray docker image format. | ||
| For nightly builds (master + nightly schedule): | ||
| nightly.YYMMDD.{sha[:6]}-py310-cpu | ||
| For release branches: | ||
| {release_name}.{sha[:6]}-py310-cpu | ||
| For other branches (PRs, non-nightly master): | ||
| {sha[:6]}-py310-cpu | ||
| For GPU_PLATFORM, also generates -gpu alias tags to match the original behavior. | ||
| """ | ||
| branch = os.environ.get("BUILDKITE_BRANCH", "") | ||
| sha_tag = commit[:6] | ||
| formatted_date = datetime.now(tz.utc).strftime("%y%m%d") | ||
| schedule = os.environ.get("RAYCI_SCHEDULE", "") | ||
|
|
||
| # Determine version prefix (matches docker_container.py _get_image_version_tags) | ||
| if branch == "master" and schedule == "nightly": | ||
| version_prefix = f"nightly.{formatted_date}.{sha_tag}" | ||
| elif branch.startswith("releases/"): | ||
| release_name = branch[len("releases/") :] | ||
| version_prefix = f"{release_name}.{sha_tag}" | ||
| else: | ||
| version_prefix = sha_tag | ||
|
|
||
| py_tag = _format_python_version_tag(python_version) | ||
| platform_tag = _format_platform_tag(platform) | ||
| arch_tag = _format_architecture_tag(architecture) | ||
|
|
||
| # For GPU_PLATFORM, also create -gpu alias (matches docker_container.py) | ||
| platform_tags = [platform_tag] | ||
| if platform == GPU_PLATFORM: | ||
| platform_tags.append("-gpu") | ||
|
|
||
| tags = [] | ||
| for ptag in platform_tags: | ||
| tags.append(f"{version_prefix}{py_tag}{ptag}{arch_tag}") | ||
|
|
||
| return tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for generating image tags in _generate_image_tags and its helper functions (_format_python_version_tag, _format_platform_tag) duplicates logic that already exists in ci/ray_ci/docker_container.py.
This duplication makes the codebase harder to maintain. If the tagging scheme ever changes, it will need to be updated in two separate places, which is error-prone.
I recommend refactoring the tag generation logic from ci/ray_ci/docker_container.py into a shared utility module that both this script and docker_container.py can import. This will create a single source of truth for image tagging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ci/ray_ci/docker_container.py is being deprecated by this flow, I just ported the logic over here instead of refactoring to a single shared library. Attempting to untangle them was a bit messy, but I can do so if it would make sense for consistency
| matrix: | ||
| setup: | ||
| python: | ||
| - "3.10" | ||
| - "3.11" | ||
| - "3.12" | ||
| cuda: | ||
| - "11.7.1-cudnn8" | ||
| - "11.8.0-cudnn8" | ||
| - "12.1.1-cudnn8" | ||
| - "12.3.2-cudnn9" | ||
| - "12.4.1-cudnn" | ||
| - "12.5.1-cudnn" | ||
| - "12.6.3-cudnn" | ||
| - "12.8.1-cudnn" | ||
| - "12.9.1-cudnn" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matrix definition is repeated for ray_images_cuda_push (lines 246-261) and ray-extra-image-cuda-build (lines 290-305). To improve maintainability and reduce duplication, you can use YAML anchors to define these lists once and reuse them.
You can define the anchors here, and then use *python_versions and *cuda_versions in the other matrix definitions.
For example, in the ray_images_cuda_push step, you would change the matrix to:
matrix:
setup:
python: *python_versions
cuda: *cuda_versions matrix:
setup:
python: &python_versions
- "3.10"
- "3.11"
- "3.12"
cuda: &cuda_versions
- "11.7.1-cudnn8"
- "11.8.0-cudnn8"
- "12.1.1-cudnn8"
- "12.3.2-cudnn9"
- "12.4.1-cudnn"
- "12.5.1-cudnn"
- "12.6.3-cudnn"
- "12.8.1-cudnn"
- "12.9.1-cudnn"|
|
||
| # GPU_PLATFORM is the default GPU platform that gets aliased as "gpu" | ||
| # This must match the definition in ci/ray_ci/docker_container.py | ||
| GPU_PLATFORM = "cu12.1.1-cudnn8" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GPU_PLATFORM constant is hardcoded here and is also defined in ci/ray_ci/docker_container.py. Duplicating constants can lead to inconsistencies if one is updated and the other is not.
This constant should be defined in a single, shared location and imported where needed. This could be done as part of the larger refactoring of the tag generation logic I mentioned in another comment.
fe68786 to
46cc862
Compare
288cadb to
164e2e4
Compare
46cc862 to
70df756
Compare
164e2e4 to
b4fb3e0
Compare
70df756 to
ae32d45
Compare
b4fb3e0 to
cd79253
Compare
ae32d45 to
04cfd26
Compare
cd79253 to
7d257e4
Compare
04cfd26 to
55d1473
Compare
7d257e4 to
e93a821
Compare
55d1473 to
84d7479
Compare
84d7479 to
c275fe2
Compare
8a4bdf5 to
a0b03e9
Compare
c275fe2 to
7c78fa2
Compare
a0b03e9 to
32347fa
Compare
7c78fa2 to
29cf5ab
Compare
29cf5ab to
d566c8a
Compare
8d1057a to
d8901f7
Compare
d566c8a to
9190029
Compare
9190029 to
73822f5
Compare
9e43825 to
6801f72
Compare
73822f5 to
94805e9
Compare
This adds wanda-based builds for ray Docker Hub images. Changes: - Add ray-image-cpu/cuda.wanda.yaml for ray images - Add ray-extra-image-cpu/cuda.wanda.yaml for extra dependencies - Add ray-llm-image-cuda.wanda.yaml and ray-llm-extra variant - Add push_ray_image.py for pushing to Docker Hub via crane - Update build.rayci.yml with wanda image build and push steps Topic: ray-image Relative: ray-cpp-wheel Labels: draft Signed-off-by: andrew <[email protected]>
6801f72 to
9834ea3
Compare
94805e9 to
fc79d67
Compare
9834ea3 to
25e0d03
Compare
fc79d67 to
db94f93
Compare
25e0d03 to
2a36172
Compare
db94f93 to
05c0670
Compare
2a36172 to
b81d4cb
Compare
05c0670 to
a671850
Compare
This adds wanda-based builds for ray Docker Hub images.
Changes:
Topic: ray-image
Relative: ray-wheel
Labels: draft