feat(deps): add kubeflow-training to workbench images #826

Draft: wants to merge 1 commit into main

Conversation

andyatmiami
Contributor

Description

This commit adds kubeflow-training[huggingface] to the following workbench images:

  • ./jupyter/datascience/ubi9-python-3.11
  • ./jupyter/pytorch/ubi9-python-3.11
  • ./jupyter/rocm/pytorch/ubi9-python-3.11
  • ./jupyter/trustyai/ubi9-python-3.11
  • ./codeserver/ubi9-python-3.11

This outcome comes with a slew of caveats and disclaimers:

  • Due to a dependency conflict, codeflare-sdk==0.24.3 was also added to the following workbench images.
    • ./jupyter/datascience/ubi9-python-3.11
    • ./jupyter/pytorch/ubi9-python-3.11
    • ./jupyter/rocm/pytorch/ubi9-python-3.11
    • ./jupyter/trustyai/ubi9-python-3.11
  • ⚠️ In what may be a "controversial" decision, codeflare-sdk was NOT updated on the other workbench images. Because 0.24.3 was a "one-off" release to unblock the kubeflow-training inclusion, the expectation is that the normal "sync" procedure on the next official release will standardize the codeflare-sdk dependency across all workbench images. This limits the testing effort required for this commit.
  • jupyter/minimal/ubi9-python-3.11 was deliberately excluded from receiving kubeflow-training per discussions with the team.
  • Due to dependency conflicts discovered in tensorflow-based workbench images, kubeflow-training has not been added to those workbench images at this time. This decision was agreed to by the affected stakeholders. The core blocking issue can be seen here:
    • onnx/tensorflow-onnx#2328
  • Due to a dependency conflict, transformers = "==4.38.0" was also added to the ./jupyter/trustyai/ubi9-python-3.11 workbench image after discussion with the developer who last worked on the trustyai image. While it certainly must be tested, there was no strict requirement that necessitated pinning the transformers dependency to 4.36.2 - and the huggingface extra now introduces a 4.38.0 constraint for transformers.

Related-to: https://issues.redhat.com/browse/RHOAIENG-12822
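
To illustrate the kind of conflict described for transformers, here is a minimal sketch of how two exact pins on the same package cannot both be satisfied. It is not the resolver that pipenv actually uses, and it only understands `==` pins. The helper name `find_pin_conflicts` is hypothetical, and treating the huggingface extra's constraint as an exact `==4.38.0` pin is an assumption based on the description above.

```python
from collections import defaultdict


def find_pin_conflicts(requirements):
    """Return {package: sorted versions} for packages pinned to more than one exact version."""
    pins = defaultdict(set)
    for req in requirements:
        name, sep, version = req.partition("==")
        if sep:  # this sketch only considers exact '==' pins
            pins[name.strip().lower()].add(version.strip())
    return {name: sorted(versions) for name, versions in pins.items() if len(versions) > 1}


# Illustrative inputs only; the real constraints live in the Pipfiles and package metadata.
requirements = [
    "transformers==4.36.2",   # earlier pin implied by the description
    "transformers==4.38.0",   # assumed form of the constraint from the huggingface extra
    "codeflare-sdk==0.24.3",
]
conflicts = find_pin_conflicts(requirements)
print(conflicts)
```

Only transformers is reported here, because it is the only package with two different exact pins. This mirrors why the older pin had to be moved for the trustyai image.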

How Has This Been Tested?

⚠️ TODO

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Contributor

openshift-ci bot commented Dec 20, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Dec 20, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from andyatmiami. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andyatmiami andyatmiami force-pushed the chore/add-kfto-python-sdk branch 3 times, most recently from e09fa47 to 429e683 Compare December 20, 2024 16:35
This commit adds `kubeflow-training[huggingface]` to the following workbench images:
- `./jupyter/datascience/ubi9-python-3.11`
- `./jupyter/pytorch/ubi9-python-3.11`
- `./jupyter/rocm/pytorch/ubi9-python-3.11`
- `./jupyter/trustyai/ubi9-python-3.11`
- `./codeserver/ubi9-python-3.11`

This outcome comes with a slew of caveats and disclaimers:
- Due to a dependency conflict, `codeflare-sdk==0.24.3` was **also** added to the following workbench images.
    - `./jupyter/datascience/ubi9-python-3.11`
    - `./jupyter/pytorch/ubi9-python-3.11`
    - `./jupyter/rocm/pytorch/ubi9-python-3.11`
    - `./jupyter/trustyai/ubi9-python-3.11`
- ⚠️ In what may be a "controversial" decision, `codeflare-sdk` was **NOT** updated on the other workbench images. Because `0.24.3` was a "one-off" release to unblock the `kubeflow-training` inclusion, the expectation is that the normal "sync" procedure on the next official release will standardize the `codeflare-sdk` dependency across all workbench images. This limits the testing effort required for this commit.
- `jupyter/minimal/ubi9-python-3.11` was deliberately excluded from receiving `kubeflow-training` per discussions with the team.
- Due to dependency conflicts discovered in `tensorflow`-based workbench images, `kubeflow-training` has not been added to those workbench images at this time. This decision was agreed to by the affected stakeholders. The core blocking issue can be seen here:
    - onnx/tensorflow-onnx#2328
- Due to a dependency conflict, `transformers = "==4.38.0"` was **also** added to the `./jupyter/trustyai/ubi9-python-3.11` workbench image after discussion with the developer who last worked on the `trustyai` image. While it certainly must be tested, there was no strict requirement that necessitated pinning the `transformers` dependency to `4.36.2` - and the `huggingface` extra now introduces a `4.38.0` constraint for `transformers`.

Related-to: https://issues.redhat.com/browse/RHOAIENG-12822
@andyatmiami andyatmiami force-pushed the chore/add-kfto-python-sdk branch from 429e683 to b2d3631 Compare December 20, 2024 17:00
  ],
  "markers": "python_version >= '3.8'",
- "version": "==2.1.0"
+ "version": "==1.2.0"
Contributor Author


It is not immediately clear to me why this dependency got downgraded 🤔 I'm doing the pipenv install in a "controlled environment" (read: a podman container running base/ubi9-python-3.11).

Was initially updated here:
image
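
One way to surface unexpected downgrades like the one questioned above is to compare two revisions of a `Pipfile.lock`. This is a minimal sketch, not part of this PR or of pipenv. It assumes the standard lock layout of `default` and `develop` sections mapping package names to entries with a `version` pin. The helper names and the package names in the sample data are hypothetical, and the version parsing handles only plain numeric releases such as `2.1.0`.

```python
import json


def parse_version(pin):
    """Turn a lock-file pin such as '==2.1.0' into a tuple of ints (numeric releases only)."""
    return tuple(int(part) for part in pin.lstrip("=").split("."))


def find_downgrades(old_lock, new_lock, section="default"):
    """Return a list of (package, old_pin, new_pin) where the new lock pins a lower version."""
    old_pkgs = old_lock.get(section, {})
    new_pkgs = new_lock.get(section, {})
    downgrades = []
    for name in sorted(old_pkgs.keys() & new_pkgs.keys()):
        old_pin = old_pkgs[name].get("version")
        new_pin = new_pkgs[name].get("version")
        # Entries without a 'version' key (for example VCS installs) are skipped.
        if old_pin and new_pin and parse_version(new_pin) < parse_version(old_pin):
            downgrades.append((name, old_pin, new_pin))
    return downgrades


# Hypothetical lock contents; in practice these would be loaded from two Pipfile.lock files.
old_lock = json.loads(
    '{"default": {"example-package": {"version": "==2.1.0"}, "other-package": {"version": "==1.0.0"}}}'
)
new_lock = json.loads(
    '{"default": {"example-package": {"version": "==1.2.0"}, "other-package": {"version": "==1.0.1"}}}'
)
downgrades = find_downgrades(old_lock, new_lock)
print(downgrades)
```

With the sample data, only `example-package` is reported, since `other-package` moved forward. Pre-release or otherwise non-numeric versions would need a proper version parser.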

Member


@andyatmiami
Contributor Author

/test all

@andyatmiami
Contributor Author

/test notebook-jupyter-pytorch-ubi9-python-3-11-pr-image-mirror

@andyatmiami
Contributor Author

/test notebook-rocm-jupyter-pyt-ubi9-python-3-11-pr-image-mirror

@andyatmiami
Contributor Author

/test notebook-cuda-jupyter-ds-ubi9-python-3-11-pr-image-mirror

@andyatmiami
Contributor Author

/test notebooks-ubi9-e2e-tests

Contributor

openshift-ci bot commented Dec 23, 2024

@andyatmiami: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror | b2d3631 | link | true | /test notebook-rocm-jupyter-tf-ubi9-python-3-11-pr-image-mirror |
| ci/prow/rocm-notebooks-e2e-tests | b2d3631 | link | true | /test rocm-notebooks-e2e-tests |
| ci/prow/codeserver-notebook-e2e-tests | b2d3631 | link | true | /test codeserver-notebook-e2e-tests |
| ci/prow/images | b2d3631 | link | true | /test images |
| ci/prow/notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror | b2d3631 | link | true | /test notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror |
| ci/prow/notebooks-ubi9-e2e-tests | b2d3631 | link | true | /test notebooks-ubi9-e2e-tests |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@andyatmiami
Contributor Author

/test codeserver-notebook-e2e-tests

@openshift-merge-robot
Contributor

PR needs rebase.


3 participants