[WIP] [fine_tuning]: Reorder Ray tasks #659

Open: albertoperdomo2 wants to merge 2 commits into main from feature/reorder-ray-tasks

albertoperdomo2
Collaborator

As per an internal discussion, we want the Ray tasks to run in the order run -> capture -> cleanup rather than cleanup -> run -> capture. We also want to capture the KubeRay logs.
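For illustration only, the intended ordering could be sketched roughly as below. This is not the code in this PR: `run_capture_cleanup` and its three callables are hypothetical names, and running capture and cleanup even when the run step fails is an assumption, not something stated in the discussion.

```python
from typing import Callable, List


def run_capture_cleanup(
    run: Callable[[], None],
    capture: Callable[[], None],
    cleanup: Callable[[], None],
) -> None:
    """Run the job, then capture artifacts, then clean up.

    Assumption: capture and cleanup still happen when the run fails,
    so that logs are collected before the resources are removed.
    """
    try:
        run()
    finally:
        try:
            # Hypothetical capture step; this is where KubeRay logs
            # would be collected, before anything is deleted.
            capture()
        finally:
            cleanup()


if __name__ == "__main__":
    order: List[str] = []
    run_capture_cleanup(
        run=lambda: order.append("run"),
        capture=lambda: order.append("capture"),
        cleanup=lambda: order.append("cleanup"),
    )
    print(" -> ".join(order))
```

With cleanup last, the state left by a failed run stays available to the capture step, which is presumably the motivation for the reordering.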

openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Jan 28, 2025

openshift-ci bot commented Jan 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from albertoperdomo2. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

albertoperdomo2 force-pushed the feature/reorder-ray-tasks branch from 67e6487 to 3f86491 on January 29, 2025 07:59
albertoperdomo2 force-pushed the feature/reorder-ray-tasks branch from 3f86491 to 64df5bf on February 3, 2025 11:04
albertoperdomo2 (Collaborator, Author)

/test jump-ci rhoai-4xh100 fine_tuning ray_bench__iperf

topsail-bot bot commented Feb 3, 2025

🔴 Test of 'fine_tuning test test_ci' failed after 00 hours 04 minutes 12 seconds. 🔴

• Link to the test results.

• Link to the reports index.

Test configuration:

PR_POSITIONAL_ARGS: jump-ci
PR_POSITIONAL_ARG_1: ray_bench__iperf

Failure indicator:

/logs/artifacts/003__ray__ray-benchmark/000__fine_tuning__ray_fine_tuning_job/FAILURE | [000__fine_tuning__ray_fine_tuning_job] ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra={'name': 'ray', 'pod_count': 2, 'gpu': 0, 'node_selector_key': 'nvidia.com/gpu.present', 'node_selector_value': 'true', 'hyper_parameters': {'flavor': 'iperf'}} --> 2
/logs/artifacts/003__ray__ray-benchmark/FAILURE | CalledProcessError: Command 'set -o errexit;set -o pipefail;set -o nounset;set -o errtrace;ARTIFACT_DIR="/logs/artifacts/003__ray__ray-benchmark" ./run_toolbox.py from_config fine_tuning ray_fine_tuning_job --extra="{'name': 'ray', 'pod_count': 2, 'gpu': 0, 'node_selector_key': 'nvidia.com/gpu.present', 'node_selector_value': 'true', 'hyper_parameters': {'flavor': 'iperf'}}"' returned non-zero exit status 2.
Traceback (most recent call last):
  File "/opt/topsail/src/projects/fine_tuning/testing/test_finetuning.py", line 154, in _run_test
    run.run_toolbox_from_config(
  File "/opt/topsail/src/projects/core/library/run.py", line 65, in run_toolbox_from_config
    return run(f'{cmd_env} ./run_toolbox.py from_config {group} {command} {_dict_to_run_toolbox_args(kwargs)}', **run_kwargs)
  File "/opt/topsail/src/projects/core/library/run.py", line 121, in run
    proc = subprocess.run(command, **args)
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run

[...]

openshift-ci bot commented Feb 3, 2025

@albertoperdomo2: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/jump-ci | 64df5bf | link | true | /test jump-ci |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
