Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce CI runs #184

Merged
merged 8 commits into from
Mar 4, 2024
Merged

Reduce CI runs #184

merged 8 commits into from
Mar 4, 2024

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Mar 1, 2024

This PR cuts down on the the number testing jobs in PRs while attempting to retain similar coverage. It also enables fail-fast so that matrix jobs will not proceed to completion if any one of the tests fail.

Contributes to rapidsai/build-planning#5

@vyasr vyasr added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Mar 1, 2024
@vyasr vyasr self-assigned this Mar 1, 2024
@vyasr vyasr mentioned this pull request Mar 1, 2024
3 tasks
Copy link
Contributor Author

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will also apply fail-fast: true to the test jobs once we've finished discussions on the matrix.

.github/workflows/conda-python-tests.yaml Outdated Show resolved Hide resolved
.github/workflows/wheels-test.yaml Outdated Show resolved Hide resolved
@vyasr
Copy link
Contributor Author

vyasr commented Mar 1, 2024

We discussed offline and the new matrix reflects the consensus @ajschmidt8 @bdice and I came to.

@vyasr
Copy link
Contributor Author

vyasr commented Mar 1, 2024

The latest run on the cudf test PR verifies that this workflow is valid: https://github.com/rapidsai/cudf/actions/runs/8116036240?pr=15201

@vyasr vyasr marked this pull request as ready for review March 1, 2024 19:09
@vyasr vyasr requested review from bdice and ajschmidt8 March 1, 2024 19:09
@vyasr
Copy link
Contributor Author

vyasr commented Mar 1, 2024

New run based on setting v100/a100 appropriately: https://github.com/rapidsai/cudf/actions/runs/8116093123?pr=15201

@bdice
Copy link
Contributor

bdice commented Mar 1, 2024

Net changes to PRs:
conda C++ tests go from 4 jobs to 3.
conda Python tests go from 4 jobs to 3.
wheel tests go from 3 jobs to 2.

amd64 jobs go from 8 (3 conda C++, 3 conda Python, 2 wheel) to 5 (2 conda C++, 2 conda Python, 1 wheel)
arm64 jobs go from 3 to ... 3. 😬

Of course, we also have to account for custom jobs like docs and notebooks that also consume amd64 runners. That usually adds about 2 jobs to the amd64 runners, depending on the repository.

For this to improve CI times, we have to be waiting on amd64 (not arm64). That's currently true, from my observations, and I expect that reducing amd64 usage will be sufficient to improve CI queue times.

If you also consider that one of those amd64 jobs uses the "earliest" driver and thus comes from a separate pool of runners, this is an even more significant improvement. 👍

Copy link
Member

@ajschmidt8 ajschmidt8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found two invalid combinations

.github/workflows/wheels-test.yaml Outdated Show resolved Hide resolved
.github/workflows/wheels-test.yaml Outdated Show resolved Hide resolved
@bdice
Copy link
Contributor

bdice commented Mar 1, 2024

#184 (comment)
https://github.com/rapidsai/cudf/actions/runs/8117888463/job/22191610519?pr=15201

Confirming the earlier hypothesis about imbalanced queues for latest/earliest drivers -- for this job's C++ tests, the amd64 "latest" job is still waiting on a runner after 45 minutes, but the amd64 "earliest" job started immediately.

@vyasr
Copy link
Contributor Author

vyasr commented Mar 2, 2024

Aside from some unrelated failures, the test cudf job looks good now. @ajschmidt8 I think this is good to go when you're happy with it.

@vyasr vyasr requested a review from ajschmidt8 March 2, 2024 01:48
@bdice bdice merged commit 1fb4b53 into branch-24.04 Mar 4, 2024
@ajschmidt8 ajschmidt8 deleted the feat/reduce_runs branch March 4, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants