Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce CI usage #83

Merged
merged 3 commits into from
Jul 25, 2023
Merged

Reduce CI usage #83

merged 3 commits into from
Jul 25, 2023

Conversation

kurtmckee
Copy link
Contributor

This PR reduces the total usage in CI (that is, the sum of time across all jobs that run).

Currently, CI usage is ~30 minutes.
Without a cache hit, that number is dropped to ~15 minutes.
With a cache hit, that number drops to ~10 minutes.

The CI duration (the amount of time for the entire workflow to run while you're staring at a wall clock) stays roughly the same (~5 minutes).

This is achieved by reducing the number of test runners to just one of each OS testing all Python versions simultaneously. (The setup-python action now supports setting up multiple Python versions!) This reduces the duplication of activities that are unrelated to the actual testing. For example, checking out the repo and installing pandoc.

In addition, time is saved by building a wheel once, rather than building a .tar.gz that each Python version-specific pip invocation quietly converts to a wheel in the background.

Finally, a virtual environment is created and cached, and the tox environments are cached as well. This reduces the setup time required to actually launch the tests.

Note: The cache of the tox environment may get rebuilt by tox when a Python version has a hotfix bump, like 3.10.x to 3.10.y. A possible improvement to this would be to write the Python versions to a file and include that in the call to hashFiles().

@nedbat
Copy link
Owner

nedbat commented Feb 6, 2023

Thanks, I wish I understood the tradeoffs here better. For scriv it won't matter much, because there aren't version-specific concerns, but it's easier for me to use similar patterns across projects. I definitely would like to know more about how to speed up CI runs in general. I'll have to study what you did here to understand.

@kurtmckee
Copy link
Contributor Author

Sure thing! My hope is that the individual commits help isolate the changes so they're easier to interpret and reproduce. My goal was simply "reduce total CI usage, and try to keep the workflow duration roughly equivalent if possible".

If you'd like to hop on a conference call to discuss the approach and techniques here, just shoot me an email and we can coordinate a good time. 👍

Copy link
Contributor Author

@kurtmckee kurtmckee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ned, I added commentary to many of the changes here. I hope this is helpful. Let me know if you have questions or suggestions!

Comment on lines 34 to 40
python-version:
- "3.7"
- "3.8"
- "3.9"
- "3.10"
- "3.11"
- "pypy-3.7"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section of the matrix is removed because the setup-python action can now install all of these versions simultaneously, and Tox can take detect and run all of these in one go.

This reduces the matrix down from a cross-product of operating systems and Python versions (18x runs) to merely a list of three operating systems (3x runs).

The advantage here is that slow, duplicate operations that must happen just to get the operating system environment to a functional state now happen once per operating system (3x) instead of once per OS per Python version (18x).

For example, looking at a recent run of the tests against the main branch, these are steps that had to happen multiple times just to get to a usable state where Tox could run:

  • Checkout the repo (13s)
  • Set up Python (15s)
  • Install dependencies (18s)
  • Install pandoc (31s)

Moving these six Python versions from the matrix to the setup-python action eliminates this time multiplier.

Comment on lines +52 to +59
- name: "Restore cache"
id: "restore-cache"
uses: "actions/cache@v3"
with:
path: |
.tox/
.venv/
key: "cache-python-${{ steps.setup-python.outputs.python-version }}-os-${{ runner.os }}-hash-${{ hashFiles('tox.ini', 'requirements/*.txt') }}"
Copy link
Contributor Author

@kurtmckee kurtmckee Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By caching the .tox/ and .venv/ directories, the virtual environment and the Tox environments can be restored as filesystem operations, rather than as an installation of Python packages.

The cache key includes a hash of tox.ini and all of the requirements/*.txt files changes. This ensures that a cache miss occurs if any of those files changes.

Note that it may be helpful to modify the cache key in the future. In particular, if the last-listed Python version above ("3.11" currently) doesn't change versions but any of the other versions updates (say, from "3.10.x" to "3.10.y"), then:

  1. This would result in a cache hit.
  2. However, Tox would detect that the 3.10 version had changed and would rebuild the py310 test environment.
  3. The cache would not be updated with this new environment, resulting in an increase in test times.

This situation can be improved by adding a step that runs each Python version and writes its exact version to a file. Then, this file could be included in the list of files passed to hashFiles().

I haven't learned how to do this yet, so this improvement isn't included. The only thing that will happen in this situation is that Tox has to discard the Python 3.10.x test environment and rebuild it with 3.10.y, so test times will increase until the cache is invalidated by some other means, like when the requirements/*.txt files are updated.

Comment on lines +61 to +63
- name: "Identify venv path"
shell: "bash"
run: "echo 'venv-path=${{ runner.os == 'Windows' && '.venv/Scripts' || '.venv/bin' }}' >> $GITHUB_ENV"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step establishes the relative path to the Python executable and writes it to $GITHUB_ENV, which expects INI-like lines that establish an environment variable name and its value. Subsequent steps will have access to a venv-path environment variable.

Although the default shell is specified above to be bash, note that the forward slashes would also work in Windows' default shell, Powershell, so no changes are needed here.

However, I explicitly included the shell: "bash" configuration because the test suite failed for me locally with "path not found" errors while running in Powershell, so I'm anticipating the possibility that using bash as the default shell even when running on Windows is masking a failure. Investigating that was not in-scope for this work, but I tried to anticipate a future change to running Powershell on Windows in CI.

Comment on lines 65 to +70
- name: "Install dependencies"
if: "steps.restore-cache.outputs.cache-hit == false"
run: |
python -m pip install -U setuptools
python -m pip install -r requirements/tox.txt
python -m venv .venv
${{ env.venv-path }}/python -m pip install -U setuptools
${{ env.venv-path }}/python -m pip install -r requirements/tox.txt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step is expensive, so it only happens when there's a cache miss. A virtual environment is used because it's far easier to cache a virtual environment at a path of our choosing than to try caching items in an OS-specific location.

run: |
python -m tox
${{ env.venv-path }}/python -m tox -m ci-tests
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Tox invocation changes from a bare run of Tox (which was intercepted by tox-gh and transformed to a single run of the default test environment) to use Tox 4's new labels. This allows the tox.ini file to control which environments the ci-tests label is associated with.

tox.ini Outdated
Comment on lines 4 to 5
labels =
ci-tests = py{37,38,39,310,311,py3}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the list of supported Python versions changes in envlist, above, the ci-tests label should also be updated.

This uses Tox's generative environment syntax.

tox.ini Outdated
Comment on lines 8 to 9
package = wheel
wheel_build_env = build_wheel
Copy link
Contributor Author

@kurtmckee kurtmckee Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, Tox builds a .tar.gz file once...but when pip installs that file into the virtual environment Tox creates for testing, pip will first convert the file to a wheel. (This can be reviewed by running Tox with -vvv.)

The package directive tells Tox to create a wheel instead of a .tar.gz file, and the wheel_build_env directive tells Tox that each environment should use a hard-coded build environment name (its default is to use a name that depends on the Python version, iirc).

("build_wheel" could be any static name, it just happens that I chose "build_wheel".)

tox.ini Outdated
Comment on lines 55 to 73

[gh]
# https://pypi.org/project/tox-gh/
python =
3.7 = py37
3.8 = py38
3.9 = py39
3.10 = py310
3.11 = py311
pypy3 = pypy3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As tox-gh is no longer needed for associating Python versions with Tox environment names, this bit of config is ripped out.

@kurtmckee
Copy link
Contributor Author

Hi Ned! It looks like this PR has gotten into a stuck state. If you're leaning towards rejecting it, it's fine to close.

If you're still interested in learning about what the PR does, let me know; I can be available for a conference call to discuss these changes, and the techniques it's using.

And finally, if you'd like me to break this up into some smaller chunks that are easier to digest, let me know. For example, caching virtual environments will be an easy win.

@nedbat
Copy link
Owner

nedbat commented Jul 21, 2023

@kurtmckee I'm coming back to this. I think I will mostly take what you have here, as a learning experience and to use a different pattern than my other projects. You've taught me about tox labels if nothing else! :)

@nedbat
Copy link
Owner

nedbat commented Jul 21, 2023

I rebased onto main in https://github.com/nedbat/scriv/tree/nedbat/rebase-83. One small change: the tox docs suggested using .pkg for wheel_build_env, so I switched to that. Happy to hear a reason not to though!

@kurtmckee
Copy link
Contributor Author

Nice! I didn't realize there was a recommendation for the wheel_build_env, and have no reason to choose one name over the other. .pkg sounds good to me.

Do you want me to rebase this work on main, or are you at a point where your branch is ready to merge?

@nedbat
Copy link
Owner

nedbat commented Jul 23, 2023

I can merge my branch if you like instead of asking you to redo the rebasing. BTW: thanks for the changelog entries, but in my opinion, these sorts of internal developer-oriented details don't need a mention.

@kurtmckee
Copy link
Contributor Author

kurtmckee commented Jul 23, 2023

I can rebase, and simultaneously excise the changelog fragments and migrate to .pkg. I think I can tackle that in the next day or two. 👍

@kurtmckee
Copy link
Contributor Author

@nedbat I think this addresses the issues you identified. I'll double-check the results of the CI run in the morning.

@nedbat
Copy link
Owner

nedbat commented Jul 25, 2023

Looks good, thanks!

@nedbat nedbat merged commit 6a3a064 into nedbat:main Jul 25, 2023
7 checks passed
@kurtmckee kurtmckee deleted the reduce-ci-usage branch July 25, 2023 10:50
@kurtmckee
Copy link
Contributor Author

I was just sitting down to review, but you beat me to it! 😀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants