Make clear every test's status in every CI run #1679

EliahKagan · 2023-09-26T09:11:14Z

This makes changes so tests give more informative output, both locally and on CI but especially on CI. It comprises several interrelated changes that should help lead up to, but do not include, adding CI jobs for Windows. My motivation for doing this was to make it easier to add those jobs, and to get the most out of those jobs once they are added, but it seems to me that the benefit of these changes is actually largely independent of those goals, and can be reviewed independently of them (and thus before they are done). This description is divided into descriptions of the problems I believe the changes solve, and how they solve them.

`pytest-sugar` output didn't display correctly on CI

The pytest-sugar plugin is working very well for local testing, but on CI it has been producing output that is voluminous, yet fairly low in information. This happens because instead of outputting a row of check marks as it does locally (at least when run in most terminals), updating the line creates a new one:

 test/test_actor.py ✓                                              0%
 test/test_actor.py ✓✓                                             0% ▏
 test/test_actor.py ✓✓✓                                            1% ▏
 test/test_actor.py ✓✓✓✓                                           1% ▏

Those are four passing tests in test_actor.py, reported with one line per test, yet those lines give no information about the individual tests they represent, not even those tests' names. This happens because, from pytest-sugar's perspective, it is updating a single line in a terminal.

The cause of the problem is the progress bar animated on the right side. Passing -v or -vv improves things somewhat, in that it shows specific information about the tests (their names and, where applicable, which in this project is almost everywhere, the test class they appear in). However, the progress bar of that style is still drawn, and extra newlines are still often shown, with the resulting excess blank lines making the output hard to read. I was not able to find a way to get pytest-sugar not to show this progress bar, or to show it in a different style. Fortunately, that turned out not to be necessary.

pytest-sugar provides two useful features: its pretty output, and showing each failure immediately. The former is not achieved on CI, for this project, currently. However, the latter is, and may be considered important. But there is another pytest plugin that separately provides just that feature: pytest-instafail. When enabled, pytest-instafail reports each failure with full details immediately. (It does not stop running the tests early or cancel any tests.)

So I added pytest-instafail as a development dependency (in the test extra) but did not set things up for it to be used automatically. And I configured pytest to use pytest-sugar by default, but invoked pytest on CI so that it runs with pytest-sugar turned off and pytest-instafail turned on. This plays well with how GitHub Actions handles output, at any level of verbosity. But I have also passed -vv on CI, to solve...

CI did not show which tests had which statuses

CI is the easiest and also most reliable way to run tests in this project, and probably the most common. Often I found myself wanting to know what tests were actually running, versus being skipped. I often made assumptions about this that were mistaken, and others that were correct but hard to be confident about; see #1657 (comment).

Furthermore, it is useful to be able to notice--even when one is not looking for it, so long as it's not too distracting--what tests are passing, skipped, etc. The failing tests are reported in detail at each failure and listed at the end, but tests with other statuses are not listed with those statuses--or, really, at all--without -v or -vv.

So I have passed -vv so that, for non-failing tests, each test shows a line indicating the file, the class where applicable, the name of the test case (with @ddt-parameterized tests appearing as separate test cases but listed together, and named in terms of their arguments), and the test's status. This distinguishes tests with pass, skip, xfail, and xpass statuses. It also provides reassurance that particular tests really are running, and allows one to search for tests by name while viewing CI output.

The custom `pytest` configuration was rigid and hid failures

Implicit options for pytest are configured in pyproject.toml. It is not immediately obvious, to someone running pytest on the project or inspecting CI output, what these are set to.

One was --force-sugar, which appears to have been necessary to get pytest-sugar to run on CI even in spite of its (in this case correct) guess that it does not have a suitable output device when doing so. This made it very difficult to run pytest without pytest-sugar, as is needed on CI (detailed above) and as may also be useful to do locally in some cases.

So I removed that. The plugin is still loaded automatically, but it can now be turned off with -p no:sugar (as well as automatically if it detects an unsuitable output device, but I am not relying on that in the CI workflows).

Another was --maxfail=10. I have removed this as well. It made pytest stop running tests once the tenth failure occurred, so no more than ten failures would ever be reported. It was possible to notice this by looking carefully at test output, either by noticing the line that warns about it (amongst lots of other output) or by seeing that the total number of tests that are run was lower than the number the runner found, by more than the number of skipped tests.

On CI, it is useful to see the output of all tests. Sometimes it makes sense to weaken that, but stopping in the middle of a test run because of a failure in that run (where one may want to know what else does, and what doesn't, also fail), is not, in my view, the best way to do it. Instead, fail-fast could be enabled on the CI test matrix, or some available combinations of platforms and Python versions could even be omitted (for example, when Windows CI jobs are added, fewer than six versions could be tested, depending on how fast they run).

The bigger issue, though, is that I thought I was getting only ten test failures in a number of circumstances! Fortunately, I never relied on that, and even on my native Windows system where I was unable to get all tests passing, I ran the tests that seemed most germane to the changes I was making. Since it is more involved--on any operating system--to get all tests passing locally with GitPython than with most codebases, showing only ten failures seems like a stumbling block that is best removed.

Finally, removing --maxfail=10 makes it much easier to investigate what fails due to removing @skipIf and related annotations, turning off HIDE_WINDOWS_KNOWN_ERRORS, and the like, because one can see everything that fails.

I have retained --disable-warnings, though that should be revisited in the future, since it could be hiding something of value. Delving into that would have been in keeping with the theme of this PR, but I have not done so, in part to limit the scope of the changes (as turning that off would likely lead to the addition of a number of more fine-grained suppressions).

The other arguments pertain to code coverage and are fine; code coverage reports are often wanted, especially for full test runs on CI, and they can be turned off easily by passing --no-cov to pytest.

It wasn't clear when `is_win` was true

is_win is true only on native Windows systems, but this is not obvious, nor is it in all cases obvious what constitutes a native Windows system. In a couple cases, a test was skipped when is_win and a Cygwin check both hold. Those never happen together--those tests were running on all systems and doing fine (and I removed those @skipIf annotations).

is_win, which is provided by GitPython, is true when os.name == "nt". On Cygwin, os.name == "posix". Cygwin can be distinguished from other Unix-like platforms by checking that sys.platform == "cygwin", which also holds on systems like MSYS2 that are derived from Cygwin. However, it does not hold on MinGW builds of Python, which are native builds provided with MSYS2 and are correctly detected as native Windows by os.name and thus by GitPython's is_win. (The "Git Bash" environment also provides MinGW builds, though it does not include Python.)
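As a rough illustration of the checks just described, the following sketch classifies a platform from os.name and sys.platform alone. The function name describe_platform and the dictionary keys are hypothetical and are not part of GitPython; the real is_win lives in the git module and is only mirrored here.

```python
import os
import sys


def describe_platform(os_name=os.name, sys_platform=sys.platform):
    """Classify a platform using only the two checks discussed above.

    Hypothetical helper for illustration; not GitPython's API.
    """
    return {
        "os.name": os_name,
        "sys.platform": sys_platform,
        # Mirrors the condition the text gives for GitPython's is_win.
        "native_windows": os_name == "nt",
        # Per the text, true on Cygwin and on Cygwin-derived environments.
        "cygwin_like": sys_platform == "cygwin",
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(f"{key}: {value}")
```

Passing explicit values, such as describe_platform("posix", "cygwin"), makes it easy to see that the two conditions never hold together.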

Given this situation, I am unsure if it's really a good idea to have is_win in the project, rather than testing the above conditions directly. However, we do have it, and I believe users are supposed to be able to use it directly (so removing it would be a breaking change). I have certainly not endeavored to remove it here, nor to deprecate it. However, I have extended the part of both of the CI test workflows that output version information such as git and python versions to also output the values of all of the above, so they can be readily checked. Together with being able to see the status of each test by name (by either perusing or searching), I think this should avoid the kind of situation I faced in #1636 and #1650 where I didn't know what platforms my tests were running (or supposed to run) on and couldn't tell by examining CI output.

Expected failures were handled by skipping

I have not completely addressed this, but I have made significant headway, and the areas where the problem remains are either marked or are already known to be especially tricky cases.

When the actions a test performs should not be attempted at all, the test should be skipped. However, in testing frameworks with limited support for marking tests as expected to fail, skipping is often used to express this as well. This is best avoided when possible, because it requires that both initiative and manual action be taken to check whether tests that were not working before are still not working, as well as to check that how they are failing remains the same, at least in the general sense of what exception is raised.

I replaced most, but not all, @skipIf and similar annotations, in places where they represent expected failure of a test that ought to pass, with @pytest.mark.xfail. Both unittest and pytest provide facilities for expressing expected failure, but unittest.expectedFailure is very limited, not allowing a condition, reason, or expected exception to be passed. In contrast, pytest.mark.xfail supports such arguments (among others), and I have always passed them. The raises argument takes an exception type (or tuple of them) that is expected to be raised, and test failures due to other exceptions will be reported as regular failures.
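A minimal sketch of what such a marking can look like follows. The test name, its body, and the reason text are hypothetical placeholders rather than code from this PR; only the condition, reason, and raises arguments of pytest.mark.xfail are the point.

```python
import os

import pytest


@pytest.mark.xfail(
    os.name == "nt",  # condition: expect failure only on native Windows
    reason="Hypothetical example: moving a file that is still open",
    raises=PermissionError,  # other exception types count as regular failures
)
def test_move_open_file_example():
    # Placeholder body standing in for a real test.
    if os.name == "nt":
        raise PermissionError("simulated failure on native Windows")
```

On platforms where the condition is false, the mark has no effect and the test is reported as an ordinary pass or failure.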

Because tests marked xfail still run (except in limited cases where one prevents this, which I have not done), examining pytest output (when -v or -vv have been passed) reveals, for each test, whether it really did fail as expected (xfail status), or if it unexpectedly passed (xpass status). It is possible to have the xpass status treated as a failure, but I have not done that.

An xfailing or xpassing test does not produce full test output with a traceback, so we are still only getting that highly verbose inline failure information (due to pytest-sugar locally and pytest-instafail on CI) if we get an unexpected failure, i.e., one that fails the run.

My goal with xfail is that, taken together with other changes in this PR, it should make it unnecessary to take any special action to check whether a test is still unable to pass--or to check whether the system or other condition for its failure, or the way it fails, has remained the same. Instead, one simply takes a look at the xfail output, a key part of the approach articulated there thus being continuously automated. Going along with that, it should be little more obtrusive than the effect of a passing test, so one is not distracted by it. I believe this PR largely achieves both goals, in a way that is worthwhile even though imperfect. But the imperfections are worth noting:

I don't believe ddt has its own xfail feature. @pytest.mark.parametrize is made to work together with @pytest.mark.xfail, so generated tests with some arguments can be marked xfail while others generated from the same function/method are not. But @pytest.mark.parametrize cannot be used effectively on a method in a class that inherits directly or indirectly from unittest.TestCase, even if (as in this project) it is only necessary to support the pytest test runner. Most GitPython test classes derive indirectly from TestCase through a custom TestBase class that provides the important rorepo fixture, and I did not want to migrate that as part of the work on this PR. But the effect is that there are some xpassing functions already, because I could only mark the whole group (i.e., defined function) as xfail. The situation is clearly documented with comments and the reason keyword argument. It affects two groups of cygpath tests.
test_includes_order contains a few assertions, the last of which is expected to fail, which is handled by placing it in a try block, catching the AssertionError, and reraising it as a SkipTest exception. It is possible to do something analogous to signal that an expected failure has occurred. But without splitting up or otherwise reorganizing the test case, this would not carry the benefit of an xpass status if the assertion unexpectedly starts working. It might still be reasonable to do that, but because I don't want to create the appearance of observability where we don't have it, I have not done so at this point. My hope is that either the test can be reorganized or the underlying problem can be fixed. I have clearly commented the situation.
The git module (rather than its unit tests) has three places where unittest.SkipTest is raised from inside it to skip a test (#790). As in test_includes_order, these are situations where, if nothing goes wrong, no indication is given. So for the same reason as there--but also that it would be much harder because I am using xfail from pytest but the git module shouldn't have testing framework dependencies, per #526 and #527--I have not changed that here. Note that, in spite of the connection to the related #525, most occurrences of HIDE_WINDOWS_KNOWN_ERRORS do not straddle git and test in this way, and I have changed them from @skipIf and similar to @pytest.mark.xfail.
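The pattern described for test_includes_order can be sketched roughly as below. The class, the assertions, and the run_example helper are hypothetical stand-ins, not the project's test code; the sketch only shows that the outcome is a skip, and that if the final assertion began to pass, the test would simply pass with no xpass-like signal.

```python
import unittest


class ExampleIncludesOrder(unittest.TestCase):
    def test_last_assertion_known_to_fail(self):
        self.assertEqual(1 + 1, 2)  # earlier assertions, expected to pass
        try:
            # Stand-in for the final assertion that is known to fail.
            self.assertEqual("actual", "expected")
        except AssertionError as error:
            # Report the known failure as a skip instead of a failure.
            raise unittest.SkipTest("known failure in final assertion") from error


def run_example():
    """Run the example test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleIncludesOrder)
    result = unittest.TestResult()
    suite.run(result)
    return result
```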

I have not applied xfail to HIDE_WINDOWS_FREEZE_ERRORS, but that is because I don't think they are expected failures, so I do not class this as an imperfection. (We don't want those tests to run automatically on native Windows systems. @pytest.mark.xfail supports run=False for such cases, though that forgoes being able to observe an xpass. However, I believe these tests usually work on all systems and only occasionally freeze on native Windows. I also think this should be addressed as its own change rather than here; I suspect there is actually a connection to #1676, as I've noted there, and maybe they can be fixed together.)

Some expected failures were out of date (or originally erroneous)

This is to be expected, and a benefit of the above changes (of xfail and the other changes above it that support seeing and using that output) was to be able to identify these cases and fix the skip and xfail specifications, and most of all to verify that they are fixed.

To be clear, I didn't actually fix any bugs in the code under test. The changes to the test code should be evident in the diff (if not, then the test code is not as clear as I intended it to be after these changes), but a summary may help in distinguishing them from nearby changes. Here are the main kinds of changes:

Removing skips/xfails for tests that had been broken but are now fully working.
Fixing a wrong platform (some Cygwin tests were run only on native Windows).
Removing skips for an impossible platform (the is_win and Cygwin case mentioned above).
Identifying more specific failure conditions (in particular, test_commit_msg_hook_success on native Windows works with some bash interpreters and not others, possibly arising later than #1399).
Removing the Cygwin xfail on test_blocking_lock_file, which turned out just to need its time window extended on Cygwin the same as for native Windows.

Very old `git` versions would skip tests needing newer features

I think that was the right choice at the time, but now those versions are even older. I've changed the logic to raise RuntimeError to error out those tests as being unable to run if one attempts to run them with extremely old versions of git that don't support them, rather than skipping.
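The difference from skipping can be sketched as a small guard that a test calls before doing its work. The helper name require_git_version and its parameters are hypothetical, not the code in this PR; the 2.5.1 minimum is the one that applies to test_linked_worktree_traversal.

```python
def require_git_version(version_info, minimum, test_name):
    """Raise RuntimeError (an error, not a skip) if git is too old for a test.

    Hypothetical helper for illustration. Both versions are tuples of ints,
    such as (2, 5, 1), and are compared element by element.
    """
    if tuple(version_info) < tuple(minimum):
        have = ".".join(map(str, version_info))
        need = ".".join(map(str, minimum))
        raise RuntimeError(f"{test_name} requires git >= {need}, but found git {have}")
```

A test would call something like require_git_version(found_version, (2, 5, 1), "test_linked_worktree_traversal") as its first statement, where found_version is whatever version tuple the test environment reports, so that an old git shows up as an error in the test report.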

Intuitively, people don't expect a bug to be caught if it only affects one platform and they test on another. In contrast, skipping these tests is likely to go unnoticed: just as one does not get a passing test run if one doesn't have any git installed, I think it makes sense for the few specific tests that require a non-ancient git to error out without it, now that new enough versions are so widely available.

Some operating systems, such as CentOS 7, that maintain downstream versions of extremely old software for an extremely long time, might have such an old git version that nonetheless has downstream security fixes that make it acceptably safe to use. But I suspect that anyone developing improvements to GitPython on such a system would also have installed a newer git to test with, even if not aware of any specific incompatibilities.

The CI test workflows still appeared more different than they are

For debugging CI (and also for future efforts in extending it, including in adding native Windows jobs), it is useful to be able to readily identify the fundamental similarities and differences between the two CI test workflows. I had made some progress on this before, but I've taken care of some stuff I missed. Most notably, commands provided by Cygwin are now run with relative paths in the Cygwin workflow (and found), and set -x in the pytest step no longer breaks the tests. There is probably more that could reasonably be done in this area.

This makes jobs from both test workflows give more information relevant to examining which tests are skipped (and if any tests xfail, those too) in what environments:

- Values of os.name and git.util.is_win.
- The name of each test that runs, with its status.

The latter doesn't increase the output length as much as might be expected, because due to the way the output is handled, the pytest-sugar pretty output format without -v looked like:

 test/test_actor.py ✓                                              0%
 test/test_actor.py ✓✓                                             0% ▏
 test/test_actor.py ✓✓✓                                            1% ▏
 test/test_actor.py ✓✓✓✓                                           1% ▏

when it was instead intended to fit on a single line. Still, the current output with -v has extra newlines, increasing length and worsening readability, so it should be improved on if possible.

Instead of splitting it into two places where at least one of the places is highly likely to be missed, this puts it together just before the first step that makes nontrivial use of the installed packages. Grouping it together, it can't really be shown earlier, because one of the pieces of information is obtained using the git module (to examine that behavior of the code). This also presents the information more clearly. "set -x" makes this easy, so the commands are rewritten to take advantage of it.

Don't stop after the first 10.

There are two benefits of the pytest-sugar plugin:

1. Pretty output.
2. Show details on each failure immediately instead of at the end.

The first benefit is effectively local-only, because extra newlines are appearing when it runs on CI, both with and without -v. The second benefit applies both locally and on CI. So this adds the pytest-instafail plugin and uses it on CI to get the second benefit. It is not set up to run automatically, and pytest-sugar still is (though no longer forced), so local testing retains those benefits and we don't have a clash. The name "instafail" refers only to instantly *seeing* failures: it does not cause the pytest runner to stop earlier than otherwise.

+ Reorder pytest arguments so both workflows are consistent.

While pytest-sugar output gets mangled with extra newlines on CI, colorized output seems to work fine and improves readability.

This permits the longer delay in test_blocking_lock_file--which was already allowed for native Windows--on Cygwin, where it is also needed. That lets the xfail mark for Cygwin be removed. This also updates the comments to avoid implying that the need for the delay is AppVeyor-specific (it seems needed on CI and locally).

They were not running on Cygwin, because git.util.is_win is False on Cygwin. They were running on native Windows, with a number of them always failing; these failures had sometimes been obscured by the --maxfail=10 that had formerly been used (from pyproject.toml). Many of them (not all the same ones) fail on Cygwin, and it might be valuable for cygpath to work on other platforms, especially native Windows. But I think it still makes sense to limit the tests to Cygwin at this time, because all the uses of cygpath in the project are in code that only runs after a check that the platform is Cygwin. Part of that check, as it is implemented, explicitly excludes native Windows (is_win must be false).

Two of the groups of cygpath tests in test_util.py generate tests that fail on Cygwin. There is no easy way to still run, but xfail, just the specific tests that fail, because the groups of tests are generated with `@ddt` parameterization, but neither the unittest nor pytest xfail mechanisms interact with that. If `@pytest.mark.parametrize` were used, this could be done. But that does not work on methods of test classes that derive from unittest.TestCase, including those in this project that indirectly derive from it by deriving from TestBase. The TestBase base class cannot be removed without overhauling many tests, due to fixtures it provides such as rorepo. So this marks too many tests as xfail, but in doing so allows test runs to pass while still exercising and showing status on all the tests, allowing result changes to be observed easily.

This changes a default Windows skip of test_commit_msg_hook_success to an xfail, and makes it more specific, expecting failure only when either bash.exe is unavailable (definitely expected) or when bash.exe is the WSL bash wrapper in System32, which fails for some reason even though it's not at all clear it ought to. This showcases the failures rather than skipping, and also lets the test pass on Windows systems where bash.exe is something else, including the Git Bash bash.exe that native Windows CI would use.

As it seems to be working now on Cygwin (maybe not native Windows).

This makes the test explicitly error out, rather than skipping, if it appears the environment doesn't support encoding Unicode filenames. Platforms these days should be capable of that, and reporting it as an error lessens the risk of missing a bug in the test code (that method or a fixture) if one is ever introduced. Erroring out will also make it easier to see the details in the chained UnicodeDecodeError exception. This does not affect the behavior of GitPython itself. It only changes how a test reports an unusual condition that keeps the test from being usefully run.

Although GitPython does not require git >=2.5.1 in general, and this does *not* change that, this makes the unavailability of git 2.5.1 or later an error in test_linked_worktree_traversal, where it is needed to exercise that test, rather than skipping the test. A few systems, such as CentOS 7, may have downstream patched versions of git that remain safe to use yet are numbered <2.5.1 and do not have the necessary feature to run this test. But by now, users of those systems likely anticipate that other software would rely on the presence of features added in git 2.5.1, which was released over 7 years ago. As such, I think it is more useful to give an error for that test, so the test's inability to be run on the system is clear, than to automatically skip the test, which is likely to go unnoticed.

Rather than skipping, so it becomes known if the situation changes.

It looked like test_untracked_files was sometimes skipped, and specifically that it would be skipped on Cygwin. But the `@skipIf` on it had the condition:

    HIDE_WINDOWS_KNOWN_ERRORS and Git.is_cygwin()

HIDE_WINDOWS_KNOWN_ERRORS can only ever be true if it is set to a truthy value directly (not an intended use as it's a "constant"), or on native Windows systems: no matter how the environment variable related to it is set, it's only checked if is_win, which is set by checking os.name, which is only "nt" on native Windows systems, not Cygwin. So whenever HIDE_WINDOWS_KNOWN_ERRORS is true Git.is_cygwin() will be false. Thus this condition is never true and the test was never being skipped anyway: it was running and passing on Cygwin.
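The reasoning can be checked mechanically with a simplified model. Both functions below are hypothetical stand-ins: the real HIDE_WINDOWS_KNOWN_ERRORS and Git.is_cygwin() are implemented differently, and this sketch assumes only what the message states, namely that the environment setting matters only when os.name is "nt" and that Cygwin reports os.name as "posix".

```python
from itertools import product


def hide_windows_known_errors(os_name, env_flag):
    # Simplified model: the environment setting is consulted only if is_win.
    is_win = os_name == "nt"
    return is_win and env_flag


def is_cygwin(os_name, sys_platform):
    # Simplified stand-in for Git.is_cygwin(): Cygwin has os.name == "posix".
    return os_name == "posix" and sys_platform == "cygwin"


# (os.name, sys.platform) pairs for native Windows, Cygwin, and Linux.
PLATFORMS = [("nt", "win32"), ("posix", "cygwin"), ("posix", "linux")]


def skip_condition_ever_true():
    """Return True if the old skipIf condition holds for any modeled case."""
    return any(
        hide_windows_known_errors(os_name, env_flag) and is_cygwin(os_name, sys_platform)
        for (os_name, sys_platform), env_flag in product(PLATFORMS, [False, True])
    )
```

Under this model, no combination of platform and environment setting satisfies both halves of the condition.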

In the tests only, and not in any way affecting the feature set or requirements of GitPython itself. This is similar to, and with the same reasoning as, cf5f1dc.

The current cause of failure is different from what is documented in the skip reason.

And rewrite the reason to give more useful information. (The new reason also doesn't state the exception type, because that is now specified, and checked by pytest, by being passed as "raises".)

This is working on Cygwin, so that old reason no longer applies. (The test was not being skipped on Cygwin, and was passing.) It is not working on native Windows, due to a PermissionError from attempting to move a file that is still open (which Windows doesn't allow). That may have been the original native Windows skip reason, but the old AppVeyor CI link for it is broken or not public. This makes the reason clear, though maybe I should add more details.

… xfail And improve details. The xfail is only for native Windows, not Cygwin (same as the old skip was, and still via checking HIDE_WINDOWS_KNOWN_ERRORS). This change is analogous to the change in c1798f5, but for test_git_submodules_and_add_sm_with_new_commit rather than test_root_module.

This stops skipping them, as they are now working.

I had forgotten to do this earlier when converting from skip to xfail. Besides consistency with the other uses of xfail in the test suite, the benefit of passing "raises" is that pytest checks that the failure gave the expected exception and makes it a non-expected failure if it didn't.

+ Style tweak and comment to clarify the "Limit $PATH" step.

I had put that step in the Cygwin workflow for purposes of experimentation, and it seemed to make clearer what is going on, but really it does the opposite: it's deceptive because Cygwin uses other logic to set its PATH. So this step is unnecessary and ineffective at doing what it appears to do.

This makes the two CI test workflows more similar in a couple of the remaining ways they differ unnecessarily. This could be extended, and otherwise improved upon, in the future.

Byron

That's wonderful work, thanks so much!

I admire how you work your way through the mess of a decade of maintenance without actually knowing what exactly is the best way to do things, and bring GitPython into the present one step at a time.

Thank you!

Given this situation, I am unsure if it's really a good idea to have is_win in the project, rather than testing the above conditions directly.

I second this approach - leave it alone as it's public and might be relied upon, but avoid using it internally in favor of more specific platform checks.

The git module (rather than its unit tests) has three places where unittest.SkipTest is raised from inside it to skip a test (#790).

I am probably the author of this incredible hack and hope that thanks to your fantastic work, git can soon avoid knowing about test at all.

This makes three of the four hook-related tests pass instead of failing, and changes the way the fourth fails.

GitPython uses bash.exe to run hooks that are (or appear to be) shell scripts. On many Windows systems, this is the bash.exe in the system32 directory, which delegates to bash in a WSL system if at least one such system is installed (for the current user), and gives an error otherwise.

It may be a bug that GitPython ends up using this bash.exe when WSL is installed but no WSL systems exist, since that is actually a fairly common situation. One place that happened was on the GitHub Actions runners used for Windows jobs. Those runners have WSL available, and are capable of running WSL 1 systems (not currently WSL 2 systems), but no WSL systems were actually installed.

This commit fixes that cause of failure, for all four tests it happened in, by setting up a Debian WSL system on the test runner. (This increases the amount of time it takes Windows jobs to run, but that might be possible to improve on.) Three of those four tests now pass, while the other fails for another reason.

The newly passing tests are:

- test/test_index.py::TestIndex::test_commit_msg_hook_fail
- test/test_index.py::TestIndex::test_pre_commit_hook_fail
- test/test_index.py::TestIndex::test_pre_commit_hook_success

The test that still fails, but differently, is:

- test/test_index.py::TestIndex::test_commit_msg_hook_success

I had previously found that test to fail on a local WSL 2 system, and attempted to add a suitable xfail marking in 881456b (gitpython-developers#1679). But the condition I wrote there *appears* to have a bug related to the different orders in which subprocess.Popen and shutil.which find executables, causing it not always to detect when the WSL-related bash.exe is the one the Popen call in git.index.fun.run_commit_hook will use.

881456b (gitpython-developers#1679) expanded the use of shutil.which in test_index.py to attempt to mark test_commit_msg_hook_success xfail when bash.exe is a WSL bash wrapper in System32 (because that test currently is not passing when the hook is run via bash in a WSL system, which the WSL bash.exe wraps). But this was not reliable, due to significant differences between shell and non-shell search behavior for executable commands on Windows. As the new docstring notes, lookup due to Popen generally checks System32 before going through directories in PATH, as this is how CreateProcessW behaves.

- https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessw
- python/cpython#91558 (comment)

The technique I had used in 881456b also had the shortcoming of assuming that a bash.exe in System32 was the WSL wrapper. But such a file may exist on some systems without WSL, and be a bash interpreter unrelated to WSL that may be able to run hooks.

In addition, one common situation, which was the case on CI before a step to install a WSL distribution was added, is that WSL is present but no WSL distributions are installed. In that situation bash.exe is found in System32, but it can't be used to run any hooks, because there's no actual system with a bash in it to wrap. This was not covered before. Unlike some conditions that prevent a WSL system from being used, such as resource exhaustion blocking it from being started, the absence of a WSL system should probably not fail the pytest run, for the same reason as we are trying not to have the complete *absence* of bash.exe fail the pytest run. Both conditions should be xfail. Fortunately, the error message when no distribution exists is distinctive and can be checked for.

There is probably no correct and reasonable way to check LBYL-style which executable file bash.exe resolves to by using shutil.which, due to shutil.which and subprocess.Popen's differing search orders and other subtleties. So this adds code to do it EAFP-style using subprocess.run (which itself uses Popen, so giving the same CreateProcessW behavior). It tries to run a command with bash.exe whose output pretty reliably shows if the system is WSL or not.

We may fail to get to the point of running that command at all, if bash.exe is not usable, in which case the failure's details tell us if bash.exe is absent (xfail), present as the WSL wrapper with no WSL systems (xfail), or has some other error (not xfail, so the user can become aware of and probably fix the problem). If we do get to that point, then a file that is almost always present on both WSL 1 and WSL 2 systems and almost always absent on any other system is checked for, to distinguish whether the working bash shell operates in a WSL system, or outside any such system as e.g. Git Bash does.

See https://superuser.com/a/1749811 on various techniques for checking for WSL, including the /proc/sys/fs/binfmt_misc/WSLInterop technique used here (which seems like it may be the most reliable overall).

Although the Windows CI runners have Git Bash, and this is even the bash.exe that appears first in PATH (giving rise to the problem with shutil.which detailed above), it would be a bit awkward to test the behavior when Git Bash is actually used to run hooks on CI, because of how Popen selects the System32 bash.exe first, whether or not any WSL distribution is present. However, local testing shows that when Git Bash's bash.exe is selected, all four hook tests in the module pass, both before and after this change, and furthermore that bash.exe is correctly detected as "native", continuing to avoid an erroneous xfail mark in that case.
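The EAFP approach described in this message might look roughly like the following sketch. It is not the code added by the commit: the function name classify_bash, the returned labels, the injectable run parameter, and especially the NO_DISTRO_MARKER text are assumptions made for illustration, and output-encoding details of the WSL wrapper are glossed over.

```python
import subprocess

# Assumption: placeholder for the distinctive message the WSL wrapper prints
# when no distribution is installed. Substitute the real text before use.
NO_DISTRO_MARKER = "no installed distributions"

# Prints 0 if the WSL interop file exists, and 1 if it does not.
SCRIPT = 'test -e /proc/sys/fs/binfmt_misc/WSLInterop; echo "$?"'


def classify_bash(run=subprocess.run, no_distro_marker=NO_DISTRO_MARKER):
    """Return "absent", "wsl_no_distro", "wsl", or "native".

    Any other problem is raised so that it surfaces rather than being hidden.
    The run parameter exists only so the sketch can be exercised with fakes.
    """
    try:
        proc = run(
            ["bash.exe", "-c", SCRIPT],
            capture_output=True,
            check=True,
            text=True,
        )
    except FileNotFoundError:
        return "absent"  # no bash.exe at all: an expected failure
    except subprocess.CalledProcessError as error:
        output = (error.stderr or "") + (error.stdout or "")
        if no_distro_marker in output:
            return "wsl_no_distro"  # wrapper present, nothing to wrap
        raise  # some other error: let the user see it
    status = proc.stdout.strip()
    if status == "0":
        return "wsl"
    if status == "1":
        return "native"
    raise RuntimeError(f"unexpected output from bash.exe: {proc.stdout!r}")
```

Because the command goes through subprocess.run, the sketch resolves bash.exe the same way Popen would, which is the property that shutil.which could not provide.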


EliahKagan added 30 commits September 25, 2023 10:55

Make "Update PyPA packages" step clearer

bd3307a

Show all the failures

680d795

Don't stop after the first 10.

Pass -v twice to see full skip reasons

eb56e7b

+ Reorder pytest arguments so both workflows are consistent.

Force pytest color output on CI

9c7ff1e

While pytest-sugar output gets mangled with extra newlines on CI, colorized output seems to work fine and improves readability.

No longer skip test_index_mutation on Cygwin

c6a586a

As it seems to be working now on Cygwin (maybe not native Windows).

Add a few FIXMEs re: better use of xfail

203da23

Change skipIf(not ...) to skipUnless(...)

8923236
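As a small illustration of the kind of rewrite this commit title describes, the sketch below uses a made-up test class and condition. Neither is taken from the GitPython test suite.

```python
import os
import unittest


class ExampleTest(unittest.TestCase):
    # Before: @unittest.skipIf(not hasattr(os, "getcwd"), "needs os.getcwd")
    # After: the same condition, stated positively.
    @unittest.skipUnless(hasattr(os, "getcwd"), "needs os.getcwd")
    def test_cwd_is_absolute(self):
        self.assertTrue(os.path.isabs(os.getcwd()))
```

The two decorators behave identically. The `skipUnless` form just states the requirement directly instead of negating it.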

Express known test_depth failure with xfail

b198bf1

Rather than skipping, so it becomes known if the situation changes.

Make 2 more too-low git version skips into errors

f38cc00

In the tests only, and not in any way affecting the feature set or requirements of GitPython itself. This is similar to, and with the same reasoning as, cf5f1dc.
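A minimal sketch of the idea, assuming a hypothetical helper name and version tuples (this is not the helper used in the test suite): instead of skipping when the installed git is too old, the test setup raises, so the run shows an error rather than quietly omitting the test.

```python
def require_git_version(actual, minimum):
    """Raise, rather than skip, when the installed git is too old."""
    if tuple(actual) < tuple(minimum):
        found = ".".join(map(str, actual))
        needed = ".".join(map(str, minimum))
        raise RuntimeError(
            f"git {found} is too old for this test; need at least {needed}"
        )
```

A test would call this first, for example `require_git_version((2, 40, 1), (2, 5, 1))`, and proceed only if no exception is raised.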

Update test_root_module Windows skip reason

8fd56e7

The current cause of failure is different from what is documented in the skip reason.

Change test_root_module Windows skip to xfail

c1798f5

And rewrite the reason to give more useful information. (The new reason also doesn't state the exception type, because that is now specified, and checked by pytest, by being passed as "raises".)
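For illustration, an xfail mark with `raises` might look like the following sketch. The test name, condition, reason text, and exception type are all hypothetical and do not reproduce the actual mark on test_root_module.

```python
import os

import pytest


@pytest.mark.xfail(
    os.name == "nt",
    reason="Hypothetical: path handling differs on native Windows",
    raises=FileNotFoundError,  # any other exception type is a real failure
)
def test_example():
    assert os.path.exists(os.curdir)
```

With `raises` given, pytest treats the test as an expected failure only if it fails with that exception type, so the reason string no longer needs to name the exception.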

Run the tests in test_tree on Windows

1d6abdc

This stops skipping them, as they are now working.

Consolidate test_repo module import statements

ed95e8e

Show more CI system information

ceb4dd3

Use Cygwin's bash and git for more CI steps

3276aac

Try to work in all LF on Cygwin CI

5d40976

+ Style tweak and comment to clarify the "Limit $PATH" step.

Consistent formatting style across all workflows

dda4286

EliahKagan added 2 commits September 26, 2023 00:01

Further reduce differences between test workflows

4860f70

This makes the two CI test workflows more similar in a couple of the remaining ways they differ unnecessarily. This could be extended, and otherwise improved upon, in the future.

EliahKagan marked this pull request as ready for review September 26, 2023 09:16

Byron approved these changes Sep 26, 2023


Byron merged commit 58076c2 into gitpython-developers:main Sep 26, 2023
8 checks passed

EliahKagan deleted the verbose-ci branch September 26, 2023 18:24

This was referenced Nov 3, 2023

Overbroad xfail marks will eventually make CI fail #1728

Closed

is_darwin is always False (os.name is never "darwin") #1731

Closed

