Overbroad xfail marks will eventually make CI fail #1728
I am sorry not to have more feedback, but this part of the tests is way over my head 😅.
Did it become so as a result of this change? If so, then probably a further change should be made to add comments, docstrings, something in
I think it's that many of these tests contain multiple steps and thus take some state to keep track of, which makes them harder to follow. But in this case I think the issue was that the change was so broad that the diff didn't actually help this time around. It would have been better for me to look at it in an editor. Then again, I do trust that you leave the place better than you found it, and overall, the complexity of the code is lower now. So, all good; it's me, I am having one of these days as well, I guess, in a good way, yet I am more fuzzy than usual.
Yes, many tests are long and test many things, and one of my worries about the situation with native Windows tests is that, in some cases, when a test fails, it never tests many of the numerous other things it was written to test. Changing
When I looked over #1729 in the GitHub interface (this was just a brief look before I marked the PR ready for review), I found the default "Unified" view, which is often useful, to be incomprehensible, but the "Split" view usable. However, even for the "Split" view, it may be that my knowledge of the changes--having myself made them, and recently--was a big factor.
The web editor, which can be opened for a PR by pressing . from the PR page, can sometimes be a convenient way to get a richer review experience without leaving the browser and without needing to be on a machine with an editor or other development tools. I'm not sure if that's one of the tools you sometimes use, but if so, then in this specific case, it might've contributed to making your experience worse, because GitHub suffered an outage around the time #1729 was approved (showing 404 pages for repositories, and with some interruption to accessing GitHub-hosted remotes and various other services). Usually, though, I think it works fairly well... though whether or not one likes it is a separate question.
Thanks! Of course, it is quite possible for me to make mistakes. It is also possible for me to introduce intermediate states of improvement without full insight into whether they are worth merging at that time. For example, #1661 was greatly improved due to #1661 (review). The area where I think my pull requests might carry the greatest risk of making GitPython worse is when they embody consequential design decisions that might be wrong (not applicable in #1729, which is fairly narrow in its effects and affects only the test suite).
No problem!
🙀 That's amazing! How could I not know that? Thanks so much, I will definitely use that more. I actually use VSCode for GitPython, mainly because getting some level of code-intelligence is easy and it has a usable vim mode as well.
It's true, the test suite is the only place where changes can even be made, and there I think it's just about hitting a spot where it's idiomatic and modern, maybe along with changes to the way the tests actually work, i.e. making them more focussed where possible and properly isolated. And no matter what, I don't think it can ever be worse :D.
This causes full "failure" output to be printed when a test marked `xfail` unexpectedly passes, and for the test run to be considered failing as a result. The immediate purpose of this change is to facilitate efficient identification of recently introduced wrong or overbroad xfail markings. This behavior may eventually become the pytest default (see gitpython-developers#1728 and references therein), and this could be retained even after the current xpassing tests are investigated, to facilitate timely detection of tests marked `xfail` of code that is newly working. (Individual tests decorated `@pytest.mark.xfail` can still be allowed to unexpectedly pass without it being treated like a test failure, by passing `strict=False` explicitly.)
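A minimal, hypothetical sketch (not GitPython's actual test code) of the opt-out mentioned in that change description: with strict xfail treatment enabled, an unexpected pass fails the run unless `strict=False` is passed for that specific test.

```python
import pytest


# With strict xfail behavior (e.g. xfail_strict = true in the pytest
# configuration), a test marked xfail that unexpectedly passes is reported
# as a failure and makes the run fail.
@pytest.mark.xfail(reason="believed to be broken")
def test_strict_by_default():
    assert 1 + 1 == 2  # passes, so it becomes XPASS -> a failure when strict


# Passing strict=False explicitly exempts just this test: an unexpected
# pass is still reported as XPASS but does not fail the run.
@pytest.mark.xfail(strict=False, reason="allowed to pass without failing the run")
def test_explicitly_non_strict():
    assert 1 + 1 == 2  # passes -> XPASS, not a failure
```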
Some updated information: pytest 8 was released on 27 January 2024 and does not contain the change to the default xfail behavior, but this change is still planned. It has been moved from the 8.0 to the 9.0 milestone. Major version 8 of pytest runs on all of GitPython's CI test jobs except the Python 3.7 jobs. A compatible version is automatically selected, and the release has not caused any tests to fail or otherwise malfunction. Pytest 8 improves reporting, including of XFAIL. However, with
In addition to the web editor, there is also Codespaces, where the editor user interface frontend runs either in a browser or in the VS Code desktop application, but the editor backend and the rest of the development environment--which is a fully functional environment--run in a container on a cloud-hosted virtual machine. (There are other things like this, of which the most prominent seems to be Gitpod.) Codespaces likewise integrate with GitHub, including with pull requests.

The web editor supports some extensions, including the Python extension and GitLens. Only some features work in the web editor, since there is no full backend, but this includes the GitLens commit graph feature. However, the best visual commit graph tool I've ever used--even though it is no longer maintained--is the Git Graph extension. A while ago you had mentioned that you don't have a tool to visualize complex git graphs. If you're using VS Code and haven't tried the Git Graph extension, I recommend it; it produces by far the clearest visualizations for this that I have seen. The Git Graph extension does not support the web editor, and in practice I sometimes use a codespace rather than the web editor just to use Git Graph.

Codespaces can be customized with a different Docker image, VS Code extensions, VS Code configuration (e.g., to automatically find unit tests and allow them to be run in the editor), development tools, startup scripts, and so forth. Really it is the dev container that runs in the codespace that is customized. Dev containers can also be run locally with VS Code and Docker (which should not be confused with connecting to a codespace from the VS Code desktop application, which can also be done).

I have thought about proposing a dev container configuration for GitPython to allow users to try it out, and developers to work on changes, with effectively zero extra steps. To be clear, codespaces and local dev containers can already be used to try out, and to develop changes to, GitPython; I am talking about customizing the environment to make it a bit nicer and also make it so one need not do any setup steps. The reason I have not opened such a PR, though, is that to be useful enough to justify itself, I think it would have to get things to the point where unit tests can be run...which currently entails running

In practice, dev containers are mostly used in GitHub Codespaces. Local dev containers are also useful. They are most useful when one clones (or re-clones) the repository into the dev container, for maximal isolation from the host machine, which the VS Code extension for dev containers can do (and will even offer to do if one opens a local clone of the repository). However, it is also possible to map the local repository directory on the host as storage for the dev container. This is not recommended on Windows and macOS, because it is much slower than cloning into container storage. Even on GNU/Linux, one forgoes some of the benefits of a dev container by sacrificing isolation. However, it can be done, and people sometimes do it. In this situation, running

This can be solved, but it may not be worthwhile to do so, because there would still be the need to switch to a
I agree, in 'maintenance mode' it seems easier to have fewer CI failures, even if they are for the right reasons. With that said, it seems to be a communication problem more than anything else. If pytest made it clear that an xfail mark can be removed once a certain condition holds, then 'strict' mode would be more beneficial.
Thanks for the hint! Maybe what I meant is that none of these tools simplify the graph to what truly matters, where 'truly matters' is certainly the hard part to get right. Visualizing is one part, but doing so in a manner that is possible to follow is another. Maybe such a tool just doesn't exist unless one day I write it, just to test the hypothesis that this visualization could be better.
I see. This container configuration could only be safe in the (maybe not so unusual) local setup if the unit tests were indeed isolated. Achieving this shouldn't even be that hard. I think I mentioned how
Background
#1679 included improvements to a number of tests that are known to fail on some platforms, by marking them `xfail` instead of `skip`, so they are still run and their status is reported, but without a failing status causing the whole test run to fail. However, it applied `xfail` to too many tests, due to limitations on granularity when applying `pytest` marks to `unittest` test cases generated by `@ddt` parameterization.

GitPython/test/test_util.py, lines 221 to 228 in 340da6d

GitPython/test/test_util.py, lines 233 to 245 in 340da6d
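To make the granularity limitation concrete, here is a hypothetical sketch (not the snippets referenced above) of how a `@ddt`-parameterized `unittest` test can only carry an `xfail` mark on the whole method, so every generated case inherits it. The import location of `cygpath` and the sample path pairs are assumptions for illustration.

```python
import unittest

import ddt
import pytest

from git.util import cygpath  # assumed import location


@ddt.ddt
class TestCygpathSketch(unittest.TestCase):
    # The xfail mark can only be applied to the method, so it covers every
    # case that @ddt.data generates, including cases known to pass.
    @pytest.mark.xfail(reason="fails for only some of these inputs")
    @ddt.data(("C:\\Users", "/cygdrive/c/Users"), ("C:\\", "/cygdrive/c/"))
    def test_cygpath(self, case):
        wpath, cpath = case  # illustrative values; direction assumed Windows -> Cygwin
        self.assertEqual(cygpath(wpath), cpath)
```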
Upcoming impact
Although this was known and discussed in #1679, and FIXME comments about it were included in the code, the problem turns out to be somewhat more serious than I had anticipated: if not addressed, it will eventually lead to test failures in a future version of `pytest`. This is because the default behavior of an unexpectedly passing test--one that is marked `xfail` but passes--will most likely change in pytest 8. Because GitPython does not specify upper bounds on most of its development dependencies, and pytest is one of the development dependencies for which no upper bound is specified, pytest 8 will be automatically installed once it is (stably) released.

Specifically, and in the absence of configuration or command-line options to `pytest` that override the behavior:

- A test marked `xfail` that fails, and fails in the expected way, produces an XFAIL status, which is treated similarly to PASS. We always want this.
- A test marked `xfail` that fails in a detectably unexpected way--where a different exception results than the one that was expected--produces a FAIL status. We always want this.
- A test marked `xfail` that passes produces an XPASS status. How this status is treated is more complicated. The `xfail` mark supports an optional `strict` parameter. Where present, it determines whether the XPASS fails the test run like a FAIL status would, or does not fail the test run (thus behaving like PASS or XFAIL). If absent, the `xfail_strict` configuration option provides the default. Currently, as of pytest 7, `xfail_strict` defaults to `False` when not specified.

As noted in pytest-dev/pytest#11467, which was opened by a pytest maintainer and is listed for pytest's 8.0 milestone, the default is planned to be changed from `False` to `True` starting in pytest 8.0. (See also pytest-dev/pytest#11499.)
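A small, hypothetical sketch of the three cases above, under pytest 7 defaults (`xfail_strict` unset and therefore `False`):

```python
import pytest


@pytest.mark.xfail(raises=ZeroDivisionError, reason="known division bug")
def test_fails_as_expected():
    1 / 0  # raises the expected exception -> XFAIL, treated similarly to PASS


@pytest.mark.xfail(raises=ZeroDivisionError, reason="known division bug")
def test_fails_unexpectedly():
    raise ValueError("some other problem")  # different exception -> FAIL


@pytest.mark.xfail(reason="believed broken, but actually works")
def test_passes_unexpectedly():
    assert True  # XPASS; fails the run only if strict=True or xfail_strict is true
```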
Possible fixes

Breakage could be avoided (at least for a while, since `strict=False` may eventually be removed as a feature) by passing `strict=False`, or by setting `xfail_strict=false` for `pytest` in `pyproject.toml`. It is also possible to set an upper bound like `<8` for `pytest` in `test-requirements.txt`.
However, I recommend this instead be fixed by reorganizing the tests in `test_util.py` so that the tests of `cygpath` and `decygpath`--which are the ones that have the insufficiently precise `xfail` markings that mark some generated test cases `xfail` even though they are known to pass--can be pure `pytest` tests. Because they are currently `unittest` tests, they cannot be generated by `@pytest.mark.parametrize` (hence `@ddt` is used). But if they could be generated with the `parametrize` mark then they could have per-case markings, because `parametrize` supports an optional `marks` argument. They could then have the `xfail` mark applied to exactly the cases where failure is really expected.
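For example, a hypothetical sketch (not the actual change in #1729) of what per-case marking could look like with `pytest.param`; the import location of `cygpath`, the sample pairs, and the assumed Windows-to-Cygwin direction are illustrative:

```python
import pytest

from git.util import cygpath  # assumed import location


@pytest.mark.parametrize(
    "wpath, cpath",
    [
        ("C:\\Users", "/cygdrive/c/Users"),  # case known to pass: no mark
        pytest.param(
            "C:\\",
            "/cygdrive/c/",
            marks=pytest.mark.xfail(reason="hypothetical known-failing case"),
        ),
    ],
)
def test_cygpath(wpath, cpath):
    # Only the case wrapped in pytest.param carries the xfail mark.
    assert cygpath(wpath) == cpath
```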
That approach – which I mentioned in #1679 itself and in #1700 (comment), and more recently alluded to in #1725 and #1726 (comment) – has the following advantages over other approaches that effectively just suppress the problem:
- Once the default behavior in `pytest` changes--but also even before that, once it is documented to change--the presence of expected XPASSes will be more misleading than it is already, even if GitPython is not using a version of `pytest` affected by the change. This is because that change will further solidify people's expectations about what XPASS indicates, including for people who are trying to become familiar with GitPython.
- Reorganizing the tests in `test_util.py` can also help clarify the tests of `rmtree` behavior, and help make them easier to modify. This is useful because it will allow building on #1700 toward an eventual complete fix for #790. (In addition, I want to make sure the planned native Windows CI jobs don't have the effect of calcifying cleanup logic in `rmtree` that otherwise could or should change, or at least that this does not happen in ways that impinge on non-Windows platforms. I think such a reorganization will help with that, too.)

I have opened #1729, which fixes this issue by reorganizing tests in `test_util.py` in this way.