Make harness errors block PRs (with override) #10877
Comments
I think we should try to make all harness errors fail the Travis jobs, unless while trying to do that we discover some reason why harness errors really should be allowed. I imagine that @jgraham has thoughts on this. |
What do you mean "harness errors"? Do you mean errors in testharness.js or in wptrunner? Assuming you mean the former, there is an already-discussed ambiguity between single page tests which are likely to produce |
Also |
Even for |
@zcorpan @jugglinmike, in your triage of tests that produce harness errors, have you come across any that you don't believe are test bugs? Failing to find such hard cases is IMHO a reason to prevent them from being introduced as suggested in this issue, but I think we should still have an escape hatch in lint.whitelist for cases when browser bugs are the cause. |
I found #11269 (comment) today. Maybe we shouldn't block landing of TIMEOUT or even ERROR, but making it visible when it happens somehow could be useful; in most cases it's a bug in the test.
That was one of the things that I thought was planned for the PR dashboard before it got abandoned. I certainly think that requiring people to think before landing a PR that introduces |
Indeed, there are patterns of changes that we want to make sure do not happen accidentally, but which are still sometimes correct. For things like that, I think the options are to either post a comment about it, which is possible to overlook, or to have a failing status check that leads to a page describing a problem, along with a "make the status green" checkbox for when it's actually appropriate. A bit complex, I know. cc @lukebjerring |
As a first step towards this, when implementing #7475, we can treat new harness errors as more serious and a different category of problem than new failing tests. @jugglinmike, do you have a somewhat up-to-date link to the report of harness errors, so we can see just how long the path to eliminating harness errors might be? |
Uhg, by "failing" I really mean "producing errors" |
Alright, that isn't a huge number, 591 in total. I know that @zcorpan has already fixed a bunch. @jgraham, if we're able to fix >80% of these as test bugs, and not many remain because of implementation bugs, do you think that would be a strong enough case to turn harness errors into something that by default fails PRs unless the test is whitelisted?
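To make the "fails PRs by default unless the test is whitelisted" proposal concrete, here is a minimal sketch of how such a check could be computed. It assumes a hypothetical result format (a mapping from test path to harness-level status) and a hypothetical allowlist set; neither is the actual wptrunner or lint.whitelist format.

```python
from typing import Dict, Set

# Hypothetical result format: test path -> harness-level status string.
# The status names mirror those used in this thread (OK, ERROR, TIMEOUT).
Results = Dict[str, str]


def new_harness_errors(base: Results, head: Results) -> Set[str]:
    """Return tests whose harness status is ERROR in head but was not in base."""
    return {
        test
        for test, status in head.items()
        if status == "ERROR" and base.get(test) != "ERROR"
    }


def blocking_errors(base: Results, head: Results, allowlist: Set[str]) -> Set[str]:
    """New harness errors that are not covered by the allowlist (the escape hatch)."""
    return new_harness_errors(base, head) - allowlist
```

A PR check built this way would fail only when `blocking_errors` is non-empty, so existing harness errors do not block unrelated changes.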
@jugglinmike, the list in #10877 (comment) is great! I sent PRs to fix the "producing errors in 5 browser(s)" tests, and I imagine many of the rest are similar oversights. Preventing this with tooling would be fabufantastic. |
Previously, this test caused a harness error in Firefox and Safari; now the test itself fails correctly instead. Found via #10877 (comment).
I certainly agree that some mechanism to require manual approval if there are errors in multiple browsers seems reasonable, independent of how hard it is to fix. For tests that only error in a single browser, I guess it depends what the fix looks like; in general I guess these additional checks aren't going to get run on the Chrome/Gecko/other upstream side, so it seems like triggering this failure too often might be problematic for developers primarily contributing upstream. Therefore if the ERROR case mostly describes valid tests that could simply have been written more carefully to not error when the features are missing, glagging them seems like a poor tradeoff (so far I haven't seen complaints about tests we have synced that error, but I did just get a complaint about some tests we synced that were buggy and failed, so we should be realistic about the value of treating ERROR as uniquely bad). All that said, I don't know what the mechanism for allowing tests that produce multiple errors through should be; putting it in the source seems wrong because it's not a property of a specific git revision, but a property of running that revision in specific revisions of a set of browsers. Ideally I would like some kind of checkbox in the PR like [x] Test that ERRORs is not buggy. But enforcing that could be tricky without building a landing bot.
I'm unable to guess what this is a typo of...
Sure, harness errors aren't the worst kind of problem a test can have, but it is one that's pretty easy to create tooling around, which would help catch some smallish mistakes. (See my 3 PRs above.)
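The "errors in multiple browsers" criterion raised above is simple to express in tooling. The following is a rough sketch only: the per-browser status mapping and the threshold of two browsers are assumptions made for illustration, not an agreed policy.

```python
from typing import Dict, List


def browsers_with_error(statuses: Dict[str, str]) -> List[str]:
    """List the browsers (sorted by name) in which the test produced ERROR."""
    return sorted(name for name, status in statuses.items() if status == "ERROR")


def needs_manual_approval(statuses: Dict[str, str], min_browsers: int = 2) -> bool:
    """Require sign-off when a test errors in at least `min_browsers` browsers.

    A test that errors in only one browser may be a valid test hitting a
    missing feature, so this sketch does not flag it.
    """
    return len(browsers_with_error(statuses)) >= min_browsers
```

Keeping the threshold as a parameter leaves room to tighten or relax the rule once it is clearer how often single-browser errors turn out to be test bugs.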
With https://developer.github.com/v3/checks/ we'd be able to have a button that says "this harness error isn't a bug", see the "Ignore this" in https://github.com/foolip/web-platform-tests/runs/9087881 for an example. It would require us to create/host some infrastructure for such status checks, though, or perhaps wait for Taskcluster to get it. |
flagging.
I don't see such a button, are you sure you don't have to be an admin of the repo to use it? That would be equivalent to the situation today, but we certainly want the PR author or reviewers to be able to override the checks. |
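For reference, a check run that offers an override button could be described with a payload along these lines. This is a sketch under assumptions: the field names follow the GitHub Checks API as documented at the link above, while the check name, the wording, and the `ignore_harness_err` identifier are made up for illustration. It only builds the request body and does not talk to GitHub.

```python
from typing import Any, Dict, List


def harness_error_check_run(head_sha: str, tests: List[str]) -> Dict[str, Any]:
    """Build a check-run body that reports harness errors and offers an override."""
    summary = "\n".join(f"- `{test}`" for test in tests)
    return {
        "name": "harness-errors",  # hypothetical check name
        "head_sha": head_sha,
        "status": "completed",
        # "action_required" signals that someone has to look at the run.
        "conclusion": "action_required",
        "output": {
            "title": f"{len(tests)} test(s) produce harness errors",
            "summary": summary,
        },
        # Each action is rendered as a button on the check run. The API
        # limits the length of these fields, so they are kept short here.
        "actions": [
            {
                "label": "Not a test bug",
                "description": "Mark these harness errors as expected",
                "identifier": "ignore_harness_err",  # hypothetical identifier
            }
        ],
    }
```

Whoever hosts the check would still need to handle the event sent when the button is pressed and re-report the run as passing, which is the infrastructure cost mentioned above.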
…est helper, a=testonly. Automatic update from web-platform-tests. Fix a t.assert_unreached typo in a CSP test helper (#12303). Found via web-platform-tests/wpt#10877 (comment). -- wpt-commits: 64a9cbb9aec8db0f84c076ae77045297fe8ea320 wpt-pr: 12303
…t, a=testonly. Automatic update from web-platform-tests. Wrap an assert in step_func in a CSP test (#12305). Previously, this test caused a harness error in Firefox and Safari; now the test itself fails correctly instead. Found via web-platform-tests/wpt#10877 (comment). -- wpt-commits: 590992ecaa850723c3300417912a874d94295444 wpt-pr: 12305
Given that we'll stop running stability checks in Travis in favor of results+stability in Taskcluster, this issue isn't quite accurate. I'll change the title to reflect that it might not involve Travis.
Ping from your friendly neighbourhood ecosystem infra rotation If this is |
So, in theory a harness error will be picked up as a regression by the wpt.fyi status check.
So, currently blocked on launch of https://github.com/web-platform-tests/wpt.fyi/projects/6 ? |
@lukebjerring can this be assigned to you, or are harness errors further down the line than regressions? I guess harness error regressions are already prevented? I actually don't know that we need to do more than that...
This is something I think we might still want to do, leaving as roadmap but probably won't get done in Q2 because @lukebjerring is OOO. |
@lukebjerring is this something we should look at again in Q2? I don't know how big of a problem it is that tests with new harness errors land vs. the effort required to prevent it. I guess it'd just be a failing wpt.fyi check, and a new action for "I really mean it"? |
…est helper, a=testonly. Automatic update from web-platform-tests. Fix a t.assert_unreached typo in a CSP test helper (#12303). Found via web-platform-tests/wpt#10877 (comment). -- wpt-commits: 64a9cbb9aec8db0f84c076ae77045297fe8ea320 wpt-pr: 12303 UltraBlame original commit: 46cd6f41717a3d07bed82676d6a6c1fb17995e74
…t, a=testonly. Automatic update from web-platform-tests. Wrap an assert in step_func in a CSP test (#12305). Previously, this test caused a harness error in Firefox and Safari; now the test itself fails correctly instead. Found via web-platform-tests/wpt#10877 (comment). -- wpt-commits: 590992ecaa850723c3300417912a874d94295444 wpt-pr: 12305 UltraBlame original commit: b6e9010c6387a844841891a8d7e6d7cff7c953d7
@jugglinmike made a list of tests producing harness errors in web-platform-tests/results-collection#478 (comment)
@foolip do we want to make harness errors fail in Travis?