Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make harness errors block PRs (with override) #10877

Open
zcorpan opened this issue May 7, 2018 · 24 comments
Open

Make harness errors block PRs (with override) #10877

zcorpan opened this issue May 7, 2018 · 24 comments

Comments

@zcorpan
Copy link
Member

zcorpan commented May 7, 2018

@jugglinmike made a list of tests producing harness errors in web-platform-tests/results-collection#478 (comment)

@foolip do we want to make harness errors fail in Travis?

@foolip
Copy link
Member

foolip commented May 7, 2018

I think we should try to make all harness errors fail the Travis jobs, unless while trying to do that we discover some reason why harness errors really should be allowed. I imagine that @jgraham has thoughts on this.

@jgraham
Copy link
Contributor

jgraham commented May 9, 2018

What do you mean "harness errors"? Do you mean errors in testharness.js or in wptrunner?

Assuming you mean the former, there is an already-discussed ambiguity between single page tests which are likely to produce ERROR if the feature in question isn't implemented, and tests that unintentionally error due to a bug. One could certainly look for cases where multiple browsers produce ERROR. This was the problem that the pulls dashboard was supposed to solve, but that seems to have stopped working.

@foolip
Copy link
Member

foolip commented May 9, 2018

We're talking about testharness.js harness errors. We can make progress on this even if #10853 remains unresolved, it just means that if #10853 is fixed, we might need to fix a bunch of new harness errors at the same time. I would slightly prefer the other order of fixing though.

@zcorpan
Copy link
Member Author

zcorpan commented May 31, 2018

Also TIMEOUT may be good to prevent.

@jgraham
Copy link
Contributor

jgraham commented May 31, 2018

TIMEOUT is fine in the case where you're waiting for an event that never happens. The alternative is that authors add some shorter timeout, making the test more prone to intermittent failures.

Even forERROR, where a top-level error status might well indicate a bug, I'd be interested to see evidence for or against the theory that an ERROR in a subset of implemenatations is usually indcative of a test bug and so should block landing. For cases where no browser has a non-ERROR, non-TIMEOUT status, the case for blocking landing is more obviously strong, but it does disadvantage tests written against implemenations we can't run in CI (notably Safari/WebKit and Edge)

@foolip
Copy link
Member

foolip commented May 31, 2018

@zcorpan @jugglinmike, in your triage of tests that produce harness errors, have you come across any that you don't believe are test bugs? Failing to find such hard cases is IMHO a reason to prevent them from being introduced as suggested in this issue, but I think we should still have an escape hatch in lint.whitelist for cases when browser bugs are the cause.

@zcorpan
Copy link
Member Author

zcorpan commented May 31, 2018

I found #11269 (comment) today.

Maybe we shouldn't block landing of TIMEOUT or even ERROR, but making it visible when it happens somehow could be useful; in most cases it's bug in the test.

@jgraham
Copy link
Contributor

jgraham commented May 31, 2018

That was one of the things that I thought was planned for the PR dashboard before it got abandoned. I certainly think that requiring people to think before landing a PR that introduces ERROR everywhere is a good plan.

@foolip
Copy link
Member

foolip commented May 31, 2018

Indeed, there are patterns of changes that we want to make sure do not happen accidentally, but which are still sometimes correct. For things like that, I think the options are to either post a comment about it, which is possible to overlook, or to have a failing status check that leads to a page describing a problem, along with a "make the status green" checkbox for when it's actually appropriate. A bit complex, I know. cc @lukebjerring

@foolip
Copy link
Member

foolip commented Aug 3, 2018

As a first step towards this, when implementing #7475, we can treat new harness errors as more serious and a different category of problem than new failing tests.

@jugglinmike, do you have a somewhat up-to-date link to the report of harness errors, so we can see just how long the path to eliminating harness errors might be?

@jugglinmike
Copy link
Contributor

jugglinmike commented Aug 3, 2018

Not off hand, but it's quick to generate one. Here are the failures for WPT revision 4a15e98, dated Thu Aug 2 11:45:23 2018 +0200:

357 tests failing producing errors in 1 browser(s)
137 tests failing producing errors in 2 browser(s)
65 tests failing producing errors in 3 browser(s)
30 tests failing producing errors in 4 browser(s)
2 tests failing producing errors in 5 browser(s)
0 tests failing producing errors in 6 browser(s)
Script used to collect these results
#!/usr/bin/env python

import gzip
import json
import sys

def count_errors(filenames):
    by_title = {}
    revision = None

    for filename in filenames:
        with gzip.open(filename) as handle:
            results = json.load(handle)

        if revision is None:
            revision = results['run_info']['revision']

        if revision != results['run_info']['revision']:
            raise Exception('revision mismatch: %s and %s' % (revision, results['run_info']['revision']))

    for filename in filenames:
        with gzip.open(filename) as handle:
            results = json.load(handle)

        for result in results['results']:
            if result['status'] != 'ERROR':
                continue

            title = result['test']

            if title not in by_title:
                by_title[title] = []

            by_title[title].append(filename)

    by_count = [[] for _ in xrange(len(filenames))]

    for title, filenames in by_title.iteritems():
        by_count[len(filenames) - 1].append(title)

    for titles in by_count:
        titles.sort()

    return revision, by_count

if __name__ == '__main__':
    revision, by_count = count_errors(sys.argv[1:])

    for index, titles in enumerate(by_count):
        browser_count = index + 1
        failure_count = len(titles)
        title_listing = '\n'.join(
            ['- [{path}](https://wpt.fyi/results{path}?sha={revision})'.format(revision=revision, path=title) for title in titles]
        )
        print '''<details>
<summary>
  {failure_count} tests producing errors in {browser_count} browser(s)
</summary>

{title_listing}
</details>'''.format(
            failure_count=failure_count, browser_count=browser_count,
            title_listing=title_listing
        )

@jugglinmike
Copy link
Contributor

Uhg, by "failing" I really mean "producing errors"

@foolip
Copy link
Member

foolip commented Aug 3, 2018

Alright, that isn't a huge number, 591 in total. I know that @zcorpan has already fixed a bunch. @jgraham, if we're able to fix >80% of these as test bugs, do you think that'd be a strong enough case and not many remain because of implementation bugs, would that be a strong enough case to turn harness errors into something that by default fails PRs unless the test is whitelisted?

@foolip
Copy link
Member

foolip commented Aug 3, 2018

@jugglinmike, the list in #10877 (comment) is great! I sent PRs to fix the "producing errors in 5 browser(s)" tests, and I imagine many of the rest are similar oversights. Preventing this with tooling would be fabufantastic.

foolip added a commit that referenced this issue Aug 3, 2018
Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via #10877 (comment).
@jgraham
Copy link
Contributor

jgraham commented Aug 4, 2018

I certainly agree that some mechanism to require manual approval if there are errors in multiple browsers seems reasonable, independent of how hard it is to fix. For tests that only error in a single browser, I guess it depends what the fix looks like; in general I guess these addiitional checks aren't going to get run on the Chrome/Gecko/other upstream side, so it seems like triggering this failure too often might be problematic for developers primarilly contributing upstream. Therefore if the ERROR case mostly describes valid tests that could simply have been written more carefully to not error when the features are missing, glagging them seems like a poor tradeoff (so far I haven't seen complaints about tests we have synced that error, but I did just get a complaint about some tests we synced that were buggy and failed, so we should be realistic about the value of treating ERROR as uniquely bad).

All that said, I don't know what the mechanism for allowing tests that produce multiple errors through should be; putting it in the source seems wrong because it's not a property of a specific git revision, but a property of running that revision in specific revisions of a set of browsers. Ideally I would like some kind of checkbox in the PR like

[x] Test that ERRORs is not buggy.

But enforcing that could be tricky without building a landing bot.

@foolip
Copy link
Member

foolip commented Aug 4, 2018

Therefore if the ERROR case mostly describes valid tests that could simply have been written more carefully to not error when the features are missing, glagging

I'm unable to guess what this is a type of...

them seems like a poor tradeoff (so far I haven't seen complaints about tests we have synced that error, but I did just get a complaint about some tests we synced that were buggy and failed, so we should be realistic about the value of treating ERROR as uniquely bad).

Sure, harness errors aren't the worst kind of problem a test can have, but it is one that's pretty easy to create tooling around, which would help catch some smallish mistakes. (See my 3 PRs above.)

Ideally I would like some kind of checkbox in the PR

With https://developer.github.com/v3/checks/ we'd be able to have a button that says "this harness error isn't a bug", see the "Ignore this" in https://github.com/foolip/web-platform-tests/runs/9087881 for an example.

It would require us to create/host some infrastructure for such status checks, though, or perhaps wait for Taskcluster to get it.

@jgraham
Copy link
Contributor

jgraham commented Aug 6, 2018

I'm unable to guess what this is a type of...

flagging.

With https://developer.github.com/v3/checks/ we'd be able to have a button that says "this harness error isn't a bug", see the "Ignore this" in https://github.com/foolip/web-platform-tests/runs/9087881 for an example.

I don't see such a button, are you sure you don't have to be an admin of the repo to use it? That would be equivalent to the situation today, but we certainly want the PR author or reviewers to be able to override the checks.

@foolip
Copy link
Member

foolip commented Aug 7, 2018

Yeah, I guess you need write access to the repo. Here's a screenshot:
image

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Aug 8, 2018
…est helper, a=testonly

Automatic update from web-platform-testsFix a t.assert_unreached typo in a CSP test helper (#12303)

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 64a9cbb9aec8db0f84c076ae77045297fe8ea320
wpt-pr: 12303
xeonchen pushed a commit to xeonchen/gecko-cinnabar that referenced this issue Aug 8, 2018
…est helper, a=testonly

Automatic update from web-platform-testsFix a t.assert_unreached typo in a CSP test helper (#12303)

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 64a9cbb9aec8db0f84c076ae77045297fe8ea320
wpt-pr: 12303
foolip added a commit that referenced this issue Aug 9, 2018
Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via #10877 (comment).
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Aug 14, 2018
…t, a=testonly

Automatic update from web-platform-testsWrap an assert in step_func in a CSP test (#12305)

Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 590992ecaa850723c3300417912a874d94295444
wpt-pr: 12305
jankeromnes pushed a commit to jankeromnes/gecko that referenced this issue Aug 14, 2018
…t, a=testonly

Automatic update from web-platform-testsWrap an assert in step_func in a CSP test (#12305)

Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 590992ecaa850723c3300417912a874d94295444
wpt-pr: 12305
@foolip
Copy link
Member

foolip commented Oct 9, 2018

Given that we'll stop running stability checks in Travis in favor results+stability in Taskcluster, this issue isn't quite accurate. I'll change the title to reflect that it might not involve Travis.

@foolip foolip changed the title Make harness errors fail in Travis Make new harness errors block PRs (with bypass mechanism) Oct 9, 2018
@foolip foolip changed the title Make new harness errors block PRs (with bypass mechanism) Make harness errors block PRs (with bypass mechanism) Oct 9, 2018
@foolip foolip changed the title Make harness errors block PRs (with bypass mechanism) Make harness errors block PRs (with override) Oct 9, 2018
@mdittmer
Copy link
Contributor

Ping from your friendly neighbourhood ecosystem infra rotation

If this is priority:roadmap should it have an assignee?

@lukebjerring
Copy link
Contributor

So, in theory a harness error will be picked up as a regression by the wpt.fyi status check.
That check is currently neutral when it detects a regression, but we do have a feature for making it fail (and an Ignore action built into the check), which I think satisfies this issues' current title:

Make harness errors block PRs (with override)

So, currently blocked on launch of https://github.com/web-platform-tests/wpt.fyi/projects/6 ?

@foolip
Copy link
Member

foolip commented Dec 12, 2018

@lukebjerring can this be assigned to you, or is harness errors further down the line than regressions? I guess harness error regressions are already prevented? I actually don't know that we need to do more than that...

@lukebjerring lukebjerring self-assigned this Dec 13, 2018
@foolip
Copy link
Member

foolip commented Mar 22, 2019

This is something I think we might still want to do, leaving as roadmap but probably won't get done in Q2 because @lukebjerring is OOO.

@foolip
Copy link
Member

foolip commented Jun 10, 2019

@lukebjerring is this something we should look at again in Q2? I don't know how big of a problem it is that tests with new harness errors land vs. the effort required to prevent it. I guess it'd just be a failing wpt.fyi check, and a new action for "I really mean it"?

gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 3, 2019
…est helper, a=testonly

Automatic update from web-platform-testsFix a t.assert_unreached typo in a CSP test helper (#12303)

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 64a9cbb9aec8db0f84c076ae77045297fe8ea320
wpt-pr: 12303

UltraBlame original commit: 46cd6f41717a3d07bed82676d6a6c1fb17995e74
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 3, 2019
…est helper, a=testonly

Automatic update from web-platform-testsFix a t.assert_unreached typo in a CSP test helper (#12303)

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 64a9cbb9aec8db0f84c076ae77045297fe8ea320
wpt-pr: 12303

UltraBlame original commit: 46cd6f41717a3d07bed82676d6a6c1fb17995e74
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this issue Oct 3, 2019
…t, a=testonly

Automatic update from web-platform-testsWrap an assert in step_func in a CSP test (#12305)

Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 590992ecaa850723c3300417912a874d94295444
wpt-pr: 12305

UltraBlame original commit: b6e9010c6387a844841891a8d7e6d7cff7c953d7
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this issue Oct 3, 2019
…t, a=testonly

Automatic update from web-platform-testsWrap an assert in step_func in a CSP test (#12305)

Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 590992ecaa850723c3300417912a874d94295444
wpt-pr: 12305

UltraBlame original commit: b6e9010c6387a844841891a8d7e6d7cff7c953d7
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 3, 2019
…est helper, a=testonly

Automatic update from web-platform-testsFix a t.assert_unreached typo in a CSP test helper (#12303)

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 64a9cbb9aec8db0f84c076ae77045297fe8ea320
wpt-pr: 12303

UltraBlame original commit: 46cd6f41717a3d07bed82676d6a6c1fb17995e74
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this issue Oct 3, 2019
…t, a=testonly

Automatic update from web-platform-testsWrap an assert in step_func in a CSP test (#12305)

Previously, this test caused a harness error in Firefox and Safari, now
the test itself fails correctly instead.

Found via web-platform-tests/wpt#10877 (comment).
--

wpt-commits: 590992ecaa850723c3300417912a874d94295444
wpt-pr: 12305

UltraBlame original commit: b6e9010c6387a844841891a8d7e6d7cff7c953d7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants