
Flakiness detection #998

Open
tvolkert opened this issue Mar 14, 2019 · 13 comments

@tvolkert
Contributor

It'd be really helpful if the test runner could be configured to automatically rerun individual tests upon failure and, if a test succeeded in a rerun invocation, flag it as flaky. Then the runner would report how many tests passed, how many failed, how many were skipped, and how many flaked. Along with this, we'd be able to configure whether flakes would trigger an overall passing result or failing result (via the exit code).

Upon failure or flakiness, stdout would report the failure of the first invocation, and if a subsequent invocation passed, it'd get reported as [F] for flaky.

For example, configured to report success if flakes were detected:

$ pub run test --rerun-on-failure=5 --success_on_flake
00:04 +20 ~1 !1: test/foo_test.dart: Example test [F]                                                        
  Expected: true
    Actual: <false>
  
  package:test_api           expect
  test/foo_test.dart 10:20  main.<fn>.<fn>.<fn>
  
00:04 +40 ~1 !1: Some tests flaked.                                                                                                                           

$ echo $?
0 <-- If the only failures were identified to be flakes, then report tests passing

... and configured to report non-success if flakes were detected:

$ pub run test --rerun-on-failure=5 --no-success_on_flake
00:04 +20 ~1 !1: test/foo_test.dart: Example test [F]                                                        
  Expected: true
    Actual: <false>
  
  package:test_api           expect
  test/foo_test.dart 10:20  main.<fn>.<fn>.<fn>
  
00:04 +40 ~1 !1: Some tests flaked.                                                                                                                           

$ echo $?
[X] <-- some well-defined exit code that signals failure due only to flaky tests

Related (but not the same): #441

@jakemac53
Contributor

jakemac53 commented Mar 14, 2019

Given that we already do have the retry logic, would this request be satisfied by:

  • Reporting any test that had to be retried as a flake (in the summary at the end)
  • Adding a flag which causes any flakes to result in a non-zero exit code

The primary difference I see here compared to your proposal is that you wouldn't have a command line option to control the number of retries (although you can disable retries entirely with --no-retry). Unfortunately that also means we can't add a --retry=<num-retries> option, since --retry already exists as a boolean flag. However, retry counts are supported through dart_test.yaml, annotations, and named args on tests/groups today.

Note that you can also create different retry configurations using presets, so you can have different config on Travis etc.
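
For reference, the existing hooks mentioned above look roughly like this in test code. This is a minimal sketch of standard package:test usage (the file, library, and test names are made up), not anything proposed in this thread; the same retry count can also be set via the retry: key in dart_test.yaml, including inside presets: blocks selected with --preset on the command line.

// Sketch of the existing retry hooks in package:test (nothing new here):
// a library-level @Retry annotation plus per-test and per-group retry: args.
@Retry(2) // Every test in this file may be retried up to 2 times by default.
library retry_example;

import 'package:test/test.dart';

void main() {
  test('talks to a flaky backend', () async {
    // ... a body that occasionally fails ...
  }, retry: 5); // Overrides the library-level default for this one test.

  group('integration', () {
    test('end to end', () async {
      // ...
    });
  }, retry: 3); // Applies to every test in the group.
}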

@natebosch
Member

What does our current reporting look like when using existing retry behavior?

@jakemac53
Contributor

Currently it looks something like this:

0:01 +1 ~1 -1: flaky test [E]                                                                           
  Expected: <5>
    Actual: <0>
  
  package:test_api            expect
  hello_world_test.dart 34:5  <fn>
  
Retry: flaky test
  Expected: <5>
    Actual: <1>
  
  package:test_api            expect
  hello_world_test.dart 34:5  <fn>
  
Retry: flaky test
  Expected: <5>
    Actual: <2>
  
  package:test_api            expect
  hello_world_test.dart 34:5  <fn>
  
Retry: flaky test
  Expected: <5>
    Actual: <3>
  
  package:test_api            expect
  hello_world_test.dart 34:5  <fn>
  
Retry: flaky test
  Expected: <5>
    Actual: <4>
  
  package:test_api            expect
  hello_world_test.dart 34:5  <fn>
  
Retry: flaky test
00:02 +2 ~1: All tests passed!
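
For what it's worth, output like the above can be produced by a test along these lines. This is a minimal sketch rather than the actual test behind that run: the counter captured in main survives across retries, so the test fails on the first five attempts (Actual: <0> through <4>) and passes on the sixth.

import 'package:test/test.dart';

void main() {
  var attempts = 0;
  // Retries re-invoke the test body in the same isolate, so `attempts`
  // keeps counting up until the expectation finally holds.
  test('flaky test', () {
    expect(attempts++, 5);
  }, retry: 5);
}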

@fzyzcjy

fzyzcjy commented May 1, 2023

+1 after 4 years. For example, Flutter itself has a dedicated article about detecting and reducing flakiness (https://github.com/flutter/flutter/wiki/Reducing-Test-Flakiness), so flaky tests really should be handled separately instead of being marked as passed (or failed).

My https://github.com/fzyzcjy/flutter_convenient_test works well for integration tests, but when running widget tests / unit tests from an IDE (e.g. IntelliJ), I cannot rely on convenient_test, so I hope package:test can gain flakiness support. Thanks!

@jakemac53
Contributor

It doesn't look like we ever got any clarity here on whether the proposed UX would be sufficient -- cc @tvolkert.

Presumably we would also want to have a special flag for this in the JSON protocol, which could be breaking, cc @DanTup. We could make it non-breaking by having it just be an additional flag on successful tests, but it might be cleaner if we just had 3 result states (success, failure, flaky).
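
For context, testDone events in the JSON reporter protocol currently carry a result of "success", "failure", or "error" plus skipped/hidden booleans, so the non-breaking option would amount to one extra boolean on successful tests. The sketch below shows how a consumer might read such a field; the isFlaky name is hypothetical and not part of the protocol today.

// Sketch of a JSON-reporter consumer, assuming a hypothetical `isFlaky`
// field were added to `testDone` events. Consumers that ignore unknown
// fields would keep working; flake-aware tools could count the flakes.
import 'dart:convert';
import 'dart:io';

Future<void> main() async {
  final proc = await Process.start('dart', ['test', '--reporter', 'json']);
  var flaky = 0;
  await proc.stdout
      .transform(utf8.decoder)
      .transform(const LineSplitter())
      .where((line) => line.startsWith('{')) // One JSON event per line.
      .forEach((line) {
    final event = jsonDecode(line) as Map<String, dynamic>;
    if (event['type'] == 'testDone' &&
        event['result'] == 'success' &&
        (event['isFlaky'] as bool? ?? false)) {
      flaky++; // Hypothetical: the test passed, but only after retries.
    }
  });
  stderr.writeln('$flaky flaky test(s) detected');
  exitCode = await proc.exitCode;
}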

@jakemac53 added the needs-info label ("Additional information needed from the issue author") on May 1, 2023
@fzyzcjy

fzyzcjy commented May 1, 2023

Looks quite reasonable; I am willing to put up a PR if the design is clear.

@tvolkert
Contributor Author

tvolkert commented May 1, 2023

@pdblasi-google is the right person on the Flutter side now 🙂

The github-actions bot removed the needs-info label on May 1, 2023
@pdblasi-google

Taking inspiration from NUnit, would it make sense to have four possible states? NUnit has Passed, Failed, Inconclusive and Skipped.

Inconclusive could be what we use for tests that are marked flaky, and the additional Skipped state would be helpful for reporting.

@pdblasi-google

Inconclusive could also be a useful state for situations such as, "We were unable to connect to a device; the test itself didn't explicitly fail, but we are unable to report on it meaningfully".

@DanTup
Contributor

DanTup commented May 2, 2023

> Presumably we would also want to have a special flag for this in the JSON protocol, which could be breaking, cc @DanTup. We could make it non-breaking by having it just be an additional flag on successful tests, but it might be cleaner if we just had 3 result states (success, failure, flaky).

If this behaviour is opt-in by the IDE, then presumably that wouldn't be breaking. If it could be enabled outside of the IDE, then having a third result state could be breaking and we might need to ship updates to handle this first.

There was mention of existing retry functionality above, which I'm not sure I've seen before, and I have no idea what Dart-Code will do for it. I'm happy to update the Dart extension/DAP to handle whatever you think makes most sense for package:test, though.

FWIW, VS Code's API only allows us to report tests as passed, failed, errored, skipped (and queued). So we'd need to map the status to one of those at the end for its test runner anyway (I presume that Passed is the most reasonable, even though it doesn't make the flakes very visible).

@jakemac53
Contributor

OK, it sounds like it would make sense to have this be a separate field and keep the current success/failure status, then.

@jakemac53
Contributor

> FWIW, VS Code's API only allows us to report tests as passed, failed, errored, skipped (and queued). So we'd need to map the status to one of those at the end for its test runner anyway (I presume that Passed is the most reasonable, even though it doesn't make the flakes very visible).

Would you be able to pop up a notification that some of the tests had been flaky?

@DanTup
Contributor

DanTup commented May 3, 2023

We can show toast notifications, but they could seem spammy if we show them too much. There are some cases where we have to spawn multiple debug sessions at once (if you select a bunch of tests across multiple files in the test explorer) so doing it at the end of each session could still trigger multiple.

We can probably find some way to do it (and have a Don't Show Again option); it'll just need a little thought.
