Flakiness detection #998
Given that we already do have the retry logic, would this request be satisfied by:
The primary difference I see here compared to your proposal is that you wouldn't have a command line option to control the number of retries (although you can disable retries with …). Note that you can also create different retry configurations using presets, so you can have a different config on Travis, etc.
What does our current reporting look like when using the existing retry behavior?
Currently it looks something like this:
+1 after 4 years. For example, Flutter itself has a dedicated article about detecting and reducing flakiness (https://github.com/flutter/flutter/wiki/Reducing-Test-Flakiness), so flaky tests really should be handled separately instead of being marked as passed (or failed). My https://github.com/fzyzcjy/flutter_convenient_test works well for integration tests, but when I run widget tests or unit tests from an IDE (e.g. IntelliJ), I cannot rely on convenient_test, so I hope that package:test can add flaky support. Thanks!
It doesn't look like we ever got any clarity here on whether the proposed UX would be sufficient; cc @tvolkert. Presumably we would also want a special flag for this in the JSON protocol, which could be breaking; cc @DanTup. We could make it non-breaking by making it just an additional flag on successful tests, but it might be cleaner to have three result states (success, failure, flaky).
Looks quite reasonable; I am willing to submit a PR once the design is clear.
@pdblasi-google is the right person on the Flutter side now 🙂
Taking inspiration from NUnit, would it make sense to have four possible states? NUnit has
If this behaviour is opt-in by the IDE, then presumably that wouldn't be breaking. If it could be enabled outside of the IDE, then having a third result state could be breaking, and we might need to ship updates to handle it first. There was mention of existing retry functionality above, which I'm not sure I've seen before, and I have no idea what Dart-Code will do with it. I'm happy to update the Dart extension/DAP to handle whatever you think makes most sense for …

FWIW, VS Code's API only allows us to report tests as passed, failed, errored, or skipped (and queued). So we'd need to map the status to one of those at the end for its test runner anyway (I presume Passed is the most reasonable, even though it doesn't make the flakes very visible).
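As a rough sketch of the mapping described above, assuming the runner reports `success`/`failure`/`error` plus a hypothetical `flaky` flag, an extension could fold flaky tests into VS Code's passed state while keeping a note that the test needed a retry. `VsCodeState` and `to_vscode_state` are hypothetical names, not part of Dart-Code or the VS Code API; a real extension would call the corresponding methods on VS Code's test run object instead.

```python
from enum import Enum
from typing import Optional, Tuple


class VsCodeState(Enum):
    # Terminal states VS Code's test API accepts (queued is omitted here).
    PASSED = "passed"
    FAILED = "failed"
    ERRORED = "errored"
    SKIPPED = "skipped"


def to_vscode_state(
    result: str, skipped: bool = False, flaky: bool = False
) -> Tuple[VsCodeState, Optional[str]]:
    """Map a runner result (plus a hypothetical flaky flag) to a VS Code state.

    Returns the state and an optional note the UI could display.
    """
    if skipped:
        return VsCodeState.SKIPPED, None
    if result == "success":
        # A flaky test eventually passed, so report it as passed but keep a
        # note so the flake is not completely invisible.
        note = "passed after retry (flaky)" if flaky else None
        return VsCodeState.PASSED, note
    if result == "failure":
        return VsCodeState.FAILED, None
    if result == "error":
        return VsCodeState.ERRORED, None
    raise ValueError(f"unknown result: {result!r}")
```

Returning the note separately keeps the state mapping exhaustive while leaving the extension free to decide how (or whether) to surface flakes.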
OK, it sounds like it would make sense to have this be a separate field and keep the current success/failure status, then.
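Keeping the existing status and adding a separate field is what keeps this non-breaking for consumers that ignore unknown keys. A minimal sketch of such a consumer, assuming the JSON reporter's `testDone` event (which carries `testID`, `result`, `skipped`, and `hidden`) gains a hypothetical boolean `flaky` field; that field name is an assumption, not something the protocol defines today.

```python
import json
from typing import Any, Dict


def parse_test_done(line: str) -> Dict[str, Any]:
    """Parse one testDone event from the JSON reporter's line-delimited output."""
    event = json.loads(line)
    if event.get("type") != "testDone":
        raise ValueError("not a testDone event")
    return {
        "test_id": event["testID"],
        # Existing field: still "success", "failure", or "error".
        "result": event["result"],
        "skipped": event.get("skipped", False),
        # Hypothetical new field; older runners never emit it, so default to False.
        "flaky": event.get("flaky", False),
    }
```

An older runner's events parse exactly as before, and a flaky test still shows up with `result: "success"`, so tools that only look at `result` keep working.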
Would you be able to pop up a notification that some of the tests had been flaky?
We can show toast notifications, but they could seem spammy if we show them too often. There are some cases where we have to spawn multiple debug sessions at once (if you select a bunch of tests across multiple files in the test explorer), so doing it at the end of each session could still trigger several. We can probably find some way to do it (and offer a Don't Show Again option); it'll just need a little thought.
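One way to avoid a toast per session when several debug sessions run at once is to collect flaky results across sessions and notify only when the last one ends. A minimal sketch; `FlakyNotifier`, its methods, and the `notify` callback are hypothetical stand-ins for whatever the extension actually uses, and `dont_show_again` models the Don't Show Again option.

```python
from typing import Callable, List, Set


class FlakyNotifier:
    """Collects flaky tests across concurrent debug sessions and notifies once."""

    def __init__(self, notify: Callable[[str], None], dont_show_again: bool = False):
        self._notify = notify
        self._dont_show_again = dont_show_again
        self._active_sessions: Set[str] = set()
        self._flaky: List[str] = []

    def session_started(self, session_id: str) -> None:
        self._active_sessions.add(session_id)

    def record_flaky(self, test_name: str) -> None:
        self._flaky.append(test_name)

    def session_ended(self, session_id: str) -> None:
        self._active_sessions.discard(session_id)
        # Wait until every concurrent session has finished, and only speak up
        # if something was actually flaky.
        if self._active_sessions or not self._flaky:
            return
        if not self._dont_show_again:
            self._notify(f"{len(self._flaky)} flaky test(s): {', '.join(self._flaky)}")
        self._flaky.clear()
```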
It'd be really helpful if the test runner could be configured to auto-rerun individual failed tests upon failure, and then if a test succeeded in a rerun invocation, flag it as flaky. Then the runner would report how many tests passed, how many failed, how many were skipped, and how many flaked. Along with this, we'd be able to configure whether flakes would trigger an overall passing result or failing result (via the exit code).
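The rerun-then-classify behaviour described above could look roughly like this. It is a sketch in Python rather than package:test code; `run_suite`, `Summary`, `max_retries`, and `flaky_is_failure` are hypothetical names, and any exception raised by a test is treated as a failed attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Summary:
    passed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    flaky: List[str] = field(default_factory=list)


def run_suite(
    tests: Dict[str, Callable[[], None]],
    max_retries: int,
    skip: Iterable[str] = (),
) -> Summary:
    """Run each test, rerunning failures up to max_retries extra times."""
    skipped_names = set(skip)
    summary = Summary()
    for name, test in tests.items():
        if name in skipped_names:
            summary.skipped.append(name)
            continue
        for attempt in range(max_retries + 1):
            try:
                test()
            except Exception:
                continue  # This attempt failed; try again if attempts remain.
            # Passed on the first try -> passed; passed only on a rerun -> flaky.
            (summary.passed if attempt == 0 else summary.flaky).append(name)
            break
        else:
            # The loop never hit `break`, so every attempt failed.
            summary.failed.append(name)
    return summary


def exit_code(summary: Summary, flaky_is_failure: bool) -> int:
    """Non-zero if any test failed, or if flakes are configured to count as failure."""
    if summary.failed or (flaky_is_failure and summary.flaky):
        return 1
    return 0
```

Keeping "passed only on a rerun" as its own bucket, rather than folding it into passed, is what lets the exit code treat flakes as success or failure independently of the counts.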
Upon failure or flakiness, stdout would report the failure of the first invocation, and if a subsequent invocation passed, it'd get reported as [F] for flaky. For example, configured to report success if flakes were detected:
... and configured to report non-success if flakes were detected:
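The two reporting modes could be sketched as a summary line loosely modeled on the compact reporter's `+passed ~skipped -failed` counters, with a hypothetical `F` counter for flaky tests and the per-test `[F]` marker from the proposal. The exact wording and layout are assumptions, not actual package:test output.

```python
def flaky_marker(test_name: str) -> str:
    """Per-test line for a test that failed first and passed on a rerun."""
    return f"[F] {test_name}"


def summary_line(
    passed: int, skipped: int, failed: int, flaky: int, flaky_is_failure: bool
) -> str:
    """Build a one-line summary; flaky_is_failure decides how flakes are judged."""
    counts = f"+{passed} ~{skipped} -{failed} F{flaky}"
    if failed:
        status = "Some tests failed."
    elif flaky and flaky_is_failure:
        status = "Some tests were flaky."
    else:
        status = "All tests passed!"
    return f"{counts}: {status}"
```

With the same counts, `summary_line(3, 1, 0, 2, False)` and `summary_line(3, 1, 0, 2, True)` differ only in the status text, mirroring the success and non-success configurations above.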
Related (but not the same): #441