
Add a new test resource to mark flaky tests #130474

Open
tomasr8 opened this issue Feb 22, 2025 · 8 comments
Labels
pending (The issue will be closed if no feedback is provided) · tests (Tests in the Lib/test dir) · type-feature (A feature request or enhancement)

Comments

@tomasr8 (Member) commented Feb 22, 2025

Feature or enhancement

Proposal:

This came up in issue #130363 and in this comment: #130363 (comment)

How about a flaky resource? That way we would at least know which tests are meant to be flaky and which are not. And for the CI, we could have it rerun flaky tests on failure so that contributors don't need to wait for a triager/core dev to rerun a failed workflow when needed.
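For illustration, a rough sketch of what guarding a test with such a resource could look like, mirroring how tests opt into existing regrtest resources such as "network" or "cpu". test.support.requires_resource exists today; the "flaky" resource itself is purely hypothetical:

```python
import unittest
from test import support


class RaceyTests(unittest.TestCase):
    # Hypothetical: "flaky" is not a regrtest resource today; this only
    # mirrors how existing resources like "network" or "cpu" are declared.
    @support.requires_resource('flaky')
    def test_heavy_threading(self):
        # A test known to fail occasionally under load; with the proposal,
        # the CI could choose to rerun tests guarded by this resource.
        self.assertTrue(True)
```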

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

@encukou (Member) commented Feb 24, 2025

I don't think we should have any tests that are meant to be flaky.
If it's OK to ignore a test, why not just delete it?

@picnixz (Member) commented Feb 24, 2025

If it's OK to ignore a test, why not just delete it?

In general, the flaky tests are those involving threading, but now I wonder if it's actually better not to have a flaky resource but rather a marker that indicates that a test can be executed multiple times on the CI until it is considered to succeed.

We found some "flaky" references, but in the end they are rather scarce. As for the argument "if it's flaky, we usually skip it": maybe it's better to remove such tests if we can't fix them (as @vstinner suggested for test_selflink).
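To make the "marker" idea above concrete, here is a minimal sketch of a rerun-until-success decorator. Nothing like this exists in test.support today; the name and retry count are purely illustrative:

```python
import functools
import unittest


def flaky(retries=3):
    """Hypothetical marker: retry a failing test a few times before giving up."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            for attempt in range(retries):
                try:
                    return func(self, *args, **kwargs)
                except Exception:
                    # Re-raise only once all attempts have been used up.
                    if attempt == retries - 1:
                        raise
        return wrapper
    return decorator


class ThreadingTests(unittest.TestCase):
    @flaky(retries=3)
    def test_under_load(self):
        # Would only fail occasionally, e.g. when the CI machine is busy.
        self.assertTrue(True)
```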

@picnixz picnixz added the pending The issue will be closed if no feedback is provided label Feb 24, 2025
@encukou (Member) commented Feb 24, 2025

now I wonder if it's actually better not to have a flaky resources but rather a marker that indicates that a test can be executed multiple times until it's considered to succeed on the CI.

Ah, that does sound useful! Basically, make --fail-rerun the default except for the tests marked "flaky"?

@picnixz (Member) commented Feb 24, 2025

Basically, make --fail-rerun the default except for the tests marked "flaky"?

Actually, not entirely. Make the tests run, and if some of them fail but are marked as flaky in the skip list (to borrow Victor's terminology), rerun them more than just once. Sometimes even a single rerun is not sufficient. In general, the CI fails on tests because the system is busy (perhaps because of other tests).

The workflow I had in my head is:

  • Run all tests normally once.
  • Rerun all failed tests normally (that's what we do now, I think?).
  • If some of the failed tests are known to be flaky, rerun them again (say 3-5 times).
  • Report the tests that really failed (namely, the non-flaky tests that failed in step 2 and the flaky tests that failed after lots of repetitions).
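A minimal sketch of that logic, assuming a hypothetical run_once(test_name) -> bool callable that runs a single test file and reports success (none of these names come from regrtest itself, and the flaky list and retry count are only placeholders):

```python
FLAKY_TESTS = {"test_threading", "test_concurrent_futures"}  # hypothetical "skip list"
FLAKY_RERUNS = 4  # roughly the 3-5 extra attempts mentioned above


def collect_real_failures(all_tests, run_once):
    # Step 1: run everything once.
    failed = [t for t in all_tests if not run_once(t)]
    # Step 2: rerun every failure once (what the CI already does).
    failed = [t for t in failed if not run_once(t)]
    really_failed = []
    for test in failed:
        if test in FLAKY_TESTS:
            # Step 3: known-flaky tests get a few extra attempts before giving up.
            if any(run_once(test) for _ in range(FLAKY_RERUNS)):
                continue
        # Step 4: anything still failing is a genuine failure worth reporting.
        really_failed.append(test)
    return really_failed
```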

Context: I'm a bit tired of hitting the "rerun" button every time and/or getting a notification telling me that an unrelated test has failed!

@vstinner (Member) commented:

Context: I'm a bit tired of hitting the "rerun" button everytime and/or having a notification telling me that an unrelated test has failed!

Which tests are failing randomly? If nobody fixes them, the situation will not improve.

@picnixz (Member) commented Feb 24, 2025

In general, the tests that I need to rerun are the test.test_concurrent_futures* or test.test_multiprocessing* ones, usually on the free-threaded build and sometimes on TSAN/ASAN.

For instance: https://github.com/python/cpython/actions/runs/13242348106/job/37023611044?pr=129175, https://github.com/python/cpython/actions/runs/13225352402/job/36915529351?pr=129175, or https://github.com/python/cpython/actions/runs/13205748354/job/36868408015?pr=129785. Agreed that those are all free-threading and/or threading related, but I'm not sure we can actually fix them all at once.

For instance, https://github.com/python/cpython/actions/runs/13205748354/job/36868408015?pr=129785#step:6:516 is a failure I've seen regularly, but maybe it can be fixed; a second issue there is that the environment seems to have changed (which then results in an overall failure, I think?).

Another example is https://github.com/python/cpython/actions/runs/13205748354/job/36868408015?pr=129785#step:6:516, where test.test_concurrent_futures.test_interpreter_pool fails. However, it could be failing because other tests are running simultaneously, or because we didn't wait long enough between two runs.

And another example is https://github.com/python/cpython/actions/runs/12927858394/job/36053953144?pr=127676#step:12:58, which is a socket test, so it could be flaky for whatever reason.


If you don't really like the idea of a skip list, I'll note which tests I rerun when I do, so that we can fix them. I should have done that sooner, but it didn't occur to me that the tests should really be fixed (since they pass most of the time, like 95% or more, I was biased and assumed that they were just flaky and that was it).

@vstinner (Member) commented:

For instance: https://github.com/python/cpython/actions/runs/13242348106/job/37023611044?pr=129175

FAIL: test_4_daemon_threads (test.test_threading.ThreadJoinOnShutdown.test_4_daemon_threads)

This test is known to be flaky: issue gh-110052. It should be skipped.

or https://github.com/python/cpython/actions/runs/13225352402/job/36915529351?pr=129175

It looks like a real bug; you should open an issue:

0:04:15 load avg: 7.63 [20/23/2] test_free_threading worker non-zero exit code (Exit code -6 (SIGABRT)) -- running (2): test_socket (1 min 38 sec), test_threading (38.8 sec)
python: Objects/obmalloc.c:1219: void process_queue(struct llist_node *, struct _qsbr_thread_state *, _Bool, delayed_dealloc_cb, void *): Assertion `buf->rd_idx == buf->wr_idx' failed.
Fatal Python error: Aborted

<Cannot show all threads while the GIL is disabled>
Stack (most recent call first):
  File "/home/runner/work/cpython/cpython/Lib/test/test_free_threading/test_dict.py", line 184 in writer_func
  File "/home/runner/work/cpython/cpython/Lib/threading.py", line 996 in run
  File "/home/runner/work/cpython/cpython/Lib/threading.py", line 1054 in _bootstrap_inner
  File "/home/runner/work/cpython/cpython/Lib/threading.py", line 1016 in _bootstrap

Extension modules: _testinternalcapi, _testcapi (total: 2)

or https://github.com/python/cpython/actions/runs/13205748354/job/36868408015?pr=129785. Agreed that those are all free-threading related and/or threading related but I'm not sure we can actually fix them at once.

test_wrapped_exception (test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_wrapped_exception) ... Warning -- Uncaught thread exception: ValueError
(...)
ValueError: concurrent send_bytes() calls are not supported

It looks like a bug in the test.

@picnixz (Member) commented Feb 24, 2025

For

test_wrapped_exception (test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_wrapped_exception) ... Warning -- Uncaught thread exception: ValueError

I'm actually unsure that that is the real error. The two tests that failed were rerun in https://github.com/python/cpython/actions/runs/13205748354/job/36868408015?pr=129785#step:6:1165 and it appears that they are related to conditions:

0:27:43 load avg: 0.00 Re-running 2 failed tests in verbose mode in subprocesses
0:27:43 load avg: 0.00 Run 2 tests in parallel using 2 worker processes (timeout: 10 min, worker timeout: 15 min)
0:27:49 load avg: 0.02 [1/2] test_threading passed
Re-running test_threading in verbose mode (matching: test_timeout)
test_timeout (test.test_threading.BarrierTests.test_timeout)
Test wait(timeout) ... ok
test_timeout (test.test_threading.CRLockTests.test_timeout) ... ok
test_timeout (test.test_threading.ConditionAsRLockTests.test_timeout) ... ok
test_timeout (test.test_threading.ConditionTests.test_timeout) ... ok
test_timeout (test.test_threading.EventTests.test_timeout) ... ok
test_timeout (test.test_threading.LockTests.test_timeout) ... ok
test_timeout (test.test_threading.PyRLockTests.test_timeout) ... ok

For the error you linked, I think it's a warning emission (https://github.com/python/cpython/actions/runs/13205748354/job/36868408015?pr=129785#step:6:1258):

test_wrapped_exception (test.test_multiprocessing_spawn.test_threads.WithThreadsTestPool.test_wrapped_exception) ... Warning -- Uncaught thread exception: ValueError

I'll try to remember to open an issue for python/cpython/actions/runs/13225352402/job/36915529351?pr=129175
