Test race conditions (including cases only seen under heavy CPU load) #181
Comments
Possibly related to #188 (comment)
Not directly related, but I've seen "reaper not started" messages before, and this may be a source. This is part of my journey to make the tests more reliable.
@flotter A noble journey, fellow Pebbler. I hope you have a good steed! ;-)
checkstate should now be bulletproof.

Next one to tackle:
PASS: manager_test.go:496: S.TestStartBadCommand 0.006s
START: manager_test.go:630: S.TestStartFastExitCommand
2023-07-03T07:18:35.824Z [test] Service "test4" starting: echo -e 'too-fast\nsecond line'
PASS: manager_test.go:630: S.TestStartFastExitCommand 0.006s
START: manager_test.go:268: S.TestStartStopServices
2023-07-03T07:18:35.832Z [test] Service "test1" starting: /bin/sh -c "echo test1 | tee -a /tmp/check-327806957/37/log.txt; sleep 10"
goroutine 300 [running]:
START: request_test.go:44: S.TestStop
START: manager_test.go:183: S.TearDownTest
PASS: request_test.go:44: S.TestStop 0.000s
START: manager_test.go:1101: S.TestStopDuringBackoff
2023-07-03T07:23:48.582Z [test] Service "test2" starting: sleep 0.1
PASS: manager_test.go:1101: S.TestStopDuringBackoff 0.104s
START: manager_test.go:1718: S.TestStopRunning
goroutine 357 [running]:
#253 should now fix this.
Servstate is next. |
#266 will address the common "reaper not started" type panics often seen during testing. In addition to the reaper issues, I have started removing some of the timing-related races, in this case only one specific to the code reading file output from started services. However, plenty of timing-sensitive code remains, so this is not all of it.
Further improvement for servstate: #283 |
@flotter Do you think this is good enough that we should just close this now? I haven't done stress tests recently, but I haven't seen these failures for a long time. |
We haven't seen these flakiness issues for ages (thanks in large part to @flotter's earlier work). Going to close this issue. We can open specific issues if we see more flakiness. |
I was getting what appear to be intermittent failures on the CI test runners at times when GitHub feels slow and under heavy load.
I spent some hours investigating intermittent issues locally with the Pebble tests, and I would like to use this issue to track my progress and get some feedback.
My environment details are:
It is possible this is related to the following, although I have not directly encountered those issues:
In general, some of the service tests rely on output written to disk or to the logger to verify whether a service performed the intended work. The strategy some of the tests follow is to 'ensure' the service is started and then proceed to verify the service output. However, this does not guarantee that any work the service process itself is doing is complete.
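To make the failure mode concrete, here is a minimal sketch of a more robust check (not the project's actual test code; the helper name, the 10-second deadline, and the gocheck wiring are assumptions): instead of trusting the "started" state alone, poll for the output the service is expected to produce.

package servstate_test

import (
	"os"
	"strings"
	"time"

	. "gopkg.in/check.v1"
)

// waitForFileContent polls for output the service is expected to write,
// under a generous deadline, instead of assuming the file is complete
// as soon as the service reports "started".
func waitForFileContent(c *C, path, want string) {
	deadline := time.Now().Add(10 * time.Second)
	for time.Now().Before(deadline) {
		data, err := os.ReadFile(path)
		if err == nil && strings.Contains(string(data), want) {
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
	c.Fatalf("timed out waiting for %q to contain %q", path, want)
}

A deadline-based poll degrades gracefully under CPU load: a slow machine simply waits a bit longer, rather than failing once a fixed iteration count runs out.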
internal/overlord/servstate/manager_test.go:
In some places there is a loop with an arbitrary number of iterations to wait on the service status:
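The pattern looks roughly like this (an illustrative reconstruction, not verbatim code from the repository, reusing the imports from the sketch above; the iteration count, sleep interval, and "active" status string are assumptions):

// A fixed iteration count caps the total wait at about one second of
// wall-clock time (20 * 50ms), regardless of how loaded the machine is,
// so a busy CI runner can exhaust it and fail the test spuriously.
func waitUntilActive(c *C, status func() string) {
	for i := 0; i < 20; i++ {
		if status() == "active" {
			return
		}
		time.Sleep(50 * time.Millisecond)
	}
	c.Fatalf("timed out waiting for service to become active")
}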
Reproducing
I can easily reproduce various failures in internal/overlord/servstate/manager_test.go.
Terminal 1:
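(An illustrative option, not necessarily the original command; any CPU-load generator that keeps all cores busy will do, the stress tool and its flags here are assumptions.)
stress --cpu "$(nproc)"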
Terminal 2:
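(Also illustrative; the flags are assumptions: run the package's tests repeatedly until a failure appears.)
while go test -count=1 ./internal/overlord/servstate/; do :; done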