Commit
checkstate: make check runners exit cleanly
The checks manager uses contexts to cancel check runners. This happens during checkstate.PlanChanged(), when the previous checkers are all terminated and the new checker definition is applied.

However, context cancellation is asynchronous (as highlighted in the context documentation): the cancel call returns without waiting for the check runner goroutines to exit, so a runner can keep executing after the cancellation has returned. Due to this race condition, a runner can still launch a check even after the context cancellation is complete.

This problem is reproduced here: https://gist.github.com/flotter/f6bf32d7cc33cb1d53a3ddc39fc3b64c

The result is easily seen in the following unit test failure:

    panic: internal error: reaper must be started

    goroutine 75 [running]:
    github.com/canonical/pebble/internals/reaper.StartCommand
        /home/flotter/pebble/internals/reaper/reaper.go:164
    github.com/canonical/pebble/internals/overlord/checkstate.(*execChecker).check()
        /home/flotter/pebble/internals/overlord/checkstate/checkers.go:170
    github.com/canonical/pebble/internals/overlord/checkstate.(*checkData).runCheck()
        /home/flotter/pebble/internals/overlord/checkstate/manager.go:199
    github.com/canonical/pebble/internals/overlord/checkstate.(*checkData).loop()
        /home/flotter/pebble/internals/overlord/checkstate/manager.go:182
    created by github.com/canonical/pebble/internals/overlord/checkstate.(*CheckManager).PlanChanged
        /home/flotter/pebble/internals/overlord/checkstate/manager.go:74

In this particular case, the problem is that test teardown stops the reaper process, which is responsible for reaping check commands, while the final (edge case) run of the check is still completing.

By the same logic, checks being terminated could also overlap with newly started checks in checkstate.PlanChanged().
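The asynchronous behaviour described above can be illustrated with a small standalone sketch. This is not Pebble code: the function name and channels are hypothetical, and the "check" is simulated by blocking on a channel so that the outcome is deterministic.

```go
package main

import (
	"context"
	"fmt"
)

// cancelReturnsEarly reports whether a runner goroutine was still executing
// after cancel() had already returned. All names here are hypothetical.
func cancelReturnsEarly() bool {
	ctx, cancel := context.WithCancel(context.Background())

	started := make(chan struct{}) // closed once the runner begins a "check"
	release := make(chan struct{}) // closed to let the in-flight check finish
	exited := make(chan struct{})  // closed when the runner goroutine returns

	go func() {
		defer close(exited)
		close(started)
		<-release    // simulates a check that is still in progress
		<-ctx.Done() // the runner only notices cancellation afterwards
	}()

	<-started
	cancel() // returns immediately; it does not wait for the goroutine

	// Observe the runner's state right after cancel() has returned.
	stillRunning := false
	select {
	case <-exited:
	default:
		stillRunning = true
	}

	// Let the runner finish so the goroutine is not leaked.
	close(release)
	<-exited
	return stillRunning
}

func main() {
	fmt.Println("runner still active after cancel returned:", cancelReturnsEarly())
}
```

In this sketch the runner is always still active when cancel() returns, which is the window in which a real runner could start one more check.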
The following changes fix the problem:

- Use a WaitGroup to represent the group of active checks in the check manager.
- Wait() on the group to complete before creating the next group of checks when PlanChanged is called.
- Each check runner uses a deferred exit to decrement the group counter.