Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checkstate: make tests more robust #249

Merged
merged 3 commits into from
Jun 28, 2023
Merged

Commits on Jun 27, 2023

  1. checkstate: make check runners exit cleanly

    The checks manager uses contexts to cancel check runners. This happens
    during checkstate.PlanChanged() when the previous checkers are all
    terminated, and the new checker definition is applied.
    
    However, context cancellation is asynchronous (as highlighted in the
    context documentation), so the check runners will continue to complete
    even after the context cancellations returned.
    
    Due to this race condition, the runner can still launch a check even after
    the context cancellation is complete.
    
    This problem is reproduced here:
    https://gist.github.com/flotter/f6bf32d7cc33cb1d53a3ddc39fc3b64c
    
    The result of this is easily seen the following unit test failure:
    
    panic: internal error: reaper must be started
    
    goroutine 75 [running]:
    github.com/canonical/pebble/internals/reaper.StartCommand
    	/home/flotter/pebble/internals/reaper/reaper.go:164
    github.com/canonical/pebble/internals/overlord/checkstate.(*execChecker).check()
    	/home/flotter/pebble/internals/overlord/checkstate/checkers.go:170
    github.com/canonical/pebble/internals/overlord/checkstate.(*checkData).runCheck()
    	/home/flotter/pebble/internals/overlord/checkstate/manager.go:199
    github.com/canonical/pebble/internals/overlord/checkstate.(*checkData).loop()
    	/home/flotter/pebble/internals/overlord/checkstate/manager.go:182
    created by github.com/canonical/pebble/internals/overlord/checkstate.(*CheckManager).PlanChanged
    	/home/flotter/pebble/internals/overlord/checkstate/manager.go:74
    
    In this particular case, the problem is that test teardown stops the reaper
    process which is responsible for reaping check commands, while the final
    (edge case) run of the check is still completing.
    
    Following the same logic, this could also cause check overlap between terminating
    previous checks and starting new checks in checkstate.PlanChanged().
    
    The following changes fix the problem:
    
    - Use a WaitGroup to represent the group of active checks in the check manager
    
    - Wait() on the group to complete before creating the next group of checks
      when PlanChanged is called.
    
    - Each check runner will use a deferred exit to decrement the group counter.
    flotter committed Jun 27, 2023
    Configuration menu
    Copy the full SHA
    9100887 View commit details
    Browse the repository at this point in the history
  2. checkstate: make tests more robust

    The following changes are made:
    
    - Remove a check which depends on check output in TestTimeout as
      there is no way to remove the race between the 'echo FOO'
      completing before the timeout is triggered.
    
      This failure can easily be reproduced by running the test under
      heavy cpu load (i.e. stress --cpu 10 --io 4)
    
      manager_test.go:140:
          c.Assert(check.ErrorDetails, Equals, "FOO")
      ... obtained string = ""
      ... expected string = "FOO"
    
    - Make waitCheck() more conservative when waiting for a check
      runner iteration to complete. The worse case timeout can
      safely be much longer as this does not have a general impact
      on test duration.
    
      This failure can easily be reproduced by running the test under
      heavy cpu load (i.e. stress --cpu 10 --io 4)
    
      manager_test.go:134:
          check := waitCheck(c, mgr, "chk1", func(check *CheckInfo) bool {
              return check.Status != CheckStatusUp
          })
      manager_test.go:283:
          c.Fatalf("timed out waiting for check %q", name)
      ... Error: timed out waiting for check "chk1"
    flotter committed Jun 27, 2023
    Configuration menu
    Copy the full SHA
    d36ce7f View commit details
    Browse the repository at this point in the history

Commits on Jun 28, 2023

  1. Code review changes canonical#1

    flotter committed Jun 28, 2023
    Configuration menu
    Copy the full SHA
    c853715 View commit details
    Browse the repository at this point in the history