checkstate: make tests more robust #249

Merged 3 commits on Jun 28, 2023

12 changes: 11 additions & 1 deletion internals/overlord/checkstate/manager.go
@@ -27,6 +27,7 @@ import (
// CheckManager starts and manages the health checks.
type CheckManager struct {
mutex sync.Mutex
+wg sync.WaitGroup
checks map[string]*checkData
failureHandlers []FailureFunc
}
@@ -58,6 +59,12 @@ func (m *CheckManager) PlanChanged(p *plan.Plan) {
for _, check := range m.checks {
check.cancel()
}
+// Wait for all context cancellations to propagate and allow
+// each goroutine to cleanly exit.
+m.wg.Wait()
+
+// Set the size of the next wait group
+m.wg.Add(len(p.Checks))

// Then configure and start new checks.
checks := make(map[string]*checkData, len(p.Checks))
@@ -71,7 +78,10 @@ func (m *CheckManager) PlanChanged(p *plan.Plan) {
action: m.callFailureHandlers,
}
checks[name] = check
-go check.loop()
+go func() {
+defer m.wg.Done()
+check.loop()
+}()
}
m.checks = checks
}
27 changes: 19 additions & 8 deletions internals/overlord/checkstate/manager_test.go
@@ -45,12 +45,14 @@ func (s *ManagerSuite) SetUpSuite(c *C) {
setLoggerOnce.Do(func() {
logger.SetLogger(logger.New(os.Stderr, "[test] "))
})
+}
Contributor Author

Moved reaper.Start() and reaper.Stop() to the test level. Command cleanup should happen before the next test starts. This also makes the test environment more robust against race conditions, because each test must now exit cleanly before the reaper is stopped.

Note that the reaper is bound to the process ID of the package-level test binary, so moving it into the test setup and teardown makes it incompatible with parallel testing. However, parallel testing is strictly opt-in, so this is still a valid requirement for this package's tests.


+func (s *ManagerSuite) SetUpTest(c *C) {
err := reaper.Start()
c.Assert(err, IsNil)
}

-func (s *ManagerSuite) TearDownSuite(c *C) {
+func (s *ManagerSuite) TearDownTest(c *C) {
err := reaper.Stop()
c.Assert(err, IsNil)
}
@@ -137,7 +139,6 @@ func (s *ManagerSuite) TestTimeout(c *C) {
c.Assert(check.Failures, Equals, 1)
c.Assert(check.Threshold, Equals, 1)
c.Assert(check.LastError, Equals, "exec check timed out")
-c.Assert(check.ErrorDetails, Equals, "FOO")
Contributor Author

As explained in the commit message, this check is too racy. There is no guarantee that the command output is captured before the timeout cuts off the command execution. Given that this test focuses on the timeout mechanism, and not on command output logging, I feel it is justified to remove this assertion to improve test robustness.

}

func (s *ManagerSuite) TestCheckCanceled(c *C) {
@@ -161,17 +162,15 @@ func (s *ManagerSuite) TestCheckCanceled(c *C) {
},
})

-// Wait for command to start (output file grows in size)
-prevSize := 0
+// Wait for command to start (output file is not zero in size)
Contributor Author

@flotter, Jun 27, 2023

The previous logic did not entirely make sense, so I simplified it a bit.

c.MkDir() is test-managed and removed after the test. The temp file therefore always starts empty, so we can simply check for a non-zero size. In the original code, prevSize = len(b) appeared redundant: the preceding check would break out of the loop for any non-zero size before the assignment could run.

for i := 0; ; i++ {
if i >= 100 {
c.Fatalf("failed waiting for command to start")
}
b, _ := ioutil.ReadFile(tempFile)
-if len(b) != prevSize {
+if len(b) > 0 {
break
}
-prevSize = len(b)
time.Sleep(time.Millisecond)
}

@@ -185,7 +184,6 @@ func (s *ManagerSuite) TestCheckCanceled(c *C) {
stopChecks(c, mgr)

// Ensure command was terminated (output file didn't grow in size)
-time.Sleep(50 * time.Millisecond)
Contributor Author

stopChecks() calls PlanChanged(), which now terminates the previous checks synchronously, so this sleep is no longer needed.

b1, err := ioutil.ReadFile(tempFile)
c.Assert(err, IsNil)
time.Sleep(20 * time.Millisecond)
@@ -269,8 +267,20 @@ func (s *ManagerSuite) TestFailures(c *C) {
c.Assert(failureName, Equals, "")
}

+// waitCheck is a time-based approach to wait for a checker run to complete.
+// The timeout value does not impact the general time it takes for tests to
+// complete, but determines a worst-case waiting period before giving up.
+// The timeout value must take into account single-core or very busy machines,
+// so it makes sense to pick a conservative number here, as failing a test
+// due to a busy test resource is more expensive than waiting a few more
+// seconds.
func waitCheck(c *C, mgr *CheckManager, name string, f func(check *CheckInfo) bool) *CheckInfo {
-for i := 0; i < 100; i++ {
+// Worst case waiting time for checker run(s) to complete. This
+// period should be much longer (10x is good) than the longest
+// check timeout value.
+timeout := time.Second * 10
+
+for start := time.Now(); time.Since(start) < timeout; {
checks, err := mgr.Checks()
c.Assert(err, IsNil)
for _, check := range checks {
@@ -280,6 +290,7 @@ func waitCheck(c *C, mgr *CheckManager, name string, f func(check *CheckInfo) bo
}
time.Sleep(time.Millisecond)
}

c.Fatalf("timed out waiting for check %q", name)
return nil
}