feat: restart services failed within okay delay #520

Draft: wants to merge 1 commit into base: master

IronCore864 (Contributor) commented Nov 15, 2024

Restart services that fail within the okay delay period.

Fixes #240.

See previous discussion here, approved spec here, previous PoC here, here, and here.

IronCore864 (Contributor, Author) commented:

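The test case TestStartFastExitCommand in manager_test.go appears to have a data race.

After some testing, the likely cause is that the test fixture writes killDelayDefault while startInternal reads it. In the trace below, the write comes from the restore function returned by FakeKillFailDelay, which runs during TearDownTest, and the read comes from startInternal, called from the backoff timer goroutine (doBackoff). I need some help here: is this the root cause, and if so, how should it be fixed?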

$ go test -race -c ./internals/overlord/servstate/
$ ./servstate.test -check.v -check.f ^S\.TestStartFastExitCommand$
2024-11-15T05:47:00.717Z [test] Service "test4" starting: echo -e 'too-fast\nsecond line'
2024-11-15T05:47:00.718Z [test] Service "test4" on-success action is "restart", waiting ~500ms before restart (backoff 1)
2024-11-15T05:47:00.718Z [test] Change 1 task (Start service "test4") failed: service start attempt: exited quickly with code 0, will retry
PASS: internals/overlord/servstate/manager_test.go:634: S.TestStartFastExitCommand	0.006s
OK: 1 passed
PASS
==================
WARNING: DATA RACE
Read at 0x000000a405f8 by goroutine 22:
  github.com/canonical/pebble/internals/overlord/servstate.(*serviceData).startInternal()
      /home/ubuntu/work/pebble3/internals/overlord/servstate/handlers.go:433 +0xe90
  github.com/canonical/pebble/internals/overlord/servstate.(*serviceData).backoffTimeElapsed()
      /home/ubuntu/work/pebble3/internals/overlord/servstate/handlers.go:725 +0xf8
  github.com/canonical/pebble/internals/overlord/servstate.(*serviceData).doBackoff.func1()
      /home/ubuntu/work/pebble3/internals/overlord/servstate/handlers.go:604 +0x2c

Previous write at 0x000000a405f8 by goroutine 21:
  github.com/canonical/pebble/internals/overlord/servstate_test.(*S).SetUpTest.FakeKillFailDelay.func3()
      /home/ubuntu/work/pebble3/internals/overlord/servstate/export_test.go:81 +0x38
  github.com/canonical/pebble/internals/testutil.(*BaseTest).TearDownTest()
      /home/ubuntu/work/pebble3/internals/testutil/base.go:38 +0xc4
  github.com/canonical/pebble/internals/overlord/servstate_test.(*S).TearDownTest()
      /home/ubuntu/work/pebble3/internals/overlord/servstate/manager_test.go:180 +0x84
  runtime.call16()
      /usr/local/go/src/runtime/asm_arm64.s:503 +0x74
  reflect.Value.Call()
      /usr/local/go/src/reflect/value.go:380 +0x90
  gopkg.in/check%2ev1.(*suiteRunner).runFixture.func1()
      /home/ubuntu/go/pkg/mod/gopkg.in/[email protected]/check.go:724 +0x100
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/ubuntu/go/pkg/mod/gopkg.in/[email protected]/check.go:669 +0xcc

Successfully merging this pull request may close these issues.

Pebble does not try to restart service if exited too quickly for the first time