fix: start process subreaper at top level to avoid shutdown hangs #380
For example, if an exec command is running when you Ctrl-C the Pebble daemon, it hangs.

It's hanging because the `ServiceManager` is stopping the reaper early in the process, before `TaskRunner` has had a chance to abort the exec tasks (aborting an exec task sends SIGKILL to its pid via the tomb and command context). Then when `TaskRunner.Stop` is called, it calls `tomb.Kill` and then `tomb.Wait` on each task (exec) tomb, and because the reaper's not running, the `tomb.Wait` hangs.

So the fix is to move the `reaper.Start` and `reaper.Stop` to the top level (inside the "run" command, which is also used for "enter"), instead of putting them in `servstate.NewManager` and `ServiceManager.Stop`.

After doing this, some of the tests also had to be modified to start and stop the reaper.

Fixes canonical#163
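The ordering described above can be sketched as follows. This is a minimal illustration, not the Pebble code: `startReaper`, `stopReaper`, `stopServiceManager`, and `stopTaskRunner` are hypothetical stand-ins that only record when they are called, and error handling is left out to keep the focus on the sequence.

```go
package main

import "fmt"

// events records the call order so the shutdown sequence can be inspected.
var events []string

// startReaper and stopReaper are hypothetical stand-ins for reaper.Start and
// reaper.Stop; here they only record that they were called.
func startReaper() { events = append(events, "reaper start") }
func stopReaper()  { events = append(events, "reaper stop") }

// stopServiceManager stands in for ServiceManager.Stop, which in this sketch
// no longer touches the reaper.
func stopServiceManager() { events = append(events, "service manager stop") }

// stopTaskRunner stands in for TaskRunner.Stop, which kills and waits on each
// exec task and therefore needs the reaper to still be running.
func stopTaskRunner() { events = append(events, "task runner stop") }

// run models the top-level "run" command: the reaper brackets everything else.
func run() {
	startReaper()
	defer stopReaper() // runs last, after every manager has stopped

	// ... the daemon would serve requests here until interrupted ...
	stopServiceManager()
	stopTaskRunner()
}

func main() {
	run()
	for _, e := range events {
		fmt.Println(e)
	}
}
```

Because the reaper is stopped by a `defer` registered at the top of `run`, it stays alive for the whole shutdown sequence, whichever order the managers stop in.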
@benhoyt I've created an issue for us to also start the reaper at the top level, since we have a different entrypoint. With the reaper init removed at the servstate level, we will have to do this before we move our Pebble reference forward.
Looks good to me!
PR #380 recently changed the way the reaper is stopped. If `runDaemon()` returns an error, the deferred function overwrites the `err` return value via `err = reaper.Stop()`. If `err` was already set, this replaces the non-`nil` `err` with either `nil` (if `reaper.Stop()` returns `nil`) or a different error (if `reaper.Stop()` returns one). Because this deferred function is the first one registered in the function, it is called last, so it is the one that sees `err` set on entry when the function returns a non-`nil` error.

Co-authored-by: Ben Hoyt
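The overwrite described in this commit message can be reproduced in a small standalone sketch. Everything here is hypothetical: `runDaemon` and `stopReaper` are stubs standing in for the real functions, and `runPreserving` shows just one possible way to keep the earlier error (the message above does not say how the actual fix was written).

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errDaemon = errors.New("daemon failed")
	errStop   = errors.New("reaper stop failed")
)

// runDaemon is a hypothetical stub for the real runDaemon().
func runDaemon(fail bool) error {
	if fail {
		return errDaemon
	}
	return nil
}

// stopReaper is a hypothetical stub for reaper.Stop().
func stopReaper(fail bool) error {
	if fail {
		return errStop
	}
	return nil
}

// runOverwriting shows the problem: the deferred assignment always replaces
// the named return value, even when it already holds an error.
func runOverwriting(daemonFails, stopFails bool) (err error) {
	defer func() {
		err = stopReaper(stopFails)
	}()
	return runDaemon(daemonFails)
}

// runPreserving only reports the stop error when nothing failed earlier.
func runPreserving(daemonFails, stopFails bool) (err error) {
	defer func() {
		stopErr := stopReaper(stopFails)
		if err == nil {
			err = stopErr
		}
	}()
	return runDaemon(daemonFails)
}

func main() {
	fmt.Println(runOverwriting(true, false)) // <nil>: the daemon error is lost
	fmt.Println(runPreserving(true, false))  // daemon failed
}
```

The key point is that `return runDaemon(...)` assigns to the named result `err` before deferred functions run, so a deferred function can both read and replace it.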
Fixes #163 and fixes #284