Skip to content

Commit

Permalink
daemon: remove listener closing race
Browse files Browse the repository at this point in the history
We use http.Server with a number of custom net.Listener instances.

This patch makes some changes to the daemon.Stop(...) function.

In order for the server instance to Shutdown() gracefully, the listener
termination must be performed by the shutdown process itself. The
server shutdown process is thread-safe, and tracks the listeners in
use. Thread safety is important because the http.Server has two places
where listeners can be terminated:

1. The thread hosting the Serve() method, using a deferred mutex
   protected closing function for all registered listeners.

2. The Shutdown() method (with mutex protection), which will close
   listeners async resuling in an error from the listener Accept(),
   but this special case is handled in Serve() and returns the
   ErrServerClosed error which means shutdown was done cleanly.

However, in daemon.go we also pre-close all the listeners before the
call to Shutdown(), which causes two kinds of errors:

1. accept tcp [::]:<port>: use of closed network connection

This error is actually masked because of tomb.Kill() (when state is dying)
for the termination of the daemon. See the daemon.Start() exit code.

2. close tcp [::]:<port>: use of closed network connection

This is the actual race condition symptom that can sometimes be seen in
CI logs. Since the listeners are pre-closed, this error is expected. However,
errors discovered during deferred closure in the Serve() thread are discarded
while errors in Shutdown() as reported.

The race is between the two closing paths. If the Shutdown path wins the
sync.Once race, we see the error. If the deferred Serve() path wins, we see
nothing.

This race is only triggered because we manually close the listener first.

A standalone simulation of the problem can be seen here:

https://gist.github.com/flotter/483d233d38f0961a52f7ef7405887012

The following changes are introduced:

- Remove the early manual listener closing code so its properly done by
  the Shutdown() call.
  • Loading branch information
flotter committed Jul 7, 2023
1 parent 8e96c74 commit 60be0c1
Showing 1 changed file with 0 additions and 9 deletions.
9 changes: 0 additions & 9 deletions internals/daemon/daemon.go
Original file line number Diff line number Diff line change
Expand Up @@ -568,17 +568,8 @@ func (d *Daemon) Stop(sigCh chan<- os.Signal) error {
restartSocket := d.restartSocket
d.mu.Unlock()

d.generalListener.Close()
d.standbyOpinions.Stop()

if d.untrustedListener != nil {
d.untrustedListener.Close()
}

if d.httpListener != nil {
d.httpListener.Close()
}

if restartSystem {
// give time to polling clients to notice restart
time.Sleep(rebootNoticeWait)
Expand Down

0 comments on commit 60be0c1

Please sign in to comment.