Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
daemon: remove listener closing race
We use http.Server with a number of custom net.Listener instances. This patch makes some changes to the daemon.Stop(...) function. In order for the server instance to Shutdown() gracefully, the listener termination must be performed by the shutdown process itself. The server shutdown process is thread-safe, and tracks the listeners in use. Thread safety is important because the http.Server has two places where listeners can be terminated: 1. The thread hosting the Serve() method, using a deferred mutex protected closing function for all registered listeners. 2. The Shutdown() method (with mutex protection), which will close listeners async resuling in an error from the listener Accept(), but this special case is handled in Serve() and returns the ErrServerClosed error which means shutdown was done cleanly. However, in daemon.go we also pre-close all the listeners before the call to Shutdown(), which causes two kinds of errors: 1. accept tcp [::]:<port>: use of closed network connection This error is actually masked because of tomb.Kill() (when state is dying) for the termination of the daemon. See the daemon.Start() exit code. 2. close tcp [::]:<port>: use of closed network connection This is the actual race condition symptom that can sometimes be seen in CI logs. Since the listeners are pre-closed, this error is expected. However, errors discovered during deferred closure in the Serve() thread are discarded while errors in Shutdown() as reported. The race is between the two closing paths. If the Shutdown path wins the sync.Once race, we see the error. If the deferred Serve() path wins, we see nothing. This race is only triggered because we manually close the listener first. A standalone simulation of the problem can be seen here: https://gist.github.com/flotter/483d233d38f0961a52f7ef7405887012 The following changes are introduced: - Remove the early manual listener closing code so its properly done by the Shutdown() call.
- Loading branch information