Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pub/Sub connections can stop functioning without error #295

Closed
stevenh opened this issue Nov 21, 2017 · 8 comments
Closed

Pub/Sub connections can stop functioning without error #295

stevenh opened this issue Nov 21, 2017 · 8 comments

Comments

@stevenh
Copy link
Collaborator

stevenh commented Nov 21, 2017

If a network event occurs which partitions the server from the client then the client can end up with a connection which is half-closed, in this state the connection will remain broken unless a write is performed.

This isn't an issue for normal connections which issue commands as this involves a write, however for Pub/Sub connections this is a serious problem as they will currently never recover as there is no way for such connections to detect the fact that the TCP session is in a broken state.

We had this happen when we experienced VMWare switch migration, all Pub/Sub connections where shutdown by the server but our golang daemons never noticed, they just remained stuck in Receive().

@garyburd
Copy link
Member

Use the PING command to detect broken connections.

@stevenh
Copy link
Collaborator Author

stevenh commented Nov 21, 2017

While this is possible it would add quite a bit of complexity to any use Pub/Sub, as it would require an new go routine and all the synchronisation which goes along with that.

If Pub/Sub Recieve() was channel based then it would be easier as we could use a select to multiplex the recieve with a ticker that could trigger a Ping() something like.

t := time.NewTicker(time.Minute)
defer t.Stop()
for {
     select {
     case v := <-psc.Receive():
            // Handle message, pong, error etc
     case <- t.C:
           if err := psc.Ping(“”); err != nil {
                 // Handle recovery
           }
     }
}

Either way having to explicitly add Ping() support adds extra complexity to every use of Pub/Sub. With this in mind I think the cleanest solution is to to enable keep-alive on the connections.

This approach also means all users of redigo will benefit as soon as they update, with no need to add any additional code. This is also the method used the redis-server as of v3.2, so I would like to here your thoughts on our PR which adds connection keep-alive

@garyburd
Copy link
Member

I will accept the PR, but PING is the more robust solution to the problem. PING woks through proxies and detects stuck servers.

Regarding channel based API: It's a few lines of application code to start a goroutine that reads from a connection and sends to a channel.

@stevenh
Copy link
Collaborator Author

stevenh commented Nov 21, 2017

Thanks @garyburd

I know that keep-alive can have issues with proxies, although not sure how many users that would actually impact, so do you think it would be worth building it in to Recieve or providing a simple way to facilitate managing a “pinger”?

Either way I think it would be good to improve the docs around Pub/Sub to mention this, as its far from obvious so I wouldn’t be surprised if a lot of users are unaware of this potential problem.

@garyburd
Copy link
Member

The pinger is a four or five extra lines of code for the application. Building the pinger into Receive adds an unnecessary goroutine.

I'll add a pinger to the example.

@garyburd
Copy link
Member

I agree that many people do not know how to check for liveness of the connection and server. Another problem is that it's not well understood how to cancel receive on a connection in pubsub mode. I added #296 to address these issues.

@stevenh
Copy link
Collaborator Author

stevenh commented Nov 22, 2017

Sounds good @garyburd 👍

@garyburd
Copy link
Member

I updated the pubsub example to include health checks with PING. https://godoc.org/github.com/garyburd/redigo/redis#example-PubSubConn

@gomodule gomodule locked and limited conversation to collaborators Aug 29, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants