This repository has been archived by the owner on Jan 9, 2024. It is now read-only.

fix: possible connection leak in pipeline when max_connections is reached #420

Open · wants to merge 1 commit into base: master

Conversation

@Neon4o4 Neon4o4 commented Dec 17, 2020

When max_connections is reached, connection_pool.get_connection_by_node will throw an error (Too many connections). However, other connections allocated before the error are still clean and should be returned to the connection pool.
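For illustration, a minimal, self-contained sketch of the pattern this fix describes; `ToyPool`, `TooManyConnectionsError` and `acquire_connections` are hypothetical stand-ins, not the library's actual classes:

```python
class TooManyConnectionsError(Exception):
    pass


class ToyPool:
    """Hypothetical stand-in for the cluster connection pool."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self._in_use = set()

    def get_connection_by_node(self, node):
        if len(self._in_use) >= self.max_connections:
            raise TooManyConnectionsError("Too many connections")
        conn = (node, len(self._in_use))
        self._in_use.add(conn)
        return conn

    def release(self, conn):
        self._in_use.discard(conn)


def acquire_connections(pool, nodes):
    """Acquire one connection per node; on failure, return the clean ones."""
    connections = {}
    try:
        for node in nodes:
            connections[node] = pool.get_connection_by_node(node)
    except TooManyConnectionsError:
        # Nothing has been written on these connections yet, so it is safe
        # to hand them back instead of leaving them marked as in use forever.
        for conn in connections.values():
            pool.release(conn)
        raise
    return connections


if __name__ == "__main__":
    pool = ToyPool(max_connections=2)
    try:
        acquire_connections(pool, ["node-a", "node-b", "node-c"])
    except TooManyConnectionsError:
        # Without the release in the except block, 2 connections would stay "in use".
        print("connections still marked in use:", len(pool._in_use))  # prints 0
```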

Grokzen commented Dec 18, 2020

@Neon4o4 Is this the same behavior that redis-py is using? Also, have you considered that it might not be a good idea to release connections to other servers/nodes in your cluster? It seems like you are doing that as well, not only releasing connections to a specific node.

Neon4o4 commented Dec 18, 2020

@Grokzen Thanks for the comment

Is this the same behavior that redis-py is using?

I think they are not the same. redis-py's pipeline keeps the used connection in self.connection and releases it in reset() (called in a finally block):
https://github.com/andymccurdy/redis-py/blob/master/redis/client.py#L4150

For redis-py-cluster, connections are released after the reads/writes. If an error occurs while making new connections (e.g. Too many connections), those connections are considered used and will never be released:
https://github.com/Grokzen/redis-py-cluster/blob/master/rediscluster/pipeline.py#L233
I noticed the comment after the reads/writes about why connections are not released in a finally block, and I agree with it. However, if an error occurs before any operation has been performed on these connections, it should be safe to release them.
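For reference, a rough sketch of the single-connection pattern described above; the names are illustrative stand-ins, not the actual redis-py code. The point is that the connection is always returned in a finally block, so an error cannot leave it marked as in use:

```python
class SingleNodePipeline:
    """Illustrative stand-in for a single-node pipeline, not real redis-py code."""

    def __init__(self, pool):
        self.connection_pool = pool  # any pool with get_connection()/release()
        self.connection = None

    def execute(self, commands):
        if self.connection is None:
            self.connection = self.connection_pool.get_connection()
        try:
            # Send the queued commands and read the replies over self.connection here.
            return [(cmd, "OK") for cmd in commands]
        finally:
            # Runs on success *and* on error, so the connection is always
            # returned to the pool and never stays marked as in use.
            self.reset()

    def reset(self):
        if self.connection is not None:
            self.connection_pool.release(self.connection)
            self.connection = None
```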

it might not be that good to release connections to other server/nodes in your cluster

Actually, I am trying to release the connections. I think they should be released because we ran into an error while making new connections, and these connections have not been used and will not be used in this pipeline execution. Connections to other nodes will be considered "used" if they are not released, and if that happens we will never be able to use those connections again. And if max_connections is set and enough connections are left un-released, we will not be able to make any new connections with this cluster object. In some micro-service setups there is only one cluster object per process.

@alivx alivx left a comment

Since the connections come from a pool, I guess it's better to use a timeout option and retry for X time until a connection becomes available.
What do you think?
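A rough sketch of what that could look like, assuming a pool that raises on "Too many connections"; the function and exception names here are hypothetical, not the library's API:

```python
import time


class TooManyConnectionsError(Exception):
    pass


def get_connection_with_timeout(pool, node, timeout=1.0, interval=0.05):
    """Retry pool.get_connection_by_node(node) until it succeeds or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return pool.get_connection_by_node(node)
        except TooManyConnectionsError:
            if time.monotonic() >= deadline:
                raise  # pool still exhausted after `timeout` seconds; give up
            time.sleep(interval)  # wait for another caller to release a connection
```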

Neon4o4 commented Dec 30, 2020

@alivx Hi, thanks for the comment.

use timeout option and retry in X time for any available connection

Sure. I will update the code.

@Grokzen Grokzen (Owner) left a comment

@Neon4o4 I have looked through this and thought about it a little bit, and I am not sure I am happy with all connections being dropped in this case. I would rather fix the possible leak at the source, of course, with proper connection recycling when connections are used, but that is a bigger change and not really in the scope of this PR. I think dropping all connections at once can cause a sudden rush/storm of connections that need to be re-established if this happens in a multi-threaded environment where multiple clients might run several pipelines or commands at once.

One random thought: instead of recycling all connections, you could try to recycle one random connection and retry acquiring a new connection until you get one. Yes, it might mean that you hit this case a lot and never really get far below the max number of connections, but it would at least solve the problem more gracefully. You could look into adding a counted loop that attempts, say, 3 times before raising the exception again, because we do not want deadlocks or infinite loops that you can get stuck in.
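A rough sketch of that idea, with hypothetical names (not the library's actual API): release one random already-held connection per failed attempt and retry a bounded number of times before re-raising:

```python
import random


class TooManyConnectionsError(Exception):
    pass


def get_connection_with_recycle(pool, node, held_connections, max_attempts=3):
    """Try to acquire a connection, recycling one held connection per failed attempt.

    `held_connections` is a dict of node -> connection that this pipeline already holds.
    """
    for attempt in range(max_attempts):
        try:
            return pool.get_connection_by_node(node)
        except TooManyConnectionsError:
            if not held_connections or attempt == max_attempts - 1:
                raise  # nothing left to recycle, or attempts exhausted
            # Give one of our own connections back and try again, instead of
            # flooding the pool by releasing everything at once.
            victim = random.choice(list(held_connections))
            pool.release(held_connections.pop(victim))
```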

@Grokzen Grokzen added the Fixes needed label (update or fixes needed from the author for the MR to be mergeable) on Mar 6, 2021