AWS ElastiCache Redis maintenance triggered disconnect from cluster #479
We are using AWS ElastiCache Redis in cluster mode with master/slave node pairs, accessed through a `RedisCluster` instance configured with `read_from_replicas=True`. During recent maintenance that AWS performed on the cache, one of our applications was effectively disconnected from it. It was left in a state where it had no mapping of keys to nodes, which resulted in KeyErrors. Mixed in with the KeyErrors were the following AttributeErrors that seem to indicate that the …
The odd thing is that we have multiple services that use the same libraries and a similar connection setup for ElastiCache Redis, each with its own similarly provisioned ElastiCache Redis cluster in AWS. AWS performed the same maintenance on all of those clusters, but only one of our services ended up in this odd state.

Sequence of errors during AWS maintenance for the service that ended up in a disconnected state:
Sequence of errors during AWS maintenance for a service that recovered by itself:
Has anyone else seen similar behaviour?
Some older versions of this lib had issues where one thread manipulated and destroyed the `connection` object while other threads did not properly check whether that object had been removed or become invalid. There are now several places where this is checked before running the functions, to avoid this exception. If you are not running the latest version, you should try it. If you find new places where this None check has not been added, then we should add it.
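To make the race described in the reply concrete, here is a minimal sketch under simplified assumptions. The classes `NodeSlot` and `FakeConnection` are hypothetical stand-ins, not the library's actual internals. One thread tears down a shared connection by setting it to None. A caller that dereferences the attribute without a check then fails with an AttributeError on None. A guarded caller instead reads the attribute once into a local variable and checks that local for None before using it.

```python
import threading
from typing import Optional


class FakeConnection:
    """Hypothetical stand-in for a per-node client connection."""

    def __init__(self, node: str) -> None:
        self.node = node

    def send_command(self, cmd: str) -> str:
        # Pretend to send a command and return a fake reply.
        return f"{self.node}:{cmd}"


class NodeSlot:
    """Hypothetical holder whose connection another thread may destroy."""

    def __init__(self, node: str) -> None:
        self._lock = threading.Lock()
        self.connection: Optional[FakeConnection] = FakeConnection(node)

    def disconnect(self) -> None:
        # Simulates another thread (e.g. a topology refresh during
        # maintenance) tearing down the shared connection object.
        with self._lock:
            self.connection = None

    def execute_unsafe(self, cmd: str) -> str:
        # No None check: raises AttributeError once disconnect() has run.
        return self.connection.send_command(cmd)

    def execute_safe(self, cmd: str) -> Optional[str]:
        # Read the attribute exactly once, so the None check and the call
        # both refer to the same object even if another thread clears it.
        with self._lock:
            conn = self.connection
        if conn is None:
            # The caller can reconnect or refresh its slot map instead.
            return None
        return conn.send_command(cmd)


if __name__ == "__main__":
    slot = NodeSlot("node-a")
    print(slot.execute_safe("PING"))  # node-a:PING
    slot.disconnect()
    print(slot.execute_safe("PING"))  # None
    try:
        slot.execute_unsafe("PING")
    except AttributeError as exc:
        print("unsafe path failed:", exc)
```

Copying the attribute into a local is the important part. Writing `if self.connection is not None: self.connection.send_command(...)` performs two separate reads of the attribute, and another thread can set it to None between them.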