PHP - Retry through proxy is not successful #427
Comments
The default value for server_retry_timeout is 30 seconds, so if you retry your request after 30 seconds, it might end up going to the same server. Try setting it to a larger value. Also use a sane value for the timeout key. |
I am storing CSS/JS output in the cache for fast retrieval, so I cannot wait 30 seconds to retry; the retry is almost instantaneous, for obvious reasons. But it still tries to connect to the known bad server. By "timeout key", do you mean the key expiry? We are also storing sessions in memcached, so the expiry usually runs 15 minutes. |
Have you read this: https://github.com/twitter/twemproxy/blob/master/notes/recommendation.md? Most of your answers can be found there. |
Of course, specifically the "liveness" section, did you see my question on SO? |
Yes; set server_retry_timeout to 300000 (the value is in milliseconds, so 5 minutes). The server_retry_timeout option controls how long an auto-ejected server is kept ejected. |
I understand that. What I am saying is that at 300001, the bad cache node is re-considered for re-entry back into the pool. However the request at 300001 will "break", because the server is still not online. To recover from this breakage, the retry mechanism at the app layer tries to re-execute the same command, as the expectation is the bad cache node will now have been re-ejected from the pool, and the retry will go to a known good cache node. |
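The scenario described above can be sketched with a toy Python model of the ejection window. This is a simplified illustration under stated assumptions, not twemproxy's implementation: `ModelPool`, `set_with_retries`, `key_routed_to`, the modulo-over-live-servers routing, and the server names are all hypothetical. Only server_failure_limit and server_retry_timeout correspond to option names mentioned in this thread.

```python
import zlib


class ModelPool:
    """Toy model of auto-ejection (hypothetical; NOT twemproxy's code)."""

    def __init__(self, servers, failure_limit, retry_timeout_ms):
        self.servers = list(servers)
        self.failure_limit = failure_limit        # models server_failure_limit
        self.retry_timeout_ms = retry_timeout_ms  # models server_retry_timeout
        self.failures = {s: 0 for s in self.servers}
        self.ejected_until = {s: 0 for s in self.servers}

    def live(self, now_ms):
        # A server is back in the pool once its ejection window has elapsed.
        return [s for s in self.servers if now_ms >= self.ejected_until[s]]

    def route(self, key, now_ms):
        live = self.live(now_ms)
        if not live:
            raise RuntimeError("no live servers")
        # Assumption: simple modulo routing instead of a real hash ring.
        return live[zlib.crc32(key.encode()) % len(live)]

    def record_failure(self, server, now_ms):
        self.failures[server] += 1
        if self.failures[server] >= self.failure_limit:
            # Eject the server and reset its consecutive-failure counter.
            self.failures[server] = 0
            self.ejected_until[server] = now_ms + self.retry_timeout_ms


def set_with_retries(pool, key, down, now_ms, attempts):
    """Return the 1-based attempt that reached a healthy server, or None."""
    for attempt in range(1, attempts + 1):
        server = pool.route(key, now_ms)
        if server not in down:
            return attempt
        pool.record_failure(server, now_ms)
    return None


def key_routed_to(pool, target, now_ms):
    """Find a key that the model routes to `target` (helper for demos)."""
    for i in range(1000):
        key = f"key{i}"
        if pool.route(key, now_ms) == target:
            return key
    raise LookupError("no key found for " + target)
```

In this model, with a failure limit of 1, the first attempt at the moment of re-entry fails, the bad server is re-ejected, and the immediate second attempt lands on a healthy server. That is the expectation stated above; the behavior reported in this issue is that real retries kept hitting the failed server.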
Yup, that is correct. If you set it up properly as mentioned in recommendation.md, you won't encounter this issue. Alternatively, this patch will also solve your issue: #29 |
"If you set it up properly as mentioned in recommendation.md": could you be more specific? The nutcracker.yaml file is posted in the SO link in the OP. What is wrong with that config that would cause the aforementioned behavior? |
Basically, what you want to do is trade off application-level retries against server_retry_timeout. This section talks about it: https://github.com/twitter/twemproxy/blob/master/notes/recommendation.md#liveness |
That's what I am trying to tell you. The application tries 3 times to set an item in the cache store via twemproxy. The first try fails due to the failed node; the 2nd and 3rd tries fail again because twemproxy is sending the request to the same failed server, when the failed server should have been ejected. |
The issue is that by the time the second retry arrives at twemproxy, the ejected server has already been added back to the server pool. So you need to set server_retry_timeout to a value greater than 30 seconds (or some value that lets your retries succeed). Let's say you set server_retry_timeout to 30 seconds and server_failure_limit to 2,
hth |
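The tradeoff described in the comment above can be written down as a small check. This is a minimal sketch under a simplified model, not a rule taken from twemproxy's code: the function name and the `app_attempts` and `retry_interval_ms` parameters are hypothetical, and the model assumes every attempt before ejection goes to the same failed server.

```python
def retry_budget_ok(app_attempts, server_failure_limit,
                    server_retry_timeout_ms, retry_interval_ms):
    """Return True if, in this simplified model, at least one attempt
    can arrive while the failed server is still ejected."""
    # The first `server_failure_limit` attempts are spent tripping ejection,
    # so the application needs at least one attempt beyond that.
    if app_attempts <= server_failure_limit:
        return False
    # The next attempt arrives `retry_interval_ms` after the failure that
    # triggered ejection; it must land before the server is added back.
    return retry_interval_ms < server_retry_timeout_ms
```

For example, three attempts spaced 10 ms apart with a failure limit of 2 and a 30000 ms timeout pass this check, while two attempts with the same settings do not.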
I appreciate the help. I guess that's where my confusion lies. Because we are dealing with a caching layer, which is expected to be extremely fast, I cannot wait 10 seconds to re-issue the 2nd retry. The 2nd retry has to be in milliseconds. Based on docs, and what you said, if I set server_failure_limit to 1, and keep server_retry_timeout at 30 this is what is supposed to happen:
|
@digitalprecision this patch tries to restore a failed node only after checking it. |
Sweet, I'll give her a go and let you know. Thanks. Just curious, when do you think the heartbeat patch will make it into master? |
@digitalprecision Sorry, I actually don't know. It probably depends on other people's review and usage :) |
Hmmmm, with that being said, would you say the heartbeat branch is stable? If this does work, it wouldn't be ideal to run a fork off master in production environment for too long. |
@digitalprecision It's not experimental :) but more testing is always needed :) |
Actually I am going to pass on the heartbeat branch. System stability is of utmost importance and manually compiling the heartbeat fork is forcing me to update a slew of packages which aren't available in upstream repos (CentOS 6.7, rpmforge, epel). At this point I am going to go with the following config settings:
I'd rather have a server be out of the pool for 10 minutes than deal with breakage, and considering the low chance of a cache node actually being down, it shouldn't be too painful. But I would recommend merging the heartbeat patch into master ASAP. Otherwise, other organizations are under the false impression that an app-level retry can recover from a non-responsive cache node. |
Curious about your thoughts, @manjuraj |
The heartbeat patch hasn't been merged into twitter/twemproxy yet, but is planned for 0.6.0 - #608 |
http://stackoverflow.com/questions/33487641/twitter-twemproxy-retry-not-working-as-expected
Wondering if anyone has insight into this? I even tried sleeping and creating a new instance, and I still can't get it to hit a known good cache node. I set server retry to 1, and in code have retries set to 2 per the docs (retries have to be > server retry).