nsqd: REQ without altering attempts #380
Comments
@visionmedia are you proposing that the
A few thoughts... One problem is that (according to
And implementation wise they're disconnected -
What do you think @jehiah
yup, and I agree, it's weird, but the ability to give a message back to NSQ side-effect free is definitely something we'll use a lot. Since the client handles discards anyway, we could add yet another client-side layer in redis that keeps track of what is and isn't a real attempt, but all these layers are getting a little crazy haha. The other problem is that this is at a large scale, 5+ million messages in flight at any given time, so it eventually gets non-trivial to introduce tooling for the weird little edge-cases. I definitely feel like a lot of these are pretty specific to us and might warrant a fork, but I like bringing them up in case someone else has had similar issues.
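As a rough illustration of that "client-side layer in redis" idea (a sketch, not anything from this thread's codebase): count an attempt in Redis only when the handler actually tries to process the message, keyed by the nsqd message ID, and discard once that count passes the "real" limit. The key scheme, `realMaxAttempts`, `readyToProcess`, and `process` are assumptions made up for the example.

```go
package attempts

import (
	"context"
	"time"

	nsq "github.com/nsqio/go-nsq"
	"github.com/redis/go-redis/v9"
)

// realAttemptHandler counts a "real" attempt in Redis only when the handler
// actually tries to process the message, so requeues while waiting don't burn
// the real budget. readyToProcess and process are placeholders for this sketch.
func realAttemptHandler(rdb *redis.Client, realMaxAttempts int64,
	readyToProcess func([]byte) bool, process func([]byte) error) nsq.Handler {

	return nsq.HandlerFunc(func(m *nsq.Message) error {
		if !readyToProcess(m.Body) {
			// Not a real attempt: give the message back with a delay.
			// nsqd still increments attempts, but we ignore that counter.
			m.Requeue(time.Minute)
			return nil
		}

		key := "attempts:" + string(m.ID[:])
		n, err := rdb.Incr(context.Background(), key).Result()
		if err != nil {
			return err // REQ and retry later
		}
		if n > realMaxAttempts {
			return nil // too many real attempts: FIN (discard) instead of retrying
		}
		return process(m.Body)
	})
}
```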
Another valid use-case: When we put Redshift in maintenance mode or resize a cluster we need to requeue those messages with a delay, but this also shouldn't count towards its number of attempts, otherwise we'll lose very large copies containing potentially millions of messages. Under normal circumstances one or two attempts is just fine, so they're definitely separate cases IMO
pause the channel while it's in maintenance mode 😄 - don't have the consumers pound it into the ground while performing an operational procedure on the cluster, right?
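For reference, a minimal sketch of what that could look like against nsqd's HTTP API, which exposes `/channel/pause` and `/channel/unpause` endpoints to stop and resume delivery to a channel's consumers. The address, topic, and channel names here are made up for the example.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
)

// setChannelPaused pauses or unpauses a channel on a single nsqd via its HTTP API.
func setChannelPaused(nsqdHTTPAddr, topic, channel string, pause bool) error {
	action := "unpause"
	if pause {
		action = "pause"
	}
	endpoint := fmt.Sprintf("http://%s/channel/%s?topic=%s&channel=%s",
		nsqdHTTPAddr, action, url.QueryEscape(topic), url.QueryEscape(channel))

	resp, err := http.Post(endpoint, "", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("nsqd returned %s for %s", resp.Status, endpoint)
	}
	return nil
}

func main() {
	// Pause consumption during the maintenance window, messages queue up in the channel.
	if err := setChannelPaused("127.0.0.1:4151", "copies", "redshift-loader", true); err != nil {
		log.Fatal(err)
	}
	// ... maintenance window ...
	if err := setChannelPaused("127.0.0.1:4151", "copies", "redshift-loader", false); err != nil {
		log.Fatal(err)
	}
}
```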
it's a shared topic/channel ATM :(
@jehiah care to weigh in with your thoughts here?
FWIW I'm rewriting the entire thing in Go over the weekend haha, changing how we're handling things now that I understand the edge-cases better. My first case isn't relevant anymore, but the second use-case of clusters being under maintenance etc. is still relevant
The case you are talking about is where you consume a channel, messages in it fan out to N independent clusters, and you are putting one of those clusters in maintenance and want a way to avoid burning your possible attempts against the cluster in maintenance while handling attempts normally for the other clusters. Correct?

The combination of consumer backoff and per-message retry/backoff is entirely meant to deal w/ this state (individual messages get retried at increasing delays to last beyond your maintenance window, and you process slower, burning fewer retries even if messages are ready to be retried).

If this is a special maintenance state, it sounds like you might be able to 'finish' these messages when they hit a cluster in maintenance and push them to a second topic/channel where you apply a different (higher) max retry attempts and probably a different strategy for requeueing and backoff.

I think it's hard for nsq to give good primitives for more fine-grained control in this situation without an ability to tag messages with additional metadata that gets passed through. We've actively avoided that metadata because it's often more properly associated w/ the consumer (ie: which cluster a message maps to) rather than the producer.
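A rough sketch of that "finish and push to a second topic" pattern with go-nsq. The topic names, addresses, and the `clusterInMaintenance`/`loadIntoRedshift` functions are placeholders for this example; the maintenance topic would need its own consumer with a higher max-attempts and longer requeue delays.

```go
package main

import (
	"log"

	nsq "github.com/nsqio/go-nsq"
)

func main() {
	cfg := nsq.NewConfig()

	producer, err := nsq.NewProducer("127.0.0.1:4150", cfg)
	if err != nil {
		log.Fatal(err)
	}

	consumer, err := nsq.NewConsumer("copies", "redshift-loader", cfg)
	if err != nil {
		log.Fatal(err)
	}

	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		if clusterInMaintenance() {
			// Hand the message off to a side topic instead of burning an
			// attempt with REQ; returning nil FINs the original message.
			return producer.Publish("copies-maintenance", m.Body)
		}
		// Returning a non-nil error REQs the message and increments attempts.
		return loadIntoRedshift(m.Body)
	}))

	if err := consumer.ConnectToNSQD("127.0.0.1:4150"); err != nil {
		log.Fatal(err)
	}
	<-consumer.StopChan
}

// Placeholders for this sketch.
func clusterInMaintenance() bool         { return false }
func loadIntoRedshift(body []byte) error { return nil }
```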
cluster == Redshift cluster in this case; they have mandatory weekly scheduled downtime. If the backoff logic were tailored to user logic that might work OK: if cluster A is under maintenance it backs off and B trickles through fine. The second queue thing could work, more stuff to manage, but it would work I guess
I realize this point might be moot since you're moving things to
Another nice thing you could use this for is to analyze what's in the queue without having any real effect on it. I guess you could FIN and PUB, but that seems a little weird
hmm, I keep coming across more and more use-cases for this. Even if I pushed them to another nsqd or topic I'd need to process those per-client as well, and we have too many clients to have separate topics, so I have pretty much no choice but to REQ with a reasonable delay. It can take anywhere from 3 hours to 3 days to ETL this data though, so I can't rely on a large REQ being good enough. Since lots of nsq relies on pushing logic to the client, I think it's reasonable to have this behaviour. Whether or not the client makes an actual attempt to process the data is up to the client, I'd think.
I need to think about this more. I still have implementation concerns and "does this belong in the core" concerns, so I need to sift through those feelings and come up with a reasonable rebuttal or blessing. Anyone lurking who watches the repo and has any feelings on this, now would be the time to weigh in 👍 or 👎
Possible hack:
Another solution: Making some very broad assumptions about the problem you're trying to solve regarding acquiring locks, one possible solution would be to have a first topic/channel pair where the number of retry attempts is fairly high. The channel would, on a per-message basis, attempt to acquire the lock and REQ if the lock is unavailable. Once the lock is available, publish the message to another topic/channel with said lock and have it process, but this time with the small number of REQ attempts.
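A compact sketch of that two-tier idea with go-nsq, assuming the first ("lock wait") channel's consumer is configured with a very high MaxAttempts while the second topic's consumer keeps the small "real" limit. `tryAcquireLock`, the producer wiring, and the "etl-ready" topic name are placeholders.

```go
package locktier

import (
	"time"

	nsq "github.com/nsqio/go-nsq"
)

// lockWaitHandler handles the first channel, whose only job is to wait for the
// lock. Attempts burned here are cheap as long as this consumer's nsq.Config
// uses a very high MaxAttempts.
func lockWaitHandler(producer *nsq.Producer, tryAcquireLock func([]byte) bool) nsq.Handler {
	return nsq.HandlerFunc(func(m *nsq.Message) error {
		if !tryAcquireLock(m.Body) {
			// Lock unavailable: REQ with a delay and keep waiting.
			m.Requeue(30 * time.Second)
			return nil
		}
		// Lock acquired: hand the work to the real processing topic, which
		// runs with the small max-attempts value, and FIN here by returning nil.
		return producer.Publish("etl-ready", m.Body)
	})
}
```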
yea I was thinking about FIN/PUB, I guess the downsides I can think of would be:
Reducing the REQ attempts would definitely help, but I guess for me there's a conflict with the idea of what makes an attempt. Does receiving it count as an attempt, or does actually processing it make it one? Then it also makes you alter max_attempts to allow for these cases vs the "real" max_attempts that you'd want. Might be able to rework things with a non-nsq solution, but I thinkkkk this is still a legit thing; core or not is tricky though
@visionmedia I haven't forgotten about this, I just wanted to get the stable release out the door to pave the way for focusing on new things...
no worries! It's nothing too urgent on our end
How about a NoAttempt method on messages and extending the protocol internally to include it? On consumer shutdown, all messages in flight could be remarked as not attempted and nsqd wouldn't penalize them with an attempt. This would also be nice for nsqio/go-nsq#96.
@twmb I don't think the specific implementation was ever a question (and your suggestion makes a lot of sense). I think the question has always been: does it fundamentally make sense to allow this?
Lurking and chiming in here. At least as stated, 👎. @tj I realize this issue is > 1 year old and things may have changed. I'm not sure what having a shared topic/channel means in your context - do you have different message types which come through the same channel, and do some internal routing to a handler inside your code? I personally would change to a single-purpose topic/channel; it's extremely useful to have operational control over a well-defined type of message. If you're already using some custom routing, why not have custom
Regarding
A general solution may be if you don't want the OOB way of counting
we have some cases where we have to wait around for some distributed locks, so I just keep requeueing the messages to allow messages of other types (that won't collide with those locks) to flow through. Problem is we also need a pretty low maxAttempts