DLQ Service (GSI-1073) #128

Open
wants to merge 10 commits into main

Conversation

TheByronHimes (Member)

Contains the plan for building a DLQ service. This iteration has no web UI, only a REST API that returns JSON.
It will let us see the next events in the DLQ topic for a given service and resolve them one at a time.
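
For a concrete feel of the preview/resolve workflow, here is a hypothetical client interaction; the base URL, paths, and field names are invented for illustration and are not the final API:

```python
import requests

BASE = "http://localhost:8080"  # hypothetical DLQS address

# Preview the next few events in the DLQ for a given service:
events = requests.get(f"{BASE}/services/notifier/events?limit=5").json()

# Resolve the first one, i.e. requeue it to its original topic:
event_id = events[0]["event_id"]
requests.post(f"{BASE}/services/notifier/events/{event_id}/resolve")
```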

@lkuchenb (Member) commented Nov 5, 2024

Thanks a lot for the comprehensive spec, good read!

I have questions, maybe we can sit together with @Cito once he's seen the doc as well.

  • Is using one unified DLQ topic per service really a good idea? If a service subscribes to two topics, all failures would go into one DLQ topic. If we wanted to resolve issues on one topic only, we'd still have to process messages in the DLQ in order, if I understand your proposal correctly.
  • The spec is rather vague around the "processing" part. Wouldn't the most straightforward way of processing be for the POST endpoint to accept a modified payload for the message? That would immediately enable manual intervention, but also automated intervention in the form of scripts talking to the DLQS API.
  • Have you considered compaction in the context of the DLQ? With the proposed approach, the DLQ topic cannot be compacted because it holds messages from different original topics, potentially using identical keys (see the sketch after this list). Do we want to keep all events that failed while reading from a compacted (original) topic? I guess there might be arguments for that, and the current proposal allows it.
  • We might want to mention that this service cannot be scaled and that only one partition should be configured for a DLQ topic.
  • We should have a look at the API vocabulary and semantics.
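
To make the key-clash concern in the third bullet concrete, a minimal sketch; topic names, keys, and payloads are invented:

```python
# Two failed events from different original topics that happen to
# share a Kafka record key.
failed_event_a = {
    "orig_topic": "user-registrations",
    "key": "user-123",
    "payload": {"action": "register"},
}
failed_event_b = {
    "orig_topic": "user-deletions",
    "key": "user-123",  # same key, unrelated original topic
    "payload": {"action": "delete"},
}

# If both land in a single per-service DLQ topic and that topic were
# compacted, Kafka would eventually keep only the latest record per
# key, silently discarding failed_event_a.
```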

@TheByronHimes (Member, Author) commented Nov 6, 2024

@lkuchenb thanks for the insight.
The concern with one DLQ topic per service is valid, and your understanding is correct. We can rework hexkit to instead assume a DLQ topic for each normal topic, deriving its name automatically with some suffix. The DLQS API could then still provide whole-service or per-topic previews, and it would alleviate the buried-high-priority-event problem. Presumably we would want a separate DLQ topic per normal topic per service? We'd also need to think about how to manage config for the different topics in the DLQS.
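
A rough sketch of the suffix idea; the ".dlq" suffix and the service-name prefix are assumptions, not settled conventions:

```python
DLQ_SUFFIX = ".dlq"

def dlq_topic_for(service: str, topic: str) -> str:
    """Derive the DLQ topic for one (service, original topic) pair."""
    return f"{service}.{topic}{DLQ_SUFFIX}"

def original_topic_for(service: str, dlq_topic: str) -> str:
    """Invert the mapping when republishing a resolved event."""
    prefix = f"{service}."
    assert dlq_topic.startswith(prefix) and dlq_topic.endswith(DLQ_SUFFIX)
    return dlq_topic[len(prefix):-len(DLQ_SUFFIX)]
```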

In the proposed solution, I had assumed we would not use compacted DLQ topics. However, if we used a 1:1 DLQ/normal topic arrangement, then we could use compacted DLQ topics. The problem of identical keys for different event types in the one topic had not occurred to me though -- that would be a problem.

The endpoint for directly posting an updated event is something I originally included in this spec, actually. However, I removed it because I thought it would be too prone to error. But you're right that it is the fastest way to resolve DLQ events. It's trivial to implement if that's something we want.
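
If we re-add it, a minimal sketch of such an endpoint, assuming a FastAPI app; the path and model are placeholders, not the spec's actual vocabulary:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class EventOverride(BaseModel):
    payload: dict  # a real spec would pin this down per event type

@app.post("/services/{service}/events/{event_id}")
async def resolve_with_override(service: str, event_id: str, body: EventOverride):
    """Publish the (possibly modified) payload to the event's original
    topic, then mark the DLQ event as resolved."""
    ...  # publishing and bookkeeping omitted in this sketch
```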

And no, it can't be scaled. We're the bottleneck.

@Cito (Member) commented Nov 6, 2024

Would it make sense to add the original topic name as a kind of prefix to the key in the DLQ and later remove it again?

@TheByronHimes (Member, Author)

> Would it make sense to add the original topic name as a kind of prefix to the key in the DLQ and later remove it again?

That would probably work fine if you're referring to the compacted topic key clashing.
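
A minimal sketch of that prefixing scheme, assuming a separator character that never appears in topic names:

```python
SEPARATOR = ":"  # assumed to be illegal in Kafka topic names

def to_dlq_key(orig_topic: str, key: str) -> str:
    """Prefix the record key with its original topic before publishing
    to the DLQ, so identical keys from different topics stay distinct."""
    return f"{orig_topic}{SEPARATOR}{key}"

def from_dlq_key(dlq_key: str) -> tuple[str, str]:
    """Strip the prefix again when republishing the resolved event."""
    orig_topic, key = dlq_key.split(SEPARATOR, 1)
    return orig_topic, key
```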

@Cito (Member) previously approved these changes Nov 22, 2024

Looks good, see two small comments.

Maybe you should also clarify what you mean by "internal token" for auth.
