-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persistent storage for activity queue #31
Comments
Hello there @Nutomic. I would like to work on this issue. Should we go with using sled or look for alternative solutions? Some questions first:
Also, I have another comment which we could track on another issue: We have two servers: Disclaimer: I don't know if this implemented right now in Lemmy. From my understanding, it is not. Also, I don't know if this kind of behaviour is compatible with the ActivityPub protocol. |
My 2 cents regarding the downsides of Sled:
As lemmy_server already has a dependency on Postgres anyway, why not just leverage that? We are already writing all activites into the database anyway, so we already know that the database can handle it. If more speed is required and/or if reliability of data is not a huge concern, then I would still rather use Redis than Sled, just because Redis is easy to host on its own server, and has great tooling. |
We were discussing this on matrix, and @cetra3 , @sunaurus and me were all of the opinion a good first attempt at this could use PostgreSQL. It could look something like the following (pseudocode): --- uses the existing activity table for most of the data
CREATE TYPE federation_type as enum (comment, post, comment_vote, post_vote);
CREATE TYPE federation_state as enum (todo, in_progress, gave_up);
CREATE TABLE federation_queue (
id bigserial primary key,
activity_id not null references activity.id,
created timestamptz not null,
inbox_url text not null,
federation_type federation_type not null,
state federation_state not null,
retries bigint not null default 0,
); the main process would insert stuff into federation_queue. A (potentially separate) process would take out values using the following: WITH (SELECT id from federation_queue ORDER BY XXXX FOR UPDATE SKIP LOCKED LIMIT 1000) as batch
DO UPDATE federation_queue set state = 'in_progress' from batch where batch.id = federation_queue.id
RETURNING activity_id, id That way, we get the following:
Implementation: The postgresql part would probably be in lemmy. in activitypub-queue would be a trait something like: pub trait PersistentFederationQueue {
/// add an activity to the queue. todo: figure out interaction with the activity table since it's already stored in there on the lemmy side
pub fn queue(activity: ActivityJson, inboxes: Vec<Url>);
/// retrieves a batch from the database and sets its state to "in_progress". returns (id, activity as json, inbox_urls)
pub fn get_batch(limit: i64) -> Vec<(i64, ActivityJson, Vec<Url>);
/// deletes / marks as finished
pub fn mark_finished(ids: Vec<i64>);
/// requeues some requests that have failed (state in_progress back to todo). the db should track the the retry count and figure out the delays i guess
pub fn requeue(ids: Vec<i64>);
} |
Agree with phiresky, although the queue function needs look more like this: pub fn queue(task: SendActivityTask, num_retries: u8); The num_retries param can be used so that Lemmy can avoid storing the first and second retry (after 60s, 60m), because these would cause excessive db writes. In any case we currently need to make some major changes to the activity queue code in order to fix performance problems. This persistent storage would cause git conflicts and could worsen performance even more, so now is not a good time to implement it. |
I thought so as well, but one issue is that that prevents the whole dequeuing from running on separate processes/servers..
I guess, but also this would play a huge part in fixing those problems by allowing implementation of proper prioritization based on hosts and on activity type, as well as allowing the queue to be out of lemmy process memory and main lemmy cpu time. It's not just about the robustness. But yeah it would be much larger changes than simpler "fixes". |
I just realized something: There doesn't actually need to be a table that stores the send state for every inbox-activity combination like the It's enough if there's a very tiny table like this:
Then, for every receiver inbox, the dequeuer can just take the a batch of the next 1000 activities from the activities table, filter them by those that need to be received by this inbox, send them out, and update the last_successful_activity_id. If one of those activities fails to send then the others won't work either (just need to make sure it doesn't get stuck because of bugs). That should reduces the storage and perf needed out of the queueing system (e.g. postgresql) by instance_count so by like 300x. @cetra3 It does increase the processing load if you have many instances that only subscribe to a tiny subset of content, but probably worth it. |
Hey just wanted to check whether the implementation described above (this crate provides a For my personal use-case, BYO queue would be very helpful -- though if that ends up not being the actual implementation for whatever reason, I'll try and find a workaround of some description. |
@colatkinson Can you explain why you want to bring your own queue? For me the use case isnt clear. |
So in my case I'm already pulling in a different persistent distributed queuing system. While I haven't yet written a prototype of this part (so everything below should be considered speculative at best), my general inclination was to use it for sending activities as well. In addition, I currently support multiple RDBMSes -- so a dependency on any particular DB (be it Postgres or another) would be undesirable. Beyond that, I'd expect users of the crate to have fairly diverse needs -- a single-node bot may just need some embedded DB, whereas an application concerned with high scalability may want to use e.g. Kafka. Similarly, I can imagine some applications requiring custom behaviors around prioritization or retry intervals. That said, effectively my current plan is to apply a (vaguely hacky) patch downstream to expose a retry-less Hopefully this explains it somewhat -- let me know if there's anything I can clarify further. Also happy to help with implementation efforts, though I obviously don't want to interfere with any performance work happening in this area. |
Would #64 be in the right direction for what you're after? |
@colatkinson This crate is not going to have any direct database dependency, it will only expose a trait like For me its still not clear what the advantages of bring-your-own-queue are. If you want to get rid of retries, it would be much easier to add a setting for that. |
Ah there may have been a misunderstanding on my part -- I was assuming that bring-your-own-queue was referring to being able to implement a My interest is primarily in having application-defined persistent queuing. If the plan is to keep |
The queue for sending outgoing activities has an in-memory storage for activities that failed to be delivered and need to be retried later, when the target server is hopefully reachable again. As its only in memory, this storage is gone after a restart or crash. It would be good to provide a config option for storing it on disk, eg in a sled database.
The text was updated successfully, but these errors were encountered: