Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor thread fetching into an iterative operation #538

Open
aumetra opened this issue May 20, 2024 · 5 comments
Open

Refactor thread fetching into an iterative operation #538

aumetra opened this issue May 20, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@aumetra
Copy link
Member

aumetra commented May 20, 2024

Is your feature request related to a problem? Please describe.

Right now thread fetching is a recursive operation and overflows the stack quite quickly. We already introduced the recursion limit but it already overflows when only fetching 10 replies.

This isn't ideal.

Describe the solution you'd like

Technically every recursive algorithm can be rewritten as an iterative one and vice versa. This is what we need to do here. Make the thread fetching iterative instead.

@aumetra aumetra added the enhancement New feature or request label May 20, 2024
@tesaguri
Copy link
Contributor

I think the ideal handling of threads is to use an asynchronous job worker like Mastodon's ThreadResolveWorker instead of occupying a task until the thread is fully resolved, iterative or not.

Maybe this would require the database schema to be aware of replies whose in_reply_to_id is yet to be resolved.

@aumetra
Copy link
Member Author

aumetra commented Jul 22, 2024

Yeah, good point. But even if it's run in a background task, I'd personally prefer an iterative operation since it won't exhaust the stack space because of function calls

@aumetra
Copy link
Member Author

aumetra commented Jul 22, 2024

Maybe this would require the database schema to be aware of replies whose in_reply_to_id is yet to be resolved

Not sure if we need that. We could just set that to NULL if we haven't resolved that and then just enqueue a new job to resolve the thread on initial fetch

@tesaguri
Copy link
Contributor

tesaguri commented Jul 22, 2024

Yeah, good point. But even if it's run in a background task, I'd personally prefer an iterative operation since it won't exhaust the stack space because of function calls

I was assuming that the worker would only fetch a single post in a job and, well, recursively enqueue another job to the scheduler to fetch the reply until the thread is fully resolved, so that it won't exhaust the stack (and each job is reasonably short-lived). That's how Mastodon's worker is implemented IIUC.

We could just set that to NULL if we haven't resolved that and then just enqueue a new job to resolve the thread on initial fetch

Yes, I think we don't need this if we can hold a transaction over the whole thread resolution process, but otherwise, clients may observe a reply whose in_reply_to post haven't been resolved. Just making the in_reply_to_id to NULL in this case would make the client to recognise the post as a non-reply, which is not true.

Well, come to think of it, we don't keep a transaction over the whole process_new_object function today either, so this problem might not be inherent to the said worker approach anyway.

@aumetra
Copy link
Member Author

aumetra commented Jul 22, 2024

To be fair, the whole ActivityPub fetching logic with the database mapping needs a major cleanup. It's in an abysmal state at the moment.

Maybe that cleanup can also add better testing to it finally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants