
Enable buffered reading #1

Open
aartaka opened this issue Feb 26, 2022 · 5 comments

Comments

@aartaka
Contributor

aartaka commented Feb 26, 2022

Async reading saves us in many places, but it does not save us from reading too much. For example, once my Nyxt history file exceeds 150MB (which it does every one to two months), my Lisp image segfaults, because there simply is not enough memory to allocate and process that much text.

This may be better filed as a feature request against cl-prevalence or SBCL, but it also concerns how nfiles works: should we somehow enable batch/buffered file reading, so that we don't have to allocate too much when reading from files?
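For illustration, a minimal sketch of what buffered reading could look like, assuming the consumer can work on partial content; map-file-chunks and the chunk size are made up for the example:

```lisp
;; Sketch only: feed a file to FUNCTION in fixed-size chunks so we
;; never hold more than CHUNK-SIZE characters in memory at a time.
(defun map-file-chunks (function path &key (chunk-size 65536))
  (with-open-file (in path :element-type 'character)
    (let ((buffer (make-string chunk-size)))
      (loop for end = (read-sequence buffer in)
            while (plusp end)
            do (funcall function (subseq buffer 0 end))))))
```

The hard part is on the consumer side: cl-prevalence would have to accept partial input, which is what the parser discussion below is about.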

@Ambrevar
Member

Ambrevar commented Feb 26, 2022 via email

@aartaka
Contributor Author

aartaka commented Feb 26, 2022

> In the case of the Nyxt history, the entire file must be processed at once because it's a single s-exp, right?

Even a single s-exp can be broken into smaller pieces with a smart enough parser, I believe.
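For instance, here is a sketch of streaming the children of one big list form one at a time, instead of READing the whole file at once. It assumes the file holds a single parenthesized form whose children are themselves READable; map-toplevel-children is a made-up name:

```lisp
(defun map-toplevel-children (function path)
  (with-open-file (in path)
    ;; Skip leading whitespace, then consume the opening paren.
    (assert (char= (peek-char t in) #\())
    (read-char in)
    (loop for next = (peek-char t in nil nil)
          until (or (null next) (char= next #\)))
          ;; READ one child form at a time; only that child is
          ;; allocated, not the whole file.
          do (funcall function (read in)))))
```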

> Can you think of another way around it?

Allocating more memory :D

> How does it crash Nyxt / SBCL? Is it a limitation of SBCL that it cannot read too-big s-expressions? Is there a way to recover from it?

It does not print any error messages; it simply segfaults, and it is not recoverable. The frequency of segfaults correlates with the size of the history file. The problem most probably comes from allocating too much memory when parsing the object representation in cl-prevalence. s-serialization::deserialize-sexp is a recursive function with no tail call optimization, after all. Parsing a huge object is likely to clog the memory there :(
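For a rough, hypothetical illustration of why depth matters: nesting alone costs one control-stack frame per level in a non-tail-recursive parser, so a pathological object can crash it even at modest file sizes.

```lisp
;; Build (((...))) with N levels -- iteratively, so that building it
;; doesn't itself blow the stack. Feeding the printed form of, say,
;; (nest 1000000) to a recursive deserializer descends one stack
;; frame per level of nesting.
(defun nest (n)
  (let ((form nil))
    (dotimes (i n form)
      (setf form (list form)))))
```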

> What do you mean by "batch reading"?

Oh wait, it seems I misused the word. What I meant is reading a big piece of text in small, digestible pieces so as not to clog the memory too much.

> Also, you mentioned "async reading"; is this related to asynchronicity?

Yes, it is. I meant the asynchronous reading/writing to files that nfiles enables.

@Ambrevar
Member

Ambrevar commented Feb 27, 2022 via email

@aartaka
Contributor Author

aartaka commented Feb 27, 2022

> > Allocating more memory :D
>
> How do we do this?

It was just a joke about adding more --dynamic-space-size :)
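(For reference, that is SBCL's runtime option for growing the heap; the number is in megabytes:

```
sbcl --dynamic-space-size 8192
```

It only postpones the problem, of course.)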

> > s-serialization::deserialize-sexp is a recursive function with no tail call optimization, after all. Parsing a huge object is likely to clog the memory there :(
>
> If the call stack is the problem, then I'm not too sure how to fix it, since tail-call optimization cannot be done here. Maybe use static-dispatch and inline the methods, if static-dispatch allows it?

The best way would be to optimize deserialize-sexp into an iterative or properly tail-call-optimized version. What's static-dispatch?
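To sketch the iterative idea: keep an explicit stack on the heap instead of recursing on the control stack. count-atoms below is a made-up stand-in for the kind of traversal deserialize-sexp does, not cl-prevalence code:

```lisp
;; Walk an arbitrarily deep cons tree without recursion: pending
;; subtrees live on a heap-allocated list, so depth is bounded by
;; heap size rather than by the control stack.
(defun count-atoms (form)
  (let ((stack (list form))
        (count 0))
    (loop until (null stack)
          do (let ((item (pop stack)))
               (cond ((consp item)
                      (push (car item) stack)
                      (push (cdr item) stack))
                     ((not (null item))
                      (incf count)))))
    count))
```

Rebuilding objects (rather than just counting) needs the same trick plus a second stack of partial results, which is where it gets fiddly.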

> First thing we'd need is data to reproduce. Do you think you can generate a history with random entries that reproduces it? If you already have a history file that triggers the issue, you could try replacing all the titles / URLs with dummy content to anonymize it. By the way, this is the kind of operation we are likely to ask of our users. Maybe write a Lisp/Nyxt script to automate this?

I already deleted it :( But yes, a script can be useful.
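As a hypothetical starting point for such a script, assuming the history is plain READable s-expressions; note that it still READs the whole file once, and the helper is recursive, so it shares the depth limitation discussed above:

```lisp
;; Replace every string (titles, URLs, ...) with a dummy value,
;; preserving the structure so the bug still reproduces.
(defun anonymize (form)
  (typecase form
    (string "REDACTED")
    (cons (cons (anonymize (car form)) (anonymize (cdr form))))
    (t form)))

(defun anonymize-file (in-path out-path)
  (with-open-file (in in-path)
    (with-open-file (out out-path :direction :output
                                  :if-exists :supersede)
      (loop for form = (read in nil 'eof)
            until (eq form 'eof)
            do (print (anonymize form) out)))))
```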

> > Yes, it is. I meant the asynchronous reading/writing to files that nfiles enables.
>
> I understand the problem with memory, but I don't understand how it's related to asynchronicity. A few commits ago on master, reading was synchronous, and I believe the same memory issue exists there. Can you clarify?

It's unrelated, sorry.

EDIT: complete answer.

@Ambrevar
Member

I'm not sure how to write a tail-call-optimized version of deserialize-sexp. Could you provide a draft?

Otherwise an iterative version would work.

Static dispatch is this: https://github.com/alex-gutev/static-dispatch

@aartaka aartaka changed the title Enable batch reading Enable buffered reading Feb 27, 2022