
Enable buffered reading #1

Open
aartaka opened this issue Feb 26, 2022 · 5 comments

Comments

@aartaka
Contributor

aartaka commented Feb 26, 2022

Async reading saves us in many places, but it does not save us from reading too much. For example, once my Nyxt history file exceeds 150MB (which it does every one to two months), my Lisp image segfaults, because there simply is not enough memory to allocate and process that much text.

This may be better filed as a feature request against cl-prevalence or SBCL, but it also concerns how nfiles works: should we somehow enable batch/buffered file reading, so that we don't have to allocate too much when reading from files?
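For illustration, a minimal sketch of what buffered reading could look like, assuming the consumer can work on partial content; map-file-chunks and the chunk size are made up for the example:

```lisp
;; Sketch only: feed a file to FUNCTION in fixed-size chunks so we
;; never hold more than CHUNK-SIZE characters in memory at a time.
(defun map-file-chunks (function path &key (chunk-size 65536))
  (with-open-file (in path :element-type 'character)
    (let ((buffer (make-string chunk-size)))
      (loop for end = (read-sequence buffer in)
            while (plusp end)
            do (funcall function (subseq buffer 0 end))))))
```

The hard part is on the consumer side: cl-prevalence would have to accept partial input, which is what the parser discussion below is about.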

@Ambrevar
Member

Ambrevar commented Feb 26, 2022 via email

@aartaka
Contributor Author

aartaka commented Feb 26, 2022

> In the case of the Nyxt history, the entire file must be processed at once because it's a single s-exp, right?

Even a single s-exp can be broken into smaller pieces with a smart enough parser, I believe.
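For instance, here is a sketch of streaming the children of one big list form one at a time, instead of READing the whole file at once. It assumes the file holds a single parenthesized form whose children are themselves READable; map-toplevel-children is a made-up name:

```lisp
(defun map-toplevel-children (function path)
  (with-open-file (in path)
    ;; Skip leading whitespace, then consume the opening paren.
    (assert (char= (peek-char t in) #\())
    (read-char in)
    (loop for next = (peek-char t in nil nil)
          until (or (null next) (char= next #\)))
          ;; READ one child form at a time; only that child is
          ;; allocated, not the whole file.
          do (funcall function (read in)))))
```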

> Can you think of another way around it?

Allocating more memory :D

> How does it crash Nyxt / SBCL? Is it a limitation of SBCL that it cannot read too-big s-expressions? Is there a way to recover from it?

It does not print any error messages; it simply segfaults, and it is not recoverable. The frequency of segfaults correlates with the size of the history file. The problem most probably comes from allocating too much memory when parsing the object representation in cl-prevalence. s-serialization::deserialize-sexp is a recursive function with no tail call optimization, after all. Parsing a huge object is likely to clog the memory there :(
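For a rough, hypothetical illustration of why depth matters: nesting alone costs one control-stack frame per level in a non-tail-recursive parser, so a pathological object can crash it even at modest file sizes.

```lisp
;; Build (((...))) with N levels -- iteratively, so that building it
;; doesn't itself blow the stack. Feeding the printed form of, say,
;; (nest 1000000) to a recursive deserializer descends one stack
;; frame per level of nesting.
(defun nest (n)
  (let ((form nil))
    (dotimes (i n form)
      (setf form (list form)))))
```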

> What do you mean by "batch reading"?

Oh wait, it seems I misused the word. What I meant is reading a big piece of text in small, digestible pieces so as not to clog the memory too much.

> Also, you mentioned "async reading"; is this related to asynchronicity?

Yes, it is. I meant the asynchronous reading/writing to files that nfiles enables.

@Ambrevar
Member

Ambrevar commented Feb 27, 2022 via email

@aartaka
Contributor Author

aartaka commented Feb 27, 2022

> > Allocating more memory :D
>
> How do we do this?

It was just a joke about adding more --dynamic-space-size :)
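(For reference, that is SBCL's runtime option for growing the heap; the number is in megabytes:

```
sbcl --dynamic-space-size 8192
```

It only postpones the problem, of course.)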

> > s-serialization::deserialize-sexp is a recursive function with no tail call optimization, after all. Parsing a huge object is likely to clog the memory there :(
>
> If the call stack is the problem, then I'm not too sure how to fix it, since tail-call optimization cannot be done here. Maybe use static-dispatch and inline the methods, if static-dispatch allows it?

The best way would be to optimize deserialize-sexp into an iterative or properly tail-call-optimized version. What's static-dispatch?
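To sketch the iterative idea: keep an explicit stack on the heap instead of recursing on the control stack. count-atoms below is a made-up stand-in for the kind of traversal deserialize-sexp does, not cl-prevalence code:

```lisp
;; Walk an arbitrarily deep cons tree without recursion: pending
;; subtrees live on a heap-allocated list, so depth is bounded by
;; heap size rather than by the control stack.
(defun count-atoms (form)
  (let ((stack (list form))
        (count 0))
    (loop until (null stack)
          do (let ((item (pop stack)))
               (cond ((consp item)
                      (push (car item) stack)
                      (push (cdr item) stack))
                     ((not (null item))
                      (incf count)))))
    count))
```

Rebuilding objects (rather than just counting) needs the same trick plus a second stack of partial results, which is where it gets fiddly.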

> First thing we'd need is data to reproduce. Do you think you can generate a history with random entries that reproduces it? If you already have a history file that triggers the issue, you could try replacing all the titles / URLs with dummy content to anonymize it. By the way, this is the kind of operation we are likely to ask of our users. Maybe write a Lisp/Nyxt script to automate this?

I already deleted it :( But yes, a script can be useful.
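As a hypothetical starting point for such a script, assuming the history is plain READable s-expressions; note that it still READs the whole file once, and the helper is recursive, so it shares the depth limitation discussed above:

```lisp
;; Replace every string (titles, URLs, ...) with a dummy value,
;; preserving the structure so the bug still reproduces.
(defun anonymize (form)
  (typecase form
    (string "REDACTED")
    (cons (cons (anonymize (car form)) (anonymize (cdr form))))
    (t form)))

(defun anonymize-file (in-path out-path)
  (with-open-file (in in-path)
    (with-open-file (out out-path :direction :output
                                  :if-exists :supersede)
      (loop for form = (read in nil 'eof)
            until (eq form 'eof)
            do (print (anonymize form) out)))))
```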

> > Yes, it is. I meant the asynchronous reading/writing to files that nfiles enables.
>
> I understand the problem with memory, but I don't understand how it's related to asynchronicity. A few commits ago on master, reading was synchronous, and I believe the same memory issue exists there. Can you clarify?

It's unrelated, sorry.

EDIT: complete answer.

@Ambrevar
Member

I'm not sure how to write a tail-call-optimized version of deserialize-sexp. Could you provide a draft?

Otherwise an iterative version would work.

Static dispatch is this: https://github.com/alex-gutev/static-dispatch

@aartaka aartaka changed the title Enable batch reading Enable buffered reading Feb 27, 2022