Investigate methods of reducing the on-disk size of Mentat databases #290

Open
gburd opened this issue Aug 6, 2020 · 0 comments

gburd commented Aug 6, 2020

This document highlights many of the issues and concerns: https://docs.google.com/document/d/14ywV4PlBAdOsJrxd7QhMcxbo7pw8cdc1kLAb7I4QhFY/edit?ts=5b7b7f87

Some work to measure the size of the database when storing history is in mozilla/application-services#191. For my places.sqlite (100k places, 150k visits), the database grows by around 200MB.

It's worth noting that 'disk usage' was one of the primary concerns reported by user research for Fenix. It's not clear, though, whether this size increase (relative to places) would make a dent relative to things like caches. A very informal poll of some friends of mine found that Fennec typically uses around 500MB of space (app + data); another 200MB isn't a trivial increase, but it doesn't substantially change where we are in terms of app size.

Some bugs which may help (suggested by @rnewman):

I think something like SQLite's ZIPVFS extension would likely help, since the databases compress well, but I have not tried it. Implementing it ourselves is likely beyond the scope of this effort (I looked at the effort required and it was far from trivial). Additionally, whatever we do would need to integrate with SQLCipher. I also looked at bolting compression into SQLCipher before the encryption step, but compression makes the block output variable in size, which seemed to make this problematic.
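A rough way to sanity-check the "databases compress well" claim without ZIPVFS is to compress each database page on its own and sum the results. The sketch below does that with zlib as a stand-in compressor. The function name `estimate_page_compression` is hypothetical, the page-at-a-time approach is only an approximation of what a page-level compressing VFS could achieve, and it assumes an unencrypted, fully checkpointed database file.

```python
import os
import sqlite3
import zlib


def estimate_page_compression(path, level=6):
    """Return (raw_bytes, compressed_bytes), compressing each page independently.

    Hypothetical helper: zlib stands in for whatever compressor a
    page-level compressing VFS would actually use.
    """
    # Ask SQLite for the page size so we split the file on page boundaries.
    conn = sqlite3.connect(path)
    try:
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()

    raw = 0
    compressed = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(page_size)
            if not page:
                break
            raw += len(page)
            # Each page is compressed alone, so no cross-page redundancy is used.
            compressed += len(zlib.compress(page, level))
    return raw, compressed
```

Calling `estimate_page_compression("places.sqlite")` and comparing the two numbers gives a ballpark ratio. Because pages are compressed in isolation, the result is likely to be more pessimistic than compressing the whole file at once.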

Other notes:

  1. Storing strings as fulltext and using the compress/uncompress options of FTS4 did not help, since the strings in each column are relatively small. Additionally, the performance overhead here was substantial even for a very fast compressor (LZ4).
  2. Most string data seems to be duplicated ~4 times: in datoms, in timelined_transactions, and in the indices idx_datoms_eavt and idx_datoms_aevt.
  3. During RustConf, @rnewman suggested that ultimately Mentat will likely not want to use SQLite, and will instead want to read chunks of datoms directly out of something like RKV. These chunks could be compressed more easily. This seems out of scope, as it would be a massive change to Mentat, but is worth writing down.
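The duplication described in note 2 can be illustrated with a toy database: an index that includes the value column stores its own copy of every string. The sketch below builds a simplified `datoms` table and measures file size before and after adding two indices. The table and index names come from the note above, but the column layout, the sample data, and the helper names are assumptions for illustration, not Mentat's actual schema.

```python
import os
import sqlite3
import tempfile


def db_bytes(conn):
    """Database size in bytes, computed from page_count * page_size."""
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


def measure_index_overhead(n_rows=5000):
    """Return (bytes_before, bytes_after) around creating two v-bearing indices."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "toy.db"))
        try:
            # Simplified, assumed layout: entity, attribute, value, transaction.
            conn.execute(
                "CREATE TABLE datoms (e INTEGER, a INTEGER, v TEXT, tx INTEGER)"
            )
            rows = [
                (i, 65, f"https://example.com/some/fairly/long/path/{i}", 1000 + i)
                for i in range(n_rows)
            ]
            conn.executemany("INSERT INTO datoms VALUES (?, ?, ?, ?)", rows)
            conn.commit()
            before = db_bytes(conn)

            # Both indices include v, so each one holds another copy of the strings.
            conn.execute("CREATE INDEX idx_datoms_eavt ON datoms (e, a, v)")
            conn.execute("CREATE INDEX idx_datoms_aevt ON datoms (a, e, v)")
            conn.commit()
            after = db_bytes(conn)
        finally:
            conn.close()
    return before, after
```

Running `measure_index_overhead()` on string-heavy rows like these should show the file growing noticeably once the indices exist, which is the same mechanism that multiplies string storage in the real database.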

Additional concerns exist around the fact that this problem may be exacerbated by materialized views. Perhaps #33 will help or prevent this?

@gburd gburd added the size label Aug 6, 2020