Twom database format and first implementation #5157

Open: wants to merge 9 commits into base: master


brong (Member) commented Dec 5, 2024

Throwing this open for wide review now! It's been rewritten a BUNCH of times and optimised a fair bit. It has a simpler internal structure, and I think basically everything is now aligned with the right data structures. It does MVCC reads nicely, and can keep its transaction in foreach unless the callback writes, in which case it STILL does the right thing: it re-locks and re-finds itself.

On disk format of records is:
[image: diagram of the on-disk record format]

lib/cyrusdb_twom.c (outdated, resolved)

rjbs (Collaborator) commented Dec 16, 2024

I've removed include-in-fm while tests are failing.

@brong brong force-pushed the twom branch 4 times, most recently from 7d3b303 to 5ffbf60 Compare January 5, 2025 11:37
@brong brong requested review from dilyanpalauzov, robmueller, rjbs, elliefm, wolfsage and rsto and removed request for dilyanpalauzov January 5, 2025 11:41
@brong brong force-pushed the twom branch 2 times, most recently from 981f5c7 to a0615b2 Compare January 5, 2025 12:14
brong (Member, Author) commented Jan 5, 2025

Bah, CI failed because errno.h is required. I took it out because it's not needed on my machine and I was trying to trim back all the required headers.

@brong brong requested a review from dilyanpalauzov January 5, 2025 12:33
elliefm (Contributor) left a comment

Haven't finished looking at this yet, but need to stop for lunch, so here's what I have so far.

imap/hammer_cyrusdb.c (outdated, resolved)
imap/hammer_cyrusdb.c (outdated, resolved)
lib/imapoptions (outdated, resolved)
lib/imapoptions (outdated, resolved)
lib/cyrusdb_twom.c (outdated, resolved)
lib/cyrusdb_twom.c (outdated, resolved)

brong (Member, Author) commented Jan 16, 2025

OK, I've rewritten this so that a DELETE is just an ancestor pointer in the level0 chain (no key or value), and the most recent ADD or REPLACE is always in the full multi-level links, with the DELETE, if present, sitting immediately before it.

This added a 'deleted_offset' field to struct tm_loc and a bunch of extra logic (slightly more code overall), partly because I just open-coded it everywhere. It could do with some refactoring, but it's working and the structure is what I want, so I figured it's time to ask for reviews again!

I also wrote up a bunch of extra doc about it which I'll pop in the repo.

docsrc/imap/concepts/deployment/databases.rst (outdated, resolved)
docsrc/imap/concepts/deployment/databases.rst (outdated, resolved)
return r;
}

int twom_db_fetch(struct twom_db *db,
Contributor

Can the public functions in this file be marked as HIDDEN, i.e. __attribute__((__visibility__("hidden"))), as they are not called from outside libcyrus.so?

Member Author

The intention is to have this as a completely separate library eventually, though the code is in Cyrus, so I don't want to do this.
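
For readers unfamiliar with the attribute discussed in this thread, here is a minimal sketch of what symbol visibility does with GCC or Clang. The macro and function names (SKETCH_HIDDEN, SKETCH_EXPORT, sketch_internal_helper, sketch_public_api) are made up for illustration and are not taken from the Cyrus or twom sources.

```c
/* ASSUMPTION: all names here are hypothetical, not from Cyrus or twom. */
#define SKETCH_HIDDEN __attribute__((__visibility__("hidden")))
#define SKETCH_EXPORT __attribute__((__visibility__("default")))

/* Hidden: callable from anywhere inside the same shared object,
 * but not exported in its dynamic symbol table. */
SKETCH_HIDDEN int sketch_internal_helper(int x)
{
    return x * 2;
}

/* Default visibility: exported, so code linking against the shared
 * object can call it. A library meant to stand alone needs its
 * public API to stay like this. */
SKETCH_EXPORT int sketch_public_api(int x)
{
    return sketch_internal_helper(x) + 1;
}
```

Hiding the twom functions would stop other libraries or programs from linking to them, which is why the reply above declines to do it for code that is meant to become a separate library.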

brong (Member, Author) commented Jan 17, 2025

Based on robmueller's comments elsewhere, I have implemented two-stage locking and separate repack locking (the latter also based on wolfsage's review above about the repack).

@brong brong force-pushed the twom branch 5 times, most recently from 471dfc1 to a02584a Compare January 17, 2025 13:51
docsrc/imap/concepts/deployment/databases.rst (outdated, resolved)
lib/cyrusdb_twom.c (outdated, resolved)
lib/twom.c (resolved)
brong added 2 commits January 18, 2025 15:16
This contains the full implementation of the xxhash algorithm.  It's
only being used by twom database format, so it's being compiled straight
into that file.

This is an unmodified copy of xxhash.h from:

commit dd11140c2dc5d53a3c0a949d67af7f40f546878e

in the repository at git@github.com:Cyan4973/xxHash.git
brong added 7 commits January 18, 2025 15:24
A cyrusdb wrapper around twom library
see: https://stackoverflow.com/questions/27625597/how-to-implement-a-writer-preferring-read-write-lock-for-nix-processes

Two-stage locking:
1) "headlock" - locks the first 16 bytes of the file (the twom magic)
2) "datalock" - locks the DUMMY record

And there's also a third lock:

3) "repacklock" - locks the GENERATION field (8 bytes at offset 40)

I'll document this in a separate commit
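
Until that documentation lands, the scheme above can be sketched with POSIX fcntl() byte-range locks. This is a rough illustration, not the code in lib/twom.c. The head and repack ranges come from the commit message; the DUMMY record's offset and length, the lock ordering (borrowed from the general writer-preferring pattern in the linked StackOverflow question), the exclusive mode of the repack lock, and every sketch_* name are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Byte ranges named in the commit message. */
#define SKETCH_HEAD_OFFSET    0   /* "headlock": first 16 bytes (the magic) */
#define SKETCH_HEAD_LEN      16
#define SKETCH_REPACK_OFFSET 40   /* "repacklock": 8-byte GENERATION field */
#define SKETCH_REPACK_LEN     8
/* ASSUMPTION: the DUMMY record's position and size are not stated in
 * this thread; these two values are placeholders. */
#define SKETCH_DATA_OFFSET   64
#define SKETCH_DATA_LEN       8

/* Apply F_RDLCK, F_WRLCK or F_UNLCK to one byte range, blocking
 * until granted and retrying if a signal interrupts the wait. */
static int sketch_range_lock(int fd, short type, off_t offset, off_t len)
{
    struct flock fl = {0};
    int r;

    assert(len > 0);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = len;

    do {
        r = fcntl(fd, F_SETLKW, &fl);
    } while (r == -1 && errno == EINTR);
    return r;
}

/* Reader: pass through the head lock, then hold the data lock shared. */
int sketch_read_lock(int fd)
{
    int r;

    if (sketch_range_lock(fd, F_RDLCK, SKETCH_HEAD_OFFSET, SKETCH_HEAD_LEN))
        return -1;
    r = sketch_range_lock(fd, F_RDLCK, SKETCH_DATA_OFFSET, SKETCH_DATA_LEN);
    /* Drop the head lock either way so a waiting writer can take it. */
    sketch_range_lock(fd, F_UNLCK, SKETCH_HEAD_OFFSET, SKETCH_HEAD_LEN);
    return r;
}

/* Writer: take the head lock exclusively (new readers now queue behind
 * it), then wait for existing readers to release the data lock. */
int sketch_write_lock(int fd)
{
    if (sketch_range_lock(fd, F_WRLCK, SKETCH_HEAD_OFFSET, SKETCH_HEAD_LEN))
        return -1;
    if (sketch_range_lock(fd, F_WRLCK, SKETCH_DATA_OFFSET, SKETCH_DATA_LEN)) {
        sketch_range_lock(fd, F_UNLCK, SKETCH_HEAD_OFFSET, SKETCH_HEAD_LEN);
        return -1;
    }
    return 0;
}

/* ASSUMPTION: the repack lock is modelled here as a plain exclusive lock. */
int sketch_repack_lock(int fd)
{
    return sketch_range_lock(fd, F_WRLCK, SKETCH_REPACK_OFFSET, SKETCH_REPACK_LEN);
}

/* Release the head and data ranges; unlocking an unheld range is harmless. */
int sketch_unlock(int fd)
{
    int r1 = sketch_range_lock(fd, F_UNLCK, SKETCH_DATA_OFFSET, SKETCH_DATA_LEN);
    int r2 = sketch_range_lock(fd, F_UNLCK, SKETCH_HEAD_OFFSET, SKETCH_HEAD_LEN);
    return (r1 || r2) ? -1 : 0;
}
```

Note that fcntl() locks belong to the process, so these calls only contend across separate processes. Within a single process a second lock on the same range simply replaces the first.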