Implement concurrent compaction #16

Open
rmind opened this issue Oct 8, 2022 · 0 comments
Labels: enhancement (New feature or request), optimization (Better performance or resource utilization)

Currently, the index structures are generally append-only: document removal marks entries with tombstones (special markers) instead of reclaiming their space. Many deletions therefore leave gaps in the index files, which wastes space. We want to address this problem by implementing compaction.
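
To make the space problem concrete, here is a minimal in-memory sketch in C of tombstoned records and a compaction pass that copies only the live ones. The `rec_t` layout and the `count_dead` and `compact` helpers are hypothetical and only illustrate the idea; they are not the project's actual index format or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size index record; the real on-disk format differs. */
typedef struct {
    uint64_t doc_id;
    bool     tombstone;  /* set on removal instead of reclaiming the slot */
} rec_t;

/* Count the slots that are wasted by tombstones. */
static size_t
count_dead(const rec_t *recs, size_t n)
{
    size_t dead = 0;

    for (size_t i = 0; i < n; i++) {
        dead += recs[i].tombstone ? 1 : 0;
    }
    return dead;
}

/*
 * Copy only the live records into dst, preserving their order.
 * dst must have room for at least n records.
 * Returns the number of records written.
 */
static size_t
compact(const rec_t *src, size_t n, rec_t *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!src[i].tombstone) {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

A compaction would only be worth triggering once `count_dead` reports a meaningful fraction of the records; the threshold is a policy decision that this issue does not specify.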

The following is a proposal for a concurrent compaction algorithm:

// Create the compacted index exclusively and hold its lock until it is published
new-dtmap = exclusive-open-create
lock new-dtmap

// Initial concurrent sync (captures most of the data)
dtmap-sync new-dtmap from current-dtmap

lock current-dtmap
  // Sync any remaining data with the lock held, in case writers raced with the initial sync
  dtmap-sync new-dtmap from current-dtmap

  // Make the new index globally visible
  atomic-posix-rename new-dtmap.filename to current-dtmap.filename

  // Publish the compaction offset for the active index references
  atomic-store-release current-dtmap.compaction-offset = last offset in new-dtmap
unlock current-dtmap
unlock new-dtmap
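
The publish step at the end of the proposal could look roughly like the following C sketch. It assumes C11 atomics and the standard `rename()` call, which on POSIX systems replaces the destination atomically. The `dtmap_hdr_t` type and the `publish_compaction` and `load_compaction_offset` functions are hypothetical names for illustration, and the caller is assumed to hold both locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical shared header, as it might sit in the mapped current index. */
typedef struct {
    _Atomic uint64_t compaction_offset;  /* 0: no compaction published */
} dtmap_hdr_t;

/*
 * Publish a compacted index: rename the new file over the current one,
 * then store the offset with release ordering, so that a reader which
 * acquire-loads the new offset also sees the memory writes made before it.
 * Returns 0 on success and -1 on failure.
 */
static int
publish_compaction(dtmap_hdr_t *cur, const char *new_path,
    const char *cur_path, uint64_t last_offset)
{
    assert(cur != NULL);
    if (rename(new_path, cur_path) != 0) {
        return -1;  /* nothing has been published */
    }
    atomic_store_explicit(&cur->compaction_offset, last_offset,
        memory_order_release);
    return 0;
}

/* Reader side: the acquire-load pairs with the release-store above. */
static uint64_t
load_compaction_offset(dtmap_hdr_t *cur)
{
    return atomic_load_explicit(&cur->compaction_offset,
        memory_order_acquire);
}
```

The offset is stored only after the rename succeeds, so a reader that observes the new offset will find the compacted file when it re-opens the index by name.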
• Active index references check for a change in compaction-offset. On a change, they re-open the index (picking up the new file) and sync from this offset. Existing references into the memory-mapped file itself, primarily idxdoc_t::offset, would need to be adjusted with a sequential scan.
• The next compaction may be triggered only once all active index references have synced. Index deletion and re-creation should be handled separately, using a timestamp or generation number, to avoid the ABA problem.
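
The reader-side check described in the points above might be sketched in C as follows. This is a simplified model under stated assumptions: the `idx_shared_t` and `idx_ref_t` types, the `generation` field, and `idx_ref_check` are all hypothetical, and a real implementation would also need to re-open and re-map the file and to coordinate the two loads with the index locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared state published by the compactor. */
typedef struct {
    _Atomic uint64_t generation;         /* bumped on index re-creation */
    _Atomic uint64_t compaction_offset;  /* set when a compaction is published */
} idx_shared_t;

/* Hypothetical per-reference state: what this reference last synced to. */
typedef struct {
    uint64_t seen_generation;
    uint64_t seen_offset;
} idx_ref_t;

typedef enum { IDX_UP_TO_DATE, IDX_RESYNC, IDX_RECREATED } idx_action_t;

/* Decide what an active index reference has to do before its next use. */
static idx_action_t
idx_ref_check(idx_ref_t *ref, idx_shared_t *sh)
{
    uint64_t gen = atomic_load_explicit(&sh->generation,
        memory_order_acquire);
    uint64_t off = atomic_load_explicit(&sh->compaction_offset,
        memory_order_acquire);

    if (gen != ref->seen_generation) {
        /*
         * The index was deleted and re-created.  An equal offset would
         * be a coincidence (the ABA case), so the generation decides.
         */
        ref->seen_generation = gen;
        ref->seen_offset = off;
        return IDX_RECREATED;
    }
    if (off != ref->seen_offset) {
        /* A compaction was published: re-open and sync from 'off'. */
        ref->seen_offset = off;
        return IDX_RESYNC;
    }
    return IDX_UP_TO_DATE;
}
```

Comparing the generation first means that a re-created index whose offset happens to match the old one is still detected, which an offset-only comparison would miss.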
rmind added the enhancement label on Oct 18, 2022
rmind self-assigned this on Oct 18, 2022
rmind added the optimization label on Mar 10, 2023