-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathgarbage_collection.txt
28 lines (15 loc) · 1.04 KB
/
garbage_collection.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
1. We need to keep two pointers, head and tail
2. Only the part of the vLog between the head and the tail contains valid values
and will be searched during lookups.
3. Read some chunks of key-value pairs(Several Megabytes) from the tail to head of the vLog
4. Find which of those values are valid (not yet overwritten or deleted) by querying the LSM-tree.
5. Append valid values back to the head of the vLog.
6. Free the space occupied previously by the chunk (using the punch hole feature in rust)
7. Update tail accordingly
NOTE: To avoid losing any data if a crash happens during garbage collection
- The newly appended valid values and the new tail are persisted on the device before freeing space
- This is achieved by calling fsync()/sync_all() in Rust
- It adds these new addresses and current tail to the LSM Tree in a synchronous manner
- The tail is stored in LSM Tree as <"tail", tail-vLog-offset">
- Finally, the free space in the vLog is claimed
- Schedular is configured to run garbage collection periodically