| 1 | +# Ra log compaction |
| 2 | + |
| 3 | +This is a living document capturing current work on log compaction. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | + |
| 8 | +Compaction in Ra is intrinsically linked to the snapshotting |
| 9 | +feature. Standard Raft snapshotting removes all entries in the Ra log |
| 10 | +that precede the snapshot, where the snapshot is a full representation of |
| 11 | +the state machine state. |
| 12 | + |
| 13 | + |
| 14 | +### Ra Server log worker responsibilities |
| 15 | + |
| 16 | +* Write checkpoints and snapshots |
| 17 | +* Perform compaction runs |
| 18 | +* Report segments to be deleted back to the Ra server (NB: the worker does |
| 19 | +not perform the segment deletion itself; it needs to report changes back to the |
| 20 | +Ra server first). The Ra server log worker maintains its own list of segments |
| 21 | +to avoid double processing. |
| 22 | + |
| 23 | + |
| 24 | +```mermaid |
| 25 | +sequenceDiagram |
| 26 | + participant segment-writer |
| 27 | + participant ra-server |
| 28 | + participant ra-server-log |
| 29 | + |
| 30 | + segment-writer--)ra-server: new segments |
| 31 | + ra-server-)+ra-server-log: new segments |
| 32 | + ra-server-log->>ra-server-log: phase 1 compaction |
| 33 | + ra-server-log-)-ra-server: segment changes (new, to be deleted) |
| 34 | + ra-server-)+ra-server-log: new snapshot |
| 35 | + ra-server-log->>ra-server-log: write snapshot |
| 36 | + ra-server-log->>ra-server-log: phase 1 compaction |
| 37 | + ra-server-log-)-ra-server: snapshot written, segment changes |
| 38 | +``` |
| 39 | + |
| 40 | +### Log sections |
| 41 | + |
| 42 | +#### Normal log section |
| 43 | + |
| 44 | +The normal log section is the contiguous log that follows the last snapshot. |
| 45 | + |
| 46 | +#### Compacting log section |
| 47 | + |
| 48 | +The compacting log section consists of all live Raft indexes that are lower |
| 49 | +than or equal to the index of the last snapshot taken. |
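
As a minimal sketch of the two definitions above: the function below splits a log into its compacting and normal sections. The function name and its parameters (`snapshot_index`, `last_index`, `live_indexes`) are hypothetical and chosen only for illustration; they are not part of Ra's API.

```python
def log_sections(snapshot_index, last_index, live_indexes):
    """Split a log into (compacting, normal) sections.

    Assumption: `live_indexes` is any iterable of indexes still
    considered live, and `last_index` is the last index written.
    """
    # Compacting section: live indexes at or below the last snapshot.
    compacting = sorted(i for i in live_indexes if i <= snapshot_index)
    # Normal section: the contiguous log that follows the snapshot.
    normal = range(snapshot_index + 1, last_index + 1)
    return compacting, normal


compacting, normal = log_sections(10, 13, {2, 7, 10})
```

Here the snapshot was taken at index 10, so the live indexes 2, 7 and 10 form the compacting section and indexes 11 to 13 form the normal section.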
| 50 | + |
| 51 | + |
| 52 | + |
| 53 | +### Compacted segments: naming (phase 3 compaction) |
| 54 | + |
| 55 | +Segment files in a Ra log have numeric names incremented as they are written. |
| 56 | +This is essential as the order is required to ensure log integrity. |
| 57 | + |
| 58 | +Desired properties of phase 3 compaction: |
| 59 | + |
| 60 | +* Retain immutability: entries will never be deleted from a segment. Instead they |
| 61 | +will be written to a new segment. |
| 62 | +* Lexicographic sorting of file names needs to be consistent with the order of writes. |
| 63 | +* Compaction walks from old segments to new. |
| 64 | +* Easy to recover after an unclean shutdown. |
| 65 | + |
| 66 | +Segments will be compacted when 2 or more adjacent segments fit into a single |
| 67 | +segment. |
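
One way to picture the "two or more adjacent segments fit into a single segment" rule is a greedy walk from old to new. This is only a sketch under stated assumptions: "fit" is measured here by live entry count against a hypothetical `max_entries` capacity, and the greedy grouping strategy is an illustration rather than the planned implementation.

```python
def plan_compaction_groups(segments, max_entries):
    """Group adjacent segments whose live entries fit in one segment.

    `segments` is a list of (name, live_entry_count) pairs, oldest
    first. Both the pair layout and `max_entries` are assumptions.
    Only groups of two or more segments are returned.
    """
    groups = []
    current, total = [], 0
    for name, live in segments:
        # Close the current group when the next segment would not fit.
        if current and total + live > max_entries:
            if len(current) >= 2:
                groups.append(current)
            current, total = [], 0
        current.append(name)
        total += live
    if len(current) >= 2:
        groups.append(current)
    return groups


groups = plan_compaction_groups(
    [("001", 10), ("002", 20), ("003", 90), ("004", 30), ("005", 40)],
    max_entries=100,
)
```

In this example `001` and `002` are grouped, `003` is left alone because it fits with neither neighbour, and `004` and `005` form a second group.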
| 68 | + |
| 69 | +The new segment will have the naming format `OLD-NEW.segment`. |
| 70 | + |
| 71 | +This means that a single segment can only be compacted on its own once, e.g. |
| 72 | +`001.segment -> 001-001.segment`, as after this there is no new name available |
| 73 | +and it has to wait until it can be compacted with an adjacent segment. Single |
| 74 | +segment compaction could be optional and only triggered when a substantial share, |
| 75 | +say 75% or more, of the entries / data can be deleted. |
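
The naming scheme can be sketched as follows. The helper names are hypothetical, and the sketch assumes that `OLD` is taken from the oldest source segment and `NEW` from the newest, which matches the `001-001` and `002-004` examples in this document but is not spelled out in it. It also assumes that an already compacted segment contributes its own `OLD` or `NEW` part when merged with a neighbour.

```python
def segment_range(file_name):
    """Return the (old, new) name parts of a segment file.

    '001.segment' yields ('001', '001');
    '002-004.segment' yields ('002', '004').
    """
    stem, _, _ext = file_name.rpartition(".")
    old, sep, new = stem.partition("-")
    return (old, new) if sep else (old, old)


def compacted_name(source_files, extension="segment"):
    """Name for the output of compacting `source_files` (oldest first)."""
    old, _ = segment_range(source_files[0])
    _, new = segment_range(source_files[-1])
    return f"{old}-{new}.{extension}"


single = compacted_name(["001.segment"])
in_progress = compacted_name(
    ["002.segment", "003.segment", "004.segment"], extension="compacting"
)
merged = compacted_name(["001-001.segment", "002.segment"])
```

Because the output name is derived only from the first and last source names, a segment that has already been compacted on its own (`001-001.segment`) gets a new name only once it is merged with a neighbour.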
| 76 | + |
| 77 | +This naming format means it is easy to identify dead segments after an unclean |
| 78 | +exit. |
| 79 | + |
| 80 | +During compaction a different extension will be used, e.g. `002-004.compacting`; |
| 81 | +after an unclean shutdown any such files will be removed. Once synced, the file is |
| 82 | +renamed to `.segment`, and some time later the source files are deleted (once |
| 83 | +the Ra server has updated its list of segments). |
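
The recovery step after an unclean shutdown could look roughly like the sketch below. It works on a plain list of file names rather than touching the file system, and every name in it is hypothetical. It assumes numeric name parts, and it assumes that a segment is dead when a compacted `.segment` file covers its range; the document does not state the exact rule, so treat this as one possible reading.

```python
def parse(file_name):
    """Return (old, new, is_compacted) for a segment file name."""
    stem, _, _ext = file_name.rpartition(".")
    old, sep, new = stem.partition("-")
    return int(old), int(new if sep else old), bool(sep)


def recovery_plan(file_names):
    """Return (partial_files, dead_segments) for a segment directory."""
    # Unfinished compaction output is always discarded.
    partial = sorted(f for f in file_names if f.endswith(".compacting"))
    segments = [(f, *parse(f)) for f in file_names if f.endswith(".segment")]
    dead = []
    for name, old, new, compacted in segments:
        for other, o_old, o_new, o_compacted in segments:
            if other == name or not o_compacted:
                continue
            covers = o_old <= old and new <= o_new
            wider = (o_old, o_new) != (old, new)
            # Dead if a compacted file covers a wider range, or the
            # same range while this file is still the plain original.
            if covers and (wider or not compacted):
                dead.append(name)
                break
    return partial, sorted(dead)


partial, dead = recovery_plan([
    "001.segment", "001-001.segment", "002.segment", "003.segment",
    "004.segment", "002-004.compacting", "005.segment",
])
```

In this example `002-004.compacting` is an unfinished compaction and would be removed, while `001.segment` is superseded by `001-001.segment`. The sketch only identifies candidates; per the text above, actual deletion waits until the Ra server has updated its list of segments.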
| 84 | + |
| 85 | + |
| 86 | + |
| 87 | + |
| 88 | + |
| 89 | + |