Commit 70b7e6e

docs wip
1 parent 9a7a199 commit 70b7e6e

4 files changed

+90
-0
lines changed

Diff for: .gitignore

+1
@@ -33,3 +33,4 @@ doc/*
/bazel-*

/.vscode/
+.DS_store

Diff for: docs/internals/COMPACTION.md

+89
@@ -0,0 +1,89 @@
# Ra log compaction

This is a living document capturing current work on log compaction.

## Overview


Compaction in Ra is intrinsically linked to the snapshotting
feature. Standard Raft snapshotting removes all entries in the Ra log
that precede the snapshot, where the snapshot is a full representation of
the state machine state.

### Ra Server log worker responsibilities

* Write checkpoints and snapshots
* Perform compaction runs
* Report segments to be deleted back to the Ra server (NB: the worker does
  not perform the segment deletion itself; it needs to report changes back to the
  Ra server first). The Ra server log worker maintains its own list of segments
  to avoid double processing.

```mermaid
sequenceDiagram
    participant segment-writer
    participant ra-server
    participant ra-server-log

    segment-writer--)ra-server: new segments
    ra-server-)+ra-server-log: new segments
    ra-server-log->>ra-server-log: phase 1 compaction
    ra-server-log-)-ra-server: segment changes (new, to be deleted)
    ra-server-)+ra-server-log: new snapshot
    ra-server-log->>ra-server-log: write snapshot
    ra-server-log->>ra-server-log: phase 1 compaction
    ra-server-log-)-ra-server: snapshot written, segment changes
```

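The exchange in the diagram can be illustrated with a small Python sketch. It is not Ra code (Ra is written in Erlang): `LogWorker`, `on_new_segments` and the `is_dead` predicate are hypothetical names, and the rule for deciding which segments are dead is left as an assumption supplied by the caller. It only shows the two properties stated above: the worker keeps its own list of segments so nothing is processed twice, and it reports deletions instead of performing them.

```python
from dataclasses import dataclass, field


@dataclass
class LogWorker:
    """Hypothetical stand-in for the Ra server log worker."""

    # Segments this worker has already processed (its "own list").
    seen: set = field(default_factory=set)

    def on_new_segments(self, segments, is_dead):
        """Process unseen segments and return the changes to report.

        `is_dead` is an assumed predicate deciding whether a segment holds
        no live entries; the real criteria are not specified in this document.
        """
        fresh = [s for s in segments if s not in self.seen]
        self.seen.update(fresh)
        # Nothing is deleted here: deletions are only reported back.
        return {
            "new": [s for s in fresh if not is_dead(s)],
            "to_delete": [s for s in fresh if is_dead(s)],
        }


worker = LogWorker()
first = worker.on_new_segments(
    ["001.segment", "002.segment"], lambda s: s == "001.segment"
)
# "002.segment" was already seen, so only "003.segment" is processed.
again = worker.on_new_segments(["002.segment", "003.segment"], lambda s: False)
```
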
### Log sections

#### Normal log section

The normal log section is the contiguous log that follows the last snapshot.

#### Compacting log section

The compacting log section consists of all live Raft indexes that are lower
than or equal to the index of the last snapshot taken.

![compaction](compaction1.jpg)

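As a rough illustration of the two sections, the following Python sketch splits a log into its compacting and normal parts. The function name `split_log_sections` and its parameters are made up for this example, and it assumes the set of live indexes is already known.

```python
def split_log_sections(live_indexes, snapshot_index, last_index):
    """Return (compacting, normal) index lists for a log."""
    # Compacting section: live indexes at or below the snapshot index.
    compacting = sorted(i for i in live_indexes if i <= snapshot_index)
    # Normal section: the contiguous log that follows the snapshot.
    normal = list(range(snapshot_index + 1, last_index + 1))
    return compacting, normal


compacting, normal = split_log_sections({2, 5, 9}, snapshot_index=9, last_index=12)
```

Here the compacting section is sparse (only indexes 2, 5 and 9 are still live), while the normal section is the contiguous range 10 to 12.
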
### Compacted segments: naming (phase 3 compaction)

Segment files in a Ra log have numeric names incremented as they are written.
This is essential as the order is required to ensure log integrity.

Desired properties of phase 3 compaction:

* Retain immutability: entries will never be deleted from a segment. Instead they
  will be written to a new segment.
* Lexicographic sorting of file names needs to be consistent with the order of writes
* Compaction walks from the oldest segment to the newest
* Easy to recover after unclean shutdown

Segments will be compacted when 2 or more adjacent segments fit into a single
segment.

The new segment will have the naming format `OLD-NEW.segment`.

This means that a single segment can only be compacted once, e.g.
`001.segment -> 001-001.segment`, as after this there is no new name available
and it has to wait until it can be compacted with the adjacent segment. Single
segment compaction could be optional and only triggered when a substantial share,
say 75% or more, of the entries / data can be deleted.

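The naming rule and the "2 or more adjacent segments fit into a single segment" condition can be sketched in Python as follows. This is only an illustration under assumptions: `parse_range`, `compacted_name` and `plan_groups` are hypothetical helpers, segment size is modelled as a count of live entries, and `capacity` stands for however many entries one segment can hold. The greedy oldest-first grouping is one possible strategy, not necessarily the one Ra will use.

```python
def parse_range(name):
    """Return (old, new) number strings from '001.segment' or '002-004.segment'."""
    stem = name.rsplit(".", 1)[0]
    old, _, new = stem.partition("-")
    return old, (new or old)


def compacted_name(group, ext="segment"):
    """Name covering an ordered group of adjacent segments: OLD-NEW.ext."""
    old, _ = parse_range(group[0])
    _, new = parse_range(group[-1])
    return f"{old}-{new}.{ext}"


def plan_groups(segments, capacity):
    """Greedily group adjacent segments, walking from oldest to newest.

    `segments` is a list of (name, live_entries) pairs in write order.
    """
    groups, current, used = [], [], 0
    for name, live in segments:
        if current and used + live > capacity:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += live
    if current:
        groups.append(current)
    # Only groups of two or more segments qualify for merging.
    return [g for g in groups if len(g) >= 2]


segments = [
    ("001.segment", 30),
    ("002.segment", 40),
    ("003.segment", 95),
    ("004.segment", 10),
]
names = [compacted_name(g) for g in plan_groups(segments, capacity=100)]
```

With these numbers only `001.segment` and `002.segment` fit together, so `names` holds the single entry `001-002.segment`; `003.segment` and `004.segment` stay as they are.
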
This naming format means it is easy to identify dead segments after an unclean
exit.

During compaction a different extension will be used: `002-004.compacting`.
After an unclean shutdown any such files will be removed. Once synced, the file
will be renamed to `.segment` and, some time later (once the Ra server has
updated its list of segments), the source files will be deleted.

![segments](compaction2.jpg)

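A minimal Python sketch of how recovery after an unclean shutdown could use this naming scheme is shown below. The helper names `parse` and `recover` are hypothetical, and the rule used to decide that a segment is dead (another compacted `OLD-NEW.segment` file covers its range) is an assumption made for illustration; the document does not spell out the exact rule.

```python
def parse(name):
    """Split 'OLD-NEW.ext' or 'N.ext' into (old, new, ext, is_compacted)."""
    stem, ext = name.rsplit(".", 1)
    old, sep, new = stem.partition("-")
    return int(old), int(new or old), ext, bool(sep)


def recover(files):
    """Return (to_remove, live) for a segment directory listing."""
    # Unfinished compaction output is always discarded.
    partial = [f for f in files if parse(f)[2] == "compacting"]
    segments = [f for f in files if parse(f)[2] == "segment"]
    dead = []
    for a in segments:
        a_old, a_new, _, a_comp = parse(a)
        for b in segments:
            b_old, b_new, _, b_comp = parse(b)
            covers = b_old <= a_old and a_new <= b_new
            wider = (b_old, b_new) != (a_old, a_new)
            # Assumed rule: a compacted file supersedes anything it covers.
            if b != a and b_comp and covers and (wider or not a_comp):
                dead.append(a)
                break
    live = sorted(f for f in segments if f not in dead)
    return sorted(partial + dead), live


to_remove, live = recover([
    "001.segment",
    "002.segment",
    "001-002.segment",
    "003.segment",
    "003-004.compacting",
])
```

In this listing the compaction of segments 1 and 2 completed but the sources were not yet deleted, while the compaction of segments 3 and 4 was interrupted, so the sketch keeps `001-002.segment` and `003.segment` and marks the rest for removal.
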

Diff for: docs/internals/compaction1.jpg

35.8 KB

Diff for: docs/internals/compaction2.jpg

52 KB
