-
Notifications
You must be signed in to change notification settings - Fork 71
Revision Trees
This is an internal design doc. You don't need to know any of this unless you work on LiteCore, and even then only if you need to know about the details of document storage. If you just want to inspect a database, use the 'cblite' command-line tool.
LiteCore stores each document as a revision tree, much like a version control system (although LiteCore is not a version control system.) Normally this tree is linear, with the current revision its leaf and a history of ancestor revisions below it. If there are conflicts, the tree will have branches, with each leaf being a conflicting revision.
Each document’s revision tree is self-contained within the body
column of a single SQLite row in the kv_default
table. Couchbase Lite 1.x’s practice of storing revisions as separate rows turned out to be too expensive.
This document describes LiteCore’s representation of revisions and trees as C++ objects, beneath the C API layer.
Unlike a version control system like git or SVN, LiteCore
- does not promise to save the contents of non-leaf revisions
- does not promise to save the entire tree back to the original creation of the document.
It only saves this data for as long as it's deemed to be needed for conflict resolution during replication.
A revision is represented in memory by a Rev
object. It consists of:
- revision ID: Identifies the revision uniquely in the document. It consists of a generation number (height in the tree, with the root at 1) and a digest (opaque unique data), and is expressed as the generation number in decimal, a hyphen, and the digest in lowercase hexadecimal. Revision IDs are generated by LiteCore as revisions are created.
- sequence: The "time" (according to the database's sequence counter) at which this revision was added to the database.
- flags: A set of boolean attributes (see below.)
- parent: The parent revision, or null if this is a root. (There may be multiple roots.)
- body: The actual contents of the document, or null if no longer available. The rev tree code doesn't care what's in this; higher layers interpret it as Fleece data.
A note on sequences: there’s a chicken-and-egg problem with storing the
sequence
property of a newly-added revision, since the sequence isn’t known until the document is saved. As a workaround, new revisions are saved with a zero sequence. When the revisions are next read from storage, the zero is replaced with the document’s current sequence. (This is done by theVersionedDocument
class; see below.)
- deleted: This is a deletion, also called a tombstone. Deleting a document actually just adds a tombstone to it, so that the tombstone can be replicated.
- leaf: This is a leaf revision (no children.)
- hasAttachments: The body of the revision contains references to one or more blobs. (This flag is used in blob garbage collection.)
- keepBody: The body should be preserved. Otherwise it will be removed when this revision stops being a leaf.
- isConflict: This revision is part of an unresolved conflict pulled in by the replicator.
- closed: Attached to a tombstone; indicates that it represents a branch that has been closed off, not a regular deletion.
Two other flags are used internally by the RevTree
class and are not persisted:
- new: Newly-created, not saved yet.
- purge: Marked for removal before saving.
Once created, a revision’s revID and sequence are immutable, as are the deleted
and closed
flags. The body is immutable but may be removed.
The parent
property and leaf
flag may be cleared as a side effect of other revisions being added or deleted. The hasAttachments
flag is cleared when the body is removed. The replicator will clear the keepBody
flag, and resolving a conflict will alter isConflict
.
Finally, the revision itself can be removed if it gets too old (pruning) or on demand (purging).
Class RevTree
manages a tree of revisions. (LiteCore doesn't instantiate it directly, only its subclass VersionedDocument
.)
For efficiency, RevTree maintains the Rev objects in a vector, which is the same order in which they're stored in serialized form in the database. They’re sorted by descending priority, where the current revision has the highest priority and always appears at index 0. The ordering is defined by the function compareRevs
; the rules are:
- Leaf revisions go before non-leaves
- Conflicting revisions go after non-conflicting ones
- Deletions go after non-deletions, and those that are closed go after normal deletions
- Otherwise, a revision with a higher generation goes first, and within a generation, a revision with a lexicographically-higher digest goes first.
(In practice, a total ordering isn't necessary; it's only important that the leaves come first, and that the first of them is the current revision.)
The RevTree also keeps track of which revision is the last-known current revision on each remote database that the local database has replicated with. This is needed for the “no-conflicts” mode of replication, where a document being pushed must identify its remote ancestor.
The remote-revision property is logically a flag on a revision, but there can be an arbitrary number of these flags — one per remote database — so instead it’s represented as a mapping from remote IDs (small integers) to Revs. The Database class in turn manages a mapping from remote database URLs to these remote IDs.
A Rev that is a remote revision will never be pruned, since the replicator needs to know its ID. Instead, revisions above it will be pruned. This leads to gaps in generations: the parent of a generation-n Rev may have a generation less than n-1. This shouldn’t be a problem.
A RevTree
has two layers of revision storage:
`std::vector\<Rev*\> _revs; // Revs in sorted order
std::deque\<Rev\> _revsStorage; // Actual storage of the Rev objects
The Rev objects actually live in
_revsStorage, which is unsorted. The current sorted order is represented by
_revs`, which stores only pointers. This makes sorting faster because only pointers need to be swapped.
(Why is _revsStorage
a deque
, not the more common vector
? Because growing a vector
invalidates pointers to values in it, so if it were used, adding a new revision to _revsStorage
might cause all the pointers in _revs
to become garbage. deque
does not have this problem.)
A RevTree is persisted in a custom binary format. The class RawRevision
manages the encoding and decoding of Rev
s.
In the encoded form, the revisions are stored consecutively in in the same sorted order as above. Each begins with a two-byte length field; the end of the revisions is signaled by a zero length.
RevTree
itself has no knowledge of databases or even documents. Its subclass VersionedDocument
adds that integration. A VersionedDocument
belongs to a KeyStore
, knows its document-ID, and manages the corresponding Record
, including reading and writing it from the database as well as updating its metadata.
The revision tree the oldest and most technically-indebted design in LiteCore, dating back to 2011. The code has been reworked several times but is still based on that old design.
One of its problems is that it’s still somewhat expensive (at the micro level) and complex to load a document. The data has to be parsed and reallocated as Rev objects, then rewritten on save. This is a noticeable hot-spot in profiling queries, and it’s been the source of many bugs over the years.
A better design would be to store the entire rev tree as one Fleece array, with each revision an object in the array. ThenRev
would be a lightweight wrapper around a revision pointer.