Skip to content

Commit

Permalink
Add Internals documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
BenedekThaler authored and erenon committed Feb 28, 2020
1 parent 06f5573 commit a900387
Show file tree
Hide file tree
Showing 3 changed files with 173 additions and 1 deletion.
2 changes: 1 addition & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -184,7 +184,7 @@ add_executable(brecovery
# Documentation
#---------------------------

markdown_to_html_group(Documentation UserGuide Mserialize)
markdown_to_html_group(Documentation UserGuide Internals Mserialize)

#---------------------------
# Examples
Expand Down
171 changes: 171 additions & 0 deletions doc/Internals.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
Title: Internals

Explaining how Binlog works, and why it is so fast.

[TOC]

# Introduction

The design goal of Binlog is correctness, ease of use and high performance.
Flexibility and ease of change is preferred over a large number of features.

Ease of use is achieved by supporting the logging of a [wide variety of types][logging] out of the box,
while also allowing simple adaption of custom [enums][adapt-enum] and [structures][adapt-struct].

Regular, text based logfiles are redundant, expensive to produce, not trivial to parse - but easy to use.
Binlog solves these issues by using binary logs, at the cost of adding an extra binary-to-text step
to the reading of the logs.

The binary log reduces redundancy by separating metadata (severity, category, format string, function, file, line)
from data (timestamp and arguments). The performance is improved by not converting binary values to text
in the application, only later, while reading the logs. Given the well defined format of the binary logfile,
it is more efficient and less error prone to parse programmatically.
The serialized format of the [adapted structures][adapt-struct] integrates into this format, further strengthening
the typed contract between the binary log writer and reader.

[logging]: UserGuide.html#logging
[adapt-enum]: UserGuide.html#logging-enums
[adapt-struct]: UserGuide.html#logging-user-defined-structures

# Glossary

* **Entry**: A size prefixed binary blob. See Entries.hpp
* **Event**: A kind of **Entry**, that corresponds to one log line in a text log
* **EventSource**: A piece of code that produces **Event**s, can created by macros, e.g: `BINLOG_INFO`
* **Session**: A concurrently writable and readable log stream
* **SessionWriter**: A (non-concurrently) writable handle of a **Session** for adding **Event**s
* **Producer**: An actor (thread, coroutine, fiber, etc.) that uses a **SessionWriter** to produce **Event**s
* **OutputStream**: The destination of metadata and data consumed from a **Session**
* **Consumer**: An actor that moves **Entries** from a **Session** to an **OutputStream**

# Architecture

+---------------+ addEvent
| SessionWriter |----------->( Channel )----+
+---------------+ |
|
+---------------+ ( Channel ) | +---------+ consume +--------------+
| SessionWriter |----------->( Channel )----+-->| Session |---------->| OutputStream |
+---------------+ | +---------+ +--------------+
| | ^
| ( Channel )----+ |
| |
+---------------------------------------------+
addEventSource

The data flow of Binlog entries in a simple application is shown above.
Each Producer owns a SessionWriter. Through the SessionWriter, Producers
add EventSources (metadata) directly to the Session, and write Events (data) to a single-producer,
single-consumer queue, wrapped by a Channel. Every Channel is owned by a Session.
The Consumer reads the metadata from the Session, and the data from the Channels of the Session,
and writes them to the OutputStream. The Producers ands the Consumer can be any user defined actor
(thread, coroutine, fiber, etc.) - there's no hidden actor run by the Binlog library.

If a SessionWriter fills up the queue of its Channel, by default, it allocates a new Channel,
and closes the old one (see the second SessionWriter above).
The Consumer, after fully consuming the closed Channel, will deallocate it.

When a SessionWriter is destroyed, the owned Channel becomes closed.
Like before, the Consumer, after fully consuming the closed Channel, will deallocate it.
On the diagram above, such an orphan Channel can be seen, waiting to be fully consumed
and deallocated.

The asynchronous logging (i.e: SessionWriter does write OutputStream directly, but through the Channel)
saves cycles for the Producer (the hot path) to reduce latency. The cost of asynchronous logging is manifold:
First, Consumer task must be running periodically. Second, to restore the order of concurrently added
Events, [bread][] must do extra work. Third, if the application crashes, the remaining Events in the Channels
must be [recovered from the core dump][brecovery].

[bread]: UserGuide.html#bread
[brecovery]: UserGuide.html#brecovery

# Life of an Event

The life of an Event begins with the creation of an EventSource.
The EventSource describes the static properties of an Event,
e.g: severity, category, function, file, line, format string and [argument tags][mserialize-tag].
The newly created EventSource is added to the Session associated with the given SessionWriter.
The Session assigns a unique identifier to the source, that will be used by the events produced
by the source, and saves a copy of the EventSource. This happens only once for each EventSource.

After the source is created, space for an Event is allocated in the queue of the Channel
associated with the given SessionWriter. If the allocation fails, a new Channel is created,
and the old one becomes closed.
Now there is enough space, the Event fields are serialized into the queue:
size, EventSource identifier, clock value (the timestamp of the Event), and the log arguments
(the values the will be substitute the `{}` placeholders in the format string later).
The arguments are serialized by [Mserialize][], that means they are mostly copied directly
(no binary to text conversion), with very little decoration.
The number of arguments must match the number of placeholders in the format string,
and this is enforced compile time.

Following the serialization, the written data is committed to the queue.
The commit is atomic: the event will be either completely visible to the Consumer of the Session,
or it is not visible. Partial entries cannot be observed.
After the commit, the hot path of the Producer ends.

The periodically running Consumer first reads the metadata from the Session
(EventSources and ClockSync, if needed) and writes it to the OutputStream.
After the available metadata is fully consumed, it proceeds to poll each Channel.
Data (batch of Events) read from a Channel, preceded by a WriterProp kind of metadata, that
describes the SessionWriter is written to the OutputStream.

By design, the OutputStream is not owned by the Session, and only constrained by
a [simple concept][OutputStream], to make extensions simple: log rotation,
a syslog or custom monitoring backend, a network target are easy to add.

[Mserialize]: Mserialize.html
[mserialize-tag]: Mserialize.html#visiting-serialized-values
[OutputStream]: Mserialize.html#outputstream

# Binary log schema

The Binlog logfile is an append only stream of Binlog entries.
Entries can represent metadata (e.g: EventSource, WriterProp, ClockSync),
or data (i.e: log events). Metadata entries always precede data entries
that reference them.

<BinlogStream> ::= <Entry>*
<Entry> ::= <EntrySize> <EntryPayload>
<EntrySize> ::= uint32
<EntryPayload> ::= <EventSource> | <WriterProp> | <ClockSync> | <Event>

<EventSource> ::= <EventSourceTag> <EventSourceId> <Severity> <Category> <Function> <File> <Line> <FormatString> <ArgumentTags>
<EventSourceTag> ::= uint64(-1)
<EventSourceId> ::= uint64
<Severity> ::= uint16
<Category> ::= <String>
<Function> ::= <String>
<File> ::= <String>
<Line> ::= uint64
<FormatString> ::= <String>
<ArgumentTags> ::= <String>

<WriterProp> ::= <WriterPropTag> <WriterPropId> <WriterPropName> <BatchSize>
<WriterPropTag> ::= uint64(-2)
<WriterPropId> ::= uint64
<WriterPropName> ::= <String>
<BatchSize> ::= uint64

<ClockSync> ::= <ClockSyncTag> <ClockValue> <ClockFrequency> <NsSinceEpoch> <TzOffset> <TzName>
<ClockSyncTag> ::= uint64(-3)
<ClockValue> ::= uint64
<ClockFrequency> ::= uint64
<NsSinceEpoch> ::= uint64
<TzOffset> ::= int32
<TzName> ::= <String>

<Event> ::= <EventSourceId> <ClockValue> <Arguments>
<Arguments> ::= byte* # serialized values according to the mserialize format

<String> ::= <SequenceSize> char*

The semantic meaning of the fields is described in Entries.hpp.
Numbers are serialized according to the byte order of the producing host.
Tags with their most significant bit set indicate a metadata entry.
To ensure backward compatibility, a new field can be added only to the end
of any specific entry type. Adding a new metadata entry type is allowed,
as long as it is not required to parse the data.
To ensure forward compatibility, it is not an error if an entry has
extra payload after the last field (as long as the size field indicates that),
and unknown metadata entries are ignored.
1 change: 1 addition & 0 deletions doc/body.html
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
</div>
<nav class="nav">
<a class="nav-link" href="UserGuide.html">UserGuide</a>
<a class="nav-link" href="Internals.html">Internals</a>
<a class="nav-link" href="Mserialize.html">Mserialize</a>
<a class="nav-link" href="https://github.com/Morgan-Stanley/binlog">GitHub</a>
</nav>
Expand Down

0 comments on commit a900387

Please sign in to comment.