Redo Logs

Overview

Redo logs are used by AresDB to recover after a server shutdown. They are implemented at the table level rather than at the database level for the following reasons:

- Transactional support goes no higher than row level, so no cross-table mutation atomicity is required.
- Each table (fact or dimension) has its own archiving/snapshot schedule, so redo logs must be purged with different delays.
- Using separate redo logs makes table deletion easier.

Upsert batches are appended to redo logs. A new log file is created periodically for each table (every 2 hours for instance, according to the archiving interval). The file is named after the arrival time of its first upsert (which needs to be synchronized in a distributed setup). Purging is achieved by simply deleting redo log files that have been completely archived/snapshotted.
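The purging rule above can be sketched as a small selection function. This is only an illustration under stated assumptions: the function name purgeable, the use of integer timestamps as file names, and the cutoff parameter (an arrival time up to and including which every upsert batch is assumed to have been archived or snapshotted) are hypothetical and not taken from AresDB.

```go
package main

import (
	"fmt"
	"sort"
)

// purgeable returns the start times of the redo log files that can be deleted.
// fileStarts holds the arrival time of the first upsert in each file, which is
// what the file is named after. cutoff is a hypothetical arrival time up to
// and including which every upsert batch has been archived or snapshotted.
//
// A file is completely covered only if the next file starts at or before the
// cutoff, because every batch in it arrived no later than that. The newest
// file is never returned since it may still receive appends.
func purgeable(fileStarts []int64, cutoff int64) []int64 {
	// Sort a copy so the caller's slice is left untouched.
	starts := append([]int64(nil), fileStarts...)
	sort.Slice(starts, func(i, j int) bool { return starts[i] < starts[j] })

	var out []int64
	for i := 0; i+1 < len(starts); i++ {
		if starts[i+1] <= cutoff {
			out = append(out, starts[i])
		}
	}
	return out
}

func main() {
	// Four files created 2 hours (7200 seconds) apart, listed out of order.
	files := []int64{7200, 0, 14400, 21600}
	fmt.Println(purgeable(files, 15000)) // prints [0 7200]
}
```

The file starting at 14400 is kept because it may still contain batches that arrived after the cutoff.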

Upsert Batch

An upsert batch contains multiple upsert mutations to a table. The upserts are applied to a subset of the columns. Primary key columns and time columns for fact tables must be specified with non-NULL values. Unrecognized columns (deleted for instance) are ignored. The data (values and nulls) are represented as uncompressed columnar vectors.

An upsert batch is serialized into the following format:

[uint32] magic_number
[uint32] buffer_size

<begin of buffer>
[uint32] version_number
[int32] num_of_rows
[uint16] num_of_columns
<reserved 14 bytes>
[uint32] arrival_time

[uint32] column_offset_0 ... [uint32] column_offset_x
[uint32] column_reserved_field1_0 ... [uint32] column_reserved_field1_x
[uint32] column_reserved_field2_0 ... [uint32] column_reserved_field2_x
[uint32] column_data_type_0 ... [uint32] column_data_type_x
[uint16] column_id_0 ... [uint16] column_id_x
[uint8] column_mode_0 ... [uint8] column_mode_x

(optional) null_vector_0
(optional) [padding to 4 byte alignment] offset_vector_0
[padding for 8 byte alignment] value_vector_0
...

[padding for 8 byte alignment]
<end of buffer>

This format is used for both client-server communication and redo logging. All serialized numbers are written in little-endian byte order. For this format to be efficient, num_of_rows (the batch size) should be reasonably large (>= 256 for instance) and preferably a multiple of 64.
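As an illustration of the fixed-size part of this layout, the following sketch reads and writes the fields that precede the per-column vectors, using Go's encoding/binary package. The Header type and the parseHeader and encodeHeader functions are hypothetical names, not AresDB's own code. Rejecting any version other than 0xFEED0001 is also an assumption made for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	magicNumber   = 0xADDAFEED
	versionNumber = 0xFEED0001
	// 8 bytes (magic_number, buffer_size) plus 4+4+2+14+4 bytes of fixed fields.
	fixedHeaderSize = 36
)

// Header holds the fixed-size fields that precede the column vectors.
type Header struct {
	BufferSize  uint32
	Version     uint32
	NumRows     int32
	NumColumns  uint16
	ArrivalTime uint32
}

// parseHeader decodes the fixed fields from a serialized upsert batch.
func parseHeader(b []byte) (Header, error) {
	var h Header
	if len(b) < fixedHeaderSize {
		return h, errors.New("upsert batch too short")
	}
	le := binary.LittleEndian
	if le.Uint32(b[0:4]) != magicNumber {
		return h, errors.New("bad magic number")
	}
	h.BufferSize = le.Uint32(b[4:8])
	h.Version = le.Uint32(b[8:12])
	if h.Version != versionNumber {
		// Assumption: this sketch only understands one version.
		return h, errors.New("unsupported version")
	}
	h.NumRows = int32(le.Uint32(b[12:16]))
	h.NumColumns = le.Uint16(b[16:18])
	// b[18:32] is the 14-byte reserved region and is skipped.
	h.ArrivalTime = le.Uint32(b[32:36])
	return h, nil
}

// encodeHeader writes the same fields back, leaving the reserved bytes zero.
func encodeHeader(h Header) []byte {
	b := make([]byte, fixedHeaderSize)
	le := binary.LittleEndian
	le.PutUint32(b[0:4], magicNumber)
	le.PutUint32(b[4:8], h.BufferSize)
	le.PutUint32(b[8:12], h.Version)
	le.PutUint32(b[12:16], uint32(h.NumRows))
	le.PutUint16(b[16:18], h.NumColumns)
	le.PutUint32(b[32:36], h.ArrivalTime)
	return b
}

func main() {
	in := Header{BufferSize: 64, Version: versionNumber, NumRows: 256, NumColumns: 3, ArrivalTime: 1700000000}
	out, err := parseHeader(encodeHeader(in))
	fmt.Println(out == in, err)
}
```

The sketch stops before the column offsets, so it does not need to decide what position those offsets are measured from.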

Field	Description
magic_number	Verification header of value 0xADDAFEED.
buffer_size	The size of the buffer, which starts from the num_of_rows field and runs to the end of the buffer, including any trailing padding.
version_number	The upsert batch version number, 0xFEED0001.
num_of_rows	The number of rows in the upsert batch.
num_of_columns	The total number of columns in the upsert batch.
arrival_time	The arrival time of the upsert batch.
column_offset	The offsets (from the beginning of buffer) to the beginning of the data section of each column. The total size of the offset vector is num_of_columns + 1 where the last element points to the end of the last column data section.
column_data_type	The data type for each column. See details below.
column_id	The logical id of each column.
column_mode	The encoding mode of each column. See details below.
null vector	If present, the validity vector of each value in a column.
offset vector	If present, the offset (from 0) to each value in the value vector. The total size of offset vector is num_of_rows + 1 where the last element points to the end of the last value. This is needed for variable length values (arrays).
value vector	The value buffer for a column.
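To illustrate how the offset vector locates variable-length values, the sketch below slices one row's value out of a value vector. It assumes the offsets are byte offsets into the value vector; the function name valueAt is hypothetical.

```go
package main

import "fmt"

// valueAt returns the bytes of the i-th value of a variable-length column.
// offsets has num_of_rows + 1 entries, so row i spans
// values[offsets[i]:offsets[i+1]]. Treating the offsets as byte offsets is an
// assumption of this sketch.
func valueAt(offsets []uint32, values []byte, i int) ([]byte, error) {
	if i < 0 || i+1 >= len(offsets) {
		return nil, fmt.Errorf("row %d out of range", i)
	}
	start, end := offsets[i], offsets[i+1]
	if start > end || int(end) > len(values) {
		return nil, fmt.Errorf("corrupt offsets for row %d", i)
	}
	return values[start:end], nil
}

func main() {
	// Three rows: "abc", an empty value, and "defgh".
	offsets := []uint32{0, 3, 3, 8}
	values := []byte("abcdefgh")
	for i := 0; i < 3; i++ {
		v, err := valueAt(offsets, values, i)
		fmt.Printf("row %d: %q %v\n", i, v, err)
	}
}
```

Because the last element points to the end of the last value, no separate length field is needed for the final row.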

column_data_type

column_data_type is a 4-byte integer that stores the type information of a column. It consists of 3 parts:
- column_data_type & 0x0000FFFF: the width of the data type in bits.
- (column_data_type & 0x00FF0000) >> 16: the enum value of the base type.
- (column_data_type & 0xFF000000) >> 24: reserved for supporting variable-length values (arrays).

The type enum values and their widths:

Enum Value	Name	Width in bits
0	bool	1
1	int8	8
2	uint8	8
3	int16	16
4	uint16	16
5	int32	32
6	uint32	32
7	float32	32
8	small_enum	8
9	big_enum	16
10	uuid	128

column_mode consists of three parts:

The lowest 3 bits (column_mode & 0x0007) hold the data encoding, which is one of the following values:
- 0 means all values are null and the null vector for the column is omitted.
- 1 means all values are valid and the null vector for the column is omitted.
- 2 means the null vector is present and the column may contain null values.
The middle 3 bits ((column_mode >> 3) & 0x0007) hold the update operation. The following operators are currently supported:
- 0 (default): overwrite the existing value if the new value is not null; otherwise skip the update.
- 1: overwrite the existing value even when the new value is null.
- 2 (addition): add the incoming value to the existing value.
- 3 (min): take the minimum of the existing and incoming values.
- 4 (max): take the maximum of the existing and incoming values.
The highest 2 bits are reserved.
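The two fields of column_mode and the update operators can be sketched as follows. The ColumnMode type and the apply function are hypothetical. The text above does not say how addition, min, and max treat null operands, so the sketch assumes that a null incoming value leaves the existing value unchanged and that a null existing value is replaced by the incoming one.

```go
package main

import "fmt"

// ColumnMode is the 1-byte column_mode value.
type ColumnMode uint8

// Encoding returns the lowest 3 bits: 0 all null, 1 all valid, 2 null vector present.
func (m ColumnMode) Encoding() uint8 { return uint8(m) & 0x07 }

// UpdateOp returns the middle 3 bits: the update operator.
func (m ColumnMode) UpdateOp() uint8 { return (uint8(m) >> 3) & 0x07 }

// apply merges an incoming value into an existing one and returns the new
// value with its validity flag. Null handling for operators 2 to 4 is an
// assumption of this sketch, as is leaving the value untouched for unknown
// operators.
func apply(op uint8, oldVal int64, oldValid bool, newVal int64, newValid bool) (int64, bool) {
	switch op {
	case 0: // overwrite only when the new value is not null
		if newValid {
			return newVal, true
		}
		return oldVal, oldValid
	case 1: // overwrite even when the new value is null
		return newVal, newValid
	case 2, 3, 4:
		if !newValid {
			return oldVal, oldValid
		}
		if !oldValid {
			return newVal, true
		}
		switch {
		case op == 2:
			return oldVal + newVal, true
		case op == 3 && newVal < oldVal:
			return newVal, true
		case op == 4 && newVal > oldVal:
			return newVal, true
		}
		return oldVal, true
	}
	return oldVal, oldValid
}

func main() {
	m := ColumnMode(2<<3 | 2) // addition, null vector present
	fmt.Println(m.Encoding(), m.UpdateOp())
	fmt.Println(apply(m.UpdateOp(), 10, true, 5, true))
}
```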

Recovery

During recovery for fact tables, upsert batches are replayed to populate the primary key hash as well as the live batches of the in-memory vector store, just as when upserts arrive from clients. (This may be out of date: after the replays, archiving needs to be triggered immediately to handle late arrivals in the redo log, in order to avoid overcounting.)

For dimension tables, the last snapshot is replayed first to populate primary key and the vector store; then the redo logs are replayed to apply patches. This assumes upserts are idempotent.
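The dimension-table path can be sketched with a plain map standing in for the primary key index and the vector store. Everything here is a simplification under stated assumptions: the upsert type, the recoverDimension function, and the use of string keys and values are hypothetical, and only overwrite-style upserts are modeled.

```go
package main

import "fmt"

// upsert is a simplified single-row mutation: set the row with this primary
// key to this value. Real upsert batches are columnar; this is illustrative.
type upsert struct {
	key   string
	value string
}

// recoverDimension loads the last snapshot first, then replays the redo log
// batches in order on top of it.
func recoverDimension(snapshot map[string]string, redo [][]upsert) map[string]string {
	state := make(map[string]string, len(snapshot))
	for k, v := range snapshot {
		state[k] = v
	}
	for _, batch := range redo {
		for _, u := range batch {
			state[u.key] = u.value
		}
	}
	return state
}

func main() {
	// The snapshot already reflects the first redo batch.
	snapshot := map[string]string{"city:1": "SF", "city:2": "NYC"}
	redo := [][]upsert{
		{{key: "city:2", value: "NYC"}},
		{{key: "city:3", value: "LA"}},
	}
	state := recoverDimension(snapshot, redo)
	fmt.Println(len(state), state["city:2"], state["city:3"])
}
```

Replaying the first batch a second time leaves the state unchanged, which is the idempotence the recovery procedure relies on.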
