Architecture

Hongzheng Shi edited this page Dec 3, 2018 · 4 revisions

Overview

At a high level, AresDB stores most of its data in host memory (RAM connected to the CPUs). It uses CPUs to handle data ingestion and disks to handle recovery. At query time, it transfers data from host memory to GPU memory for parallel processing on GPUs. AresDB consists of three components: a memory store, a metadata store, and a disk store.

Tables

Unlike most relational database management systems (RDBMSs), there is no database or schema scope in AresDB. All tables belong to the same scope in the same AresDB cluster/instance, enabling users to refer to them directly. Users store their data as fact tables and dimension tables, outlined below:

Fact table

A fact table stores an unbounded stream of time series events. Users use a fact table to store events (facts) as they happen over time. Each event is associated with an event time, and the table is often queried by this event time. Trips are a typical example: each trip is an event, and the trip request time is often designated as the event time. If an event has multiple timestamps associated with it, only one of them is designated as the event time in the fact table.

Dimension table

A dimension table stores the current properties of entities (cities, clients, drivers). For example, users can use a dimension table to store city information (city name, timezone, country, etc.). Unlike fact tables, which grow without bound over time, dimension tables are always bounded in size (e.g., a cities table is bounded by the actual number of cities in the world). Dimension tables do not need the special time column.
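The difference between the two table kinds can be illustrated with a minimal Python sketch. It is not AresDB's schema API: the class names `FactTable` and `DimensionTable`, the column names (`request_at`, `completed_at`, `city_id`), and the integer timestamps are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FactTable:
    """Append-only stream of events; every row carries the designated event time."""
    name: str
    time_column: str  # the single timestamp designated as the event time
    rows: List[dict] = field(default_factory=list)

    def append(self, row: dict) -> None:
        if self.time_column not in row:
            raise ValueError(f"missing event time column {self.time_column!r}")
        self.rows.append(row)

    def query_time_range(self, start: int, end: int) -> List[dict]:
        """Return events whose event time t satisfies start <= t < end."""
        return [r for r in self.rows if start <= r[self.time_column] < end]


@dataclass
class DimensionTable:
    """Current properties per entity, keyed by a primary key; no time column."""
    name: str
    key_column: str
    rows: Dict[object, dict] = field(default_factory=dict)

    def upsert(self, row: dict) -> None:
        # A later write for the same key replaces the earlier properties,
        # so the table size is bounded by the number of entities.
        self.rows[row[self.key_column]] = row


# A trip has two timestamps, but only request_at is the event time.
trips = FactTable(name="trips", time_column="request_at")
trips.append({"trip_id": 1, "city_id": 10, "request_at": 1000, "completed_at": 1600})
trips.append({"trip_id": 2, "city_id": 11, "request_at": 2000, "completed_at": 2500})

cities = DimensionTable(name="cities", key_column="city_id")
cities.upsert({"city_id": 10, "name": "San Francisco", "timezone": "America/Los_Angeles"})
cities.upsert({"city_id": 10, "name": "San Francisco", "timezone": "US/Pacific"})

recent = trips.query_time_range(1500, 3000)
```

In this sketch the time-range query selects only trip 2, because filtering uses `request_at` and ignores `completed_at`, while the second write to `cities` overwrites the first instead of adding a row.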

Data types

The table below details the data types currently supported in AresDB:

| Data types | Storage (bytes) | Details |
| --- | --- | --- |
| Bool | 1/8 | Stored as a single bit. |
| Int8, Uint8, SmallEnum | 1 | Strings are ingested as enums for equality checks only. No substring or regex support for now. |
| Int16, Uint16, BigEnum | 2 | Strings are ingested as enums for equality checks only. No substring or regex support for now. |
| Int32, Uint32, Float32 | 4 | Float64 can be added later when needed. |
| UUID | 16 | For equality checks only. |
| GeoPoint | 4 | Geographic points. |
| GeoShape | Variable length | Polygon or multi-polygons. |
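As a rough illustration of what these widths mean in practice, the following Python sketch estimates the uncompressed size of fixed-width columns and packs booleans eight to a byte. The bit widths are taken from the table above; the helper names, the example schema, and the bit order inside a byte are assumptions, and the estimate ignores compression and any per-column metadata.

```python
# Bits per value for the fixed-width types listed in the table above.
BITS_PER_VALUE = {
    "Bool": 1,
    "Int8": 8, "Uint8": 8, "SmallEnum": 8,
    "Int16": 16, "Uint16": 16, "BigEnum": 16,
    "Int32": 32, "Uint32": 32, "Float32": 32,
    "UUID": 128,
    "GeoPoint": 32,
}


def column_bytes(data_type: str, num_rows: int) -> int:
    """Uncompressed bytes for one fixed-width column, rounded up to whole bytes."""
    return (BITS_PER_VALUE[data_type] * num_rows + 7) // 8


def pack_bools(values) -> bytes:
    """Pack booleans eight per byte, least significant bit first (assumed order)."""
    packed = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


# Hypothetical four-column schema with one million rows.
schema = {"is_completed": "Bool", "status": "SmallEnum", "fare": "Float32", "trip_id": "UUID"}
total = sum(column_bytes(data_type, 1_000_000) for data_type in schema.values())
```

For this hypothetical schema the Bool column needs 125,000 bytes, while the UUID column alone needs 16,000,000 bytes.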

With AresDB, strings are converted to enums automatically before they enter the database, which improves storage and query efficiency. This allows case-sensitive equality checks, but does not support advanced operations such as concatenation, substrings, globs, and regex matching. We may add full string support in the future.
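The string-to-enum conversion described above is a form of dictionary encoding. The Python sketch below shows the general idea and is not AresDB's implementation: the class `EnumDictionary`, its methods, and the 256-value limit (derived from the 1-byte SmallEnum width, assuming no IDs are reserved) are illustrative assumptions.

```python
class EnumDictionary:
    """Maps each distinct string to a small integer ID (dictionary encoding)."""

    def __init__(self, max_cardinality: int = 256):
        # 256 assumes a 1-byte enum with no reserved IDs (an assumption).
        self.max_cardinality = max_cardinality
        self._ids = {}      # string -> enum ID
        self._values = []   # enum ID -> string

    def encode(self, value: str) -> int:
        """Return the ID for value, assigning the next free ID on first sight."""
        if value not in self._ids:
            if len(self._values) >= self.max_cardinality:
                raise OverflowError("enum dictionary is full")
            self._ids[value] = len(self._values)
            self._values.append(value)
        return self._ids[value]

    def lookup(self, value: str):
        """Return the ID for value without inserting it, or None if unseen."""
        return self._ids.get(value)

    def decode(self, enum_id: int) -> str:
        return self._values[enum_id]


statuses = EnumDictionary()
raw = ["completed", "canceled", "completed", "Completed"]
column = [statuses.encode(s) for s in raw]  # only the small IDs are stored

# An equality filter translates the literal once, then compares integers.
target = statuses.lookup("completed")
matches = [i for i, v in enumerate(column) if v == target]
```

Because the dictionary keys are exact strings, "completed" and "Completed" receive different IDs, which is why equality is case-sensitive. Operations such as substring or regex matching would need the original characters, which the stored IDs do not carry.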

Key features

AresDB’s architecture supports the following features:

  • Column-based storage with compression: improves storage efficiency (fewer bytes of memory needed to store data) and query efficiency (less data transferred from CPU memory to GPU memory during a query).
  • Real-time upsert with primary key deduplication: provides high data accuracy and near real-time data freshness, within seconds.
  • GPU-powered query processing: highly parallelized data processing on GPUs yields low query latency (sub-second to seconds).
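The first two features can be sketched together in a few lines of Python. This is only a conceptual model under stated assumptions, not AresDB's storage engine: the class `ColumnarTable`, its in-memory primary key index, and the column names are hypothetical, and compression and GPU transfer are left out.

```python
class ColumnarTable:
    """Stores each column as its own list; rows are deduplicated by primary key."""

    def __init__(self, columns, primary_key):
        self.columns = {name: [] for name in columns}
        self.primary_key = primary_key
        self._row_of_key = {}  # primary key value -> row index

    def upsert(self, row: dict) -> int:
        """Insert a new row, or update the existing row that has the same key."""
        key = row[self.primary_key]
        idx = self._row_of_key.get(key)
        if idx is None:
            # New key: append one value to every column (None if absent).
            idx = len(self._row_of_key)
            self._row_of_key[key] = idx
            for name, values in self.columns.items():
                values.append(row.get(name))
        else:
            # Known key: overwrite in place, touching only the supplied columns.
            for name, values in self.columns.items():
                if name in row:
                    values[idx] = row[name]
        return idx

    def num_rows(self) -> int:
        return len(self._row_of_key)


table = ColumnarTable(["trip_id", "status", "fare"], primary_key="trip_id")
table.upsert({"trip_id": "a", "status": "requested", "fare": 0.0})
table.upsert({"trip_id": "b", "status": "requested", "fare": 3.0})
table.upsert({"trip_id": "a", "status": "completed", "fare": 12.5})  # update, not a duplicate

# A query that needs only `fare` reads a single contiguous column.
total_fare = sum(table.columns["fare"])
```

The third write updates trip "a" in place, so the table still holds two rows, and an aggregation over `fare` touches only that column rather than whole rows.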