Skip to content

Latest commit

 

History

History
147 lines (101 loc) · 6.87 KB

ARCHITECTURE.md

File metadata and controls

147 lines (101 loc) · 6.87 KB

SensApp Architecture

SensApp should be an accommodating platform to support sensor-based applications.

It should be relatively small and simple to transform incoming sensor data, persist it, and make it available to other applications.

Simple SensApp Deployment Example

graph LR
    Sensor1[Sensor 1] --> SensApp
    Sensor2[Sensor 2] --> SensApp
    Sensor3[Sensor 3] --> SensApp
    Sensor4[Sensor 4] --> SensApp
    SensApp <--> DB[Database]

    SensApp[SensApp]
Loading

Advanced SensApp Deployment Example

graph LR

    subgraph Sensors
    Sensor1(Sensor 1)
    Sensor2(Sensor 2)
    Sensor3(Sensor 3)
    Sensor4(Sensor 4)
    MoreSensors((...))
    end

    subgraph MQs["Distributed Message Queue"]
    MQ1[MQ 1]
    MQ2[MQ 2]
    MQ3[MQ 3]
    MoreMQ((...))
    end

    subgraph SensApp["SensApp Cluster"]
    SensApp1[SensApp 1]
    SensApp2[SensApp 2]
    SensApp3[SensApp 3]
    MoreSensApp((...))
    end

    subgraph DBs["Fancy Database Cluster"]
    DB1[DB 1]
    DB2[DB 2]
    DB3[DB 3]
    MoreDB((...))
    end

    Sensor1 --> MQs
    Sensor2 --> MQs
    Sensor3 --> MQs
    Sensor4 --> MQs
    MQs --> SensApp
    MQs --> SensApp
    MQs --> SensApp
    SensApp <--> DBs
    SensApp <--> DBs
    SensApp <--> DBs
Loading

Technology and programming language

Previously developed in Scala, we rewrote SensApp in Rust. The main reason is that the new author prefers Rust over Scala or Golang. The second main reason is from the results from the paper Energy efficiency across programming languages: how do energy, time, and memory relate?, which shows Rust as one of the most energy-efficient programming languages while having memory safety.

SensApp uses the Rust Tokio runtime for asynchronous programming.

Internal Architecture

SensApp uses an event-based architecture internally, with a message bus to communicate between components. The messages are lightweight, only internal, and do not rely on a network.

Incoming data streams

SensApp should eventually support the following incoming data streams:

  • Stream-based protocols:
    • MQTT
    • AMQP
    • Kafka
    • Nats
  • HTTP REST Push
  • HTTP Rest Pull
  • InfluxDB Write API
  • Prometheus Remote Write API

Supported incoming data formats

Sensor data can come in many formats, and as an accommodating platform, SensApp should support the most common ones.

CSV, JSON, SenML, and Parquet are supported. But more formats can be added whenever needed.

We also support InfluxDB line protocol, and the Prometheus remote stores protocol to allow an easy transition from these platforms to SensApp.

Storage

SensApp should support various storage backends. The best storage backend for time series has yet to exist.

  • For small deployments, SQLite is used.

  • It is possible to use Litestream to replicate and backup the SQLite database.

  • For medium deployments, PostgreSQL is used.

    • It is optional to use the TimescaleDB plugin or Citus Columnar.
  • For larger deployments, ClickHouse is used.

  • SensApp can also produce Parquet files stored in S3-compatible object stores.

  • A experimental support for DuckDB is also available. The DuckDB database format isn't stable yet, and it may be wise to wait for the DuckDB 1.0 release before using it in production.

SensApp can use other storage backends in the future. Could it be Cassandra, Apache IoTDB, OpenTSDB, QuestDB, HoraeDB, or something new?

We base our storage on the findings of the paper TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications that shows that ClickHouse is a better choice than most databases for time series at scale, at the moment. Unfortunately, The paper didn't include IoTDB, and the new author doesn't like the JVM much, so IoTDB support is not a priority. Other databases are relatively new, and we favour the most popular ones for now.

SensApp also supports SQLite for small deployments and local persistence. The SQLite storage feature cannot scale to large deployments, but many deployments are small, and SQLite is a good choice for these. When combined with Litestream, SQLite can have some replication and backup capabilities.

PostgreSQL is also supported as it is the most popular database according to the StackOverflow developer Survey 2023 and should provide a good compromise between performance and convenience. The choice between Vanilla PostgreSQL tables, TimeScaleDB bucketstyle (hyper) tables, or Citus columnar tables is left to the user.

Columnar storage with compression fits well with time series data, and a distributed Clickhouse cluster is the favoured choice for large deployments.

DuckDB is a new database that is promising. It can be seen as some kind of columnar SQLite, for analytics. Once the DuckDB 1.0 release is out, it may be a good choice for small to medium deployments.

SensApp used to rely on MongoDB, as it was created during the NoSQL hype, but the performances were very poor for this use case. It was supposed to web scale, but it didn't scale well enough for time series data. Having a binary JSON (BSON) document per datapoint had a too significant overhead.

Scalability

SensApp should be able to scale vertically and horizontally. However, the burden of horizontal scaling is left to other components outside SensApp.

At scale, it is strongly advised to rely on the message queue ingestion pipelines when possible.

The publisher should have a mechanism to automatically retry when SensAPP returns a 503 HTTP error because of a high load.

SensApp should scale horizontally and not persist state on its own. It keeps relatively small buffers in memory to improve performances and relies on the storage backend to persist data. Publishers should consider the data as persisted once SensApp acknowledges it.

The storage layer should scale as well. SQLite on a network filesystem or using Litestream may, but using another distributed storage backend is advised when one single database instance isn't enough.

It is essential to mention that horizontal scalability comes with a higher complexity and energy cost. Prefer vertical scalability when possible. In 2024, single database servers can handle high loads, with hundreds of cores, petabytes of storage, and terabytes of RAM.

Resilience

SensApp should acknowledge the persistence of the incoming data once the storage backend has persisted it. If SensApp crashes or is shut down, the publisher should keep the data and wait for SensApp to return online.

The publisher should favour the message queue ingestion pipeline if resilience is a concern.

The storage backend and the message queue should be resilient.

Internal Software Architecture

Internally, SensApp uses a message bus and queues to communicate between components. This is inspired by the NodeJS, Apache NiFi, and the reactive manifesto.