We need to know what we're aiming for, and users need to know what they can expect from using RIG in their infrastructure. Given new requirements or change requests, it should then be easier to decide whether or not they match the goals of the project.
The subject of this issue is to formulate this vision, but also to track the progress of related tasks (spoiler: we're not quite there yet).
The tasks are defined inline in the following vision statement, which will become a document on the website eventually. Feedback welcome.
Update README with the vision statement
Update repository tagline with the vision statement
Update the website with the text below
Reactive Interaction Gateway a.k.a. RIG
RIG enables event-driven frontends in a secure, reliable and scalable way.
What does that mean, exactly? Well, let's break it down.
Event-driven frontends
Frontends react to what happens on the backend. Upon receiving business events, they can change their state, adapt their UI or use HTTP requests to fetch new data.
This is not just Server-Sent Events versus polling for data: it also decouples the frontends from the backends, as frontends describe the kind of events they're interested in but don't need to know where and when to fetch them.
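To make that decoupling concrete, here is a minimal sketch of subscription matching by event type. The module name and event shape are illustrative only, not RIG's actual API; the event follows the CloudEvents convention of carrying a `type` field.

```elixir
# Illustrative only -- not RIG's actual API. A subscription names an event
# type; the gateway matches incoming (CloudEvents-style) events against it.
# Repeating `type` in both patterns forces the two values to be equal.
defmodule SubscriptionMatch do
  def matches?(%{event_type: type}, %{"type" => type}), do: true
  def matches?(_subscription, _event), do: false
end

subscription = %{event_type: "com.example.account.credited"}
event = %{"type" => "com.example.account.credited", "data" => %{"amount" => 100}}

SubscriptionMatch.matches?(subscription, event)
# => true -- the frontend never asked *which* service produced the event
```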
Secure
RIG employs process supervisors, which makes it very unlikely that a node fails as a whole. For example, a fault in the Kafka connection might cause the Kafka subsystem to fail, but the rest of the system would be unaffected. The respective supervisor would restart the Kafka process automatically; as soon as the connection is re-established, the node is fully operational again.
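A minimal sketch of that supervision idea, with hypothetical module names (this is not RIG's actual supervision tree): under the :one_for_one strategy, a crashed child is restarted on its own while its siblings keep running.

```elixir
# Hypothetical sketch, not RIG's actual supervision tree. With
# :one_for_one, a crashing Kafka consumer is restarted independently;
# the connection registry next to it is unaffected.
defmodule Gateway.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    children = [
      Gateway.KafkaConsumer,      # may crash and be restarted on its own
      Gateway.ConnectionRegistry  # keeps running through Kafka failures
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```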
Running on the BEAM (the Erlang virtual machine), RIG comes with tooling for introspection, code updates and instrumentation (e.g., tracing) at runtime, in production.
Describe, and link to, the tooling mentioned here.
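As a first taste of that tooling, here are a few stock Erlang/Elixir introspection calls that work on any BEAM node, e.g. from an IEx shell attached to a running instance; none of them are RIG-specific.

```elixir
# Standard BEAM introspection, runnable from an IEx shell attached to a
# live node (e.g. via `iex --remsh`). These are stock Erlang/Elixir calls.
length(Process.list())                    # number of live processes
:erlang.system_info(:process_count)       # same count, via system_info
:erlang.memory(:total)                    # total VM memory usage, in bytes
Process.info(self(), :message_queue_len)  # mailbox size of a given process
:observer.start()                         # GUI for processes, ETS tables, apps
```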
Verbose logging and a Prometheus metrics endpoint also help with monitoring a node in production.
Describe how to change log levels (using env vars and at runtime).
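For runtime changes, Elixir's standard Logger can be reconfigured on a live node; at boot, reading an environment variable is the usual pattern. The LOG_LEVEL variable name below is an assumption for illustration, not necessarily the name RIG uses.

```elixir
# At runtime, from an IEx shell attached to the node: Logger.configure/1
# is part of Elixir's standard Logger and takes effect immediately.
Logger.configure(level: :debug)
```

```elixir
# At boot, a common pattern is to read an environment variable in
# config/runtime.exs. LOG_LEVEL is a hypothetical variable name here.
config :logger,
  level: String.to_existing_atom(System.get_env("LOG_LEVEL", "info"))
```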
To limit a cluster's resource consumption, an administrator can configure a number of soft limits.
Plugins are sandboxed.
Reliable
Multiple RIG nodes form a self-healing cluster, as described in the following two scenarios.
Scenario 1: A RIG node goes offline and leaves the cluster
After a node has failed, the frontends that were connected to that node reconnect to another node. They will be notified upon reconnect whether their subscriptions are still available and whether events have been lost (that depends on whether a frontend's session was hosted on the failed node). If subscriptions need to be set up again or missed data has to be requested, the frontend can act accordingly.
Any Kafka or Kinesis partitions the node had handled before its failure are reassigned to other nodes in the cluster automatically. As a consequence of how Kafka/Kinesis consumer groups work, messages may be delivered more than once.
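This means event consumers should be prepared to see duplicates. Below is a minimal deduplication sketch, assuming each event carries a unique "id" field (as CloudEvents do); the module is illustrative, not part of RIG, and in practice the set of seen IDs would be bounded, e.g. with a TTL.

```elixir
# Sketch of consumer-side deduplication under at-least-once delivery.
# Assumes each event carries a unique "id"; illustrative, not RIG code.
defmodule Dedup do
  def new, do: MapSet.new()

  # Returns {:duplicate, seen} or {:fresh, updated_seen},
  # so redelivered events can be dropped.
  def check(seen, %{"id" => id}) do
    if MapSet.member?(seen, id) do
      {:duplicate, seen}
    else
      {:fresh, MapSet.put(seen, id)}
    end
  end
end
```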
Scenario 2: A RIG node comes back online and (re-)joins the cluster
When a node (re-)joins the cluster, it replicates the subscriptions table as well as the JWT blacklist from the other nodes. It may then be assigned Kafka or Kinesis partitions and starts accepting frontend connections.
Scalable
RIG is designed to be scalable with respect to two key metrics: the number of connected frontends, and the number of backend events it can process without introducing a noticeable* delay in transmission.
*For our tests we assume 1 second as the highest tolerable delay between RIG's consumption of a message from Kafka and the reception of that message on the frontend via an established SSE connection.
Scalability with respect to the number of connected frontends
For handling large numbers of frontend connections per node, we rely on the light-weight processes ("actors") the BEAM, the Erlang runtime, provides. Each of these processes has its own lifecycle (and even its own garbage collector) and is scheduled by the BEAM across all available CPU cores. They are memory-efficient and designed to be created and destroyed very quickly, which makes them ideal for the long-lived TCP connections RIG maintains: each connection gets the isolation and runtime efficiency of its own process.
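To get a feel for how cheap these processes are, the following illustration (not RIG code) spawns 100,000 idle "connection" processes and times the operation; on typical hardware this completes in a fraction of a second.

```elixir
# Illustration of BEAM process cost: spawn 100,000 processes, each
# idling like a held-open connection, and measure the elapsed time.
{micros, pids} =
  :timer.tc(fn ->
    for _ <- 1..100_000 do
      spawn(fn ->
        receive do
          :close -> :ok  # each process waits until told to close
        end
      end)
    end
  end)

IO.puts("spawned #{length(pids)} processes in #{div(micros, 1000)} ms")
Enum.each(pids, &send(&1, :close))
```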
Benchmarks
Scalability with respect to the number of processed backend events
Designed as the event-handling backbone for "everything UI" in a microservice architecture, RIG must be able to process large amounts of data. The trick: RIG immediately discards events that no users are subscribed to.
For example, in a banking system, the backend might generate a million events per second that need to be consumed by RIG. At the same time, only a few thousand users are online at any point in time, which allows RIG to drop most of the messages right at the consuming node.
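A sketch of that early-discard logic (illustrative only; RIG's actual lookup against the subscriptions table differs in detail): if the lookup yields no subscribers for an event's type, the event is dropped immediately at the consuming node. The `subscribers_fun` argument stands in for that lookup.

```elixir
# Sketch of dropping events nobody listens to, right at the consumer.
# Illustrative, not RIG code: `subscribers_fun` stands in for a lookup
# against the replicated subscriptions table.
defmodule Ingest do
  def handle(event, subscribers_fun) do
    case subscribers_fun.(event["type"]) do
      [] ->
        :drop  # no subscribers: discard immediately, no further work

      pids ->
        Enum.each(pids, &send(&1, {:event, event}))
    end
  end
end
```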
Benchmarks