Agents should keep trying to send events to the principal until the principal acknowledges that the event has been processed and persisted to the backend. #117
Labels: enhancement
At present, there are a couple of issues with the reliability of event communication between principal and agent:
In GitOps Service, we solved this using a queue stored in an RDBMS (the 'Operation' table). The general algorithm is the same here, although the specifics differ slightly, because the principal and agent do not persist queue entries to disk, whereas the RDBMS-backed queue does.
To solve both of these problems, agent <-> principal communication can work as follows:
A) When an agent event occurs, queue it to be sent to principal, replacing any previously waiting events for that resource:
B) Do not remove an event from the agent queue until the principal indicates that it has been processed AND stored in the principal backend:
C) On startup, the agent must send the current state of all resources (presumably as update events).
These 3 behaviours work together to ensure that the eventual consistency gap between agent/principal is as small as possible, and no state changes are missed, even in the case of network instability or agent/principal container restarts.
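A minimal sketch of how behaviours A, B and C could fit together on the agent side is shown below. It is only an illustration under stated assumptions: `Event`, `SendQueue`, and all of their fields and methods are hypothetical names, not existing agent or principal APIs, and the network transport and retry loop are left out.

```go
package main

import "sync"

// Event is a hypothetical agent event; all field names are assumptions.
type Event struct {
	ResourceKey string // identifies the resource, e.g. "namespace/name"
	EventID     uint64 // unique per queued event, echoed back in the ack
	Payload     string // stand-in for the serialized resource state
}

// SendQueue is an in-memory queue holding at most one pending event
// per resource. Nothing is persisted to disk.
type SendQueue struct {
	mu      sync.Mutex
	nextID  uint64
	pending map[string]Event
	order   []string // resource keys in the order they were first queued
}

func NewSendQueue() *SendQueue {
	return &SendQueue{pending: make(map[string]Event)}
}

// Enqueue implements behaviour A: a new event replaces any event
// still waiting for the same resource.
func (q *SendQueue) Enqueue(key, payload string) Event {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.nextID++
	ev := Event{ResourceKey: key, EventID: q.nextID, Payload: payload}
	if _, waiting := q.pending[key]; !waiting {
		q.order = append(q.order, key)
	}
	q.pending[key] = ev
	return ev
}

// Pending returns a snapshot of the events that still need to be sent
// (or re-sent). It does not remove anything from the queue.
func (q *SendQueue) Pending() []Event {
	q.mu.Lock()
	defer q.mu.Unlock()
	out := make([]Event, 0, len(q.order))
	for _, key := range q.order {
		out = append(out, q.pending[key])
	}
	return out
}

// Ack implements behaviour B: an event is removed only when the
// principal acknowledges exactly that event. An ack for an event that
// has since been replaced is ignored, so the newer event stays queued.
func (q *SendQueue) Ack(key string, id uint64) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	ev, ok := q.pending[key]
	if !ok || ev.EventID != id {
		return false
	}
	delete(q.pending, key)
	for i, k := range q.order {
		if k == key {
			q.order = append(q.order[:i], q.order[i+1:]...)
			break
		}
	}
	return true
}

// Resync implements behaviour C: on agent startup, queue the current
// state of every resource as if it had just been updated.
func (q *SendQueue) Resync(current map[string]string) {
	for key, payload := range current {
		q.Enqueue(key, payload)
	}
}
```

In this sketch a sender loop would periodically call `Pending()` and transmit each event, calling `Ack` only once the principal reports that the event was processed and stored. Matching acks on `EventID` rather than on the resource key alone is a design choice of the sketch: it prevents a late ack for a superseded event from discarding a newer, unsent state.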