From 3176e49ef580c0c37b6153420174ab4e7f499a0b Mon Sep 17 00:00:00 2001
From: Amber Brown
Date: Fri, 11 Oct 2024 15:51:52 +1100
Subject: [PATCH] add some documentation

---
 docs/mimo.md               |  1 -
 docs/mimo/README.md        | 22 ++++++++++++++++++++++
 docs/mimo/actuator.md      | 30 ++++++++++++++++++++++++++++++
 docs/mimo/admin-api.md     |  0
 docs/mimo/scheduler.md     |  3 +++
 docs/mimo/writing-tasks.md |  1 +
 6 files changed, 56 insertions(+), 1 deletion(-)
 delete mode 100644 docs/mimo.md
 create mode 100644 docs/mimo/README.md
 create mode 100644 docs/mimo/actuator.md
 create mode 100644 docs/mimo/admin-api.md
 create mode 100644 docs/mimo/scheduler.md
 create mode 100644 docs/mimo/writing-tasks.md

diff --git a/docs/mimo.md b/docs/mimo.md
deleted file mode 100644
index bc4f21edd3e..00000000000
--- a/docs/mimo.md
+++ /dev/null
@@ -1 +0,0 @@
-# Managed Infrastructure Maintenance Operator
diff --git a/docs/mimo/README.md b/docs/mimo/README.md
new file mode 100644
index 00000000000..8d03ab961c4
--- /dev/null
+++ b/docs/mimo/README.md
@@ -0,0 +1,22 @@
+# MIMO Documentation
+
+The Managed Infrastructure Maintenance Operator, or MIMO, is a component of the Azure Red Hat OpenShift Resource Provider (ARO-RP) that is responsible for the automated maintenance of clusters provisioned by the platform.
+MIMO focuses specifically on "managed infrastructure": the parts of ARO that are deployed and maintained by the RP and the ARO Operator, rather than by OCP (in-cluster) or Hive (out-of-cluster).
+
+MIMO consists of two main components, the [Actuator](./actuator.md) and the [Scheduler](./scheduler.md). It is primarily driven through the [Admin API](./admin-api.md).
+
+## A Primer On MIMO
+
+The smallest unit of work that you can tell MIMO to run is a **Task** (see [`pkg/mimo/tasks/`](../../pkg/mimo/tasks/)).
+A Task is composed of reusable **Steps** (see [`pkg/mimo/steps/`](../../pkg/mimo/steps/)), built on the same step framework as the AdminUpdate, Update, and Install methods in `pkg/cluster/`.
+A Task always runs in the scope of a single cluster.
+Steps run in sequence and can return either **Terminal** errors, which cause the Task to fail without being retried, or **Transient** errors, which indicate that the Task can be retried later.
+
+Tasks are executed by the **Actuator**; to run a Task, a **Maintenance Manifest** is created for it.
+This Manifest is created with the cluster ID (which is implicit in the cluster-scoped Admin APIs), the Task ID (currently a UUID), and an optional priority and optional "start after" and "start before" times; any optional values that are not provided are filled in with defaults.
+The Actuator treats Maintenance Manifests as a work queue: it takes those whose "start after" time has passed and executes them in order of earliest "start after" time and priority.
+After each Task runs, the Actuator writes its result into the Manifest as a state, optionally with free-form status text.
+Manifests whose "start before" time has passed are marked with a "timed out" state and are not run.
+
+Currently, Manifests are created through the Admin API.
+In the future, the Scheduler will create some of these Manifests based on cluster state, cluster version, and wall-clock time, making it possible to perform tasks such as secret rotation autonomously.
diff --git a/docs/mimo/actuator.md b/docs/mimo/actuator.md
new file mode 100644
index 00000000000..5950662890e
--- /dev/null
+++ b/docs/mimo/actuator.md
@@ -0,0 +1,30 @@
+# Managed Infrastructure Maintenance Operator: Actuator
+
+The Actuator is the MIMO component that executes Tasks.
+The process of running Tasks looks like this:
+
+```mermaid
+graph TD;
+    START((Start))-->QUERY;
+    QUERY[Fetch all manifests with State = Pending] -->SORT;
+    SORT[Sort tasks by RUNAFTER and PRIORITY]-->ITERATE[Iterate over tasks];
+    ITERATE-- Per Task -->ISEXPIRED;
+    subgraph PerTask[ ]
+        ISEXPIRED{{Is now past RUNBEFORE?}}-- Yes --> STATETIMEDOUT([State = TimedOut]) --> CONTINUE[Continue];
+        ISEXPIRED-- No --> DEQUEUECLUSTER;
+        DEQUEUECLUSTER[Claim lease on OpenShiftClusterDocument] --> DEQUEUE;
+        DEQUEUE[Actuator dequeues task]--> ISRETRYLIMIT;
+        ISRETRYLIMIT{{Has the task been retried too many times?}} -- Yes --> STATETIMEDOUT;
+        ISRETRYLIMIT -- No -->STATEINPROGRESS;
+        STATEINPROGRESS([State = InProgress]) -->RUN[[Task is run]];
+        RUN -- Success --> SUCCESS;
+        RUN-- Terminal Error-->TERMINALERROR;
+        RUN-- Transient Error-->TRANSIENTERROR;
+        SUCCESS([State = Completed])-->DELEASECLUSTER;
+        TERMINALERROR([State = Failed])-->DELEASECLUSTER;
+        TRANSIENTERROR([State = Pending])-->DELEASECLUSTER;
+        DELEASECLUSTER[Release lease on OpenShiftClusterDocument] -->CONTINUE;
+    end
+    CONTINUE-->ITERATE;
+    ITERATE-- Finished -->END((End));
+```
diff --git a/docs/mimo/admin-api.md b/docs/mimo/admin-api.md
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/docs/mimo/scheduler.md b/docs/mimo/scheduler.md
new file mode 100644
index 00000000000..8a457798a25
--- /dev/null
+++ b/docs/mimo/scheduler.md
@@ -0,0 +1,3 @@
+# MIMO Scheduler
+
+The MIMO Scheduler is a planned component and is not yet implemented.
diff --git a/docs/mimo/writing-tasks.md b/docs/mimo/writing-tasks.md
new file mode 100644
index 00000000000..91861535fd2
--- /dev/null
+++ b/docs/mimo/writing-tasks.md
@@ -0,0 +1 @@
+# Writing MIMO Tasks
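
The work-queue semantics described in `docs/mimo/README.md` and the Actuator flowchart (ordering by "start after" time and priority, "start before" timeouts, a retry limit, and Terminal versus Transient errors) can be illustrated with a minimal Go sketch. Everything in it is hypothetical: `Manifest`, `TerminalError`, `Task`, `processQueue`, and the `maxDequeues` limit of 5 are stand-ins rather than the actual types in `pkg/mimo/`, a lower `Priority` value is assumed to break ties, and claiming and releasing the lease on the `OpenShiftClusterDocument` is left out.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"time"
)

// State values mirror the Manifest states in the Actuator flowchart.
type State string

const (
	StatePending    State = "Pending"
	StateInProgress State = "InProgress"
	StateCompleted  State = "Completed"
	StateFailed     State = "Failed"
	StateTimedOut   State = "TimedOut"
)

// Manifest is a hypothetical, simplified Maintenance Manifest, not the ARO-RP type.
type Manifest struct {
	TaskID              string
	Priority            int
	RunAfter, RunBefore time.Time
	Dequeues            int
	State               State
	StatusText          string
}

// TerminalError is a hypothetical marker; any other error from a Task is treated as transient.
type TerminalError struct{ Err error }
func (e TerminalError) Error() string { return e.Err.Error() }

// Task is a hypothetical stand-in for a Task's sequence of Steps.
type Task func() error

const maxDequeues = 5 // hypothetical retry limit

// processQueue makes one pass over a single cluster's manifests (lease handling omitted).
func processQueue(ms []*Manifest, tasks map[string]Task, now time.Time) {
	queue := append([]*Manifest(nil), ms...)
	sort.SliceStable(queue, func(i, j int) bool { // earliest RunAfter, then lower Priority (assumed)
		a, b := queue[i], queue[j]
		return a.RunAfter.Before(b.RunAfter) || (a.RunAfter.Equal(b.RunAfter) && a.Priority < b.Priority)
	})
	for _, m := range queue {
		if m.State != StatePending || m.RunAfter.After(now) {
			continue // not eligible on this pass
		}
		if now.After(m.RunBefore) {
			m.State = StateTimedOut // start-before has passed, so it is never run
			continue
		}
		m.Dequeues++
		if m.Dequeues > maxDequeues {
			m.State, m.StatusText = StateTimedOut, "retry limit reached"
			continue
		}
		task := tasks[m.TaskID]
		if task == nil { // unregistered Task ID: fail instead of panicking
			task = func() error { return TerminalError{errors.New("unknown task")} }
		}
		m.State = StateInProgress
		switch err := task(); {
		case err == nil:
			m.State = StateCompleted
		case errors.As(err, new(TerminalError)):
			m.State, m.StatusText = StateFailed, err.Error()
		default: // transient: back to Pending so a later pass retries it
			m.State, m.StatusText = StatePending, err.Error()
		}
	}
}

// demo runs one pass over a few hypothetical manifests.
func demo() []*Manifest {
	now := time.Date(2024, 10, 11, 12, 0, 0, 0, time.UTC)
	var ms []*Manifest
	for _, id := range []string{"ok", "flaky", "broken", "late", "future", "exhausted"} {
		ms = append(ms, &Manifest{TaskID: id, RunAfter: now.Add(-time.Hour), RunBefore: now.Add(time.Hour), State: StatePending})
	}
	ms[3].RunBefore = now.Add(-time.Minute) // "late": start-before already passed
	ms[4].RunAfter = now.Add(time.Hour)     // "future": not yet eligible
	ms[5].Dequeues = maxDequeues            // "exhausted": next dequeue exceeds the limit
	tasks := map[string]Task{
		"ok":     func() error { return nil },
		"flaky":  func() error { return errors.New("API server unreachable") },
		"broken": func() error { return TerminalError{errors.New("unsupported cluster version")} },
	}
	processQueue(ms, tasks, now)
	return ms
}

func main() {
	for _, m := range demo() {
		fmt.Printf("%-9s %s\n", m.TaskID, m.State)
	}
}
```

With the sample data, `ok` ends up Completed, `flaky` returns to Pending so a later pass can retry it, `broken` is Failed, `late` and `exhausted` are TimedOut, and `future` is left untouched because its "start after" time has not yet arrived. Sorting a copy of the slice keeps the caller's ordering intact while still processing the queue in the documented order.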