From 013e79713ee4e07dfa84fd53b7a4631f77a4effd Mon Sep 17 00:00:00 2001
From: Matej Focko
Date: Mon, 22 Jul 2024 13:11:55 +0200
Subject: [PATCH] docs: document the resource requirements

Signed-off-by: Matej Focko
---
 docs/deployment/resource-requirements.md | 93 ++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 docs/deployment/resource-requirements.md

diff --git a/docs/deployment/resource-requirements.md b/docs/deployment/resource-requirements.md
new file mode 100644
index 00000000..267cd1f5
--- /dev/null
+++ b/docs/deployment/resource-requirements.md
@@ -0,0 +1,93 @@
+---
+title: Resource requirements
+---
+
+A usual Packit Service deployment consists of the following services, with the
+resource requirements listed below.
+
+## CPU requirements
+
+| Deployment       |  Requested (always assigned) |  Limit |
+| ---------------- | ---------------------------: | -----: |
+| postgres         |                        `30m` |    `1` |
+| redict           |                        `10m` |  `10m` |
+| flower           |                         `5m` |  `50m` |
+| nginx            |                         `5m` |  `10m` |
+| pushgateway      |                         `5m` |  `10m` |
+| tokman           | `20m` (prod, `5m` otherwise) |  `50m` |
+| dashboard        |                         `5m` |  `50m` |
+| fedmsg           |                         `5m` |  `50m` |
+| beat             |                         `5m` |  `50m` |
+| service          |                        `10m` | `200m` |
+| worker (generic) |                       `100m` | `400m` |
+| worker (short)   |                        `80m` | `400m` |
+| worker (long)    |                       `100m` | `600m` |
+
+## Memory requirements
+
+| Deployment       |      Requested (always assigned) |                              Limit |
+| ---------------- | -------------------------------: | ---------------------------------: |
+| postgres         |  `1Gi` (prod, `256Mi` otherwise) | `1536Mi` (prod, `512Mi` otherwise) |
+| redict           |                          `128Mi` |                            `256Mi` |
+| flower           |                           `80Mi` |                            `128Mi` |
+| nginx            |                            `8Mi` |                             `32Mi` |
+| pushgateway      |                           `16Mi` |                             `32Mi` |
+| tokman           | `100Mi` (prod, `88Mi` otherwise) |  `160Mi` (prod, `128Mi` otherwise) |
+| dashboard        |                          `128Mi` |                            `256Mi` |
+| fedmsg           |                           `88Mi` |                            `128Mi` |
+| beat             |                          `160Mi` |                            `256Mi` |
+| service          |                          `320Mi` |    `1Gi` (prod, `512Mi` otherwise) |
+| worker (generic) |                          `384Mi` |                           `1024Mi` |
+| worker (short)   |                          `320Mi` |                            `640Mi` |
+| worker (long)    |                          `384Mi` |                           `1024Mi` |
+
+## Reasoning for the requirements
+
+**\[TODO\]**
+
+## Currently allowed requirements / limits
+
+| Resource | Allowed to request | Limit |
+| -------- | -----------------: | ----: |
+| CPU      |                `3` |  `12` |
+| Memory   |              `6Gi` | `8Gi` |
+
+## Total for production
+
+| Deployment       | Memory request | Memory limit | CPU request |   CPU limit |
+| ---------------- | -------------: | -----------: | ----------: | ----------: |
+| non-scalable[^1] |       `2052Mi` |     `3808Mi` |      `100m` |     `1280m` |
+| 2× short worker  |        `640Mi` |     `1280Mi` |      `160m` |      `800m` |
+| 2× long worker   |        `768Mi` |     `2048Mi` |      `200m` |     `1000m` |
+| **Σ**            |   **`3460Mi`** | **`7136Mi`** |  **`460m`** | **`3080m`** |
+
+## Proposed changes
+
+1. Revert to the pre-MP+ resources (they were higher for service, workers and
+   postgres; lower values were used due to a hardcoded check in the templates).
+
+   Pre-MP+ memory requirements/limits for the production deployment:
+
+   | Deployment | Requested | Limit |
+   | ---------- | --------: | ----: |
+   | postgres   |     `2Gi` | `4Gi` |
+   | service    |    `320m` | `4Gi` |
+
+1. Request an adjustment of the quotas so that we have some buffer (database
+   migrations, higher load on the service, etc.) and can also **permanently**
+   scale up the workers if we find the service to be more reliable that way.
+
+   - Based on the calculations above, 2× the current quotas on memory would be
+     sufficient, but if we were to scale the workers up too (and account for
+     possible adjustments, e.g., Redict), we should probably go for 3×.
+   - **\[TODO\]** Also check how such changes affect the CPU requests/limits.
+
+1. Migrate tokman to a different toolchain; it's a small self-contained app that
+   is easy to port to Rust or Go, which should leave a smaller footprint.
+   - **\[TODO\]** Also research the possibility of dropping it, since GitHub
+     might've changed their policy about GitHub App tokens; this was discovered
+     by Hunor, but we haven't tried dropping it…
+
+[^1]:
+    Includes the non-scalable deployments, i.e., those that run just one pod each,
+    e.g., dashboard, redict, postgres, etc.
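
A rough cross-check of the memory figures in this patch can be sketched in Python. This is only an illustration under stated assumptions: the names `CURRENT`, `PRE_MP_PLUS`, `QUOTA`, `total` and `fits` are hypothetical, the quota is assumed to apply to the sum of requests and the sum of limits across all production pods, and the pre-MP+ service request (listed as `320m`) is assumed to mean the same 320Mi as today.

```python
GI = 1024  # Mi per Gi

# Production memory values in Mi, copied from the tables in the patch:
# name -> (replicas, request, limit)
CURRENT = {
    "postgres": (1, 1 * GI, 1536),
    "redict": (1, 128, 256),
    "flower": (1, 80, 128),
    "nginx": (1, 8, 32),
    "pushgateway": (1, 16, 32),
    "tokman": (1, 100, 160),
    "dashboard": (1, 128, 256),
    "fedmsg": (1, 88, 128),
    "beat": (1, 160, 256),
    "service": (1, 320, 1 * GI),
    "worker (short)": (2, 320, 640),
    "worker (long)": (2, 384, 1024),
}

# Pre-MP+ overrides from the "Proposed changes" table.
# Assumption: the service request stays at 320Mi.
PRE_MP_PLUS = {
    "postgres": (1, 2 * GI, 4 * GI),
    "service": (1, 320, 4 * GI),
}

# Currently allowed memory: (sum of requests, sum of limits), in Mi.
QUOTA = (6 * GI, 8 * GI)


def total(deployments):
    """Return (total request, total limit) in Mi over all replicas."""
    request = sum(n * req for n, req, _ in deployments.values())
    limit = sum(n * lim for n, _, lim in deployments.values())
    return request, limit


def fits(deployments, factor=1):
    """Check whether the totals fit into `factor` times the current quota."""
    request, limit = total(deployments)
    return request <= factor * QUOTA[0] and limit <= factor * QUOTA[1]


# Current production values with the pre-MP+ overrides applied.
reverted = {**CURRENT, **PRE_MP_PLUS}

for label, deployments in (("current", CURRENT), ("pre-MP+", reverted)):
    request, limit = total(deployments)
    print(f"{label}: request={request}Mi limit={limit}Mi "
          f"fits 1x={fits(deployments)} fits 2x={fits(deployments, 2)}")
```

Under these assumptions the current production values fit the memory quota, while the pre-MP+ values for postgres and service push the summed limits past `8Gi` and fit again once the quota is doubled. Scaling up the workers would need a further margin on top of that.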