Monitoring, Observability and HPA doc improvements (#531)

eero-t · pre-commit-ci[bot] · web-flow · commit 14198fec9a5d · 2024-11-13T08:33:11.000+02:00
* Drop obsolete Gotchas section from monitoring doc

  Prometheus nowadays uses ClusterRole/Binding for accessing metrics from all namespaces, so there is no need to update RBAC rules.

  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

* Slightly improve HPA doc CPU notes

  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

* Link Helm monitoring and k8s observability addon docs

  Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
diff --git a/helm-charts/HPA.md b/helm-charts/HPA.md
@@ -26,7 +26,7 @@ Read [post-install](#post-install) steps before installation!
 
 ### Resource requests
 
-HPA controlled CPU pods SHOULD have appropriate resource requests or affinity rules (enabled in their
+HPA controlled _CPU_ pods SHOULD have appropriate resource requests or affinity rules (enabled in their
 subcharts and tested to work) so that k8s scheduler does not schedule too many of them on the same
 node(s). Otherwise they never reach ready state.
 
@@ -79,7 +79,7 @@ Why HPA is opt-in:
 - Top level chart name needs to conform to Prometheus metric naming conventions,
   as it is also used as a metric name prefix (with dashes converted to underscores)
 - Unless pod resource requests, affinity rules, scheduling topology constraints and/or cluster NRI
-  policies are used to better isolate service inferencing pods from each other, instances
+  policies are used to better isolate _CPU_ inferencing pods from each other, service instances
   scaled up on same node may never get to ready state
 - Current HPA rules are just examples, for efficient scaling they need to be fine-tuned for given setup
   performance (underlying HW, used models and data types, OPEA version etc)
@@ -94,8 +94,9 @@ ChatQnA includes pre-configured values files for scaling the services.
 To enable HPA, add `-f chatqna/hpa-values.yaml` option to your `helm install` command line.
 
 If **CPU** versions of TGI (and TEI) services are being scaled, resource requests and probe timings
-suitable for CPU usage need to be used. Add `-f chatqna/cpu-values.yaml` option to your `helm install`
-line. If you need to change model specified there, update the resource requests accordingly.
+suitable for CPU usage need to be used. `chatqna/cpu-values.yaml` provides an example of such constraints
+which can be added (with the `-f` option) to your Helm install. As those values depend on the underlying HW,
+the model used, data type and image versions, the specified resource values may need to be updated.
 
 ### Post-install
 
diff --git a/helm-charts/monitoring.md b/helm-charts/monitoring.md
@@ -6,7 +6,6 @@
 - [Pre-conditions](#pre-conditions)
   - [Prometheus install](#prometheus-install)
   - [Helm options](#helm-options)
-- [Gotchas](#gotchas)
 - [Install](#install)
 - [Verify](#verify)
 
@@ -17,6 +16,10 @@ which can be visualized e.g. in [Grafana](https://grafana.com/).
 
 Scaling the services automatically based on their usage with [HPA](HPA.md) also relies on these metrics.
 
+[Observability documentation](../kubernetes-addons/Observability/README.md)
+explains how to install additional monitoring for node and device metrics,
+and Grafana for visualizing those metrics.
+
 ## Pre-conditions
 
 ### Prometheus install
@@ -42,12 +45,6 @@ provide that as `global.prometheusRelease` value for the OPEA service Helm insta
 or in its `values.yaml` file. Otherwise Prometheus ignores the installed
 `serviceMonitor` objects.
 
-## Gotchas
-
-By default Prometheus adds [k8s RBAC rules](https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/prometheus-roleBindingSpecificNamespaces.yaml)
-for detecting `serviceMonitor`s and querying metrics from `default`, `kube-system` and `monitoring` namespaces.
-If Helm is asked to install OPEA service to some other namespace, those rules need to be updated accordingly.
-
 ## Install
 
 Install Helm chart with `global.monitoring:true` option.
diff --git a/kubernetes-addons/Observability/README.md b/kubernetes-addons/Observability/README.md
@@ -1,6 +1,8 @@
 # How-To Setup Observability for OPEA Workload in Kubernetes
 
-This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI,TEI-Embedding,TEI-Reranking and other microservies, and PCM.
+This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI, TEI-Embedding, TEI-Reranking and other microservices, and PCM.
+
+For monitoring Helm installed OPEA applications, see [Helm monitoring option](../../helm-charts/monitoring.md).
 
 ## Prepare