Commit 14198fe

Monitoring, Observability and HPA doc improvements (#531)
* Drop obsolete Gotchas section from monitoring doc

  Prometheus nowadays uses ClusterRole/Binding for accessing metrics from all namespaces. There's no need to update RBAC rules.

  Signed-off-by: Eero Tamminen <[email protected]>

* Slightly improve HPA doc CPU notes

  Signed-off-by: Eero Tamminen <[email protected]>

* Link Helm monitoring and k8s observability addon docs

  Signed-off-by: Eero Tamminen <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Signed-off-by: Eero Tamminen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 66de41c commit 14198fe

3 files changed: 12 additions and 12 deletions
helm-charts/HPA.md

Lines changed: 5 additions & 4 deletions
@@ -26,7 +26,7 @@ Read [post-install](#post-install) steps before installation!

 ### Resource requests

-HPA controlled CPU pods SHOULD have appropriate resource requests or affinity rules (enabled in their
+HPA controlled _CPU_ pods SHOULD have appropriate resource requests or affinity rules (enabled in their
 subcharts and tested to work) so that k8s scheduler does not schedule too many of them on the same
 node(s). Otherwise they never reach ready state.

@@ -79,7 +79,7 @@ Why HPA is opt-in:
 - Top level chart name needs to conform to Prometheus metric naming conventions,
 as it is also used as a metric name prefix (with dashes converted to underscores)
 - Unless pod resource requests, affinity rules, scheduling topology constraints and/or cluster NRI
-policies are used to better isolate service inferencing pods from each other, instances
+policies are used to better isolate _CPU_ inferencing pods from each other, service instances
 scaled up on same node may never get to ready state
 - Current HPA rules are just examples, for efficient scaling they need to be fine-tuned for given setup
 performance (underlying HW, used models and data types, OPEA version etc)
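
The first bullet in the hunk above says the chart name doubles as a metric name prefix, with dashes converted to underscores. A minimal sketch of what that constraint implies, assuming the prefix is derived by that substitution alone and that the result must match the metric name pattern from the Prometheus data model (`[a-zA-Z_:][a-zA-Z0-9_:]*`). The helper names are hypothetical and are not part of the OPEA charts:

```python
import re

# Valid Prometheus metric names, per the Prometheus data model documentation.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def metric_prefix(chart_name: str) -> str:
    """Hypothetical helper: derive the metric prefix by replacing dashes with underscores."""
    return chart_name.replace("-", "_")


def is_valid_metric_prefix(chart_name: str) -> bool:
    """Return True if the derived prefix is a valid Prometheus metric name."""
    return METRIC_NAME_RE.fullmatch(metric_prefix(chart_name)) is not None


if __name__ == "__main__":
    # Example names are illustrative only.
    for name in ("chatqna", "my-chatqna", "2nd-release", "chat.qna"):
        print(name, metric_prefix(name), is_valid_metric_prefix(name))
```

Under these assumptions, a name starting with a digit or containing a dot would be rejected, while dashes are harmless because they are converted before the check.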
@@ -94,8 +94,9 @@ ChatQnA includes pre-configured values files for scaling the services.
 To enable HPA, add `-f chatqna/hpa-values.yaml` option to your `helm install` command line.

 If **CPU** versions of TGI (and TEI) services are being scaled, resource requests and probe timings
-suitable for CPU usage need to be used. Add `-f chatqna/cpu-values.yaml` option to your `helm install`
-line. If you need to change model specified there, update the resource requests accordingly.
+suitable for CPU usage need to be used. `chatqna/cpu-values.yaml` provides example of such constraints
+which can be added (with `-f` option) to your Helm install. As those values depend on the underlying HW,
+used model, data type and image versions, the specified resource values may need to be updated.

 ### Post-install
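
The hunk above describes adding values files to the Helm install with `-f`. A minimal sketch of how such a command line could be assembled, assuming a release and chart both named `chatqna` (these names, and the helper itself, are hypothetical; only the two values file paths and the `-f` option come from the documentation text):

```python
import shlex


def helm_install_cmd(release, chart, values_files=(), set_options=()):
    """Hypothetical helper: build a `helm install` argument list."""
    cmd = ["helm", "install", release, chart]
    for path in values_files:
        # Each values file is passed with its own -f option.
        cmd += ["-f", path]
    for opt in set_options:
        # Individual values can be overridden with --set key=value.
        cmd += ["--set", opt]
    return cmd


# Release and chart names below are assumptions for illustration.
cmd = helm_install_cmd(
    "chatqna",
    "chatqna",
    values_files=["chatqna/hpa-values.yaml", "chatqna/cpu-values.yaml"],
)
print(shlex.join(cmd))
```

When several `-f` files set the same key, Helm gives precedence to the file listed last, so the order of the values files matters if they overlap.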

helm-charts/monitoring.md

Lines changed: 4 additions & 7 deletions
@@ -6,7 +6,6 @@
 - [Pre-conditions](#pre-conditions)
 - [Prometheus install](#prometheus-install)
 - [Helm options](#helm-options)
-- [Gotchas](#gotchas)
 - [Install](#install)
 - [Verify](#verify)

@@ -17,6 +16,10 @@ which can be visualized e.g. in [Grafana](https://grafana.com/).

 Scaling the services automatically based on their usage with [HPA](HPA.md) also relies on these metrics.

+[Observability documentation](../kubernetes-addons/Observability/README.md)
+explains how to install additional monitoring for node and device metrics,
+and Grafana for visualizing those metrics.
+
 ## Pre-conditions

 ### Prometheus install
@@ -42,12 +45,6 @@ provide that as `global.prometheusRelease` value for the OPEA service Helm insta
 or in its `values.yaml` file. Otherwise Prometheus ignores the installed
 `serviceMonitor` objects.

-## Gotchas
-
-By default Prometheus adds [k8s RBAC rules](https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/prometheus-roleBindingSpecificNamespaces.yaml)
-for detecting `serviceMonitor`s and querying metrics from `default`, `kube-system` and `monitoring` namespaces.
-If Helm is asked to install OPEA service to some other namespace, those rules need to be updated accordingly.
-
 ## Install

 Install Helm chart with `global.monitoring:true` option.

kubernetes-addons/Observability/README.md

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
 # How-To Setup Observability for OPEA Workload in Kubernetes

-This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI,TEI-Embedding,TEI-Reranking and other microservies, and PCM.
+This guide provides a step-by-step approach to setting up observability for the OPEA workload in a Kubernetes environment. We will cover the setup of Prometheus and Grafana, as well as the collection of metrics for Gaudi hardware, OPEA/chatqna including TGI, TEI-Embedding, TEI-Reranking and other microservices, and PCM.
+
+For monitoring Helm installed OPEA applications, see [Helm monitoring option](../../helm-charts/monitoring.md).

 ## Prepare
