Add KubeAI monitoring support + vLLM dashboard #1054

eero-t · 2025-05-16T18:19:23Z

Description

Add KubeAI monitoring support + vLLM dashboard.

Monitoring can be enabled either with the helper script, or by calling Helm directly with the new metrics values file.
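
As a rough sketch of the direct-Helm route, the function below only assembles and prints a command instead of running it. The release name `kubeai`, the chart reference `kubeai/kubeai`, and the namespace `kubeai` are assumptions for illustration and are not taken from this PR; only the `kubeai/metric-values.yaml` path comes from the changed files.

```shell
#!/bin/sh
# Sketch only: print a Helm command that layers the metrics values file
# on top of a KubeAI install. Release, chart, and namespace defaults are
# assumptions; adjust them to match the actual deployment.
kubeai_helm_cmd() {
    values_file="${1:-kubeai/metric-values.yaml}"  # path from this PR
    release="${2:-kubeai}"                         # assumed release name
    chart="${3:-kubeai/kubeai}"                    # assumed chart reference
    namespace="${4:-kubeai}"                       # assumed namespace
    printf 'helm upgrade --install %s %s --namespace %s -f %s\n' \
        "$release" "$chart" "$namespace" "$values_file"
}
```

Calling `kubeai_helm_cmd` with no arguments prints the command with the defaults above, so it can be reviewed before it is run by hand.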

Issues

n/a.

Type of change

New feature (non-breaking change which adds new functionality)

Dependencies

n/a.

Tests

Manually tested.

Copilot

Pull Request Overview

This pull request adds support for KubeAI monitoring and a vLLM dashboard for observability.

Introduces a new YAML configuration file for Prometheus-based monitoring of vLLM metrics.
Updates the README with instructions on enabling observability using the provided install script and Helm chart.

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

File	Description
kubeai/metric-values.yaml	New configuration values enabling vLLM PodMonitor in Prometheus.
kubeai/README.md	Added Observability section with instructions for setting up monitoring and the vLLM dashboard.

Files not reviewed (1)

kubeai/install.sh: Language not supported


Copilot

Pull Request Overview

Adds Prometheus-based monitoring support for KubeAI’s vLLM engine and provides instructions for deploying a Grafana dashboard.

Introduces vLLMPodMonitor in Helm values for scraping vLLM metrics
Extends README with observability setup and dashboard installation steps

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

File	Description
kubeai/metric-values.yaml	Enable Prometheus PodMonitor resource (`vLLMPodMonitor.enabled`)
kubeai/README.md	Add “Observability” section with script usage and dashboard setup

Files not reviewed (1)

kubeai/install.sh: Language not supported

Comments suppressed due to low confidence (1)

kubeai/README.md:170

The new observability feature (Prometheus monitoring and vLLM dashboard) lacks automated tests. Consider adding unit or integration tests to validate the installation script and dashboard deployment.

# Observability

eero-t · 2025-05-26T16:51:00Z

@marquiz, @mkbhanda OK to merge?

marquiz

Thanks @eero-t. One nit but I can live with that 😄 I think we can merge this

marquiz · 2025-05-27T13:15:29Z

kubeai/install.sh

+
+metrics=""
+for arg in "$@"; do
+	if [ -f "$arg" ]; then


nit: the arg parsing looks somewhat shaky'n'shady but I guess that's ok for this kind of hack/helper script

I would think it very unlikely that the user's Prometheus release is named exactly the same as some file in the kubeai/ directory...

PS. I'm wondering about the benefit of the script once more things need to be configured. I think it would be clearer if the user just invoked Helm directly (with the command copy-pasted from the README), in this case with an additional -f monitoring.yaml argument.
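
One way to make the argument handling discussed above less ambiguous is to pass the Prometheus release name and the values files through explicit flags, instead of checking whether each argument is an existing file. The following is a minimal sketch under that assumption; the `-p` and `-f` flags and the function name are hypothetical and are not part of the `install.sh` in this PR.

```shell
#!/bin/sh
# Sketch only: separate the Prometheus release name from values files with
# explicit flags instead of testing whether each argument is a file.
# The -p and -f flags are hypothetical and are not part of install.sh.
# Note: values file paths containing spaces are not handled here.
parse_install_args() {
    prom_release=""
    values_args=""
    OPTIND=1  # reset so the function can be called more than once
    while getopts "p:f:" opt; do
        case "$opt" in
            p) prom_release="$OPTARG" ;;
            f) values_args="$values_args -f $OPTARG" ;;
            *) echo "usage: install.sh [-p prometheus-release] [-f values.yaml]..." >&2
               return 1 ;;
        esac
    done
    values_args="${values_args# }"  # drop the leading space
}
```

With this shape, a release name that happens to match a file name can no longer be mistaken for a values file, at the cost of a slightly longer command line.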

eero-t requested review from mkbhanda and poussa as code owners May 16, 2025 18:19

eero-t requested review from Copilot and removed request for mkbhanda and poussa May 16, 2025 18:19

Copilot AI reviewed May 16, 2025

eero-t marked this pull request as draft May 16, 2025 18:19

eero-t force-pushed the kubeai-metrics branch 2 times, most recently from cc4f8b8 to 7023f9e Compare May 16, 2025 18:23

eero-t added 3 commits May 20, 2025 19:39

KubeAI pod monitoring support

4e47137

Signed-off-by: Eero Tamminen <[email protected]>

KubeAI values file for enabling vLLM monitoring

98a1f10

In case somebody wants to run Helm directly instead of using install.sh. Signed-off-by: Eero Tamminen <[email protected]>

KubeAI vLLM dashboard for Grafana

de0f41a

Signed-off-by: Eero Tamminen <[email protected]>

eero-t force-pushed the kubeai-metrics branch from 7023f9e to de0f41a Compare May 20, 2025 18:55

eero-t marked this pull request as ready for review May 20, 2025 19:02

eero-t requested a review from Copilot May 20, 2025 19:03

Copilot AI reviewed May 20, 2025

eero-t requested a review from poussa May 20, 2025 19:10

poussa requested review from marquiz and mkbhanda May 22, 2025 07:56

poussa approved these changes May 23, 2025

This was linked to issues May 27, 2025

[Feature][KubeAI] Enabling Observability of Gaudi opea-project/GenAIEval#289

Closed

[Feature] KubeAI for OPEA v1.4 #1074

Closed

This was unlinked from issues May 27, 2025

[Feature][KubeAI] Enabling Observability of Gaudi opea-project/GenAIEval#289

Closed

[Feature] KubeAI for OPEA v1.4 #1074

Closed

joshuayao linked an issue May 27, 2025 that may be closed by this pull request

[Feature][KubeAI] Enabling Observability of Gaudi #1080

Closed

marquiz approved these changes May 27, 2025

eero-t merged commit 0efc35b into opea-project:main May 27, 2025
11 checks passed

eero-t deleted the kubeai-metrics branch May 27, 2025 13:40

joshuayao mentioned this pull request May 28, 2025

[Feature][KubeAI] Enabling Observability of Gaudi #1080

Closed
