@eero-t
Collaborator

@eero-t eero-t commented May 16, 2025

Description

Add KubeAI monitoring support + vLLM dashboard.

Monitoring can be enabled either with the helper script or by calling Helm directly with the new metrics values file.
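
As a rough illustration of the direct Helm route, the sketch below only prints the command it would run, so it can be reviewed before use. The release name `kubeai`, the chart reference `kubeai/kubeai`, and the namespace `kubeai` are assumptions, not taken from this PR; `metric-values.yaml` is the values file the PR adds under `kubeai/`.

```shell
#!/bin/sh
# Sketch only: print a Helm command that layers a metrics values file on top
# of the chart defaults. Release, chart and namespace names are assumptions.
build_helm_cmd() {
    values_file="$1"   # e.g. the metric-values.yaml file added by this PR
    printf 'helm upgrade --install kubeai kubeai/kubeai --namespace kubeai -f %s\n' \
        "$values_file"
}

# Print the command instead of executing it.
build_helm_cmd metric-values.yaml
```

Printing rather than executing keeps the example safe to run on a machine without cluster access; the printed line can be copied once the names match the actual deployment.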

Issues

n/a.

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

n/a.

Tests

Manually tested.

@eero-t eero-t requested review from mkbhanda and poussa as code owners May 16, 2025 18:19
@eero-t eero-t requested review from Copilot and removed request for mkbhanda and poussa May 16, 2025 18:19
Copilot AI left a comment

Pull Request Overview

This pull request adds support for KubeAI monitoring and a vLLM dashboard for observability.

  • Introduces a new YAML configuration file for Prometheus-based monitoring of vLLM metrics.
  • Updates the README with instructions on enabling observability using the provided install script and Helm chart.

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| kubeai/metric-values.yaml | New configuration values enabling vLLM PodMonitor in Prometheus. |
| kubeai/README.md | Added Observability section with instructions for setting up monitoring and the vLLM dashboard. |

Files not reviewed (1)
  • kubeai/install.sh: Language not supported

@eero-t eero-t marked this pull request as draft May 16, 2025 18:19
@eero-t eero-t force-pushed the kubeai-metrics branch 2 times, most recently from cc4f8b8 to 7023f9e Compare May 16, 2025 18:23
eero-t added 3 commits May 20, 2025 19:39
Signed-off-by: Eero Tamminen <[email protected]>
In case somebody wants to run Helm directly instead of using install.sh.

Signed-off-by: Eero Tamminen <[email protected]>
@eero-t eero-t marked this pull request as ready for review May 20, 2025 19:02
@eero-t eero-t requested a review from Copilot May 20, 2025 19:03
Copilot AI left a comment

Pull Request Overview

Adds Prometheus-based monitoring support for KubeAI’s vLLM engine and provides instructions for deploying a Grafana dashboard.

  • Introduces vLLMPodMonitor in Helm values for scraping vLLM metrics
  • Extends README with observability setup and dashboard installation steps

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| kubeai/metric-values.yaml | Enable Prometheus PodMonitor resource (vLLMPodMonitor.enabled) |
| kubeai/README.md | Add “Observability” section with script usage and dashboard setup |

Files not reviewed (1)
  • kubeai/install.sh: Language not supported
Comments suppressed due to low confidence (1)

kubeai/README.md:170

  • The new observability feature (Prometheus monitoring and vLLM dashboard) lacks automated tests. Consider adding unit or integration tests to validate the installation script and dashboard deployment.
# Observability

@eero-t eero-t requested a review from poussa May 20, 2025 19:10
@poussa poussa requested review from marquiz and mkbhanda May 22, 2025 07:56
@eero-t
Collaborator Author

eero-t commented May 26, 2025

@marquiz, @mkbhanda OK to merge?

Collaborator

@marquiz marquiz left a comment

Thanks @eero-t. One nit but I can live with that 😄 I think we can merge this


metrics=""
for arg in "$@"; do
    if [ -f "$arg" ]; then
Collaborator

nit: the arg parsing looks somewhat shaky'n'shady but I guess that's ok for this kind of hack/helper script

Collaborator Author

I would think it very unlikely that a user's Prometheus release is named exactly the same as some file in the kubeai/ directory...

PS. I'm wondering about the benefit of the script when more things need to be configured. I think it would be clearer if the user just invoked Helm directly (with the command copy-pasted from the README), in this case with an additional -f monitoring.yaml argument.
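
To make the ambiguity under discussion concrete, here is a minimal sketch of file-or-name argument handling. It is a guess at the pattern based on the quoted `metrics=""` / `[ -f "$arg" ]` lines and the reply above, not the actual contents of install.sh; the `classify_args` function and the `values` variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: an argument that names an existing file is treated as a Helm
# values file; anything else is taken to be the Prometheus release name.
classify_args() {
    values=""    # accumulated "-f <file>" options (hypothetical variable)
    metrics=""   # Prometheus release name, if one was given
    for arg in "$@"; do
        if [ -f "$arg" ]; then
            values="$values -f $arg"
        else
            metrics="$arg"
        fi
    done
    printf 'values=%s\nmetrics=%s\n' "$values" "$metrics"
}

# A name that matches no file in the current directory is classified as the
# release name; if a file with that name existed, it would be read as values.
classify_args my-prometheus
```

The classification depends on the contents of the current directory, which is the source of the "shaky" impression: the same argument can mean different things depending on where the script is run. Paths containing spaces are also not handled by this sketch.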

@eero-t eero-t merged commit 0efc35b into opea-project:main May 27, 2025
11 checks passed
@eero-t eero-t deleted the kubeai-metrics branch May 27, 2025 13:40
Development

Successfully merging this pull request may close these issues.

[Feature][KubeAI] Enabling Observability of Gaudi

3 participants