fix(vllm metrics): error stack trace #3200

Merged

@gitdallas gitdallas commented Sep 12, 2024

closes: https://issues.redhat.com/browse/RHOAIENG-11522

without this change, the following situation results in an error stack trace and a ui crash:
image

Description

prevent the ui from crashing. if a query doesn't exist, let it be undefined, which results in empty data and no errors. vince said he did not want an error message at all, as it might suggest to the user that the problem could resolve with a refresh or something.
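
A minimal sketch of the idea described above, not the actual dashboard code: the lookup returns `undefined` for a missing query instead of throwing, and an undefined query produces an empty series that a chart can render as "no data". The names `QueryConfig`, `findQuery`, and `fetchSeries` and the in-memory `backend` map are hypothetical. Short-circuiting on `undefined` is a simplification of passing the value through to the endpoint.

```typescript
// Hypothetical types and helpers for illustration only.
type QueryConfig = { title: string; query: string };
type Point = { x: number; y: number };

// Return the query string for a chart title, or undefined when the
// metrics config has no matching entry (instead of throwing).
function findQuery(configs: QueryConfig[], title: string): string | undefined {
  return configs.find((c) => c.title === title)?.query;
}

// Stand-in for the metrics fetch: an undefined or unknown query yields an
// empty series rather than an error.
function fetchSeries(
  query: string | undefined,
  backend: Record<string, Point[]>,
): Point[] {
  if (query === undefined) {
    return [];
  }
  return backend[query] ?? [];
}
```

With this shape, a chart whose query is missing receives `[]` and can show an empty state, while the other charts render normally.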

How Has This Been Tested?

tested the code on a previous deploy that would crash the ui on the metrics page; it no longer crashes. used the MR cluster to test. existing tests still pass.

Test Impact

added a new test using mock data that contains only 1 query and made sure that the 4 charts show up (instead of an error stack page). i also updated the test mock for prometheus/serving to return empty results if the request body includes query=undefined\b, as the real endpoint would. here's a screenshot from a test with a missing query resulting in no data for one of the serving endpoints (the other charts still show their data):
image
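
The mock behaviour described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual test mock: `mockServingResponse`, the `body.query` field, and the response shape are hypothetical. Only the `query=undefined\b` pattern comes from the description.

```typescript
// Hypothetical response shape, loosely modelled on a Prometheus query result.
type MockResponse = { code: number; response: { data: { result: unknown[] } } };

// Return empty results when the request body carries an undefined query,
// otherwise return the populated fixture passed in by the test.
function mockServingResponse(
  body: { query: string },
  populated: unknown[],
): MockResponse {
  // \b keeps the match from firing on longer tokens such as "undefinedFoo".
  const isMissingQuery = /query=undefined\b/.test(body.query);
  return {
    code: 200,
    response: { data: { result: isMissingQuery ? [] : populated } },
  };
}
```

A test could then supply a fixture for only one of the four charts and assert that every chart still renders.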

Request review criteria:

test a vllm deploy, view the metrics. also view metrics of other types.

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit or cypress tests for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change.

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

@gitdallas gitdallas force-pushed the bug/11522-vllm-metrics branch 2 times, most recently from 2e62654 to 9f81631 Compare September 12, 2024 15:16
@vconzola

LGTM.

codecov bot commented Sep 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.08%. Comparing base (854cb3c) to head (64f493b).
Report is 33 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3200      +/-   ##
==========================================
- Coverage   85.11%   85.08%   -0.03%     
==========================================
  Files        1291     1291              
  Lines       28782    28788       +6     
  Branches     7744     7752       +8     
==========================================
- Hits        24497    24495       -2     
- Misses       4285     4293       +8     
Files with missing lines Coverage Δ
...end/src/api/prometheus/kservePerformanceMetrics.ts 98.03% <100.00%> (+0.16%) ⬆️
.../metrics/kserve/content/KserveMeanLatencyGraph.tsx 100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 854cb3c...64f493b.

@gitdallas gitdallas force-pushed the bug/11522-vllm-metrics branch 4 times, most recently from faab5b4 to 8df986a Compare September 17, 2024 20:14
ppadti commented Sep 27, 2024

tested locally, works fine.
/lgtm

openshift-ci bot commented Sep 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mturley

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 6565ca9 into opendatahub-io:main Sep 27, 2024
8 checks passed