Conversation

@eero-t
Collaborator

@eero-t eero-t commented Jun 5, 2025

Description

When the rate of requests increases and the backend inference engines are scaled out, the megaservice becomes a performance bottleneck (its query processing is single-threaded), so it needs to be scaled too. This is already the case after scaling to a few Gaudi vLLM instances with the default 8B Llama model.
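
For illustration, a minimal sketch of the kind of HorizontalPodAutoscaler this enables for the megaservice (the names, replica bounds, and CPU target below are assumptions for the example, not the chart's actual template):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: chatqna                  # hypothetical name; the chart templates the real one
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: chatqna                # the megaservice deployment to scale
      minReplicas: 1
      maxReplicas: 4                 # illustrative upper bound
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70 # scale out when average CPU utilization exceeds 70%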

Other changes:

  • Add Megaservice scaling info to OPEA application dashboard
  • Remove unused HPA setting from AgentQnA

Issues

n/a.

Type of change

  • New feature (non-breaking change which adds new functionality)

(Fixes scaling performance bottleneck.)

Dependencies

n/a.

Tests

Tested manually.

@eero-t eero-t requested review from lianhao and yongfengdu as code owners June 5, 2025 20:06
@eero-t eero-t marked this pull request as draft June 5, 2025 20:07
@eero-t
Collaborator Author

eero-t commented Jun 5, 2025

Marked as draft because, while I've tested the dashboard changes, I've not tested them with the OPEA configMap. I'll do that tomorrow.

@eero-t eero-t marked this pull request as ready for review June 10, 2025 14:20
@eero-t eero-t requested a review from Copilot June 10, 2025 14:22

Copilot AI left a comment

Pull Request Overview

This PR addresses performance bottlenecks in ChatQnA by introducing autoscaling support for the megaservice, updating dashboards with corresponding metrics, and removing the unused HPA settings from the AgentQnA chart.

  • Added new Prometheus metric panels for tracking megaservice instance counts and latency in the dashboard configmap.
  • Introduced an HPA manifest and updated autoscaling values for ChatQnA.
  • Removed legacy HPA configuration from AgentQnA.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description

  • helm-charts/common/dashboard/templates/configmap-metrics.yaml: Added new metric panels for megaservice instances and token latency.
  • helm-charts/chatqna/templates/horizontal-pod-autoscaler.yaml: Defined a new HPA for ChatQnA using updated autoscaling values.
  • helm-charts/chatqna/hpa-values.yaml: Configured autoscaling parameters for ChatQnA, including resource requests.
  • helm-charts/agentqna/values.yaml: Removed unused HPA configuration that is no longer required.
Comments suppressed due to low confidence (2)

helm-charts/common/dashboard/templates/configmap-metrics.yaml:1714

  • [nitpick] Consider using a more descriptive label for this metric (e.g. 'MegaService: token latency count') to reduce potential confusion with any other 'used' metrics in the dashboard.
"legendFormat": "MegaService: used",

helm-charts/chatqna/hpa-values.yaml:25

  • [nitpick] Consider explicitly specifying CPU units (for example, '1000m') to ensure clarity and consistency with Kubernetes resource requests standards.
cpu: 1
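
Following that suggestion, a hedged example of how the request could be written in millicores (the surrounding keys are assumptions, matching the usual Kubernetes resources block rather than the chart's exact layout):

    resources:
      requests:
        cpu: 1000m   # equivalent to "cpu: 1", but the unit is explicit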

@eero-t eero-t requested a review from poussa June 10, 2025 14:24
@eero-t
Collaborator Author

eero-t commented Jun 10, 2025

This is related to the HPA rework PR #1090.

@eero-t
Collaborator Author

eero-t commented Jun 10, 2025

@lianhao, @yongfengdu Gaudi CI tests fail due to a timeout during vLLM warmup, and the ROCM tests are just stuck in the pending state. Is there a fix for these?

@yongfengdu
Collaborator

@lianhao, @yongfengdu Gaudi CI tests fail due to a timeout during vLLM warmup, and the ROCM tests are just stuck in the pending state. Is there a fix for these?

The ROCM tests being stuck in pending is likely something wrong with their runners; @chensuyue should be able to contact them.

For the vLLM warmup issue, the warmup time differs per model; sometimes just retriggering the single test is enough to pass the CI check.
I've discussed with @lianhao disabling the warmup with VLLM_SKIP_WARMUP: "true" for all workloads' CI tests; the concern is inconsistency with the compose deployment and the user's environment.
If we don't do that, the only option is to extend the timeout (https://github.com/opea-project/GenAIInfra/blob/main/.github/workflows/_helm-e2e.yaml#L126); it has already been increased from 600 seconds to 900 seconds for vLLM warmup.
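
If warmup were disabled for CI, a minimal sketch of the corresponding container environment entry (the exact Helm value key that exposes this in the GenAIInfra charts may differ; this is just the plain Kubernetes form):

    env:
      - name: VLLM_SKIP_WARMUP   # vLLM on Gaudi option discussed above
        value: "true"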

@eero-t eero-t requested a review from marquiz June 11, 2025 12:21
@poussa
Member

poussa commented Jun 13, 2025

For the vLLM warmup issue, the warmup time differs per model; sometimes just retriggering the single test is enough to pass the CI check. I've discussed with @lianhao disabling the warmup with VLLM_SKIP_WARMUP: "true" for all workloads' CI tests; the concern is inconsistency with the compose deployment and the user's environment.

We should disable the warmup for the CI/CD. The warmup affects performance, which is not the main focus of functional tests.

@yongfengdu
Collaborator

We should disable the warmup for the CI/CD. The warmup affects performance, which is not the main focus of functional tests.

#1126

@eero-t
Collaborator Author

eero-t commented Jun 23, 2025

@yongfengdu, @lianhao, @chensuyue There's still something wrong on the CI side.

The Gaudi vLLM ChatQnA test fails with:
[pod/chatqna23100548-vllm-75b86d4d99-7l8vx/vllm] RuntimeError: synStatus=8 [Device not found] Device acquire failed.

The Gaudi TGI AgentQnA test fails with:

[pod/agentqna23095455-tgi-579879b8d4-rbgxb/tgi] 2025-06-23T10:05:05.048254Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
+ exit 1

And the ROCM tests are all still stuck in the pending state?

@lianhao
Collaborator

lianhao commented Jun 24, 2025

@yongfengdu, @lianhao, @chensuyue There's still something wrong on the CI side.

The Gaudi vLLM ChatQnA test fails with: [pod/chatqna23100548-vllm-75b86d4d99-7l8vx/vllm] RuntimeError: synStatus=8 [Device not found] Device acquire failed.

This indicates that some Docker container is consuming the Gaudi device without the k8s Gaudi device plugin's knowledge.
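
For context, the device plugin only accounts for devices that pods explicitly request through its extended resource; a minimal sketch of such a request (the resource name matches the Gaudi device plugin, the rest of the spec is illustrative):

    resources:
      limits:
        habana.ai/gaudi: 1   # allocated and tracked by the k8s device plugin

A container started directly through Docker with the device mapped in bypasses this accounting, so a scheduled pod can then fail to acquire the device.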

The Gaudi TGI AgentQnA test fails with:

[pod/agentqna23095455-tgi-579879b8d4-rbgxb/tgi] 2025-06-23T10:05:05.048254Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
+ exit 1

And the ROCM tests are all still stuck in the pending state?

I tried rerunning the failed test, but it turns out that the Gaudi CI node is not available now. I will ping Suyue to figure out why.

Collaborator

@lianhao lianhao left a comment

@eero-t The Gaudi CI has resumed.

@poussa poussa merged commit 4a0f386 into opea-project:main Jun 26, 2025
70 of 99 checks passed
@eero-t eero-t deleted the cpu-scale branch July 2, 2025 17:56