Conversation
@wangkl2 wangkl2 commented Nov 28, 2024

Description

Following opea-project/GenAIExamples#1210, this PR removes the `--enforce-eager` flag from the vllm-gaudi service to enable HPU graph optimization by default. This improves both out-of-the-box (OOB) latency and OOB throughput on Gaudi SW 1.18.
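The change amounts to dropping one flag from the vLLM serving command. A minimal before/after sketch (the model name and other arguments are illustrative, not taken from this PR):

```shell
# Before: --enforce-eager forces eager-mode execution and disables
# HPU graph capture (illustrative invocation)
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --enforce-eager

# After: omit --enforce-eager so graph-mode optimization is used by default
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct
```

Note that with graph capture enabled, server warmup takes longer, which is why the PR also raises the LLM timeout in CI (see the commit message below).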

Type of change

  • Others (enhancement, documentation, validation, etc.)

@wangkl2 wangkl2 requested a review from XinyaoWa November 28, 2024 08:46
@lvliang-intel lvliang-intel merged commit ddd372d into opea-project:main Dec 10, 2024
13 checks passed
madison-evans pushed a commit to SAPD-Intel/GenAIComps that referenced this pull request May 12, 2025
…project#954)

* remove enforce-eager to enable HPU graphs

Signed-off-by: Wang, Kai Lawrence <[email protected]>

* Increase the llm max timeout in ci for fully warmup

Signed-off-by: Wang, Kai Lawrence <[email protected]>

---------

Signed-off-by: Wang, Kai Lawrence <[email protected]>