
Conversation


@wangkl2 wangkl2 commented Nov 28, 2024

Description

Following opea-project/GenAIExamples#1210, remove the `--enforce-eager` flag from the vllm-gaudi service to enable HPU graphs optimization by default. This improves both out-of-box (OOB) latency and OOB throughput on Gaudi SW 1.18.
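In practice the change amounts to dropping `--enforce-eager` from the service's launch arguments, so vLLM can capture HPU graphs during warmup instead of running every step eagerly. A minimal sketch of what a vllm-gaudi compose entry might look like after the change; the service name, image tag, model, and ports here are illustrative, not taken from this PR:

```yaml
# Hypothetical docker-compose entry for a vLLM service on Gaudi.
# The key point is the ABSENCE of --enforce-eager in the command,
# which leaves HPU graph capture enabled (vLLM's default behavior).
services:
  vllm-gaudi-server:
    image: opea/vllm-gaudi:latest   # illustrative tag
    runtime: habana
    environment:
      HABANA_VISIBLE_DEVICES: all
    ports:
      - "8008:80"
    # No --enforce-eager flag below: HPU graphs stay enabled.
    command: --model meta-llama/Meta-Llama-3-8B-Instruct --host 0.0.0.0 --port 80
```

Note that graph capture lengthens warmup, which is why the follow-up commit below raises the LLM timeout in CI.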

Type of change

  • Others (enhancement, documentation, validation, etc.)

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
@wangkl2 wangkl2 requested a review from XinyaoWa November 28, 2024 08:46
@lvliang-intel lvliang-intel merged commit ddd372d into opea-project:main Dec 10, 2024
12 checks passed
madison-evans pushed a commit to SAPD-Intel/GenAIComps that referenced this pull request May 12, 2025
…project#954)

* remove enforce-eager to enable HPU graphs

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

* Increase the llm max timeout in ci for fully warmup

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

---------

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
