
Conversation


@slokesha slokesha commented Jul 22, 2025

What does this PR do?

This PR fixes the performance drop for the Phi model.
It adds a graph break via mark_step in lazy mode and an attn_softmax_bf16 flag, which also improves performance in some cases.
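The graph break described above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern (not the PR's actual diff): on Gaudi, `mark_step` comes from `habana_frameworks.torch.core`; here a no-op fallback is provided so the sketch runs anywhere, and `generate_tokens`/`step_fn` are made-up names standing in for the real decode loop.

```python
import os

# On Gaudi, mark_step flushes the accumulated lazy graph for compilation
# and execution. Fall back to a no-op off-HPU so this sketch stays runnable.
try:
    from habana_frameworks.torch.core import mark_step  # type: ignore
except ImportError:
    def mark_step() -> None:
        pass

def generate_tokens(step_fn, num_steps: int, lazy_mode: bool) -> list:
    """Run num_steps decode steps, inserting a graph break after each one."""
    tokens = []
    for i in range(num_steps):
        tokens.append(step_fn(i))
        if lazy_mode:
            # Break the lazy graph here instead of letting the whole decode
            # loop accumulate into one oversized graph.
            mark_step()
    return tokens

lazy = os.getenv("PT_HPU_LAZY_MODE", "0") == "1"
print(generate_tokens(lambda i: i * 2, 4, lazy))  # → [0, 2, 4, 6]
```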

HL-SMI Version: hl-1.22.0-rc-fw-
Driver Version: 1.22.0-48ef525
Nic Driver Version: 1.22.0-48ef525

image: artifactory-kfs.habana-labs.com/docker-local/1.22.0/ubuntu22.04/habanalabs/pytorch-installer-2.7.1:1.22.0-543

[image: benchmark results]

@12010486 12010486 mentioned this pull request Jul 23, 2025
@slokesha slokesha marked this pull request as ready for review July 24, 2025 08:30
@karol-brejna-i
Collaborator

@slokesha The PR looks good. Can you share some info on how to validate it (e.g., which test to run)?

@12010486
Collaborator

12010486 commented Aug 8, 2025

Chiming in.
The tests that were run, and that I believe you can use to verify, are:

cd optimum-habana/
pip install -e .
python -m pip install .[tests]
pip install pytest
PT_HPU_LAZY_MODE=1 python -m pytest tests/test_text_generation_example.py --device gaudi3 -v -s --token=xxx --junitxml=/tmp/test_run_microsoft_phi.xml --log-cli-level 20 -k test_text_generation_bf16_1x[microsoft/phi-2-1-False-False]

and

PT_HPU_LAZY_MODE=1 python -m pytest tests/test_text_generation_example.py --device gaudi2 -v -s --token=xxx --junitxml=/tmp/test_run_microsoft_phi.xml --log-cli-level 20 -k test_text_generation_fp8[microsoft/phi-2-1-1-True-128-128] 

generation_config.trust_remote_code = args.trust_remote_code
generation_config.valid_sequence_lengths = None
generation_config.attn_batch_split = args.attn_batch_split
generation_config.fp8 = bool(args.quant_config)
A collaborator commented on the diff above:

Not sure we should add this to the generation config, as there could be other ways of running the model in fp8. Maybe we could simply check the value of the QUANT_CONFIG env variable in the modeling file, since this is how args.quant_config is set up:

args.quant_config = os.getenv("QUANT_CONFIG", "")

WDYT?
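The reviewer's suggestion can be sketched as below. This is a hypothetical helper, not code from the PR: `is_fp8_run` is a made-up name, and the example config path is only illustrative; the one real anchor is that the run script sets args.quant_config from the QUANT_CONFIG environment variable, so a non-empty value signals an fp8 quantized run.

```python
import os

def is_fp8_run() -> bool:
    """Detect an fp8 run from the env var that populates args.quant_config.

    The run script does: args.quant_config = os.getenv("QUANT_CONFIG", ""),
    so a non-empty QUANT_CONFIG implies a quantized (fp8) run without
    threading an extra flag through generation_config.
    """
    return bool(os.getenv("QUANT_CONFIG", ""))

os.environ.pop("QUANT_CONFIG", None)
print(is_fp8_run())  # → False

# Illustrative path only; any non-empty value marks the run as quantized.
os.environ["QUANT_CONFIG"] = "quantization_config/maxabs_quant.json"
print(is_fp8_run())  # → True
```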

@astachowiczhabana astachowiczhabana removed their assignment Sep 11, 2025
@karol-brejna-i
Collaborator

@slokesha Are you still working on this PR?
