Model phi performance fix #1985

kkoryun · 2025-05-19T12:48:08Z

What does this PR do?

This PR fixes the performance drop for the Phi model.

The matmul_qk arguments are temporarily cast to FP32 because of performance issues caused by the kernel fuser when using BF16.
Added a graph break with mark_step for lazy mode and the attn_softmax_bf16 flag, which also improves performance in some cases.
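
For readers who have not looked at the diff, here is a minimal sketch of the idea described above. It is not the code from this PR: the helper name `attention_scores`, its parameters, and the exact placement of the graph break are assumptions made for illustration. It also assumes that attn_softmax_bf16 selects a BF16 softmax instead of an FP32 one. `htcore.mark_step()` is only called when the Habana PyTorch bridge is installed; elsewhere the sketch runs as plain PyTorch.

```python
import torch

try:
    # Only present on Gaudi machines with the Habana PyTorch bridge installed.
    import habana_frameworks.torch.core as htcore
except ImportError:
    htcore = None


def attention_scores(query, key, scale, attn_softmax_bf16=False, lazy_mode=False):
    """Hypothetical helper: QK^T with FP32 arguments, optional graph break, softmax."""
    out_dtype = query.dtype

    # Upcast both matmul arguments to FP32 (the temporary workaround described above).
    scores = torch.matmul(
        query.to(torch.float32), key.to(torch.float32).transpose(-1, -2)
    ) * scale

    # In lazy mode, mark_step() flushes the accumulated graph, i.e. a graph break.
    # Where exactly the PR places it is an assumption here.
    if lazy_mode and htcore is not None:
        htcore.mark_step()

    if attn_softmax_bf16:
        # Assumed meaning of the flag: run softmax in BF16.
        probs = torch.softmax(scores.to(torch.bfloat16), dim=-1)
    else:
        probs = torch.softmax(scores, dim=-1, dtype=torch.float32)

    # Return attention weights in the caller's original dtype.
    return probs.to(out_dtype)
```

Calling `attention_scores(q, k, scale=head_dim ** -0.5)` with BF16 tensors of shape `(batch, heads, seq, head_dim)` returns BF16 attention weights of shape `(batch, heads, seq, seq)`.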

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

12010486 · 2025-05-20T10:02:12Z

@kkoryun could you add here also the command lines you tested and the performances you had before and after this commit?

kkoryun · 2025-05-21T09:34:37Z

> @kkoryun could you add here also the command lines you tested and the performances you had before and after this commit?

The performance drop was seen in the test test_text_generation_bf16_1x[microsoft/phi-2-1-False-False].
Command to run:
python -m pytest tests/test_text_generation_example.py --device <DEVICE_TYPE> -v -s --token=<HF_TOKEN> --junitxml=/tmp/test_run_microsoft_phi.xml --log-cli-level 20 -k test_text_generation_bf16_1x[microsoft/phi-2-1-False-False]

12010486 · 2025-06-17T14:28:57Z

@kkoryun, internal checks are passing and the code changes are clear, but while testing I collected mixed results.

I'm testing on Synapse 1.21.0-555, with your file applied on top of the v1.18.0 release.

Here is what I get:

PT_HPU_LAZY_MODE=1 python -m pytest tests/test_text_generation_example.py --device <device> -v -s --token=<token> --junitxml=/tmp/test_run_microsoft_phi.xml --log-cli-level 20 -k test_text_generation_bf16_1x[microsoft/phi-2-1-False-False]

Device	Synapse version	OH version	Target	Actual
Gaudi2	1.21.0-555	v1.18.0	224.72	232.02
Gaudi2	1.21.0-555	v1.18.0 + phi change	224.72	244.37
Gaudi3	1.21.0-555	v1.18.0	236.54	228.24
Gaudi3	1.21.0-555	v1.18.0 + phi change	236.54	228.15

I also tested the fp8 one:

PT_HPU_LAZY_MODE=1 python -m pytest tests/test_text_generation_example.py --device <device> -v -s --token=<token> --junitxml=/tmp/test_run_microsoft_phi.xml --log-cli-level 20 -k test_text_generation_fp8[microsoft/phi-2-1-1-True-128-128]

Device	Synapse version	OH version	Target	Actual
Gaudi2	1.21.0-555	v1.18.0	254.09	328.10
Gaudi2	1.21.0-555	v1.18.0 + phi change	254.09	256.93
Gaudi3	1.21.0-555	v1.18.0	298.62	298.53
Gaudi3	1.21.0-555	v1.18.0 + phi change	298.62	298.72

Also, commit a5ef754 didn't show any difference in my tests.
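
To make the "mixed results" easier to compare, the relative change in the Actual column between the v1.18.0 baseline and the build with the phi change can be computed as follows. This is only a sketch: the numbers are copied from the two tables above, and the names `RESULTS` and `pct_change` are made up for illustration.

```python
# Actual values copied from the tables above: (v1.18.0, v1.18.0 + phi change).
RESULTS = {
    ("bf16", "Gaudi2"): (232.02, 244.37),
    ("bf16", "Gaudi3"): (228.24, 228.15),
    ("fp8", "Gaudi2"): (328.10, 256.93),
    ("fp8", "Gaudi3"): (298.53, 298.72),
}


def pct_change(baseline, changed):
    """Relative change of `changed` with respect to `baseline`, in percent."""
    return (changed - baseline) / baseline * 100.0


for (test, device), (baseline, changed) in RESULTS.items():
    print(f"{test:5s} {device}: {pct_change(baseline, changed):+.2f}%")
```

On these numbers the Gaudi3 values move by less than 0.1% in both tests, while the Gaudi2 values change by roughly +5% for bf16 and roughly -22% for fp8.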

regisss · 2025-07-08T08:41:50Z

@kkoryun @12010486 Any news regarding this PR? Maybe you can test it with 1.22?

12010486 · 2025-07-23T08:42:59Z

Closing it. Will be covered by #2165

kkoryun added 2 commits May 5, 2025 21:07

fix phi model performance

f46ab54

changed matmul_qk args dtype

040c8e2

kkoryun requested a review from regisss as a code owner May 19, 2025 12:48

kkoryun added 2 commits May 22, 2025 13:10

removed the graph break for gaudi3

a5ef754

Merge branch 'main' of https://github.com/huggingface/optimum-habana …

cc7a28d

…into model_phi_performance

kkoryun added 2 commits July 2, 2025 09:17

to float32 removed

8f228d0

Merge branch 'main' of https://github.com/huggingface/optimum-habana …

0ea94b8

…into model_phi_performance

astachowiczhabana added the synapse1.22 label Jul 9, 2025

astachowiczhabana self-assigned this Jul 9, 2025

12010486 closed this Jul 23, 2025
