Skip to content

Conversation

@kkoryun
Copy link
Contributor

@kkoryun kkoryun commented May 19, 2025

What does this PR do?

This PR fixes the performance drop for the Phi model

  1. The dtype for the matmul_qk args is temporarily in FP32 due to performance issues caused by kernels fuser when using BF16.
  2. Added graph break with mark_step for lazy mode and attn_softmax_bf16 flag, which also improves performance in some cases.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@kkoryun kkoryun requested a review from regisss as a code owner May 19, 2025 12:48
@12010486
Copy link
Collaborator

@kkoryun could you add here also the command lines you tested and the performances you had before and after this commit?

@kkoryun
Copy link
Contributor Author

kkoryun commented May 21, 2025

@kkoryun could you add here also the command lines you tested and the performances you had before and after this commit?

The performance drop was seen in the test test_text_generation_bf16_1x[microsoft/phi-2-1-False-False]
cmd to run:
python -m pytest tests/test_text_generation_example.py --device <DEVICE_TYPE> -v -s --token=<HF_TOKEN> --junitxml=/tmp/test_run_microsoft_phi.xml --log-cli-level 20 -k test_text_generation_bf16_1x[microsoft/phi-2-1-False-False]

@12010486
Copy link
Collaborator

12010486 commented Jun 17, 2025

@kkoryun, internal checks are passing and the code changes are clear, but while testing I collected mixed results.

I'm testing on 1.21.0-555 and adding your file on top of v1.18.0 release.

Here is what I get:

PT_HPU_LAZY_MODE=1 python -m pytest tests/test_text_generation_example.py --device <device> -v -s --token=<token> --junitxml=/tmp/test_run_microsoft_phi.xml --log-cli-level 20 -k test_text_generation_bf16_1x[microsoft/phi-2-1-False-False]
Device Synapse v OH v Target Actual
Gaudi2 1.21.0-555 v 1.18.0 224.72 232.02
Gaudi2 1.21.0-555 v 1.18.0 + phi change 224.72 244.37
Gaudi3 1.21.0-555 v 1.18.0 236.54 228.24
Gaudi3 1.21.0-555 v 1.18.0 + phi change 236.54 228.15

I tested also the fp8 one:

PT_HPU_LAZY_MODE=1 python -m pytest tests/test_text_generation_example.py --device <device> -v -s --token=<token> --junitxml=/tmp/test_run_microsoft_phi.xml --log-cli-level 20 -k test_text_generation_fp8[microsoft/phi-2-1-1-True-128-128] 
Device Synapse v OH v Target Actual
Gaudi2 1.21.0-555 v 1.18.0 254.09 328.10
Gaudi2 1.21.0-555 v 1.18.0 + phi change 254.09 256.93
Gaudi3 1.21.0-555 v 1.18.0 298.62 298.53
Gaudi3 1.21.0-555 v 1.18.0 + phi change 298.62 298.72

Also, this a5ef754 commit didn't show any difference in my tests.

@regisss
Copy link
Collaborator

regisss commented Jul 8, 2025

@kkoryun @12010486 Any news regarding this PR? Maybe you can test it with 1.22?

@12010486
Copy link
Collaborator

Closing it. Will be covered by #2165

@12010486 12010486 closed this Jul 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants