Model phi performance fix #2165
base: main
Conversation
@slokesha The PR looks good. Can you share some info on how to validate it (e.g. which test to run)?
Chiming in.
```python
generation_config.trust_remote_code = args.trust_remote_code
generation_config.valid_sequence_lengths = None
generation_config.attn_batch_split = args.attn_batch_split
generation_config.fp8 = bool(args.quant_config)
```
Not sure we should add this to the generation config, as there could be other ways of running the model in fp8. Maybe we could simply check the value of the env variable QUANT_CONFIG in the modeling file, since this is how args.quant_config is set up:
```python
args.quant_config = os.getenv("QUANT_CONFIG", "")
```
WDYT?
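As a rough sketch of the suggestion above, the modeling file could derive the fp8 flag from the QUANT_CONFIG environment variable directly instead of threading it through generation_config. The helper name `is_fp8_enabled` is hypothetical, used only for illustration:

```python
import os

def is_fp8_enabled() -> bool:
    """Sketch: treat a non-empty QUANT_CONFIG env var as fp8 enabled,
    mirroring how args.quant_config is populated from the environment."""
    return bool(os.getenv("QUANT_CONFIG", ""))
```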
@slokesha Are you still working on this PR?
What does this PR do?
This PR fixes the performance drop for the Phi model.
Added a graph break with mark_step for lazy mode, plus the attn_softmax_bf16 flag, which also improves performance in some cases.
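The lazy-mode graph break mentioned above can be sketched as follows. This is an illustration, not the PR's exact code: the helper name `maybe_mark_step` and the guarded import are assumptions, while `htcore.mark_step()` is the Habana API that flushes the accumulated lazy graph:

```python
# Guarded import so the snippet also loads on machines without HPUs.
try:
    import habana_frameworks.torch.core as htcore  # Habana PyTorch bridge
except ImportError:
    htcore = None

def maybe_mark_step(lazy_mode: bool) -> None:
    """Insert a graph break when running in HPU lazy mode; no-op otherwise."""
    if lazy_mode and htcore is not None:
        htcore.mark_step()
```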
HL-SMI Version: hl-1.22.0-rc-fw-
Driver Version: 1.22.0-48ef525
Nic Driver Version: 1.22.0-48ef525
image: artifactory-kfs.habana-labs.com/docker-local/1.22.0/ubuntu22.04/habanalabs/pytorch-installer-2.7.1:1.22.0-543