
[Bug]: FP8 checkpoints with fused linear modules fail to load scales correctly #5915

Closed
mgoin opened this issue Jun 27, 2024 · 2 comments · Fixed by #5921
Labels
bug Something isn't working

Comments

@mgoin
Collaborator

mgoin commented Jun 27, 2024

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

Description:
When loading FP8 quantized models with merged linear modules (e.g., Phi-3 with merged qkv_proj and up_gate_proj), the scales for each shard are not handled correctly. This occurs because the vLLM FP8 config assumes a separate scale for each shard, but merged layers carry a single scale in the checkpoint.

Steps to Reproduce:

  1. Attempt to load an FP8 quantized Phi-3 model (e.g., https://huggingface.co/nm-testing/Phi-3-mini-128k-instruct-FP8)
  2. Observe an error due to a shape mismatch:
    param_data.shape=torch.Size([2]) loaded_weight.shape=torch.Size([])
    param_data.shape=torch.Size([3]) loaded_weight.shape=torch.Size([])
    
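The mismatch can be reproduced in isolation. A minimal sketch (the tensor names mirror the log above; the scale value is made up):

```python
import torch

# vLLM allocates one scale slot per shard of the fused layer
# (e.g. 3 for q/k/v), but a checkpoint with qkv_proj already
# fused on disk stores a single scalar scale for the layer.
param_data = torch.empty(3)        # per-shard scales: torch.Size([3])
loaded_weight = torch.tensor(0.5)  # scalar from checkpoint: torch.Size([])

# The shapes differ, so the equality assert in weight loading fires:
assert param_data.shape != loaded_weight.shape
```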

Expected Behavior:
Scales should be correctly loaded for merged linear modules in FP8 checkpoints.

Proposed Fix:
Modify process_weights_after_loading in MergedColumnParallelLinear and QKVParallelLinear to repeat the merged scale during weight loading.

Temporary Workaround:
Apply the following patch in vllm/model_executor/layers/linear.py:

- assert param_data.shape == loaded_weight.shape
- param_data.copy_(loaded_weight)
+ temp = loaded_weight.repeat(param_data.shape)
+ assert param_data.shape == temp.shape
+ param_data.copy_(temp)
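Standalone, the patched path behaves like this (same hypothetical shapes as in the error above; the scale value is made up):

```python
import torch

param_data = torch.empty(3)        # one slot per q/k/v shard
loaded_weight = torch.tensor(0.5)  # single fused scale from disk

# Broadcast the scalar checkpoint scale to every shard before copying
temp = loaded_weight.repeat(param_data.shape)
assert param_data.shape == temp.shape
param_data.copy_(temp)
# param_data is now tensor([0.5, 0.5, 0.5])
```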

cc @robertgshaw2-neuralmagic @comaniac

@mgoin mgoin added the bug Something isn't working label Jun 27, 2024
@comaniac
Collaborator

I thought we handled this already? All the FP8 checkpoints have separate QKV scales, and we merge them after weight loading. Is there anything special about Phi-3?

@robertgshaw2-neuralmagic
Collaborator

I'm working on a fix for this right now.

The issue is that Phi-3 has fused qkv on disk, so there's already only one scale!
