Svdquant huggingface checkpoint export support #754
base: main
Conversation
Signed-off-by: Shiyang Chen <[email protected]>
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main     #754      +/-   ##
==========================================
- Coverage   74.68%   74.63%   -0.06%
==========================================
  Files         192      192
  Lines       18950    18995     +45
==========================================
+ Hits        14153    14177     +24
- Misses       4797     4818     +21
jingyu-ml left a comment
LGTM overall, including the approach for fusing the QKV and FFN layers. The current resmooth + refusion process means the resulting model is not exactly identical to the original, but this appears to be the only viable option at the moment unless we can fuse these layers during calibration...
Thank you for your work!
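The fusion the reviewer refers to can be sketched as follows. This is a hypothetical helper (`fuse_qkv_weights` is not from the PR): Q/K/V projection weights are concatenated along the output dimension so the fused group shares one set of quantization scales and one SVD low-rank branch, which is why the resmooth + refusion result is only approximately identical to the original model.

```python
import torch

def fuse_qkv_weights(w_q: torch.Tensor, w_k: torch.Tensor,
                     w_v: torch.Tensor) -> torch.Tensor:
    """Concatenate per-projection weights along the output dim into one
    fused [out_q + out_k + out_v, in] weight (hypothetical sketch)."""
    # All three projections consume the same hidden states, so they must
    # share in_features; only out_features may differ (e.g. GQA).
    assert w_q.shape[1] == w_k.shape[1] == w_v.shape[1]
    return torch.cat([w_q, w_k, w_v], dim=0)
```

Applying the fused linear once is numerically equivalent to applying the three projections separately, so fusion itself is lossless; the approximation the reviewer mentions comes only from sharing smoothing/quantization scales across the group.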
    QUANTIZATION_NONE,
    QUANTIZATION_NVFP4,
    QUANTIZATION_NVFP4_AWQ,
    QUANTIZATION_NVFP4_SVDQUANT,
Could you add a check to ensure the model is running on a single GPU and not in a distributed setup? I’m not sure that our current SVDQ implementation works correctly with multiple GPUs. We can remove this check later once we verify that SVDQ calibration functions properly in a multi-GPU setting.
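The guard the reviewer asks for might look like the sketch below. This is an assumption, not code from the PR: it refuses to run when `torch.distributed` reports more than one process, which is the usual signal for a multi-GPU setup.

```python
import torch.distributed as dist

def assert_single_process() -> None:
    """Raise if running under a multi-process (multi-GPU) distributed setup.

    Hypothetical guard: SVDQuant calibration is only validated on a single
    GPU for now, so bail out early in distributed runs.
    """
    if dist.is_available() and dist.is_initialized() and dist.get_world_size() > 1:
        raise RuntimeError(
            "SVDQuant calibration is currently only supported on a single GPU; "
            "detected a distributed run with world_size > 1."
        )
```

Outside a distributed run the check is a no-op, so it is safe to call unconditionally before calibration.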
def svd(weight, rank):
    original_device = weight.device
    original_dtype = weight.dtype
    weight_f64 = weight.to(dtype=torch.float64, device=original_device)
Do we need f64 here, or would f32 suffice?
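For context on the question, the truncated-SVD factorization the diff appears to compute can be sketched like this. This is a hypothetical reconstruction (only the first four lines of the PR's `svd` helper are visible above); the float64 upcast is there for numerical stability of the decomposition, which is what the f64-vs-f32 question is about.

```python
import torch

def svd_lowrank(weight: torch.Tensor, rank: int):
    """Rank-r factorization weight ≈ L @ R via truncated SVD (sketch)."""
    original_dtype = weight.dtype
    # Decompose in float64 for stability, then cast back to the model dtype.
    w64 = weight.to(torch.float64)
    U, S, Vh = torch.linalg.svd(w64, full_matrices=False)
    L = U[:, :rank] * S[:rank]   # [out_features, rank]
    R = Vh[:rank, :]             # [rank, in_features]
    return L.to(original_dtype), R.to(original_dtype)
```

At full rank the product `L @ R` reconstructs the weight exactly up to rounding; at small ranks it gives the best Frobenius-norm approximation, which is what the SVDQuant low-rank branch absorbs.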
What does this PR do?
Type of change: new feature
Overview:
Usage
cd ./examples/llm_ptq/
python hf_ptq.py \
    --pyt_ckpt_path Qwen/Qwen3-4B \
    --export_path /home/scratch.shiychen_coreai/quantized_models/Qwen3-4B-svdq \
    --qformat nvfp4_awq_svdquant \
    --kv_cache_qformat none \
    --sparsity_fmt dense \
    --calib_size 8

Testing
Exported the checkpoint and verified it loads.
Before your PR is "Ready for review"
Additional Information