Add DS nvfp4 #2356
base: master
Signed-off-by: yiliu30 <[email protected]>
User description
Signed-off-by: yiliu30 [email protected]
PR Type
Enhancement
Description
Added support for NVFP4 quantization scheme
Updated usage instructions and validation checks
Modified environment variable settings for NVFP4
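For readers unfamiliar with the scheme, the following is a minimal sketch of what NVFP4-style block quantization does numerically. It is not the code added in this PR. It assumes the commonly documented layout (FP4 E2M1 elements with one shared scale per 16-element block) and simplifies it: the scale is kept as a Python float rather than rounded to FP8 (E4M3), the per-tensor global scale is omitted, and ties are broken toward the smaller magnitude. The names `fake_quant_block` and `fake_quant` are hypothetical.

```python
# Representable magnitudes of the FP4 E2M1 element format.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: one shared scale per 16 elements


def fake_quant_block(block):
    """Quantize then dequantize one block (hypothetical helper)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Simplification: a real implementation would store this scale in FP8 (E4M3).
    scale = amax / E2M1_GRID[-1]
    out = []
    for x in block:
        # Snap the scaled magnitude to the nearest grid point (first match on ties).
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append((mag if x >= 0 else -mag) * scale)
    return out


def fake_quant(values):
    """Apply block-wise fake quantization to a flat list of floats."""
    result = []
    for i in range(0, len(values), BLOCK_SIZE):
        result.extend(fake_quant_block(values[i:i + BLOCK_SIZE]))
    return result
```

Values that already lie on the scaled grid pass through unchanged; everything else is snapped to the nearest representable magnitude within its block.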
File Walkthrough
quantize.py
Add NVFP4 configuration
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/quantize.py
- config_dict
- enable_torch_compile to True
- low_gpu_mem_usage parameter

run_evaluation.sh
Update evaluation script for NVFP4
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/run_evaluation.sh

run_generate.sh
Update generation script for NVFP4
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/run_generate.sh