WNA16 does not apply optimized RTN for moe layers by default #1245
for more information, see https://pre-commit.ci
Signed-off-by: Zhang, Weiwei1 <[email protected]>
…ix_0108 # Conflicts: # auto_round/compressors/base.py
Pull request overview
This PR addresses an issue where optimized RTN (Round-to-Nearest) quantization was not properly disabled by default for MoE (Mixture of Experts) layers in WNA16 configurations. The change introduces MoE model detection and automatically disables optimized RTN for expert layers to improve efficiency, while allowing users to override this behavior with --enable_opt_rtn.
Key Changes:
- Added `is_moe_model()` utility function to detect MoE models by examining config keys and module names
- Changed default value of `disable_opt_rtn` from `True` to `None` to enable automatic optimization detection
- Implemented logic to automatically disable optimized RTN for MoE expert layers unless explicitly enabled by the user
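The detection step described in the first bullet might look roughly like the sketch below. This is not the code from `auto_round/utils/model.py`. The function name `looks_like_moe` and both hint tuples are assumptions made for illustration, since the actual keys and name patterns checked by `is_moe_model()` are not shown in this thread.

```python
from typing import Iterable, Mapping

# Illustrative hints only (assumptions); the PR's real lists may differ.
MOE_CONFIG_KEY_HINTS = ("num_experts", "num_local_experts", "n_routed_experts")
MOE_MODULE_NAME_HINTS = ("experts",)


def looks_like_moe(config: Mapping[str, object], module_names: Iterable[str]) -> bool:
    """Heuristically decide whether a model is MoE.

    First check whether any config key contains an expert-count hint,
    then fall back to scanning module names for an expert-like segment.
    """
    for key in config:
        if any(hint in key.lower() for hint in MOE_CONFIG_KEY_HINTS):
            return True
    return any(
        hint in name.lower()
        for name in module_names
        for hint in MOE_MODULE_NAME_HINTS
    )


# Usage with plain data standing in for a model config and module names.
dense_names = ["model.layers.0.mlp.up_proj"]
moe_names = ["model.layers.0.mlp.experts.0.w1"]
print(looks_like_moe({"hidden_size": 4096}, dense_names))
print(looks_like_moe({"num_local_experts": 8}, dense_names))
print(looks_like_moe({"hidden_size": 4096}, moe_names))
```

With a real model, the inputs would presumably come from something like the model config converted to a dict and the names yielded by `named_modules()`. That wiring is left out here to keep the sketch free of framework dependencies.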
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| auto_round/utils/model.py | Adds is_moe_model() function to detect MoE models through config inspection and module name checking |
| auto_round/compressors/config.py | Changes disable_opt_rtn default from True to None to allow automatic optimization |
| auto_round/compressors/base.py | Implements MoE detection and automatic optimized RTN disabling for expert layers, with improved logging and user override support |
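The switch from a `True` default to `None` turns `disable_opt_rtn` into a tri-state option: `None` means the user made no choice and the tool decides. A minimal sketch of how such a value could be resolved per layer follows. The helper name `resolve_disable_opt_rtn`, its parameters, and the `"expert"` substring check are hypothetical. The sketch also assumes that non-expert layers keep optimized RTN enabled in the automatic case, which this thread does not confirm.

```python
from typing import Optional


def resolve_disable_opt_rtn(
    user_value: Optional[bool],
    layer_name: str,
    model_is_moe: bool,
) -> bool:
    """Return True when optimized RTN should be disabled for this layer.

    user_value is the tri-state option: True or False is an explicit user
    choice and always wins; None means "decide automatically".
    """
    if user_value is not None:
        return user_value
    # Automatic mode (assumption): disable only for expert layers of MoE models.
    return model_is_moe and "expert" in layer_name.lower()


# Automatic mode on a MoE model: expert layers skip optimized RTN.
print(resolve_disable_opt_rtn(None, "model.layers.0.mlp.experts.3.w1", True))
# Explicit user override (e.g. via an enable flag) keeps it on for experts.
print(resolve_disable_opt_rtn(False, "model.layers.0.mlp.experts.3.w1", True))
```

Keeping `None` distinct from `False` is what lets an explicit user setting be told apart from the default, so an override such as `--enable_opt_rtn` can take precedence over the automatic MoE rule.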
yiliu30 left a comment
Others LGTM.