GGUF format add support for MoE models with non-linear expert layers. #1244
Conversation
Signed-off-by: n1ck-guo <[email protected]>
Pull request overview
This PR enhances GGUF export functionality to better support Mixture-of-Experts (MoE) models with non-linear expert layers, while also refactoring and cleaning up quantization logic in the codebase.
Changes:
- Extended MoE model support by updating attribute checks to handle modules with "exps" in their name and 3D tensor shapes
- Refactored quantization logic by removing legacy code branches and improving the handling of tensor dimensions
- Updated tests to use tiny models instead of full-sized models for more efficient testing
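The attribute check described in the first bullet might look like the following minimal sketch; the function name and the tensor names in the comments are illustrative, not the actual identifiers in `convert.py`:

```python
def is_fused_expert_weight(name: str, shape: tuple) -> bool:
    """Heuristic for fused MoE expert weights: the module name contains
    "exps" and the weight tensor is 3D (n_experts, rows, cols).

    Illustrative only; the real check in convert.py may differ.
    """
    return "exps" in name and len(shape) == 3
```

Under this sketch, a plain 2D linear layer such as `blk.0.ffn_down.weight` would fall through to the existing path, while a fused expert tensor such as `blk.0.ffn_down_exps.weight` would take the MoE-aware branch.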
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| auto_round/export/export_to_gguf/convert.py | Refactored _quant_data method to improve MOE support and removed legacy quantization branches |
| auto_round/export/export_to_gguf/packing.py | Removed duplicate reshape operation and added blank line after decorator |
| test/test_cuda/export/test_gguf.py | Replaced full model tests with tiny model tests and adjusted expected file sizes |
| test/test_cpu/export/test_gguf_format.py | Migrated test to use tiny model and updated file size assertions |
| test/helpers.py | Enhanced save_tiny_model to support multimodal models and strip trailing slashes from model names |
| auto_round/main.py | Added commented-out MixedHelpFormatter class for potential future use |
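The adjusted file-size assertions mentioned for the test files could be expressed with a small tolerance-based helper along these lines; the helper name and tolerance value are assumptions for illustration, not the actual test code:

```python
import os


def assert_gguf_size(path: str, expected_bytes: int, rel_tol: float = 0.05) -> None:
    """Check an exported file's size against an expected value within a
    relative tolerance, so quantized exports that drift slightly still pass.

    Sketch only; the real tests may compare sizes differently.
    """
    actual = os.path.getsize(path)
    limit = rel_tol * expected_bytes
    assert abs(actual - expected_bytes) <= limit, (
        f"{path}: {actual} bytes, expected {expected_bytes} +/- {limit:.0f}"
    )
```

A tolerance-based comparison avoids brittle exact-byte assertions when block-quantized tensor sizes shift between releases.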
This pull request refactors the `_quant_data` method in `auto_round/export/export_to_gguf/convert.py` to improve support for MoE models, streamline attribute handling, and clean up the quantization logic. The changes mainly focus on making the code more robust for different model architectures and removing legacy or redundant quantization branches.

**Support for MoE models and quantization logic cleanup:**
- Updated attribute checks to handle modules with "exps" in their name and 3D tensor shapes, making the code more flexible for non-linear expert layers.

**General code cleanup:**