
Conversation

@n1ck-guo n1ck-guo commented Jan 8, 2026

This pull request refactors the _quant_data method in auto_round/export/export_to_gguf/convert.py to improve support for MoE models, streamline attribute handling, and clean up the quantization logic. The changes focus on making the code more robust across different model architectures and on removing legacy or redundant quantization branches.

Support for MoE models and quantization logic cleanup:

  • Improved handling of MoE models by updating the attribute check to cover modules with "exps" in their name and 3D tensor shapes, making the code more flexible for non-linear expert layers.
  • Refactored the quantization logic to remove legacy branches and commented-out code, simplifying how the quantization type is selected; known FP16 issues are documented, and the FP16 path remains disabled.
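The expert-layer check described above can be sketched as follows. This is a hypothetical minimal illustration of the idea, not the actual code from convert.py; the function name and the example tensor names are assumptions. The key observation is that MoE expert weights are typically stacked into a single 3D tensor of shape (n_experts, out_features, in_features), whereas ordinary linear layers are 2D.

```python
import numpy as np

def is_moe_expert_tensor(name, weight):
    # Hypothetical check mirroring the PR's approach: a tensor is treated
    # as a stacked MoE expert weight when its name contains "exps" and it
    # is 3D, i.e. (n_experts, out_features, in_features).
    return "exps" in name and weight.ndim == 3

# A stacked expert weight qualifies; a plain 2D linear weight does not.
expert_w = np.zeros((8, 64, 32))   # 8 experts, each a 64x32 projection
linear_w = np.zeros((64, 32))
print(is_moe_expert_tensor("blk.0.ffn_down_exps.weight", expert_w))  # True
print(is_moe_expert_tensor("blk.0.attn_q.weight", linear_w))         # False
```

An exporter that only handles 2D linear weights can use such a check to branch into per-expert handling (for example, quantizing each 2D slice along the first axis) instead of rejecting the tensor outright.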

General code cleanup:

  • Removed an unnecessary suffix check from the beginning of the function, streamlining the code for extracting layer names.

@n1ck-guo n1ck-guo requested review from wenhuach21 and xin3he January 8, 2026 06:54
@n1ck-guo n1ck-guo changed the title from "add support for moe model with non-linear exports layer for gguf" to "GGUF format add support for MoE models with non-linear expert layers." Jan 8, 2026
Signed-off-by: n1ck-guo <[email protected]>
Copilot AI review requested due to automatic review settings January 12, 2026 00:16
Signed-off-by: n1ck-guo <[email protected]>
Copilot AI left a comment


Pull request overview

This PR enhances GGUF export functionality to better support Mixture-of-Experts (MoE) models with non-linear expert layers, while also refactoring and cleaning up quantization logic in the codebase.

Changes:

  • Extended MoE model support by updating attribute checks to handle modules with "exps" in their name and 3D tensor shapes
  • Refactored quantization logic by removing legacy code branches and improving the handling of tensor dimensions
  • Updated tests to use tiny models instead of full-sized models for more efficient testing

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| auto_round/export/export_to_gguf/convert.py | Refactored the _quant_data method to improve MoE support and removed legacy quantization branches |
| auto_round/export/export_to_gguf/packing.py | Removed a duplicate reshape operation and added a blank line after a decorator |
| test/test_cuda/export/test_gguf.py | Replaced full-model tests with tiny-model tests and adjusted expected file sizes |
| test/test_cpu/export/test_gguf_format.py | Migrated the test to a tiny model and updated file-size assertions |
| test/helpers.py | Enhanced save_tiny_model to support multimodal models and strip trailing slashes from model names |
| auto_round/main.py | Added a commented-out MixedHelpFormatter class for potential future use |


Signed-off-by: n1ck-guo <[email protected]>
@n1ck-guo n1ck-guo merged commit c7d6aee into main Jan 12, 2026
28 checks passed
@n1ck-guo n1ck-guo deleted the hengguo/update_gguf_0108 branch January 12, 2026 06:15

4 participants