GGUF format add support for MoE models with non-linear expert layers. #1244
Conversation
Signed-off-by: n1ck-guo <[email protected]>
Pull request overview
This PR enhances GGUF export functionality to better support Mixture-of-Experts (MoE) models with non-linear expert layers, while also refactoring and cleaning up quantization logic in the codebase.
Changes:
- Extended MoE model support by updating attribute checks to handle modules with "exps" in their name and 3D tensor shapes
- Refactored quantization logic by removing legacy code branches and improving the handling of tensor dimensions
- Updated tests to use tiny models instead of full-sized models for more efficient testing
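The attribute check described in the first bullet might look like the following minimal sketch; the function name and the tensor names in the comments are illustrative, not the actual identifiers in `convert.py`:

```python
def is_fused_expert_weight(name: str, shape: tuple) -> bool:
    """Heuristic for fused MoE expert weights: the module name contains
    "exps" and the weight tensor is 3D (n_experts, rows, cols).

    Illustrative only; the real check in convert.py may differ.
    """
    return "exps" in name and len(shape) == 3
```

Under this sketch, a plain 2D linear layer such as `blk.0.ffn_down.weight` would fall through to the existing path, while a fused expert tensor such as `blk.0.ffn_down_exps.weight` would take the MoE-aware branch.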
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| auto_round/export/export_to_gguf/convert.py | Refactored _quant_data method to improve MOE support and removed legacy quantization branches |
| auto_round/export/export_to_gguf/packing.py | Removed duplicate reshape operation and added blank line after decorator |
| test/test_cuda/export/test_gguf.py | Replaced full model tests with tiny model tests and adjusted expected file sizes |
| test/test_cpu/export/test_gguf_format.py | Migrated test to use tiny model and updated file size assertions |
| test/helpers.py | Enhanced save_tiny_model to support multimodal models and strip trailing slashes from model names |
| auto_round/main.py | Added commented-out MixedHelpFormatter class for potential future use |
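The adjusted file-size assertions mentioned for the test files could be expressed with a small tolerance-based helper along these lines; the helper name and tolerance value are assumptions for illustration, not the actual test code:

```python
import os


def assert_gguf_size(path: str, expected_bytes: int, rel_tol: float = 0.05) -> None:
    """Check an exported file's size against an expected value within a
    relative tolerance, so quantized exports that drift slightly still pass.

    Sketch only; the real tests may compare sizes differently.
    """
    actual = os.path.getsize(path)
    limit = rel_tol * expected_bytes
    assert abs(actual - expected_bytes) <= limit, (
        f"{path}: {actual} bytes, expected {expected_bytes} +/- {limit:.0f}"
    )
```

A tolerance-based comparison avoids brittle exact-byte assertions when block-quantized tensor sizes shift between releases.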
This pull request refactors the `_quant_data` method in `auto_round/export/export_to_gguf/convert.py` to improve support for MoE models, streamline attribute handling, and clean up the quantization logic. The changes mainly focus on making the code more robust for different model architectures and removing legacy or redundant quantization branches.

**Support for MoE models and quantization logic cleanup:**
- Updated attribute checks to handle modules with "exps" in their name and 3D tensor shapes, making the code more flexible for non-linear expert layers.

**General code cleanup:**