
Conversation

@SwekeR-463

Fixes #1194

  • Aligned the act_max of the first linear projections (gate and up) across all experts in MoE blocks so that FP8 dispatch can use a single shared input_scale (see the sketch below).
  • Added a unit test with DeepSeek-V2-Lite-Chat.
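
For readers new to the issue, here is a minimal sketch of the alignment step, assuming a DeepSeek-style MoE block where each expert exposes gate_proj/up_proj modules carrying a recorded act_max tensor. The function name and attribute layout are illustrative, not the PR's actual code:

```python
import torch

def unify_first_proj_act_max(experts, proj_names=("gate_proj", "up_proj")):
    """Give every expert's gate/up projection the same act_max so FP8 dispatch
    can derive a single shared input_scale (illustrative sketch only)."""
    # Take the largest recorded activation max across all experts' first projections.
    shared = max(float(getattr(expert, name).act_max.max())
                 for expert in experts for name in proj_names)
    # Write the shared value back so every expert yields an identical input_scale.
    for expert in experts:
        for name in proj_names:
            proj = getattr(expert, name)
            proj.act_max = torch.full_like(proj.act_max, shared)
    return shared
```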


@yiliu30 left a comment


Thanks for the PR!
Overall LGTM. Left a few comments.


yiliu30 commented Jan 5, 2026

@chensuyue @XuehaoSun Looks like the XPU test failed, and it doesn’t seem related to this PR. Could you take a look?


@WeiweiZhang1 left a comment


LGTM


@yiliu30 left a comment


LGTM, thanks!


xin3he commented Jan 5, 2026

@SwekeR-463 @yiliu30 I think there is a missing change here: set_amax_for_all_moe_layers is not applied for fp8 during tuning.

if is_nv_fp(self.act_data_type) or is_static_wfp8afp8(self):

if is_nv_fp(self.act_data_type):
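
For context, a hedged sketch of the gap being pointed at, assuming the first line above is the broader check used elsewhere and the second is the tuning-path check that currently skips FP8; only the predicate and helper names come from the quoted lines, while the wrapper and call arguments are made up for illustration:

```python
def maybe_unify_moe_amax(quantizer, model):
    # Current tuning path (second quoted line): only nv_fp activations trigger
    # set_amax_for_all_moe_layers, so static W8A8-FP8 models are skipped.
    # Suggested condition (first quoted line): include static wfp8afp8 as well.
    if is_nv_fp(quantizer.act_data_type) or is_static_wfp8afp8(quantizer):
        set_amax_for_all_moe_layers(model)
```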

@xin3he
Copy link
Contributor

xin3he commented Jan 5, 2026

BTW, if my understanding is correct that vLLM requires AR_ENABLE_UNIFY_MOE_INPUT_SCALE, I think the default value should be True since vLLM is our main target.

@yiliu30
Copy link
Contributor

yiliu30 commented Jan 5, 2026

> @SwekeR-463 @yiliu30 I think there is a missing change here: set_amax_for_all_moe_layers is not applied for fp8 during tuning.
>
> if is_nv_fp(self.act_data_type) or is_static_wfp8afp8(self):
>
> if is_nv_fp(self.act_data_type):

Hi @xin3he, this seems to be a general gap in FP8_STATIC rather than something related to this enhancement. Since this PR's focus is the RTN case, it's fine to ignore it here. Please feel free to create another PR to fix that part.

@yiliu30
Copy link
Contributor

yiliu30 commented Jan 5, 2026

> BTW, if my understanding is correct that vLLM requires AR_ENABLE_UNIFY_MOE_INPUT_SCALE, I think the default value should be True since vLLM is our main target.

FP8 dispatch is primarily for extreme inference speed, and it's disabled by default in vllm-gaudi. Sharing input scales across all experts may also degrade accuracy, so I prefer to keep this option disabled by default.
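
For anyone who does want FP8 dispatch, a hedged example of opting in via the flag discussed above; exactly where auto-round reads AR_ENABLE_UNIFY_MOE_INPUT_SCALE is not shown in this thread, so treat this as assumed usage rather than documented behavior:

```python
import os

# Opt in explicitly before quantization; the flag stays off by default because
# sharing one input scale across all experts can cost accuracy (see the reply above).
os.environ["AR_ENABLE_UNIFY_MOE_INPUT_SCALE"] = "1"
```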


@xin3he left a comment


Thanks for your reply, LGTM since it's for RTN only.

@yiliu30 added the ready label (only add when the PR is ready to merge) on Jan 5, 2026

yiliu30 commented Jan 5, 2026

Hi @XuehaoSun, looks like the CI was blocked by the CodeQL check stuck at "Expected". Could you please take a look, thx!

@wenhuach21 merged commit 41a8377 into intel:main on Jan 6, 2026
25 checks passed


Linked issue (may be closed by this PR): Align the first input scale of MoE experts for FP8 dispatch