align input scale for moe experts for fp8 #1216
Conversation
…new utility and tests.
for more information, see https://pre-commit.ci
Thanks for the PR!
Overall LGTM. Left a few comments.
@chensuyue @XuehaoSun Looks like the XPU test failed, and it doesn’t seem related to this PR. Could you take a look?
WeiweiZhang1 left a comment
LGTM
yiliu30 left a comment
LGTM, thanks!
@SwekeR-463 @yiliu30 I think there is a missing change here:
auto-round/auto_round/compressors/base.py Line 1326 in 46aec78
auto-round/auto_round/compressors/base.py Line 2763 in 46aec78
BTW, if my understanding is correct that vLLM requires
Hi @xin3he, this seems to be a general gap in
FP8 dispatch is primarily for extreme inference speed, and it’s disabled by default in vllm-gaudi.
xin3he left a comment
Thanks for your reply. LGTM since it's for RTN only.
Hi @XuehaoSun, looks like the CI was blocked by the CodeQL expected test. Could you please take a look, thanks!
Fixes #1194
DeepSeek-V2-Lite-Chat.
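As a rough illustration of what the PR title describes (this is a hypothetical sketch, not the PR's actual code; `align_moe_input_scales` and `quantize_fp8` are made-up names): when MoE experts are quantized to FP8 independently, each expert can end up with its own input activation scale, while some inference backends expect a single shared input scale per MoE layer. A common way to align them is to take the maximum scale across all experts:

```python
# Hypothetical sketch, assuming per-expert input scales are plain floats.
# Not the PR's implementation; function names are illustrative only.

FP8_E4M3_MAX = 448.0  # max representable magnitude of float8_e4m3fn


def align_moe_input_scales(expert_scales):
    """Give every expert the same (maximum) input scale.

    Using the max is conservative: it guarantees no expert's inputs
    overflow the FP8 range after alignment.
    """
    shared = max(expert_scales)
    return [shared] * len(expert_scales)


def quantize_fp8(x, scale):
    """Simulated FP8 quantization: divide by scale, clamp, round."""
    q = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x / scale))
    return round(q)


scales = [0.01, 0.02, 0.015]
aligned = align_moe_input_scales(scales)
print(aligned)  # [0.02, 0.02, 0.02]
```

The trade-off of picking the max is a small precision loss for experts whose activations were narrower, in exchange for one scale the fused MoE kernel can use for all experts.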