You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
AWQ 代码深入剖析 - Zhang
从事 LLM 推理部署、视觉算法开发、模型压缩部署以及算法SDK开发工作,终身学习践行者。LLM_Compressionawq 量化模型推理的实现是通过下述步骤(模块):1, 基于校准集得到激活再根据量化算法计算量化缩放因子 s;2, 裁剪线性层权重的最小、最大值,推测了是为了抑制权重的异常值(smoothquant 没有这步);3, 在前面得到权重缩放因子 s 和裁剪最大值的基础上,将浮点模型权重转换为 int4 量化模型权重;4. 自定义 int4 矩阵乘法 kernel,并替换掉原来的浮点线性层,得到量化模型,再执行真正的量化模型推理(forward)。
https://www.armcvai.cn/2024-11-03/awq-code.html
Beta Was this translation helpful? Give feedback.
All reactions