Implementing a per-channel quantized int8 conv with cutlass #3

zhexinli · 2024-03-29T07:51:07Z

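Hello, expert! I've recently been writing an int8-quantized conv based on your cutlass code. In your code, the int8 conv and the dequantize (DQ) step are two separate kernels. I'd like to fuse the DQ into the conv so that everything runs as a single kernel and saves memory traffic. My plan was to do this through the EpilogueOp of DefaultConv2dFprop: set alpha to input_scale * per_channel_weight_scale, so the computation becomes alpha * conv(qin, qweight). However, alpha only accepts a single float, which means the weights can only be quantized per-tensor, and that hurts accuracy considerably. Is this also why you chose to dequantize manually in a separate step?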
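To make the per-channel requirement concrete, here is a minimal plain-Python reference sketch (not CUTLASS code) of what a fused dequantizing epilogue has to compute. The int8 conv accumulates in int32, and each output channel `k` must then be scaled by `input_scale * weight_scales[k]`. When all `weight_scales[k]` are equal this collapses to the single scalar `alpha` described above, which is why a scalar alpha is only enough for per-tensor weight quantization. The function names, the NHWC/KRSC layouts, batch size 1, stride 1, no padding, and symmetric quantization (zero point 0) are all assumptions for illustration.

```python
"""Hypothetical reference: int8 conv2d followed by a per-output-channel dequant epilogue.

Assumptions: batch 1, input q_in[H][W][C], weights q_w[K][R][S][C],
stride 1, no padding, symmetric quantization (zero point 0).
"""


def int8_conv2d_int32(q_in, q_w):
    """Direct convolution with int32-style accumulation -> acc[P][Q][K]."""
    H, W, C = len(q_in), len(q_in[0]), len(q_in[0][0])
    K, R, S = len(q_w), len(q_w[0]), len(q_w[0][0])
    P, Q = H - R + 1, W - S + 1
    acc = [[[0] * K for _ in range(Q)] for _ in range(P)]
    for p in range(P):
        for q in range(Q):
            for k in range(K):
                total = 0
                for r in range(R):
                    for s in range(S):
                        for c in range(C):
                            total += q_in[p + r][q + s][c] * q_w[k][r][s][c]
                acc[p][q][k] = total
    return acc


def dequant_epilogue_per_channel(acc, input_scale, weight_scales):
    """out[p][q][k] = input_scale * weight_scales[k] * acc[p][q][k].

    Per-tensor quantization is the special case where every weight_scales[k]
    is the same, i.e. a single scalar alpha = input_scale * weight_scale.
    """
    return [[[input_scale * weight_scales[k] * v for k, v in enumerate(row)]
             for row in plane] for plane in acc]


def fused_conv_dequant(q_in, q_w, input_scale, weight_scales):
    # In a real fused kernel the scaling is applied to the int32 accumulators
    # before the store, so the int32 tensor never round-trips through memory.
    acc = int8_conv2d_int32(q_in, q_w)
    return dequant_epilogue_per_channel(acc, input_scale, weight_scales)


if __name__ == "__main__":
    q_in = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # H=2, W=2, C=2
    q_w = [[[[1, 0]]], [[[0, -1]]]]              # K=2, R=1, S=1, C=2
    print(fused_conv_dequant(q_in, q_w, 0.5, [0.1, 0.2]))
```

The per-channel vector is the only extra input the epilogue needs beyond a scalar alpha: one float per output channel, indexed by the channel coordinate of each accumulator.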

ThisisBillhe · 2024-04-01T06:17:40Z

Hi, I wouldn't call myself an expert. Shall we add each other on WeChat and continue the discussion there? wechat: hyfll2
