Nice work in this paper! I want to ask:
The paper mentions that all linear ops are quantized to INT4. What about the matrix-multiply ops in the attention module? Are the activation gradients in those matmul ops kept in float or quantized to INT4?
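For reference, here is a minimal sketch (plain PyTorch, not taken from this repo, with hypothetical tensor names) of the two matmuls I mean: the score and context products inside attention, which are batched matmuls rather than nn.Linear layers.

```python
import torch

# Hypothetical shapes: batch B, heads H, tokens T, head dim D.
B, H, T, D = 2, 4, 16, 64
q = torch.randn(B, H, T, D)
k = torch.randn(B, H, T, D)
v = torch.randn(B, H, T, D)

# These two batched matmuls are not nn.Linear ops, so the question is whether
# their forward/backward operands are quantized to INT4 or kept in float.
scores = q @ k.transpose(-2, -1) / D ** 0.5   # (B, H, T, T)
attn = scores.softmax(dim=-1)
out = attn @ v                                # (B, H, T, D)
```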
brisker changed the title from "the paper mentioned that all linear ops are quantized into int4, what about mat-multiply ops in the attention module? Float or int4?" to "the paper mentioned that all linear ops are quantized into int4, what about gradients in mat-multiply ops in the attention module? Float or int4?" on Aug 11, 2023
@xijiu9
Besides, in the grad_weight calculation, the code here does not appear to be a pure INT4 matmul, since sample_x3 is divided by norm_weight_loop after it has been quantized to INT4 here. The code is a little confusing to me: norm_weight_loop, which has shape N x 1, is involved in the backprop. Is your INT4 matmul per-channel (batch-channel) quantized? Even if so, this cannot be done efficiently in hardware (it would lose the acceleration benefit of quantization), since the Cout x N (activation gradient) by N x Cin (input activation) matmul cannot be per-channel quantized along the N (batch) dimension.
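To make the hardware concern concrete, here is a minimal sketch (plain PyTorch, with hypothetical names grad_y, x, and scale_n standing in for the activation gradient, the input activation, and a per-batch-row scale playing the role of norm_weight_loop). With a per-tensor scale, the rescale can be applied once after an integer GEMM; with a scale on the contracted N dimension, the float rescale has to happen inside the accumulation, which is what breaks a plain INT4 GEMM.

```python
import torch

# Hypothetical shapes: N = batch tokens, Cin/Cout = input/output channels.
N, Cin, Cout = 8, 16, 32
grad_y = torch.randn(Cout, N)      # activation gradient, Cout x N
x = torch.randn(N, Cin)            # input activation,    N x Cin
scale_n = torch.rand(N, 1) + 0.1   # per-batch-row scale (role of norm_weight_loop)

# Per-tensor scale: a single scalar factors out of the matmul, so the
# accumulation itself can stay in low precision and be rescaled once at the end.
s = x.abs().max() / 7                          # INT4 range is roughly [-8, 7]
x_q = torch.clamp((x / s).round(), -8, 7)
grad_w = (grad_y @ x_q) * s                    # one float rescale after the GEMM

# Per-N scale: the scale lives on the contracted dimension, so every term
# grad_y[:, n] * x_q_n[n, :] carries its own scale. The float multiply must
# happen before (or inside) the accumulation, not once after it.
x_q_n = torch.clamp((x / scale_n).round(), -8, 7)
grad_w_n = grad_y @ (scale_n * x_q_n)          # rescale inside the contraction

print(grad_w.shape, grad_w_n.shape)            # both Cout x Cin
```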