Quantizing BERT: how can I keep some layers at full precision? #14162
shiqingzhangCSU asked this question in General
In the paper "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", the authors point out that using W8A32, or keeping certain layers (such as the residual connections) at full precision, can reduce accuracy loss. How can I use the quantization tool to keep some layers in FP32 while quantizing the rest of the model?
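For reference, here is a minimal sketch of one way this is commonly done outside any specific tool, using PyTorch eager-mode post-training static quantization: setting a submodule's `qconfig` to `None` excludes it from quantization, so it keeps running in FP32. The toy model and layer names are illustrative, not taken from BERT.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Toy two-layer model standing in for a real network."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc1 = nn.Linear(16, 16)   # will be quantized to INT8
        self.fc2 = nn.Linear(16, 16)   # will be kept in FP32
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.fc1(x)
        x = self.dequant(x)  # back to FP32 before the excluded layer
        return self.fc2(x)

model = TinyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
model.fc2.qconfig = None  # exclude this layer: it stays full precision

torch.quantization.prepare(model, inplace=True)
model(torch.randn(4, 16))  # calibration pass with representative data
torch.quantization.convert(model, inplace=True)
print(model)  # fc1 becomes a quantized Linear, fc2 remains nn.Linear
```

Is there an equivalent mechanism in the quantization tool for selecting which layers to leave unquantized?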