Quantizing BERT: how can I keep some layers at full precision? #14162
shiqingzhangCSU asked this question in General
In the paper "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", the authors point out that using W8A32, or keeping certain layers (such as the residual connections) at full precision, can reduce accuracy loss. How can I use the quantization tool to keep some layers in FP32 while quantizing the rest of the model?
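For reference, here is a minimal sketch of one way this is commonly done outside any specific tool, using PyTorch eager-mode post-training static quantization: setting a submodule's `qconfig` to `None` excludes it from quantization, so it keeps running in FP32. The toy model and layer names are illustrative, not taken from BERT.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    """Toy two-layer model standing in for a real network."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc1 = nn.Linear(16, 16)   # will be quantized to INT8
        self.fc2 = nn.Linear(16, 16)   # will be kept in FP32
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.fc1(x)
        x = self.dequant(x)  # back to FP32 before the excluded layer
        return self.fc2(x)

model = TinyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
model.fc2.qconfig = None  # exclude this layer: it stays full precision

torch.quantization.prepare(model, inplace=True)
model(torch.randn(4, 16))  # calibration pass with representative data
torch.quantization.convert(model, inplace=True)
print(model)  # fc1 becomes a quantized Linear, fc2 remains nn.Linear
```

Is there an equivalent mechanism in the quantization tool for selecting which layers to leave unquantized?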