Quantization of pretrained models using onnxruntime #6387
Unanswered
HemaSowjanyaMamidi asked this question in Other Q&A
Replies: 1 comment 11 replies
@HemaSowjanyaMamidi, the ORT TensorRT Execution Provider (EP) can run statically quantized models, but the official onnxruntime-gpu package does not. Please refer to this script for an end-to-end example with the TRT EP: https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/quantization/E2E_example_model/e2e_tensorrt_resnet_example.py. Note that quantization support in the TRT EP is still in progress, so you may not see a performance gain from it yet.
Hi All,
I took a pretrained Keras model, converted it to an ONNX model, and then tried the following approaches to produce quantized versions of it. Two questions:
Does onnxruntime-gpu support quantized models?
Are all the operators being quantized properly?