Does the cuda execution provider support dynamically-quantized model? #10066
Unanswered · brevity2021 asked this question in Other Q&A
Replies: 0 comments
Hi,
I built a quantized ONNX model using onnxruntime.quantization.quantize_dynamic and ran it in a runtime session with the CUDAExecutionProvider enabled on a GPU. It runs even slower than using only the CPUExecutionProvider. But with the unquantized model, enabling the CUDAExecutionProvider is significantly faster than the CPUExecutionProvider.
Is this because the current CUDAExecutionProvider does not support running quantized models? Is the TensorRT execution provider currently the only way to get a quantization benefit on GPU?
Thanks a lot.