Does the cuda execution provider support dynamically-quantized model? #10066
Unanswered · brevity2021 asked this question in Other Q&A
Replies: 0 comments
Hi,
I built a quantized ONNX model using onnxruntime.quantization.quantize_dynamic and ran it in a runtime session with the CUDAExecutionProvider enabled on a GPU. It runs even slower than using only the CPUExecutionProvider. But with the unquantized model, enabling the CUDAExecutionProvider is significantly faster than the CPUExecutionProvider.
Is this because the current CUDAExecutionProvider does not support running quantized models? Is the TensorRT execution provider currently the only way to get a quantization benefit on GPU?
Thanks a lot.