What happens on OpenVino int8 inference? #9013
-
I have quantized an ONNX model to int8, and I am attempting to run it using the OpenVINO provider. It does run, but very slowly (compared to FP32, at least). What is going on? Given that ONNX doesn't seem to offer quantized model export targeting OpenVINO, I'd think it can't run quantized models through OpenVINO either. So is the model just being converted back to FP32 and then handed to OpenVINO? I've tried searching the source code, but I couldn't quite find the relevant bit. A reference to the code in the answer would be great, but it's not necessary :)
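For reference, this is roughly the setup I mean, a minimal sketch using onnxruntime's dynamic quantization API and the OpenVINO execution provider (model file names and the dummy input are placeholders):

```python
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the FP32 model's weights to int8 (dynamic quantization).
quantize_dynamic("model_fp32.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

# Run the quantized model with the OpenVINO execution provider,
# letting unsupported nodes fall back to the default CPU provider.
sess = ort.InferenceSession(
    "model_int8.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)

# Feed a dummy input matching the model's first input (placeholder shape handling).
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
outputs = sess.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
print(sess.get_providers())  # providers enabled for this session
```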
Replies: 1 comment
-
@marcinkeviciusp, OpenVINO quantization support is in progress.