What happens on OpenVino int8 inference? #9013
-
I have quantized an ONNX model to int8, and I am attempting to run it using the OpenVINO provider. It does run, but very slowly (compared to FP32, at least). What is going on? Given that ONNX doesn't seem to offer quantized model export targeting OpenVINO, I'd think it can't run quantized models through OpenVINO either. So is the model just being converted back to FP32 and then handed to OpenVINO? I've tried searching the source code, but I couldn't quite find the relevant bit. A reference to the code in the answer would be great, but it's not necessary :)
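For reference, this is roughly the setup I mean, a minimal sketch using onnxruntime's dynamic quantization API and the OpenVINO execution provider (model file names and the dummy input are placeholders):

```python
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the FP32 model's weights to int8 (dynamic quantization).
quantize_dynamic("model_fp32.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

# Run the quantized model with the OpenVINO execution provider,
# letting unsupported nodes fall back to the default CPU provider.
sess = ort.InferenceSession(
    "model_int8.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)

# Feed a dummy input matching the model's first input (placeholder shape handling).
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
outputs = sess.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
print(sess.get_providers())  # providers enabled for this session
```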
Replies: 1 comment
-
@marcinkeviciusp, OpenVINO quantization support is in progress.