
Do not drop QDQ around linear Resize #22089

Open · wants to merge 1 commit into base: main
Conversation

@mgehre-amd commented Sep 13, 2024

It's not numerically equivalent to drop the Q/DQ nodes around a Resize when the Resize uses linear interpolation.
This PR only drops Q/DQ around Resize when it uses nearest interpolation.

See #21319 for details.

fixes #21319
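
To make the non-equivalence concrete, here is a small numeric sketch (illustrative values only, not code from this PR; the integer path is modeled as round-to-nearest, while a real integer Resize kernel may round differently). With matching scale/zero-point on both sides, nearest interpolation only copies existing quantized values, so dropping Q/DQ is exact, but linear interpolation creates new values whose rounding depends on which domain it happens in:

```python
# Sketch: linear Resize can give different results depending on whether
# the Q/DQ pair is kept (spec path) or dropped (integer path).
import numpy as np

scale, zp = 0.1, 0        # assumed matching Q/DQ parameters
a_q, b_q = 10, 20         # two neighboring quantized samples
w = 0.75                  # interpolation weight on b_q

# Spec path: DequantizeLinear -> interpolate in float -> QuantizeLinear.
a_f, b_f = (a_q - zp) * scale, (b_q - zp) * scale          # 1.0, 2.0
y_spec = np.round(((1 - w) * a_f + w * b_f) / scale) + zp  # 1.75/0.1 -> 17

# Dropped path: interpolate the raw integer values directly.
y_drop = np.round((1 - w) * a_q + w * b_q)                 # 17.5 -> 18

print(int(y_spec), int(y_drop))  # 17 vs 18: off by one quantization step
```

With nearest interpolation no such rounding step exists, which is why the optimization stays valid in that case.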

@fajin-corp (Contributor) left a comment

:shipit:

@skottmckay (Contributor) commented Sep 16, 2024

Can you quantify the effect of dropping the QDQ? How significant is the effect on model accuracy for real-world input?

Wondering if this needs to be configurable so users can prioritize performance if the accuracy loss might be acceptable.

@fajin-corp (Contributor) commented

/azp run Big Models Expected,Linux Android Emulator QNN CI Pipeline Expected,Linux CPU CI Pipeline Expected,Linux CPU Minimal Build E2E CI Pipeline Expected,Linux GPU CI Pipeline Expected,Linux GPU TensorRT CI Pipeline Expected,Linux OpenVINO CI Pipeline Expected,Linux QNN CI Pipeline Expected,MacOS CI Pipeline Expected,ONNX Runtime Web CI Pipeline Expected

No pipelines are associated with this pull request.

@skottmckay (Contributor) commented

I'm wondering if this is the right place to make changes. If the model has DQ/Q around the operator, isn't it saying that that group of nodes can be executed using quantized data if the EP supports it?

If you feel like that isn't a valid option, shouldn't the fix be to not produce a model with a DQ/Q around this sort of Resize?

@mgehre-amd (Author) commented

> I'm wondering if this is the right place to make changes. If the model has DQ/Q around the operator, isn't it saying that that group of nodes can be executed using quantized data if the EP supports it?
>
> If you feel like that isn't a valid option, shouldn't the fix be to not produce a model with a DQ/Q around this sort of Resize?

I think the ONNX spec is pretty clear on the numerical computations that need to be done for DQ - Resize - Q, and I would expect that onnxruntime follows those. The intention of having a model with DQ/Q around Resize is that it follows whatever ONNX specifies it to mean.
I would be surprised if onnxruntime is allowed to compute a different result.
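
For reference, the per-element semantics the ONNX operator spec defines for the two wrapper ops, sketched in numpy (uint8 saturation bounds assumed):

```python
import numpy as np

def dequantize_linear(x_q, scale, zero_point):
    # DequantizeLinear: y = (x - x_zero_point) * x_scale
    return (x_q.astype(np.float32) - zero_point) * scale

def quantize_linear(x_f, scale, zero_point):
    # QuantizeLinear: y = saturate(round(x / y_scale) + y_zero_point),
    # with rounding half to even (np.rint).
    return np.clip(np.rint(x_f / scale) + zero_point, 0, 255).astype(np.uint8)
```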

@mgehre-amd (Author) commented

@microsoft-github-policy-service agree

@skottmckay (Contributor) commented

> I think the ONNX spec is pretty clear on the numerical computations that need to be done for DQ - Resize - Q, and I would expect that onnxruntime follows those. The intention of having a model with DQ/Q around Resize is that it follows whatever ONNX specifies it to mean.
> I would be surprised if onnxruntime is allowed to compute a different result.

I'm not sure it's that black and white.

As an example, a QDQ format model would generally look something like this:

[image: example of a QDQ format model]

Having DQ/Q around most if not all other operators in the model is the way in ONNX to allow executing the model using quantized data. The DQ and Q nodes provide the zero-point and scale info across those operators, and the runtime decides whether the operator can be executed using quantized data for better performance, at a potential accuracy cost.

cf. TF Lite, where zero point and scale are part of the operator specs, so the intention that 'this operator should always be executed with quantized/unquantized data' can be expressed more clearly.
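
For concreteness, the DQ -> Resize -> Q subgraph being discussed can be sketched with the onnx Python helpers roughly like this (shapes, opset details, and attribute values are illustrative assumptions, not the exact pattern the optimizer matches):

```python
from onnx import TensorProto, helper

dq = helper.make_node("DequantizeLinear", ["x_q", "scale", "zp"], ["x_f"])
resize = helper.make_node("Resize", ["x_f", "", "scales"], ["y_f"], mode="linear")
q = helper.make_node("QuantizeLinear", ["y_f", "scale", "zp"], ["y_q"])

graph = helper.make_graph(
    [dq, resize, q],
    "qdq_resize",
    inputs=[helper.make_tensor_value_info("x_q", TensorProto.UINT8, [1, 3, 8, 8])],
    outputs=[helper.make_tensor_value_info("y_q", TensorProto.UINT8, [1, 3, 16, 16])],
    initializer=[
        helper.make_tensor("scale", TensorProto.FLOAT, [], [0.1]),
        helper.make_tensor("zp", TensorProto.UINT8, [], [0]),
        helper.make_tensor("scales", TensorProto.FLOAT, [4], [1.0, 1.0, 2.0, 2.0]),
    ],
)
model = helper.make_model(graph)  # the optimizer treats this group as one unit
```

The question for the optimizer is whether this whole group may be replaced by a single Resize running directly on the quantized data.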

Is it not the case that many operators, if executed using quantized data (DQ -> op -> Q is handled by a quantized implementation of the op), will not yield exactly the same results as converting to float to perform the operation?

If so, I'm not sure there's a 'never drop QDQ nodes around operator X' rule, given that in general a QDQ format model is created for the performance gain of using quantized implementations of the operators where possible, at the cost of some accuracy.

> Can you quantify the effect of dropping the QDQ? How significant is the effect on model accuracy for real-world input?
> Wondering if this needs to be configurable so users can prioritize performance if the accuracy loss might be acceptable.

Back to this question: do you have a specific model and use case where a QDQ format model must be used and special-casing the Resize is required to avoid accuracy issues?

@mgehre-amd (Author) commented


Hey, I'm pretty sure the way DQ - Conv - Q is implemented is precise, and it's not done by dropping DQ and Q, but by having a quantized implementation of the Conv which accumulates the integer inputs into a larger accumulator and then shifts/rounds that using integer arithmetic.

The quantized linear Resize can also be implemented more efficiently than executing the DQ - Resize - Q steps one by one, but it would likewise require a quantized Resize that e.g. does the arithmetic on a larger integer type than the input, so there are enough bits for the "division" (which then becomes a shift, I guess) and the "rounding".
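
A hedged sketch of the kind of integer-only interpolation described above (not ONNX Runtime's actual kernel; assumes matching scale and zero-point on input and output so no rescaling step is needed): the interpolation weight becomes 8.8 fixed point, the accumulation happens in a wider integer, and the division turns into a rounding right-shift.

```python
def qlinear_interp(a_q: int, b_q: int, w_fixed: int) -> int:
    """Interpolate between two uint8 samples entirely in integer arithmetic.

    w_fixed is the weight on b_q in 8.8 fixed point (0..256). A 32-bit
    accumulator has plenty of headroom for uint8 inputs, and the +128
    before the shift implements round-to-nearest instead of truncation.
    """
    acc = a_q * (256 - w_fixed) + b_q * w_fixed
    return (acc + 128) >> 8

print(qlinear_interp(10, 20, 192))  # weight 0.75 -> 18
```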

@mgehre-amd (Author) commented

@fajin-corp, thanks for your approval! What is the policy for merging the PR? Is it ready?

@fajin-corp (Contributor) commented

@mgehre-amd please get a green light from skott before merging. The required CI pipelines need to run and pass before the PR can be merged.

@justinchuby justinchuby changed the title Do not drop QDQ around linear Resize (fixes #21319) Do not drop QDQ around linear Resize Oct 14, 2024
@skottmckay (Contributor) commented

@fajin-corp last I heard there was a suspicion that the Resize implementation had a bug and @yihonglyu was going to look into it. What was the result of that investigation?

@fajin-corp (Contributor) commented

@yihonglyu could you please provide updates to Scott? Thanks.

@skottmckay (Contributor) commented

@yihonglyu created this PR: #22476

Successfully merging this pull request may close these issues.

QDQ removal around Resize (mode=linear) causes wrong numeric values (#21319)