
Do not drop QDQ around linear Resize #22089

Open · wants to merge 1 commit into base: main
Conversation

@mgehre-amd commented Sep 13, 2024

It's not numerically equivalent to drop the Q/DQ nodes around a Resize when the Resize uses linear interpolation.
This PR only drops Q/DQ around Resize when it uses nearest interpolation.

See #21319 for details.

fixes #21319
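
To make the non-equivalence concrete, here is a small numeric sketch (illustrative values only, not code from this PR; the integer path is modeled as round-to-nearest, while a real integer Resize kernel may round differently). With matching scale/zero-point on both sides, nearest interpolation only copies existing quantized values, so dropping Q/DQ is exact, but linear interpolation creates new values whose rounding depends on which domain it happens in:

```python
# Sketch: linear Resize can give different results depending on whether
# the Q/DQ pair is kept (spec path) or dropped (integer path).
import numpy as np

scale, zp = 0.1, 0        # assumed matching Q/DQ parameters
a_q, b_q = 10, 20         # two neighboring quantized samples
w = 0.75                  # interpolation weight on b_q

# Spec path: DequantizeLinear -> interpolate in float -> QuantizeLinear.
a_f, b_f = (a_q - zp) * scale, (b_q - zp) * scale          # 1.0, 2.0
y_spec = np.round(((1 - w) * a_f + w * b_f) / scale) + zp  # 1.75/0.1 -> 17

# Dropped path: interpolate the raw integer values directly.
y_drop = np.round((1 - w) * a_q + w * b_q)                 # 17.5 -> 18

print(int(y_spec), int(y_drop))  # 17 vs 18: off by one quantization step
```

With nearest interpolation no such rounding step exists, which is why the optimization stays valid in that case.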

@fajin-corp (Contributor) left a comment

:shipit:

@skottmckay (Contributor) commented Sep 16, 2024

Can you quantify the effect of dropping the QDQ? How significant is the effect on model accuracy for real-world input?

Wondering if this needs to be configurable so users can prioritize performance if the accuracy loss might be acceptable.

@fajin-corp (Contributor) commented

/azp run Big Models Expected,Linux Android Emulator QNN CI Pipeline Expected,Linux CPU CI Pipeline Expected,Linux CPU Minimal Build E2E CI Pipeline Expected,Linux GPU CI Pipeline Expected,Linux GPU TensorRT CI Pipeline Expected,Linux OpenVINO CI Pipeline Expected,Linux QNN CI Pipeline Expected,MacOS CI Pipeline Expected,ONNX Runtime Web CI Pipeline Expected

No pipelines are associated with this pull request.

@skottmckay (Contributor) commented

I'm wondering if this is the right place to make changes. If the model has DQ/Q around the operator, isn't it saying that that group of nodes can be executed using quantized data if the EP supports it?

If you feel like that isn't a valid option, shouldn't the fix be to not produce a model with a DQ/Q around this sort of Resize?

@mgehre-amd (Author) commented

> I'm wondering if this is the right place to make changes. If the model has DQ/Q around the operator, isn't it saying that that group of nodes can be executed using quantized data if the EP supports it?
>
> If you feel like that isn't a valid option, shouldn't the fix be to not produce a model with a DQ/Q around this sort of Resize?

I think the ONNX spec is pretty clear on the numerical computations that need to be done for DQ - Resize - Q, and I would expect that onnxruntime follows those. The intention of having a model with DQ/Q around Resize is that it follows whatever ONNX specifies it to mean.
I would be surprised if onnxruntime is allowed to compute a different result.
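
For reference, the per-element semantics the ONNX operator spec defines for the two wrapper ops, sketched in numpy (uint8 saturation bounds assumed):

```python
import numpy as np

def dequantize_linear(x_q, scale, zero_point):
    # DequantizeLinear: y = (x - x_zero_point) * x_scale
    return (x_q.astype(np.float32) - zero_point) * scale

def quantize_linear(x_f, scale, zero_point):
    # QuantizeLinear: y = saturate(round(x / y_scale) + y_zero_point),
    # with rounding half to even (np.rint).
    return np.clip(np.rint(x_f / scale) + zero_point, 0, 255).astype(np.uint8)
```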

@mgehre-amd (Author) commented

@microsoft-github-policy-service agree

@skottmckay (Contributor) commented

> I think the ONNX spec is pretty clear on the numerical computations that need to be done for DQ - Resize - Q, and I would expect that onnxruntime follows those. The intention of having a model with DQ/Q around Resize is that it follows whatever ONNX specifies it to mean.
> I would be surprised if onnxruntime is allowed to compute a different result.

I'm not sure it's that black and white.

As an example, a QDQ format model would generally look something like this:

[image: example of a QDQ format model]

Having DQ/Q around most if not all other operators in the model is the way in ONNX to allow executing the model using quantized data. The DQ and Q nodes provide the zero-point and scale info across those operators, and the runtime decides whether the operator can be executed using quantized data for better performance, at a potential accuracy cost.

cf. TF Lite, where zero point and scale are part of the operator specs, so the intention that 'this operator should always be executed with quantized/unquantized data' can be expressed more clearly.
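
For concreteness, the DQ -> Resize -> Q subgraph being discussed can be sketched with the onnx Python helpers roughly like this (shapes, opset details, and attribute values are illustrative assumptions, not the exact pattern the optimizer matches):

```python
from onnx import TensorProto, helper

dq = helper.make_node("DequantizeLinear", ["x_q", "scale", "zp"], ["x_f"])
resize = helper.make_node("Resize", ["x_f", "", "scales"], ["y_f"], mode="linear")
q = helper.make_node("QuantizeLinear", ["y_f", "scale", "zp"], ["y_q"])

graph = helper.make_graph(
    [dq, resize, q],
    "qdq_resize",
    inputs=[helper.make_tensor_value_info("x_q", TensorProto.UINT8, [1, 3, 8, 8])],
    outputs=[helper.make_tensor_value_info("y_q", TensorProto.UINT8, [1, 3, 16, 16])],
    initializer=[
        helper.make_tensor("scale", TensorProto.FLOAT, [], [0.1]),
        helper.make_tensor("zp", TensorProto.UINT8, [], [0]),
        helper.make_tensor("scales", TensorProto.FLOAT, [4], [1.0, 1.0, 2.0, 2.0]),
    ],
)
model = helper.make_model(graph)  # the optimizer treats this group as one unit
```

The question for the optimizer is whether this whole group may be replaced by a single Resize running directly on the quantized data.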

Is it not the case that many operators, if executed using quantized data (DQ -> op -> Q is handled by a quantized implementation of the op), will not yield exactly the same results as converting to float to perform the operation?

If so, I'm not sure there's a 'never drop QDQ nodes around operator X' rule, given that in general a QDQ format model is created for the performance gain of using quantized implementations of the operators where possible, at the cost of some accuracy.

> Can you quantify the effect of dropping the QDQ? How significant is the effect on model accuracy for real-world input?
> Wondering if this needs to be configurable so users can prioritize performance if the accuracy loss might be acceptable.

Back to this question: do you have a specific model and use case where a QDQ format model must be used and special-casing the Resize is required to avoid accuracy issues?

@mgehre-amd (Author) commented


Hey, I'm pretty sure the way DQ - Conv - Q is implemented is precise, and it's not done by dropping DQ and Q, but by having a quantized implementation of the Conv which accumulates the integer inputs into a larger accumulator and then shifts/rounds that using integer arithmetic.

The quantized linear Resize can also be implemented more efficiently than executing the DQ - Resize - Q steps one by one, but it would likewise require a quantized Resize that e.g. does the arithmetic on a larger integer type than the input, so there are enough bits for the "division" (which then becomes a shift, I guess) and the "rounding".
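
A hedged sketch of the kind of integer-only interpolation described above (not ONNX Runtime's actual kernel; assumes matching scale and zero-point on input and output so no rescaling step is needed): the interpolation weight becomes 8.8 fixed point, the accumulation happens in a wider integer, and the division turns into a rounding right-shift.

```python
def qlinear_interp(a_q: int, b_q: int, w_fixed: int) -> int:
    """Interpolate between two uint8 samples entirely in integer arithmetic.

    w_fixed is the weight on b_q in 8.8 fixed point (0..256). A 32-bit
    accumulator has plenty of headroom for uint8 inputs, and the +128
    before the shift implements round-to-nearest instead of truncation.
    """
    acc = a_q * (256 - w_fixed) + b_q * w_fixed
    return (acc + 128) >> 8

print(qlinear_interp(10, 20, 192))  # weight 0.75 -> 18
```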

@mgehre-amd (Author) commented

@fajin-corp, thanks for your approval! What is the policy for merging the PR? Is it ready?

@fajin-corp (Contributor) commented

@mgehre-amd please get a green light from skott before merging. The required CI pipelines need to run and pass before the PR can be merged.

@justinchuby justinchuby changed the title Do not drop QDQ around linear Resize (fixes #21319) Do not drop QDQ around linear Resize Oct 14, 2024
@skottmckay (Contributor) commented

@fajin-corp last I heard there was a suspicion that the Resize implementation had a bug and @yihonglyu was going to look into it. What was the result of that investigation?

@fajin-corp (Contributor) commented

@yihonglyu could you please provide updates to Scott? Thanks.

@skottmckay (Contributor) commented

@yihonglyu created this PR: #22476

Successfully merging this pull request may close these issues.

QDQ removal around Resize (mode=linear) causes wrong numeric values (#21319)