Do not drop QDQ around linear Resize #22089
base: main
Conversation
See microsoft#21319 for details. This PR disables the QDQ resize matching to avoid numerical issues.
Can you quantify the effect of dropping the QDQ? How significant is the effect on model accuracy for real world input? Wondering if this needs to be configurable so users can prioritize performance if the accuracy loss might be acceptable.
/azp run Big Models Expected,Linux Android Emulator QNN CI Pipeline Expected,Linux CPU CI Pipeline Expected,Linux CPU Minimal Build E2E CI Pipeline Expected,Linux GPU CI Pipeline Expected,Linux GPU TensorRT CI Pipeline Expected,Linux OpenVINO CI Pipeline Expected,Linux QNN CI Pipeline Expected,MacOS CI Pipeline Expected,ONNX Runtime Web CI Pipeline Expected
No pipelines are associated with this pull request.
I'm wondering if this is the right place to make changes. If the model has DQ/Q around the operator, isn't it saying that that group of nodes can be executed using quantized data if the EP supports it? If you feel that isn't a valid option, shouldn't the fix be to not produce a model with a DQ/Q around this sort of Resize?
I think the ONNX spec is pretty clear on the numerical computations that need to be done for DQ - Resize - Q, and I would expect that onnxruntime follows those. The intention of having a model with DQ/Q around Resize is that it follows whatever ONNX specifies it to mean.
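For reference, a minimal sketch of those spec semantics, with made-up scale/zero_point values and the float Resize itself elided (the dequantize/quantize formulas follow the ONNX operator definitions):

```python
import numpy as np

# DQ -> Resize(linear) -> Q per the ONNX spec: dequantize to float,
# interpolate in float, then round/saturate back to int8 once at the end.
scale, zero_point = 0.02, 5  # illustrative values

def dequantize_linear(x_q):
    # DequantizeLinear: x = (x_q - zero_point) * scale
    return (x_q.astype(np.float32) - zero_point) * scale

def quantize_linear(x):
    # QuantizeLinear: saturate(round_half_to_even(x / scale) + zero_point)
    return np.clip(np.rint(x / scale) + zero_point, -128, 127).astype(np.int8)

x_q = np.array([5, 15, 25], dtype=np.int8)
x = dequantize_linear(x_q)      # [0.0, 0.2, 0.4]
# ... the float Resize with linear interpolation would run here on x ...
y_q = quantize_linear(x)        # identity round-trip: back to [5, 15, 25]
```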
@microsoft-github-policy-service agree |
I'm not sure it's that black and white. As an example, a QDQ format model would generally look something like this: [diagram of a QDQ format model]

Having DQ/Q around most if not all other operators in the model is the way in ONNX to allow executing the model using quantized data. The DQ and Q nodes provide the zero point and scale info across those operators, and the runtime decides whether the operator can be executed using quantized data for better performance, at a potential accuracy cost. Cf. TF Lite, where zero point and scale are part of the operator specs, so the intention that 'this operator should always be executed with quantized/unquantized data' can be expressed more clearly.

Is it not the case that many operators, if executed using quantized data (DQ -> op -> Q is handled by a quantized implementation of the op), will not yield exactly the same results as converting to float to perform the operation? If so, I'm not sure there's a 'never drop QDQ nodes around operator X' rule, given that in general a QDQ format model is created for the performance gain of using quantized implementations of the operators where possible, at the cost of some accuracy.
Back to this question, do you have a specific model and use case where a QDQ format model must be used, and special casing the Resize is required to avoid accuracy issues?
Hey, I'm pretty sure that the way DQ - Conv - Q is implemented is precise, and it's not done by dropping DQ and Q, but by having a quantized implementation of the Conv which accumulates the integer inputs into a larger accumulator and then shifts/rounds that using integer arithmetic. The quantized linear Resize can also be implemented more efficiently than executing DQ - Resize - Q step-by-step, but it would also require a quantized Resize that e.g. does the arithmetic on a larger integer type than the integer input, to have enough bits for the "division" (which then becomes a shift, I guess) and the "rounding".
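For illustration, a rough sketch of that wide-accumulator idea (the values are made up, and this is not the actual onnxruntime kernel, which uses fixed-point multipliers and shifts rather than a float rescale):

```python
import numpy as np

# Quantized dot-product / conv core: multiply int8 values exactly in an
# int32 accumulator, then rescale and round once at the very end.
x_q = np.array([3, -5, 7], dtype=np.int8)    # quantized input (zero point 0)
w_q = np.array([2, 4, -1], dtype=np.int8)    # quantized weights (zero point 0)
x_scale, w_scale, y_scale = 0.05, 0.1, 0.2

# Widen to int32 before multiplying so nothing overflows or is rounded early.
acc = np.sum(x_q.astype(np.int32) * w_q.astype(np.int32))   # exact: -21

# Single final rescale; production kernels express x_scale*w_scale/y_scale
# as an integer multiplier plus shift so this stays in integer arithmetic.
y_q = np.clip(np.rint(acc * (x_scale * w_scale / y_scale)),
              -128, 127).astype(np.int8)
```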
@fajin-corp, thanks for your approval! What is the policy for merging the PR? Is it ready?
@mgehre-amd please get a green light from skott before merging the PR. The required CI pipelines need to run and pass before the PR can be merged.
@fajin-corp last I heard there was a suspicion that the Resize implementation had a bug and @yihonglyu was going to look into it. What was the result of that investigation?
@yihonglyu created this PR: #22476
It's not numerically equivalent to drop Q/DQ nodes around a Resize when the Resize uses linear interpolation. This PR only drops QDQ around Resize when it uses the `nearest` interpolation mode. See #21319 for details.

Fixes #21319
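A toy numpy sketch of why the linear case differs, assuming a hypothetical integer Resize that rounds after each interpolation axis (the concrete kernel behavior is discussed in #21319; the quantization parameters and sample position here are made up):

```python
import numpy as np

# Spec path: dequantize, bilinearly interpolate in float, requantize once.
# Integer path: interpolate the int8 codes directly, rounding per axis.
scale, zp = 0.1, 0                       # illustrative quantization params
q = np.array([[0, 10],
              [0, 0]], dtype=np.int8)
tx, ty = 0.45, 0.2                       # sample position inside the 2x2 cell

def lerp(a, b, t):
    return a + t * (b - a)

# DQ -> Resize(linear) -> Q: a single rounding at the very end.
f = (q.astype(np.float64) - zp) * scale
top = lerp(f[0, 0], f[0, 1], tx)
bot = lerp(f[1, 0], f[1, 1], tx)
ref = np.rint(lerp(top, bot, ty) / scale) + zp              # rint(3.6) -> 4

# Dropped QDQ: each intermediate result is rounded back to an integer code.
top_q = np.rint(lerp(float(q[0, 0]), float(q[0, 1]), tx))   # rint(4.5) -> 4
bot_q = np.rint(lerp(float(q[1, 0]), float(q[1, 1]), tx))   # 0
out = np.rint(lerp(top_q, bot_q, ty))                       # rint(3.2) -> 3

print(ref, out)   # 4.0 vs 3.0 -- the extra intermediate rounding diverges
```

With `nearest` interpolation both paths just pick an existing quantized code, so dropping Q/DQ is harmless there; it's the fractional linear weights that introduce the extra rounding.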