Integrate Automated QDQ placement tool - Part 4 #704
base: main
@ChenhanYu @cjluo-nv could you help me review this PR? Thanks!
Force-pushed from 1698082 to 6d55fcb
**Q: Can I optimize for accuracy instead of latency?**

A: Currently, the autotuner optimizes for latency. For accuracy-aware optimization, you would need to implement a custom benchmarking function that evaluates accuracy on a validation dataset.
Can we provide an example of how users could do that? You may reuse or modify the evaluate.py script in modelopt/examples/onnx_ptq as a starting point.
Sorry, this line was generated by Cursor. This tool currently focuses only on performance; accuracy-aware optimization is not supported.
Is there any way for users to implement something that also directs the Q/DQ node placement according to an accuracy metric, or would that not be straightforward to do?
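For illustration only, here is a minimal sketch of what an accuracy-constrained selection step could look like. It is not part of this PR and does not use any ModelOpt or autotuner API: the `Candidate` record, the `pick_candidate` helper, and all numbers are hypothetical. It assumes each candidate Q/DQ placement has already been benchmarked for latency and evaluated for accuracy by some external script, and simply picks the fastest candidate whose accuracy drop stays within a tolerance.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one Q/DQ placement scheme (not a ModelOpt type)."""

    name: str
    latency_ms: float  # measured by whatever latency benchmark is in use
    accuracy: float    # e.g. top-1 accuracy on a validation set, in [0, 1]


def pick_candidate(
    candidates: Iterable[Candidate],
    baseline_accuracy: float,
    max_drop: float = 0.01,
) -> Optional[Candidate]:
    """Return the fastest candidate whose accuracy drop is within max_drop.

    Returns None when no candidate satisfies the accuracy constraint.
    """
    acceptable = [
        c for c in candidates if baseline_accuracy - c.accuracy <= max_drop
    ]
    if not acceptable:
        return None
    return min(acceptable, key=lambda c: c.latency_ms)


# Illustrative, made-up measurements; real values would come from benchmarks.
candidates = [
    Candidate("all_quantized", latency_ms=1.8, accuracy=0.741),
    Candidate("skip_first_conv", latency_ms=2.1, accuracy=0.758),
    Candidate("unquantized", latency_ms=3.0, accuracy=0.761),
]
best = pick_candidate(candidates, baseline_accuracy=0.761, max_drop=0.005)
```

With these made-up numbers, the fully quantized scheme is rejected for losing too much accuracy, so the selection falls to the next-fastest scheme. Treating accuracy as a hard constraint rather than blending it into a single score keeps latency as the only quantity being minimized, which is closer to how the tool is described in this thread.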
Signed-off-by: Will Guo <[email protected]>
Force-pushed from 6d55fcb to 14ab5b6
**Q: Can I optimize for accuracy instead of latency?**

A: Currently, the autotuner optimizes for latency.
nit: A: Currently, the autotuner optimizes for latency only.
What does this PR do?
Type of change: new feature
Overview: This PR integrates the automated Q/DQ placement tool into ModelOpt. It is part 4 of 4 in the series. This PR contains the following changes:
Part 1: #701
Part 2: #702
Part 3: #703
Part 4: #704
Usage
Testing
This PR does not contain tests.
Before your PR is "Ready for review"
Additional Information