
[FEATURE] Support ONNX and TensorRT export #4

Draft · wants to merge 7 commits into `main`

Conversation

@IamShubhamGupto (Contributor) commented May 3, 2024

Hello @guipotje and team,

In this PR, I am adding support for exporting ONNX and TensorRT models. Running the TensorRT engine shows an improvement in FPS. Since the evaluation code is currently unavailable, I cannot quantify accuracy, but it appears comparable to the provided xfeat.pt performance. The following is currently supported or changed:

  • Export onnx with opset version preference (default 17)
  • Export onnx with dynamic axis preference (enabled by default)
  • Export TensorRT with image size preference (default 480x640)
  • Export TensorRT with fp32 preference
  • Export TensorRT with fp16 preference (enabled by default)
  • Export TensorRT with workspace size preference (default 4096 MiB)
  • Demo realtime_demo.py with engine
  • Make realtime_demo.py backwards compatible with cv2 version 4.5.0
  • Update README.md for TensorRT export
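
The preferences above suggest a small argparse front-end. The following is a hedged sketch of such a CLI, not the PR's actual `export.py`; every flag name except `--onnx_opset` (which appears later in this thread) is an assumption:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the export preferences listed above (flag names are illustrative)."""
    p = argparse.ArgumentParser(description="Export XFeat to ONNX / TensorRT")
    p.add_argument("--onnx_opset", type=int, default=17,
                   help="ONNX opset version (default: 17)")
    p.add_argument("--dynamic", action="store_true", default=True,
                   help="export ONNX with dynamic axes (enabled by default)")
    p.add_argument("--height", type=int, default=480,
                   help="TensorRT input image height (default: 480)")
    p.add_argument("--width", type=int, default=640,
                   help="TensorRT input image width (default: 640)")
    p.add_argument("--fp16", action="store_true", default=True,
                   help="build the TensorRT engine in fp16 (enabled by default)")
    p.add_argument("--workspace", type=int, default=4096,
                   help="TensorRT builder workspace size in MiB (default: 4096)")
    return p

# Usage: args = build_parser().parse_args(); the workspace value would then be
# converted to bytes (args.workspace * (1 << 20)) for the TensorRT builder config.
```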

We could further improve FPS by simplifying the exported graph with onnx-simplifier, but that can be an iterative PR.

I will attach logs and performance observations in the comments below. Let me know if there are any changes required. Thank you again for the amazing work!
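
For the cv2 4.5.0 backwards-compatibility item above, one common pattern is a version gate. A minimal sketch, assuming a plain `MAJOR.MINOR.PATCH` version string; the exact calls gated in `realtime_demo.py` are an assumption:

```python
def version_tuple(v: str) -> tuple:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a comparable tuple, e.g. '4.5.0' -> (4, 5, 0)."""
    return tuple(int(p) for p in v.split(".")[:3])

# Typical use (cv2 import shown as a comment so the sketch stays dependency-free):
#   import cv2
#   LEGACY_CV2 = version_tuple(cv2.__version__) < (4, 5, 4)
#   # ...then pick the legacy or current drawing/matching call based on LEGACY_CV2
```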

@IamShubhamGupto (Contributor, author):

TensorRT export logs and statistics - trt.log

| mode | observed FPS |
| --- | --- |
| XFeat (cuda) | 21.4 |
| XFeat (tensorrt) | 24.7 |

XFeat (cuda): [screenshot from 2024-05-03 03-17-08]

XFeat (tensorrt): [screenshot from 2024-05-03 03-25-10]

TensorRT matches are sparser for nearly identical poses, but overall performance seems the same, if not better.

Device: Nvidia Jetson AGX Xavier - Jetpack 5.1
Docker image: dustynv/l4t-ml:r35.4.1

@IamShubhamGupto IamShubhamGupto marked this pull request as ready for review May 3, 2024 07:29
@guipotje (Collaborator) commented May 7, 2024

Hello @IamShubhamGupto and @acai66, thank you so much for your contributions!

I notice that both of you are providing an ONNX export, and it probably doesn't make sense to merge both since they serve the same purpose. Do you have any suggestions on how to proceed? In my opinion, it makes sense to choose the one that supports the earliest operator set, as it would maximize the model's compatibility with the greatest number of devices and libraries.

@IamShubhamGupto (Contributor, author):

> Hello @IamShubhamGupto and @acai66, thank you so much for your contributions!
>
> I notice that both of you are providing an ONNX export, and it probably doesn't make sense to merge both since they serve the same purpose. Do you have any suggestions on how to proceed? In my opinion, it makes sense to choose the one that supports the earliest operator set, as it would maximize the model's compatibility with the greatest number of devices and libraries

Sure, maybe @acai66 and I can collaborate on merging our work. In summary, this PR does not pin a specific operator set; it is up to the user to provide one. By default it is the latest supported, which is 17. Users can choose an older opset such as 13 via a command-line argument:

`python3 export.py --onnx_opset 13`

Another key difference in our PRs is that @acai66 quantizes both the XFeatModel and XFeat modules, whereas I chose to quantize only XFeatModel. That feature can be merged into my branch, as the code to then export the TensorRT engine is already available here.

Let me know what you guys think

@IamShubhamGupto (Contributor, author):

Hey, just bumping this conversation up again @guipotje @acai66

@acai66 commented May 9, 2024

> > Hello @IamShubhamGupto and @acai66, thank you so much for your contributions!
> > I notice that both of you are providing an ONNX export, and it probably doesn't make sense to merge both since they serve the same purpose. Do you have any suggestions on how to proceed? In my opinion, it makes sense to choose the one that supports the earliest operator set, as it would maximize the model's compatibility with the greatest number of devices and libraries
>
> sure, maybe me and @acai66 can work on collaborating our work. In summary, there is no specific operator set defined in this PR and is up to the user to provide one, by default it is the latest supported and that is 17. Users can choose to use an older op set such as 13 by providing command line arguments.
>
> `python3 export.py --onnx_opset 13`
>
> Another key difference in our PRs is @acai66 can quantize the XFeatModel and XFeat modules where as I choose to only quantize XFeatModel. This feature can be merged into my branch as the code to then export TensorRT engine is already available here.
>
> Let me know what you guys think

Thank you for your suggestion. I concur with your viewpoint and believe that merging our work would indeed be a beneficial course of action.

@acai66 commented May 9, 2024

My PR is aimed at improving the convenience and compatibility of deployment, especially in environments without PyTorch. I've uploaded an example of using the ONNX Runtime Python API, which can be easily reimplemented using the ONNX Runtime C++ API.
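
An ONNX Runtime call of the kind described here usually only needs a small preprocessing step on the input frame. A minimal sketch, assuming the model takes a `1xCxHxW` float32 tensor in `[0, 1]`; the actual input name and layout of the exported xfeat.onnx are assumptions:

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """HWC uint8 image -> 1xCxHxW float32 batch in [0, 1] for an ONNX Runtime session."""
    x = img.astype(np.float32) / 255.0
    x = np.transpose(x, (2, 0, 1))   # HWC -> CHW
    return x[np.newaxis, ...]        # add batch dimension

# Typical ONNX Runtime usage (requires onnxruntime to be installed):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("xfeat.onnx", providers=["CPUExecutionProvider"])
#   out = sess.run(None, {sess.get_inputs()[0].name: preprocess(frame)})
```

The same structure maps almost one-to-one onto the ONNX Runtime C++ API (`Ort::Session::Run`), which is why a Python example is a good starting point for the C++ reimplementation mentioned above.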

@IamShubhamGupto (Contributor, author):

> My PR is aimed at improving the convenience and compatibility of deployment, especially in environments without PyTorch. I've uploaded an example of using the ONNX Runtime Python API, which can be easily reimplemented using the ONNX Runtime C++ API.

That's amazing; we can find some time this week to start merging the examples. I'll give you editor access to my fork.

@guipotje (Collaborator) commented May 9, 2024

> > My PR is aimed at improving the convenience and compatibility of deployment, especially in environments without PyTorch. I've uploaded an example of using the ONNX Runtime Python API, which can be easily reimplemented using the ONNX Runtime C++ API.
>
> that's amazing, we can then find some time this week to start merging the examples. I'll give you editor access to my fork.

Fantastic, @IamShubhamGupto and @acai66! While I'm not experienced in ONNX, I'm here to help with anything related to the original code — just let me know.

@acai66 commented May 10, 2024

> > My PR is aimed at improving the convenience and compatibility of deployment, especially in environments without PyTorch. I've uploaded an example of using the ONNX Runtime Python API, which can be easily reimplemented using the ONNX Runtime C++ API.
>
> that's amazing, we can then find some time this week to start merging the examples. I'll give you editor access to my fork.

Thank you very much for the invitation. I am concerned that I have little experience with TensorRT deployment. I have invited you to be a collaborator on the forked repository.

@acai66 commented May 10, 2024

> > > My PR is aimed at improving the convenience and compatibility of deployment, especially in environments without PyTorch. I've uploaded an example of using the ONNX Runtime Python API, which can be easily reimplemented using the ONNX Runtime C++ API.
> >
> > that's amazing, we can then find some time this week to start merging the examples. I'll give you editor access to my fork.
>
> Fantastic, @IamShubhamGupto and @acai66! While I'm not experienced in ONNX, I'm here to help with anything related to the original code — just let me know.

I apologize for my poor English; I am using translation software to communicate.
I have made some modifications to the source code to improve the compatibility and convenience of exporting ONNX. However, I am unsure if this will affect the original PyTorch functionality and performance. I am unable to conduct sufficient related tests.

@IamShubhamGupto IamShubhamGupto marked this pull request as draft May 10, 2024 12:34
@IamShubhamGupto (Contributor, author):

@acai66, for now I am pausing development on my branch. We will merge the ONNX export from your PR, and then I will continue TensorRT development here.

I will review and commit changes on your PR in the next couple of hours.
