-
Which version of OnnxRuntime are you using? How many threads did you use when benchmarking with multiple threads?
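For context, a minimal sketch of where these two settings live in ONNX Runtime's Python API; the model path and thread counts below are placeholder assumptions, not values taken from this thread:

```python
import onnxruntime as ort

print(ort.__version__)  # ONNX Runtime version used for the benchmark

# Pin threading explicitly so CPU numbers are comparable across machines.
opts = ort.SessionOptions()
opts.intra_op_num_threads = 6   # threads used inside a single operator
opts.inter_op_num_threads = 1   # threads used to run independent operators

# "model.onnx" is a placeholder path, not a file from this thread.
sess = ort.InferenceSession(
    "model.onnx", sess_options=opts, providers=["CPUExecutionProvider"]
)
```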
-
Please inform me if this is the wrong place to ask.
I have a BERT-based model exported to ONNX, and I am trying to gauge the performance increase. For GPU, the performance makes sense, and it is excellent, but I am also trying to check performance on CPU.
The setup is as follows: a BERT encoder with a single linear layer and a softmax layer attached, created and trained in PyTorch, then exported to ONNX with PyTorch's exporter and run with ONNX Runtime. Additionally, the model is optimized using
onnxruntime/transformers/optimize.py
(a sketch of this pipeline is shown after the hardware details below). These are the values I have gotten so far, leaving all other variables constant:
VM hardware:
- CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
- Thread(s) per core: 1
- Core(s) per socket: 6
- Socket(s): 1
- GPU: Tesla V100

Local hardware:
- CPU: AMD Ryzen 7 4800H with Radeon Graphics
- Thread(s) per core: 2
- Core(s) per socket: 8
- Socket(s): 1
- GPU: NVIDIA GeForce RTX 3050
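A rough sketch of the export-and-optimize pipeline described above, assuming a Hugging Face bert-base-uncased encoder, sequence length 128, and the Python optimizer API rather than the command-line script; all file names, shapes, and sizes here are illustrative assumptions:

```python
import torch
from transformers import BertModel
from onnxruntime.transformers import optimizer

# Stand-in for the model described above: BERT encoder + linear + softmax.
class BertClassifier(torch.nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.linear = torch.nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        return torch.softmax(self.linear(pooled), dim=-1)

model = BertClassifier().eval()
dummy_ids = torch.ones(1, 128, dtype=torch.long)
dummy_mask = torch.ones(1, 128, dtype=torch.long)

# Export the trained PyTorch module to ONNX.
torch.onnx.export(
    model, (dummy_ids, dummy_mask), "bert_classifier.onnx",
    input_names=["input_ids", "attention_mask"], output_names=["probs"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"}},
    opset_version=14,
)

# BERT-specific graph fusions (attention, layer norm, GELU), equivalent in
# effect to running the transformers optimization script mentioned above.
opt = optimizer.optimize_model("bert_classifier.onnx", model_type="bert",
                               num_heads=12, hidden_size=768)
opt.save_model_to_file("bert_classifier_opt.onnx")
```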
Now here's the issue:
Why is the performance increase from PyTorch to ONNX so much bigger locally than on the VM? Obviously there is a difference between the local hardware and the VM, and a difference between multi-threaded and single-threaded execution, which is exacerbated by the choice of hardware, but I do not understand why the gap between PyTorch and ONNX is not consistent within the same environment.
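One way to make the PyTorch and ONNX Runtime numbers directly comparable within a single environment is to pin both frameworks to the same thread count before timing, since their CPU threading defaults can differ. A rough benchmarking sketch under that assumption, reusing the hypothetical model and file name from the export sketch above; the thread count, batch shape, and iteration counts are placeholders:

```python
import time
import numpy as np
import torch
import onnxruntime as ort

THREADS = 6  # placeholder: set to the physical core count of the machine

torch.set_num_threads(THREADS)  # PyTorch intra-op threads

opts = ort.SessionOptions()
opts.intra_op_num_threads = THREADS  # ONNX Runtime intra-op threads
sess = ort.InferenceSession(
    "bert_classifier_opt.onnx",  # the optimized model from the sketch above
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)

ids = np.ones((1, 128), dtype=np.int64)
mask = np.ones((1, 128), dtype=np.int64)
feed = {"input_ids": ids, "attention_mask": mask}

# Warm up, then time repeated runs of the ONNX Runtime session.
for _ in range(5):
    sess.run(None, feed)
start = time.perf_counter()
for _ in range(100):
    sess.run(None, feed)
print("ORT mean latency (s):", (time.perf_counter() - start) / 100)

# Same measurement for the PyTorch model (`model` from the sketch above).
with torch.inference_mode():
    t_ids, t_mask = torch.from_numpy(ids), torch.from_numpy(mask)
    for _ in range(5):
        model(t_ids, t_mask)
    start = time.perf_counter()
    for _ in range(100):
        model(t_ids, t_mask)
print("PyTorch mean latency (s):", (time.perf_counter() - start) / 100)
```

If both frameworks are left at their defaults, they may pick different thread counts on the 6-core Xeon versus the 8-core/16-thread Ryzen, which by itself can change the relative speedup between environments.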