-
Which version of OnnxRuntime are you using? How many threads did you use when benchmarking with multiple threads?
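For context, a minimal sketch of where these two settings live in ONNX Runtime's Python API; the model path and thread counts below are placeholder assumptions, not values taken from this thread:

```python
import onnxruntime as ort

print(ort.__version__)  # ONNX Runtime version used for the benchmark

# Pin threading explicitly so CPU numbers are comparable across machines.
opts = ort.SessionOptions()
opts.intra_op_num_threads = 6   # threads used inside a single operator
opts.inter_op_num_threads = 1   # threads used to run independent operators

# "model.onnx" is a placeholder path, not a file from this thread.
sess = ort.InferenceSession(
    "model.onnx", sess_options=opts, providers=["CPUExecutionProvider"]
)
```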
-
Please inform me if this is the wrong place to ask.
I have a BERT-based model exported to ONNX, and I am trying to gauge the performance increase. For GPU, the performance makes sense, and it is excellent, but I am also trying to check performance on CPU.
The setup is as follows: a BERT encoder with a single linear layer and a softmax layer attached, created and trained in PyTorch, then exported to ONNX with PyTorch's exporter and run with ONNX Runtime. Additionally, the model is optimized using
onnxruntime/transformers/optimize.py
(a sketch of this pipeline is shown after the hardware details below). These are the values I have gotten so far, leaving all other variables constant:
VM hardware:
- CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
- Thread(s) per core: 1
- Core(s) per socket: 6
- Socket(s): 1
- GPU: Tesla V100

Local hardware:
- CPU: AMD Ryzen 7 4800H with Radeon Graphics
- Thread(s) per core: 2
- Core(s) per socket: 8
- Socket(s): 1
- GPU: NVIDIA GeForce RTX 3050
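A rough sketch of the export-and-optimize pipeline described above, assuming a Hugging Face bert-base-uncased encoder, sequence length 128, and the Python optimizer API rather than the command-line script; all file names, shapes, and sizes here are illustrative assumptions:

```python
import torch
from transformers import BertModel
from onnxruntime.transformers import optimizer

# Stand-in for the model described above: BERT encoder + linear + softmax.
class BertClassifier(torch.nn.Module):
    def __init__(self, num_labels=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.linear = torch.nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        return torch.softmax(self.linear(pooled), dim=-1)

model = BertClassifier().eval()
dummy_ids = torch.ones(1, 128, dtype=torch.long)
dummy_mask = torch.ones(1, 128, dtype=torch.long)

# Export the trained PyTorch module to ONNX.
torch.onnx.export(
    model, (dummy_ids, dummy_mask), "bert_classifier.onnx",
    input_names=["input_ids", "attention_mask"], output_names=["probs"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"}},
    opset_version=14,
)

# BERT-specific graph fusions (attention, layer norm, GELU), equivalent in
# effect to running the transformers optimization script mentioned above.
opt = optimizer.optimize_model("bert_classifier.onnx", model_type="bert",
                               num_heads=12, hidden_size=768)
opt.save_model_to_file("bert_classifier_opt.onnx")
```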
Now here's the issue:
Why is the performance increase from PyTorch to ONNX so much bigger locally than on the VM? Obviously there is a difference between the local hardware and the VM, and a difference between multi-threaded and single-threaded execution, which is exacerbated by the choice of hardware, but I do not understand why the gap between PyTorch and ONNX is not consistent within the same environment.
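One way to make the PyTorch and ONNX Runtime numbers directly comparable within a single environment is to pin both frameworks to the same thread count before timing, since their CPU threading defaults can differ. A rough benchmarking sketch under that assumption, reusing the hypothetical model and file name from the export sketch above; the thread count, batch shape, and iteration counts are placeholders:

```python
import time
import numpy as np
import torch
import onnxruntime as ort

THREADS = 6  # placeholder: set to the physical core count of the machine

torch.set_num_threads(THREADS)  # PyTorch intra-op threads

opts = ort.SessionOptions()
opts.intra_op_num_threads = THREADS  # ONNX Runtime intra-op threads
sess = ort.InferenceSession(
    "bert_classifier_opt.onnx",  # the optimized model from the sketch above
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)

ids = np.ones((1, 128), dtype=np.int64)
mask = np.ones((1, 128), dtype=np.int64)
feed = {"input_ids": ids, "attention_mask": mask}

# Warm up, then time repeated runs of the ONNX Runtime session.
for _ in range(5):
    sess.run(None, feed)
start = time.perf_counter()
for _ in range(100):
    sess.run(None, feed)
print("ORT mean latency (s):", (time.perf_counter() - start) / 100)

# Same measurement for the PyTorch model (`model` from the sketch above).
with torch.inference_mode():
    t_ids, t_mask = torch.from_numpy(ids), torch.from_numpy(mask)
    for _ in range(5):
        model(t_ids, t_mask)
    start = time.perf_counter()
    for _ in range(100):
        model(t_ids, t_mask)
print("PyTorch mean latency (s):", (time.perf_counter() - start) / 100)
```

If both frameworks are left at their defaults, they may pick different thread counts on the 6-core Xeon versus the 8-core/16-thread Ryzen, which by itself can change the relative speedup between environments.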