Performance problem for model with multiple heads #8192
Unanswered
gorodnitskiy asked this question in Other Q&A
Hi, guys!
I found a weird case with ONNX Runtime performance:
I trained resnext50_32x4d with 3 heads (head = conv2d(..., groups=64)-relu-linear) via pytorch-lightning. The model code is below. Then I exported the model to ONNX via torch.onnx.export with opset_version=13 (using the code from the example) and ran it with ONNX Runtime on CPU. I got approximately 225 ms per image in single-thread mode.
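The original model code isn't included in this extract; below is a minimal sketch, under my own assumptions about head widths, class counts, input size, and the pooling/flatten between the ReLU and the linear layer, of the kind of architecture and export call described above:

```python
import torch
import torch.nn as nn
import torchvision


class MultiHeadResNeXt(nn.Module):
    """resnext50_32x4d backbone with 3 heads: conv2d(groups=64) -> ReLU -> linear."""

    def __init__(self, num_classes=(10, 10, 10)):  # class counts are assumptions
        super().__init__()
        backbone = torchvision.models.resnext50_32x4d()
        # keep the convolutional feature extractor (2048-channel output), drop avgpool/fc
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(2048, 2048, kernel_size=3, padding=1, groups=64),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(2048, n),
            )
            for n in num_classes
        ])

    def forward(self, x):
        features = self.backbone(x)
        return [head(features) for head in self.heads]


model = MultiHeadResNeXt().eval()
dummy = torch.randn(1, 3, 224, 224)  # assumed input resolution
torch.onnx.export(
    model,
    dummy,
    "multihead.onnx",
    opset_version=13,
    input_names=["input"],
    output_names=["head_0", "head_1", "head_2"],
)
```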
But if I manually round off the model weights to 16 digits after the decimal point (using a load_model(…, n_digits=16) function) before exporting to ONNX, I get 75 ms per image, a 3x speed-up. I compared the output tensors at 16-digit precision and they matched completely, and I checked that the rounding did not change the model weights.
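The load_model(…, n_digits=16) helper is the author's own and isn't shown here; purely as an illustration, rounding a state dict to a fixed number of decimal places could look roughly like this:

```python
import torch


def round_state_dict(state_dict, n_digits=16):
    # Hypothetical stand-in for the author's load_model(..., n_digits=16) step:
    # round every floating-point tensor to n_digits places after the decimal point.
    factor = 10.0 ** n_digits
    rounded = {}
    for name, tensor in state_dict.items():
        if tensor.is_floating_point():
            # do the scaling in float64 to avoid extra float32 rounding error,
            # then cast back to the original dtype
            tensor = (torch.round(tensor.double() * factor) / factor).to(tensor.dtype)
        rounded[name] = tensor
    return rounded


# reuse the model from the sketch above, then export to ONNX again
model.load_state_dict(round_state_dict(model.state_dict(), n_digits=16))
```

At 16 decimal places the rounding step is far below float32 precision for normal-magnitude weights, which is consistent with the observation that neither the weights nor the outputs change.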
Could someone explain this? Any suggestions?
I suppose this case is related to PyTorch's ONNX export, but I'll leave it here in case it is useful to someone.
Versions:
Hardware: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
Single-thread mode CPU:
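The exact session settings aren't shown in the question; single-thread CPU inference with ONNX Runtime is usually configured along these lines (the file name and input shape are carried over from the sketch above, i.e. assumptions):

```python
import numpy as np
import onnxruntime as ort

# restrict ONNX Runtime to one thread, both inside individual operators
# and across operators, to match the single-thread measurements above
sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 1
sess_options.inter_op_num_threads = 1

session = ort.InferenceSession(
    "multihead.onnx", sess_options, providers=["CPUExecutionProvider"]
)

x = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": x})  # list with one tensor per head
```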
Replies: 1 comment

It might be useful to read this, as convolutions attempt to use threads heavily.