Could this be related to #14023?
Hello,
When running multiple inferences in a row in the same session, some inferences incur a significant extra latency cost.
How can I fix this? I'm using CUDA 11.7 and ONNX Runtime 1.15.0.
I ran a few tests, and a pattern emerges.
I also used the onnxruntime_perf_test tool:
./onnxruntime_perf_test -I -S 1 -e cuda -r 2048 -p profile.json -s /data/model/googlenet/dynamic_batch_googlenet_opt.onnx
At task IDs around 1000, 2000, 4000, 8000, and 16000, latency roughly doubles.
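To pin down where the slow runs occur outside of perf_test, one can record per-run latencies and flag outliers. Here is a minimal stdlib sketch; the threshold rule and the `run_once` timing wrapper are illustrative assumptions, not ONNX Runtime API:

```python
import time

def detect_spikes(latencies, factor=2.0):
    """Return indices whose latency exceeds `factor` times the mean
    of all preceding runs (a simple, illustrative outlier rule)."""
    spikes, total = [], 0.0
    for i, dt in enumerate(latencies):
        if i > 0 and dt > factor * (total / i):
            spikes.append(i)
        total += dt
    return spikes

def time_runs(run_once, iterations):
    """Time an arbitrary zero-argument callable, e.g. a wrapper
    around session.run(...) for the model in question."""
    latencies = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        run_once()
        latencies.append(time.perf_counter() - t0)
    return latencies

# Example with synthetic latencies: run 3 is well above the baseline.
print(detect_spikes([1.0, 1.0, 1.0, 5.0, 1.0]))  # -> [3]
```

If the spike indices cluster near powers of two, that supports the reallocation hypothesis below.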
Maybe CUDA (or the runtime) reallocates memory from time to time, in larger and larger chunks, to amortize the extra cost.
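That hypothesis would explain the power-of-two spacing: an arena that grows geometrically pays a one-off reallocation cost exactly when its capacity is exhausted. A toy sketch (not ONNX Runtime code; the class and sizes are hypothetical) shows the spacing:

```python
# Hypothetical model of a memory arena with geometric (doubling) growth.
# Each time capacity is exhausted, it doubles, so an expensive reallocation
# happens at request ~1024, ~2048, ~4096, ... -- the same power-of-two
# spacing as the latency spikes observed above.

class DoublingArena:
    def __init__(self, initial_capacity=1024):
        self.capacity = initial_capacity
        self.used = 0
        self.realloc_points = []  # request indices where a costly realloc occurred

    def allocate(self, request_index, size=1):
        if self.used + size > self.capacity:
            self.capacity *= 2  # geometric growth amortizes the total cost
            self.realloc_points.append(request_index)
        self.used += size

arena = DoublingArena()
for i in range(20000):
    arena.allocate(i)

print(arena.realloc_points)  # -> [1024, 2048, 4096, 8192, 16384]
```

If this is indeed the cause, it may be worth experimenting with ONNX Runtime's CUDA provider option `arena_extend_strategy` (e.g. `kSameAsRequested` instead of the power-of-two default) to see whether the spikes move or disappear.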
Thanks in advance.