
Running multiple parallel inferences on the same model using the CUDA provider. #7876

Answered by snnn
xgirones asked this question in Other Q&A

> Since the CUDA provider appears to serialize concurrent calls to the same session

If that's true, we should fix this. Do you have more details? Do you know which part of the code restricts the concurrency?
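
For concreteness, here is a minimal sketch of the usage pattern the question describes: several threads calling Run() on a single InferenceSession bound to the CUDAExecutionProvider. This code is illustrative, not taken from the thread; the model path, input shape, and worker count are placeholder assumptions.

```python
import numpy as np
import onnxruntime as ort
from concurrent.futures import ThreadPoolExecutor

# One session shared by all threads, pinned to the CUDA execution provider.
# "model.onnx" is a placeholder path, not a file from this discussion.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider"],
)
input_name = session.get_inputs()[0].name

def infer(batch: np.ndarray):
    # Run() is documented as thread-safe on a single session; the open question
    # in this thread is whether the CUDA provider overlaps these calls on the
    # GPU or executes them one at a time.
    return session.run(None, {input_name: batch})

# Placeholder input shape (e.g. an image model); adjust to the actual model.
batches = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(infer, batches))
```

Since the pattern itself is supported, any observed serialization would come from inside the provider rather than from the session API, which is what the reply above is asking about.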

Replies: 2 comments, 3 replies

The first comment was selected as the answer by xgirones and drew one reply from @xgirones; the second comment drew replies from @xgirones and @pranavsharma.