Try increasing your max_queue_delay_microseconds to a larger value to give your calls a chance to arrive within the same time window. This will help you debug whether dynamic_batching is broken, or whether the parameters you chose are too tight.
For example, try max_queue_delay_microseconds: 10000000. Since the delay is given in microseconds, the queue would then wait up to 10 seconds for incoming requests (the current value of 10000 is only 10 ms).
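The reasoning above, that concurrent calls must land in the same queue window to be batched together, can be illustrated with a toy model. This is a simplified assumption and not Triton's actual scheduler: a batch opens at the first arrival and closes when it reaches preferred_batch_size or when a request arrives after the oldest queued one has waited longer than max_queue_delay_microseconds. Model execution time is ignored. The helper name `simulate_batches` and the 50 ms arrival spacing are hypothetical.

```python
from typing import List


def simulate_batches(
    arrivals_us: List[int],
    max_queue_delay_us: int,
    preferred_batch_size: int,
) -> List[List[int]]:
    """Group request arrival times (microseconds) into batches.

    Toy model (assumption, not Triton's real dynamic batcher): the open
    batch closes when it reaches preferred_batch_size, or when a new
    request arrives after the oldest queued request's delay window.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for t in sorted(arrivals_us):
        # The window of the open batch has expired: flush it first.
        if current and t - current[0] > max_queue_delay_us:
            batches.append(current)
            current = []
        current.append(t)
        # Flush early once the preferred batch size is reached.
        if len(current) == preferred_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical: 10 requests arriving 50 ms (50_000 us) apart.
    arrivals = [i * 50_000 for i in range(10)]
    print([len(b) for b in simulate_batches(arrivals, 10_000, 32)])
    # -> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    print([len(b) for b in simulate_batches(arrivals, 10_000_000, 32)])
    # -> [10]
```

In this model a 10 ms window is too tight for requests that arrive 50 ms apart, so every request runs alone; a 10 s window lets all 10 share one batch. Real arrival gaps from a local client are usually much smaller, so this only shows why widening the window is a useful debugging step.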
Reproduction
Add dynamic_batching to tensorrt_llm/config.pbtxt:
dynamic_batching {
  preferred_batch_size: [ 32 ]
  max_queue_delay_microseconds: 10000  # 10 ms
  default_queue_policy: { max_queue_size: 32 }
}
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
Simulate 10 concurrent requests.
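A minimal client-side sketch for simulating the 10 concurrent requests and recording when each one finishes. Assumptions: the Triton HTTP generate endpoint at localhost:8000, the model name `ensemble`, and the `text_input`/`max_tokens` request fields follow the TensorRT-LLM backend examples and may differ in your deployment; `send_generate` and `run_concurrent` are hypothetical helper names.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Assumption: endpoint, model name and field names; adjust to your setup.
GENERATE_URL = "http://localhost:8000/v2/models/ensemble/generate"


def send_generate(prompt: str) -> str:
    """POST one prompt to the (assumed) generate endpoint; return the raw JSON body."""
    payload = json.dumps({"text_input": prompt, "max_tokens": 64}).encode("utf-8")
    req = urllib.request.Request(
        GENERATE_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def run_concurrent(
    send: Callable[[str], str], prompts: List[str]
) -> Tuple[List[Tuple[float, float]], float]:
    """Send all prompts at once from separate threads.

    Returns per-request (start, end) times relative to the first send,
    plus the total wall time.
    """
    t0 = time.perf_counter()

    def timed(prompt: str) -> Tuple[float, float]:
        start = time.perf_counter() - t0
        send(prompt)
        return start, time.perf_counter() - t0

    # One thread per request so the client itself never serializes them.
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        spans = list(pool.map(timed, prompts))
    return spans, time.perf_counter() - t0


if __name__ == "__main__":
    spans, total = run_concurrent(send_generate, [f"Request {i}" for i in range(10)])
    for i, (start, end) in enumerate(spans):
        print(f"request {i}: {start:.2f}s -> {end:.2f}s")
    print(f"total wall time: {total:.2f}s")
```

If the server processes requests one at a time, all requests start at about 0 s but their end times come out staggered by roughly one request's latency; with working batching they should finish at about the same time.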
Expected behavior
I expected the 10 requests to be processed concurrently and their results returned.
Actual behavior
With the instance count limited to one, the simulated concurrent requests are processed sequentially, one after another. For example, if processing one request and generating its full text takes 10 seconds, the second request only starts after 10 seconds, so the two requests take 20 seconds in total.
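Extending that example to all 10 simulated requests, assuming each takes $t_1 = 10$ s on its own: under sequential processing the $k$-th request completes at $k\,t_1$, while ideal batching (optimistically assuming a batch of 10 costs about as much as a single request) would finish them all at about $t_1$.

$$
T_{\text{sequential}} = n\,t_1 = 10 \times 10\ \text{s} = 100\ \text{s},
\qquad
T_{\text{batched}} \approx t_1 = 10\ \text{s}
$$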
Additional notes
If you need me to provide the complete config.pbtxt file, feel free to ask.