Triton server dynamic_batching not working with multiple requests #661

Open
kazyun opened this issue Dec 13, 2024 · 1 comment
Labels
bug Something isn't working

Comments


kazyun commented Dec 13, 2024

System Info

  • GPU: A800 80 GB × 2
  • Container: nvcr.io/nvidia/tritonserver:24.11-trtllm-python-py3
  • Model: Qwen2.5-14B-Instruct

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Add dynamic_batching to tensorrt_llm/config.pbtxt:

    dynamic_batching {
      preferred_batch_size: [ 32 ]
      max_queue_delay_microseconds: 10000
      default_queue_policy: { max_queue_size: 32 }
    }

  2. Limit the model to a single GPU instance:

    instance_group [
      {
        count: 1
        kind: KIND_GPU
        gpus: [ 0 ]
      }
    ]

  3. Simulate 10 concurrent requests (a client sketch follows this list).
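
For illustration, a minimal client sketch for step 3, assuming Triton's HTTP generate endpoint on localhost:8000 and the standard tensorrt_llm ensemble model; the model name and the text_input/max_tokens field names are assumptions and may differ in your deployment:

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    # Hypothetical endpoint: adjust host, port, and model name for your setup.
    URL = "http://localhost:8000/v2/models/ensemble/generate"

    def send_request(i: int) -> float:
        """Send one generate request and return its latency in seconds."""
        start = time.time()
        resp = requests.post(
            URL,
            # "text_input"/"max_tokens" follow the usual tensorrt_llm
            # ensemble schema; your model's input names may differ.
            json={"text_input": f"Request {i}: write a short greeting.",
                  "max_tokens": 128},
            timeout=120,
        )
        resp.raise_for_status()
        return time.time() - start

    if __name__ == "__main__":
        wall_start = time.time()
        # Fire all 10 requests at once so they can land in the same batching window.
        with ThreadPoolExecutor(max_workers=10) as pool:
            latencies = list(pool.map(send_request, range(10)))
        print("per-request latencies:", [f"{t:.1f}s" for t in latencies])
        print(f"total wall time: {time.time() - wall_start:.1f}s")

If dynamic batching works, the total wall time should stay close to a single request's latency; if requests serialize (as reported below), it grows roughly linearly with the request count.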

Expected behavior

Expect these 10 requests to be processed simultaneously and return results.

Actual behavior

With the model instance limited to one, concurrent requests are processed sequentially, one after another: if processing and generating the full text for one request takes 10 seconds, the second request only starts after those 10 seconds, so two requests take 20 seconds in total.

Additional notes

If you need me to provide the complete config.pbtxt file, feel free to ask.

kazyun added the bug label on Dec 13, 2024
jadhosn commented Dec 17, 2024

Try increasing your max_queue_delay_microseconds to a larger value to give your calls a chance to arrive within the same time window. This will help you debug whether dynamic_batching is broken, or whether the parameters you chose are too tight.

For example, try max_queue_delay_microseconds = 10000000; since the delay is in microseconds, the queue would then wait 10 seconds for incoming requests.
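
For illustration, a sketch of the tensorrt_llm config.pbtxt fragment with that debugging value (other fields carried over from the reproduction above; a 10-second window is only meant to prove that batching happens at all, not for production use):

    dynamic_batching {
      preferred_batch_size: [ 32 ]
      # 10,000,000 microseconds = 10 s batching window, a debugging value only
      max_queue_delay_microseconds: 10000000
      default_queue_policy: { max_queue_size: 32 }
    }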
