Description
When deploying Mistral 7B Instruct v0.2 to a SageMaker endpoint (ml.g5.12xlarge) using the TensorRT-LLM backend (just-in-time compilation), I noticed that some of the serving parameters get overwritten.
Specifically, I used the following set of serving properties (a deployment sketch follows the list):
"SERVING_ENGINE": "MPI",
"OPTION_TENSOR_PARALLEL_DEGREE": "1",
"OPTION_MAX_ROLLING_BATCH_SIZE": "16",
"OPTION_ROLLING_BATCH":"trtllm",
"OPTION_MAX_INPUT_LEN":"2048",
"OPTION_MAX_OUTPUT_LEN":"16",
"OPTION_BATCH_SCHEDULER_POLICY": "max_utilization"
CloudWatch logs state the following:
max_input_len is 2048 is larger than max_seq_len 16, clipping it to max_seq_len
max_num_tokens (256) shouldn't be greater than max_seq_len * max_batch_size (256), specifying to max_seq_len * max_batch_size (256).
The documentation lists the default value of max_num_tokens as 16384.
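Reading the two log lines together, the engine build appears to apply roughly the following arithmetic (a minimal sketch; the variable names are mine, inferred from the logs, not actual TensorRT-LLM internals):

```python
# Values taken from the serving properties and the CloudWatch log lines above.
max_batch_size = 16             # OPTION_MAX_ROLLING_BATCH_SIZE
requested_max_input_len = 2048  # OPTION_MAX_INPUT_LEN
max_seq_len = 16                # what the build ends up with, per the log

# "max_input_len is 2048 is larger than max_seq_len 16, clipping it to max_seq_len"
effective_max_input_len = min(requested_max_input_len, max_seq_len)  # -> 16

# "max_num_tokens (256) shouldn't be greater than max_seq_len * max_batch_size (256)"
max_num_tokens = 256
effective_max_num_tokens = min(max_num_tokens, max_seq_len * max_batch_size)  # -> 256
```

Note that 256 is exactly max_seq_len * max_batch_size (16 x 16), so max_num_tokens already sits far below the documented 16384 default before this check runs.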
Expected Behavior
The serving parameters should retain the values supplied in the serving properties (in particular, max_input_len = 2048).
Error Message
When submitting inference requests: this model is compiled to take up to 16 tokens. But actual tokens is 987 > 16. Please set with option.max_input_len=987
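For completeness, a request along these lines triggers the error whenever the prompt tokenizes to more than 16 tokens (the endpoint name is a placeholder and the payload assumes the usual LMI inputs/parameters JSON schema):

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-west-2")

# Any prompt longer than 16 tokens hits the limit; mine was ~987 tokens.
payload = {
    "inputs": "<prompt of roughly 987 tokens>",
    "parameters": {"max_new_tokens": 16},
}

response = runtime.invoke_endpoint(
    EndpointName="mistral-7b-instruct-trtllm",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(response["Body"].read().decode())
```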
How to Reproduce?
Deploy the model with the serving properties listed in the description, then submit an inference request whose prompt is longer than 16 tokens.
Environment Info
Docker image: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124