
TensorRT-LLM compilation parameter overwrite #3489

Open
CoolFish88 opened this issue Oct 1, 2024 · 0 comments
Labels
bug Something isn't working

Description

When deploying Mistral 7B Instruct v0.2 on a SageMaker endpoint (ml.g5.12xlarge) using the TensorRT-LLM backend (just-in-time compilation), I noticed that some of the serving parameters get overwritten.

Specifically, I used the following set of serving properties:
"SERVING_ENGINE": "MPI",
"OPTION_TENSOR_PARALLEL_DEGREE": "1",
"OPTION_MAX_ROLLING_BATCH_SIZE": "16",
"OPTION_ROLLING_BATCH":"trtllm",
"OPTION_MAX_INPUT_LEN":"2048",
"OPTION_MAX_OUTPUT_LEN":"16",
"OPTION_BATCH_SCHEDULER_POLICY": "max_utilization"

CloudWatch logs state the following:

```
max_input_len is 2048 is larger than max_seq_len 16, clipping it to max_seq_len
max_num_tokens (256) shouldn't be greater than max_seq_len * max_batch_size (256), specifying to max_seq_len * max_batch_size (256).
```

Note that max_num_tokens is documented as defaulting to 16384. The logs suggest that max_seq_len was set to max_output_len (16) rather than max_input_len + max_output_len, which then clipped max_input_len and capped max_num_tokens at max_seq_len * max_batch_size = 16 * 16 = 256 (see the sketch below).
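
A quick check of the arithmetic implied by the logs; a minimal sketch, assuming (as I understand TensorRT-LLM's build flow in this version) that max_seq_len should be derived as max_input_len + max_output_len:

```python
max_input_len = 2048   # OPTION_MAX_INPUT_LEN
max_output_len = 16    # OPTION_MAX_OUTPUT_LEN
max_batch_size = 16    # OPTION_MAX_ROLLING_BATCH_SIZE

expected_max_seq_len = max_input_len + max_output_len  # 2064
observed_max_seq_len = 16                              # equals max_output_len alone

# The cap the log applies to max_num_tokens:
print(observed_max_seq_len * max_batch_size)  # 256, matching the logged value
```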

Expected Behavior

Supplied parameters should keep their configured values instead of being silently overwritten during engine compilation.

Error Message

When submitting inference requests:

```
this model is compiled to take up to 16 tokens. But actual tokens is 987 > 16. Please set with option.max_input_len=987
```

How to Reproduce?

Deploy the model with the serving properties listed in the description, then submit a request whose prompt exceeds 16 tokens; a minimal invocation sketch follows.
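
A minimal invocation sketch, assuming the boto3 SageMaker runtime client and the placeholder endpoint name from the deployment sketch above; any prompt longer than 16 tokens should trigger the error:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="mistral-trtllm-demo",  # placeholder, matches the deployment sketch
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "...",  # any prompt longer than 16 tokens (e.g. ~987 tokens, as in the error above)
        "parameters": {"max_new_tokens": 16},
    }),
)
print(response["Body"].read().decode("utf-8"))
```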

Environment Info

Docker image: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124
