[Frontend] Warn if user max_model_len is greater than derived max_model_len #5911
Conversation
LGTM
I am worried this warning will go unnoticed. I would prefer an exception. If we need an escape hatch here for some reason, then we should default to an exception and only override that behavior with an environment variable.

@Yard1 in some ways the user would already be intentionally overriding the model default here via explicitly passing

We currently have a similar issue with ignored warnings when users deploy the chat/completions API for models without a chat template. I agree with Antoni here; I think an env variable would make sense.
@fialhocoelho could you make this change? Check https://github.com/vllm-project/vllm/blob/main/vllm/envs.py to see how other env vars are handled. Perhaps we could call it something like

Sure, @njhill. Thanks for the references. I'll start making the changes right away.
Explanation of Changes

This update addresses the handling of user-specified
Force-pushed from 52e0939 to f34cde5.
Converting to draft to test with the latest upstream version.

Tested with the latest upstream build image, and it works properly. Ready for review.
Force-pushed from 6175a4a to db35186.
Summary

Switch the error condition to a warning when the user-specified max_model_len exceeds the max_model_len derived from the model's configuration parameters. This adjustment acknowledges that users may need to set values higher than those defined in the model's configuration file to meet specific requirements.

Motivation

Previously, an error was triggered when the user max_model_len exceeded the derived value, potentially leading to unintended behavior or CUDA errors. Changing this to a warning alerts users without halting execution, allowing flexibility depending on their needs.

Notes