-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix(vlm): add max_tokens parameter to VLM completion calls to prevent vLLM rejection #689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,6 +26,10 @@ class VLMConfig(BaseModel): | |
|
|
||
| default_provider: Optional[str] = Field(default=None, description="Default provider name") | ||
|
|
||
| max_tokens: Optional[int] = Field( | ||
| default=4096, description="Maximum tokens for VLM completion output" | ||
| ) | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [Design] (blocking) The default of Suggestion: default to max_tokens: Optional[int] = Field(
default=None, description="Maximum tokens for VLM completion output"
)You could also call this out more prominently in the config example in the PR description or docs, so vLLM users know to set it. |
||
|
|
||
| thinking: bool = Field(default=False, description="Enable thinking mode for VolcEngine models") | ||
|
|
||
| max_concurrent: int = Field( | ||
|
|
@@ -134,6 +138,7 @@ def _build_vlm_config_dict(self) -> Dict[str, Any]: | |
| "max_retries": self.max_retries, | ||
| "provider": name, | ||
| "thinking": self.thinking, | ||
| "max_tokens": self.max_tokens, | ||
| } | ||
|
|
||
| if config: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Suggestion] (non-blocking)
Using
if self.max_tokenstreats0the same asNone(both falsy). Whilemax_tokens=0is never a valid API value,if self.max_tokens is not Noneis semantically clearer and avoids any edge-case surprises. Same applies to all other backends.