-
Notifications
You must be signed in to change notification settings - Fork 253
Open
Description
Context
A Stack overflow error happens with payload containing about 1.3 million times the same character (eg: 'a') on my vLLM instance running a gpt-oss-120b model.
Stacktrace
vllm-1 | (APIServer pid=1) conversation, engine_prompts = self._make_request_with_harmony(request)
vllm-1 | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1 | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 1815, in _make_request_with_harmony
vllm-1 | (APIServer pid=1) prompt_token_ids = render_for_completion(messages)
vllm-1 | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1 | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/parser/harmony_utils.py", line 485, in render_for_completion
vllm-1 | (APIServer pid=1) token_ids = get_encoding().render_conversation_for_completion(
vllm-1 | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1 | (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/openai_harmony/__init__.py", line 469, in render_conversation_for_completion
vllm-1 | (APIServer pid=1) return self._inner.render_conversation_for_completion(
vllm-1 | (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-1 | (APIServer pid=1) pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: RuntimeError(StackOverflow)
vllm-1 | (APIServer pid=1) INFO: xxxxx:xxxxx - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
Additional information
I hope this issue is in the right place and useful for you, otherwise feel free to close it 👍
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels