fix(chat): handle message creation in own thread #7303
Changes from 1 commit
@@ -1,6 +1,8 @@
+import asyncio
 import datetime
 import json
 import os
+from collections.abc import AsyncGenerator
 from collections.abc import Generator
 from datetime import timedelta
 from uuid import UUID
@@ -103,6 +105,7 @@
 from onyx.utils.headers import get_custom_tool_additional_request_headers
 from onyx.utils.logger import setup_logger
 from onyx.utils.telemetry import mt_cloud_telemetry
+from onyx.utils.threadpool_concurrency import run_in_background
 from shared_configs.contextvars import get_current_tenant_id

 logger = setup_logger()
@@ -507,7 +510,7 @@ def stream_generator() -> Generator[str, None, None]:


 @router.post("/send-chat-message", response_model=None, tags=PUBLIC_API_TAGS)
-def handle_send_chat_message(
+async def handle_send_chat_message(
     chat_message_req: SendMessageRequest,
     request: Request,
     user: User | None = Depends(current_chat_accessible_user),
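The handler is now declared `async def`. A likely reason (an assumption, not stated in the diff): FastAPI runs an `async def` route on the server's event loop rather than in a worker thread, so `asyncio.get_event_loop()` inside the handler returns the running loop that the background producer can later target with `loop.call_soon_threadsafe(...)`. A tiny standalone illustration of that relationship (the route and names are invented for the example):

```python
import asyncio

from fastapi import FastAPI

app = FastAPI()


@app.post("/demo-loop")  # hypothetical route, not part of the PR
async def demo_loop() -> dict[str, bool]:
    # Inside an `async def` route we are already on the server's event loop,
    # so this returns the running loop rather than creating a new one.
    loop = asyncio.get_event_loop()
    return {"loop_is_running": loop.is_running()}
```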
@@ -572,34 +575,63 @@ def handle_send_chat_message(
         # Note: LLM cost tracking is now handled in multi_llm.py
         return result

     # Streaming path, normal Onyx UI behavior
-    def stream_generator() -> Generator[str, None, None]:
+    # Use prod-cons pattern to continue processing even if request stops yielding
+    buffer: asyncio.Queue[str | None] = asyncio.Queue()
+    loop = asyncio.get_event_loop()
+
+    # Capture headers before spawning thread
+    litellm_headers = extract_headers(request.headers, LITELLM_PASS_THROUGH_HEADERS)
+    custom_tool_headers = get_custom_tool_additional_request_headers(request.headers)
+
+    def producer() -> None:
+        """
+        Producer function that runs handle_stream_message_objects in a loop
+        and writes results to the buffer.
+        """
         state_container = ChatStateContainer()
         try:
             with get_session_with_current_tenant() as db_session:
                 for obj in handle_stream_message_objects(
                     new_msg_req=chat_message_req,
                     user=user,
                     db_session=db_session,
-                    litellm_additional_headers=extract_headers(
-                        request.headers, LITELLM_PASS_THROUGH_HEADERS
-                    ),
-                    custom_tool_additional_headers=get_custom_tool_additional_request_headers(
-                        request.headers
-                    ),
+                    litellm_additional_headers=litellm_headers,
+                    custom_tool_additional_headers=custom_tool_headers,
                     external_state_container=state_container,
                 ):
-                    yield get_json_line(obj.model_dump())
+                    logger.debug(f"Streaming object: {obj.model_dump()}")
+                    # Thread-safe put into the asyncio queue
+                    loop.call_soon_threadsafe(
+                        buffer.put_nowait, get_json_line(obj.model_dump())
+                    )
|
Comment on lines
+604
to
+606
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the asyncio queue is full (when maxsize is set), While the queue is currently unbounded, if backpressure is added in the future, this needs proper error handling.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is currently no maxsize set, and for now we don't expect to put one |
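A hedged sketch of how backpressure could be handled if a maxsize were ever set: instead of `put_nowait` via `call_soon_threadsafe`, the worker thread can schedule the awaitable `Queue.put` on the loop and block until it completes. This illustrates the reviewer's point only and is not code from the PR; all names and the fake payloads below are invented.

```python
import asyncio
import threading


def producer(loop: asyncio.AbstractEventLoop, buffer: "asyncio.Queue[str | None]") -> None:
    """Runs in a worker thread; blocks when the bounded queue is full."""
    for i in range(10):
        # Queue.put is a coroutine: scheduling it with run_coroutine_threadsafe and
        # waiting on the returned future applies backpressure to this thread.
        asyncio.run_coroutine_threadsafe(buffer.put(f"chunk-{i}\n"), loop).result()
    # Sentinel marking end of stream, mirroring the PR's convention.
    asyncio.run_coroutine_threadsafe(buffer.put(None), loop).result()


async def consume() -> None:
    buffer: asyncio.Queue[str | None] = asyncio.Queue(maxsize=4)
    loop = asyncio.get_running_loop()
    threading.Thread(target=producer, args=(loop, buffer), daemon=True).start()
    while (item := await buffer.get()) is not None:
        print(item, end="")


if __name__ == "__main__":
    asyncio.run(consume())
```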
             # Note: LLM cost tracking is now handled in multi_llm.py

         except Exception as e:
             logger.exception("Error in chat message streaming")
-            yield json.dumps({"error": str(e)})
+            loop.call_soon_threadsafe(buffer.put_nowait, json.dumps({"error": str(e)}))
         finally:
-            logger.debug("Stream generator finished")
+            # Signal end of stream
+            loop.call_soon_threadsafe(buffer.put_nowait, None)
+            logger.debug("Producer finished")
+
+    async def stream_from_buffer() -> AsyncGenerator[str, None]:
+        """
+        Async generator that reads from the buffer and yields to the client.
+        """
+        try:
+            while True:
+                # Await the next item from the buffer
+                item = await buffer.get()
+                if item is None:
+                    # End of stream signal
+                    break
+                yield item
+        finally:
+            logger.debug("Stream consumer finished")
Comment on lines 616 to 630

Contributor: If the consumer fails, the producer thread keeps running with no way to clean it up. Consider storing the thread reference and implementing cleanup logic to handle consumer failures, or add a try-except around the consumer to signal the producer to stop.

Contributor (Author): We don't want to stop the producer. The producer should complete its job.
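For readers who want to see the suggested alternative, below is a minimal plain-threads sketch of a cancellation-token pattern. It illustrates the reviewer's suggestion only; the PR deliberately lets the producer run to completion. All names and the fake work are invented.

```python
import queue
import threading
import time


def producer(stop_event: threading.Event, buffer: "queue.Queue[str | None]") -> None:
    """Produces chunks until done, or until the consumer signals cancellation."""
    for i in range(100):
        if stop_event.is_set():
            break  # consumer went away; stop doing work early
        buffer.put(f"chunk-{i}")
        time.sleep(0.01)  # stand-in for real work
    buffer.put(None)  # end-of-stream sentinel


def consume_a_few() -> None:
    stop_event = threading.Event()
    buffer: "queue.Queue[str | None]" = queue.Queue()
    worker = threading.Thread(target=producer, args=(stop_event, buffer), daemon=True)
    worker.start()
    try:
        for _ in range(5):  # simulate a consumer that stops early
            print(buffer.get())
    finally:
        stop_event.set()  # tell the producer it may stop
        worker.join()


if __name__ == "__main__":
    consume_a_few()
```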

-    return StreamingResponse(stream_generator(), media_type="text/event-stream")
+    # Start the producer in a background thread
+    run_in_background(producer)
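`run_in_background` comes from `onyx.utils.threadpool_concurrency`, whose implementation is not part of this diff. A rough, assumed stand-in for what such a helper typically does (spawn a daemon thread, preserving contextvars such as the current tenant) might look like the sketch below; the real Onyx helper may differ in name, signature, and behavior.

```python
import contextvars
import threading
from collections.abc import Callable
from typing import Any


def run_in_background_stub(func: Callable[..., Any], *args: Any, **kwargs: Any) -> threading.Thread:
    """Hypothetical stand-in: run `func` on a daemon thread, carrying over contextvars."""
    ctx = contextvars.copy_context()  # capture tenant/request context for the thread
    thread = threading.Thread(
        target=lambda: ctx.run(func, *args, **kwargs),
        daemon=True,
    )
    thread.start()
    return thread
```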
Contributor: The producer thread continues running even if the client disconnects, which is the intended fix for the refresh bug. However, if the consumer (`stream_from_buffer`) exits early, the producer keeps writing into a buffer that is never drained. Consider tracking the consumer's state and providing a way for the producer to check if it should stop early, or use a cancellation token pattern.

Contributor (Author): We should only stop if the user presses the stop button. There is another mechanism in place that takes care of this.
+
+    return StreamingResponse(stream_from_buffer(), media_type="text/event-stream")


 @router.put("/set-message-as-latest")
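Putting the pattern together outside the Onyx codebase: a self-contained sketch, under assumptions, of the same producer/consumer bridge in a bare FastAPI app. A background thread produces JSON lines into an `asyncio.Queue` via `loop.call_soon_threadsafe`, an async generator drains the queue for `StreamingResponse`, and a `None` sentinel ends the stream. The route, names, and fake work are invented; the real handler streams `handle_stream_message_objects` output instead.

```python
import asyncio
import json
import threading
import time
from collections.abc import AsyncGenerator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


@app.post("/demo-stream")  # hypothetical route for illustration only
async def demo_stream() -> StreamingResponse:
    buffer: asyncio.Queue[str | None] = asyncio.Queue()
    loop = asyncio.get_event_loop()

    def producer() -> None:
        try:
            for i in range(5):  # stand-in for handle_stream_message_objects(...)
                time.sleep(0.2)  # pretend to do model/tool work
                line = json.dumps({"chunk": i}) + "\n"
                # Hand the result back to the event loop thread-safely
                loop.call_soon_threadsafe(buffer.put_nowait, line)
        finally:
            # Sentinel: tell the consumer the stream is over
            loop.call_soon_threadsafe(buffer.put_nowait, None)

    async def stream_from_buffer() -> AsyncGenerator[str, None]:
        while (item := await buffer.get()) is not None:
            yield item

    # The producer keeps running even if the client disconnects mid-stream
    threading.Thread(target=producer, daemon=True).start()
    return StreamingResponse(stream_from_buffer(), media_type="text/event-stream")
```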