LCORE-1262: OpenResponses endpoint #1301
asimurka wants to merge 2 commits into lightspeed-core:main
Conversation
Walkthrough: This pull request introduces a new OpenResponses (`/v1/responses`) endpoint.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ResponsesHandler as Responses<br/>Handler
    participant ConfigResolver as Config &<br/>Context Resolver
    participant ShieldMod as Shield<br/>Moderation
    participant RAGContext as RAG<br/>Context Builder
    participant LlamaStack as Llama Stack<br/>API
    participant Database as Database<br/>(Persistence)
    Client->>ResponsesHandler: POST /v1/responses
    ResponsesHandler->>ConfigResolver: validate config, resolve model
    ConfigResolver->>ConfigResolver: check model config
    ResponsesHandler->>ShieldMod: run_shield_moderation
    ShieldMod->>ShieldMod: evaluate safety rules
    alt Moderation Blocked
        ShieldMod-->>ResponsesHandler: ShieldModerationResult (blocked)
        ResponsesHandler->>Database: persist blocked turn
        ResponsesHandler-->>Client: Response with error status
    else Moderation Passed
        ShieldMod-->>ResponsesHandler: ShieldModerationResult (passed)
        ResponsesHandler->>RAGContext: build inline RAG context
        RAGContext-->>ResponsesHandler: context text
        ResponsesHandler->>LlamaStack: request response (streaming or non-streaming)
        alt Streaming Path
            loop Process Stream
                LlamaStack-->>ResponsesHandler: response chunks
                ResponsesHandler->>ResponsesHandler: track quotas, aggregate usage
                ResponsesHandler-->>Client: SSE chunks
            end
        else Non-Streaming Path
            LlamaStack-->>ResponsesHandler: full response
            ResponsesHandler->>ResponsesHandler: build response summary
        end
        ResponsesHandler->>Database: persist conversation turn
        ResponsesHandler-->>Client: ResponsesResponse (final)
    end
```
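The blocked vs. passed branching in the diagram can be condensed into a short sketch; all names here (`run_shield_moderation`, `handle_responses_request`) are illustrative stand-ins, not the PR's actual code:

```python
import asyncio
from dataclasses import dataclass

# Hypothetical stand-ins for the PR's shield moderation and response handling.
@dataclass
class ModerationResult:
    decision: str          # "passed" or "blocked"
    message: str = ""

async def run_shield_moderation(text: str) -> ModerationResult:
    # Toy rule standing in for real shield evaluation.
    if "forbidden" in text:
        return ModerationResult("blocked", "I cannot answer that.")
    return ModerationResult("passed")

async def handle_responses_request(text: str) -> dict:
    """Mirror the diagram: moderate first, then either refuse or answer."""
    moderation = await run_shield_moderation(text)
    if moderation.decision == "blocked":
        # Persist the refused turn (elided), then return an error response.
        return {"status": "error", "output": moderation.message}
    # Passed: build RAG context and call the model (both stubbed here).
    context = f"context for: {text}"
    return {"status": "completed", "output": f"answer using {context}"}

assert asyncio.run(handle_responses_request("a forbidden topic"))["status"] == "error"
assert asyncio.run(handle_responses_request("hello"))["status"] == "completed"
```

The point of the shape is that the refusal path persists and returns before any RAG or model call is made.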
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks: ✅ 3 passed.
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
src/models/responses.py (1)
986-993: ⚠️ Potential issue | 🟡 Minor
Example includes `forked` field not declared in model.

The `ConversationResponse` example at line 989 includes `"forked": False`, but the `ConversationResponse` class (lines 936-993) doesn't declare a `forked` field. This will cause the example to be inconsistent with the actual model schema. Either add the field to the model or remove it from the example:
Option 1: Add field to model (if forked is part of the API)
```diff
 class ConversationResponse(AbstractSuccessfulResponse):
     """Model representing a response for retrieving a conversation.

     Attributes:
         conversation_id: The conversation ID (UUID).
         chat_history: The chat history as a list of conversation turns.
+        forked: Whether this conversation was forked from another.
     """

     conversation_id: str = Field(
         ...,
         description="Conversation ID (UUID)",
         examples=["c5260aec-4d82-4370-9fdf-05cf908b3f16"],
     )
     chat_history: list[ConversationTurn] = Field(
         ...,
         description="The simplified chat history as a list of conversation turns",
         # ... examples ...
     )
+
+    forked: bool = Field(
+        False,
+        description="Whether this conversation was forked from another",
+    )
```

Option 2: Remove from example (if not part of API)
```diff
     "chat_history": [
         {
             "messages": [
                 {"content": "Hello", "type": "user"},
                 {"content": "Hi there!", "type": "assistant"},
             ],
             "tool_calls": [],
             "tool_results": [],
             "provider": "openai",
             "model": "gpt-4o-mini",
             "started_at": "2024-01-01T00:01:00Z",
             "completed_at": "2024-01-01T00:01:05Z",
         }
     ],
-    "forked": False,
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/responses.py` around lines 986 - 993, The example for ConversationResponse contains a "forked" field that is not declared on the ConversationResponse model; either add a boolean field named forked to the ConversationResponse class (type bool, sensible default like False, update schema/serialization and tests) if the API actually returns it, or remove "forked" from the example JSON if the API does not include it—locate the ConversationResponse class and the example block and apply the corresponding change so model and example stay consistent.

tests/unit/app/endpoints/test_streaming_query.py (1)
334-366: ⚠️ Potential issue | 🟡 Minor
Assert that the moderation decision is passed into `build_rag_context`.

`build_rag_context()` gained a new positional parameter, but these tests replace it with a permissive `AsyncMock` and never verify the call arguments. A regression back to the old signature, or forwarding the wrong decision, would still pass here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/endpoints/test_streaming_query.py` around lines 334 - 366, The test replaces build_rag_context with a permissive AsyncMock but never verifies it received the moderation decision; update the test to capture and assert that build_rag_context was called with the moderation result returned by run_shield_moderation (e.g., the ShieldModerationPassed instance) as the new positional argument. Specifically, keep the AsyncMock replacement for app.endpoints.streaming_query.build_rag_context, await the endpoint invocation, then assert mock_build_rag_context.assert_awaited() / assert_called_with(...) (or inspect mock_build_rag_context.await_args) includes the moderation decision object returned by the patched run_shield_moderation so the test fails if the decision is not forwarded.

tests/unit/app/test_routers.py (1)
111-133: ⚠️ Potential issue | 🟡 Minor
Include `rags.router` in the presence assertions.

The expected count was bumped to 21, but this test still never checks that `rags.router` is actually registered. A missing RAG router could be masked by a duplicate registration elsewhere and this test would still pass.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/test_routers.py` around lines 111 - 133, The test is missing an assertion for rags.router; update the presence checks in test_routers.py by adding an assertion that rags.router is in app.get_routers() (alongside the existing assertions like conversations_v2.router and responses.router) so the RAG router is explicitly verified.

tests/unit/utils/test_vector_search.py (1)
452-500: ⚠️ Potential issue | 🟡 Minor
Add a test for the new blocked-moderation fast path.

These updates only exercise the "passed" path. The actual behavior change here is that `build_rag_context()` should return an empty context and skip vector lookups when moderation is blocked, so this file should cover that branch explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/utils/test_vector_search.py` around lines 452 - 500, Add a new async test in TestBuildRagContext that calls build_rag_context with moderation status "blocked" (instead of "passed") and asserts the returned context has empty context_text, rag_chunks, and referenced_documents; also verify the vector client was not used by asserting client_mock.vector_io.query (or client_mock.vector_io.query.assert_not_awaited()) was not awaited/called. Reference build_rag_context and client_mock.vector_io.query to locate the behavior under test.

src/app/endpoints/query.py (1)
198-221: ⚠️ Potential issue | 🟠 Major
Don't generate topic summaries for refused turns.

`retrieve_response()` can now short-circuit with a blocked moderation result, but this handler still falls through to `get_topic_summary()` for new conversations. That creates a second model call for a turn that was intentionally refused, and can resend the blocked prompt text when the refusal was caused by the prompt itself.

Possible fix

```diff
-if not user_conversation and query_request.generate_topic_summary:
+if (
+    moderation_result.decision == "passed"
+    and not user_conversation
+    and query_request.generate_topic_summary
+):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 198 - 221, The handler currently calls get_topic_summary for new conversations even when retrieve_response short-circuited due to moderation; update the logic to skip topic summary generation whenever moderation_result.decision is not "passed" (or when it indicates a refusal) by checking moderation_result before invoking get_topic_summary (use the existing moderation_result variable returned from retrieve_response); only call get_topic_summary when not user_conversation, query_request.generate_topic_summary is true, and moderation_result.decision == "passed".

src/utils/shields.py (1)
137-164: ⚠️ Potential issue | 🟠 Major
Translate `models.list()` failures the same way as the other shield API calls.

`get_shields_for_request()` now normalizes `client.shields.list()` failures, but `available_models = {model.id for model in await client.models.list()}` can still raise straight out of this function. If Llama Stack is unavailable during model resolution, this path will bypass the structured 503/status handling you just added for moderation calls.

Based on learnings: Applies to `src/**/*.py`: Handle `APIConnectionError` from Llama Stack in API operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 137 - 164, The call to client.models.list() that builds available_models can raise APIConnectionError and currently bypasses your structured 503 handling; wrap the await client.models.list() call (used to compute available_models in the shields resolution path after get_shields_for_request) in a try/except that catches APIConnectionError and converts it to a ServiceUnavailableResponse (same pattern as the moderation create except block), then raise HTTPException(**response.model_dump()) from the caught exception; reference ServiceUnavailableResponse, client.models.list(), available_models, and get_shields_for_request when locating where to add the try/except.
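The requested pattern, translating a connection failure from `client.models.list()` into the structured 503, can be sketched with stand-in exception and response types (the real `APIConnectionError` comes from the Llama Stack client, and the real handler raises FastAPI's `HTTPException`; everything here is illustrative):

```python
import asyncio

class APIConnectionError(Exception):       # stand-in for the client library's error
    pass

class HTTPException(Exception):            # stand-in for fastapi.HTTPException
    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code

class FailingModelsAPI:
    async def list(self):
        raise APIConnectionError("llama stack unreachable")

class Client:
    models = FailingModelsAPI()

async def resolve_available_models(client) -> set[str]:
    try:
        models = await client.models.list()
    except APIConnectionError as exc:
        # Same pattern as the moderation call: surface a structured 503.
        raise HTTPException(503, f"Llama Stack unavailable: {exc}") from exc
    return {m.id for m in models}

try:
    asyncio.run(resolve_available_models(Client()))
    status = None
except HTTPException as exc:
    status = exc.status_code
assert status == 503
```

The design point is that no raw client exception escapes the shields resolution path.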
🧹 Nitpick comments (6)
tests/unit/utils/test_query.py (1)
410-411: Tests updated correctly for new parameter.

The test calls now include the required `response_id` parameter. The comment at line 507 accurately reflects that `UserTurn` now includes `response_id`.

Consider adding test cases for the moderation ID special case where `last_response_id` should not be updated when `response_id` starts with "modr".

🧪 Suggested additional test case for moderation ID handling

```python
def test_persist_does_not_update_last_response_id_for_moderation(
    self, mocker: MockerFixture
) -> None:
    """Test that moderation IDs don't update last_response_id."""
    existing_conv = UserConversation(
        id="conv1",
        user_id="user1",
        last_used_model="old_model",
        last_used_provider="old_provider",
        message_count=5,
        last_response_id="resp_previous",
    )
    mock_session = mocker.Mock()
    mock_conv_query = mocker.Mock()
    mock_conv_query.filter_by.return_value.first.return_value = existing_conv
    # ... setup similar to test_update_existing_conversation ...
    persist_user_conversation_details(
        user_id="user1",
        conversation_id="conv1",
        started_at="2024-01-01T00:00:00Z",
        completed_at="2024-01-01T00:00:05Z",
        model_id="new_model",
        provider_id="new_provider",
        topic_summary=None,
        response_id="modr_blocked_123",  # Moderation ID
    )
    # last_response_id should remain unchanged
    assert existing_conv.last_response_id == "resp_previous"
```

Also applies to: 458-459, 502-507
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/utils/test_query.py` around lines 410 - 411, Add a unit test in tests/unit/utils/test_query.py that verifies persist_user_conversation_details does not update UserConversation.last_response_id when the provided response_id begins with the moderation prefix (e.g., "modr"); locate the test pattern used in test_update_existing_conversation to mock an existing UserConversation (id, user_id, last_response_id set to a sentinel), call persist_user_conversation_details with response_id="modr_..." (or "modr_blocked_123") and assert the existing_conv.last_response_id remains the original value; reference the function persist_user_conversation_details, the UserConversation model, and the last_response_id field when adding this new test case.

src/utils/query.py (1)
40-47: Consider consolidating imports from the same module.

The `is_moderation_id` import (line 47) could be combined with the existing `normalize_conversation_id` import (line 40) from the same module.

♻️ Consolidate imports

```diff
-from utils.suid import normalize_conversation_id
+from utils.suid import is_moderation_id, normalize_conversation_id
 from utils.token_counter import TokenCounter
 from utils.transcripts import (
     create_transcript,
     create_transcript_metadata,
     store_transcript,
 )
-from utils.suid import is_moderation_id
 from utils.types import TurnSummary
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/query.py` around lines 40 - 47, The imports from utils.suid are duplicated; consolidate them by importing both normalize_conversation_id and is_moderation_id in a single from utils.suid import ... statement (replace the separate normalize_conversation_id and is_moderation_id import lines with one combined import) to remove the redundant import and keep imports tidy.

src/utils/suid.py (1)
108-113: Consider handling non-string input for consistency.

The existing `check_suid` function (lines 39-40) validates that input is a string before processing. This new function will raise `AttributeError` if passed `None`. While callers currently pass valid strings, adding a guard would be consistent with the module's defensive style.

🛡️ Optional: Add type guard for consistency

```diff
 def is_moderation_id(suid: str) -> bool:
     """Check if given string is a moderation ID.

     Returns True if the string starts with 'modr'.
     """
+    if not isinstance(suid, str):
+        return False
     return suid.startswith("modr")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/suid.py` around lines 108 - 113, The is_moderation_id(suid: str) function can raise AttributeError on non-string inputs; update it to guard against non-strings (consistent with check_suid) by returning False when suid is not a str (or by calling the existing check_suid(suid) first) and only then checking startswith("modr"), so callers receiving None or other types won't crash.

tests/e2e/features/responses.feature (1)
8-24: Assert the response contract, not just the status code.

Both scenarios currently pass even if `/v1/responses` returns the wrong payload shape, and the `"stream": true` case also passes if the endpoint stops streaming but still replies with HTTP 200. Please add at least one body assertion for the non-streaming path and a streaming-specific assertion (headers and/or event chunks) for the streaming path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/features/responses.feature` around lines 8 - 24, The tests only assert HTTP 200; update the feature assertions so the non-streaming scenario validates the response body shape (e.g., presence of output text/choices or a top-level "output" or "generation" field) and the streaming scenario asserts streaming behavior (e.g., Transfer-Encoding/Content-Type headers for SSE/chunked and at least one event/chunk containing a partial generation). Modify the step implementations used by "When I use \"responses\" to ask question with authorization header" and the "Then The status code of the response is 200" verification to add a body contract check for the non-streaming case and a header + chunk/event assertion for the streaming case so the tests fail if the payload shape or streaming behavior changes.

src/models/requests.py (1)
739-745: Use the shared moderation-ID helper here.

Hardcoding the "modr" prefix duplicates ID classification logic inside `utils.suid`. Calling the shared helper keeps this validator aligned if the moderation-ID format changes later.

Suggested fix

```diff
 @field_validator("previous_response_id")
 @classmethod
 def check_previous_response_id(cls, value: Optional[str]) -> Optional[str]:
     """Validate that previous_response_id does not start with 'modr'."""
-    if value is not None and value.startswith("modr"):
+    if value is not None and suid.is_moderation_id(value):
         raise ValueError("You cannot provide context by moderation response.")
     return value
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/requests.py` around lines 739 - 745, Replace the hardcoded prefix check in the `@field_validator` for previous_response_id (check_previous_response_id) with the shared moderation-ID helper from utils.suid: call the helper (e.g., utils.suid.is_moderation_suid(value) or the equivalent function provided) instead of value.startswith("modr"), raise the same ValueError when the helper returns true, and add the necessary import for that helper at the top of the file.

src/app/endpoints/responses.py (1)
624-624: Remove unused assignment.

`completed_at` assigned here is always overwritten at line 697, making this a dead assignment.

🧹 Proposed fix

```diff
 if moderation_result.decision == "blocked":
-    completed_at = datetime.now(UTC)
     output_text = moderation_result.message
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` at line 624, Remove the dead assignment to completed_at in src/app/endpoints/responses.py: the early "completed_at = datetime.now(UTC)" is always overwritten later (the other completed_at assignment near where the response is finalized), so delete the initial assignment to avoid the unused variable; keep the later, correct completed_at assignment and any uses that rely on it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/responses.md`:
- Around line 522-524: The docs currently state that providing shield_ids as an
empty list returns 422, but the implementation treats an empty list as "run no
shields"; update the documentation for the `shield_ids` request parameter to
reflect that behavior (i.e., if omitted, all configured shields are used; if
provided and empty, no shields are run) and ensure `generate_topic_summary` and
`solr` descriptions remain accurate; reference the `shield_ids` parameter name
to locate the doc string to change.
- Line 90: The documentation is inconsistent: the request/response tables
declare the reasoning field as a string while the dedicated "reasoning" section
defines it as an object with an effort property; decide on a single canonical
shape and make all mentions match it — either change the tables' `reasoning`
type to `object` (with a short inline note like `{ effort: string }`) and update
examples/schemas to show `{ "reasoning": { "effort": "low|medium|high" } }`, or
simplify the dedicated section to document `reasoning` as a `string` and remove
the object/effort example; update all occurrences of `reasoning` (including the
detailed section and examples) so the symbol `reasoning` has the same type
everywhere.
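The documented `shield_ids` semantics above (omitted means all configured shields; explicitly empty means none) reduce to a small resolver. A minimal sketch with hypothetical names, not the PR's actual implementation:

```python
from typing import Optional

def resolve_shields(shield_ids: Optional[list[str]], configured: list[str]) -> list[str]:
    """Resolve which shields to run for a request.

    None -> caller omitted the field: run every configured shield.
    []   -> caller explicitly asked for no shields: run none (not a 422).
    list -> run exactly the requested shields.
    """
    if shield_ids is None:
        return list(configured)
    return shield_ids

configured = ["llama_guard", "prompt_guard"]
assert resolve_shields(None, configured) == ["llama_guard", "prompt_guard"]
assert resolve_shields([], configured) == []
assert resolve_shields(["llama_guard"], configured) == ["llama_guard"]
```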
In `@src/app/endpoints/query.py`:
- Around line 298-302: The current code logs the full response payload at error
level (see the response variable cast to OpenAIResponseObject and the call to
response.model_dump()), which misclassifies normal requests as errors; change
the log to logger.debug and log only minimal identifiers (e.g., response id,
status, or request id) instead of the entire model_dump, and remove or downgrade
the logger.error("Response: %s", response.model_dump()) call so only concise
diagnostic IDs are emitted at debug level.
In `@src/app/endpoints/responses.py`:
- Line 535: The comment contains a typo: replace "Explicitely append the turn to
conversation if context passed by previous response" with "Explicitly append the
turn to conversation if context passed by previous response" (update the comment
string wherever "Explicitely" appears, e.g., the comment near the logic that
appends a turn to the conversation/context in responses.py).
- Around line 558-566: The docstring Args block lists parameters that don't
match the function signature: remove the nonexistent parameter
moderation_result, replace the incorrect user_id entry with the actual auth
parameter, and ensure the Args list and short descriptions match the real
parameters (generator, turn_summary, client which is an AsyncLlamaStackClient,
auth, input_text, started_at, api_params which is a ResponsesApiParams); update
each description to correspond to those symbols so the docstring accurately
reflects the function signature.
- Around line 609-616: Update the Args section of the function docstring where
it currently lists api_params and user_id: replace api_params with request and
user_id with auth so the docstring matches the actual function signature;
specifically edit the docstring entries that mention "api_params" and "user_id"
in the responses endpoint (the Args block in the function around the
AsyncLlamaStackClient usage) to read "request" and "auth" respectively, and
ensure the descriptions still correspond to the request object and authenticated
auth parameter.
- Around line 269-271: The docstring in the function inside
src/app/endpoints/responses.py contains stale Args entries for
forked_from_response_id and prev_valid_response_id that are not present in the
function signature; update the function's docstring (remove those two parameters
or replace them with the actual parameters used by the function) so the Args
section matches the real signature (search for the function definition in
responses.py and edit its docstring accordingly).
- Line 655: Update the inline comment that currently reads "# Explicitely append
the turn to conversation if context passed by previous response" to correct the
typo so it reads "# Explicitly append the turn to conversation if context passed
by previous response"; locate and change this comment in the responses.py area
where the conversation turn is appended (search for the existing comment text
near the code that appends the turn to the conversation).
In `@src/app/endpoints/streaming_query.py`:
- Around line 303-317: The blocked-moderation branch returns a normal generator
(shield_violation_generator) so generate_response() treats it as a successful
completed stream and later calls get_topic_summary(), causing an extra model
call; change the behavior in the branch where context.moderation_result.decision
== "blocked" (and where shield_violation_generator is returned) to instead
signal a non-success stream to the caller—for example by raising or returning
the same error/exception path generate_response() uses for failures or by
setting the stream result flag that generate_response() checks—so that
generate_response() does not execute post-stream success logic (including
get_topic_summary()); update code paths around generate_response(),
shield_violation_generator, and any stream success flag or exception used to
mark failed/blocked streams accordingly.
- Around line 259-264: The merge of inline RAG docs happens too early: because
retrieve_response_generator() returns a lazy generator,
turn_summary.referenced_documents is still empty when you call
deduplicate_referenced_documents with inline_rag_context.referenced_documents,
so those inline docs get overwritten later by response_generator(); move the
merge to after the generator has been consumed (e.g., after response_generator()
completes or inside the response.completed handling) so you combine
inline_rag_context.referenced_documents with the final
turn_summary.referenced_documents and then call
deduplicate_referenced_documents; update the logic around
retrieve_response_generator(), response_generator(), and response.completed to
perform the merge at that later point.
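One way to signal a non-success stream, as the blocked-moderation comment above suggests, is a sentinel exception raised after the refusal chunk so the consumer skips its success-only post-processing. A standalone sketch with illustrative names (`StreamBlocked` is not a real symbol in the PR):

```python
import asyncio

class StreamBlocked(Exception):
    """Signal that the stream was a moderation refusal, not a successful turn."""

async def shield_violation_generator(message: str):
    # Emit the refusal as a normal SSE-style chunk...
    yield f"data: {message}\n\n"
    # ...then signal failure so the caller skips success-only logic.
    raise StreamBlocked(message)

async def generate_response():
    chunks = []
    topic_summary_called = False
    try:
        async for chunk in shield_violation_generator("I cannot answer that."):
            chunks.append(chunk)
        topic_summary_called = True   # success-only step (e.g. get_topic_summary)
    except StreamBlocked:
        pass                          # blocked: no extra model call
    return chunks, topic_summary_called

chunks, summarized = asyncio.run(generate_response())
assert chunks == ["data: I cannot answer that.\n\n"]
assert summarized is False
```

The refusal chunk still reaches the client, but the caller can no longer mistake the stream for a completed turn.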
In `@src/models/requests.py`:
- Around line 694-695: The ResponsesRequest Pydantic model currently sets
model_config = {"extra": "ignore"}, which silently drops unknown request fields;
change this to {"extra": "forbid"} on the ResponsesRequest class so Pydantic
returns a 422 for unexpected keys, update any tests or callers that relied on
ignored extras, and run the test suite to ensure no callers depend on the
previous ignore behavior.
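The `extra="forbid"` behavior is easy to demonstrate in isolation. This sketch assumes Pydantic v2 and uses a stand-in model, not the PR's actual `ResponsesRequest`:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class ResponsesRequestSketch(BaseModel):
    """Illustrative stand-in for the PR's ResponsesRequest model."""
    model_config = ConfigDict(extra="forbid")  # unknown keys now fail validation
    input: str

# Known fields validate normally.
assert ResponsesRequestSketch(input="hi").input == "hi"

# An unexpected key is rejected instead of silently dropped; FastAPI
# surfaces this ValidationError to the client as a 422.
raised = False
try:
    ResponsesRequestSketch(input="hi", temperture=0.2)  # misspelled field
except ValidationError:
    raised = True
assert raised
```

With the previous `extra="ignore"`, the misspelled field would be dropped silently and the client would never learn their parameter had no effect.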
In `@src/utils/conversations.py`:
- Around line 497-507: Wrap the Llama Stack pagination in
get_all_conversation_items with a try/except that catches APIConnectionError and
APIStatusError around both the initial await of paginator (first_page) and the
subsequent page.get_next_page() loop; when caught, log the error and convert it
to the project’s standard HTTP response (e.g., raise FastAPI HTTPException with
an appropriate 5xx status and a concise message including the original error),
and ensure the required exception classes are imported so raw client exceptions
are not leaked from paginator/first_page/page.get_next_page.
- Around line 473-481: The code in append_turn_items_to_conversation swallows
upstream APIStatusError and always returns a generic 500; instead, preserve the
upstream status and message: in the APIStatusError except block (alongside the
existing APIConnectionError handling), read the status code from the exception
(e.g., e.status_code or e.response.status_code) and the error text/details
(e.g., str(e) or e.response.text) and raise HTTPException(status_code=<upstream
status>, detail=<upstream detail>) from e rather than using
InternalServerErrorResponse.generic(); keep the APIConnectionError branch as-is.
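Both conversations.py fixes above, translating connection failures to a 503 and preserving the upstream status from `APIStatusError`, can be sketched together with stand-in types (the real exceptions come from the Llama Stack client, and the real code raises FastAPI's `HTTPException`; the pagination classes here are toys):

```python
import asyncio

class APIConnectionError(Exception):
    """Stand-in for the client's connection error."""

class APIStatusError(Exception):
    """Stand-in for the client's status error; carries the upstream status."""
    def __init__(self, status_code: int, message: str):
        super().__init__(message)
        self.status_code = status_code

class HTTPException(Exception):
    """Stand-in for fastapi.HTTPException."""
    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code

class Page:
    def __init__(self, items, next_page=None):
        self.items = items
        self._next = next_page
    def has_next_page(self) -> bool:
        return self._next is not None
    async def get_next_page(self):
        if isinstance(self._next, Exception):
            raise self._next
        return self._next

async def get_all_conversation_items(first_page):
    """Collect all pages, translating client errors instead of leaking them."""
    items = []
    try:
        page = await first_page
        while True:
            items.extend(page.items)
            if not page.has_next_page():
                break
            page = await page.get_next_page()
    except APIConnectionError as exc:
        raise HTTPException(503, f"Llama Stack unavailable: {exc}") from exc
    except APIStatusError as exc:
        # Preserve the upstream status instead of a generic 500.
        raise HTTPException(exc.status_code, str(exc)) from exc
    return items

async def ok_pages():
    return Page([1, 2], Page([3]))

assert asyncio.run(get_all_conversation_items(ok_pages())) == [1, 2, 3]

async def failing_pages():
    return Page([1], Page([2], APIStatusError(404, "conversation not found")))

try:
    asyncio.run(get_all_conversation_items(failing_pages()))
    status = None
except HTTPException as exc:
    status = exc.status_code
assert status == 404  # upstream status preserved
```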
In `@src/utils/endpoints.py`:
- Around line 305-325: Rename the misspelled function check_turn_existance to
check_turn_existence (and update its docstring) and update every call/reference
to that symbol (e.g., places that invoke check_turn_existance) to the new name;
also update the logger message string that contains "existance" to "existence"
and ensure any exported API or tests that reference the old name are adjusted
accordingly (search for check_turn_existance to find all locations to change).
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 543-555: Remove the test-side workaround that patches
ResponsesResponse.model_validate (the _validate_with_model helper and the
mocker.patch.object call) so the test exercises the real validation; instead
call ResponsesResponse.model_validate on the handler's produced payload and
assert the payload includes a real "model" value (and otherwise let validation
fail), ensuring the test fails until the handler itself sets the model field.
Keep references to ResponsesResponse and model_validate when updating the test
and delete the _validate_with_model/mocker.patch.object usage.
In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 864-877: The test does not verify that the blocked path actually
persists the refusal turn; after calling retrieve_response_generator ensure the
patched append_turn_items_to_conversation AsyncMock was awaited with the refusal
data from mock_context.moderation_result.refusal_response (and the expected
conversation/context identifiers) so the blocked-flow persistence is exercised.
Locate the AsyncMock patch for
app.endpoints.streaming_query.append_turn_items_to_conversation and add an
assertion like: append_turn_items_to_conversation.assert_awaited_once_with(...
refusal_response ..., mock_context or conversation id) referencing
retrieve_response_generator, mock_context.moderation_result.refusal_response,
and append_turn_items_to_conversation.
---
Outside diff comments:
In `@src/app/endpoints/query.py`:
- Around line 198-221: The handler currently calls get_topic_summary for new
conversations even when retrieve_response short-circuited due to moderation;
update the logic to skip topic summary generation whenever
moderation_result.decision is not "passed" (or when it indicates a refusal) by
checking moderation_result before invoking get_topic_summary (use the existing
moderation_result variable returned from retrieve_response); only call
get_topic_summary when not user_conversation,
query_request.generate_topic_summary is true, and moderation_result.decision ==
"passed".
In `@src/models/responses.py`:
- Around line 986-993: The example for ConversationResponse contains a "forked"
field that is not declared on the ConversationResponse model; either add a
boolean field named forked to the ConversationResponse class (type bool,
sensible default like False, update schema/serialization and tests) if the API
actually returns it, or remove "forked" from the example JSON if the API does
not include it—locate the ConversationResponse class and the example block and
apply the corresponding change so model and example stay consistent.
In `@src/utils/shields.py`:
- Around line 137-164: The call to client.models.list() that builds
available_models can raise APIConnectionError and currently bypasses your
structured 503 handling; wrap the await client.models.list() call (used to
compute available_models in the shields resolution path after
get_shields_for_request) in a try/except that catches APIConnectionError and
converts it to a ServiceUnavailableResponse (same pattern as the moderation
create except block), then raise HTTPException(**response.model_dump()) from the
caught exception; reference ServiceUnavailableResponse, client.models.list(),
available_models, and get_shields_for_request when locating where to add the
try/except.
In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 334-366: The test replaces build_rag_context with a permissive
AsyncMock but never verifies it received the moderation decision; update the
test to capture and assert that build_rag_context was called with the moderation
result returned by run_shield_moderation (e.g., the ShieldModerationPassed
instance) as the new positional argument. Specifically, keep the AsyncMock
replacement for app.endpoints.streaming_query.build_rag_context, await the
endpoint invocation, then assert mock_build_rag_context.assert_awaited() /
assert_called_with(...) (or inspect mock_build_rag_context.await_args) includes
the moderation decision object returned by the patched run_shield_moderation so
the test fails if the decision is not forwarded.
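The assertion style this comment asks for, pinning down what an `AsyncMock` was awaited with, looks like this in isolation (names are illustrative, not the repo's fixtures):

```python
import asyncio
from unittest.mock import AsyncMock

# Hypothetical stand-ins: the real objects come from the repo's fixtures.
moderation_decision = object()          # e.g. a ShieldModerationPassed instance
build_rag_context = AsyncMock(return_value="context text")

async def endpoint_under_test() -> None:
    # The code under test is expected to forward the decision positionally.
    await build_rag_context("user query", moderation_decision)

asyncio.run(endpoint_under_test())

# A permissive AsyncMock passes silently; these assertions pin down the contract.
build_rag_context.assert_awaited_once()
args, _kwargs = build_rag_context.await_args
assert args[1] is moderation_decision  # fails if the decision is not forwarded
```

Inspecting `await_args` rather than only `assert_awaited()` is what makes the test fail when the signature regresses.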
In `@tests/unit/app/test_routers.py`:
- Around line 111-133: The test is missing an assertion for rags.router; update
the presence checks in test_routers.py by adding an assertion that rags.router
is in app.get_routers() (alongside the existing assertions like
conversations_v2.router and responses.router) so the RAG router is explicitly
verified.
In `@tests/unit/utils/test_vector_search.py`:
- Around line 452-500: Add a new async test in TestBuildRagContext that calls
build_rag_context with moderation status "blocked" (instead of "passed") and
asserts the returned context has empty context_text, rag_chunks, and
referenced_documents; also verify the vector client was not used by asserting
client_mock.vector_io.query (or
client_mock.vector_io.query.assert_not_awaited()) was not awaited/called.
Reference build_rag_context and client_mock.vector_io.query to locate the
behavior under test.
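A self-contained version of the suggested blocked-path test, with a toy `build_rag_context` that implements the described contract (empty context and no vector query when blocked); the repo's real signatures may differ:

```python
import asyncio
from dataclasses import dataclass, field
from unittest.mock import AsyncMock, Mock

@dataclass
class RagContext:
    context_text: str = ""
    rag_chunks: list = field(default_factory=list)
    referenced_documents: list = field(default_factory=list)

async def build_rag_context(client, query: str, moderation_status: str) -> RagContext:
    # Fast path: blocked moderation skips vector lookups entirely.
    if moderation_status == "blocked":
        return RagContext()
    chunks = await client.vector_io.query(query)
    return RagContext(context_text=" ".join(chunks), rag_chunks=chunks)

client_mock = Mock()
client_mock.vector_io.query = AsyncMock(return_value=["chunk"])

ctx = asyncio.run(build_rag_context(client_mock, "q", "blocked"))
assert ctx.context_text == ""
assert ctx.rag_chunks == [] and ctx.referenced_documents == []
client_mock.vector_io.query.assert_not_awaited()  # vector store untouched
```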
---
Nitpick comments:
In `@src/app/endpoints/responses.py`:
- Line 624: Remove the dead assignment to completed_at in
src/app/endpoints/responses.py: the early "completed_at = datetime.now(UTC)" is
always overwritten later (the other completed_at assignment near where the
response is finalized), so delete the initial assignment to avoid the unused
variable; keep the later, correct completed_at assignment and any uses that rely
on it.
In `@src/models/requests.py`:
- Around line 739-745: Replace the hardcoded prefix check in the
`@field_validator` for previous_response_id (check_previous_response_id) with the
shared moderation-ID helper from utils.suid: call the helper (e.g.,
utils.suid.is_moderation_suid(value) or the equivalent function provided)
instead of value.startswith("modr"), raise the same ValueError when the helper
returns true, and add the necessary import for that helper at the top of the
file.
In `@src/utils/query.py`:
- Around line 40-47: The imports from utils.suid are duplicated; consolidate
them by importing both normalize_conversation_id and is_moderation_id in a
single from utils.suid import ... statement (replace the separate
normalize_conversation_id and is_moderation_id import lines with one combined
import) to remove the redundant import and keep imports tidy.
In `@src/utils/suid.py`:
- Around line 108-113: The is_moderation_id(suid: str) function can raise
AttributeError on non-string inputs; update it to guard against non-strings
(consistent with check_suid) by returning False when suid is not a str (or by
calling the existing check_suid(suid) first) and only then checking
startswith("modr"), so callers receiving None or other types won't crash.
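A minimal sketch of that guard, assuming the helper lives in utils.suid and the moderation prefix stays "modr":

```python
def is_moderation_id(suid: object) -> bool:
    """Return True only for string IDs carrying the moderation prefix."""
    if not isinstance(suid, str):
        # Consistent with check_suid: non-strings are never valid IDs.
        return False
    return suid.startswith("modr")

# Callers passing None or other types no longer crash:
assert is_moderation_id("modr_blocked_123") is True
assert is_moderation_id("resp_abc") is False
assert is_moderation_id(None) is False
```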
In `@tests/e2e/features/responses.feature`:
- Around line 8-24: The tests only assert HTTP 200; update the feature
assertions so the non-streaming scenario validates the response body shape
(e.g., presence of output text/choices or a top-level "output" or "generation"
field) and the streaming scenario asserts streaming behavior (e.g.,
Transfer-Encoding/Content-Type headers for SSE/chunked and at least one
event/chunk containing a partial generation). Modify the step implementations
used by "When I use \"responses\" to ask question with authorization header" and
the "Then The status code of the response is 200" verification to add a body
contract check for the non-streaming case and a header + chunk/event assertion
for the streaming case so the tests fail if the payload shape or streaming
behavior changes.
In `@tests/unit/utils/test_query.py`:
- Around line 410-411: Add a unit test in tests/unit/utils/test_query.py that
verifies persist_user_conversation_details does not update
UserConversation.last_response_id when the provided response_id begins with the
moderation prefix (e.g., "modr"); locate the test pattern used in
test_update_existing_conversation to mock an existing UserConversation (id,
user_id, last_response_id set to a sentinel), call
persist_user_conversation_details with response_id="modr_..." (or
"modr_blocked_123"), and assert the existing_conv.last_response_id remains the
original value; reference the function persist_user_conversation_details, the
UserConversation model, and the last_response_id field when adding this new test
case.
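The behavior such a test would pin down could look roughly like this sketch; `UserConversation` below is a plain stand-in class (the real one is an ORM model) and `update_last_response_id` is a hypothetical name for the guard inside persist_user_conversation_details:

```python
class UserConversation:
    # Stand-in for the ORM model's relevant field.
    def __init__(self, last_response_id: str):
        self.last_response_id = last_response_id

def update_last_response_id(conv: UserConversation, response_id: str) -> None:
    # Moderation IDs must not overwrite the pointer to the last real response.
    if response_id.startswith("modr"):
        return
    conv.last_response_id = response_id

conv = UserConversation("resp_original")
update_last_response_id(conv, "modr_blocked_123")
assert conv.last_response_id == "resp_original"  # moderation ID ignored
update_last_response_id(conv, "resp_next")
assert conv.last_response_id == "resp_next"
```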
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d4e2a78b-c87f-4692-8aea-4b8ba8f7d4fd
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (35)
- docs/openapi.json
- docs/responses.md
- pyproject.toml
- src/app/endpoints/conversations_v1.py
- src/app/endpoints/query.py
- src/app/endpoints/responses.py
- src/app/endpoints/streaming_query.py
- src/app/routers.py
- src/constants.py
- src/models/context.py
- src/models/database/conversations.py
- src/models/requests.py
- src/models/responses.py
- src/utils/conversations.py
- src/utils/endpoints.py
- src/utils/query.py
- src/utils/responses.py
- src/utils/shields.py
- src/utils/suid.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/e2e/features/info.feature
- tests/e2e/features/responses.feature
- tests/e2e/test_list.txt
- tests/integration/test_openapi_json.py
- tests/unit/app/endpoints/test_conversations.py
- tests/unit/app/endpoints/test_query.py
- tests/unit/app/endpoints/test_responses.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/app/test_routers.py
- tests/unit/utils/test_conversations.py
- tests/unit/utils/test_query.py
- tests/unit/utils/test_responses.py
- tests/unit/utils/test_shields.py
- tests/unit/utils/test_vector_search.py
63592ab to f553d06
Actionable comments posted: 3
♻️ Duplicate comments (5)
src/utils/conversations.py (1)
479-481: ⚠️ Potential issue | 🟠 Major — Preserve upstream `APIStatusError` details here.
Both helpers collapse every Llama Stack status failure into a generic 500. That turns upstream 404/401/403 cases into misleading internal errors and makes callers/debugging substantially harder. Please forward the upstream status/message instead of always returning `InternalServerErrorResponse.generic()`.
Also applies to: 515-517
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/conversations.py` around lines 479 - 481, In the except APIStatusError blocks (the handlers that now call InternalServerErrorResponse.generic()), preserve and forward the upstream status and message from the caught exception variable (e) into the HTTPException instead of always returning a generic 500; replace the fixed InternalServerErrorResponse.generic() usage with a response that uses e.status_code (or e.status and/or e.status_code property available on APIStatusError) and e.message (or str(e)) as the body/description, and apply the same change to both occurrences around the catch at the APIStatusError handling near the code shown and the duplicated block at lines 515-517 so callers receive the original upstream HTTP status and message.
docs/responses.md (1)
106-106: ⚠️ Potential issue | 🟡 Minor — Spell out the `shield_ids=[]` contract.
Both sections describe the omitted case, but they still leave `shield_ids: []` ambiguous. Clients need an explicit note that an empty list disables shields rather than inheriting the configured defaults.
Also applies to: 523-523
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/responses.md` at line 106, Update the `shield_ids` documentation entry to explicitly state the difference between omitting the field and passing an empty array: clarify that omitting `shield_ids` will apply all configured shields (inherit defaults) while providing `shield_ids: []` disables all shields, and apply this exact clarification to both occurrences of the `shield_ids` table row so the contract is unambiguous; reference the `shield_ids` symbol in the docs and use wording like "omit to inherit all configured shields; explicitly set to [] to disable shields" for consistency.
src/app/endpoints/streaming_query.py (2)
254-258: ⚠️ Potential issue | 🟠 Major — Inline RAG chunks are still dropped on streaming requests.
This only threads `inline_rag_context.referenced_documents` into the generator. The BYOK/Solr chunks never make it into `turn_summary.rag_chunks`, so persisted streaming metadata diverges from the non-streaming path whenever inline RAG is used.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 254 - 258, The streaming path only passes inline_rag_context.referenced_documents into retrieve_response_generator so BYOK/Solr RAG chunks never make it into the returned turn_summary.rag_chunks; update the streaming flow so all RAG chunks are included: either pass a combined list (inline_rag_context.referenced_documents plus the BYOK/Solr chunk lists) into retrieve_response_generator (where generator, turn_summary are returned) or, immediately after the call to retrieve_response_generator, merge the BYOK/Solr chunks into turn_summary.rag_chunks. Refer to the symbols retrieve_response_generator, inline_rag_context.referenced_documents, and turn_summary.rag_chunks when making the change.
300-316: ⚠️ Potential issue | 🟠 Major — Blocked moderation still looks like a successful stream.
`shield_violation_generator()` finishes normally, so `generate_response()` sets `stream_completed=True` and runs the success-only tail afterwards. For new conversations that includes `get_topic_summary()`, which means blocked prompts can still trigger an extra model call.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 300 - 316, The blocked-moderation path currently returns shield_violation_generator which completes normally, causing generate_response to treat the stream as successful; change the flow so completion is treated as an aborted/failed stream: implement a dedicated exception (e.g., ModerationBlocked) and update shield_violation_generator to raise that exception after emitting the refusal payload (or otherwise yield then raise) so generate_response sees a non-success termination; reference shield_violation_generator, ModerationBlocked (new), and generate_response to locate where to adjust behavior and where the exception will be handled to skip success-only tail like get_topic_summary().
tests/unit/app/endpoints/test_streaming_query.py (1)
867-873: ⚠️ Potential issue | 🟡 Minor — Assert the blocked-turn append was actually awaited.
The test patches `append_turn_items_to_conversation()` but never checks the awaited call or the `refusal_response` payload. Capture the AsyncMock and use `assert_awaited_once_with(...)`, otherwise a regression that drops the persistence step or passes the wrong refusal item still passes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/endpoints/test_streaming_query.py` around lines 867 - 873, The test currently patches append_turn_items_to_conversation with an AsyncMock but never asserts it was awaited or that the refusal payload was passed; change the patch to capture the AsyncMock (e.g., mock_append = mocker.AsyncMock() and patch append_turn_items_to_conversation to new=mock_append) and after invoking the code assert the mock was awaited exactly once with the expected conversation identifier/context and the moderation refusal payload (use mock_moderation_result.refusal_response and the test's conversation/context object) via mock_append.assert_awaited_once_with(...).
🧹 Nitpick comments (1)
src/utils/endpoints.py (1)
245-253: Don’t return the parent `UserConversation` for a forked conversation.
This branch creates a brand-new conversation ID, but the returned `ResponsesConversationContext` still carries the source conversation row. That violates the context model’s “`None` for new ones” contract and leaves downstream consumers guessing whether `user_conversation` refers to the new thread or the parent. Return `None` here, or add a separate `source_conversation` field if the parent row is still needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/endpoints.py` around lines 245 - 253, The branch that forks a conversation returns a new ID from create_new_conversation but still sets user_conversation in the ResponsesConversationContext, violating the contract that new conversations must have user_conversation = None; update the return to set user_conversation to None (and keep generate_topic_summary logic using want_topic_summary) or alternatively add a new field source_conversation to ResponsesConversationContext and populate that with the original user_conversation while setting user_conversation to None; modify the instantiation of ResponsesConversationContext (the return block) accordingly so downstream code can unambiguously tell this is a new forked conversation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/query.py`:
- Around line 198-212: The post-response processing currently only gates the
inline-RAG merge on moderation_result.decision == "passed" but lets blocked
turns continue into the topic-summary/success-only tail (causing extra model
calls on refused content); change the handler so that all post-response work
after await retrieve_response(...) — including merging
inline_rag_context.rag_chunks and turn_summary.rag_chunks, combining
referenced_documents via deduplicate_referenced_documents, and any subsequent
topic-summary logic that uses turn_summary — is executed only when
moderation_result.decision == "passed" (i.e., wrap the entire post-response
block that touches inline_rag_context, turn_summary, tool_rag_chunks,
tool_rag_documents, and downstream topic-summary functions inside the same
if-check).
In `@src/models/requests.py`:
- Around line 739-745: The validator check_previous_response_id currently
rejects any previous_response_id starting with "modr", which breaks follow-ups
when responses are emitted with id=moderation_result.moderation_id (the code
that sets response.id to moderation_result.moderation_id). Fix this by removing
or relaxing the startswith("modr") rejection in check_previous_response_id so
"modr"-prefixed IDs are accepted, and ensure downstream conversation-state
resolution logic (where previous_response_id is consumed) maps moderation IDs to
the real conversation entries; alternatively, stop emitting
moderation_result.moderation_id as the public response.id and assign a different
stable response id instead (adjust the response construction that sets
response.id accordingly).
In `@src/utils/types.py`:
- Around line 277-278: The current unconditional result.pop("conversation",
None) removes the conversation for forked requests; update
resolve_response_context() so it only drops "conversation" when the
previous_response_id is a linear/tip continuation. Specifically, determine
whether the previous_response_id refers to the tip (e.g., use or expose the
resolved context flag that indicates a tip/linear continuation from
resolve_response_context or check a property like previous_response_is_tip) and
only call result.pop("conversation", None) in that case so forks keep the
conversation and downstream responses.create() can attach to the new
conversation.
---
Duplicate comments:
In `@docs/responses.md`:
- Line 106: Update the `shield_ids` documentation entry to explicitly state the
difference between omitting the field and passing an empty array: clarify that
omitting `shield_ids` will apply all configured shields (inherit defaults) while
providing `shield_ids: []` disables all shields, and apply this exact
clarification to both occurrences of the `shield_ids` table row so the contract
is unambiguous; reference the `shield_ids` symbol in the docs and use wording
like "omit to inherit all configured shields; explicitly set to [] to disable
shields" for consistency.
In `@src/app/endpoints/streaming_query.py`:
- Around line 254-258: The streaming path only passes
inline_rag_context.referenced_documents into retrieve_response_generator so
BYOK/Solr RAG chunks never make it into the returned turn_summary.rag_chunks;
update the streaming flow so all RAG chunks are included: either pass a combined
list (inline_rag_context.referenced_documents plus the BYOK/Solr chunk lists)
into retrieve_response_generator (where generator, turn_summary are returned)
or, immediately after the call to retrieve_response_generator, merge the
BYOK/Solr chunks into turn_summary.rag_chunks. Refer to the symbols
retrieve_response_generator, inline_rag_context.referenced_documents, and
turn_summary.rag_chunks when making the change.
- Around line 300-316: The blocked-moderation path currently returns
shield_violation_generator which completes normally, causing generate_response
to treat the stream as successful; change the flow so completion is treated as
an aborted/failed stream: implement a dedicated exception (e.g.,
ModerationBlocked) and update shield_violation_generator to raise that exception
after emitting the refusal payload (or otherwise yield then raise) so
generate_response sees a non-success termination; reference
shield_violation_generator, ModerationBlocked (new), and generate_response to
locate where to adjust behavior and where the exception will be handled to skip
success-only tail like get_topic_summary().
In `@src/utils/conversations.py`:
- Around line 479-481: In the except APIStatusError blocks (the handlers that
now call InternalServerErrorResponse.generic()), preserve and forward the
upstream status and message from the caught exception variable (e) into the
HTTPException instead of always returning a generic 500; replace the fixed
InternalServerErrorResponse.generic() usage with a response that uses
e.status_code (or e.status and/or e.status_code property available on
APIStatusError) and e.message (or str(e)) as the body/description, and apply the
same change to both occurrences around the catch at the APIStatusError handling
near the code shown and the duplicated block at lines 515-517 so callers receive
the original upstream HTTP status and message.
In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 867-873: The test currently patches
append_turn_items_to_conversation with an AsyncMock but never asserts it was
awaited or that the refusal payload was passed; change the patch to capture the
AsyncMock (e.g., mock_append = mocker.AsyncMock() and patch
append_turn_items_to_conversation to new=mock_append) and after invoking the
code assert the mock was awaited exactly once with the expected conversation
identifier/context and the moderation refusal payload (use
mock_moderation_result.refusal_response and the test's conversation/context
object) via mock_append.assert_awaited_once_with(...).
---
Nitpick comments:
In `@src/utils/endpoints.py`:
- Around line 245-253: The branch that forks a conversation returns a new ID
from create_new_conversation but still sets user_conversation in the
ResponsesConversationContext, violating the contract that new conversations must
have user_conversation = None; update the return to set user_conversation to
None (and keep generate_topic_summary logic using want_topic_summary) or
alternatively add a new field source_conversation to
ResponsesConversationContext and populate that with the original
user_conversation while setting user_conversation to None; modify the
instantiation of ResponsesConversationContext (the return block) accordingly so
downstream code can unambiguously tell this is a new forked conversation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ff8e977c-7ba5-44b0-b4ce-7e95796adfec
📒 Files selected for processing (13)
- docs/openapi.json
- docs/responses.md
- src/app/endpoints/query.py
- src/app/endpoints/responses.py
- src/app/endpoints/streaming_query.py
- src/models/requests.py
- src/models/responses.py
- src/utils/conversations.py
- src/utils/endpoints.py
- src/utils/types.py
- tests/unit/app/endpoints/test_conversations.py
- tests/unit/app/endpoints/test_responses.py
- tests/unit/app/endpoints/test_streaming_query.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/app/endpoints/responses.py
f553d06 to aa6c416
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/shields.py (1)
137-143: ⚠️ Potential issue | 🟠 Major — Return early when no shields are selected.
`get_shields_for_request()` can legitimately return `[]`, but `client.models.list()` still runs unconditionally. That means a request can still fail on model discovery even when moderation is effectively disabled, and any exception from `models.list()` bypasses the local error mapping.
💡 Minimal guard

```diff
 shields_to_run = await get_shields_for_request(client, shield_ids)
+if not shields_to_run:
+    return ShieldModerationPassed()
+
 available_models = {model.id for model in await client.models.list()}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 137 - 143, If get_shields_for_request(...) returns an empty list, avoid calling client.models.list() unnecessarily; add an early return/check after shields_to_run = await get_shields_for_request(client, shield_ids) (or skip the model discovery) so that when shields_to_run is empty you do not compute available_models or run client.models.list(), and only compute available_models and perform the provider_resource_id checks for shields when shields_to_run contains items (referencing shields_to_run, available_models, client.models.list, shield.provider_id and shield.provider_resource_id).
♻️ Duplicate comments (6)
src/utils/conversations.py (2)
468-481: ⚠️ Potential issue | 🟠 Major — Preserve upstream `APIStatusError` details when appending items.
`conversations.items.create()` can fail with non-500 statuses, and flattening every status error to `InternalServerErrorResponse.generic()` gives callers the wrong failure mode for missing or forbidden conversations. Bubble the upstream status/detail instead of hard-coding 500 here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/conversations.py` around lines 468 - 481, The current except block for APIStatusError in the conversations.items.create call masks upstream status and details by returning InternalServerErrorResponse.generic(); instead extract the actual status and message from the APIStatusError (e.g., use e.status_code or e.response.status_code and e.response.text / str(e) as available) and raise HTTPException with the preserved status_code and detail (and any relevant headers) so callers receive the original upstream error (keep the existing APIConnectionError handling unchanged).
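A self-contained sketch of the forwarding pattern the comment asks for; `APIStatusError` and `HTTPException` below are stand-ins for the llama-stack client exception and FastAPI's class, assumed to expose a status code and message:

```python
class APIStatusError(Exception):
    # Stand-in for the llama-stack client's exception type.
    def __init__(self, status_code: int, message: str):
        super().__init__(message)
        self.status_code = status_code
        self.message = message

class HTTPException(Exception):
    # Stand-in for fastapi.HTTPException.
    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

def forward_upstream_error(e: APIStatusError) -> HTTPException:
    # Preserve the upstream status and message instead of collapsing to 500.
    return HTTPException(status_code=e.status_code, detail=e.message or str(e))

exc = forward_upstream_error(APIStatusError(404, "conversation not found"))
assert exc.status_code == 404
assert exc.detail == "conversation not found"
```

In the real handler the returned exception would be raised, with the generic 500 kept only as a fallback when no upstream status is available.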
497-517: ⚠️ Potential issue | 🟠 Major — Don't translate paginated conversation read failures to a generic 500.
Both `items.list()` and `page.get_next_page()` can raise `APIStatusError`. Mapping all of them to `InternalServerErrorResponse.generic()` means `/conversations/{id}` returns 500 even when Llama Stack is explicitly saying "not found" or "forbidden."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/conversations.py` around lines 497 - 517, The current except APIStatusError block always returns InternalServerErrorResponse.generic(), which hides Llama Stack errors like 404/403 from client.conversations.items.list and page.get_next_page; modify the APIStatusError handler to inspect the exception (APIStatusError) for the upstream status/code and message (e.g., e.status or e.response.status_code and e.message or e.response.text), and raise HTTPException with that status_code and a payload containing the upstream error details (falling back to InternalServerErrorResponse.generic() only if no status can be determined); keep the existing APIConnectionError handling and reuse ServiceUnavailableResponse where appropriate.
src/models/requests.py (1)
739-745: ⚠️ Potential issue | 🟠 Major — Don't reject moderation response IDs that `/responses` already exposes.
Blocked `/responses` replies are built with `id=moderation_result.moderation_id` in src/app/endpoints/responses.py, so this validator makes the normal `previous_response_id=response.id` follow-up fail immediately after a moderated turn. If this special case stays, route it through the shared moderation-ID helper instead of another hard-coded prefix check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/requests.py` around lines 739 - 745, The validator check_previous_response_id currently rejects any previous_response_id that startswith("modr") which blocks legitimate IDs created by src/app/endpoints/responses.py (moderation_result.moderation_id); remove the hard-coded prefix check and instead call the shared moderation-ID helper used by the responses endpoint (e.g., the project’s is_moderation_id/normalize_moderation_id helper) to determine if the value is a moderation ID and only reject when the shared helper indicates it’s an invalid/forbidden moderation ID; update check_previous_response_id to rely on that helper so responses-created moderation IDs are allowed.
src/app/endpoints/streaming_query.py (1)
300-316: ⚠️ Potential issue | 🟠 Major — Blocked streams still fall into the success epilogue.
Returning `shield_violation_generator()` here makes `generate_response()` treat the stream as completed successfully, so blocked turns still run the post-stream success path (`get_topic_summary()`, quota/end events, and persistence) instead of short-circuiting as a refused request.
tests/unit/app/endpoints/test_streaming_query.py (1)
870-877: ⚠️ Potential issue | 🟡 Minor — Assert the blocked-path persistence call.
This test patches `append_turn_items_to_conversation()` but never verifies that it was awaited with the refusal item, so the blocked-flow persistence wiring can regress without failing the suite.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/endpoints/test_streaming_query.py` around lines 870 - 877, The test patches append_turn_items_to_conversation but never asserts it was awaited with the refusal item; after calling retrieve_response_generator and receiving turn_summary, construct the expected refusal item from turn_summary (e.g., the refusal text/role or the item that represents the refusal in turn_summary) and add an assertion that the AsyncMock was awaited with the conversation and that refusal item, e.g. append_turn_items_to_conversation.assert_awaited_once_with(mock_context["conversation"], [expected_refusal_item]); this ensures the blocked-path persistence wiring (append_turn_items_to_conversation + retrieve_response_generator) is exercised.
src/app/endpoints/query.py (1)
198-221: ⚠️ Potential issue | 🟠 Major — Blocked turns still fall through to topic summary generation.
When `moderation_result.decision == "blocked"`, the RAG merge is correctly skipped, but the topic summary generation at lines 214-221 is not guarded. This causes an unnecessary LLM call (`get_topic_summary`) on content that was just blocked by moderation. Consider wrapping the topic summary block with the same moderation check:
Possible fix

```diff
 # Get topic summary for new conversation
-if not user_conversation and query_request.generate_topic_summary:
+if (
+    moderation_result.decision == "passed"
+    and not user_conversation
+    and query_request.generate_topic_summary
+):
     logger.debug("Generating topic summary for new conversation")
     topic_summary = await get_topic_summary(
         query_request.query, client, responses_params.model
     )
 else:
     topic_summary = None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 198 - 221, The topic-summary call is executed even when moderation_result.decision == "blocked"; wrap the existing topic summary generation (the block that checks user_conversation and query_request.generate_topic_summary and calls get_topic_summary) with the same moderation check used for RAG merging (i.e., ensure moderation_result.decision == "passed" before calling get_topic_summary), so that get_topic_summary(query_request.query, client, responses_params.model) only runs for passed content and set topic_summary = None otherwise.
🧹 Nitpick comments (1)
src/utils/endpoints.py (1)
229-235: Collapse the `previous_response_id` existence check into a single query.
`check_turn_existence()` and `retrieve_turn_by_response_id()` hit the same row back-to-back. That adds an extra query and leaves a race window between the check and the read. Call `retrieve_turn_by_response_id()` directly and let its 404 path handle the missing case.
♻️ Suggested simplification

```diff
-    if not check_turn_existence(previous_response_id):
-        error_response = NotFoundResponse(
-            resource="response", resource_id=previous_response_id
-        )
-        raise HTTPException(**error_response.model_dump())
     prev_user_turn = retrieve_turn_by_response_id(previous_response_id)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/endpoints.py` around lines 229 - 235, Collapse the two database calls into one: remove the pre-check using check_turn_existence() and call retrieve_turn_by_response_id(previous_response_id) directly; if retrieve_turn_by_response_id returns None or raises when not found, convert that into the same NotFoundResponse/HTTPException you previously built (resource="response", resource_id=previous_response_id) so a single query both fetches the turn and handles the 404 path without the extra race window.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/responses.md`:
- Line 72: TOC links pointing to old anchors are broken after renaming the
"Inherited LLS OpenAPI Attributes" heading; update the table-of-contents entries
(and any other links) to match the new Markdown anchor (e.g.,
inherited-lls-openapi-attributes) so they point to the renamed heading text, and
repeat the same fix for the other occurrence mentioned (the second "Inherited
LLS OpenAPI Attributes" link).
In `@src/app/endpoints/responses.py`:
- Around line 576-579: The code currently calls get_topic_summary() whenever
generate_topic_summary is true even for turns that were blocked by moderation;
change the logic so topic summary generation is skipped if the moderation result
indicates the turn was refused/blocked. Concretely, guard the
get_topic_summary() calls (the ones at the generate_topic_summary sites) with an
additional condition that the moderation result is not blocked/refused (e.g.,
check the existing moderation flag/variable such as shield_blocked,
moderation_result.refused, or similar) so you only call
get_topic_summary(input_text, client, api_params.model) when
generate_topic_summary is true AND the moderation check passed; apply this
change to both occurrences referenced in the diff.
- Around line 225-247: The handler builds inline_rag_context via
build_rag_context but only forwards referenced_documents; modify the call that
selects and invokes handle_streaming_response / handle_non_streaming_response to
also pass inline_rag_context.context_text (e.g., as a parameter named
static_rag_context or inline_rag_context_text) so downstream code (and
ultimately client.responses.create) receives the RAG context text; update the
parameter list in handle_streaming_response and handle_non_streaming_response
(and their usages) to accept and forward that context_text into
client.responses.create().
In `@src/utils/shields.py`:
- Around line 255-256: The code currently treats an explicit empty override
(shield_ids == []) as "run none" by returning [], which bypasses moderation;
instead, detect an explicit empty list and reject it by raising an HTTP 422
(Unprocessable Entity) error consistent with QueryRequest.shield_ids. Locate the
check that inspects shield_ids (the `if shield_ids == []:` branch) and replace
the return with raising an appropriate 422 error (include a clear detail message
referencing shield_ids) so callers receive validation failure rather than an
empty-shields bypass.
---
Outside diff comments:
In `@src/utils/shields.py`:
- Around line 137-143: If get_shields_for_request(...) returns an empty list,
avoid calling client.models.list() unnecessarily; add an early return/check
after shields_to_run = await get_shields_for_request(client, shield_ids) (or
skip the model discovery) so that when shields_to_run is empty you do not
compute available_models or run client.models.list(), and only compute
available_models and perform the provider_resource_id checks for shields when
shields_to_run contains items (referencing shields_to_run, available_models,
client.models.list, shield.provider_id and shield.provider_resource_id).
---
Duplicate comments:
In `@src/app/endpoints/query.py`:
- Around line 198-221: The topic-summary call is executed even when
moderation_result.decision == "blocked"; wrap the existing topic summary
generation (the block that checks user_conversation and
query_request.generate_topic_summary and calls get_topic_summary) with the same
moderation check used for RAG merging (i.e., ensure moderation_result.decision
== "passed" before calling get_topic_summary), so that
get_topic_summary(query_request.query, client, responses_params.model) only runs
for passed content and set topic_summary = None otherwise.
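The gating logic reads naturally as a small predicate; `ModerationResult` below is a simplified stand-in for the project's type:

```python
# Sketch of the moderation gate for topic-summary generation.
# ModerationResult is a simplified stand-in; the real decision values
# ("passed"/"blocked") are taken from the review discussion.
from dataclasses import dataclass


@dataclass
class ModerationResult:
    decision: str  # "passed" or "blocked"


def should_generate_topic_summary(
    moderation: ModerationResult,
    is_new_conversation: bool,
    generate_topic_summary: bool,
) -> bool:
    """Only allow the extra LLM call for passed content on new conversations."""
    return (
        moderation.decision == "passed"
        and is_new_conversation
        and generate_topic_summary
    )
```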
In `@src/models/requests.py`:
- Around line 739-745: The validator check_previous_response_id currently
rejects any previous_response_id that startswith("modr") which blocks legitimate
IDs created by src/app/endpoints/responses.py (moderation_result.moderation_id);
remove the hard-coded prefix check and instead call the shared moderation-ID
helper used by the responses endpoint (e.g., the project’s
is_moderation_id/normalize_moderation_id helper) to determine if the value is a
moderation ID and only reject when the shared helper indicates it’s an
invalid/forbidden moderation ID; update check_previous_response_id to rely on
that helper so responses-created moderation IDs are allowed.
In `@src/utils/conversations.py`:
- Around line 468-481: The current except block for APIStatusError in the
conversations.items.create call masks upstream status and details by returning
InternalServerErrorResponse.generic(); instead extract the actual status and
message from the APIStatusError (e.g., use e.status_code or
e.response.status_code and e.response.text / str(e) as available) and raise
HTTPException with the preserved status_code and detail (and any relevant
headers) so callers receive the original upstream error (keep the existing
APIConnectionError handling unchanged).
- Around line 497-517: The current except APIStatusError block always returns
InternalServerErrorResponse.generic(), which hides Llama Stack errors like
404/403 from client.conversations.items.list and page.get_next_page; modify the
APIStatusError handler to inspect the exception (APIStatusError) for the
upstream status/code and message (e.g., e.status or e.response.status_code and
e.message or e.response.text), and raise HTTPException with that status_code and
a payload containing the upstream error details (falling back to
InternalServerErrorResponse.generic() only if no status can be determined); keep
the existing APIConnectionError handling and reuse ServiceUnavailableResponse
where appropriate.
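Both bullets above amount to the same pattern: translate the upstream `APIStatusError` into an `HTTPException` that preserves status and detail, falling back to a generic 500 only when no status can be recovered. A minimal sketch with stand-in exception classes (the real types come from llama-stack-client and FastAPI):

```python
# Sketch of preserving the upstream status instead of returning a generic 500.
# APIStatusError and HTTPException are stand-ins for the llama-stack-client
# and FastAPI types; attribute names mirror the review's suggestion.
class APIStatusError(Exception):
    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.message = message


class HTTPException(Exception):
    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def translate_upstream_error(e: APIStatusError) -> HTTPException:
    """Map an upstream error to an HTTPException with the original status."""
    status = getattr(e, "status_code", None) or 500  # fall back when unknown
    detail = getattr(e, "message", None) or str(e)
    return HTTPException(status_code=status, detail=detail)
```

A 404 or 403 from Llama Stack then reaches the client as a 404 or 403 rather than being masked as an internal error.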
In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 870-877: The test patches append_turn_items_to_conversation but
never asserts it was awaited with the refusal item; after calling
retrieve_response_generator and receiving turn_summary, construct the expected
refusal item from turn_summary (e.g., the refusal text/role or the item that
represents the refusal in turn_summary) and add an assertion that the AsyncMock
was awaited with the conversation and that refusal item, e.g.
append_turn_items_to_conversation.assert_awaited_once_with(mock_context["conversation"],
[expected_refusal_item]); this ensures the blocked-path persistence wiring
(append_turn_items_to_conversation + retrieve_response_generator) is exercised.
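The shape of the suggested assertion, shown against a stand-in coroutine (the real test patches `append_turn_items_to_conversation` inside `app.endpoints.streaming_query`):

```python
# Sketch of asserting the blocked-path persistence with AsyncMock.
# blocked_path and the payload shape are illustrative stand-ins for the
# code under test.
import asyncio
from unittest.mock import AsyncMock

append_turn_items_to_conversation = AsyncMock()


async def blocked_path(conversation, refusal_item):
    # Stand-in for the blocked-flow persistence being exercised.
    await append_turn_items_to_conversation(conversation, [refusal_item])


asyncio.run(blocked_path("conv-1", {"role": "assistant", "text": "Content blocked"}))

# Fails loudly if the mock was awaited with the wrong conversation or item.
append_turn_items_to_conversation.assert_awaited_once_with(
    "conv-1", [{"role": "assistant", "text": "Content blocked"}]
)
```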
---
Nitpick comments:
In `@src/utils/endpoints.py`:
- Around line 229-235: Collapse the two database calls into one: remove the
pre-check using check_turn_existence() and call
retrieve_turn_by_response_id(previous_response_id) directly; if
retrieve_turn_by_response_id returns None or raises when not found, convert that
into the same NotFoundResponse/HTTPException you previously built
(resource="response", resource_id=previous_response_id) so a single query both
fetches the turn and handles the 404 path without the extra race window.
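A sketch of the collapsed lookup; `NotFoundError` and the in-memory table are hypothetical stand-ins for the project's response model and database session:

```python
# Sketch of a single query that both fetches the turn and raises the 404,
# removing the pre-check race window. NotFoundError and _TURNS are
# illustrative stand-ins.
class NotFoundError(Exception):
    def __init__(self, resource: str, resource_id: str) -> None:
        super().__init__(f"{resource} {resource_id} not found")
        self.resource = resource
        self.resource_id = resource_id


_TURNS = {"resp-1": {"id": "resp-1", "text": "hello"}}  # fake table


def retrieve_turn_by_response_id(previous_response_id: str) -> dict:
    """Fetch and validate in one query; no separate existence pre-check."""
    turn = _TURNS.get(previous_response_id)
    if turn is None:
        raise NotFoundError(resource="response", resource_id=previous_response_id)
    return turn
```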
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 99307210-19f4-4090-a63b-be8ed66f6f33
📒 Files selected for processing (14)
- docs/openapi.json
- docs/responses.md
- src/app/endpoints/query.py
- src/app/endpoints/responses.py
- src/app/endpoints/streaming_query.py
- src/models/requests.py
- src/models/responses.py
- src/utils/conversations.py
- src/utils/endpoints.py
- src/utils/shields.py
- src/utils/types.py
- tests/unit/app/endpoints/test_conversations.py
- tests/unit/app/endpoints/test_responses.py
- tests/unit/app/endpoints/test_streaming_query.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/utils/types.py
Context (diff excerpt):

```python
        return ""

    message_item = cast(ResponseMessage, response_item)
    if message_item.role == "user":
```
The function can be used for both input and output items.
Force-pushed 1f4c764 to 95e12bc
Actionable comments posted: 1
♻️ Duplicate comments (9)
docs/responses.md (1)
12-12: ⚠️ Potential issue | 🟡 Minor

Fix the TOC anchor to match the renamed heading.

The TOC entry at line 12 links to `#inherited-lls-openapi-fields`, but the heading at line 72 is now "Inherited LLS OpenAPI Attributes". Markdown will generate the anchor `#inherited-lls-openapi-attributes`, so this link is broken.

🔗 Proposed fix

```diff
- * [Inherited LLS OpenAPI Fields](#inherited-lls-openapi-fields)
+ * [Inherited LLS OpenAPI Attributes](#inherited-lls-openapi-attributes)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/responses.md` at line 12, TOC link target is out of sync with the renamed heading: update the TOC entry that currently links to "#inherited-lls-openapi-fields" to the correct anchor "#inherited-lls-openapi-attributes" (or revert the heading to "Inherited LLS OpenAPI Fields") so the link matches the heading "Inherited LLS OpenAPI Attributes"; locate the bullet "* [Inherited LLS OpenAPI Fields](`#inherited-lls-openapi-fields`)" and change the anchor to "#inherited-lls-openapi-attributes".

src/app/endpoints/query.py (1)
212-217: ⚠️ Potential issue | 🟠 Major

Skip topic-summary generation for blocked turns.

When moderation returns a refusal, this block still makes a second model call for the new-conversation topic summary. Guard it with `moderation_result.decision == "passed"` so blocked prompts never re-enter the LLM path.

Suggested fix

```diff
-    if not user_conversation and query_request.generate_topic_summary:
+    if (
+        moderation_result.decision == "passed"
+        and not user_conversation
+        and query_request.generate_topic_summary
+    ):
         logger.debug("Generating topic summary for new conversation")
         topic_summary = await get_topic_summary(
             query_request.query, client, responses_params.model
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 212 - 217, The topic-summary generation currently runs for new conversations unconditionally; change the guard so that you only call get_topic_summary(query_request.query, client, responses_params.model) when this is a new conversation (not user_conversation), query_request.generate_topic_summary is true, AND the moderation result indicates the prompt passed (i.e., moderation_result.decision == "passed"); update the if condition that surrounds get_topic_summary to include the moderation_result.decision check so blocked/refused turns never trigger the second LLM call.

src/utils/shields.py (1)
255-256: ⚠️ Potential issue | 🟠 Major

Reject `shield_ids=[]` instead of disabling moderation.

Returning `[]` here turns an explicit empty override into a full moderation bypass. This should raise a 422 so callers cannot opt out of shields by sending an empty list.

Suggested fix

```diff
-    if shield_ids == []:
-        return []
+    if shield_ids == []:
+        response = UnprocessableEntityResponse(
+            response="Invalid shield_ids",
+            cause=(
+                "shield_ids must be omitted or contain at least one shield identifier."
+            ),
+        )
+        raise HTTPException(**response.model_dump())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 255 - 256, The code currently treats an explicit empty override list as a signal to disable moderation by returning [] when shield_ids == []; change this to reject an explicit empty list with a 422 error instead. In the function that validates/handles the shield_ids parameter (the block containing the check "if shield_ids == []:"), replace the return [] with raising an HTTP 422 (e.g., raise HTTPException(status_code=422, detail="shield_ids cannot be an empty list") or the project's equivalent) so callers cannot opt out of shields by sending an empty list; keep None or missing values allowed if the rest of the code expects them.

src/app/endpoints/streaming_query.py (2)
254-258: ⚠️ Potential issue | 🟠 Major

Carry inline RAG chunks through the streaming path too.

Only `inline_rag_docs` are threaded through here. Unlike the non-streaming query path, `turn_summary.rag_chunks` never receives `inline_rag_context.rag_chunks`, so stored streaming results lose all pre-query RAG chunks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 254 - 258, The streaming path calls retrieve_response_generator but only threads inline_rag_context.referenced_documents as inline_rag_docs, so turn_summary.rag_chunks never gets inline_rag_context.rag_chunks and stored streaming results lose pre-query RAG chunks; fix by ensuring the streaming call carries the rag_chunks through—either pass inline_rag_context.rag_chunks into retrieve_response_generator (add an inline_rag_chunks/rag_chunks param) or after the call assign turn_summary.rag_chunks = inline_rag_context.rag_chunks so turn_summary contains the same rag chunks as the non-streaming path (update the call site around retrieve_response_generator and the handling of turn_summary to include inline_rag_context.rag_chunks).
300-315: ⚠️ Potential issue | 🟠 Major

Blocked streams still go through the success tail.

This branch returns a normal generator, so `generate_response()` marks the stream as completed and still runs success-only post-processing, including topic summary generation for new conversations. Return a blocked/failure signal alongside the generator so the success path is skipped.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 300 - 315, The blocked moderation branch currently returns only the generator and turn_summary (shield_violation_generator, turn_summary) so generate_response treats it as success; change this return to include an explicit failure signal (e.g., return (shield_violation_generator(...), turn_summary, False) or a Failure enum) and update generate_response to check that third value and skip success-only post-processing (topic summary, etc.) when it indicates failure; adjust callers to destructure the new triple where necessary.

src/utils/types.py (1)
258-279: ⚠️ Potential issue | 🟠 Major

Separate `previous_response_id` handling from the MCP tool rewrite.

Because this branch sits behind the early returns above it, linear continuations without tools keep both `conversation` and `previous_response_id`. And when tools are present, forked continuations still lose the fresh `conversation` unconditionally. Please move the conversation stripping out of the tool rehydration block and gate it only on the resolved tip/linear case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/types.py` around lines 258 - 279, The conversation popping logic currently sits inside the tool rehydration block in model_dump; move it out and only strip "conversation" for linear continuations (i.e., when there are no tools). Concretely, remove the if self.previous_response_id: result.pop("conversation", None) from inside the loop/after tool-rewrite, and instead add a guarded pop like if self.previous_response_id and not self.tools: result.pop("conversation", None) placed after you compute result and dumped_tools but before returning early for tool handling or at the end of the function so it executes only for the resolved tip/linear case; keep the existing tool rehydration (tools/dumped_tools loop) unchanged.

src/app/endpoints/responses.py (2)
692-695: ⚠️ Potential issue | 🟡 Minor

Topic summary generation should be skipped for shield-blocked responses.

Similar to the streaming path, when `moderation_result.decision == "blocked"`, the code still generates a topic summary if `request.generate_topic_summary` is True. This makes an unnecessary LLM call for refused content.

🛡️ Proposed fix

```diff
     # Get topic summary for new conversation
     topic_summary = None
-    if request.generate_topic_summary:
+    if request.generate_topic_summary and moderation_result.decision != "blocked":
         logger.debug("Generating topic summary for new conversation")
         topic_summary = await get_topic_summary(input_text, client, api_params.model)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 692 - 695, When moderation_result.decision == "blocked" we should skip generating a topic summary to avoid an unnecessary LLM call; update the logic around request.generate_topic_summary and the call to get_topic_summary so you only call get_topic_summary(input_text, client, api_params.model) when request.generate_topic_summary is true AND moderation_result.decision != "blocked", and ensure topic_summary stays None for blocked responses (the variable topic_summary and the logging call remain but the await get_topic_summary must be gated by the moderation_result check).
284-296: ⚠️ Potential issue | 🟡 Minor

Verify topic summary generation is skipped for shield-blocked requests.

When moderation blocks, `turn_summary.llm_response` is set to the refusal message at line 286, but `generate_topic_summary` is still passed to `generate_response` (line 341). This could generate a topic summary for blocked content, which wastes an LLM call and may produce inappropriate summaries.

Consider passing `generate_topic_summary=False` when moderation is blocked, or adding a guard in `generate_response`.

🛡️ Proposed fix

```diff
+    is_blocked = moderation_result.decision == "blocked"
     return StreamingResponse(
         generate_response(
             generator=generator,
             turn_summary=turn_summary,
             client=client,
             auth=auth,
             input_text=input_text,
             started_at=started_at,
             api_params=api_params,
-            generate_topic_summary=request.generate_topic_summary or False,
+            generate_topic_summary=(request.generate_topic_summary or False) and not is_blocked,
         ),
         media_type="text/event-stream",
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 284 - 296, The topic-summary generator should be disabled for shield-blocked requests: detect when moderation_result.decision == "blocked" (where turn_summary.llm_response is set and shield_violation_generator is created) and ensure generate_response is invoked with generate_topic_summary=False (or add a guard inside generate_response to skip summary generation when moderation_result indicates a block); update the call site that currently passes through generate_topic_summary to respect this blocked state so no topic-summary LLM call is made for shielded requests.

tests/unit/app/endpoints/test_streaming_query.py (1)
864-880: ⚠️ Potential issue | 🟡 Minor

Test does not verify blocked path persistence.

The test patches `append_turn_items_to_conversation` but never asserts it was called. This means a regression in the blocked-flow persistence would go undetected.

💚 Proposed fix to verify persistence

```diff
+    append_mock = mocker.patch(
+        "app.endpoints.streaming_query.append_turn_items_to_conversation",
+        new=mocker.AsyncMock(),
+    )
-    mocker.patch(
-        "app.endpoints.streaming_query.append_turn_items_to_conversation",
-        new=mocker.AsyncMock(),
-    )
     _generator, turn_summary = await retrieve_response_generator(
         mock_responses_params, mock_context, []
     )
     assert isinstance(turn_summary, TurnSummary)
     assert turn_summary.llm_response == "Content blocked"
+
+    # Verify persistence was called with refusal response
+    append_mock.assert_awaited_once()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/endpoints/test_streaming_query.py` around lines 864 - 880, The test currently patches append_turn_items_to_conversation but never asserts it was invoked; after calling retrieve_response_generator with mock_context (whose moderation_result.decision == "blocked"), add an assertion that the patched append_turn_items_to_conversation AsyncMock was awaited (e.g., assert_called_once or assert_awaited_with) and verify it was passed the expected elements such as mock_context (or its conversation) and mock_context.moderation_result.refusal_response so the blocked-flow persistence is actually tested alongside the existing TurnSummary and llm_response assertions.
🧹 Nitpick comments (2)
docs/responses.md (2)
106-106: Document the behavior when `shield_ids` is provided as an empty list.

The documentation states what happens when `shield_ids` is omitted (all configured shields are used), but it doesn't specify what happens when an empty list is explicitly provided. Based on the implementation context, an empty list means "run no shields".

📝 Suggested clarification

```diff
-| `shield_ids` | array[string] | Shield IDs to apply. If omitted, all configured shields in LCORE are used | No |
+| `shield_ids` | array[string] | Shield IDs to apply. If omitted, all configured shields in LCORE are used. If provided as an empty list, no shields are run | No |
```

Also update line 523 in the LCORE-Specific Extensions section under Behavioral Differences to match.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/responses.md` at line 106, Update the docs to explicitly state that when the `shield_ids` parameter is provided as an empty array it means "run no shields" (distinct from omitting it which runs all configured shields), and mirror that clarification in the LCORE-Specific Extensions -> Behavioral Differences section so both places consistently document the empty-list behavior for `shield_ids`.
388-388: Clarify the `id` field description.

The description "Unique response ID or moderation ID" is ambiguous. Does the field contain a response ID in all cases, or does it vary based on whether the request was moderated? Consider clarifying whether this is always a response ID or if the type changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/responses.md` at line 388, Update the `id` field description to clarify when it contains a response ID versus a moderation ID: replace "Unique response ID or moderation ID" with a precise statement such as "Unique identifier for the returned item: the response ID for normal responses, or the moderation ID when the request was moderated (string)"; ensure the `id` entry in docs/responses.md explicitly states that the value varies based on moderation and that the type is always string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/vector_search.py`:
- Around line 639-672: The function append_inline_rag_context_to_input currently
mutates the passed input_value (e.g., message.content and part.text); instead
create and return a fresh ResponseInput: if input_value is a str, return a new
concatenated string; if it is a sequence, deep-copy the list and items (e.g.,
via copy.deepcopy or constructing new ResponseItem/ResponseMessage objects),
find the first user message (inspect item.type and item.role), modify the copy's
message.content or the input_text part.text on the copy, and return the copied
structure so the original input_value is not mutated.
---
Duplicate comments:
In `@docs/responses.md`:
- Line 12: TOC link target is out of sync with the renamed heading: update the
TOC entry that currently links to "#inherited-lls-openapi-fields" to the correct
anchor "#inherited-lls-openapi-attributes" (or revert the heading to "Inherited
LLS OpenAPI Fields") so the link matches the heading "Inherited LLS OpenAPI
Attributes"; locate the bullet "* [Inherited LLS OpenAPI
Fields](`#inherited-lls-openapi-fields`)" and change the anchor to
"#inherited-lls-openapi-attributes".
In `@src/app/endpoints/query.py`:
- Around line 212-217: The topic-summary generation currently runs for new
conversations unconditionally; change the guard so that you only call
get_topic_summary(query_request.query, client, responses_params.model) when this
is a new conversation (not user_conversation),
query_request.generate_topic_summary is true, AND the moderation result
indicates the prompt passed (i.e., moderation_result.decision == "passed");
update the if condition that surrounds get_topic_summary to include the
moderation_result.decision check so blocked/refused turns never trigger the
second LLM call.
In `@src/app/endpoints/responses.py`:
- Around line 692-695: When moderation_result.decision == "blocked" we should
skip generating a topic summary to avoid an unnecessary LLM call; update the
logic around request.generate_topic_summary and the call to get_topic_summary so
you only call get_topic_summary(input_text, client, api_params.model) when
request.generate_topic_summary is true AND moderation_result.decision !=
"blocked", and ensure topic_summary stays None for blocked responses (the
variable topic_summary and the logging call remain but the await
get_topic_summary must be gated by the moderation_result check).
- Around line 284-296: The topic-summary generator should be disabled for
shield-blocked requests: detect when moderation_result.decision == "blocked"
(where turn_summary.llm_response is set and shield_violation_generator is
created) and ensure generate_response is invoked with
generate_topic_summary=False (or add a guard inside generate_response to skip
summary generation when moderation_result indicates a block); update the call
site that currently passes through generate_topic_summary to respect this
blocked state so no topic-summary LLM call is made for shielded requests.
In `@src/app/endpoints/streaming_query.py`:
- Around line 254-258: The streaming path calls retrieve_response_generator but
only threads inline_rag_context.referenced_documents as inline_rag_docs, so
turn_summary.rag_chunks never gets inline_rag_context.rag_chunks and stored
streaming results lose pre-query RAG chunks; fix by ensuring the streaming call
carries the rag_chunks through—either pass inline_rag_context.rag_chunks into
retrieve_response_generator (add an inline_rag_chunks/rag_chunks param) or after
the call assign turn_summary.rag_chunks = inline_rag_context.rag_chunks so
turn_summary contains the same rag chunks as the non-streaming path (update the
call site around retrieve_response_generator and the handling of turn_summary to
include inline_rag_context.rag_chunks).
- Around line 300-315: The blocked moderation branch currently returns only the
generator and turn_summary (shield_violation_generator, turn_summary) so
generate_response treats it as success; change this return to include an
explicit failure signal (e.g., return (shield_violation_generator(...),
turn_summary, False) or a Failure enum) and update generate_response to check
that third value and skip success-only post-processing (topic summary, etc.)
when it indicates failure; adjust callers to destructure the new triple where
necessary.
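One way to carry the explicit failure signal, sketched with stand-in types (the triple shape and the `StreamOutcome` enum are assumptions, not the project's current signature):

```python
# Sketch of returning an explicit outcome alongside the generator so the
# caller can skip success-only post-processing for blocked streams.
from enum import Enum


class StreamOutcome(Enum):
    SUCCESS = "success"
    BLOCKED = "blocked"


async def shield_violation_generator():
    # Stand-in for the refusal SSE stream.
    yield "Content blocked"


def retrieve_response_generator(blocked: bool):
    """Return (generator, turn_summary, outcome) so callers can branch."""
    outcome = StreamOutcome.BLOCKED if blocked else StreamOutcome.SUCCESS
    turn_summary = {"llm_response": "Content blocked" if blocked else ""}
    return shield_violation_generator(), turn_summary, outcome


def should_run_success_postprocessing(outcome: StreamOutcome) -> bool:
    # generate_response would consult this before topic-summary generation.
    return outcome is StreamOutcome.SUCCESS
```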
In `@src/utils/shields.py`:
- Around line 255-256: The code currently treats an explicit empty override list
as a signal to disable moderation by returning [] when shield_ids == []; change
this to reject an explicit empty list with a 422 error instead. In the function
that validates/handles the shield_ids parameter (the block containing the check
"if shield_ids == []:"), replace the return [] with raising an HTTP 422 (e.g.,
raise HTTPException(status_code=422, detail="shield_ids cannot be an empty
list") or the project's equivalent) so callers cannot opt out of shields by
sending an empty list; keep None or missing values allowed if the rest of the
code expects them.
In `@src/utils/types.py`:
- Around line 258-279: The conversation popping logic currently sits inside the
tool rehydration block in model_dump; move it out and only strip "conversation"
for linear continuations (i.e., when there are no tools). Concretely, remove the
if self.previous_response_id: result.pop("conversation", None) from inside the
loop/after tool-rewrite, and instead add a guarded pop like if
self.previous_response_id and not self.tools: result.pop("conversation", None)
placed after you compute result and dumped_tools but before returning early for
tool handling or at the end of the function so it executes only for the resolved
tip/linear case; keep the existing tool rehydration (tools/dumped_tools loop)
unchanged.
In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 864-880: The test currently patches
append_turn_items_to_conversation but never asserts it was invoked; after
calling retrieve_response_generator with mock_context (whose
moderation_result.decision == "blocked"), add an assertion that the patched
append_turn_items_to_conversation AsyncMock was awaited (e.g.,
assert_called_once or assert_awaited_with) and verify it was passed the expected
elements such as mock_context (or its conversation) and
mock_context.moderation_result.refusal_response so the blocked-flow persistence
is actually tested alongside the existing TurnSummary and llm_response
assertions.
---
Nitpick comments:
In `@docs/responses.md`:
- Line 106: Update the docs to explicitly state that when the `shield_ids`
parameter is provided as an empty array it means "run no shields" (distinct from
omitting it which runs all configured shields), and mirror that clarification in
the LCORE-Specific Extensions -> Behavioral Differences section so both places
consistently document the empty-list behavior for `shield_ids`.
- Line 388: Update the `id` field description to clarify when it contains a
response ID versus a moderation ID: replace "Unique response ID or moderation
ID" with a precise statement such as "Unique identifier for the returned item:
the response ID for normal responses, or the moderation ID when the request was
moderated (string)"; ensure the `id` entry in docs/responses.md explicitly
states that the value varies based on moderation and that the type is always
string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 92ec9903-9469-4377-947c-37c91c6bc0fd
📒 Files selected for processing (16)
- docs/openapi.json
- docs/responses.md
- src/app/endpoints/query.py
- src/app/endpoints/responses.py
- src/app/endpoints/streaming_query.py
- src/models/requests.py
- src/models/responses.py
- src/utils/conversations.py
- src/utils/endpoints.py
- src/utils/responses.py
- src/utils/shields.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/unit/app/endpoints/test_conversations.py
- tests/unit/app/endpoints/test_responses.py
- tests/unit/app/endpoints/test_streaming_query.py
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/unit/app/endpoints/test_conversations.py
- src/utils/conversations.py
- tests/unit/app/endpoints/test_responses.py
- src/models/requests.py
Force-pushed 95e12bc to bb1c1b7
Actionable comments posted: 4
♻️ Duplicate comments (9)
src/utils/vector_search.py (1)
639-672: ⚠️ Potential issue | 🟠 Major

Return a fresh `ResponseInput` instead of mutating the caller's models.

This function directly mutates `message.content` (line 666) and `part.text` (line 670) on objects within the passed `input_value` list. As per coding guidelines: "Avoid in-place parameter modification anti-patterns; return new data structures instead."

♻️ Proposed fix using deep copy

```diff
+import copy
+
 def append_inline_rag_context_to_input(
     input_value: ResponseInput,
     inline_rag_context_text: str,
 ) -> ResponseInput:
     """Append inline RAG context to Responses API input.

     If input is str, appends the context text. If input is a sequence of items,
     appends the context to the text of the first user message. If there is no
     user message, returns the input unchanged.

     Parameters:
         input_value: The request input (string or list of ResponseItem).
         inline_rag_context_text: RAG context string to inject.

     Returns:
-        The same type as input_value, with context merged in.
+        A new ResponseInput with context merged in (original is not mutated).
     """
     if not inline_rag_context_text:
         return input_value
     if isinstance(input_value, str):
         return input_value + "\n\n" + inline_rag_context_text
+
+    # Deep copy to avoid mutating the original
+    input_copy = copy.deepcopy(input_value)
-    for item in input_value:
+    for item in input_copy:
         if item.type != "message" or item.role != "user":
             continue
         message = cast(ResponseMessage, item)
         content = message.content
         if isinstance(content, str):
             message.content = content + "\n\n" + inline_rag_context_text
-            return input_value
+            return input_copy
         for part in content:
             if part.type == "input_text":
                 part.text = part.text + "\n\n" + inline_rag_context_text
-                return input_value
-    return input_value
+                return input_copy
+    return input_copy
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 639 - 672, The function append_inline_rag_context_to_input mutates the input_value in-place by modifying message.content and part.text; instead, create and return a new ResponseInput without mutating the caller's objects: detect whether input_value is a str and return a new string with the context appended, or if it's a sequence, produce a shallow copy of the list and replace the first matching ResponseItem (type=="message" and role=="user") with a cloned ResponseMessage whose content is also cloned (copy the string or clone the content parts) and with the inline_rag_context_text appended to the cloned content (modify the cloned part.text, not the original); ensure append_inline_rag_context_to_input returns this new list (or new string) and leaves the original input_value, ResponseMessage, and parts unchanged.

src/utils/shields.py (1)
255-256: ⚠️ Potential issue | 🟠 Major

Reject `shield_ids=[]` instead of treating it as "run none".

Returning an empty list when `shield_ids == []` creates a moderation bypass. An explicit empty override should be rejected with HTTP 422, consistent with `QueryRequest.shield_ids` validation.

🛡️ Proposed fix

```diff
-    if shield_ids == []:
-        return []
+    if shield_ids == []:
+        response = UnprocessableEntityResponse(
+            response="Invalid shield_ids",
+            cause="shield_ids must be omitted or contain at least one shield identifier.",
+        )
+        raise HTTPException(**response.model_dump())
```

You'll also need to add `UnprocessableEntityResponse` to the imports from `models.responses`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 255 - 256, The function currently treats an explicit shield_ids == [] as "run none" by returning an empty list; instead detect this case and raise an UnprocessableEntityResponse (HTTP 422) so explicit empty overrides are rejected; update the logic in the function that references shield_ids to replace the early return with raising UnprocessableEntityResponse (include a clear message like "shield_ids may not be an empty list"), and add UnprocessableEntityResponse to the imports from models.responses so the exception is available.

src/utils/endpoints.py (1)
319-321: ⚠️ Potential issue | 🟡 Minor

Fix remaining typo: "existance" → "existence" in log message.

The function was renamed correctly, but the log message still contains the typo.

✏️ Proposed fix

```diff
     logger.exception(
-        "Database error while checking turn existance for response_id %s",
+        "Database error while checking turn existence for response_id %s",
         response_id,
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/endpoints.py` around lines 319 - 321, Update the log message passed to logger.exception in the database check for turn existence (the logger.exception call that currently uses "Database error while checking turn existance for response_id %s") to correct the typo "existance" to "existence" so the message reads "Database error while checking turn existence for response_id %s"; keep the existing arguments (response_id) and call site intact.

tests/unit/app/endpoints/test_streaming_query.py (1)
867-877: ⚠️ Potential issue | 🟡 Minor

Assert the blocked-turn append payload here.

This still passes if `append_turn_items_to_conversation()` is awaited with the wrong conversation or refusal item. Please switch this to `assert_awaited_once_with(...)` and verify it uses `mock_responses_params.conversation` plus `mock_moderation_result.refusal_response`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/endpoints/test_streaming_query.py` around lines 867 - 877, The test currently doesn't assert the payload passed to app.endpoints.streaming_query.append_turn_items_to_conversation; change the mock assertion to use append_turn_items_to_conversation.assert_awaited_once_with(...) and verify the first arg is mock_responses_params.conversation and the second arg is mock_moderation_result.refusal_response (use the same mock names as in the test where retrieve_response_generator is awaited) so the blocked-turn append is validated explicitly.

src/app/endpoints/query.py (1)
198-219: ⚠️ Potential issue | 🟠 Major

Blocked requests still fall into the topic-summary path.

Only the inline-RAG merge is gated. For a new conversation with `generate_topic_summary=True`, a blocked moderation result still reaches `get_topic_summary()` below and triggers another model call on refused content. The success-only post-processing needs to stay entirely behind the `decision == "passed"` branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 198 - 219, The topic-summary generation is executed even when moderation_result.decision != "passed"; move the entire new-conversation/topic-summary path into the success branch so it only runs on passed content: inside the block guarded by moderation_result.decision == "passed" (where inline_rag_context, turn_summary, rag_chunks, and referenced_documents are handled), also perform the check for not user_conversation and query_request.generate_topic_summary and call get_topic_summary(query_request.query, client, responses_params.model) there; otherwise set topic_summary = None outside that branch. Ensure you reference the same symbols (moderation_result.decision, inline_rag_context, turn_summary, get_topic_summary, query_request.generate_topic_summary) so the logic remains fully gated by the moderation pass.

docs/responses.md (2)
22-27: ⚠️ Potential issue | 🟡 Minor

The TOC still points at the old anchor.

The section title is now `## Behavioral Differences`, so the `#known-limitations-and-behavioral-differences` link no longer resolves.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/responses.md` around lines 22 - 27, The TOC entry still links to the old anchor '#known-limitations-and-behavioral-differences' but the actual section header was changed to '## Behavioral Differences'; update the TOC link to match the new header (replace '#known-limitations-and-behavioral-differences' with '#behavioral-differences') or rename the header back to the original; adjust the list item that references "Known Limitations and Behavioral Differences" so its anchor matches the 'Behavioral Differences' header.
105-107: ⚠️ Potential issue | 🟡 Minor

Document the empty `shield_ids` case explicitly.

This explains what happens when `shield_ids` is omitted, but not when callers send `[]`. Please spell out that behavior here as well, otherwise clients cannot distinguish "disable shields" from "invalid request".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/responses.md` around lines 105 - 107, Update the `shield_ids` documentation to explicitly describe both cases: when the field is omitted (current behavior: apply all configured shields in LCORE) and when an empty array `[]` is provided (explicitly state whether that disables all shields or is treated as an invalid request), and include a short example or note showing the exact client payload for each behavior so integrators can distinguish "omit" vs "empty array" cases for the `shield_ids` parameter.

src/app/endpoints/streaming_query.py (1)
300-316: ⚠️ Potential issue | 🟠 Major

A blocked stream is still being reported as a successful one.

This branch returns a normal generator, so `generate_response()` later treats the stream as completed and runs the success-only tail, including topic-summary generation for new conversations. Please mark blocked moderation as a non-success path instead of returning a generator that looks completed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 300 - 316, The branch handling context.moderation_result.decision == "blocked" currently returns shield_violation_generator which looks like a successful stream; instead raise a specific non-success signal (e.g., raise ModerationBlockedError(message=context.moderation_result.message, moderation_id=context.moderation_result.moderation_id)) or return a clearly non-success payload that generate_response recognizes, so the success-only tail (topic-summary generation) is not executed. Update the blocked branch (where turn_summary is set and append_turn_items_to_conversation is called) to raise or return that failure indicator and ensure generate_response is updated/checked to treat ModerationBlockedError (or the chosen failure payload) as a non-success path rather than a completed generator.

src/app/endpoints/responses.py (1)
341-342: ⚠️ Potential issue | 🟠 Major

Still skip topic-summary generation for blocked turns.

Both flows can still call `get_topic_summary()` after moderation has already blocked the prompt. That adds another model call over refused content and can persist a summary for a turn that never produced a valid response.

Also applies to: 586-589, 693-695
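The gate this comment asks for can be sketched as a small predicate. This is a hypothetical helper, not code from the PR; `ModerationResult` below is a stand-in for the project's moderation result type.

```python
from dataclasses import dataclass


@dataclass
class ModerationResult:
    """Stand-in for the project's moderation result type (hypothetical)."""

    decision: str  # "passed" or "blocked"


def should_generate_topic_summary(
    requested: bool, moderation: ModerationResult
) -> bool:
    """Allow topic-summary generation only when moderation passed."""
    return requested and moderation.decision == "passed"


# A blocked turn never triggers a summary, even when the client asked for one.
assert should_generate_topic_summary(True, ModerationResult("blocked")) is False
```

Each call site would then consult this predicate instead of reading the request flag directly, keeping both flows gated the same way.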
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 341 - 342, Guard topic-summary generation behind the moderation result so we never call get_topic_summary or persist summaries for prompts that were blocked/refused: before passing generate_topic_summary or invoking get_topic_summary in the response-building paths (e.g., where generate_topic_summary=request.generate_topic_summary or where get_topic_summary(...) is called), add a check against the moderation outcome (e.g., only proceed if moderation_result is not blocked/refused or if is_blocked/is_refused is False). Apply this same guard in all three locations referenced (around generate_topic_summary assignment and the calls at the other two spots) so topic summaries are skipped when moderation has blocked the turn.
🧹 Nitpick comments (1)
src/app/endpoints/responses.py (1)
377-386: Prefer validated construction for the synthetic blocked responses.

`ResponsesResponse` now has several required echoed fields, and `model_construct()` will silently emit an incomplete SSE payload if `echoed_params()` ever drifts. Using normal validation here keeps the blocked streaming path in sync with the public schema.

Suggested change

```diff
-        created_response_object = ResponsesResponse.model_construct(
+        created_response_object = ResponsesResponse(
             id=moderation_result.moderation_id,
             created_at=int(created_at.timestamp()),
             status="in_progress",
             output=[],
             conversation=normalized_conv_id,
             available_quotas={},
             output_text="",
             **echoed_params,
         )
@@
-        completed_response_object = ResponsesResponse.model_construct(
+        completed_response_object = ResponsesResponse(
             id=moderation_result.moderation_id,
             created_at=int(created_at.timestamp()),
             completed_at=int(datetime.now(UTC).timestamp()),
             status="completed",
             output=[moderation_result.refusal_response],
             usage=get_zero_usage(),
             conversation=normalized_conv_id,
             available_quotas=available_quotas,
             output_text=moderation_result.message,
             **echoed_params,
         )
```

Also applies to: 417-428
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 377 - 386, The synthetic "blocked" response is being built with ResponsesResponse.model_construct(...), which bypasses Pydantic validation and can produce incomplete SSE payloads if echoed_params drifts; replace the model_construct usage with normal validated construction by calling ResponsesResponse(...) so required echoed fields are enforced, update both occurrences (the block around creation with id=moderation_result.moderation_id, created_at=..., status="in_progress", output=[], conversation=normalized_conv_id, available_quotas={}, output_text="", **echoed_params and the similar block at the later occurrence) and ensure any missing required keys in echoed_params are provided or handled so the constructor doesn't raise unexpected exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/query.py`:
- Around line 285-291: The blocked-moderation path currently calls
append_turn_items_to_conversation(...) synchronously and a failure there can
return a 5xx instead of the moderation refusal; make this append best-effort by
wrapping the call in a try/except (or equivalent error handling) around the
append_turn_items_to_conversation invocation that references
responses_params.conversation, responses_params.input and
moderation_result.refusal_response, log any exception with context (including
conversation id and that the append failed) and then continue to return the
moderation refusal regardless of append outcome.
In `@src/app/endpoints/responses.py`:
- Around line 98-123: The 200 entry in responses_response only advertises JSON;
add a streaming SSE variant for stream=true by expanding the 200 response to
include a text/event-stream content type alongside application/json (either by
updating ResponsesResponse.openapi_response() to return both media types or
replacing the 200 mapping with an explicit dict that includes content:
{"application/json": {...}, "text/event-stream": {"schema": {"type":"string"},
"description":"SSE stream of events"}}) so the router.post("/responses",
responses=responses_response) OpenAPI map documents the SSE response.
In `@src/utils/responses.py`:
- Around line 1217-1241: The function extract_attachments_text has a misspelled
parameter name response_inputnse_input; rename it to response_input in the
function signature (preserve the type ResponseInput) and update all internal
references (the isinstance check, the for loop, and any casts) to use
response_input so the docstring and code refer to the same identifier.
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 538-552: Replace the loose mock assertion so the test verifies the
exact refusal turn persisted: instead of mock_append.assert_called_once(), use
mock_append.assert_awaited_once_with(...) (or assert_awaited_once_with) and pass
the expected conversation ID, the original input text from responses_request,
and mock_moderation.refusal_response as the turn payload; keep the call to
responses_endpoint_handler and the final ResponsesResponse type check but assert
the precise arguments to append_turn_items_to_conversation to ensure the
blocked-flow persistence wiring is exercised.
---
Duplicate comments:
In `@docs/responses.md`:
- Around line 22-27: The TOC entry still links to the old anchor
'#known-limitations-and-behavioral-differences' but the actual section header
was changed to '## Behavioral Differences'; update the TOC link to match the new
header (replace '#known-limitations-and-behavioral-differences' with
'#behavioral-differences') or rename the header back to the original; adjust the
list item that references "Known Limitations and Behavioral Differences" so its
anchor matches the 'Behavioral Differences' header.
- Around line 105-107: Update the `shield_ids` documentation to explicitly
describe both cases: when the field is omitted (current behavior: apply all
configured shields in LCORE) and when an empty array `[]` is provided
(explicitly state whether that disables all shields or is treated as an invalid
request), and include a short example or note showing the exact client payload
for each behavior so integrators can distinguish "omit" vs "empty array" cases
for the `shield_ids` parameter.
In `@src/app/endpoints/query.py`:
- Around line 198-219: The topic-summary generation is executed even when
moderation_result.decision != "passed"; move the entire
new-conversation/topic-summary path into the success branch so it only runs on
passed content: inside the block guarded by moderation_result.decision ==
"passed" (where inline_rag_context, turn_summary, rag_chunks, and
referenced_documents are handled), also perform the check for not
user_conversation and query_request.generate_topic_summary and call
get_topic_summary(query_request.query, client, responses_params.model) there;
otherwise set topic_summary = None outside that branch. Ensure you reference the
same symbols (moderation_result.decision, inline_rag_context, turn_summary,
get_topic_summary, query_request.generate_topic_summary) so the logic remains
fully gated by the moderation pass.
In `@src/app/endpoints/responses.py`:
- Around line 341-342: Guard topic-summary generation behind the moderation
result so we never call get_topic_summary or persist summaries for prompts that
were blocked/refused: before passing generate_topic_summary or invoking
get_topic_summary in the response-building paths (e.g., where
generate_topic_summary=request.generate_topic_summary or where
get_topic_summary(...) is called), add a check against the moderation outcome
(e.g., only proceed if moderation_result is not blocked/refused or if
is_blocked/is_refused is False). Apply this same guard in all three locations
referenced (around generate_topic_summary assignment and the calls at the other
two spots) so topic summaries are skipped when moderation has blocked the turn.
In `@src/app/endpoints/streaming_query.py`:
- Around line 300-316: The branch handling context.moderation_result.decision ==
"blocked" currently returns shield_violation_generator which looks like a
successful stream; instead raise a specific non-success signal (e.g., raise
ModerationBlockedError(message=context.moderation_result.message,
moderation_id=context.moderation_result.moderation_id)) or return a clearly
non-success payload that generate_response recognizes, so the success-only tail
(topic-summary generation) is not executed. Update the blocked branch (where
turn_summary is set and append_turn_items_to_conversation is called) to raise or
return that failure indicator and ensure generate_response is updated/checked to
treat ModerationBlockedError (or the chosen failure payload) as a non-success
path rather than a completed generator.
In `@src/utils/endpoints.py`:
- Around line 319-321: Update the log message passed to logger.exception in the
database check for turn existence (the logger.exception call that currently uses
"Database error while checking turn existance for response_id %s") to correct
the typo "existance" to "existence" so the message reads "Database error while
checking turn existence for response_id %s"; keep the existing arguments
(response_id) and call site intact.
In `@src/utils/shields.py`:
- Around line 255-256: The function currently treats an explicit shield_ids ==
[] as "run none" by returning an empty list; instead detect this case and raise
an UnprocessableEntityResponse (HTTP 422) so explicit empty overrides are
rejected; update the logic in the function that references shield_ids to replace
the early return with raising UnprocessableEntityResponse (include a clear
message like "shield_ids may not be an empty list"), and add
UnprocessableEntityResponse to the imports from models.responses so the
exception is available.
In `@src/utils/vector_search.py`:
- Around line 639-672: The function append_inline_rag_context_to_input mutates
the input_value in-place by modifying message.content and part.text; instead,
create and return a new ResponseInput without mutating the caller's objects:
detect whether input_value is a str and return a new string with the context
appended, or if it's a sequence, produce a shallow copy of the list and replace
the first matching ResponseItem (type=="message" and role=="user") with a cloned
ResponseMessage whose content is also cloned (copy the string or clone the
content parts) and with the inline_rag_context_text appended to the cloned
content (modify the cloned part.text, not the original); ensure
append_inline_rag_context_to_input returns this new list (or new string) and
leaves the original input_value, ResponseMessage, and parts unchanged.
In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 867-877: The test currently doesn't assert the payload passed to
app.endpoints.streaming_query.append_turn_items_to_conversation; change the mock
assertion to use append_turn_items_to_conversation.assert_awaited_once_with(...)
and verify the first arg is mock_responses_params.conversation and the second
arg is mock_moderation_result.refusal_response (use the same mock names as in
the test where retrieve_response_generator is awaited) so the blocked-turn
append is validated explicitly.
---
Nitpick comments:
In `@src/app/endpoints/responses.py`:
- Around line 377-386: The synthetic "blocked" response is being built with
ResponsesResponse.model_construct(...), which bypasses Pydantic validation and
can produce incomplete SSE payloads if echoed_params drifts; replace the
model_construct usage with normal validated construction by calling
ResponsesResponse(...) so required echoed fields are enforced, update both
occurrences (the block around creation with id=moderation_result.moderation_id,
created_at=..., status="in_progress", output=[],
conversation=normalized_conv_id, available_quotas={}, output_text="",
**echoed_params and the similar block at the later occurrence) and ensure any
missing required keys in echoed_params are provided or handled so the
constructor doesn't raise unexpected exceptions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: eded7ad2-8e9b-4821-84a2-6809241a4e7e
📒 Files selected for processing (16)
- docs/openapi.json
- docs/responses.md
- src/app/endpoints/query.py
- src/app/endpoints/responses.py
- src/app/endpoints/streaming_query.py
- src/models/requests.py
- src/models/responses.py
- src/utils/conversations.py
- src/utils/endpoints.py
- src/utils/responses.py
- src/utils/shields.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/unit/app/endpoints/test_conversations.py
- tests/unit/app/endpoints/test_responses.py
- tests/unit/app/endpoints/test_streaming_query.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/utils/conversations.py
- src/models/requests.py
c16088b to 7142479
```python
    llama_stack_conv_id,
)

# Use Conversations API to retrieve conversation items
```

Extracted to `get_all_conversation_items`
```python
    len(items),
    conversation_id,
)
# Retrieve turns metadata from database
```

Extracted to `retrieve_conversation_turns`
```python
stream: bool = False
temperature: Optional[float] = None
text: Optional[Text] = None
tool_choice: Optional[ToolChoice] = ToolChoiceMode.auto
```

This is the default in LLS.
```python
    None,  # provider specified as model prefix
    request.state.authorized_actions,
)
```

Conversation management logic extracted to utility function.
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 6
♻️ Duplicate comments (7)
docs/responses.md (1)
106-107: ⚠️ Potential issue | 🟡 Minor

Clarify `shield_ids=[]` behavior explicitly.

Please document that an empty list means "run no shields" (while omitted means "use all configured shields") to avoid client-side misinterpretation.
Also applies to: 523-524
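A hypothetical pair of client payloads illustrating the distinction the docs should spell out. The field names follow this PR's `/v1/responses` request shape; the `[]` semantics shown are this comment's reading, which is exactly what needs documenting.

```python
# Omitting shield_ids entirely -> LCORE runs every configured shield.
payload_all_shields = {
    "input": "How do I scale a deployment?",
}

# Explicit empty list -> run no shields (per this comment's reading).
payload_no_shields = {
    "input": "How do I scale a deployment?",
    "shield_ids": [],
}

# The two requests differ only in the presence of the key.
assert "shield_ids" not in payload_all_shields
assert payload_no_shields["shield_ids"] == []
```

Without this distinction documented, a client cannot tell "use defaults" apart from "disable shields".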
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/responses.md` around lines 106 - 107, Update the API docs for the `shield_ids` field to explicitly state the semantics of an empty list versus omission: document that `shield_ids=[]` means "run no shields" while omitting `shield_ids` (no key present) means "use all configured shields in LCORE"; apply this clarification to both occurrences of the `shield_ids` row (the existing table entry and the duplicate at lines referenced) so clients won't misinterpret an empty array as "use defaults."

src/app/endpoints/streaming_query.py (1)
303-319: ⚠️ Potential issue | 🟠 Major

Blocked moderation path still looks like a successful stream path.
This branch returns a regular generator, so downstream completion logic can still execute success-only side effects (notably topic-summary generation) for blocked turns.
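One way to make the failure explicit is a generator that raises as soon as it is consumed, sketched below. `ModerationBlockedError` and `moderation_blocked_stream` are names this review proposes, not existing project types.

```python
import asyncio


class ModerationBlockedError(Exception):
    """Signal that moderation blocked the turn; not a completed stream."""


async def moderation_blocked_stream(message: str):
    """Async generator that fails on first iteration instead of
    masquerading as a successfully completed stream."""
    raise ModerationBlockedError(message)
    yield  # unreachable; marks this function as an async generator


async def consume() -> str:
    try:
        async for _ in moderation_blocked_stream("prompt was blocked"):
            pass
    except ModerationBlockedError as exc:
        return str(exc)
    return "completed"


print(asyncio.run(consume()))  # prompt was blocked
```

Downstream, the completion logic would catch `ModerationBlockedError` and skip the success-only tail instead of treating the stream as finished.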
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 303 - 319, The blocked moderation branch currently returns a normal generator so downstream success-only completion logic can still run; change this to return a stream that signals failure immediately by returning an async generator which raises a specific error (e.g., ModerationBlockedError) on first iteration instead of a regular success stream. Update the code path that returns shield_violation_generator(...) (involving context.moderation_result, turn_summary, append_turn_items_to_conversation, MEDIA_TYPE_JSON) to return a generator that raises when consumed (or replace shield_violation_generator with a new async generator like moderation_blocked_stream that raises ModerationBlockedError) so downstream completion/summary logic will detect the failure and skip success-only side effects. Ensure the error type/message includes context.moderation_result.message and preserve the current media_type handling.

src/app/endpoints/query.py (1)
285-294: ⚠️ Potential issue | 🟠 Major

Don't let refusal persistence failures override the moderation response.
The blocked path should return the refusal even if conversation-item persistence fails. Right now, a storage/API failure can surface as 5xx and hide the moderation outcome.
Suggested fix
```diff
     if moderation_result.decision == "blocked":
-        await append_turn_items_to_conversation(
-            client,
-            responses_params.conversation,
-            responses_params.input,
-            [moderation_result.refusal_response],
-        )
+        try:
+            await append_turn_items_to_conversation(
+                client,
+                responses_params.conversation,
+                responses_params.input,
+                [moderation_result.refusal_response],
+            )
+        except HTTPException as exc:
+            logger.warning(
+                "Failed to persist blocked moderation turn for conversation %s: %s",
+                responses_params.conversation,
+                exc.detail,
+            )
         return TurnSummary(
             id=moderation_result.moderation_id, llm_response=moderation_result.message
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 285 - 294, The blocked-moderation branch currently calls append_turn_items_to_conversation and lets any persistence/API exception bubble up, which can turn a moderation refusal into a 5xx error; wrap the call to append_turn_items_to_conversation in a try/except (or try/except Exception) inside the moderation_result.decision == "blocked" branch, log the exception (don't re-raise), and always return the TurnSummary constructed from moderation_result (use moderation_result.moderation_id and moderation_result.message) so the refusal is returned even if persistence fails.

src/utils/conversations.py (1)
476-478: ⚠️ Potential issue | 🟠 Major

Preserve upstream API status/details instead of always returning generic 500.

Both exception blocks currently collapse every `APIStatusError` into the same internal error. That masks actionable upstream statuses (e.g., 4xx) and makes failures harder to diagnose and handle correctly.

Suggested fix

```diff
-    except APIStatusError as e:
-        error_response = InternalServerErrorResponse.generic()
-        raise HTTPException(**error_response.model_dump()) from e
+    except APIStatusError as e:
+        status_code = (
+            getattr(e, "status_code", None)
+            or getattr(getattr(e, "response", None), "status_code", 500)
+        )
+        detail = getattr(getattr(e, "response", None), "text", None) or str(e)
+        raise HTTPException(status_code=status_code, detail=detail) from e
```

Also applies to: 512-514
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/conversations.py` around lines 476 - 478, Currently APIStatusError is always converted to InternalServerErrorResponse.generic() before raising HTTPException; change the handler in src/utils/conversations.py to preserve the upstream status and details: in the except APIStatusError as e blocks extract the upstream HTTP status (e.g., e.status_code or e.response.status_code), any error body/message (e.g., e.response.json() or str(e)), and headers if present, then raise HTTPException(status_code=<upstream status>, detail=<upstream detail>, headers=<upstream headers>) instead of using InternalServerErrorResponse.generic(); apply the same change to the other APIStatusError handler around the 512-514 area so 4xx/5xx upstream info is propagated rather than always returning a generic 500.

src/app/endpoints/responses.py (2)
586-589: ⚠️ Potential issue | 🟠 Major

Skip topic-summary generation when moderation blocks the turn.

Both streaming and non-streaming flows still call `get_topic_summary()` for blocked requests. That causes avoidable model calls and can persist summaries for refused content.

🛡️ Guard summary generation by moderation outcome

```diff
-    return StreamingResponse(
+    should_generate_topic_summary = (
+        (request.generate_topic_summary or False)
+        and moderation_result.decision == "passed"
+    )
+    return StreamingResponse(
         generate_response(
             generator=generator,
             turn_summary=turn_summary,
             client=client,
             auth=auth,
             input_text=input_text,
             started_at=started_at,
             api_params=api_params,
-            generate_topic_summary=request.generate_topic_summary or False,
+            generate_topic_summary=should_generate_topic_summary,
         ),
         media_type="text/event-stream",
     )
```

```diff
-    if request.generate_topic_summary:
+    if request.generate_topic_summary and moderation_result.decision == "passed":
         logger.debug("Generating topic summary for new conversation")
         topic_summary = await get_topic_summary(input_text, client, api_params.model)
```

Also applies to: 693-696
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 586 - 589, The topic-summary generation is being triggered even for moderation-blocked turns; before calling get_topic_summary (in the generate_topic_summary branch around where topic_summary is set), check the moderation outcome (e.g., moderation or moderation_result) and skip calling get_topic_summary when the request is blocked/refused; apply the same guard to the other occurrence (the second generate_topic_summary block around lines 693-696) so summaries are not created for moderated content.
98-100: ⚠️ Potential issue | 🟠 Major

Document the SSE 200 variant in OpenAPI.

`/responses` can return `text/event-stream` (when `stream=true`), but 200 currently advertises only JSON. Client generation and docs will be inaccurate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/responses.py` around lines 98 - 100, The OpenAPI responses map currently only registers the JSON 200 response via ResponsesResponse.openapi_response() causing docs/clients to miss the SSE variant; update the 200 entry in responses_response to include both the JSON and the text/event-stream variants (e.g. add the streaming SSE response from the streaming response helper, such as a ResponsesStreamResponse.openapi_response() or by extending ResponsesResponse.openapi_response() to include "text/event-stream" media type), ensuring the 200 response object lists the "application/json" and "text/event-stream" content entries so the /responses endpoint advertises the SSE (stream=true) variant.

src/models/requests.py (1)
737-743: ⚠️ Potential issue | 🟠 Major

`previous_response_id` validation blocks valid blocked-turn follow-ups.

Blocked responses are emitted with moderation IDs (src/app/endpoints/responses.py, Line 378 and Line 635). Rejecting IDs with the "modr" prefix here breaks `previous_response_id=response.id` flows after moderation.

🔧 Suggested fix

```diff
-    @field_validator("previous_response_id")
-    @classmethod
-    def check_previous_response_id(cls, value: Optional[str]) -> Optional[str]:
-        """Validate that previous_response_id does not start with 'modr'."""
-        if value is not None and value.startswith("modr"):
-            raise ValueError("You cannot provide context by moderation response.")
-        return value
+    @field_validator("previous_response_id")
+    @classmethod
+    def check_previous_response_id(cls, value: Optional[str]) -> Optional[str]:
+        """Validate previous_response_id format."""
+        return value
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/requests.py` around lines 737 - 743, The current validator check_previous_response_id rejects any previous_response_id starting with "modr", which prevents legitimate follow-ups that reference moderation-generated response IDs; update the validator on the previous_response_id field in the same class so it no longer raises for values that start with "modr" (i.e., remove or relax the startswith("modr") rejection) while preserving any other existing validations and return the value unchanged. Ensure you only change the logic inside check_previous_response_id and do not alter the field name or decorator so other code referencing previous_response_id continues to work.
🧹 Nitpick comments (7)
tests/e2e/test_list.txt (1)
1-20: Consider alphabetizing the test list for easier maintenance.

The test list is not currently in alphabetical order. While not a blocker, maintaining alphabetical order would make it easier to navigate and prevent duplicate entries in the future.
📝 Alphabetically sorted version
```diff
 features/authorized_noop.feature
 features/authorized_noop_token.feature
 features/authorized_rh_identity.feature
+features/conversation_cache_v2.feature
+features/conversations.feature
 features/faiss.feature
+features/feedback.feature
+features/health.feature
+features/info.feature
+features/mcp.feature
+features/mcp_file_auth.feature
+features/models.feature
+features/query.feature
+features/rbac.feature
+features/responses.feature
+features/rest_api.feature
+features/rlsapi_v1.feature
+features/rlsapi_v1_errors.feature
 features/smoketests.feature
-features/authorized_noop.feature
-features/authorized_noop_token.feature
-features/authorized_rh_identity.feature
-features/rbac.feature
-features/conversations.feature
-features/conversation_cache_v2.feature
-features/feedback.feature
-features/health.feature
-features/info.feature
-features/responses.feature
-features/query.feature
-features/rlsapi_v1.feature
-features/rlsapi_v1_errors.feature
 features/streaming_query.feature
-features/rest_api.feature
-features/mcp.feature
-features/mcp_file_auth.feature
-features/models.feature
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/test_list.txt` around lines 1 - 20, The test list is unsorted; open the test_list file and alphabetize the lines containing the feature file entries (the lines like "features/faiss.feature", "features/smoketests.feature", etc.), remove any exact duplicate entries, and save with a single trailing newline preserved; ensure the ordering matches ASCII alphabetical order so entries like "authorized_*" appear before "conversations.feature" and verify no other non-feature lines are reordered or removed.
src/utils/suid.py (1)
108-113: Align this new utility docstring with Google-style sections. Please add explicit `Parameters` and `Returns` sections for consistency with project docstring standards. As per coding guidelines, "Follow Google Python docstring conventions ... with Parameters, Returns, Raises, and Attributes sections".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/suid.py` around lines 108 - 113, Update the docstring for the is_moderation_id function to follow Google-style sections: add a Parameters section documenting the suid: str argument and its meaning, and add a Returns section describing the returned bool value (True when suid starts with 'modr', False otherwise); keep the one-line summary and ensure types and descriptions match existing project conventions in the is_moderation_id docstring.
docs/responses.md (1)
28-35: Add the new “Implicit Conversation Management” section to the TOC. The section exists now, but it’s not listed in the table of contents.
Also applies to: 535-535
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/responses.md` around lines 28 - 35, TOC is missing the new "Implicit Conversation Management" heading; add an entry linking to that section (use the exact anchor "#implicit-conversation-management") into the existing list of top-level items (near the entries like "Streaming Request" and "Error Handling") so the table of contents includes "Implicit Conversation Management" with proper nesting consistent with other entries.
src/models/responses.py (1)
1565-1576: Fail fast when JSON examples are missing in `openapi_response()`. The current fallback to `{}` can silently degrade generated docs. Consider raising `SchemaError` (same pattern as `AbstractSuccessfulResponse`) when the primary JSON example is absent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/responses.py` around lines 1565 - 1576, openapi_response currently falls back to an empty JSON example when model_json_schema() has no "examples", which should instead raise a SchemaError like AbstractSuccessfulResponse does; update openapi_response to check json_example after computing model_json_schema()/model_examples and if missing raise SchemaError with a clear message (include cls.__name__ or the model identifier) rather than using an empty dict, and keep the rest of the logic (schema_extra / sse_example / content assembly) unchanged so SSE handling remains intact.
src/models/database/conversations.py (1)
34-35: Use optional typing for nullable columns. `nullable=True` columns are currently typed as non-null `Mapped[str]`, which is inconsistent with runtime values and weakens type safety.
Suggested typing fix

```diff
+from typing import Optional
@@
- last_response_id: Mapped[str] = mapped_column(nullable=True)
+ last_response_id: Mapped[Optional[str]] = mapped_column(nullable=True)
@@
- response_id: Mapped[str] = mapped_column(nullable=True, index=True)
+ response_id: Mapped[Optional[str]] = mapped_column(nullable=True, index=True)
```

As per coding guidelines, "Include complete type annotations for all class attributes".
Also applies to: 71-73
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/database/conversations.py` around lines 34 - 35, The nullable columns in the Conversations model (e.g., last_response_id and the other attributes marked nullable=True around lines 71-73) are typed as Mapped[str] but must be optional; change their annotations to Mapped[Optional[str]] (or Mapped[Optional[Type]] for non-str columns), import Optional from typing, and update each attribute declaration that has nullable=True to use Optional in its Mapped[...] type so the type hints match runtime nullability.
tests/unit/utils/test_endpoints.py (1)
456-716: Add one explicit mutual-exclusivity test for context strategies. This suite is comprehensive; one extra case where both `conversation_id` and `previous_response_id` are provided would lock in the contract and prevent regressions around strategy precedence/validation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/utils/test_endpoints.py` around lines 456 - 716, Add a unit test that calls resolve_response_context with both conversation_id and previous_response_id set and asserts the function enforces mutual exclusivity by raising an HTTPException (status_code 400); mock AsyncLlamaStackClientHolder as in other tests, call endpoints.resolve_response_context(user_id="user-1", others_allowed=False, conversation_id="conv-raw", previous_response_id="resp-123", generate_topic_summary=None) and assert exc_info.value.status_code == 400 and that the exception detail mentions both "conversation_id" and "previous_response_id" to lock in the validation contract.
src/utils/endpoints.py (1)
255-260: Avoid double DB lookup for `previous_response_id`. `check_turn_existence()` and `retrieve_turn_by_response_id()` query the same row. `retrieve_turn_by_response_id()` already handles missing rows with 404, so the pre-check is redundant.
♻️ Simplify to single lookup

```diff
-    if not check_turn_existence(previous_response_id):
-        error_response = NotFoundResponse(
-            resource="response", resource_id=previous_response_id
-        )
-        raise HTTPException(**error_response.model_dump())
-    prev_user_turn = retrieve_turn_by_response_id(previous_response_id)
+    prev_user_turn = retrieve_turn_by_response_id(previous_response_id)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/endpoints.py` around lines 255 - 260, Remove the redundant pre-check: stop calling check_turn_existence(previous_response_id) before retrieve_turn_by_response_id(previous_response_id) since retrieve_turn_by_response_id already raises a 404; instead directly call retrieve_turn_by_response_id(previous_response_id) and let it raise the HTTPException/NotFoundResponse as needed, removing the NotFoundResponse construction and raise around the check to avoid the double DB lookup and duplicated logic (refer to functions check_turn_existence, retrieve_turn_by_response_id and variable previous_response_id).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Line 12: ResponsesRequest.tool_choice is an OpenAIResponseInputToolChoiceMode
enum, but the code imports and compares against ToolChoice (from
llama_stack_api) so the check in the no_tools logic (where ToolChoice.none is
used) always fails; fix by importing and using the matching enum
OpenAIResponseInputToolChoiceMode (or replace the comparison with
OpenAIResponseInputToolChoiceMode.none) and update the import line that
currently brings in ToolChoice to instead import
OpenAIResponseInputToolChoiceMode (or add it) so the comparison in the no_tools
check and any other tool_choice comparisons use the same enum type.
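The failure mode described above can be demonstrated with plain-`Enum` stand-ins (the real types live in the request model and `llama_stack_api`; these two classes are assumptions for illustration). Members of distinct `Enum` classes never compare equal, which is why the `no_tools` guard could not fire:

```python
from enum import Enum


class OpenAIResponseInputToolChoiceMode(Enum):
    """Stand-in for the enum that ResponsesRequest.tool_choice actually carries."""

    none = "none"
    auto = "auto"


class ToolChoice(Enum):
    """Stand-in for the mismatched enum the handler was importing."""

    none = "none"
    auto = "auto"


tool_choice = OpenAIResponseInputToolChoiceMode.none

# Cross-enum comparison is always False, so a branch guarded by
# `tool_choice == ToolChoice.none` silently never executes:
mismatched = tool_choice == ToolChoice.none

# Comparing against the matching enum type behaves as intended:
matched = tool_choice == OpenAIResponseInputToolChoiceMode.none
```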
In `@src/models/database/conversations.py`:
- Around line 71-73: Make response_id unique so a previous_response_id maps to
exactly one turn: update the Conversation model's response_id mapped_column to
enforce uniqueness (e.g., add unique=True or add a UniqueConstraint on
response_id) while keeping nullable semantics, and add a corresponding DB
migration that handles existing duplicates/nulls (clean or dedupe rows before
applying the constraint). Target symbols: response_id (mapped_column on the
Conversation model) and any migration script you generate to alter the schema.
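Before the unique constraint can be applied, existing duplicate `response_id` values must be cleaned up. A minimal sketch of that dedupe pass, with rows modeled as dicts (an assumed shape; a real migration would operate on the table directly):

```python
def dedupe_response_ids(rows):
    """Null out duplicate response_id values so a UNIQUE constraint can apply.

    Keeps the first occurrence of each non-null ID and clears later repeats.
    NULL values are left alone since they do not conflict with uniqueness.
    """
    seen = set()
    for row in rows:
        rid = row.get("response_id")
        if rid is None:
            continue
        if rid in seen:
            row["response_id"] = None  # duplicate: clear before migrating
        else:
            seen.add(rid)
    return rows
```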
In `@src/utils/query.py`:
- Around line 294-295: The code is persisting summary.id directly which allows
empty strings to be stored into UserTurn.response_id and propagate to
UserConversation.last_response_id; update the logic around where
response_id=summary.id is set (and the other occurrences noted) to first
normalize and guard: treat empty/whitespace summary.id as None/NULL (or skip
updating) and only write/update UserTurn.response_id and
UserConversation.last_response_id when summary.id is a non-empty valid string;
apply this check in the functions/methods that set response_id (locations
handling summary.id assignment and any update_last_response_id logic) so empty
values are never persisted.
In `@tests/unit/app/endpoints/test_conversations.py`:
- Around line 588-592: The current assertions only check that
exc_info.value.status_code equals status.HTTP_500_INTERNAL_SERVER_ERROR and that
"response" is present in detail, which is too permissive; update the assertions
around exc_info and detail to validate the response shape/content more strictly
by asserting that detail["response"] is a dict (or expected type) and contains
expected keys/values (for example assert "message" or "error" in
detail["response"] and that detail["response"]["message"] contains the expected
error substring or code), and/or assert a specific substring in
str(detail["response"]) to ensure the failure is the one intended; make these
changes where exc_info, detail, and status.HTTP_500_INTERNAL_SERVER_ERROR are
used so regressions are caught.
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 764-767: The test uses a regular assert_called_once on an
AsyncMock (append_turn_items_to_conversation) which can pass even if not
awaited; switch to the await-aware assertion (use
mock_append.assert_awaited_once(), a regular synchronous method on AsyncMock)
and read its awaited call args via
mock_append.await_args (e.g., positional args = mock_append.await_args[0])
instead of mock_append.call_args; apply the same change for the other occurrence
of this AsyncMock in the file.
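The distinction matters because `assert_awaited_once` fails if the coroutine was created but never actually awaited, unlike `assert_called_once`. A self-contained sketch (the helper name mirrors the mocked function; the driver is a stand-in for the code under test):

```python
import asyncio
from unittest.mock import AsyncMock


async def code_under_test(append_turn_items_to_conversation):
    # Stand-in for the production path that awaits the persistence helper.
    await append_turn_items_to_conversation("conv-1", ["item"])


mock_append = AsyncMock()
asyncio.run(code_under_test(mock_append))

# assert_awaited_once is a synchronous method; it raises AssertionError
# unless the mock was awaited exactly once.
mock_append.assert_awaited_once()

# await_args holds the awaited call; index [0] is the positional args tuple.
awaited_args = mock_append.await_args[0]
```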
In `@tests/unit/utils/test_shields.py`:
- Around line 619-745: The TestAppendTurnItemsRefusalCases class is a duplicate
that only calls get_shields_for_request instead of testing
append_turn_items_to_conversation; remove or replace it: either delete the
entire TestAppendTurnItemsRefusalCases class (and its methods) to avoid
redundant tests, or rewrite the class to actually exercise
append_turn_items_to_conversation by creating fixtures/mocks for a conversation
turn, a single output item marked as a refusal, and asserting the conversation
is updated accordingly; reference the class name TestAppendTurnItemsRefusalCases
and the function append_turn_items_to_conversation to locate where to change
tests, and remove usages of get_shields_for_request within that class.
---
Duplicate comments:
In `@docs/responses.md`:
- Around line 106-107: Update the API docs for the `shield_ids` field to
explicitly state the semantics of an empty list versus omission: document that
`shield_ids=[]` means "run no shields" while omitting `shield_ids` (no key
present) means "use all configured shields in LCORE"; apply this clarification
to both occurrences of the `shield_ids` row (the existing table entry and the
duplicate at lines referenced) so clients won't misinterpret an empty array as
"use defaults."
In `@src/app/endpoints/query.py`:
- Around line 285-294: The blocked-moderation branch currently calls
append_turn_items_to_conversation and lets any persistence/API exception bubble
up, which can turn a moderation refusal into a 5xx error; wrap the call to
append_turn_items_to_conversation in a try/except (or try/except Exception)
inside the moderation_result.decision == "blocked" branch, log the exception
(don't re-raise), and always return the TurnSummary constructed from
moderation_result (use moderation_result.moderation_id and
moderation_result.message) so the refusal is returned even if persistence fails.
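The guard described above can be sketched as follows, with `moderation_result` and the returned `TurnSummary` modeled as plain dicts (assumed shapes; the function and key names are illustrative, not the production signatures):

```python
import logging

logger = logging.getLogger(__name__)


def handle_blocked(moderation_result, persist_turn):
    """Return the refusal summary even when persistence fails.

    persist_turn stands in for append_turn_items_to_conversation; any
    exception it raises is logged but never turned into a 5xx response.
    """
    try:
        persist_turn(moderation_result)
    except Exception:  # persistence must not mask the moderation refusal
        logger.exception("Failed to persist blocked turn")
    return {
        "id": moderation_result["moderation_id"],
        "message": moderation_result["message"],
    }


def failing_persist(_result):
    raise RuntimeError("database unavailable")


summary = handle_blocked(
    {"moderation_id": "modr-1", "message": "Request blocked by shield"},
    failing_persist,
)
```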
In `@src/app/endpoints/responses.py`:
- Around line 586-589: The topic-summary generation is being triggered even for
moderation-blocked turns; before calling get_topic_summary (in the
generate_topic_summary branch around where topic_summary is set), check the
moderation outcome (e.g., moderation or moderation_result) and skip calling
get_topic_summary when the request is blocked/refused; apply the same guard to
the other occurrence (the second generate_topic_summary block around lines
693-696) so summaries are not created for moderated content.
- Around line 98-100: The OpenAPI responses map currently only registers the
JSON 200 response via ResponsesResponse.openapi_response() causing docs/clients
to miss the SSE variant; update the 200 entry in responses_response to include
both the JSON and the text/event-stream variants (e.g. add the streaming SSE
response from the streaming response helper, such as a
ResponsesStreamResponse.openapi_response() or by extending
ResponsesResponse.openapi_response() to include "text/event-stream" media type),
ensuring the 200 response object lists the "application/json" and
"text/event-stream" content entries so the /responses endpoint advertises the
SSE (stream=true) variant.
In `@src/app/endpoints/streaming_query.py`:
- Around line 303-319: The blocked moderation branch currently returns a normal
generator so downstream success-only completion logic can still run; change this
to return a stream that signals failure immediately by returning an async
generator which raises a specific error (e.g., ModerationBlockedError) on first
iteration instead of a regular success stream. Update the code path that returns
shield_violation_generator(...) (involving context.moderation_result,
turn_summary, append_turn_items_to_conversation, MEDIA_TYPE_JSON) to return a
generator that raises when consumed (or replace shield_violation_generator with
a new async generator like moderation_blocked_stream that raises
ModerationBlockedError) so downstream completion/summary logic will detect the
failure and skip success-only side effects. Ensure the error type/message
includes context.moderation_result.message and preserve the current media_type
handling.
In `@src/models/requests.py`:
- Around line 737-743: The current validator check_previous_response_id rejects
any previous_response_id starting with "modr", which prevents legitimate
follow-ups that reference moderation-generated response IDs; update the
validator on the previous_response_id field in the same class so it no longer
raises for values that start with "modr" (i.e., remove or relax the
startswith("modr") rejection) while preserving any other existing validations
and return the value unchanged. Ensure you only change the logic inside
check_previous_response_id and do not alter the field name or decorator so other
code referencing previous_response_id continues to work.
In `@src/utils/conversations.py`:
- Around line 476-478: Currently APIStatusError is always converted to
InternalServerErrorResponse.generic() before raising HTTPException; change the
handler in src/utils/conversations.py to preserve the upstream status and
details: in the except APIStatusError as e blocks extract the upstream HTTP
status (e.g., e.status_code or e.response.status_code), any error body/message
(e.g., e.response.json() or str(e)), and headers if present, then raise
HTTPException(status_code=<upstream status>, detail=<upstream detail>,
headers=<upstream headers>) instead of using
InternalServerErrorResponse.generic(); apply the same change to the other
APIStatusError handler around the 512-514 area so 4xx/5xx upstream info is
propagated rather than always returning a generic 500.
---
Nitpick comments:
In `@docs/responses.md`:
- Around line 28-35: TOC is missing the new "Implicit Conversation Management"
heading; add an entry linking to that section (use the exact anchor
"#implicit-conversation-management") into the existing list of top-level items
(near the entries like "Streaming Request" and "Error Handling") so the table of
contents includes "Implicit Conversation Management" with proper nesting
consistent with other entries.
In `@src/models/database/conversations.py`:
- Around line 34-35: The nullable columns in the Conversations model (e.g.,
last_response_id and the other attributes marked nullable=True around lines
71-73) are typed as Mapped[str] but must be optional; change their annotations
to Mapped[Optional[str]] (or Mapped[Optional[Type]] for non-str columns), import
Optional from typing, and update each attribute declaration that has
nullable=True to use Optional in its Mapped[...] type so the type hints match
runtime nullability.
In `@src/models/responses.py`:
- Around line 1565-1576: openapi_response currently falls back to an empty JSON
example when model_json_schema() has no "examples", which should instead raise a
SchemaError like AbstractSuccessfulResponse does; update openapi_response to
check json_example after computing model_json_schema()/model_examples and if
missing raise SchemaError with a clear message (include cls.__name__ or the
model identifier) rather than using an empty dict, and keep the rest of the
logic (schema_extra / sse_example / content assembly) unchanged so SSE handling
remains intact.
In `@src/utils/endpoints.py`:
- Around line 255-260: Remove the redundant pre-check: stop calling
check_turn_existence(previous_response_id) before
retrieve_turn_by_response_id(previous_response_id) since
retrieve_turn_by_response_id already raises a 404; instead directly call
retrieve_turn_by_response_id(previous_response_id) and let it raise the
HTTPException/NotFoundResponse as needed, removing the NotFoundResponse
construction and raise around the check to avoid the double DB lookup and
duplicated logic (refer to functions check_turn_existence,
retrieve_turn_by_response_id and variable previous_response_id).
In `@src/utils/suid.py`:
- Around line 108-113: Update the docstring for the is_moderation_id function to
follow Google-style sections: add a Parameters section documenting the suid: str
argument and its meaning, and add a Returns section describing the returned bool
value (True when suid starts with 'modr', False otherwise); keep the one-line
summary and ensure types and descriptions match existing project conventions in
the is_moderation_id docstring.
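Per the source, the function simply checks the "modr" prefix; the requested Google-style docstring could look like this (a sketch of the documentation change only):

```python
def is_moderation_id(suid: str) -> bool:
    """Check whether an ID was generated by shield moderation.

    Parameters:
        suid: The string ID to inspect.

    Returns:
        bool: True when the ID starts with the "modr" prefix, False otherwise.
    """
    return suid.startswith("modr")
```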
In `@tests/e2e/test_list.txt`:
- Around line 1-20: The test list is unsorted; open the test_list file and
alphabetize the lines containing the feature file entries (the lines like
"features/faiss.feature", "features/smoketests.feature", etc.), remove any exact
duplicate entries, and save with a single trailing newline preserved; ensure the
ordering matches ASCII alphabetical order so entries like "authorized_*" appear
before "conversations.feature" and verify no other non-feature lines are
reordered or removed.
In `@tests/unit/utils/test_endpoints.py`:
- Around line 456-716: Add a unit test that calls resolve_response_context with
both conversation_id and previous_response_id set and asserts the function
enforces mutual exclusivity by raising an HTTPException (status_code 400); mock
AsyncLlamaStackClientHolder as in other tests, call
endpoints.resolve_response_context(user_id="user-1", others_allowed=False,
conversation_id="conv-raw", previous_response_id="resp-123",
generate_topic_summary=None) and assert exc_info.value.status_code == 400 and
that the exception detail mentions both "conversation_id" and
"previous_response_id" to lock in the validation contract.
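The validation contract the suggested test would pin down can be sketched in isolation (a simplified stand-in for the check `resolve_response_context` is expected to perform; the real function raises `HTTPException(status_code=400)` rather than `ValueError`):

```python
def resolve_context_strategy(conversation_id, previous_response_id):
    """Enforce that the two context strategies are mutually exclusive."""
    if conversation_id is not None and previous_response_id is not None:
        raise ValueError(
            "conversation_id and previous_response_id are mutually exclusive"
        )
    return conversation_id or previous_response_id


# The suggested unit test locks in exactly this behavior:
try:
    resolve_context_strategy("conv-raw", "resp-123")
    blocked = False
except ValueError:
    blocked = True
```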
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0c721055-72b9-4ba1-b1da-06563442cd90
📒 Files selected for processing (29)
docs/openapi.json
docs/responses.md
src/app/endpoints/conversations_v1.py
src/app/endpoints/query.py
src/app/endpoints/responses.py
src/app/endpoints/streaming_query.py
src/app/routers.py
src/models/database/conversations.py
src/models/requests.py
src/models/responses.py
src/utils/conversations.py
src/utils/endpoints.py
src/utils/query.py
src/utils/responses.py
src/utils/shields.py
src/utils/suid.py
src/utils/types.py
src/utils/vector_search.py
tests/e2e/features/responses.feature
tests/e2e/test_list.txt
tests/integration/test_openapi_json.py
tests/unit/app/endpoints/test_conversations.py
tests/unit/app/endpoints/test_responses.py
tests/unit/app/test_routers.py
tests/unit/utils/test_conversations.py
tests/unit/utils/test_endpoints.py
tests/unit/utils/test_query.py
tests/unit/utils/test_responses.py
tests/unit/utils/test_shields.py
💤 Files with no reviewable changes (1)
- src/utils/shields.py
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/e2e/features/responses.feature
- src/utils/vector_search.py
- tests/unit/utils/test_query.py
- tests/unit/utils/test_conversations.py
tisnik
left a comment
LGTM (but for the record, there is a high probability that some problems were not found)
are-ces
left a comment
Great work. Left a couple of nits; as we discussed, there is also a small bug to fix in the tool definition in the response.
src/utils/vector_search.py
Outdated
```python
    return rag_chunks


def append_inline_rag_context_to_input(
```
Nit: I would change the name to indicate that this function is for the responses API
src/app/endpoints/responses.py
Outdated
```python
    responses_request.solr,
)
if moderation_result.decision == "passed":
    responses_request.input = append_inline_rag_context_to_input(
```
Should we use this method also for the query and streaming_query endpoints?
Yes, but it will require further refactoring as queries append inline RAG chunks and attachments at the same place. This is IMO out of scope of this PR
Force-pushed 7142479 to 180dd49
Description
This PR introduces a fully functional POST /v1/responses API endpoint compatible with the OpenResponses specification. It integrates with existing LCORE capabilities including conversation tracking, shields, streaming support, RAG, MCP tools, topic summaries, and quota management.
The implementation also introduces implicit conversation management so that every response is associated with a conversation and can be inspected via the Conversations API.
Users can provide context to the LLM using one of the following mutually exclusive strategies:
In LCORE, a conversation is modeled as a linear chain of user turns (user query + LLM response), and every response belongs to exactly one conversation. Supporting previous_response_id as a context mechanism introduces branching semantics that would break this linearity if handled naively. To preserve a consistent conversation model, implicit conversation management applies the following rules:
Important: The conversation entity acts as a structural and inspection layer. If a user wants to provide full context from a forked conversation, they can use previous_response_id, ensuring that the full historical context is preserved. This design preserves strict linear conversations while enabling exploration of all generated responses.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation