LCORE-1409: Refactor of shield moderation and inline RAG content persistence #1291
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough

Moderation is run early and its `ShieldModerationResult` is threaded through the query and streaming endpoints. If blocked, handlers append a refusal turn and return a moderation-driven `TurnSummary`; if passed, inline and tool RAG contexts are built/merged and normal response generation proceeds.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/User
    participant Endpoint as Query/Streaming Endpoint
    participant Shield as ShieldService
    participant Vector as VectorSearch
    participant Tools as ToolRAG
    participant Responses as ResponsesAPI
    participant Conv as Conversations

    Client->>Endpoint: Send user query
    Endpoint->>Shield: run_shield_moderation(query)
    Shield-->>Endpoint: moderation_result {decision, message, ids}
    alt moderation_result.decision == "blocked"
        Endpoint->>Conv: append_turn_items_to_conversation(user_input, refusal)
        Endpoint-->>Client: Return TurnSummary (moderation message)
    else decision == "passed"
        Endpoint->>Vector: build_rag_context(moderation_decision="passed", query)
        Vector-->>Endpoint: inline_rag_context
        Endpoint->>Tools: fetch tool-based RAG (if tools configured)
        Tools-->>Endpoint: tool_rag_results
        Endpoint->>Responses: prepare_responses_params(inline_context.context_text, tools)
        Responses-->>Endpoint: LLM response + referenced_documents
        Endpoint->>Conv: append_turn_items_to_conversation(user_input, llm_output)
        Endpoint-->>Client: Return TurnSummary (id, response, merged referenced_documents)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ Passed checks (3 passed)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/unit/utils/test_vector_search.py (1)
455-500: ⚠️ Potential issue | 🟡 Minor
Add a regression test for the new `"blocked"` branch.

These updates only cover the `"passed"` path, but `build_rag_context()` now has a moderation short-circuit that skips inline RAG entirely. Please add a case that passes `"blocked"` and asserts the returned `RAGContext` is empty and `_fetch_byok_rag()`/`_fetch_solr_rag()` are not called.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/utils/test_vector_search.py` around lines 455 - 500, Add a new async test that calls build_rag_context(client_mock, "blocked", "any query", None) to cover the moderation short-circuit branch, mock the configuration to enable inline BYOK and inline Solr as in the other tests, and assert the returned RAGContext has empty context_text, rag_chunks, and referenced_documents; also assert that the helper methods _fetch_byok_rag and _fetch_solr_rag (mocked or spied on) were not called. Use the same test file tests/unit/utils/test_vector_search.py and name the test something like test_moderation_blocked_short_circuits_inline; reference build_rag_context, _fetch_byok_rag, and _fetch_solr_rag when locating the code to mock/spy.

src/app/endpoints/query.py (1)
198-221: ⚠️ Potential issue | 🔴 Critical
Do not generate topic summaries for blocked moderation results.

After `retrieve_response()` returns a blocked `TurnSummary`, this handler still falls through to the topic-summary branch for new conversations. That sends rejected input to another model call and weakens the shield.

🛡️ Minimal guard

```diff
-    # Get topic summary for new conversation
-    if not user_conversation and query_request.generate_topic_summary:
+    # Get topic summary only when the request actually reached the model
+    if (
+        moderation_result.decision == "passed"
+        and not user_conversation
+        and query_request.generate_topic_summary
+    ):
         logger.debug("Generating topic summary for new conversation")
         topic_summary = await get_topic_summary(
             query_request.query, client, responses_params.model
         )
     else:
         topic_summary = None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 198 - 221, The handler currently calls get_topic_summary for new conversations even when retrieve_response returned a blocked moderation_result; update the guard so topic summaries are only generated when the moderation result passed. Concretely, change the condition that uses user_conversation and query_request.generate_topic_summary to also require moderation_result.decision == "passed" (references: retrieve_response, moderation_result, query_request.generate_topic_summary, get_topic_summary, user_conversation) so blocked or rejected inputs do not trigger an extra model call — otherwise set topic_summary = None.

src/utils/shields.py (1)
131-156: ⚠️ Potential issue | 🟠 Major
Skip `models.list()` when no shields will run, and translate its failures too.

`shield_ids=[]` is documented as "skip all shields", but this path still makes a backend `client.models.list()` call before returning. That adds an unnecessary dependency for a no-op moderation request, and any `APIConnectionError`/`APIStatusError` from model discovery bypasses the new HTTPException mapping entirely.

🛠️ Minimal fix

```diff
     shields_to_run = await get_shields_for_request(client, shield_ids)
-    available_models = {model.id for model in await client.models.list()}
+    if not shields_to_run:
+        return ShieldModerationPassed()
+
+    try:
+        available_models = {model.id for model in await client.models.list()}
+    except APIConnectionError as e:
+        error_response = ServiceUnavailableResponse(
+            backend_name="Llama Stack",
+            cause=str(e),
+        )
+        raise HTTPException(**error_response.model_dump()) from e
+    except APIStatusError as e:
+        error_response = InternalServerErrorResponse.generic()
+        raise HTTPException(**error_response.model_dump()) from e
```

As per coding guidelines, `src/**/*.py`: Handle `APIConnectionError` from Llama Stack in API operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 131 - 156, The code calls client.models.list() unconditionally which triggers unnecessary backend calls and can raise APIConnectionError/APIStatusError before we know shields_to_run; change logic in get_shields_for_request / the loop where shields_to_run is computed so that you only call client.models.list() if shields_to_run is non-empty (i.e., after awaiting get_shields_for_request and confirming shields_to_run), and wrap the client.models.list() call in the same exception handling used for client.moderations.create (catch APIConnectionError and APIStatusError and translate them into ServiceUnavailableResponse / InternalServerErrorResponse HTTPExceptions) so failures from model discovery are mapped consistently; reference symbols: shields_to_run, get_shields_for_request, client.models.list, available_models, moderation_result, client.moderations.create.
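The two-part fix above (return early, then map model-discovery failures) can be sketched as follows. This is a hedged, self-contained illustration only: the exception classes, the `HTTPException` shim, the status codes, and the dict return value are all stand-ins for the project's real `ShieldModerationPassed` / `ServiceUnavailableResponse` / `InternalServerErrorResponse` types, not the actual implementation.

```python
# Stand-in exception/response types; the real ones live in the project.
class APIConnectionError(Exception):
    pass

class APIStatusError(Exception):
    pass

class HTTPException(Exception):
    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code

async def run_shield_moderation(client, shield_ids):
    """Skip the backend entirely when there are no shields to run."""
    shields_to_run = shield_ids or []
    if not shields_to_run:
        # shield_ids=[] is documented as "skip all shields": no model
        # discovery, no moderation call, just a passed result.
        return {"decision": "passed"}

    try:
        available_models = {m.id for m in await client.models.list()}
    except APIConnectionError as e:
        # Same 503 mapping the moderation call itself already uses.
        raise HTTPException(503, f"Llama Stack unavailable: {e}") from e
    except APIStatusError as e:
        raise HTTPException(500, "Internal server error") from e

    # ...continue: run each shield, checking against available_models.
    return {"decision": "passed", "models": available_models}
```

With this shape, a no-op moderation request never touches the backend, and a discovery failure surfaces as the same mapped HTTP error as a moderation failure.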
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 304-320: The branch handling context.moderation_result.decision ==
"blocked" must short-circuit post-stream processing instead of returning a
generator that the normal completion path will treat as a model stream; mark the
request as blocked (e.g., set a blocked flag on context or responses_params, or
set a field on turn_summary) before returning the shield_violation_generator,
and update the caller(s) such as generate_response and any post-processing
functions like get_topic_summary to check that blocked flag and skip
model-backed post-processing when true so blocked streams never flow through the
normal completion wrapper.
In `@src/utils/shields.py`:
- Around line 256-259: The f-string in the NotFoundResponse call is invalid due
to unescaped nested double quotes; update the resource argument in the
NotFoundResponse call (the line constructing resource=f"Shield{"s" if
len(missing) > 1 else ""}") to use a valid string expression—e.g., use single
quotes inside the expression or concatenate ("Shield" + ("s" if len(missing) > 1
else ""))—so the NotFoundResponse invocation compiles correctly; keep references
to NotFoundResponse and the missing variable unchanged.
In `@src/utils/types.py`:
- Line 330: build_turn_summary currently returns a TurnSummary without copying
response.id, so only blocked turns get the new TurnSummary.id while successful
turns remain empty; update src/utils/responses.py function build_turn_summary to
set TurnSummary.id = response.id (or pass id into the TurnSummary constructor)
whenever a response has an id, ensuring both blocked and non-blocked paths
populate the id field on the returned TurnSummary object.
---
Outside diff comments:
In `@src/app/endpoints/query.py`:
- Around line 198-221: The handler currently calls get_topic_summary for new
conversations even when retrieve_response returned a blocked moderation_result;
update the guard so topic summaries are only generated when the moderation
result passed. Concretely, change the condition that uses user_conversation and
query_request.generate_topic_summary to also require moderation_result.decision
== "passed" (references: retrieve_response, moderation_result,
query_request.generate_topic_summary, get_topic_summary, user_conversation) so
blocked or rejected inputs do not trigger an extra model call — otherwise set
topic_summary = None.
In `@src/utils/shields.py`:
- Around line 131-156: The code calls client.models.list() unconditionally which
triggers unnecessary backend calls and can raise
APIConnectionError/APIStatusError before we know shields_to_run; change logic in
get_shields_for_request / the loop where shields_to_run is computed so that you
only call client.models.list() if shields_to_run is non-empty (i.e., after
awaiting get_shields_for_request and confirming shields_to_run), and wrap the
client.models.list() call in the same exception handling used for
client.moderations.create (catch APIConnectionError and APIStatusError and
translate them into ServiceUnavailableResponse / InternalServerErrorResponse
HTTPExceptions) so failures from model discovery are mapped consistently;
reference symbols: shields_to_run, get_shields_for_request, client.models.list,
available_models, moderation_result, client.moderations.create.
In `@tests/unit/utils/test_vector_search.py`:
- Around line 455-500: Add a new async test that calls
build_rag_context(client_mock, "blocked", "any query", None) to cover the
moderation short-circuit branch, mock the configuration to enable inline BYOK
and inline Solr as in the other tests, and assert the returned RAGContext has
empty context_text, rag_chunks, and referenced_documents; also assert that the
helper methods _fetch_byok_rag and _fetch_solr_rag (mocked or spied on) were not
called. Use the same test file tests/unit/utils/test_vector_search.py and name
the test something like test_moderation_blocked_short_circuits_inline; reference
build_rag_context, _fetch_byok_rag, and _fetch_solr_rag when locating the code
to mock/spy.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f3253b00-6d5f-4a4e-abc6-36627d3ad421
📒 Files selected for processing (13)
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/constants.py
- src/models/context.py
- src/utils/conversations.py
- src/utils/responses.py
- src/utils/shields.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/unit/app/endpoints/test_query.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/utils/test_shields.py
- tests/unit/utils/test_vector_search.py
36c2fa3 to b93f99c
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/utils/shields.py (1)
131-156: ⚠️ Potential issue | 🟠 Major
Wrap model discovery in error handling; add early return for empty shields.

Line 132 calls `await client.models.list()` outside the try/except block. If this API call fails with `APIConnectionError` or `APIStatusError`, the exception bubbles up unmapped, violating the error handling contract. Additionally, if `shields_to_run` is empty, the function still queries for available models unnecessarily.

Suggested fix

```diff
     shields_to_run = await get_shields_for_request(client, shield_ids)
+    if not shields_to_run:
+        return ShieldModerationPassed()
+
+    try:
+        available_models = {model.id for model in await client.models.list()}
+    except APIConnectionError as e:
+        error_response = ServiceUnavailableResponse(
+            backend_name="Llama Stack",
+            cause=str(e),
+        )
+        raise HTTPException(**error_response.model_dump()) from e
+    except APIStatusError as e:
+        error_response = InternalServerErrorResponse.generic()
+        raise HTTPException(**error_response.model_dump()) from e

     for shield in shields_to_run:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 131 - 156, If get_shields_for_request returns no shields, return early before calling the models API; otherwise, move the await client.models.list() call into the existing try/except and wrap it to catch APIConnectionError and APIStatusError (same mapping used for client.moderations.create): on APIConnectionError raise HTTPException with ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(e)); on APIStatusError raise HTTPException with InternalServerErrorResponse.generic(); update the code paths around shields_to_run, available_models and the loop that references shield.provider_resource_id (functions/methods involved: get_shields_for_request, client.models.list, client.moderations.create) so model discovery is protected by the same error handling and skipped when shields_to_run is empty.

src/app/endpoints/streaming_query.py (1)
293-296: ⚠️ Potential issue | 🟡 Minor
Stale docstring parameter.

The docstring mentions the `inline_rag_documents` parameter at line 296, but this parameter was removed from the function signature.

📝 Proposed fix

```diff
     Args:
         responses_params: The Responses API parameters
         context: The response generator context
-        inline_rag_documents: Referenced documents from inline RAG (BYOK + Solr)

     Returns:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 293 - 296, The docstring for the function that accepts responses_params and context still mentions the removed parameter inline_rag_documents; update the docstring to reflect the current signature by removing any reference to inline_rag_documents (or replace it with the correct current parameter name if applicable) and ensure the Args section only lists actual parameters (e.g., responses_params, context) and their descriptions for the function in streaming_query.py.

src/app/endpoints/query.py (1)
214-221: ⚠️ Potential issue | 🟠 Major
Blocked content still triggers `get_topic_summary()`.

Same issue as in streaming_query.py - when moderation blocks, `get_topic_summary()` is still called for new conversations. This makes an LLM call with content that was already flagged by moderation.

🛡️ Proposed fix

```diff
     # Get topic summary for new conversation
-    if not user_conversation and query_request.generate_topic_summary:
+    if (
+        not user_conversation
+        and query_request.generate_topic_summary
+        and moderation_result.decision == "passed"
+    ):
         logger.debug("Generating topic summary for new conversation")
         topic_summary = await get_topic_summary(
             query_request.query, client, responses_params.model
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 214 - 221, The topic summary is still generated for new conversations even when moderation blocked the content; modify the logic around get_topic_summary so it only runs when this is a new conversation (user_conversation is falsy), query_request.generate_topic_summary is true, AND the moderation step did not block the request—i.e., move or add a guard that checks the moderation result (use the same moderation flag/variable used elsewhere in this flow) before calling get_topic_summary(query_request.query, client, responses_params.model); ensure the variables user_conversation, query_request.generate_topic_summary and get_topic_summary are referenced to locate and update the condition accordingly.
♻️ Duplicate comments (2)
src/utils/shields.py (1)
256-259: ⚠️ Potential issue | 🔴 Critical
Fix the f-string before this module can import.

The nested double quotes in `resource=...` are invalid Python syntax, so `src/utils/shields.py` will fail to load and every shield path will break.

🐛 Minimal fix

```diff
-                resource=f"Shield{"s" if len(missing) > 1 else ""}",
+                resource=f"Shield{'s' if len(missing) > 1 else ''}",
```

This verification should currently fail with a `SyntaxError` at Line 257:

```bash
#!/bin/bash
python - <<'PY'
from pathlib import Path
import ast

path = Path("src/utils/shields.py")
source = path.read_text()
try:
    ast.parse(source)
    print("syntax ok")
except SyntaxError as exc:
    print(f"{path}:{exc.lineno}:{exc.offset}: {exc.msg}")
    if exc.text:
        print(exc.text.rstrip())
PY
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 256 - 259, The f-string in the NotFoundResponse call is syntactically invalid due to nested double quotes; update the resource argument in the NotFoundResponse(...) call (the one constructing resource for pluralization) to use valid quoting or a small precomputed variable (e.g., compute plural = "s" if len(missing) > 1 else "" or use single quotes inside the f-string) so the expression like resource=f"Shield{'s' if len(missing) > 1 else ''}" is a valid Python expression that lets the module import; ensure you only change the resource argument near the NotFoundResponse instantiation and keep resource_id=", ".join(missing) intact.src/app/endpoints/streaming_query.py (1)
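For context: reusing the same quote character inside an f-string expression only parses on Python 3.12+ (PEP 701); on earlier interpreters it is a `SyntaxError` at import time, which is why the module fails to load. The precomputed-suffix variant from the prompt avoids the issue on every version. The function name below is a hypothetical wrapper added just for illustration:

```python
def shield_resource_label(missing: list[str]) -> str:
    # Precompute the suffix so the f-string needs no nested double quotes;
    # this parses on every supported Python version, unlike PEP 701 nesting.
    plural = "s" if len(missing) > 1 else ""
    return f"Shield{plural}"
```

Either this form or the single-quote-inside-double-quote form from the diff works; the precomputed variable is simply the most portable and readable of the two.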
304-320: ⚠️ Potential issue | 🟠 Major
Blocked streams still trigger `get_topic_summary()` for new conversations.

When moderation blocks, the `shield_violation_generator` is returned and passed to `generate_response()`. However, `generate_response()` still calls `get_topic_summary()` at lines 527-535 for new conversations regardless of moderation status. This:

- Makes an unnecessary LLM call for blocked content
- Could expose the model to the same content that moderation already blocked

Consider checking `context.moderation_result.decision` before calling `get_topic_summary()`:

🛡️ Proposed fix in generate_response()

```diff
     # Get topic summary for new conversations if needed
     topic_summary = None
-    if not context.query_request.conversation_id:
+    if not context.query_request.conversation_id and context.moderation_result.decision == "passed":
         should_generate = context.query_request.generate_topic_summary
         if should_generate:
             logger.debug("Generating topic summary for new conversation")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 304 - 320, The code returns a shield_violation_generator when moderation blocks, but generate_response still calls get_topic_summary() for new conversations; update generate_response to check context.moderation_result.decision (or equivalent moderation flag) before invoking get_topic_summary() so that if decision == "blocked" you skip the topic-summary call and any LLM work for new conversations; locate the generate_response function and add a guard that short-circuits get_topic_summary() when moderation indicates "blocked" (keeping existing behavior for non-blocked flows and reusing shield_violation_generator/turn_summary handling).
🧹 Nitpick comments (1)
src/utils/conversations.py (1)
438-478: Avoid a second conversation-append implementation.

This repeats the same `client.conversations.items.create(...)` call and `APIConnectionError`/`APIStatusError` translation that already exists in `src/utils/shields.py`. Making the shield-specific helper delegate here, or extracting a shared low-level helper, would keep the two persistence paths from drifting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/conversations.py` around lines 438 - 478, append_turn_items_to_conversation duplicates the low-level persistence call and error translation around client.conversations.items.create (catching APIConnectionError/APIStatusError and raising ServiceUnavailableResponse/InternalServerErrorResponse); instead, remove the try/except and delegate to the shared helper that already implements this logic (the helper in shields.py that wraps client.conversations.items.create), or extract a new shared function like safe_create_conversation_items(client, conversation_id, items) and call that from append_turn_items_to_conversation; keep the same item construction (user_items/output_items) but replace the direct client.conversations.items.create call with a call to the shared helper so the APIConnectionError/APIStatusError translation (to ServiceUnavailableResponse/InternalServerErrorResponse) is centralized.
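The extraction this nitpick proposes might look like the sketch below. `safe_create_conversation_items` is the hypothetical shared helper name suggested in the prompt, not an existing function; the real version would also hold the `APIConnectionError`/`APIStatusError` to HTTPException translation so the mapping lives in exactly one place.

```python
async def safe_create_conversation_items(client, conversation_id, items):
    """Single choke point for conversation persistence.

    The real implementation would wrap this call in the shared
    APIConnectionError/APIStatusError -> HTTPException translation.
    """
    return await client.conversations.items.create(
        conversation_id=conversation_id, items=items
    )

async def append_turn_items_to_conversation(client, conversation_id,
                                            user_items, output_items):
    # Item construction stays local; persistence and error mapping are
    # delegated to the shared helper, so the two paths cannot drift.
    return await safe_create_conversation_items(
        client, conversation_id, user_items + output_items
    )
```

Both the shield refusal path and the normal turn path would then call the same helper, and any future change to the error mapping happens once.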
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/app/endpoints/query.py`:
- Around line 214-221: The topic summary is still generated for new
conversations even when moderation blocked the content; modify the logic around
get_topic_summary so it only runs when this is a new conversation
(user_conversation is falsy), query_request.generate_topic_summary is true, AND
the moderation step did not block the request—i.e., move or add a guard that
checks the moderation result (use the same moderation flag/variable used
elsewhere in this flow) before calling get_topic_summary(query_request.query,
client, responses_params.model); ensure the variables user_conversation,
query_request.generate_topic_summary and get_topic_summary are referenced to
locate and update the condition accordingly.
In `@src/app/endpoints/streaming_query.py`:
- Around line 293-296: The docstring for the function that accepts
responses_params and context still mentions the removed parameter
inline_rag_documents; update the docstring to reflect the current signature by
removing any reference to inline_rag_documents (or replace it with the correct
current parameter name if applicable) and ensure the Args section only lists
actual parameters (e.g., responses_params, context) and their descriptions for
the function in streaming_query.py.
In `@src/utils/shields.py`:
- Around line 131-156: If get_shields_for_request returns no shields, return
early before calling the models API; otherwise, move the await
client.models.list() call into the existing try/except and wrap it to catch
APIConnectionError and APIStatusError (same mapping used for
client.moderations.create): on APIConnectionError raise HTTPException with
ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(e)); on
APIStatusError raise HTTPException with InternalServerErrorResponse.generic();
update the code paths around shields_to_run, available_models and the loop that
references shield.provider_resource_id (functions/methods involved:
get_shields_for_request, client.models.list, client.moderations.create) so model
discovery is protected by the same error handling and skipped when
shields_to_run is empty.
---
Duplicate comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 304-320: The code returns a shield_violation_generator when
moderation blocks, but generate_response still calls get_topic_summary() for new
conversations; update generate_response to check
context.moderation_result.decision (or equivalent moderation flag) before
invoking get_topic_summary() so that if decision == "blocked" you skip the
topic-summary call and any LLM work for new conversations; locate the
generate_response function and add a guard that short-circuits
get_topic_summary() when moderation indicates "blocked" (keeping existing
behavior for non-blocked flows and reusing
shield_violation_generator/turn_summary handling).
In `@src/utils/shields.py`:
- Around line 256-259: The f-string in the NotFoundResponse call is
syntactically invalid due to nested double quotes; update the resource argument
in the NotFoundResponse(...) call (the one constructing resource for
pluralization) to use valid quoting or a small precomputed variable (e.g.,
compute plural = "s" if len(missing) > 1 else "" or use single quotes inside the
f-string) so the expression like resource=f"Shield{'s' if len(missing) > 1 else
''}" is a valid Python expression that lets the module import; ensure you only
change the resource argument near the NotFoundResponse instantiation and keep
resource_id=", ".join(missing) intact.
---
Nitpick comments:
In `@src/utils/conversations.py`:
- Around line 438-478: append_turn_items_to_conversation duplicates the
low-level persistence call and error translation around
client.conversations.items.create (catching APIConnectionError/APIStatusError
and raising ServiceUnavailableResponse/InternalServerErrorResponse); instead,
remove the try/except and delegate to the shared helper that already implements
this logic (the helper in shields.py that wraps
client.conversations.items.create), or extract a new shared function like
safe_create_conversation_items(client, conversation_id, items) and call that
from append_turn_items_to_conversation; keep the same item construction
(user_items/output_items) but replace the direct
client.conversations.items.create call with a call to the shared helper so the
APIConnectionError/APIStatusError translation (to
ServiceUnavailableResponse/InternalServerErrorResponse) is centralized.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ddb91651-773b-4a11-b9e4-275db8e8c9c3
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (15)
- pyproject.toml
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/constants.py
- src/models/context.py
- src/utils/conversations.py
- src/utils/responses.py
- src/utils/shields.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/e2e/features/info.feature
- tests/unit/app/endpoints/test_query.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/utils/test_shields.py
- tests/unit/utils/test_vector_search.py
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (3)
- src/utils/types.py
- tests/unit/utils/test_vector_search.py
- src/models/context.py
b93f99c to 1261239
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)
713-727: ⚠️ Potential issue | 🟡 Minor
`turn_summary.id` is not set in the streaming path.

In the non-streaming `query.py`, `build_turn_summary` sets `summary.id = response.id` from the response object. However, in this streaming path, `turn_summary.id` is never extracted from `latest_response_object`, leaving it as the default empty string. This creates an inconsistency where blocked requests have a proper ID (`moderation_id`) but successful streaming responses do not.

🔧 Proposed fix

```diff
         # Completed response - capture final text and response object
         elif event_type == "response.completed":
             latest_response_object = cast(
                 OpenAIResponseObject, getattr(chunk, "response")
             )
+            turn_summary.id = latest_response_object.id
             turn_summary.llm_response = turn_summary.llm_response or "".join(text_parts)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 713 - 727, The streaming path doesn't set turn_summary.id, causing successful streamed responses to lack the response ID; update the "response.completed" branch to extract and assign the ID from latest_response_object (e.g., set turn_summary.id = getattr(latest_response_object, "id", turn_summary.id) or latest_response_object.id if present) before yielding the LLM_TURN_COMPLETE_EVENT so the turn_summary (used elsewhere like moderation_id comparisons) contains the same response.id as non-streaming flow (see symbols turn_summary and latest_response_object in the response.completed branch).
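The `getattr`-based fallback the prompt suggests can be exercised in isolation. In this sketch, `TurnSummary` and `capture_response_id` are simplified stand-ins for the real model in `src/utils/types.py` and the assignment inside the `response.completed` branch:

```python
from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class TurnSummary:
    id: str = ""
    llm_response: str = ""

def capture_response_id(turn_summary: TurnSummary, response_object) -> None:
    # Keep the existing value (e.g. a moderation_id on a blocked turn)
    # when the response object carries no id of its own.
    turn_summary.id = getattr(response_object, "id", turn_summary.id)
```

The fallback default is what makes the two paths consistent: streamed completions pick up `response.id`, while blocked turns keep the `moderation_id` they were seeded with.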
🧹 Nitpick comments (2)
tests/unit/app/endpoints/test_query.py (1)
550-567: Assert the blocked turn id as well.

This test now seeds `moderation_id`, but it only checks `llm_response`. Adding an assertion for `result.id` will catch regressions in the new `TurnSummary.id` field.

🧪 Suggested assertion

```diff
     assert isinstance(result, TurnSummary)
+    assert result.id == "mod_123"
     assert result.llm_response == "Content blocked by moderation"
     mock_append.assert_called_once()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/endpoints/test_query.py` around lines 550 - 567, The test seeds mock_moderation_result.moderation_id but never asserts the TurnSummary.id; update the test around the retrieve_response call (which returns a TurnSummary) to assert the blocked turn id is set by adding an assertion that result.id equals mock_moderation_result.moderation_id (or the literal "mod_123") in the same test where result.llm_response is checked and mock_append is asserted.

src/utils/shields.py (1)
13-13: Replace the private `openai._exceptions` import with the public API.

The private module `openai._exceptions` is brittle across SDK upgrades. Use the public import `from openai import APIStatusError` instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` at line 13, Replace the private import from openai._exceptions with the public exported symbol: remove "from openai._exceptions import APIStatusError as OpenAIAPIStatusError" and import the public APIStatusError instead (use the name APIStatusError where OpenAIAPIStatusError is referenced in this module, e.g., in shields.py functions or exception handling). Ensure all references to OpenAIAPIStatusError are updated to the public APIStatusError identifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/shields.py`:
- Around line 137-138: Short-circuit the no-shields path: after calling
get_shields_for_request (shields_to_run), if shields_to_run is empty immediately
return ShieldModerationPassed() instead of proceeding to call
client.models.list(); additionally wrap the client.models.list() call in a
try/except that catches the Llama Stack APIConnectionError (or equivalent) and
returns ShieldModerationPassed() or handles it gracefully so transient
network/errors while listing models don't surface as unmapped exceptions.
---
Outside diff comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 713-727: The streaming path doesn't set turn_summary.id, causing
successful streamed responses to lack the response ID; update the
"response.completed" branch to extract and assign the ID from
latest_response_object (e.g., set turn_summary.id =
getattr(latest_response_object, "id", turn_summary.id) or
latest_response_object.id if present) before yielding the
LLM_TURN_COMPLETE_EVENT so the turn_summary (used elsewhere like moderation_id
comparisons) contains the same response.id as non-streaming flow (see symbols
turn_summary and latest_response_object in the response.completed branch).
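A minimal sketch of that assignment (the `TurnSummary` and event object below are simplified stand-ins for the real streaming types):

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class TurnSummary:
    id: Optional[str] = None
    llm_response: str = ""


def handle_response_completed(turn_summary: TurnSummary, latest_response_object) -> TurnSummary:
    # Mirror the non-streaming flow: carry the backend response id into the
    # summary, keeping the existing id if the event object lacks one.
    turn_summary.id = getattr(latest_response_object, "id", turn_summary.id)
    return turn_summary


summary = handle_response_completed(TurnSummary(), SimpleNamespace(id="resp_42"))
print(summary.id)  # resp_42
unchanged = handle_response_completed(TurnSummary(id="keep"), object())
print(unchanged.id)  # keep
```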
---
Nitpick comments:
In `@src/utils/shields.py`:
- Line 13: Replace the private import from openai._exceptions with the public
exported symbol: remove "from openai._exceptions import APIStatusError as
OpenAIAPIStatusError" and import the public APIStatusError instead (use the name
APIStatusError where OpenAIAPIStatusError is referenced in this module, e.g., in
shields.py functions or exception handling). Ensure all references to
OpenAIAPIStatusError are updated to the public APIStatusError identifier.
In `@tests/unit/app/endpoints/test_query.py`:
- Around line 550-567: The test seeds mock_moderation_result.moderation_id but
never asserts the TurnSummary.id; update the test around the retrieve_response
call (which returns a TurnSummary) to assert the blocked turn id is set by
adding an assertion that result.id equals mock_moderation_result.moderation_id
(or the literal "mod_123") in the same test where result.llm_response is checked
and mock_append is asserted.
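The missing assertion could look like this (a self-contained sketch; the `TurnSummary` and helper below are simplified stand-ins for the real test fixtures):

```python
from dataclasses import dataclass


@dataclass
class TurnSummary:
    id: str
    llm_response: str


# Stand-in for retrieve_response() returning a blocked turn whose id is
# taken from the moderation result.
def retrieve_response_blocked(moderation_id: str, message: str) -> TurnSummary:
    return TurnSummary(id=moderation_id, llm_response=message)


result = retrieve_response_blocked("mod_123", "Content blocked")

# The assertion the review asks for, alongside the existing ones:
assert result.id == "mod_123"
assert result.llm_response == "Content blocked"
print("ok")
```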
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e7fae02e-26bf-47cd-bc85-66f8139fb4f8
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (15)
- pyproject.toml
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/constants.py
- src/models/context.py
- src/utils/conversations.py
- src/utils/responses.py
- src/utils/shields.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/e2e/features/info.feature
- tests/unit/app/endpoints/test_query.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/utils/test_shields.py
- tests/unit/utils/test_vector_search.py
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/e2e/features/info.feature
- pyproject.toml
- src/utils/responses.py
- src/models/context.py
- tests/unit/utils/test_vector_search.py
    Returns:
        RAGContext containing formatted context text and referenced documents
    """
    if moderation_decision == "blocked":

Don't do inline RAG if input was already marked as blocked
    # Moderation input is the raw user content (query + attachments) without injected RAG
    # context, to avoid false positives from retrieved document content.
    moderation_input = prepare_input(query_request)
    moderation_result = await run_shield_moderation(

Do moderation before the inline RAG
    stream=False,
    store=True,
    request_headers=request.headers,
    inline_rag_context=inline_rag_context.context_text or None,

Model uses default factory list
    client = await update_azure_token(client)

    # Build index identification mapping for RAG source resolution
    vector_store_ids = extract_vector_store_ids_from_tools(responses_params.tools)

Do this inside retrieve_response instead of passing as arguments
    )
    turn_summary = await retrieve_response(client, responses_params, moderation_result)

    if moderation_result.decision == "passed":

Only append RAG chunks if moderation passed
    error_response = handle_known_apistatus_errors(e, responses_params.model)
    raise HTTPException(**error_response.model_dump()) from e

    vector_store_ids = extract_vector_store_ids_from_tools(responses_params.tools)

Replaced from outer scope
    )

    # Combine inline RAG results (BYOK + Solr) with tool-based results
    if context.moderation_result.decision == "passed":

Combine tool-based and inline referenced docs here
    moderation_result = await client.moderations.create(
        input=input_text, model=shield.provider_resource_id
    )
    # Known Llama Stack bug: error is raised when violation is present

    # Reject if no requested shields were found (prevents accidental bypass)
    if not shields_to_run:
        response = UnprocessableEntityResponse(

Use 404 instead of 422
1261239 to dac0275 (compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/endpoints/query.py (1)
162-186: ⚠️ Potential issue | 🟠 Major: Blocked requests still pay the full response-prep cost.

After moderation blocks the input, the handler still goes through `prepare_responses_params()`. That can list models, resolve vector stores, and build tools before `retrieve_response()` returns the refusal. A blocked request can therefore still do the expensive/network-dependent work this refactor is trying to skip, and may fail with an unrelated 5xx instead of returning the moderation message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/query.py` around lines 162 - 186, After run_shield_moderation returns, short-circuit blocked requests immediately instead of continuing to build RAG context or call prepare_responses_params; inspect moderation_result.decision (or equivalent "blocked"/"refuse" state) and if it indicates a block, return the moderation refusal response to the caller (or call the existing path that yields the refusal) without invoking build_rag_context or prepare_responses_params so you avoid listing models, resolving vector stores, or initializing tools for blocked inputs. Ensure the decision check happens right after run_shield_moderation and before any expensive calls (build_rag_context, prepare_responses_params, retrieve_response).

tests/unit/utils/test_vector_search.py (1)
455-500: ⚠️ Potential issue | 🟡 Minor: Add coverage for the new blocked branch.

The updated tests only exercise `moderation_decision="passed"`. Please add a `"blocked"` case that asserts `build_rag_context()` returns an empty `RAGContext` and never calls the BYOK/Solr fetchers; otherwise the main optimization from this refactor can regress unnoticed.

🧪 Suggested test shape
```python
@pytest.mark.asyncio
async def test_blocked_skips_all_rag_fetches(self, mocker) -> None:  # type: ignore[no-untyped-def]
    client_mock = mocker.AsyncMock()
    fetch_byok = mocker.patch(
        "utils.vector_search._fetch_byok_rag",
        new=mocker.AsyncMock(),
    )
    fetch_solr = mocker.patch(
        "utils.vector_search._fetch_solr_rag",
        new=mocker.AsyncMock(),
    )

    context = await build_rag_context(client_mock, "blocked", "test query", None)

    assert context.context_text == ""
    assert context.rag_chunks == []
    assert context.referenced_documents == []
    fetch_byok.assert_not_called()
    fetch_solr.assert_not_called()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/utils/test_vector_search.py` around lines 455 - 500, Add a test exercising moderation_decision="blocked" for build_rag_context: create a config_mock like in test_byok_enabled_only (inline BYOK configured and inline_solr_enabled False), create an AsyncMock client_mock and patch any Solr/BYOK fetcher or client methods (e.g., client_mock.vector_io.query) to be spies, then call await build_rag_context(client_mock, "blocked", "test query", None) and assert the returned RAGContext has empty context_text, rag_chunks, and referenced_documents, and assert that client_mock.vector_io.query (and any Solr fetcher mocks) were never awaited/called to ensure the blocked branch short-circuits the fetchers.
♻️ Duplicate comments (1)
src/utils/shields.py (1)
137-138: ⚠️ Potential issue | 🟠 Major: Return early when no shields will run.

`get_shields_for_request()` can legitimately return `[]` (no configured shields, or `shield_ids=[]`), but Line 138 still calls `client.models.list()` without any error mapping. That adds avoidable network I/O to the no-op path and can turn a request that should return `ShieldModerationPassed()` into an unrelated backend failure.

🛠️ Suggested fix

```diff
     shields_to_run = await get_shields_for_request(client, shield_ids)
-    available_models = {model.id for model in await client.models.list()}
+    if not shields_to_run:
+        return ShieldModerationPassed()
+
+    try:
+        available_models = {model.id for model in await client.models.list()}
+    except APIConnectionError as e:
+        error_response = ServiceUnavailableResponse(
+            backend_name="Llama Stack",
+            cause=str(e),
+        )
+        raise HTTPException(**error_response.model_dump()) from e
+    except LLSApiStatusError as e:
+        error_response = InternalServerErrorResponse.generic()
+        raise HTTPException(**error_response.model_dump()) from e
+
     for shield in shields_to_run:
```

As per coding guidelines: Handle `APIConnectionError` from Llama Stack in API operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 137 - 138, After calling get_shields_for_request(client, shield_ids), check if shields_to_run is empty and return the no-op result (e.g., ShieldModerationPassed()) immediately to avoid calling client.models.list(); if shields exist, then call client.models.list() to build available_models. Also wrap the client.models.list() call in a try/except that catches APIConnectionError (from the Llama stack) and maps/raises it to the module's expected error type instead of letting it surface as an unrelated backend failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/shields.py`:
- Around line 256-270: The helper currently calls client.shields.list() even
when shield_ids is an explicit empty list (meaning "skip all shields"); modify
the logic in the function handling shield resolution (the code block that uses
configured_shields, requested, configured_ids, missing) to check for shield_ids
== [] and immediately return an empty list before calling client.shields.list();
ensure this check preserves the existing behavior for shield_ids is None (i.e.,
still fetch configured_shields via client.shields.list()), and keep the rest of
the missing-ID validation (using requested, configured_ids, NotFoundResponse,
and HTTPException) unchanged.
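A runnable sketch of those resolution semantics (the stub client is illustrative, and the real helper raises an HTTPException built from NotFoundResponse rather than the LookupError used here):

```python
import asyncio


class Shield:
    def __init__(self, identifier):
        self.identifier = identifier


async def get_shields_for_request(client, shield_ids):
    """Sketch: [] means 'run no shields' and must never hit the backend;
    None means 'use all configured shields'."""
    if shield_ids == []:
        return []  # explicit opt-out: skip client.shields.list() entirely
    configured = await client.list_shields()
    if shield_ids is None:
        return configured
    configured_ids = {s.identifier for s in configured}
    missing = [sid for sid in shield_ids if sid not in configured_ids]
    if missing:
        raise LookupError(f"Unknown shield ids: {missing}")
    return [s for s in configured if s.identifier in shield_ids]


class Client:
    calls = 0

    async def list_shields(self):
        Client.calls += 1
        return [Shield("llama-guard")]


c = Client()
print(asyncio.run(get_shields_for_request(c, [])))   # []
print(Client.calls)                                  # 0 -> backend never called
print([s.identifier for s in asyncio.run(get_shields_for_request(c, None))])  # ['llama-guard']
```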
---
Outside diff comments:
In `@src/app/endpoints/query.py`:
- Around line 162-186: After run_shield_moderation returns, short-circuit
blocked requests immediately instead of continuing to build RAG context or call
prepare_responses_params; inspect moderation_result.decision (or equivalent
"blocked"/"refuse" state) and if it indicates a block, return the moderation
refusal response to the caller (or call the existing path that yields the
refusal) without invoking build_rag_context or prepare_responses_params so you
avoid listing models, resolving vector stores, or initializing tools for blocked
inputs. Ensure the decision check happens right after run_shield_moderation and
before any expensive calls (build_rag_context, prepare_responses_params,
retrieve_response).
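The ordering this prompt asks for can be sketched with plain callables (all names below are illustrative stand-ins, not the endpoint's real signatures):

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ModerationResult:
    decision: str
    message: str = ""


async def handle_query(query: str, run_moderation, prepare_params, respond):
    """Sketch: decide on moderation before any expensive response prep."""
    moderation = await run_moderation(query)
    if moderation.decision == "blocked":
        # Return the refusal immediately; never list models or build tools.
        return moderation.message
    params = await prepare_params(query)
    return await respond(params)


calls = []


async def block_all(_q):
    return ModerationResult("blocked", "I cannot help with that.")


async def expensive_prepare(_q):
    calls.append("prepare")
    return {}


async def respond(_p):
    calls.append("respond")
    return "answer"


out = asyncio.run(handle_query("bad input", block_all, expensive_prepare, respond))
print(out)    # I cannot help with that.
print(calls)  # [] -> no expensive work was done for the blocked input
```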
In `@tests/unit/utils/test_vector_search.py`:
- Around line 455-500: Add a test exercising moderation_decision="blocked" for
build_rag_context: create a config_mock like in test_byok_enabled_only (inline
BYOK configured and inline_solr_enabled False), create an AsyncMock client_mock
and patch any Solr/BYOK fetcher or client methods (e.g.,
client_mock.vector_io.query) to be spies, then call await
build_rag_context(client_mock, "blocked", "test query", None) and assert the
returned RAGContext has empty context_text, rag_chunks, and
referenced_documents, and assert that client_mock.vector_io.query (and any Solr
fetcher mocks) were never awaited/called to ensure the blocked branch
short-circuits the fetchers.
---
Duplicate comments:
In `@src/utils/shields.py`:
- Around line 137-138: After calling get_shields_for_request(client,
shield_ids), check if shields_to_run is empty and return the no-op result (e.g.,
ShieldModerationPassed()) immediately to avoid calling client.models.list(); if
shields exist, then call client.models.list() to build available_models. Also
wrap the client.models.list() call in a try/except that catches
APIConnectionError (from the Llama stack) and maps/raises it to the module's
expected error type instead of letting it surface as an unrelated backend
failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c9ad4e7d-5ff5-483f-a424-805c5aa04f74
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (15)
- pyproject.toml
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/constants.py
- src/models/context.py
- src/utils/conversations.py
- src/utils/responses.py
- src/utils/shields.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/e2e/features/info.feature
- tests/unit/app/endpoints/test_query.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/utils/test_shields.py
- tests/unit/utils/test_vector_search.py
🚧 Files skipped from review as they are similar to previous changes (7)
- tests/e2e/features/info.feature
- src/utils/types.py
- src/constants.py
- pyproject.toml
- src/models/context.py
- src/app/endpoints/streaming_query.py
- src/utils/conversations.py
c727d41 to 047e955 (compare)
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/app/endpoints/streaming_query.py (1)
305-320: ⚠️ Potential issue | 🟠 Major: Returning the refusal generator here still triggers the normal success footer.

Lines 305-320 hand back a plain async iterator, so `generate_response()` later treats a blocked request as a successful completion and still runs topic-summary generation plus quota/token bookkeeping. That reintroduces model-backed work after moderation already rejected the input, and footer failures can still tear down a refused stream. Please carry an explicit blocked flag, or branch the wrapper, so the success-only footer is skipped for this path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/streaming_query.py` around lines 305 - 320, The refusal path currently returns only (shield_violation_generator(...), turn_summary) which causes generate_response() to treat it as a normal success and run the success-only footer and bookkeeping; change this return to include an explicit blocked flag or different wrapper so generate_response can detect it and skip the success footer. Concretely, update the block in streaming_query.py that uses context.moderation_result to return a 3-tuple (response_iterator, turn_summary, blocked=True) or a small ResponseWrapper object, and then modify generate_response (the consumer of that return) to check the blocked flag or wrapper type and branch to skip topic-summary generation and quota/token bookkeeping when blocked is true; keep existing calls like append_turn_items_to_conversation and shield_violation_generator unchanged.

src/utils/shields.py (1)
137-149: ⚠️ Potential issue | 🟠 Major: Don't call `models.list()` on paths that never need a shield model lookup.

After Lines 137-149, requests with `shield_ids=[]`, no configured shields, or only non-`llama-guard` shields still make a Llama Stack `models.list()` call. That adds avoidable latency and can surface a 503/500 before moderation even starts, even though those paths should just return `ShieldModerationPassed()`.

🛠️ Suggested fix

```diff
     shields_to_run = await get_shields_for_request(client, shield_ids)
-    available_models = {model.id for model in await client.models.list()}
+    if not shields_to_run:
+        return ShieldModerationPassed()
+
+    available_models: Optional[set[str]] = None
     for shield in shields_to_run:
-        if shield.provider_id == "llama-guard" and (
-            not shield.provider_resource_id
-            or shield.provider_resource_id not in available_models
-        ):
+        if shield.provider_id == "llama-guard":
+            if available_models is None:
+                try:
+                    available_models = {model.id for model in await client.models.list()}
+                except APIConnectionError as e:
+                    error_response = ServiceUnavailableResponse(
+                        backend_name="Llama Stack",
+                        cause=str(e),
+                    )
+                    raise HTTPException(**error_response.model_dump()) from e
+                except LLSApiStatusError as e:
+                    error_response = InternalServerErrorResponse.generic()
+                    raise HTTPException(**error_response.model_dump()) from e
+
+            if (
+                not shield.provider_resource_id
+                or shield.provider_resource_id not in available_models
+            ):
                 logger.error("Shield model not found: %s", shield.provider_resource_id)
                 response = NotFoundResponse(
                     resource="Shield model", resource_id=shield.provider_resource_id or ""
                 )
                 raise HTTPException(**response.model_dump())
```

Run this to confirm the unconditional `models.list()` call is still on the fast paths:

```bash
#!/bin/bash
set -euo pipefail
python - <<'PY'
from pathlib import Path
path = Path("src/utils/shields.py")
lines = path.read_text().splitlines()
for start, end in [(114, 181), (237, 281)]:
    print(f"\n--- {path}:{start}-{end} ---")
    for i in range(start - 1, end):
        print(f"{i + 1}: {lines[i]}")
PY
```

Expected result: `run_shield_moderation()` calls `client.models.list()` immediately after `get_shields_for_request()` and before any `if not shields_to_run` or lazy `provider_id == "llama-guard"` guard.

Based on learnings: Applies to src/**/*.py : Handle `APIConnectionError` from Llama Stack in API operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/shields.py` around lines 137 - 149, The code eagerly calls client.models.list() right after get_shields_for_request(), causing unnecessary Llama Stack calls on fast paths; change run_shield_moderation so you only call client.models.list() when shields_to_run is non-empty and at least one shield has provider_id == "llama-guard" (check shields_to_run and any(s.provider_id == "llama-guard" for s in shields_to_run)) before invoking client.models.list(); use the existing shields_to_run variable and the shield.provider_id/provider_resource_id checks to gate the lookup and avoid introducing latency or upstream 5xxs on requests that don't need a llama-guard model lookup.
🧹 Nitpick comments (1)
tests/unit/app/endpoints/test_streaming_query.py (1)
845-880: This test stops one layer short of the blocked-stream regression.

The new path here only asserts that `retrieve_response_generator()` picks the refusal generator. The bug is in the wrapper above it: `generate_response()` still treats that generator as a normal completion. Please run the returned generator through `generate_response()` (or the full endpoint) and assert `get_topic_summary()`/`consume_query_tokens()` stay untouched for blocked moderation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/app/endpoints/test_streaming_query.py` around lines 845 - 880, The test currently verifies retrieve_response_generator returns the refusal generator but doesn't exercise the wrapper that caused the regression; update the test to take the returned generator from retrieve_response_generator (call it _generator) and pass it into generate_response (or call the full streaming endpoint) so the refusal generator is executed through generate_response, then assert that get_topic_summary() and consume_query_tokens() were not called/modified for the blocked moderation case and that the TurnSummary produced by generate_response (or the endpoint) retains llm_response == "Content blocked"; use the existing mocked objects (mock_context, mock_client, mock_moderation_result) and patch spies on get_topic_summary and consume_query_tokens to validate they remain untouched.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/models/requests.py`:
- Around line 179-183: The Field description for shield_ids in the request
schema only documents the None case but omits that an explicit empty list
opt-out means "run no shields"; update the description string for shield_ids
(Field on the shield_ids attribute) to state both behaviors: None = use all
configured shields, [] = explicitly run no shields, and provide example for the
empty list; ensure the public contract matches get_shields_for_request()
semantics so OpenAPI docs reflect the empty-list opt-out.
---
Duplicate comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 305-320: The refusal path currently returns only
(shield_violation_generator(...), turn_summary) which causes generate_response()
to treat it as a normal success and run the success-only footer and bookkeeping;
change this return to include an explicit blocked flag or different wrapper so
generate_response can detect it and skip the success footer. Concretely, update
the block in streaming_query.py that uses context.moderation_result to return a
3-tuple (response_iterator, turn_summary, blocked=True) or a small
ResponseWrapper object, and then modify generate_response (the consumer of that
return) to check the blocked flag or wrapper type and branch to skip
topic-summary generation and quota/token bookkeeping when blocked is true; keep
existing calls like append_turn_items_to_conversation and
shield_violation_generator unchanged.
In `@src/utils/shields.py`:
- Around line 137-149: The code eagerly calls client.models.list() right after
get_shields_for_request(), causing unnecessary Llama Stack calls on fast paths;
change run_shield_moderation so you only call client.models.list() when
shields_to_run is non-empty and at least one shield has provider_id ==
"llama-guard" (check shields_to_run and any(s.provider_id == "llama-guard" for s
in shields_to_run)) before invoking client.models.list(); use the existing
shields_to_run variable and the shield.provider_id/provider_resource_id checks
to gate the lookup and avoid introducing latency or upstream 5xxs on requests
that don't need a llama-guard model lookup.
---
Nitpick comments:
In `@tests/unit/app/endpoints/test_streaming_query.py`:
- Around line 845-880: The test currently verifies retrieve_response_generator
returns the refusal generator but doesn't exercise the wrapper that caused the
regression; update the test to take the returned generator from
retrieve_response_generator (call it _generator) and pass it into
generate_response (or call the full streaming endpoint) so the refusal generator
is executed through generate_response, then assert that get_topic_summary() and
consume_query_tokens() were not called/modified for the blocked moderation case
and that the TurnSummary produced by generate_response (or the endpoint) retains
llm_response == "Content blocked"; use the existing mocked objects
(mock_context, mock_client, mock_moderation_result) and patch spies on
get_topic_summary and consume_query_tokens to validate they remain untouched.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2fd22921-2622-47ed-b58d-b3290b000e77
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (16)
- pyproject.toml
- src/app/endpoints/query.py
- src/app/endpoints/streaming_query.py
- src/constants.py
- src/models/context.py
- src/models/requests.py
- src/utils/conversations.py
- src/utils/responses.py
- src/utils/shields.py
- src/utils/types.py
- src/utils/vector_search.py
- tests/e2e/features/info.feature
- tests/unit/app/endpoints/test_query.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/utils/test_shields.py
- tests/unit/utils/test_vector_search.py
💤 Files with no reviewable changes (1)
- src/utils/types.py
🚧 Files skipped from review as they are similar to previous changes (5)
- src/utils/vector_search.py
- src/models/context.py
- src/utils/responses.py
- src/constants.py
- tests/e2e/features/info.feature
047e955 to b9ae26e (compare)
    tool_results: list[ToolResultSummary] = Field(default_factory=list)
    rag_chunks: list[RAGChunk] = Field(default_factory=list)
    referenced_documents: list[ReferencedDocument] = Field(default_factory=list)
    inline_rag_documents: list[ReferencedDocument] = Field(default_factory=list)

Merged with tool-based docs into referenced_documents
are-ces left a comment:
TY very nice, LGTM
Just a tiny nit.
    available_models = {model.id for model in await client.models.list()}

    for shield in shields_to_run:
        # Only validate provider_resource_id against models for llama-guard.

Do we want to leave some small comment here to remember why we have this check?

Returned a shorter version of the original comment
b9ae26e to 726d986 (compare)
726d986 to 8d2deee (compare)
d7ef69a to b15482f (compare)
b15482f to 524ac1c (compare)
Description
This PR refactors the flow of shield moderation. Moderation is now executed before inline RAG, so expensive RAG operations are skipped when the input is blocked by shields. It also refactors the main `run_moderation` function.

Finally, it fixes an inconsistency between the query and streaming-query persistence of inline RAG context, and adds unit tests verifying that RAG contents of both types are correctly merged.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Improvements
Chores