LCORE-1423: BYOK Inline and Tool RAG integration tests#1292
LCORE-1423: BYOK Inline and Tool RAG integration tests#1292radofuchs merged 2 commits intolightspeed-core:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughAdds two extensive integration test modules that validate BYOK RAG behaviors for Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
59329f4 to
1755a79
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 1124-1201: The test
test_query_byok_max_chunks_caps_across_multiple_sources currently gives source-a
all top scores so it can't detect a dropped source; modify the mock responses
resp_a and resp_b (and the _side_effect that returns them) so their score ranges
overlap/interleave instead of source-a being strictly higher (e.g., interspersed
score values), then add assertions after the length and ordering checks that at
least one rag chunk from "source-a" and one from "source-b" are present (check
their vector IDs or chunk IDs like the f"a-chunk-..." and f"b-chunk-..." values)
and assert that some known low-scored chunk(s) (e.g., the lowest from both
resp_a/resp_b) are excluded to prove the global cap selected top-scoring chunks
across both sources.
- Around line 706-733: The test test_query_byok_tool_rag_referenced_documents
currently only asserts response.referenced_documents is not None which allows an
empty list to pass; update the assertions to require at least one referenced
document (e.g., assert len(response.referenced_documents) > 0) and assert a
known field/value from the mocked file_search_call results is present in the
first referenced document (e.g., check a metadata key or expected text in
response.referenced_documents[0].metadata or
response.referenced_documents[0].content) so the test verifies actual extraction
logic in query_endpoint_handler.
- Around line 472-509: The test
test_query_byok_inline_rag_no_chunks_returns_empty currently inspects
mock_byok_client.responses.create.call_args which picks the last call; change
the assertion to use the first responses.create invocation instead by reading
mock_byok_client.responses.create.call_args_list[0].kwargs (or its args) to
extract the "input" text and assert that "file_search found" is not present so
the check is stable even if later topic-summary calls occur.
In `@tests/integration/endpoints/test_streaming_query_byok_integration.py`:
- Around line 501-529: Update
test_streaming_query_byok_inline_rag_returns_referenced_documents to actually
consume the StreamingResponse instead of only inspecting
mock_streaming_byok_client.responses.create input: call
streaming_query_endpoint_handler to get the StreamingResponse, iterate/consume
its SSE stream (or use helper to read events) until the final "end" event, parse
the end event payload and assert it contains the referenced_documents metadata
(e.g., contains "doc-ocp-overview" or "doc-k8s-pods") that the BYOK mock emits;
keep existing checks on mock_streaming_byok_client.responses.create but add the
final assertion on the end event payload to ensure end-event serialization is
validated.
- Around line 705-737: Update the
test_streaming_query_byok_tool_rag_emits_referenced_documents test to assert
actual referenced-document payloads instead of just their type: after collecting
events and extracting ref_docs from the "end" event (variables events and
ref_docs), assert ref_docs is non-empty (e.g., assert len(ref_docs) > 0) and
assert at least one known value from the mocked file_search result appears in
ref_docs (for example check a known URL or document id used in the mock). This
ensures the handler actually propagates tool references into
referenced_documents.
- Around line 1098-1178: The test
test_streaming_query_byok_max_chunks_caps_across_multiple_sources currently uses
non-overlapping score ranges so it can't detect a bug that ignores one source;
update the fixtures resp_a and resp_b to produce overlapping scores (e.g., some
resp_b scores higher than some resp_a scores) so global top-K selection must
include chunks from both sources, keep the vector_io mock side effect
(_side_effect) and the call to streaming_query_endpoint_handler as-is, then add
assertions on the final create call input
(mock_client.responses.create.call_args_list[0].kwargs["input"]) to explicitly
assert certain chunk IDs from both "a-chunk-X" and "b-chunk-Y" are present and
that some lower-scored chunk IDs are absent, ensuring the test verifies
cross-source global ranking while still asserting the header contains
BYOK_RAG_MAX_CHUNKS.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 869bd444-9f9b-4267-afc2-fe585b589076
📒 Files selected for processing (2)
tests/integration/endpoints/test_query_byok_integration.pytests/integration/endpoints/test_streaming_query_byok_integration.py
tests/integration/endpoints/test_streaming_query_byok_integration.py
Outdated
Show resolved
Hide resolved
1755a79 to
b0795d0
Compare
b0795d0 to
335297a
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
tests/integration/endpoints/test_streaming_query_byok_integration.py (1)
1045-1099:⚠️ Potential issue | 🟠 MajorAssert the actual cross-source cutoff, not just source presence.
The interleaved scores are an improvement, but this still passes if the handler keeps an arbitrary subset from each store as long as both sources appear and
chunk 0is dropped. Please assert concrete kept/dropped chunks at the cutoff so this proves global top-K ranking rather than just mixed-source inclusion.🧪 Tighten the assertions around the cutoff
- # Both sources must appear in the context (overlapping scores guarantee this) - assert "Source A chunk" in input_text - assert "Source B chunk" in input_text - - # Lowest-scoring chunks from each source must be dropped - assert "Source A chunk 0" not in input_text - assert "Source B chunk 0" not in input_text + # Concrete winners from both sources must survive the global cap + assert "Source A chunk 9" in input_text + assert "Source B chunk 9" in input_text + assert "Source A chunk 5" in input_text + assert "Source B chunk 5" in input_text + + # Concrete losers just below the cutoff must be excluded + assert "Source A chunk 4" not in input_text + assert "Source B chunk 4" not in input_text🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/endpoints/test_streaming_query_byok_integration.py` around lines 1045 - 1099, The test currently only checks that both sources appear and chunk 0 from each source is dropped, which doesn't prove a global top-K cutoff; update the assertions after calling streaming_query_endpoint_handler (use the captured create_call.kwargs["input"] / input_text from mock_client.responses.create) to compute the expected global top-K by sorting all (source,chunk,score) pairs used to build resp_a and resp_b (their scores are in the test setup using constants.BYOK_RAG_MAX_CHUNKS and the round(...) values), then assert that the exact expected kept chunk labels (e.g., "Source A chunk X", "Source B chunk Y") for the top-K appear in input_text and that any chunks ranked below the cutoff do not appear; keep using streaming_query_endpoint_handler, QueryRequest, mock_client.responses.create, and constants.BYOK_RAG_MAX_CHUNKS to locate the code to change.tests/integration/endpoints/test_query_byok_integration.py (1)
1058-1111:⚠️ Potential issue | 🟠 MajorThis still doesn't prove the global merge boundary.
Right now the test only checks sorted scores, both sources present, and
chunk 0excluded. A per-store cap followed by merge/sort could still satisfy that. Assert specific chunks on both sides of the cutoff so the test fails when cross-source ranking is wrong.🧪 Tighten the capped-result assertions
- # Both sources must survive the cap - sources = {chunk.source for chunk in response.rag_chunks} - assert "source-a" in sources - assert "source-b" in sources - - # Lowest-scoring chunks from each source must be dropped - chunk_contents = {chunk.content for chunk in response.rag_chunks} - assert "Source A chunk 0" not in chunk_contents - assert "Source B chunk 0" not in chunk_contents + chunk_contents = {chunk.content for chunk in response.rag_chunks} + + # Concrete winners from both sources must survive the global cap + assert "Source A chunk 9" in chunk_contents + assert "Source B chunk 9" in chunk_contents + assert "Source A chunk 5" in chunk_contents + assert "Source B chunk 5" in chunk_contents + + # Concrete losers just below the cutoff must be excluded + assert "Source A chunk 4" not in chunk_contents + assert "Source B chunk 4" not in chunk_contents🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/endpoints/test_query_byok_integration.py` around lines 1058 - 1111, The test currently only checks sorted scores and presence of both sources but not that the global merge cutoff is correct; modify the assertions after calling query_endpoint_handler (response.rag_chunks) to assert that specific chunks on both sides of the global cutoff are present/absent: compute the global cutoff index = constants.BYOK_RAG_MAX_CHUNKS and given the constructed resp_a/resp_b score patterns, assert that the highest-ranked chunk just above the cutoff from one source (e.g., "Source B chunk X" where X is the index that should survive) is present and that the next lower-ranked chunk just below the cutoff from the other source (e.g., "Source A chunk Y" which should be dropped) is absent; use resp_a/resp_b, n and constants.BYOK_RAG_MAX_CHUNKS to determine the exact chunk indices and add assertions referencing their content strings in chunk_contents to ensure cross-source ranking is enforced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 1058-1111: The test currently only checks sorted scores and
presence of both sources but not that the global merge cutoff is correct; modify
the assertions after calling query_endpoint_handler (response.rag_chunks) to
assert that specific chunks on both sides of the global cutoff are
present/absent: compute the global cutoff index = constants.BYOK_RAG_MAX_CHUNKS
and given the constructed resp_a/resp_b score patterns, assert that the
highest-ranked chunk just above the cutoff from one source (e.g., "Source B
chunk X" where X is the index that should survive) is present and that the next
lower-ranked chunk just below the cutoff from the other source (e.g., "Source A
chunk Y" which should be dropped) is absent; use resp_a/resp_b, n and
constants.BYOK_RAG_MAX_CHUNKS to determine the exact chunk indices and add
assertions referencing their content strings in chunk_contents to ensure
cross-source ranking is enforced.
In `@tests/integration/endpoints/test_streaming_query_byok_integration.py`:
- Around line 1045-1099: The test currently only checks that both sources appear
and chunk 0 from each source is dropped, which doesn't prove a global top-K
cutoff; update the assertions after calling streaming_query_endpoint_handler
(use the captured create_call.kwargs["input"] / input_text from
mock_client.responses.create) to compute the expected global top-K by sorting
all (source,chunk,score) pairs used to build resp_a and resp_b (their scores are
in the test setup using constants.BYOK_RAG_MAX_CHUNKS and the round(...)
values), then assert that the exact expected kept chunk labels (e.g., "Source A
chunk X", "Source B chunk Y") for the top-K appear in input_text and that any
chunks ranked below the cutoff do not appear; keep using
streaming_query_endpoint_handler, QueryRequest, mock_client.responses.create,
and constants.BYOK_RAG_MAX_CHUNKS to locate the code to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1bc37e6f-8cf6-4cfc-84b5-1bdfee2b7b36
📒 Files selected for processing (2)
tests/integration/endpoints/test_query_byok_integration.pytests/integration/endpoints/test_streaming_query_byok_integration.py
335297a to
015b3ce
Compare
015b3ce to
7cca40b
Compare
Description
Added integration tests for the BYOK Inline and Tool RAG functionality.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Integration tests passing
Summary by CodeRabbit