LCORE-1423: BYOK Inline and Tool RAG integration tests by are-ces · Pull Request #1292 · lightspeed-core/lightspeed-stack

are-ces · 2026-03-09T13:37:14Z

Description

Added integration tests for the BYOK Inline and Tool RAG functionality.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Generated by: Claude Code

Related Tickets & Documents

Related Issue # LCORE-1423
Closes # LCORE-1423

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Integration tests passing

Summary by CodeRabbit

Tests
- Added comprehensive integration tests for BYOK inline and tool-based RAG on /query and /streaming_query endpoints.
- Validate SSE/streaming behavior, injection of referenced documents and weighted scores, and emission of tool_call events and end-event references.
- Cover per-request vector_store_ids overrides, score multipliers, max-chunk caps across sources, merging/ordering by weighted score, and graceful handling of empty results and query errors.

coderabbitai · 2026-03-09T13:37:34Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 96a2b254-f09c-43d8-8cab-408867fe3ae9

📥 Commits

Reviewing files that changed from the base of the PR and between 015b3ce and 7cca40b.

📒 Files selected for processing (2)

tests/integration/endpoints/test_query_byok_integration.py
tests/integration/endpoints/test_streaming_query_byok_integration.py

Walkthrough

Adds two extensive integration test modules that validate BYOK RAG behaviors for /query and /streaming_query, exercising inline and tool-based retrievals, SSE streaming, injected RAG context, score weighting, chunk capping, per-request vector_store_ids overrides, and edge-case handling.

Changes

Cohort / File(s)	Summary
BYOK Query Integration Tests `tests/integration/endpoints/test_query_byok_integration.py`	Adds a comprehensive integration test suite for `/query` covering inline BYOK RAG, tool-based BYOK RAG (file_search), combined flows, per-request `vector_store_ids` overrides, `referenced_documents` extraction, chunk ordering by weighted scores, `score_multiplier` behavior, max-chunk caps, and various error/edge-case scenarios.
BYOK Streaming Query Integration Tests `tests/integration/endpoints/test_streaming_query_byok_integration.py`	Adds a comprehensive integration test suite for `/streaming_query` covering SSE streaming with inline and tool BYOK RAG: streaming input injection, `tool_call` events, `referenced_documents` in end events, per-request overrides, `TOOL_RAG_MAX_CHUNKS`/`BYOK_RAG_MAX_CHUNKS` enforcement, weighted-score ordering, handling of empty/error BYOK results, and combined inline+tool scenarios.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'LCORE-1423: BYOK Inline and Tool RAG integration tests' clearly and specifically describes the main change: adding integration tests for BYOK Inline and Tool RAG functionality.
Docstring Coverage	✅ Passed	Docstring coverage is 83.67% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 1124-1201: The test
test_query_byok_max_chunks_caps_across_multiple_sources currently gives source-a
all top scores so it can't detect a dropped source; modify the mock responses
resp_a and resp_b (and the _side_effect that returns them) so their score ranges
overlap/interleave instead of source-a being strictly higher (e.g., interspersed
score values), then add assertions after the length and ordering checks that at
least one rag chunk from "source-a" and one from "source-b" are present (check
their vector IDs or chunk IDs like the f"a-chunk-..." and f"b-chunk-..." values)
and assert that some known low-scored chunk(s) (e.g., the lowest from both
resp_a/resp_b) are excluded to prove the global cap selected top-scoring chunks
across both sources.
- Around line 706-733: The test test_query_byok_tool_rag_referenced_documents
currently only asserts response.referenced_documents is not None which allows an
empty list to pass; update the assertions to require at least one referenced
document (e.g., assert len(response.referenced_documents) > 0) and assert a
known field/value from the mocked file_search_call results is present in the
first referenced document (e.g., check a metadata key or expected text in
response.referenced_documents[0].metadata or
response.referenced_documents[0].content) so the test verifies actual extraction
logic in query_endpoint_handler.
- Around line 472-509: The test
test_query_byok_inline_rag_no_chunks_returns_empty currently inspects
mock_byok_client.responses.create.call_args which picks the last call; change
the assertion to use the first responses.create invocation instead by reading
mock_byok_client.responses.create.call_args_list[0].kwargs (or its args) to
extract the "input" text and assert that "file_search found" is not present so
the check is stable even if later topic-summary calls occur.

In `@tests/integration/endpoints/test_streaming_query_byok_integration.py`:
- Around line 501-529: Update
test_streaming_query_byok_inline_rag_returns_referenced_documents to actually
consume the StreamingResponse instead of only inspecting
mock_streaming_byok_client.responses.create input: call
streaming_query_endpoint_handler to get the StreamingResponse, iterate/consume
its SSE stream (or use helper to read events) until the final "end" event, parse
the end event payload and assert it contains the referenced_documents metadata
(e.g., contains "doc-ocp-overview" or "doc-k8s-pods") that the BYOK mock emits;
keep existing checks on mock_streaming_byok_client.responses.create but add the
final assertion on the end event payload to ensure end-event serialization is
validated.
- Around line 705-737: Update the
test_streaming_query_byok_tool_rag_emits_referenced_documents test to assert
actual referenced-document payloads instead of just their type: after collecting
events and extracting ref_docs from the "end" event (variables events and
ref_docs), assert ref_docs is non-empty (e.g., assert len(ref_docs) > 0) and
assert at least one known value from the mocked file_search result appears in
ref_docs (for example check a known URL or document id used in the mock). This
ensures the handler actually propagates tool references into
referenced_documents.
- Around line 1098-1178: The test
test_streaming_query_byok_max_chunks_caps_across_multiple_sources currently uses
non-overlapping score ranges so it can't detect a bug that ignores one source;
update the fixtures resp_a and resp_b to produce overlapping scores (e.g., some
resp_b scores higher than some resp_a scores) so global top-K selection must
include chunks from both sources, keep the vector_io mock side effect
(_side_effect) and the call to streaming_query_endpoint_handler as-is, then add
assertions on the final create call input
(mock_client.responses.create.call_args_list[0].kwargs["input"]) to explicitly
assert certain chunk IDs from both "a-chunk-X" and "b-chunk-Y" are present and
that some lower-scored chunk IDs are absent, ensuring the test verifies
cross-source global ranking while still asserting the header contains
BYOK_RAG_MAX_CHUNKS.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 869bd444-9f9b-4267-afc2-fe585b589076

📥 Commits

Reviewing files that changed from the base of the PR and between e73778a and 1755a79.

📒 Files selected for processing (2)

tests/integration/endpoints/test_query_byok_integration.py
tests/integration/endpoints/test_streaming_query_byok_integration.py

tests/integration/endpoints/test_query_byok_integration.py

tests/integration/endpoints/test_streaming_query_byok_integration.py

tisnik

LGTM

coderabbitai

♻️ Duplicate comments (2)

tests/integration/endpoints/test_streaming_query_byok_integration.py (1)

1045-1099: ⚠️ Potential issue | 🟠 Major

Assert the actual cross-source cutoff, not just source presence.

The interleaved scores are an improvement, but this still passes if the handler keeps an arbitrary subset from each store as long as both sources appear and chunk 0 is dropped. Please assert concrete kept/dropped chunks at the cutoff so this proves global top-K ranking rather than just mixed-source inclusion.

🧪 Tighten the assertions around the cutoff

-    # Both sources must appear in the context (overlapping scores guarantee this)
-    assert "Source A chunk" in input_text
-    assert "Source B chunk" in input_text
-
-    # Lowest-scoring chunks from each source must be dropped
-    assert "Source A chunk 0" not in input_text
-    assert "Source B chunk 0" not in input_text
+    # Concrete winners from both sources must survive the global cap
+    assert "Source A chunk 9" in input_text
+    assert "Source B chunk 9" in input_text
+    assert "Source A chunk 5" in input_text
+    assert "Source B chunk 5" in input_text
+
+    # Concrete losers just below the cutoff must be excluded
+    assert "Source A chunk 4" not in input_text
+    assert "Source B chunk 4" not in input_text

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/integration/endpoints/test_streaming_query_byok_integration.py` around
lines 1045 - 1099, The test currently only checks that both sources appear and
chunk 0 from each source is dropped, which doesn't prove a global top-K cutoff;
update the assertions after calling streaming_query_endpoint_handler (use the
captured create_call.kwargs["input"] / input_text from
mock_client.responses.create) to compute the expected global top-K by sorting
all (source,chunk,score) pairs used to build resp_a and resp_b (their scores are
in the test setup using constants.BYOK_RAG_MAX_CHUNKS and the round(...)
values), then assert that the exact expected kept chunk labels (e.g., "Source A
chunk X", "Source B chunk Y") for the top-K appear in input_text and that any
chunks ranked below the cutoff do not appear; keep using
streaming_query_endpoint_handler, QueryRequest, mock_client.responses.create,
and constants.BYOK_RAG_MAX_CHUNKS to locate the code to change.

tests/integration/endpoints/test_query_byok_integration.py (1)

1058-1111: ⚠️ Potential issue | 🟠 Major

This still doesn't prove the global merge boundary.

Right now the test only checks sorted scores, both sources present, and chunk 0 excluded. A per-store cap followed by merge/sort could still satisfy that. Assert specific chunks on both sides of the cutoff so the test fails when cross-source ranking is wrong.

🧪 Tighten the capped-result assertions

-    # Both sources must survive the cap
-    sources = {chunk.source for chunk in response.rag_chunks}
-    assert "source-a" in sources
-    assert "source-b" in sources
-
-    # Lowest-scoring chunks from each source must be dropped
-    chunk_contents = {chunk.content for chunk in response.rag_chunks}
-    assert "Source A chunk 0" not in chunk_contents
-    assert "Source B chunk 0" not in chunk_contents
+    chunk_contents = {chunk.content for chunk in response.rag_chunks}
+
+    # Concrete winners from both sources must survive the global cap
+    assert "Source A chunk 9" in chunk_contents
+    assert "Source B chunk 9" in chunk_contents
+    assert "Source A chunk 5" in chunk_contents
+    assert "Source B chunk 5" in chunk_contents
+
+    # Concrete losers just below the cutoff must be excluded
+    assert "Source A chunk 4" not in chunk_contents
+    assert "Source B chunk 4" not in chunk_contents

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/integration/endpoints/test_query_byok_integration.py` around lines 1058
- 1111, The test currently only checks sorted scores and presence of both
sources but not that the global merge cutoff is correct; modify the assertions
after calling query_endpoint_handler (response.rag_chunks) to assert that
specific chunks on both sides of the global cutoff are present/absent: compute
the global cutoff index = constants.BYOK_RAG_MAX_CHUNKS and given the
constructed resp_a/resp_b score patterns, assert that the highest-ranked chunk
just above the cutoff from one source (e.g., "Source B chunk X" where X is the
index that should survive) is present and that the next lower-ranked chunk just
below the cutoff from the other source (e.g., "Source A chunk Y" which should be
dropped) is absent; use resp_a/resp_b, n and constants.BYOK_RAG_MAX_CHUNKS to
determine the exact chunk indices and add assertions referencing their content
strings in chunk_contents to ensure cross-source ranking is enforced.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/integration/endpoints/test_query_byok_integration.py`:
- Around line 1058-1111: The test currently only checks sorted scores and
presence of both sources but not that the global merge cutoff is correct; modify
the assertions after calling query_endpoint_handler (response.rag_chunks) to
assert that specific chunks on both sides of the global cutoff are
present/absent: compute the global cutoff index = constants.BYOK_RAG_MAX_CHUNKS
and given the constructed resp_a/resp_b score patterns, assert that the
highest-ranked chunk just above the cutoff from one source (e.g., "Source B
chunk X" where X is the index that should survive) is present and that the next
lower-ranked chunk just below the cutoff from the other source (e.g., "Source A
chunk Y" which should be dropped) is absent; use resp_a/resp_b, n and
constants.BYOK_RAG_MAX_CHUNKS to determine the exact chunk indices and add
assertions referencing their content strings in chunk_contents to ensure
cross-source ranking is enforced.

In `@tests/integration/endpoints/test_streaming_query_byok_integration.py`:
- Around line 1045-1099: The test currently only checks that both sources appear
and chunk 0 from each source is dropped, which doesn't prove a global top-K
cutoff; update the assertions after calling streaming_query_endpoint_handler
(use the captured create_call.kwargs["input"] / input_text from
mock_client.responses.create) to compute the expected global top-K by sorting
all (source,chunk,score) pairs used to build resp_a and resp_b (their scores are
in the test setup using constants.BYOK_RAG_MAX_CHUNKS and the round(...)
values), then assert that the exact expected kept chunk labels (e.g., "Source A
chunk X", "Source B chunk Y") for the top-K appear in input_text and that any
chunks ranked below the cutoff do not appear; keep using
streaming_query_endpoint_handler, QueryRequest, mock_client.responses.create,
and constants.BYOK_RAG_MAX_CHUNKS to locate the code to change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1bc37e6f-8cf6-4cfc-84b5-1bdfee2b7b36

📥 Commits

Reviewing files that changed from the base of the PR and between b0795d0 and 335297a.

📒 Files selected for processing (2)

tests/integration/endpoints/test_query_byok_integration.py
tests/integration/endpoints/test_streaming_query_byok_integration.py

radofuchs

LGTM

are-ces force-pushed the inline-byok-integration-tests branch from 59329f4 to 1755a79 Compare March 9, 2026 13:38

coderabbitai bot reviewed Mar 9, 2026

View reviewed changes

tisnik approved these changes Mar 10, 2026

View reviewed changes

are-ces force-pushed the inline-byok-integration-tests branch from 1755a79 to b0795d0 Compare March 10, 2026 08:53

BYOK Inline and Tool RAG integration tests

335297a

are-ces force-pushed the inline-byok-integration-tests branch from b0795d0 to 335297a Compare March 10, 2026 13:38

coderabbitai bot reviewed Mar 10, 2026

View reviewed changes

are-ces force-pushed the inline-byok-integration-tests branch from 335297a to 015b3ce Compare March 11, 2026 08:08

Assert QueryResponse

7cca40b

are-ces force-pushed the inline-byok-integration-tests branch from 015b3ce to 7cca40b Compare March 11, 2026 08:25

radofuchs approved these changes Mar 11, 2026

View reviewed changes

radofuchs merged commit eb97491 into lightspeed-core:main Mar 11, 2026
20 of 22 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LCORE-1423: BYOK Inline and Tool RAG integration tests#1292

LCORE-1423: BYOK Inline and Tool RAG integration tests#1292
radofuchs merged 2 commits intolightspeed-core:mainfrom
are-ces:inline-byok-integration-tests

are-ces commented Mar 9, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 9, 2026 •

edited

Loading

Reviews paused

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tisnik left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

radofuchs left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

are-ces commented Mar 9, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

radofuchs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

are-ces commented Mar 9, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 9, 2026 •

edited

Loading