LCORE-1377 Parse index name from chunk metadata source attribute by max-svistunov · Pull Request #1300 · lightspeed-core/lightspeed-stack

max-svistunov · 2026-03-10T12:05:38Z

Description

Resolve the embedded index name in the chunk metadata "source" attribute to figure out which store a chunk came from. This is done after single-store ID mapping and vector_store_id attribute resolution have been tried.

The rag-content counterpart PR is lightspeed-core/rag-content#99

(Continuation of #1135, #1208, #1248)

Type of change

Tools used to create PR

Assisted-by: Claude Opus 4.6
Generated by: Claude Opus 4.6

Related Tickets & Documents

Related Issue # LCORE-1377
Closes # LCORE-1377

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Start llama-stack with faiss + sentence-transformers provider
Create a vector store via REST API
Insert a chunk with metadata: {"source": "test-index", ...} through vector_io.insert
Search the vector store using vector_stores/{id}/search.
Verify result.attributes["source"] == "test-index".

Summary by CodeRabbit

Bug Fixes
- Improved source attribution when multiple data sources are configured: results now fall back to an explicit source attribute if the primary identifier is missing, while still preferring the primary identifier when present.
Tests
- Added unit tests covering precedence and fallback behaviors for source resolution across multiple configured stores.

coderabbitai · 2026-03-10T12:06:01Z

Walkthrough

refine resolve_source_for_result to read the result's attributes["source"] as a fallback when vector_store_id is absent in multi-store setups; when source is present it is returned directly (no mapping). Added four unit tests covering precedence and edge cases.

Changes

Cohort / File(s)	Summary
Source resolution logic `src/utils/responses.py`	When multiple vector stores configured, `_resolve_source_for_result` now falls back to `result.attributes["source"]` if `vector_store_id` is missing and returns that value directly (no rag_id_mapping applied).
Unit tests for fallback and precedence `tests/unit/utils/test_responses.py`	Added four tests: `test_multiple_stores_source_attribute_fallback`, `test_multiple_stores_source_attribute_ignores_mapping`, `test_multiple_stores_vector_store_id_preferred_over_source`, and `test_multiple_stores_no_vector_store_id_no_source` to validate fallback behavior, mapping precedence, and absence cases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly reflects the main change: adding logic to parse and resolve an index name from the chunk metadata source attribute to determine vector store origin.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/responses.py`:
- Around line 849-852: The fallback uses attributes.get("source") but then looks
up rag_id_mapping (built as vector_db_id -> rag_id), which is the wrong key
shape and will usually return the raw metadata; change the fallback to return
attr_source directly instead of rag_id_mapping.get(attr_source, attr_source),
i.e., when attributes.get("source") is present just return attr_source, or
alternatively implement and call a dedicated index-name-to-rag lookup (e.g., an
indexNameToRag map) before this helper if you need translation—update references
to rag_id_mapping, attributes.get("source"), and the fallback branch
accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 50c19659-1ea1-42fa-b276-7ea4b3b584f9

📥 Commits

Reviewing files that changed from the base of the PR and between 36c2ae2 and bbccaa0.

📒 Files selected for processing (2)

src/utils/responses.py
tests/unit/utils/test_responses.py

src/utils/responses.py

max-svistunov · 2026-03-10T12:30:40Z

@are-ces Could you PTAL?

tisnik

LGTM

tisnik · 2026-03-10T16:01:52Z

@coderabbitai review

coderabbitai · 2026-03-10T16:02:06Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/utils/responses.py (1)

844-854: ⚠️ Potential issue | 🟡 Minor

Support dict-shaped results in this fallback.

parse_referenced_documents() already handles both object and dict results, but this branch still pulls attributes via getattr(...) only. In a multi-store response backed by dicts, both vector_store_id and the new source fallback will be skipped and source resolves to None.

🩹 Proposed fix

     if len(vector_store_ids) > 1:
-        attributes = getattr(result, "attributes", {}) or {}
+        if isinstance(result, dict):
+            attributes = result.get("attributes", {}) or {}
+        else:
+            attributes = getattr(result, "attributes", {}) or {}
         attr_store_id: Optional[str] = attributes.get("vector_store_id")
         if attr_store_id:
             return rag_id_mapping.get(attr_store_id, attr_store_id)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 844 - 854, The fallback assumes result
is an object and uses getattr(result, "attributes", {}), which skips dict-shaped
results; update the attributes extraction in parse_referenced_documents() (the
block that sets attributes, attr_store_id, attr_source and uses rag_id_mapping)
to first handle dictionaries (e.g., if isinstance(result, dict): attributes =
result.get("attributes", {}) ) and otherwise use getattr, then proceed to check
attr_store_id and attr_source as before so both "vector_store_id" and the
"source" fallback work for dict and object results.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/utils/responses.py`:
- Around line 844-854: The fallback assumes result is an object and uses
getattr(result, "attributes", {}), which skips dict-shaped results; update the
attributes extraction in parse_referenced_documents() (the block that sets
attributes, attr_store_id, attr_source and uses rag_id_mapping) to first handle
dictionaries (e.g., if isinstance(result, dict): attributes =
result.get("attributes", {}) ) and otherwise use getattr, then proceed to check
attr_store_id and attr_source as before so both "vector_store_id" and the
"source" fallback work for dict and object results.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6c64e5a7-232b-43e1-ae13-1e82277a704a

📥 Commits

Reviewing files that changed from the base of the PR and between bbccaa0 and 3c1abde.

📒 Files selected for processing (2)

src/utils/responses.py
tests/unit/utils/test_responses.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/unit/utils/test_responses.py

max-svistunov · 2026-03-10T18:37:29Z

Thanks @tisnik , could you also PTAL at the rag-content counterpart?

lightspeed-core/rag-content#99

max-svistunov mentioned this pull request Mar 10, 2026

LCORE-1377 Embed index name as source attribute in chunk metadata lightspeed-core/rag-content#99

Open

19 tasks

coderabbitai bot reviewed Mar 10, 2026

View reviewed changes

src/utils/responses.py Outdated Show resolved Hide resolved

Parse index name from chunk metadata source attribute

3c1abde

max-svistunov force-pushed the lcore-1377-parse-index-name-from-chunk-metadata branch from bbccaa0 to 3c1abde Compare March 10, 2026 12:29

tisnik approved these changes Mar 10, 2026

View reviewed changes

coderabbitai bot reviewed Mar 10, 2026

View reviewed changes

tisnik requested a review from are-ces March 10, 2026 16:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LCORE-1377 Parse index name from chunk metadata source attribute#1300

LCORE-1377 Parse index name from chunk metadata source attribute#1300
max-svistunov wants to merge 1 commit intolightspeed-core:mainfrom
max-svistunov:lcore-1377-parse-index-name-from-chunk-metadata

max-svistunov commented Mar 10, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 10, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

max-svistunov commented Mar 10, 2026

Uh oh!

tisnik left a comment

Uh oh!

tisnik commented Mar 10, 2026

Uh oh!

coderabbitai bot commented Mar 10, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

max-svistunov commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

max-svistunov commented Mar 10, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

max-svistunov commented Mar 10, 2026

Uh oh!

tisnik left a comment

Choose a reason for hiding this comment

Uh oh!

tisnik commented Mar 10, 2026

Uh oh!

coderabbitai bot commented Mar 10, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

max-svistunov commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

max-svistunov commented Mar 10, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 10, 2026 •

edited

Loading