Skip to content

LCORE-1377 Parse index name from chunk metadata source attribute#1300

Open
max-svistunov wants to merge 1 commit intolightspeed-core:mainfrom
max-svistunov:lcore-1377-parse-index-name-from-chunk-metadata
Open

LCORE-1377 Parse index name from chunk metadata source attribute#1300
max-svistunov wants to merge 1 commit intolightspeed-core:mainfrom
max-svistunov:lcore-1377-parse-index-name-from-chunk-metadata

Conversation

@max-svistunov
Copy link
Contributor

@max-svistunov max-svistunov commented Mar 10, 2026

Description

Resolve the embedded index name in the chunk metadata "source" attribute to figure out which store a chunk came from. This is done after single-store ID mapping and vector_store_id attribute resolution have been tried.

The rag-content counterpart PR is lightspeed-core/rag-content#99

(Continuation of #1135, #1208, #1248)

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: Claude Opus 4.6
  • Generated by: Claude Opus 4.6

Related Tickets & Documents

  • Related Issue # LCORE-1377
  • Closes # LCORE-1377

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  1. Start llama-stack with faiss + sentence-transformers provider
  2. Create a vector store via REST API
  3. Insert a chunk with metadata: {"source": "test-index", ...} through vector_io.insert
  4. Search the vector store using vector_stores/{id}/search.
  5. Verify result.attributes["source"] == "test-index".

Summary by CodeRabbit

  • Bug Fixes
    • Improved source attribution when multiple data sources are configured: results now fall back to an explicit source attribute if the primary identifier is missing, while still preferring the primary identifier when present.
  • Tests
    • Added unit tests covering precedence and fallback behaviors for source resolution across multiple configured stores.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 10, 2026

Walkthrough

refine resolve_source_for_result to read the result's attributes["source"] as a fallback when vector_store_id is absent in multi-store setups; when source is present it is returned directly (no mapping). Added four unit tests covering precedence and edge cases.

Changes

Cohort / File(s) Summary
Source resolution logic
src/utils/responses.py
When multiple vector stores configured, _resolve_source_for_result now falls back to result.attributes["source"] if vector_store_id is missing and returns that value directly (no rag_id_mapping applied).
Unit tests for fallback and precedence
tests/unit/utils/test_responses.py
Added four tests: test_multiple_stores_source_attribute_fallback, test_multiple_stores_source_attribute_ignores_mapping, test_multiple_stores_vector_store_id_preferred_over_source, and test_multiple_stores_no_vector_store_id_no_source to validate fallback behavior, mapping precedence, and absence cases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly reflects the main change: adding logic to parse and resolve an index name from the chunk metadata source attribute to determine vector store origin.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/responses.py`:
- Around line 849-852: The fallback uses attributes.get("source") but then looks
up rag_id_mapping (built as vector_db_id -> rag_id), which is the wrong key
shape and will usually return the raw metadata; change the fallback to return
attr_source directly instead of rag_id_mapping.get(attr_source, attr_source),
i.e., when attributes.get("source") is present just return attr_source, or
alternatively implement and call a dedicated index-name-to-rag lookup (e.g., an
indexNameToRag map) before this helper if you need translation—update references
to rag_id_mapping, attributes.get("source"), and the fallback branch
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 50c19659-1ea1-42fa-b276-7ea4b3b584f9

📥 Commits

Reviewing files that changed from the base of the PR and between 36c2ae2 and bbccaa0.

📒 Files selected for processing (2)
  • src/utils/responses.py
  • tests/unit/utils/test_responses.py

@max-svistunov max-svistunov force-pushed the lcore-1377-parse-index-name-from-chunk-metadata branch from bbccaa0 to 3c1abde Compare March 10, 2026 12:29
@max-svistunov
Copy link
Contributor Author

@are-ces Could you PTAL?

Copy link
Contributor

@tisnik tisnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik
Copy link
Contributor

tisnik commented Mar 10, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/responses.py (1)

844-854: ⚠️ Potential issue | 🟡 Minor

Support dict-shaped results in this fallback.

parse_referenced_documents() already handles both object and dict results, but this branch still pulls attributes via getattr(...) only. In a multi-store response backed by dicts, both vector_store_id and the new source fallback will be skipped and source resolves to None.

🩹 Proposed fix
     if len(vector_store_ids) > 1:
-        attributes = getattr(result, "attributes", {}) or {}
+        if isinstance(result, dict):
+            attributes = result.get("attributes", {}) or {}
+        else:
+            attributes = getattr(result, "attributes", {}) or {}
         attr_store_id: Optional[str] = attributes.get("vector_store_id")
         if attr_store_id:
             return rag_id_mapping.get(attr_store_id, attr_store_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 844 - 854, The fallback assumes result
is an object and uses getattr(result, "attributes", {}), which skips dict-shaped
results; update the attributes extraction in parse_referenced_documents() (the
block that sets attributes, attr_store_id, attr_source and uses rag_id_mapping)
to first handle dictionaries (e.g., if isinstance(result, dict): attributes =
result.get("attributes", {}) ) and otherwise use getattr, then proceed to check
attr_store_id and attr_source as before so both "vector_store_id" and the
"source" fallback work for dict and object results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/utils/responses.py`:
- Around line 844-854: The fallback assumes result is an object and uses
getattr(result, "attributes", {}), which skips dict-shaped results; update the
attributes extraction in parse_referenced_documents() (the block that sets
attributes, attr_store_id, attr_source and uses rag_id_mapping) to first handle
dictionaries (e.g., if isinstance(result, dict): attributes =
result.get("attributes", {}) ) and otherwise use getattr, then proceed to check
attr_store_id and attr_source as before so both "vector_store_id" and the
"source" fallback work for dict and object results.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6c64e5a7-232b-43e1-ae13-1e82277a704a

📥 Commits

Reviewing files that changed from the base of the PR and between bbccaa0 and 3c1abde.

📒 Files selected for processing (2)
  • src/utils/responses.py
  • tests/unit/utils/test_responses.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/utils/test_responses.py

@tisnik tisnik requested a review from are-ces March 10, 2026 16:21
@max-svistunov
Copy link
Contributor Author

Thanks @tisnik , could you also PTAL at the rag-content counterpart?

lightspeed-core/rag-content#99

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants