feat(retrieve): use tags metadata for cross-subtree retrieval #1162

Open
13ernkastel wants to merge 3 commits into volcengine:main from 13ernkastel:codex/issue-1147-tags-retrieval

@13ernkastel (Contributor) commented Apr 1, 2026

Summary

  • auto-extract and persist tags during resource vectorization, while also allowing user-supplied tags on resource ingestion and search requests
  • thread tags through the resource/search services, SDK clients, and vector index filtering so callers can explicitly constrain retrieval by tag
  • expand HierarchicalRetriever with bounded, down-weighted tag-based cross-subtree discovery before BFS traversal starts
  • add focused retriever and router tests, and make the shared server test fixture use an isolated local config/AGFS setup
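
The bounded, down-weighted expansion in the third bullet can be pictured with a minimal sketch. This is not the PR's code: `Hit`, `expand_seeds_by_tags`, `max_extra`, and `tag_weight` are hypothetical names, and the scoring rule (the best tag-sharing seed's score multiplied by a constant factor) is an assumption chosen only to show what "bounded" and "down-weighted" could mean before BFS starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical retrieval hit: a resource URI, a score, and its tags."""
    uri: str
    score: float
    tags: frozenset = frozenset()


def expand_seeds_by_tags(seeds, candidates, max_extra=3, tag_weight=0.5):
    """Return the seeds plus at most max_extra tag-related candidates.

    Assumed rule: a candidate qualifies if it shares a tag with any seed.
    Its initial score is the best sharing seed's score times tag_weight,
    so with tag_weight < 1 it starts below the seed that introduced it.
    """
    seen = {s.uri for s in seeds}
    extras = []
    for cand in candidates:
        if cand.uri in seen:
            continue  # already a seed or already added
        sharing = [s.score for s in seeds if s.tags & cand.tags]
        if sharing:
            extras.append(Hit(cand.uri, max(sharing) * tag_weight, cand.tags))
            seen.add(cand.uri)
    # Keep only the strongest extras so the expansion stays capped.
    extras.sort(key=lambda h: h.score, reverse=True)
    return list(seeds) + extras[:max_extra]
```

The cap keeps a popular tag from flooding the traversal queue, and the lower starting score means a tag-discovered branch has to earn its rank during later scoring.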

Related Issue

Fixes #1147

Why

Issue #1147 asks for tags to act as a lateral discovery signal across semantically distant subtrees. This keeps the existing hierarchical retrieval flow, but gives it a controlled way to discover related branches that the initial semantic top-K would otherwise miss.

Impact

  • resources can store merged auto-extracted and user-provided tags
  • find/search requests can explicitly scope retrieval by tags
  • global semantic hits can seed additional related subtrees through shared tags with capped expansion and lower initial scores
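
As a rough illustration of the first two impact items, the sketch below merges auto-extracted and user-provided tags and checks a resource against a tag scope. The function names, the lowercase normalization, the user-first ordering, and the match-any filter semantics are all assumptions made for this example; the PR may implement these differently.

```python
def normalize_tag(tag):
    """Assumed normalization: trim whitespace and lowercase."""
    return tag.strip().lower()


def merge_tags(auto_tags, user_tags):
    """Merge two tag lists, user tags first, dropping blanks and duplicates."""
    merged = []
    seen = set()
    for tag in list(user_tags) + list(auto_tags):
        norm = normalize_tag(tag)
        if norm and norm not in seen:
            seen.add(norm)
            merged.append(norm)
    return merged


def matches_tag_filter(resource_tags, required_tags):
    """True if the resource carries at least one requested tag.

    An empty filter matches everything (assumed match-any semantics).
    """
    if not required_tags:
        return True
    wanted = {normalize_tag(t) for t in required_tags}
    return any(normalize_tag(t) in wanted for t in resource_tags)
```

For example, `merge_tags(["PyTorch", "ml "], ["ml", "Training"])` yields `["ml", "training", "pytorch"]` under these assumptions.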

Validation

  • PYTHONPATH=/Users/lennonchia/Documents/Project/OpenViking /Users/lennonchia/Documents/Project/OpenViking/.venv/bin/python -m pytest -q tests/retrieve/test_hierarchical_retriever_target_dirs.py tests/retrieve/test_hierarchical_retriever_rerank.py tests/retrieve/test_hierarchical_retriever_tags.py tests/server/test_api_search.py::test_find_forwards_tags_to_service tests/server/test_api_resources.py::test_add_resource_forwards_tags_to_service

Notes

  • I attempted the broader tests/server/test_api_search.py suite locally, but the current local wheel is missing the native VectorDB PersistStore symbol, so full end-to-end vector search verification is still environment-dependent here.

github-actions bot commented Apr 1, 2026

Failed to generate code suggestions for PR

@13ernkastel changed the title from "[codex] use tags metadata for cross-subtree retrieval" to "[feat] use tags metadata for cross-subtree retrieval" on Apr 1, 2026
@13ernkastel marked this pull request as ready for review on April 1, 2026, 16:35
github-actions bot commented Apr 1, 2026

Failed to generate code suggestions for PR

@13ernkastel changed the title from "[feat] use tags metadata for cross-subtree retrieval" to "feat(retrieve): use tags metadata for cross-subtree retrieval" on Apr 1, 2026
@MaojiaSheng (Collaborator) commented

I suggest constraining how tag content is named, for example by requiring each tag to declare its source or reason:
user:machine-learning;user:model-training;auto:pytorch
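
A source-prefixed format like the one above could be validated with a small parser. This is only a sketch of the suggestion, not something in the PR: `parse_prefixed_tags` and `ALLOWED_SOURCES` are hypothetical names, and limiting the prefixes to `user` and `auto` is an assumption based on the example string.

```python
# Assumed set of valid prefixes, taken from the example string only.
ALLOWED_SOURCES = {"user", "auto"}


def parse_prefixed_tags(raw):
    """Parse 'source:name;source:name' into a list of (source, name) pairs.

    Raises ValueError for entries without a known source prefix or a name.
    """
    pairs = []
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled separators
        source, sep, name = item.partition(":")
        if not sep or source not in ALLOWED_SOURCES or not name:
            raise ValueError(
                f"tag must look like 'user:<name>' or 'auto:<name>': {item!r}"
            )
        pairs.append((source, name))
    return pairs
```

Keeping the source as a separate field would let retrieval treat user-declared and auto-extracted tags differently, for instance by trusting `user:` tags more when filtering.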

@qin-ctx requested a review from zhoujh01 on April 2, 2026, 07:29
Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

[Feature]: Use Tags metadata to enhance cross-subtree retrieval

3 participants