Releases: ponsde/OpenViking_Curator
v0.1.0 — Feedback Loop + Enhanced Dedup + Decision Report
What's new in v0.1.0
Feedback-driven retrieval ranking
- `feedback_store` records `up`/`down`/`adopt` signals per resource URI
- `rerank_with_feedback()` in `retrieval_v2.py` adjusts OV retrieval scores by ±0.10 max (conservative: OV's original score stays dominant)
- Pipeline auto-records `adopt` after each successful retrieval (closed feedback loop)
- Manual feedback: `feedback_store.apply(uri, 'up'|'down'|'adopt')`
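As a rough illustration of the capped adjustment described above, here is a minimal sketch. It is not the code in `retrieval_v2.py`: the `SketchFeedbackStore` class, the `rerank_sketch` function, the result dict shape (`uri`, `score`), and the net-ratio formula are all assumptions made for this example. Only the ±0.10 cap and the three signal names come from the notes.

```python
from collections import defaultdict

MAX_ADJUST = 0.10  # cap from the release notes: feedback moves a score by at most ±0.10


class SketchFeedbackStore:
    """Hypothetical in-memory store; the real feedback_store may work differently."""

    def __init__(self):
        self._counts = defaultdict(lambda: {"up": 0, "down": 0, "adopt": 0})

    def apply(self, uri, signal):
        if signal not in ("up", "down", "adopt"):
            raise ValueError(f"unknown signal: {signal!r}")
        self._counts[uri][signal] += 1

    def counts(self, uri):
        # Return a copy so callers cannot mutate the stored counters.
        return dict(self._counts.get(uri, {"up": 0, "down": 0, "adopt": 0}))


def feedback_delta(counts):
    """Map signal counts to a value in [-MAX_ADJUST, +MAX_ADJUST] (assumed formula)."""
    positive = counts["up"] + counts["adopt"]
    negative = counts["down"]
    total = positive + negative
    if total == 0:
        return 0.0
    return MAX_ADJUST * (positive - negative) / total


def rerank_sketch(results, store):
    """Add the bounded feedback delta to each score, then sort descending."""
    adjusted = [
        {**r, "score": r["score"] + feedback_delta(store.counts(r["uri"]))}
        for r in results
    ]
    return sorted(adjusted, key=lambda r: r["score"], reverse=True)
```

Because the delta is bounded, feedback can only reorder results whose original scores are within 0.20 of each other; larger gaps in the OV score are never overturned.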
Enhanced deduplication
- Layer 1 — URL hash: extracts source URLs from text, hashes with md5, instant duplicate detection when source URLs overlap
- Layer 2 — Jaccard word similarity: replaces SequenceMatcher; word-set intersection/union, order-invariant, no external deps
- Duplicate reports now include a `method` field (`url_hash`|`jaccard`)
- CJK single-character tokens preserved (in technical documents, a single Chinese character can be distinguishing)
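The two layers can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the URL and token regexes, the 0.8 threshold, and the shape of the returned report are all hypothetical. Only the MD5 URL hash, the word-set Jaccard measure, the `method` values, and the single-character CJK tokens come from the notes.

```python
import hashlib
import re

# Assumed patterns: a simple http(s) URL matcher, and a tokenizer that keeps
# ASCII words whole while emitting each CJK character as its own token.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
TOKEN_RE = re.compile(r"[a-z0-9_]+|[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")


def url_hashes(text):
    """Layer 1: MD5 digest of every source URL found in the text."""
    urls = (u.rstrip(".,;") for u in URL_RE.findall(text))
    return {hashlib.md5(u.encode("utf-8")).hexdigest() for u in urls}


def tokenize(text):
    """Lowercased word set; single CJK characters are preserved as tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def jaccard(a, b):
    """Layer 2: |A ∩ B| / |A ∪ B| over word sets, so word order does not matter."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def check_duplicate(new_text, existing_text, threshold=0.8):
    """Try the cheap URL-hash layer first, then fall back to Jaccard."""
    if url_hashes(new_text) & url_hashes(existing_text):
        return {"duplicate": True, "method": "url_hash", "score": 1.0}
    score = jaccard(new_text, existing_text)
    if score >= threshold:
        return {"duplicate": True, "method": "jaccard", "score": score}
    return {"duplicate": False, "method": None, "score": score}
```

Checking URL hashes first keeps the common case cheap: two texts citing the same source are flagged without any tokenization.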
Decision Report
- `format_report(result)`: CJK-safe ASCII box summary of every pipeline run (coverage, load stage, external trigger, LLM calls, conflict, duration)
- `format_report_short(result)`: single-line log-friendly format
- Automatically included in every `pipeline_v2.run()` return as `result["decision_report"]`
- Uses `unicodedata.east_asian_width` for correct terminal alignment with Chinese/Japanese/Korean text
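To show why `unicodedata.east_asian_width` matters for alignment, here is a small sketch of width-aware padding. It is not the project's `format_report()`: the helper names `display_width`, `pad`, and `box`, and the box layout, are assumptions for illustration. It also ignores zero-width combining characters, which a complete implementation would need to handle.

```python
import unicodedata


def display_width(text):
    """Terminal columns used by text: wide (W) and fullwidth (F) characters take two."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad(text, width):
    """Right-pad with spaces to a target display width, not a character count."""
    return text + " " * max(0, width - display_width(text))


def box(lines):
    """Draw an ASCII box whose right border lines up even with CJK content."""
    inner = max((display_width(line) for line in lines), default=0)
    border = "+" + "-" * (inner + 2) + "+"
    body = ["| " + pad(line, inner) + " |" for line in lines]
    return "\n".join([border, *body, border])


if __name__ == "__main__":
    # Illustrative values only; field names follow the report summary above.
    print(box(["coverage: 0.82", "冲突: 无", "duration: 1.4s"]))
```

Padding with `str.ljust` would count each CJK character as one column and push the right border out of line, which is the problem the width lookup avoids.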
Project structure
- Added `pyproject.toml` (setuptools, deps, pytest config)
- Updated README.md + README_CN.md
Tests
172 tests, all passing.