Releases: ponsde/OpenViking_Curator

v0.1.0 — Feedback Loop + Enhanced Dedup + Decision Report

25 Feb 10:59

What's new in v0.1.0

Feedback-driven retrieval ranking

  • feedback_store records up/down/adopt signals per resource URI
  • rerank_with_feedback() in retrieval_v2.py adjusts OV retrieval scores by at most ±0.10 (deliberately conservative, so OV's original score stays dominant)
  • Pipeline auto-records adopt after each successful retrieval (closed feedback loop)
  • Manual feedback: feedback_store.apply(uri, 'up'|'down'|'adopt')
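
To illustrate how a capped feedback adjustment can rerank results without overriding the original score, here is a minimal sketch. It is not the code in retrieval_v2.py: the names feedback_delta and rerank_sketch, the adopt weight of 0.5, and the tanh squashing are all assumptions made for illustration. Only the ±0.10 cap comes from the notes above.

```python
import math

MAX_ADJUST = 0.10  # cap stated in the release notes


def feedback_delta(up, down, adopt, max_adjust=MAX_ADJUST):
    """Map feedback counts to a bounded score adjustment (hypothetical weighting)."""
    # Assumption: an implicit "adopt" counts half as much as an explicit up-vote.
    net = up + 0.5 * adopt - down
    # tanh squashes the net signal into (-1, 1), so the result stays within the cap.
    return max_adjust * math.tanh(net / 5.0)


def rerank_sketch(results, feedback):
    """results: list of (uri, score); feedback: dict of uri -> count dict."""
    adjusted = []
    for uri, score in results:
        counts = feedback.get(uri, {})
        delta = feedback_delta(
            counts.get("up", 0), counts.get("down", 0), counts.get("adopt", 0)
        )
        adjusted.append((uri, score + delta))
    # Highest adjusted score first.
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)
```

With a cap this small, feedback can reorder results whose scores are close, but a resource that scores far below another cannot overtake it on feedback alone.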

Enhanced deduplication

  • Layer 1, URL hash: extracts source URLs from the text and hashes them with MD5, so a duplicate is flagged immediately when source URLs overlap
  • Layer 2 — Jaccard word similarity: replaces SequenceMatcher; word-set intersection/union, order-invariant, no external deps
  • Duplicate reports now include a method field (url_hash | jaccard)
  • CJK single-character tokens are preserved (in technical documentation, individual Chinese characters are distinctive)
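
The two layers can be sketched roughly as follows. This is an illustrative approximation, not the project's implementation: the function names, the URL and token regexes, and the 0.8 threshold are assumptions. Only the MD5 URL hash, the Jaccard word-set comparison, the method values, and the single-character CJK tokens are taken from the notes above.

```python
import hashlib
import re

# Assumed patterns: a simple URL matcher, and tokens that are either
# ASCII word runs or single CJK characters (Han, kana, Hangul).
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")
TOKEN_RE = re.compile(r"[a-z0-9_]+|[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")


def url_hashes(text):
    """Layer 1: MD5 digests of every source URL found in the text."""
    urls = (u.rstrip(".,;") for u in URL_RE.findall(text))
    return {hashlib.md5(u.encode("utf-8")).hexdigest() for u in urls}


def tokens(text):
    """Lowercased word set; each CJK character is kept as its own token."""
    return set(TOKEN_RE.findall(text.lower()))


def jaccard(a, b):
    """Layer 2: |intersection| / |union| of the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def check_duplicate(new_text, existing_text, threshold=0.8):
    """Return a report dict; threshold is a hypothetical default."""
    if url_hashes(new_text) & url_hashes(existing_text):
        return {"duplicate": True, "method": "url_hash"}
    if jaccard(new_text, existing_text) >= threshold:
        return {"duplicate": True, "method": "jaccard"}
    return {"duplicate": False, "method": None}
```

Because Jaccard compares sets rather than sequences, reordering the words in a passage does not change its similarity score, which is the order-invariance mentioned above.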

Decision Report

  • format_report(result): CJK-safe ASCII box summary of every pipeline run (coverage, load stage, external trigger, LLM calls, conflict, duration)
  • format_report_short(result): single-line log-friendly format
  • Automatically included in every pipeline_v2.run() return as result["decision_report"]
  • Uses unicodedata.east_asian_width for correct terminal alignment with Chinese/Japanese/Korean text
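
As a rough illustration of why unicodedata.east_asian_width matters for box alignment, the sketch below pads by terminal columns instead of by character count. The helper names display_width, pad_to_width, and box_line are hypothetical and are not taken from the project. It also simplifies by treating every character that is not Wide or Fullwidth as one column, so combining marks are not handled.

```python
import unicodedata


def display_width(text):
    """Terminal columns used by text: Wide (W) and Fullwidth (F) count as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad_to_width(text, width):
    """Right-pad with spaces until text occupies `width` columns."""
    return text + " " * max(0, width - display_width(text))


def box_line(label, value, inner_width=30):
    """One row of an ASCII box, aligned by display width rather than len()."""
    return "| " + pad_to_width(f"{label}: {value}", inner_width) + " |"


print(box_line("coverage", "0.82"))
print(box_line("覆盖率", "高"))
```

Padding with str.ljust would count each CJK character as one column and push the right border out of line. Measuring display width keeps both rows the same width on screen.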

Project structure

  • Added pyproject.toml (setuptools, deps, pytest config)
  • Updated README.md + README_CN.md

Tests

172 tests, all passing.