fix(techmeme): emit dated, valid JSON from search#1440
Conversation
Techmeme's search command scraped every anchor on the results page and broke
its own JSON contract: zero hits printed prose ('No results for ...') to
stdout before the --json check, bare publication anchors ('Wall Street
Journal') leaked in as headlines, and the per-result dates on the page were
never parsed - so archive hits from 2022 were indistinguishable from today's
news (observed breaking the last30days engine on 2026-07-04).
search now pairs each headline anchor with its iinf date block in document
order (bounded to the results canvas, cut at prevnext), emits a date field
(ISO YYYY-MM-DD, empty when unparseable), always outputs a valid JSON array
in JSON mode (empty array on zero hits - nil slices marshal as null, so the
slice is initialized), warns on stderr when markup drift yields anchors but
no date blocks, and gains --days N for client-side recency filtering
(undated records drop when the filter is active). Human mode keeps the
friendly zero-results line and gains a DATE column.
Recorded as .printing-press-patches/search-json-dates-contract.json so a
reprint preserves the contract; contributor attribution added per the
three-surface convention.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BWuSdMPdQLnAeh65wTeG3L
… day-boundary, symmetric markup guard
Review pass (2 independent reviewers + live repro) caught: --agent's implied
--compact stripped every search field to {} via the shared keepFields
allow-list (fixed with a search-local default --select, shared map
untouched); --days compared midnight-UTC dates against a wall-clock instant,
dropping records dated exactly N days ago; the markup-shift guard only fired
for anchors-without-dates, leaving the reverse direction silent. Plus DATE
column test coverage and an anchor regex hardened for attributes before HREF.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BWuSdMPdQLnAeh65wTeG3L
The cobra-tree walker skipped search as 'covered by a typed MCP tool', but techmeme-search_rss returns the raw archive-search HTML of the same endpoint the CLI now parses into dated, valid JSON - so the shell-out is the strictly better surface and MCP agents could not reach the fixed contract at all. Removed search from frameworkCommands per the file's own when-in-doubt principle and recorded the lesson in the patch entry. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BWuSdMPdQLnAeh65wTeG3L
Greptile SummaryThis PR fixes three distinct bugs in
Confidence Score: 5/5Safe to merge — the parser rewrite is well-scoped, all three bug fixes are covered by targeted tests, and the MCP registration change is a single-line removal with no behavioral side effects on other commands. The structural anchor/iinf pairing algorithm is correct: document-order position comparison is consistent within the trimmed page string, the boundary-inclusive cutoff in filterSearchByDays is derived from midnight UTC to avoid time-of-day skew, the nil-slice guard correctly prevents json.Marshal emitting null, and the flags copy in the --compact path leaves the shared struct unmutated. The 15-test suite exercises all meaningful branches including the zone-skew boundary case, the renamed-permalink exclusion, and the partial-pairing markup-shift signal. No files require special attention. Important Files Changed
|
… failures Greptile review: the 'In context' filter keyed on anchor text, so a renamed label would let a Techmeme story permalink pair with the NEXT story's date block and misattribute its link - now excluded by href shape (techmeme.com/YYMMDD/pNN). And the markup-shift guard fired only on total pairing collapse; it now warns whenever any iinf block goes unmatched. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01BWuSdMPdQLnAeh65wTeG3L
What
techmeme-pp-cli searchscraped every anchor on Techmeme's archive-search results page and broke its own JSON contract. This PR makes the search surface honest and agent-safe:--json/--agentemit a valid JSON array for every successful invocation, including[]on zero hits (previously:No results for "q"prose on stdout, exit 0 - a nil Go slice also marshals tonull, so the empty slice is initialized explicitly).date(ISOYYYY-MM-DD,""when unparseable), parsed from the results page's per-itemiinfblocks with the same multi-layout approach asparseRSSDate/parseRiverTimestamp. Human table gains a DATE column.prevnext) - bare publication anchors ("Wall Street Journal") and post-results promos no longer become records.--days N: client-side recency filter at UTC day granularity (undated records drop when active).iinfblocks fail to pair in either direction.--agentactually works now: the implied--compactstripped every search field to{}via the sharedkeepFieldsallow-list; search now defaults an explicit field select when compacting (shared map untouched).searchwas hardcoded into the cobra-treeframeworkCommandsskip-list on the premise that the typedtechmeme-search_rsstool covers it - but that tool returns the raw HTML of the same endpoint.searchis now registered as a shell-out MCP tool.Recorded in
.printing-press-patches/search-json-dates-contract.jsonas a reprint-guard; contributor attribution added on all three surfaces. No release-owned files touched.Why (reproduction)
Observed live in a last30days engine run on 2026-07-04 ("Kanye West"): two subqueries died with
JSON decode failed: Expecting value: line 1 column 1 (char 0)(zero-hit prose), and the one that returned data delivered Dec 2022 Parler-saga headlines indistinguishable from current news (no dates). Raw page: each result is publication anchor + headline anchor +iinfdate; the old single-regex sweep captured all anchors and no dates.Validation
go build ./.../go vet ./...go test ./...govulncheck ./...verify_skill.py --dir library/productivity/techmeme/exportpositional-arg failure in README)search "kanye west" --json--json[], exit 0, pipes throughpython3 -m json.toolsearch "kanye west" --agent[{},{},...])Fixture-first: the trimmed live-capture fixture was committed and parse tests run against the old parser first (6 records incl. "Leaderboard"/"Wall Street Journal", zero dates) before the rewrite.
Known Residuals
internal/cli/search.go:88---dayscomposes with a single-page fetch (~10 archive hits per page); no signal when more in-window results exist. Pagination was deliberately deferred; a minimal follow-up is parsing the "Results 1 - 10 of about N" header and warning on stderr when truncated under--days.author.goand at least 5 other published CLIs - filed as a generator-level issue (linked below) rather than widened into this PR.Post-Deploy Monitoring & Validation
go install .../techmeme/cmd/techmeme-pp-cli@latest, thensearch <topic> --json | python3 -m json.tool(valid array, dated records) and a zero-hit query (must print[]).[Techmeme] found N recordswith noJSON decode failedlines.--agent; revert this PR.🤖 Generated with Claude Code
https://claude.ai/code/session_01BWuSdMPdQLnAeh65wTeG3L