
feat(retraction-checker): add retraction-checker #1438

Open
laci141 wants to merge 6 commits into mvanhorn:main from laci141:feat/retraction-checker


laci141 commented Jul 4, 2026


retraction-checker

Keyless CLI that checks whether a scientific paper has been retracted, when, why, and where the retraction notice is, then finds what current research says about the topic. It queries Crossref (retraction status) and OpenAlex (superseding work) with no API key required. Optional AI keys only upgrade the narrative summary produced by superseded.

API: retraction-checker | Category: other | Press version: 4.25.0
Spec: openapi3 (bundled) — sha256:cfe96d62abb26981fe806f89f495fec6536d33c6294a70c90f1f1084bff19d3e

CLI Shape

Retraction Checker CLI — Check whether a paper is retracted, why, and what the current research says now — keyless, over Crossref and OpenAlex.

Highlights (not in the official API docs):
• check Tell whether a paper (by DOI or PMID) has been retracted, when, why, and where the notice is.
• scan Batch-check a reading list or .bib file and flag every retracted entry.
• superseded For a retracted or older paper, find related more-recent research on the same topic, ranked by citation count.
• watch Monitor a topic or reading list for newly-announced retractions since the last run.

Agent mode: add --agent to any command for JSON output + non-interactive mode.
Health check: run 'retraction-checker-pp-cli doctor' to verify auth and connectivity.
See README.md or the bundled SKILL.md for recipes.

Usage:
retraction-checker-pp-cli [command]

Available Commands:
agent-context Emit structured JSON describing this CLI for agents
check Tell whether a paper (by DOI or PMID) has been retracted, when, why, and where the notice is.
completion Generate the autocompletion script for the specified shell
doctor Check CLI health
export Export data to JSONL or JSON for backup, migration, or analysis
feedback Record feedback about this CLI (local by default; upstream opt-in)
help Help about any command
import Import data from JSONL file via API create/upsert calls
profile Named sets of flags saved for reuse
scan Batch-check a reading list or .bib file and flag every retracted entry.
search Full-text search across synced data or live API
superseded For a retracted or older paper, find related more-recent research on the same topic, ranked by citation count.
sync Sync API data to local SQLite for offline search and analysis
version Print version
watch Monitor a topic or reading list for newly-announced retractions since the last run.
which Find the command that implements a capability
workflow Compound workflows that combine multiple API operations
works Check whether a scientific paper has been retracted, why, and what current research says about the topic now.

Flags:
--agent Set all agent-friendly defaults (--json --compact --no-input --no-color --yes)
--compact Return only key fields (id, name, status, timestamps) for minimal token usage
--config string Config file path
--csv Output as CSV (table and array responses)
--data-source string Data source for read commands: auto (live with local fallback), live (API only), local (synced data only) (default "auto")
--deliver string Route output to a sink: stdout (default), file:, webhook:
--dry-run Show request without sending
-h, --help help for retraction-checker-pp-cli
--home string Root directory for config, data, state, and cache files
--human-friendly Enable colored output and rich formatting
--idempotent Treat already-existing create results as a successful no-op
--json Output as JSON
--max-age duration Maximum acceptable age of local-store data before a stderr hint suggests sync; 0 disables (default 30m0s)
--no-cache Bypass response cache
--no-color Disable colored output
--no-input Disable all interactive prompts (for CI/agents)
--plain Output as plain tab-separated text
--profile string Apply values from a saved profile (see 'retraction-checker-pp-cli profile list')
--quiet Bare output, one value per line
--rate-limit float Max requests per second (0 to disable)
--select string Comma-separated fields to include in output (e.g. --select id,name,status)
--timeout duration Request timeout (default 1m0s)
-v, --version version for retraction-checker-pp-cli
--yes Skip confirmation prompts (for agents and scripts)

Use "retraction-checker-pp-cli [command] --help" for more information about a command.

What This CLI Does

  • check — DOI/PMID retraction lookup via Crossref; returns structured status, update type, date, source, and notice URL. PMIDs resolve to DOIs via NCBI E-utilities first.
  • scan — batch-check a bibliography or .bib file (one DOI/PMID per line, or extracted from doi = {…} fields); flags every retracted entry.
  • superseded — for a retracted/older paper, finds newer research on the same topic via OpenAlex, ranked by citation count; optional AI narrative summary when a key is configured.
  • watch — monitors a topic or reading list for newly-announced retractions since the last run.
  • doctor — Crossref/OpenAlex reachability + cache-path health checks.

Fully keyless by default. AI keys (ANTHROPIC/OPENAI/GEMINI/GROQ/MISTRAL) are an optional upgrade for superseded's summary only.
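
The scan description above mentions pulling identifiers out of doi = {…} fields in a .bib file. A minimal sketch of that extraction step follows; the regular expression and the `extractDOIs` name are assumptions made for illustration and are not taken from scan.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// doiField matches a BibTeX field such as `doi = {10.1000/xyz}` or
// `DOI = "10.1000/xyz"` and captures the value.
// Hypothetical pattern, not the one used in scan.go.
var doiField = regexp.MustCompile(`(?i)\bdoi\s*=\s*[{"]\s*([^}"\s]+)`)

// extractDOIs returns every DOI found in doi fields, in file order.
func extractDOIs(bib string) []string {
	var dois []string
	for _, m := range doiField.FindAllStringSubmatch(bib, -1) {
		dois = append(dois, m[1]) // m[1] is the captured field value
	}
	return dois
}

func main() {
	bib := "@article{a,\n  title = {Example},\n  doi = {10.1000/xyz123},\n}\n"
	for _, d := range extractDOIs(bib) {
		fmt.Println(d)
	}
}
```

Each extracted value would then be checked the same way as a DOI given on its own line.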

Manuscripts

  • Research: library/other/retraction-checker/.manuscripts/20260702-090842-5bac53b7/research/
    • 2026-07-02-feat-retraction-checker-pp-cli-brief.md
    • 2026-07-02-feat-retraction-checker-pp-cli-absorb-manifest.md
    • crossref-retraction-spec.yaml
  • Proofs: library/other/retraction-checker/.manuscripts/20260702-090842-5bac53b7/proofs/
    • phase5-acceptance.json (live acceptance, status: pass)
    • publish-live-gate.json
    • 2026-07-02-fix-retraction-checker-pp-cli-shipcheck.md

Validation Results

| Check | Result |
| --- | --- |
| Manifest | PASS |
| Phase 5 (live dogfood) | PASS (46/46, level: full) |
| Structural dogfood | PASS (0 dead flags/functions, 26/26 wired) |
| go mod tidy | PASS (no diff) |
| go vet | PASS |
| go build | PASS |
| --help | PASS |
| --version | PASS |
| verify-skill | PASS |
| govulncheck | PASS (no vulnerabilities) |
| Manuscripts | PASS (research + proofs present) |

New keyless CLI that checks whether a scientific paper has been retracted, why, and what current research says about the topic now.

Commands: check (DOI/PMID retraction lookup via Crossref, structured reason + notice URL), scan (batch-check a bibliography/reading list), superseded (find newer research on the same topic via OpenAlex, ranked by citation count, optional AI-assisted summary if a key is configured), watch (monitor a topic or reading list for new retractions), doctor (Crossref/OpenAlex reachability + cache path checks).

Fully keyless by default; AI keys (ANTHROPIC/OPENAI/GEMINI/GROQ/MISTRAL) are an optional upgrade for superseded's narrative summary only.

Fixed during dogfood testing: sync used Crossref's limit param instead of the correct rows param, causing HTTP 400s. Also added a --rate-limit flag (0=disabled default, matching the other CLIs' convention) and extended pacing to OpenAlex calls in superseded, which previously had no throttling and could trip OpenAlex's anonymous-traffic rate limiting.
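
For illustration, the rows fix can be sketched as a small query builder. This is a sketch under assumptions: the `crossrefWorksQuery` helper is hypothetical and not part of this PR; only the `rows` parameter name and the `/works` path come from the description above.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// crossrefWorksQuery builds query parameters for a Crossref /works request.
// Hypothetical helper: the page size goes in "rows", not "limit".
func crossrefWorksQuery(query string, rows int) url.Values {
	v := url.Values{}
	if query != "" {
		v.Set("query", query)
	}
	v.Set("rows", strconv.Itoa(rows))
	return v
}

func main() {
	u := url.URL{
		Scheme:   "https",
		Host:     "api.crossref.org",
		Path:     "/works",
		RawQuery: crossrefWorksQuery("crispr", 20).Encode(),
	}
	fmt.Println(u.String())
}
```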

Phase 5 dogfood acceptance: 46/46 passed (status: pass).

greptile-apps Bot commented Jul 4, 2026

Contributor

Greptile Summary

This PR adds retraction-checker, a new keyless CLI that checks whether a scientific paper has been retracted (via Crossref) and finds superseding research (via OpenAlex). It ships the full standard library layout — commands (check, scan, superseded, watch, doctor), MCP server, sync/store layer, rate-limiting utilities, file-based cache, and structured test coverage — with all canonical validation gates passing.

  • Novel commands (check, scan, superseded, watch) are hand-authored and cover DOI/PMID resolution, BibTeX batch scanning with concurrent fan-out, TTL-pruned watch baselines, and OpenAlex relation queries with retry-backoff.
  • Previous review concerns (silent false-negative on check, unbounded watch baseline growth, missing dogfood-results.json, NCBI rate-limit threading in scan) have been addressed in this iteration with explicit error surfacing, SeenEntry timestamps, and the shared AdaptiveLimiter passed through resolveAndCheck.

Confidence Score: 5/5

Safe to merge; all functional issues flagged in previous review rounds are resolved, and the remaining observations are minor quality nits with no runtime impact.

The three hand-authored commands address their previously flagged bugs: the silent false-negative in check now surfaces v.Error in human-readable mode, NCBI rate-limit pacing is threaded through resolveAndCheck in scan, and the watch baseline no longer grows without bound thanks to SeenEntry timestamps and TTL pruning. Remaining findings do not affect correctness under normal usage.

No files require special attention; style nits are in retraction.go (defer pattern), scan.go (scanner error), and watch.go (TrackedTotal semantics).

Important Files Changed

| Filename | Overview |
| --- | --- |
| library/other/retraction-checker/internal/cli/check.go | Single-DOI/PMID check command; previous silent-false-negative bug correctly fixed by surfacing v.Error in human-readable path. |
| library/other/retraction-checker/internal/cli/scan.go | Batch-check command with goroutine fan-out; rate limiter now wired through to NCBI calls. Scanner error from parseIdentifiers is not propagated, risking silent partial results on large-line inputs. |
| library/other/retraction-checker/internal/cli/watch.go | Watch command with TTL-based pruning; TrackedTotal field semantically mismatches cumulative baseline size. Previous unbounded-growth concern addressed via SeenEntry timestamps. |
| library/other/retraction-checker/internal/cli/retraction.go | Core retraction-check logic for Crossref/NCBI/OpenAlex; defer inside retry loop in fetchOpenAlexRelated is an anti-pattern though no current leak exists. |
| library/other/retraction-checker/internal/cli/superseded.go | Superseded command looks up related OpenAlex works; --from-year flag registered and wired correctly. |
| library/other/retraction-checker/internal/cliutil/ratelimit.go | AdaptiveLimiter with nil-safe methods, backoff helpers, and Retry-After parsing; clean and well-tested. |
| library/other/retraction-checker/internal/cache/cache.go | File-based response cache with TTL; straightforward and safe. |
| library/other/retraction-checker/internal/cli/root.go | Root command and flag definitions; default --rate-limit=0 intentional (user opt-in pacing). |
| library/other/retraction-checker/go.mod | Go 1.26 module; per-iteration loop variable semantics apply. x/sys floor raised above CVE-vulnerable v0.31.0. |
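
The scan.go note about the unpropagated scanner error could be addressed along these lines. This is only a sketch: the real parseIdentifiers signature is not shown in this review, so the `io.Reader` parameter and the `parseIdentifiersString` convenience wrapper are assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// parseIdentifiers reads one identifier per line and, unlike a loop that
// only checks Scan(), reports the scanner error so an over-long line
// (bufio.ErrTooLong) does not silently truncate the list.
func parseIdentifiers(r io.Reader) ([]string, error) {
	var ids []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		ids = append(ids, line)
	}
	if err := sc.Err(); err != nil {
		// Return what was read so far together with the failure.
		return ids, fmt.Errorf("reading identifiers: %w", err)
	}
	return ids, nil
}

// parseIdentifiersString is a convenience wrapper for in-memory input.
func parseIdentifiersString(s string) ([]string, error) {
	return parseIdentifiers(strings.NewReader(s))
}

func main() {
	ids, err := parseIdentifiersString("10.1000/xyz123\n\n12345678\n")
	fmt.Println(ids, err)
}
```

With the default scanner buffer, a single line longer than 64 KiB stops the scan; checking `sc.Err()` turns that into a visible failure instead of a short result.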

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant User
    participant CLI
    participant Crossref
    participant NCBI
    participant OpenAlex

    User->>CLI: "check / scan doi|pmid"
    alt input is PMID
        CLI->>NCBI: esummary (rate-limited)
        NCBI-->>CLI: DOI
    end
    CLI->>Crossref: "GET /works/{doi}"
    Crossref-->>CLI: update-to / update-by metadata
    CLI-->>User: retraction verdict

    User->>CLI: superseded doi
    CLI->>Crossref: "GET /works/{doi}"
    Crossref-->>CLI: title + year
    CLI->>OpenAlex: search works (retry on 503)
    OpenAlex-->>CLI: related works ranked by citations
    CLI-->>User: superseded result

    User->>CLI: watch topic
    CLI->>Crossref: "GET /works?filter=retraction"
    Crossref-->>CLI: current notices
    CLI->>CLI: diff vs TTL-pruned baseline
    CLI-->>User: new retractions since last run
    CLI->>CLI: save merged baseline

Reviews (6): Last reviewed commit: "feat(watch): add TTL-based pruning with ..."

Comment thread library/other/retraction-checker/.printing-press.json
Comment thread library/other/retraction-checker/internal/cli/watch.go Outdated
Comment thread library/other/retraction-checker/internal/cli/scan.go
scan dispatches up to 6 concurrent resolveAndCheck goroutines. When an ID is a PMID, resolvePMIDToDOI previously called NCBI's API through the sharedHTTP client with no rate limiting. NCBI's public API allows only 3 requests/second without a key, so a bibliography with many PMIDs could trigger 429s under this concurrency (Greptile, PR mvanhorn#1438 review).

resolvePMIDToDOI and resolveAndCheck now take a *cliutil.AdaptiveLimiter and call Wait()/OnRateLimit() around the NCBI request. check.go builds one limiter per invocation (no concurrency); scan.go builds a single shared limiter before the goroutine loop (mutex-protected, safe across concurrent workers) so all in-flight PMID resolutions share one NCBI-facing pace regardless of concurrency.

Reuses the existing --rate-limit root flag (default 0 = disabled) - no new flag needed.

Verified: go build ./... and go vet ./... both pass.
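
The shared-limiter arrangement described in this commit can be sketched roughly as follows. The `pacedLimiter` type is a simplified stand-in written for this example, not cliutil.AdaptiveLimiter (no OnRateLimit or backoff), and `fakeResolve` replaces the real NCBI call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pacedLimiter is a hypothetical, simplified stand-in for AdaptiveLimiter:
// a mutex-protected pacer that hands out evenly spaced time slots.
type pacedLimiter struct {
	mu       sync.Mutex
	interval time.Duration
	next     time.Time
}

// newPacedLimiter returns nil when pacing is disabled (rate <= 0).
func newPacedLimiter(perSecond float64) *pacedLimiter {
	if perSecond <= 0 {
		return nil
	}
	return &pacedLimiter{interval: time.Duration(float64(time.Second) / perSecond)}
}

// Wait blocks until the caller's slot; it is nil-safe and goroutine-safe.
func (l *pacedLimiter) Wait() {
	if l == nil {
		return
	}
	l.mu.Lock()
	now := time.Now()
	slot := l.next
	if slot.Before(now) {
		slot = now
	}
	l.next = slot.Add(l.interval)
	l.mu.Unlock()
	time.Sleep(slot.Sub(now))
}

// fakeResolve stands in for the PMID-to-DOI request.
func fakeResolve(pmid string) {}

// resolveAll fans out over workers that all share one limiter.
func resolveAll(pmids []string, workers int, lim *pacedLimiter) time.Duration {
	start := time.Now()
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for pmid := range jobs {
				lim.Wait() // one shared pace, regardless of worker count
				fakeResolve(pmid)
			}
		}()
	}
	for _, id := range pmids {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return time.Since(start)
}

func main() {
	elapsed := resolveAll([]string{"1", "2", "3", "4"}, 6, newPacedLimiter(3))
	fmt.Println("elapsed:", elapsed)
}
```

Because every worker waits on the same limiter, adding workers does not raise the request rate; with a nil limiter (the --rate-limit=0 default) the calls are unpaced.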
Comment thread library/other/retraction-checker/internal/cli/check.go Outdated
laci141 added 2 commits July 4, 2026 10:08
…se-negative

checkDOI sets v.DOI = doi before the network call, so a transient error or 404 left both v.Error and v.DOI populated. The old guard (v.Error != "" && v.DOI == "") never fired in that case, and human-readable mode printed a misleading "NOT retracted" instead of surfacing the failure (Greptile, PR mvanhorn#1438 review).

Human-readable mode now returns the error whenever v.Error != "", regardless of whether a DOI was resolved. JSON/agent mode is unchanged - it still serializes the full verdict struct including the error field, so structured consumers see the failure either way.

Verified: go build ./... and go vet ./... both pass.
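
The difference between the old and new guard can be sketched as below. The `verdict` struct, `oldGuardFires`, and `renderHuman` are illustrative names assumed for this example; only the v.Error / v.DOI conditions come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// verdict is a hypothetical, trimmed-down version of the check result.
type verdict struct {
	DOI       string
	Retracted bool
	Error     string
}

// oldGuardFires reproduces the previous condition, which missed failures
// that occurred after the DOI had already been recorded.
func oldGuardFires(v verdict) bool {
	return v.Error != "" && v.DOI == ""
}

// renderHuman applies the new rule: any non-empty Error is returned,
// whether or not a DOI was resolved.
func renderHuman(v verdict) (string, error) {
	if v.Error != "" {
		return "", errors.New(v.Error)
	}
	if v.Retracted {
		return v.DOI + ": RETRACTED", nil
	}
	return v.DOI + ": NOT retracted", nil
}

func main() {
	v := verdict{DOI: "10.1000/xyz123", Error: "crossref: 404"}
	fmt.Println("old guard fires:", oldGuardFires(v))
	_, err := renderHuman(v)
	fmt.Println("new path error:", err)
}
```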
laci141 changed the title from "feat(retraction-checker): add retraction-checker-pp-cli" to "feat(retraction-checker): add retraction-checker" Jul 4, 2026
laci141 added 2 commits July 4, 2026 11:27
…eded command

Added a new flag --from-year to allow users to specify a cutoff year for related research.
Replace plain []string baseline with SeenEntry (DOI + last_seen)
and prune entries older than 365 days on every load and save.
Automatically migrate old []string format to the new structure.
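
A rough sketch of the baseline change described above, assuming the baseline is stored as a JSON array. The on-disk shape, `loadBaseline`, `loadBaselineNow`, and `prune` are assumptions for illustration; only SeenEntry, its DOI and last_seen fields, and the 365-day window come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

const baselineTTL = 365 * 24 * time.Hour

// SeenEntry pairs a DOI with the last time it appeared in a watch run.
type SeenEntry struct {
	DOI      string    `json:"doi"`
	LastSeen time.Time `json:"last_seen"`
}

// prune drops entries whose LastSeen is older than baselineTTL.
func prune(entries []SeenEntry, now time.Time) []SeenEntry {
	kept := entries[:0]
	for _, e := range entries {
		if now.Sub(e.LastSeen) <= baselineTTL {
			kept = append(kept, e)
		}
	}
	return kept
}

// loadBaseline decodes the new format, falls back to the legacy []string
// format (stamping each DOI with now), then prunes expired entries.
func loadBaseline(data []byte, now time.Time) ([]SeenEntry, error) {
	var entries []SeenEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		var legacy []string
		if lerr := json.Unmarshal(data, &legacy); lerr != nil {
			return nil, fmt.Errorf("baseline matches neither format: %w", err)
		}
		entries = make([]SeenEntry, 0, len(legacy))
		for _, doi := range legacy {
			entries = append(entries, SeenEntry{DOI: doi, LastSeen: now})
		}
	}
	return prune(entries, now), nil
}

// loadBaselineNow is a convenience wrapper using the current time.
func loadBaselineNow(data []byte) ([]SeenEntry, error) {
	return loadBaseline(data, time.Now())
}

func main() {
	entries, err := loadBaselineNow([]byte(`["10.1000/a","10.1000/b"]`))
	fmt.Println(len(entries), err)
}
```

Running the same prune step on save as well as on load keeps the file bounded even if a watch run never reads the baseline back.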