diff --git a/.gitignore b/.gitignore index b52af9407..c2d6e36dd 100644 --- a/.gitignore +++ b/.gitignore @@ -38,6 +38,7 @@ pytest-full.xml # Local agent-generated skills (keep canonical copies under docs/) .claude/ +.agents/ /skills/ CLAUDE.md AGENTS.md diff --git a/RELEASE_NOTES_0.9.10.md b/RELEASE_NOTES_0.9.10.md new file mode 100644 index 000000000..e47817b06 --- /dev/null +++ b/RELEASE_NOTES_0.9.10.md @@ -0,0 +1,127 @@ +

+ Desloppify mascot +

+ +This release adds **experimental Hermes Agent integration** for fully autonomous cleanup loops, **framework-aware detection** with a full Next.js spec, **SCSS language support**, significant **R language improvements**, and a **scan performance boost** from detector prefetch + caching — alongside a batch of bug fixes from the community. + +--- + +**152 files changed | 54 commits | 5,466 tests passing** + +## Hermes Agent Integration (Experimental) + +We've been exploring what it looks like when a codebase health tool can actually *drive* an AI agent — not just generate reports, but orchestrate the entire cleanup loop autonomously. This release ships our first experimental integration with [Hermes Agent](https://github.com/NousResearch/hermes-agent). + +The core idea: desloppify already knows what needs to be done (scan, triage, review, fix). Instead of printing instructions for a human, it can now tell the agent directly — switch to a cheap model for mechanical fixes, switch to an expensive one for architectural review, reset context between tasks, and keep the agent working via `/autoreply`, all without a human in the loop. + +What the integration enables: + +- **Autonomous review loops** — desloppify orchestrates blind reviews via `delegate_task` subagents (up to 3 concurrent), no human needed +- **Model switching at phase boundaries** — cheap models for execution, expensive for planning/review, switched automatically +- **Context management** — automatic resets between tasks to keep the agent focused on long sessions +- **Lifecycle transitions** — desloppify tells Hermes what to do next via the Control API + +### How to try it + +**This requires the Control API branch of Hermes** ([NousResearch/hermes-agent#1508](https://github.com/NousResearch/hermes-agent/pull/1508)), which hasn't been merged upstream yet. Without it, Hermes works as a normal harness but can't do autonomous model switching or self-prompting. 
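One thing the code in this release also gates on: desloppify's own config. The transition helper only talks to Hermes when `hermes_enabled` is set, and per-phase model switching reads a `hermes_models` mapping of lifecycle phase to a `provider:model` spec. A minimal `config.json` sketch (the model names here are just the ones the `dev test-hermes` helper cycles through, not recommendations):

```json
{
  "hermes_enabled": true,
  "hermes_models": {
    "execute": "openrouter:google/gemini-2.5-flash",
    "review": "openrouter:mistralai/mistral-medium-3"
  }
}
```

Phases without an entry fall back to their coarse phase, then to the `review` entry for non-execute/scan phases.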
+ +**Step 1 — Install Hermes from the Control API branch:** + +```bash +git clone -b feat/control-api-autoreply https://github.com/peteromallet/hermes-agent.git +cd hermes-agent +pip install -e . +``` + +**Step 2 — Install desloppify and set up the Hermes skill doc:** + +```bash +pip install desloppify[full] +cd /path/to/your/project +desloppify update-skill hermes +``` + +This writes an `AGENTS.md` skill document into your project that teaches Hermes how to use desloppify. + +**Step 3 — Start Hermes with the Control API enabled, pointed at your project:** + +```bash +cd /path/to/your/project +HERMES_CONTROL_API=1 hermes +``` + +**Step 4 — Tell it to scan.** In the Hermes session, type: + +``` +Run desloppify scan, then follow its coaching output to clean up the codebase. +``` + +Desloppify will guide Hermes through the full lifecycle — scanning, triaging findings, running blind reviews with subagents, and fixing issues. It switches models and resets context automatically at phase boundaries. + +**This is experimental and we're iterating fast.** We'd love feedback on the approach, rough edges, and what you'd want to see next. If you try it, please open an issue — every report helps. + +## Framework-Aware Detection + +Massive contribution from **@MacHatter1** (PR #414). A new `FrameworkSpec` abstraction layer for framework-specific detection, shipping with a full Next.js spec that understands App Router conventions, server components, `use client`/`use server` directives, and Next.js-specific lint rules. This means dramatically fewer false positives when scanning Next.js projects — framework idioms are recognized, not flagged. The spec system is extensible, so adding support for other frameworks (Remix, SvelteKit, etc.) is now a matter of writing a spec, not changing the engine. + +## SCSS Language Plugin + +Thanks to **@klausagnoletti** for adding SCSS/Sass support via stylelint integration (PR #428).
Detects code smells, unused variables, and style issues in `.scss` and `.sass` files. @klausagnoletti has also submitted a follow-up PR (#452) with bug fixes, tests, and honest documentation — expected to land shortly after release. + +## Plugin Tests, Docs, and Ruby Improvements + +**@klausagnoletti** also contributed across multiple language plugins: + +- **Ruby plugin improvements** (PR #462) — expanded exclusions, detect markers (`Gemfile`, `Rakefile`, `.ruby-version`, `*.gemspec`), `default_src="lib"`, `spec/` + `test/` support, and 13 wiring tests. Also adds `external_test_dirs` and `test_file_extensions` params to the generic plugin framework. +- **JavaScript plugin tests + README** (PR #458) — 12 sanity tests covering ESLint integration, command construction, fixer registration, and output parsing. +- **Python plugin README** (PR #459) — user-facing documentation covering phases, requirements, and usage. + +## R Language Improvements + +**@sims1253** has been steadily building out R support and contributed four PRs to this release: + +- **Jarl linter** with autofix support (PR #425) — adds a fast R linter as an alternative to lintr +- **Shell quote escaping fix** for lintr commands (PR #424) — prevents command injection on paths with special characters +- **Tree-sitter query improvements** (PR #449) — captures anonymous functions in `lapply`/`sapply` calls and `pkg::fn` namespace imports +- **Factory Droid harness support** (PR #451) — adds Droid as a new skill target, following the existing harness pattern exactly + +## Scan Performance: Detector Prefetch + Cache + +Another big one from **@MacHatter1** (PR #432). Cold and full scan times reduced significantly. Detectors now prefetch file contents and cache results across detection phases, avoiding redundant I/O. On large codebases this is a noticeable improvement. 
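The idea behind prefetch + cache is easy to picture. A minimal sketch, assuming nothing about desloppify's internals beyond the PR description (the class and method names here are invented for illustration, not the actual detector code):

```python
from pathlib import Path


class FileContentCache:
    """Read each file once, then serve every detection phase from memory."""

    def __init__(self) -> None:
        self._contents: dict[str, str] = {}

    def prefetch(self, paths: list[str]) -> None:
        # One up-front I/O pass before any detection phase runs.
        for path in paths:
            if path not in self._contents:
                self._contents[path] = Path(path).read_text(
                    encoding="utf-8", errors="replace"
                )

    def get(self, path: str) -> str:
        # Later phases hit the cache instead of re-reading from disk.
        if path not in self._contents:
            self.prefetch([path])
        return self._contents[path]
```

Each detector asks the cache instead of the filesystem, so a file inspected by five detection phases is read from disk once rather than five times.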
+ +## Lifecycle & Triage + +- **Lifecycle transition messages** — the tool now tells agents what phase they're in and what to do next, with structured directives for each transition +- **Unified triage pipeline** with step detail display +- **Staged triage** now requires explicit decisions for auto-clusters before proceeding — no more accidentally skipping triage steps + +## Bug Fixes + +- **Binding-aware unused import detection for JS/TS** — @MacHatter1 (PR #433). No longer flags imports used via destructuring, `as` renames, or re-export patterns. This was a significant source of false positives in real JS/TS projects. +- **Rust dep graph hangs** — @fluffypony (PR #429). String literals that look like import paths (e.g., `"path/to/thing"`) no longer cause the dependency graph builder to hang. @fluffypony also contributed Rust inline-test filtering (PR #440), which prevents `#[cfg(test)]` diagnostic noise from inflating production debt scores. +- **Project root detection** (PR #439) — fixed cases where the project root was derived incorrectly, plus force-rescan now properly wipes stale plan data, and manual clusters are visible in triage. +- **workflow::create-plan re-injection** — @cdunda-perchwell (PR #435). Resolved workflow items no longer reappear in the execution queue after reconciliation. @cdunda-perchwell also identified the related communicate-score cycle-boundary sentinel issue (#447, fix in PR #448). +- **PHPStan parser fixes** — @nickperkins (PR #420). stderr output and malformed JSON from PHPStan no longer crash the parser. Clean, focused fix. +- **Preserve plan_start_scores during force-rescan** — manual clusters are no longer wiped when force-rescanning. +- **Import run project root** — `--scan-after-import` now derives the project root correctly from the state file path. +- **Windows codex runner** (PR #453) — proper `cmd /c` argument quoting + UTF-8 log encoding for Windows. Reported by **@DenysAshikhin**. 
+- **Scan after queue drain** (PR #454) — `score_display_mode` now returns LIVE when queue is empty, fixing the UX contradiction where `next` says "run scan" but scan refuses. Reported by **@kgelpes**. +- **SKILL.md cleanup** (PR #455) — removes unsupported `allowed-tools` frontmatter, fixes batch naming inconsistency (`.raw.txt` not `.json`), adds pip fallback alongside uvx. Three issues all reported by **@willfrey**. +- **Batch retry coverage gate** (PR #456) — partial retries now bypass the full-coverage requirement instead of being rejected. Reported by **@imetandy**. +- **R anonymous function extraction** (PR #461) — the tree-sitter anonymous function pattern from PR #449 now actually works (extractor handles missing `@name` capture with `` fallback). + +## Community + +This release wouldn't exist without the community. Seriously — thank you all. + +**@MacHatter1** delivered three major PRs (framework-aware detection, detector prefetch + cache, binding-aware unused imports) that each individually would have been a headline feature. The framework spec system in particular opens up a whole new category of detection accuracy. + +**@fluffypony** contributed both the Rust dep graph hang fix and the inline-test filtering — the latter being 1,000+ lines of carefully tested Rust syntax parsing with conservative cfg predicate handling and thorough edge-case coverage. + +**@sims1253** has been the driving force behind R language support, with four PRs spanning linting, tree-sitter queries, and harness support. The R plugin is becoming genuinely useful thanks to this sustained effort. + +**@klausagnoletti** added SCSS support, improved the Ruby plugin, and contributed tests and documentation for JavaScript and Python plugins — six PRs total (#428, #452, #457, #458, #459, #462). The kind of contributor who makes the codebase more trustworthy across the board. + +**@cdunda-perchwell** fixed two separate workflow re-injection bugs that were causing phantom plan items.
**@nickperkins** shipped a clean PHPStan parser fix. + +Bug reporters **@willfrey**, **@DenysAshikhin**, **@kgelpes**, and **@imetandy** filed detailed, actionable issues that made fixes straightforward. Every one of those reports saved debugging time. diff --git a/assets/scorecard.png b/assets/scorecard.png index a5eb76c86..9537c8cac 100644 Binary files a/assets/scorecard.png and b/assets/scorecard.png differ diff --git a/desloppify/app/cli_support/parser.py b/desloppify/app/cli_support/parser.py index d558f586c..840bf56b3 100644 --- a/desloppify/app/cli_support/parser.py +++ b/desloppify/app/cli_support/parser.py @@ -10,6 +10,7 @@ _add_backlog_parser, _add_config_parser, _add_detect_parser, + _add_directives_parser, _add_dev_parser, _add_exclude_parser, _add_autofix_parser, @@ -139,6 +140,7 @@ def create_parser(*, langs: list[str], detector_names: list[str]) -> argparse.Ar # configure _add_zone_parser(sub) _add_config_parser(sub) + _add_directives_parser(sub) _add_langs_parser(sub) _add_dev_parser(sub) _add_update_skill_parser(sub) diff --git a/desloppify/app/cli_support/parser_groups.py b/desloppify/app/cli_support/parser_groups.py index 90aae0ce2..3101ac4d6 100644 --- a/desloppify/app/cli_support/parser_groups.py +++ b/desloppify/app/cli_support/parser_groups.py @@ -7,6 +7,7 @@ from desloppify.app.cli_support.parser_groups_admin import ( # noqa: F401 (re-exports) _add_config_parser, _add_detect_parser, + _add_directives_parser, _add_dev_parser, _add_autofix_parser, _add_langs_parser, @@ -25,6 +26,7 @@ "_add_backlog_parser", "_add_config_parser", "_add_detect_parser", + "_add_directives_parser", "_add_dev_parser", "_add_exclude_parser", "_add_autofix_parser", diff --git a/desloppify/app/cli_support/parser_groups_admin.py b/desloppify/app/cli_support/parser_groups_admin.py index 4cf629ce8..b888a07b3 100644 --- a/desloppify/app/cli_support/parser_groups_admin.py +++ b/desloppify/app/cli_support/parser_groups_admin.py @@ -90,6 +90,17 @@ def _add_config_parser(sub) 
-> None: c_unset.add_argument("config_key", type=str, help="Config key name") +def _add_directives_parser(sub) -> None: + p = sub.add_parser("directives", help="View/set agent directives for phase transitions") + d_sub = p.add_subparsers(dest="directives_action") + d_sub.add_parser("show", help="Show all configured directives") + d_set = d_sub.add_parser("set", help="Set a directive for a lifecycle phase") + d_set.add_argument("phase", type=str, help="Lifecycle phase name") + d_set.add_argument("message", type=str, help="Message to show at this transition") + d_unset = d_sub.add_parser("unset", help="Remove a directive for a lifecycle phase") + d_unset.add_argument("phase", type=str, help="Lifecycle phase name") + + def _fixer_help_lines(langs: list[str]) -> list[str]: fixer_help_lines: list[str] = [] for lang_name in langs: @@ -168,6 +179,8 @@ def _add_dev_parser(sub) -> None: ) d_scaffold.set_defaults(wire_pyproject=True) + dev_sub.add_parser("test-hermes", help="Test Hermes model switching (switch and switch back)") + def _add_langs_parser(sub) -> None: sub.add_parser("langs", help="List all available language plugins with depth and tools") @@ -182,6 +195,6 @@ def _add_update_skill_parser(sub) -> None: "interface", nargs="?", default=None, - help="Agent interface (amp, claude, codex, cursor, copilot, windsurf, gemini, opencode). " + help="Agent interface (amp, claude, codex, cursor, copilot, windsurf, gemini, hermes, droid, opencode). " "Auto-detected on updates if omitted.", ) diff --git a/desloppify/app/commands/dev.py b/desloppify/app/commands/dev.py index 6396bf45e..96940a860 100644 --- a/desloppify/app/commands/dev.py +++ b/desloppify/app/commands/dev.py @@ -23,6 +23,9 @@ def cmd_dev(args: argparse.Namespace) -> None: except ValueError as ex: raise CommandError(str(ex)) from ex return + if action == "test-hermes": + _cmd_test_hermes() + return raise CommandError("Unknown dev action. 
Use `desloppify dev scaffold-lang`.") @@ -165,3 +168,54 @@ def _cmd_scaffold_lang(args: object) -> None: " Next: implement real phases/commands/detectors and run pytest.", "dim" ) ) + + +def _cmd_test_hermes() -> None: + """Test Hermes model switching — switch to a random model and back.""" + import random + import time + + from desloppify.app.commands.helpers.transition_messages import ( + _hermes_available, + _hermes_get, + _hermes_send_message, + ) + + if not _hermes_available(): + print(colorize('Hermes not enabled. Set "hermes_enabled": true in config.json', "yellow")) + return + + # Get current model + info = _hermes_get("/sessions/_any") + if "error" in info: + print(colorize(f"Cannot reach Hermes: {info['error']}", "red")) + return + + original_model = info.get("model", "unknown") + original_provider = info.get("provider", "unknown") + print(f" Current model: {original_provider}:{original_model}") + + # Pick a random test model + test_models = [ + ("openrouter", "google/gemini-2.5-flash"), + ("openrouter", "meta-llama/llama-4-scout"), + ("openrouter", "mistralai/mistral-medium-3"), + ] + test_provider, test_model = random.choice(test_models) + + # Switch to test model + print(f" Switching to: {test_provider}:{test_model}") + result = _hermes_send_message(f"/model {test_provider}:{test_model}", mode="queue") + if not result.get("success"): + print(colorize(f" Switch failed: {result.get('error', '?')}", "red")) + return + print(colorize(" ✓ Switch command sent", "green")) + + # Wait a moment, then switch back + time.sleep(2) + print(f" Switching back to: {original_provider}:{original_model}") + result = _hermes_send_message(f"/model {original_provider}:{original_model}", mode="queue") + if not result.get("success"): + print(colorize(f" Switch-back failed: {result.get('error', '?')}", "red")) + return + print(colorize(" ✓ Restored original model", "green")) diff --git a/desloppify/app/commands/directives.py b/desloppify/app/commands/directives.py new file mode 
100644 index 000000000..36d651ba0 --- /dev/null +++ b/desloppify/app/commands/directives.py @@ -0,0 +1,166 @@ +"""directives command: view and manage agent directives.""" + +from __future__ import annotations + +import argparse + +from desloppify.app.commands.helpers.command_runtime import command_runtime +from desloppify.base.config import save_config +from desloppify.base.exception_sets import CommandError +from desloppify.base.output.terminal import colorize +from desloppify.engine._plan.refresh_lifecycle import VALID_LIFECYCLE_PHASES + +# The five phases that actually matter as agent directive hooks. +# Each has: short description, when it fires, example use case. +_PHASES: list[tuple[str, str, str, str]] = [ + ( + "execute", + "Working the fix queue — code changes, refactors, and fixes.", + "Fires when the queue fills with work items after triage,\n" + " or when you unskip/reopen issues that push back into work mode.", + "Commit after every 3 fixes. Don't refactor beyond what the issue asks.", + ), + ( + "postflight", + "Execution done — transitioning into planning/review phases.", + "Fires when the execution queue drains and the system moves into\n" + " review, workflow, or triage. Catch-all for leaving work mode.", + "Stop and summarise what you fixed before continuing.", + ), + ( + "triage", + "Reading issues, deciding what's real, clustering into a plan.", + "Fires when workflow items complete and triage stages are injected,\n" + " or when review surfaces issues that need strategic decisions.", + "Open every flagged file before deciding. Skip nothing without reading the code.", + ), + ( + "review", + "Scoring subjective dimensions — reading code and assessing quality.", + "Fires when the system needs subjective scores (first review, stale\n" + " dimensions, or post-triage re-review). Covers all review sub-phases.", + "Use review_packet_blind.json only. 
Do not read previous scores or targets.", + ), + ( + "scan", + "Running detectors and analyzing the codebase.", + "Fires when the lifecycle resets to the scan phase after a cycle\n" + " completes or when no other phase applies.", + "Include --skip-slow if this is a mid-cycle rescan.", + ), +] + +_PHASE_NAMES = {name for name, _, _, _ in _PHASES} + +_EXAMPLE_DIRECTIVES = { + "execute": "Commit after every 3 fixes. Don't refactor beyond what the issue asks.", + "triage": "Open every flagged file before deciding. Skip nothing without reading the code.", + "review": "Use review_packet_blind.json only. Do not read previous scores or targets.", +} + + +def cmd_directives(args: argparse.Namespace) -> None: + """Handle directives subcommands: show, set, unset.""" + action = getattr(args, "directives_action", None) + if action == "set": + _directives_set(args) + elif action == "unset": + _directives_unset(args) + else: + _directives_show(args) + + +def _directives_show(args: argparse.Namespace) -> None: + """Show all phases with their directives (if configured).""" + config = command_runtime(args).config + messages = config.get("transition_messages", {}) + if not isinstance(messages, dict): + messages = {} + + active = { + phase: text + for phase, text in messages.items() + if isinstance(text, str) and text.strip() + } + + print(colorize("\n Agent Directives\n", "bold")) + print( + " Messages shown to AI agents at lifecycle phase transitions.\n" + " Use them to switch models, set constraints, or give context-\n" + " specific instructions at key moments in the workflow.\n" + ) + + for name, description, when, example_use in _PHASES: + directive = active.get(name) + if directive: + marker = colorize("*", "green") + print(f" {marker} {colorize(name, 'cyan')} {description}") + print(colorize(f" When: {when}", "dim")) + print(f" Directive: {directive}") + else: + print(f" {colorize(name, 'cyan')} {description}") + print(colorize(f" When: {when}", "dim")) + print(colorize(f" 
e.g.: {example_use}", "dim")) + print() + + count = len(active) + if count: + print(colorize(f" {count} directive{'s' if count != 1 else ''} configured.\n", "green")) + else: + print(colorize(" No directives configured.\n", "dim")) + + print(colorize(" Examples:", "dim")) + for phase, text in _EXAMPLE_DIRECTIVES.items(): + print(colorize(f' desloppify directives set {phase} "{text}"', "dim")) + print() + print(colorize(" Commands:", "dim")) + print(colorize(' desloppify directives set <phase> "<message>"', "dim")) + print(colorize(" desloppify directives unset <phase>", "dim")) + print() + + +def _directives_set(args: argparse.Namespace) -> None: + """Set a directive for a lifecycle phase.""" + phase = args.phase + text = args.message + + # Accept the main phases (including postflight) plus any valid lifecycle phase. + if phase != "postflight" and phase not in VALID_LIFECYCLE_PHASES: + valid = ", ".join(sorted(_PHASE_NAMES)) + raise CommandError(f"unknown phase {phase!r}; valid phases: {valid}") + + config = command_runtime(args).config + messages = config.get("transition_messages", {}) + if not isinstance(messages, dict): + messages = {} + messages[phase] = text + config["transition_messages"] = messages + + try: + save_config(config) + except OSError as e: + raise CommandError(f"could not save config: {e}") from e + print(colorize(f" Set directive for {phase}:", "green")) + print(f" {text}") + + +def _directives_unset(args: argparse.Namespace) -> None: + """Remove a directive for a lifecycle phase.""" + phase = args.phase + + config = command_runtime(args).config + messages = config.get("transition_messages", {}) + if not isinstance(messages, dict): + messages = {} + + if phase not in messages: + raise CommandError(f"no directive set for phase {phase!r}") + + del messages[phase] + config["transition_messages"] = messages + + try: + save_config(config) + except OSError as e: + raise CommandError(f"could not save config: {e}") from e + print(colorize(f" Removed directive for {phase}", 
"green")) diff --git a/desloppify/app/commands/helpers/queue_progress.py b/desloppify/app/commands/helpers/queue_progress.py index cfed45116..fc60bd5c5 100644 --- a/desloppify/app/commands/helpers/queue_progress.py +++ b/desloppify/app/commands/helpers/queue_progress.py @@ -106,6 +106,8 @@ def score_display_mode( return ScoreDisplayMode.LIVE if breakdown is None: return ScoreDisplayMode.LIVE + if breakdown.queue_total == 0: + return ScoreDisplayMode.LIVE # Queue fully drained — always live (#441) if breakdown.lifecycle_phase == LIFECYCLE_PHASE_SCAN: return ScoreDisplayMode.LIVE if breakdown.lifecycle_phase == LIFECYCLE_PHASE_EXECUTE: diff --git a/desloppify/app/commands/helpers/transition_messages.py b/desloppify/app/commands/helpers/transition_messages.py new file mode 100644 index 000000000..24c840422 --- /dev/null +++ b/desloppify/app/commands/helpers/transition_messages.py @@ -0,0 +1,202 @@ +"""Emit user-configured messages at lifecycle phase transitions.""" + +from __future__ import annotations + +import json as _json +import logging +import os as _os +import urllib.error as _urlerr +import urllib.request as _urlreq + +from desloppify.base.config import load_config +from desloppify.base.output.user_message import print_user_message +from desloppify.engine._plan.refresh_lifecycle import ( + COARSE_PHASE_MAP, + LIFECYCLE_PHASE_EXECUTE, + LIFECYCLE_PHASE_SCAN, +) + +logger = logging.getLogger(__name__) + +# Phases that are NOT postflight — everything else counts as postflight. 
+_NON_POSTFLIGHT = frozenset({LIFECYCLE_PHASE_EXECUTE, LIFECYCLE_PHASE_SCAN}) + +_HERMES_PORT_FILE = _os.path.expanduser("~/.hermes/control_api.port") + + +def _hermes_available() -> bool: + """Check if Hermes integration is enabled in config.""" + try: + config = load_config() + except (OSError, ValueError): + return False + return bool(config.get("hermes_enabled", False)) + + +def _hermes_port() -> int: + try: + with open(_HERMES_PORT_FILE) as f: + return int(f.read().strip()) + except (OSError, ValueError): + return 47823 + + +def _hermes_get(path: str) -> dict: + """GET a Hermes control API endpoint. Stdlib-only, no deps.""" + url = f"http://127.0.0.1:{_hermes_port()}{path}" + req = _urlreq.Request(url, method="GET", + headers={"X-Hermes-Control": "1"}) + try: + with _urlreq.urlopen(req, timeout=5) as resp: + return _json.loads(resp.read()) + except _urlerr.HTTPError as e: + return _json.loads(e.read()) + except (_urlerr.URLError, OSError) as e: + return {"error": str(e)} + + +def _hermes_send_message(text: str, mode: str = "queue") -> dict: + """Send a message/command to the running Hermes agent. Stdlib-only, no deps.""" + url = f"http://127.0.0.1:{_hermes_port()}/sessions/_any/message" + data = _json.dumps({"text": text, "mode": mode}).encode() + req = _urlreq.Request(url, data=data, method="POST", + headers={"Content-Type": "application/json", + "X-Hermes-Control": "1"}) + try: + with _urlreq.urlopen(req, timeout=5) as resp: + return _json.loads(resp.read()) + except _urlerr.HTTPError as e: + return _json.loads(e.read()) + except (_urlerr.URLError, OSError) as e: + return {"error": str(e)} + + +def _resolve_hermes_model(phase: str, hermes_models: dict) -> str | None: + """Resolve a phase to a 'provider:model' string from hermes_models config. + + Lookup: exact phase → coarse phase → 'review' (fallback for non-execute). + Returns None if no model is configured for this phase. 
+ """ + spec = hermes_models.get(phase) + if not spec: + coarse = COARSE_PHASE_MAP.get(phase) + if coarse: + spec = hermes_models.get(coarse) + if not spec and phase not in _NON_POSTFLIGHT: + spec = hermes_models.get("review") + return spec or None + + +def _switch_hermes_model(phase: str) -> bool: + """Switch the running Hermes agent's model based on the phase. + + Reads model mapping from hermes_models in config.json. + Returns True if switch was triggered. + """ + if not _hermes_available(): + return False + + try: + config = load_config() + except (OSError, ValueError): + return False + + hermes_models = config.get("hermes_models", {}) + if not hermes_models: + return False + + spec = _resolve_hermes_model(phase, hermes_models) + if not spec: + return False + + try: + result = _hermes_send_message(f"/model {spec}", mode="interrupt") + if result.get("success"): + _hermes_send_message("continue", mode="queue") + print(f"🔄 Hermes model → {spec} (phase: {phase})") + return True + else: + logger.debug("Hermes model switch failed: %s", result.get("error", "")) + return False + except Exception as exc: + logger.debug("Hermes model switch skipped: %s", exc) + return False + + +_AUTOREPLY_PROMPT = ( + "You are an autonomous code repair agent working through a desloppify queue. " + "After each task, run the next desloppify command as instructed. " + "Do not stop or ask for confirmation — keep going until the queue is empty." +) + + +def _ensure_hermes_autoreply() -> None: + """Enable autoreply on the Hermes session if not already active. + + Checks the session state via GET /sessions/_any. If autoreply is + already enabled, does nothing — so it's safe to call on every + phase transition without clobbering an existing config. 
+ """ + if not _hermes_available(): + return + try: + info = _hermes_get("/sessions/_any") + if info.get("autoreply", {}).get("enabled"): + return + _hermes_send_message( + f"/autoreply {_AUTOREPLY_PROMPT}", + mode="queue", + ) + logger.debug("Hermes autoreply enabled for desloppify session") + except Exception as exc: + logger.debug("Hermes autoreply check skipped: %s", exc) + + +def emit_transition_message(new_phase: str) -> bool: + """Print a transition message if one is configured for *new_phase*. + + Lookup order: exact phase → coarse phase → ``postflight`` (if the + phase is not execute/scan). + + Also triggers a Hermes model switch if the control API is available. + + Returns True if a message was emitted. + """ + # Ensure autoreply is enabled so the agent keeps working autonomously + _ensure_hermes_autoreply() + + # Switch Hermes model for this phase (best-effort, non-blocking) + _switch_hermes_model(new_phase) + + try: + config = load_config() + except (OSError, ValueError) as exc: + logger.debug("transition message skipped (config load): %s", exc) + return False + + messages = config.get("transition_messages") + if not isinstance(messages, dict) or not messages: + return False + + # Try exact phase first, then coarse fallback, then postflight. + text = messages.get(new_phase) + if text is None: + coarse = COARSE_PHASE_MAP.get(new_phase) + if coarse and coarse != new_phase: + text = messages.get(coarse) + if text is None and new_phase not in _NON_POSTFLIGHT: + text = messages.get("postflight") + + if not isinstance(text, str) or not text.strip(): + return False + + clean = text.strip() + print(f"\n{'─' * 60}") + print(f"TRANSITION INSTRUCTION — entering {new_phase} phase") + print(clean) + print(f"{'─' * 60}") + print_user_message(f"Hey, did you see the above? 
Please act on this: {clean}") + return True + + +__all__ = ["emit_transition_message"] diff --git a/desloppify/app/commands/next/render.py b/desloppify/app/commands/next/render.py index 1124c521b..4765bf6d1 100644 --- a/desloppify/app/commands/next/render.py +++ b/desloppify/app/commands/next/render.py @@ -27,6 +27,7 @@ from .render_scoring import render_score_impact as _render_score_impact_impl from .render_workflow import render_workflow_action as _render_workflow_action_impl from .render_workflow import render_workflow_stage as _render_workflow_stage_impl +from .render_workflow import step_full as _step_full_impl from .render_workflow import step_text as _step_text_impl @@ -34,6 +35,10 @@ def _step_text(step: str | dict) -> str: return _step_text_impl(step) +def _step_full(step: str | dict, *, indent: str = " ") -> list[str]: + return _step_full_impl(step, indent=indent) + + def _render_workflow_stage(item: dict) -> None: _render_workflow_stage_impl( item, @@ -108,11 +113,31 @@ def _render_plan_cluster_detail( desc_str = f' — "{cluster_desc}"' if cluster_desc else "" print(colorize(f" Cluster: {cluster_name}{desc_str} ({total} items)", "dim")) steps = plan_cluster.get("action_steps") or [] - if not (steps and single_item and not header_showed_plan): + if not steps: return - print(colorize("\n Steps:", "dim")) - for idx, step in enumerate(steps, 1): - print(colorize(f" {idx}. 
{_step_text(step)}", "dim")) + + # Find steps relevant to this item (via issue_refs) + item_id = item.get("id", "") + item_hash = item_id.rsplit("::", 1)[-1] if item_id else "" + relevant = [ + (idx, step) for idx, step in enumerate(steps, 1) + if isinstance(step, dict) and ( + item_id in step.get("issue_refs", []) + or any(item_hash and ref.endswith(item_hash) for ref in step.get("issue_refs", [])) + ) + ] + + if relevant: + # Show full detail for relevant steps — this is the execution view + print(colorize("\n Your step(s):", "bold")) + for idx, step in relevant: + for line in _step_full(step, indent=" "): + print(colorize(line, "dim")) + elif single_item and not header_showed_plan: + # No matching steps — show the full plan as context + print(colorize("\n Steps:", "dim")) + for idx, step in enumerate(steps, 1): + print(colorize(f" {idx}. {_step_text(step)}", "dim")) def _render_issue_metadata(item: dict, detail: dict) -> None: @@ -297,12 +322,15 @@ def _render_cluster_drill_header( steps = cluster_data.get("action_steps") or [] if steps: print(colorize(" │", "cyan")) - print(colorize(" │ Action plan:", "cyan")) + print(colorize(" │ Steps:", "cyan")) for idx, step in enumerate(steps, 1): - print(colorize(f" │ {idx}. {_step_text(step)}", "cyan")) + done = isinstance(step, dict) and step.get("done", False) + marker = "[x]" if done else "[ ]" + print(colorize(f" │ {idx}. 
{marker} {_step_text(step)}", "cyan")) print(colorize(" └" + "─" * 60 + "┘", "cyan")) print(colorize(" Back to full queue: desloppify next", "dim")) if steps: + print(colorize(f" Step detail: desloppify plan cluster show {cluster_name}", "dim")) print(colorize(f" Mark step done: desloppify plan cluster update {cluster_name} --done-step N", "dim")) return bool(steps) diff --git a/desloppify/app/commands/next/render_workflow.py b/desloppify/app/commands/next/render_workflow.py index 37b5aef9a..9fd146fc3 100644 --- a/desloppify/app/commands/next/render_workflow.py +++ b/desloppify/app/commands/next/render_workflow.py @@ -10,10 +10,30 @@ def step_text(step: str | dict) -> str: if isinstance(step, dict): - return step.get("title", str(step)) + title = step.get("title", str(step)) + effort = step.get("effort", "") + return f"{title} [{effort}]" if effort else title return str(step) +def step_full(step: str | dict, *, indent: str = " ") -> list[str]: + """Return the full step rendering: title + effort + detail + refs.""" + import textwrap + + if isinstance(step, str): + return [f"{indent}{step}"] + lines: list[str] = [f"{indent}{step_text(step)}"] + detail = step.get("detail", "") + if detail: + for line in textwrap.wrap(detail, width=90): + lines.append(f"{indent} {line}") + refs = step.get("issue_refs", []) + if refs: + short_refs = [r.rsplit("::", 1)[-1] for r in refs] + lines.append(f"{indent} Refs: {', '.join(short_refs)}") + return lines + + def _detail_mapping(item: dict) -> dict: detail = item.get("detail", {}) return detail if isinstance(detail, dict) else {} @@ -120,4 +140,4 @@ def render_workflow_action(item: dict, *, colorize_fn) -> None: print(colorize_fn(f"\n Action: {item.get('primary_command', '')}", "cyan")) -__all__ = ["render_workflow_action", "render_workflow_stage", "step_text"] +__all__ = ["render_workflow_action", "render_workflow_stage", "step_full", "step_text"] diff --git a/desloppify/app/commands/plan/cluster/ops_display.py 
b/desloppify/app/commands/plan/cluster/ops_display.py index 7e08573aa..1a3f932e3 100644 --- a/desloppify/app/commands/plan/cluster/ops_display.py +++ b/desloppify/app/commands/plan/cluster/ops_display.py @@ -76,12 +76,38 @@ def _print_cluster_steps(steps: list[dict] | list[str]) -> None: print_step(i, step, colorize_fn=colorize) -def _print_cluster_members(args: argparse.Namespace, issue_ids: list[str]) -> None: +def _short_member_id(fid: str) -> str: + """Shorten an issue ID to its last segment for compact display.""" + return fid.rsplit("::", 1)[-1] + + +def _print_cluster_members(args: argparse.Namespace, issue_ids: list[str], *, has_steps: bool) -> None: print() if not issue_ids: print(colorize(" Members: (none)", "dim")) return + # When steps exist, members are audit trail — show compact list only. + # When no steps, members ARE the work — show full detail. + if has_steps: + short_ids = [_short_member_id(fid) for fid in issue_ids] + label = f" Members ({len(issue_ids)}): " + # Wrap IDs to fit ~100 char lines + lines: list[str] = [] + current = label + for i, sid in enumerate(short_ids): + sep = ", " if i > 0 else "" + if len(current) + len(sep) + len(sid) > 100 and current != label: + lines.append(current) + current = " " + sid + else: + current += sep + sid + lines.append(current) + for line in lines: + print(colorize(line, "dim")) + print(colorize(" Full detail: desloppify show <id> --no-budget", "dim")) + return + issues = _load_issues_best_effort(args) print(colorize(f" Members ({len(issue_ids)}):", "dim")) for idx, fid in enumerate(issue_ids, 1): @@ -106,7 +132,7 @@ def _cmd_cluster_show(args: argparse.Namespace) -> None: steps = cluster.get("action_steps") or [] issue_ids = cluster_issue_ids(cluster) _print_cluster_steps(steps) - _print_cluster_members(args, issue_ids) + _print_cluster_members(args, issue_ids, has_steps=bool(steps)) _print_cluster_commands(cluster_name) @@ -135,22 +161,37 @@ def _print_cluster_list_verbose( active: str | None, ) -> None: 
"""Print the verbose table view of the cluster list.""" - name_width = _cluster_list_name_width(sorted_clusters) - total = len(sorted_clusters) - has_dep = any(c.get("dependency_order") is not None for _, c in sorted_clusters) - print(colorize(f" Clusters ({total} total, sorted by priority/queue position):", "bold")) + # Filter out empty auto-clusters — they're noise + visible = [ + (name, cluster) for name, cluster in sorted_clusters + if len(cluster_issue_ids(cluster)) > 0 or not cluster.get("auto") + ] + if not visible: + print(" No clusters with items.") + return + + empty_auto = len(sorted_clusters) - len(visible) + name_width = max(20, min(35, max(len(name) for name, _ in visible))) + total_items = sum(len(cluster_issue_ids(c)) for _, c in visible) + total_steps = sum(len(c.get("action_steps") or []) for _, c in visible) + print(colorize( + f" {len(visible)} clusters ({total_items} issues, {total_steps} steps):", + "bold", + )) + if empty_auto: + print(colorize(f" ({empty_auto} empty auto-clusters hidden)", "dim")) print() - header, sep = _cluster_list_verbose_header(name_width, has_dep) + header = f" {'Name':<{name_width}} {'Issues':>6} {'Steps':>5} {'Effort':<10}" + sep = f" {'─'*name_width} {'─'*6} {'─'*5} {'─'*10}" print(colorize(header, "dim")) print(colorize(sep, "dim")) - for name, cluster in sorted_clusters: + for name, cluster in visible: print( _cluster_list_verbose_row( name, cluster, min_pos_cache[name], name_width=name_width, - has_dep=has_dep, active=active, ) ) @@ -165,12 +206,12 @@ def _cluster_list_verbose_header(name_width: int, has_dep: bool) -> tuple[str, s dep_header = f" {'Dep':>3}" if has_dep else "" header = ( f" {'#pos':<5} {'Pri':>3}{dep_header} {'Name':<{name_width}}" - f" {'Items':>5} {'Steps':>5} {'Type':<6} Description" + f" {'Items':>5} {'Steps':>5} {'Effort':<14} {'Type':<6} Description" ) dep_sep = f" {'─'*3}" if has_dep else "" sep = ( f" {'─'*4} {'─'*3}{dep_sep} {'─'*name_width}" - f" {'─'*5} {'─'*5} {'─'*6} {'─'*40}" + f" 
{'─'*5} {'─'*5} {'─'*14} {'─'*6} {'─'*40}" ) return header, sep @@ -189,34 +230,45 @@ def _cluster_dependency_token(cluster: dict, *, has_dep: bool) -> str: return f" {dep_token:>3}" +def _effort_summary(steps: list[dict]) -> str: + """Summarize step effort tags into a compact string like '3T 1S'.""" + if not steps: + return "—" + from collections import Counter + counts: Counter[str] = Counter() + for s in steps: + if isinstance(s, dict): + effort = s.get("effort", "") + if effort: + counts[effort] += 1 + if not counts: + return "—" + # Order: trivial < small < medium < large + order = {"trivial": 0, "small": 1, "medium": 2, "large": 3} + abbrev = {"trivial": "T", "small": "S", "medium": "M", "large": "L"} + parts = [] + for effort, _ in sorted(counts.items(), key=lambda kv: order.get(kv[0], 9)): + parts.append(f"{counts[effort]}{abbrev.get(effort, effort[0].upper())}") + return " ".join(parts) + + def _cluster_list_verbose_row( name: str, cluster: dict, min_pos: int, *, name_width: int, - has_dep: bool, active: str | None, ) -> str: member_count = len(cluster_issue_ids(cluster)) - desc = _cluster_list_description( - cluster.get("description") or "", - min_pos=min_pos, - member_count=member_count, - ) - pos_str = f"#{min_pos}" if min_pos < 999_999 else "—" - priority = cluster.get("priority") - pri_str = str(priority) if priority is not None else "—" - dep_str = _cluster_dependency_token(cluster, has_dep=has_dep) steps = cluster.get("action_steps") or [] steps_str = str(len(steps)) if steps else "—" - type_str = "auto" if cluster.get("auto") else "manual" - desc_truncated = (desc[:39] + "…") if len(desc) > 40 else desc + effort_str = _effort_summary(steps) name_display = (name[: name_width - 1] + "…") if len(name) > name_width else name focused = " *" if name == active else "" return ( - f" {pos_str:>5} {pri_str:>3}{dep_str} {name_display:<{name_width}}" - f" {member_count:>5} {steps_str:>5} {type_str:<6} {desc_truncated}{focused}" + f" 
{name_display:<{name_width}}" + f" {member_count:>6} {steps_str:>5} {effort_str:<10}{focused}" ) diff --git a/desloppify/app/commands/plan/cluster/steps.py b/desloppify/app/commands/plan/cluster/steps.py index 7152658aa..50c3cce93 100644 --- a/desloppify/app/commands/plan/cluster/steps.py +++ b/desloppify/app/commands/plan/cluster/steps.py @@ -2,23 +2,43 @@ from __future__ import annotations +import textwrap + +_DETAIL_WIDTH = 90 +_DETAIL_MAX_LINES = 4 + + +def _truncate_detail(detail: str) -> list[str]: + """Wrap and truncate detail to a readable block.""" + # Wrap long single-line details, then cap total lines + wrapped = textwrap.wrap(detail, width=_DETAIL_WIDTH) + if not wrapped: + return [] + if len(wrapped) <= _DETAIL_MAX_LINES: + return wrapped + return wrapped[:_DETAIL_MAX_LINES] + ["..."] + + +def _short_refs(refs: list[str]) -> list[str]: + """Shorten issue refs to their last segment for display.""" + return [r.rsplit("::", 1)[-1] for r in refs] + def print_step(i: int, step: dict, *, colorize_fn) -> None: - """Print a single step with title, detail, refs, and done status.""" + """Print a single step with title, effort, detail, and refs.""" done = step.get("done", False) marker = "[x]" if done else "[ ]" title = step.get("title", "") - print(f" {i}. {marker} {title}") - if done: - print(colorize_fn(" (completed)", "dim")) - return + effort = step.get("effort", "") + effort_tag = f" [{effort}]" if effort else "" + print(f" {i}. 
{marker} {title}{effort_tag}") detail = step.get("detail", "") if detail: - for line in detail.splitlines(): + for line in _truncate_detail(detail): print(colorize_fn(f" {line}", "dim")) refs = step.get("issue_refs", []) if refs: - print(colorize_fn(f" Refs: {', '.join(refs)}", "dim")) + print(colorize_fn(f" Refs: {', '.join(_short_refs(refs))}", "dim")) __all__ = ["print_step"] diff --git a/desloppify/app/commands/plan/override/misc.py b/desloppify/app/commands/plan/override/misc.py index b5025566b..ce274d01d 100644 --- a/desloppify/app/commands/plan/override/misc.py +++ b/desloppify/app/commands/plan/override/misc.py @@ -25,7 +25,11 @@ describe_issue, set_focus, ) -from desloppify.engine._plan.refresh_lifecycle import clear_postflight_scan_completion +from desloppify.app.commands.helpers.transition_messages import emit_transition_message +from desloppify.engine._plan.refresh_lifecycle import ( + LIFECYCLE_PHASE_EXECUTE, + clear_postflight_scan_completion, +) from desloppify.engine._state.resolution import resolve_issues from desloppify.state_io import load_state @@ -113,7 +117,9 @@ def cmd_plan_reopen(args: argparse.Namespace) -> None: count += 1 append_log_entry(plan, "reopen", issue_ids=reopened, actor="user") - clear_postflight_scan_completion(plan, issue_ids=reopened, state=state_data) + transition_phase: str | None = None + if clear_postflight_scan_completion(plan, issue_ids=reopened, state=state_data): + transition_phase = LIFECYCLE_PHASE_EXECUTE save_plan_state_transactional( plan=plan, plan_path=plan_file, @@ -124,6 +130,8 @@ def cmd_plan_reopen(args: argparse.Namespace) -> None: print(colorize(f" Reopened {len(reopened)} issue(s).", "green")) if count: print(colorize(" Plan updated: items moved back to queue.", "dim")) + if transition_phase: + emit_transition_message(transition_phase) def cmd_plan_focus(args: argparse.Namespace) -> None: diff --git a/desloppify/app/commands/plan/override/resolve_workflow.py 
b/desloppify/app/commands/plan/override/resolve_workflow.py index 5363ee598..8854f2fc1 100644 --- a/desloppify/app/commands/plan/override/resolve_workflow.py +++ b/desloppify/app/commands/plan/override/resolve_workflow.py @@ -12,6 +12,7 @@ ensure_active_triage_issue_ids, has_open_review_issues, ) +from desloppify.app.commands.helpers.transition_messages import emit_transition_message from desloppify.base.config import target_strict_score_from_config from .resolve_helpers import blocked_triage_stages from desloppify.app.commands.plan.triage.stage_queue import ( @@ -350,15 +351,19 @@ def _reconcile_if_queue_drained( if WORKFLOW_CREATE_PLAN_ID in synthetic_ids and has_open_review_issues(state_data): ensure_active_triage_issue_ids(plan, state_data) inject_triage_stages(plan) - set_lifecycle_phase(plan, LIFECYCLE_PHASE_TRIAGE_POSTFLIGHT) + changed = set_lifecycle_phase(plan, LIFECYCLE_PHASE_TRIAGE_POSTFLIGHT) save_plan(plan) + if changed: + emit_transition_message(LIFECYCLE_PHASE_TRIAGE_POSTFLIGHT) return - reconcile_plan( + result = reconcile_plan( plan, state_data, target_strict=target_strict_score_from_config(state_data.get("config")), ) save_plan(plan) + if result.lifecycle_phase_changed: + emit_transition_message(result.lifecycle_phase) def resolve_workflow_patterns( diff --git a/desloppify/app/commands/plan/override/skip.py b/desloppify/app/commands/plan/override/skip.py index 51f294f83..64a40c8d7 100644 --- a/desloppify/app/commands/plan/override/skip.py +++ b/desloppify/app/commands/plan/override/skip.py @@ -22,7 +22,11 @@ from desloppify.base.exception_sets import CommandError from desloppify.base.output.terminal import colorize from desloppify.base.output.user_message import print_user_message -from desloppify.engine._plan.refresh_lifecycle import clear_postflight_scan_completion +from desloppify.app.commands.helpers.transition_messages import emit_transition_message +from desloppify.engine._plan.refresh_lifecycle import ( + LIFECYCLE_PHASE_EXECUTE, + 
clear_postflight_scan_completion, +) from desloppify.engine.plan_ops import ( SKIP_KIND_LABELS, append_log_entry, @@ -245,7 +249,9 @@ def cmd_plan_skip(args: argparse.Namespace) -> None: note=note, detail={"kind": kind, "reason": reason}, ) - clear_postflight_scan_completion(plan, issue_ids=issue_ids, state=state) + transition_phase: str | None = None + if clear_postflight_scan_completion(plan, issue_ids=issue_ids, state=state): + transition_phase = LIFECYCLE_PHASE_EXECUTE _save_skip_plan_state( plan=plan, plan_file=plan_file, @@ -271,6 +277,8 @@ def cmd_plan_skip(args: argparse.Namespace) -> None: " plan --help` to see all available plan tools. Otherwise" " no need to reply, just keep going." ) + if transition_phase: + emit_transition_message(transition_phase) def cmd_plan_unskip(args: argparse.Namespace) -> None: @@ -304,7 +312,9 @@ def cmd_plan_unskip(args: argparse.Namespace) -> None: actor="user", detail={"need_reopen": need_reopen}, ) - clear_postflight_scan_completion(plan, issue_ids=unskipped_ids, state=state) + transition_phase: str | None = None + if clear_postflight_scan_completion(plan, issue_ids=unskipped_ids, state=state): + transition_phase = LIFECYCLE_PHASE_EXECUTE reopened: list[str] = [] if need_reopen: @@ -330,6 +340,8 @@ def cmd_plan_unskip(args: argparse.Namespace) -> None: "yellow", ) ) + if transition_phase: + emit_transition_message(transition_phase) def cmd_plan_backlog(args: argparse.Namespace) -> None: @@ -374,7 +386,9 @@ def cmd_plan_backlog(args: argparse.Namespace) -> None: issue_ids=removed, actor="user", ) - clear_postflight_scan_completion(plan, issue_ids=removed, state=state_data) + transition_phase: str | None = None + if clear_postflight_scan_completion(plan, issue_ids=removed, state=state_data): + transition_phase = LIFECYCLE_PHASE_EXECUTE if reopen_ids: save_plan_state_transactional( @@ -389,6 +403,8 @@ def cmd_plan_backlog(args: argparse.Namespace) -> None: print(colorize(f" Moved {len(removed)} item(s) to backlog.", "green")) 
if reopen_ids: print(colorize(f" Reopened {len(reopen_ids)} deferred/triaged-out issue(s) in state.", "dim")) + if transition_phase: + emit_transition_message(transition_phase) __all__ = [ diff --git a/desloppify/app/commands/plan/triage/observe_batches.py b/desloppify/app/commands/plan/triage/observe_batches.py index 8bec4e5b0..edb2e9bc6 100644 --- a/desloppify/app/commands/plan/triage/observe_batches.py +++ b/desloppify/app/commands/plan/triage/observe_batches.py @@ -3,6 +3,7 @@ from __future__ import annotations from collections import defaultdict +from dataclasses import dataclass from desloppify.engine._state.schema import Issue from desloppify.engine.plan_triage import TriageInput @@ -56,7 +57,60 @@ def group_issues_into_observe_batches( return result +@dataclass +class AutoClusterSample: + """A sampled auto-cluster for observe-stage verification.""" + + cluster_name: str + total_count: int + sample_ids: list[str] + sample_issues: dict[str, Issue] + + +def sample_auto_clusters( + si: TriageInput, + sample_size: int = 5, +) -> list[AutoClusterSample]: + """Sample representative issues from each auto-cluster for verification. + + For each auto-cluster, pick up to *sample_size* issues (biased toward + higher severity) so the observe stage can spot-check false-positive rates. 
+ """ + auto_clusters = getattr(si, "auto_clusters", {}) + backlog = getattr( + si, "objective_backlog_issues", + getattr(si, "mechanical_issues", {}), + ) + samples: list[AutoClusterSample] = [] + for name, cluster in sorted(auto_clusters.items()): + issue_ids = cluster.get("issue_ids", []) + if not isinstance(issue_ids, list): + continue + member_ids = [iid for iid in issue_ids if isinstance(iid, str) and iid in backlog] + if not member_ids: + continue + + # Sort by severity (high first) for representative sampling + def _severity_key(iid: str) -> int: + issue = backlog.get(iid, {}) + detail = issue.get("detail") or {} + sev = str(detail.get("severity", "medium")).lower() if isinstance(detail, dict) else "medium" + return {"high": 0, "medium": 1, "low": 2}.get(sev, 1) + + member_ids.sort(key=_severity_key) + selected = member_ids[:sample_size] + samples.append(AutoClusterSample( + cluster_name=name, + total_count=len(member_ids), + sample_ids=selected, + sample_issues={iid: backlog[iid] for iid in selected}, + )) + return samples + + __all__ = [ + "AutoClusterSample", "group_issues_into_observe_batches", "observe_dimension_breakdown", + "sample_auto_clusters", ] diff --git a/desloppify/app/commands/plan/triage/runner/stage_prompts_instruction_blocks.py b/desloppify/app/commands/plan/triage/runner/stage_prompts_instruction_blocks.py index 0abc9479a..2548589ea 100644 --- a/desloppify/app/commands/plan/triage/runner/stage_prompts_instruction_blocks.py +++ b/desloppify/app/commands/plan/triage/runner/stage_prompts_instruction_blocks.py @@ -75,11 +75,28 @@ def _observe_instructions(mode: PromptMode = "self_record") -> str: {observe_example_report_quality()} +### Auto-Cluster Sampling + +For each auto-cluster provided, sample-check 3-5 items and render a **cluster-level verdict**. +Auto-cluster members do NOT need individual per-issue assessments — the cluster verdict covers them. 
+ +Use this template for each auto-cluster: +``` +- cluster: auto/security-B602 + verdict: mostly-false-positives + sample_count: 5 + false_positive_rate: 0.8 + recommendation: skip +``` + +Verdict options: `actionable`, `mostly-false-positives`, `mixed`, `low-value` +Recommendation options: `promote`, `skip`, `break_up` + **Validation checks (all blocking):** -- Every entry must have a recognized `verdict` keyword -- Every entry must have non-empty `verdict_reasoning` -- Every entry must have non-empty `files_read` list -- Every entry must have non-empty `recommendation` +- Every per-issue entry must have a recognized `verdict` keyword +- Every per-issue entry must have non-empty `verdict_reasoning` +- Every per-issue entry must have non-empty `files_read` list +- Every per-issue entry must have non-empty `recommendation` - Template fields left empty or with placeholder text @@ -134,15 +151,18 @@ def _reflect_instructions(mode: PromptMode = "self_record") -> str: 6. **Check recurring patterns** — compare current issues against resolved history. If the same dimension keeps producing issues, that's a root cause that needs addressing, not just another round of fixes. -7. **Consider mechanical backlog** — the backlog section shows auto-clusters - (pre-grouped detector findings) and unclustered items. For each auto-cluster: - - **promote**: name it in a `## Backlog Promotions` section. Prefer clusters with - `[autofix: ...]` hints because they are lower-risk. - - **leave**: say nothing. Silence means it stays in backlog. - - **supersede**: absorb the underlying work into a review cluster when the same files - or root cause already belong together. +7. **Decide on auto-clusters** — auto-clusters are first-class triage candidates, not + an afterthought. The observe stage includes cluster-level verdicts with false-positive + rates from sampling. Use these verdicts to make informed decisions: + - **promote**: add to the active queue. 
Prefer clusters with `[autofix: ...]` hints + (lower risk) and low false-positive rates from observe sampling. + - **skip**: explicitly skip with a reason citing the observe sampling results + (e.g., "80% false positive rate per observe sampling", "low value"). + - **supersede**: absorb into a review cluster when the same files or root cause overlap. + You MUST make an explicit decision for every auto-cluster. Include a `## Backlog Decisions` + section listing each auto-cluster with: promote, skip (with reason), or supersede. For unclustered items: promote individually or group related ones into a manual cluster. - Mechanical items are NOT part of the Coverage Ledger — that ledger remains review-issues only. + The Coverage Ledger remains review-issues only — auto-clusters are covered by Backlog Decisions. 8. **Account for every issue exactly once** — every open issue hash must appear in exactly one cluster line or one skip line. Do not drop hashes, and do not repeat a hash in multiple clusters or in both a cluster and a skip. @@ -159,8 +179,10 @@ def _reflect_instructions(mode: PromptMode = "self_record") -> str: Cluster "media-lightbox-hooks" (all in src/domains/media-lightbox/) Cluster "task-typing" (both touch src/types/database.ts) -## Backlog Promotions -- Promote auto/unused-imports (overlaps with the files in cluster "task-typing") +## Backlog Decisions +- auto/unused-imports -> promote (overlaps with the files in cluster "task-typing") +- auto/dead-code -> skip "mostly test noise, low value" +- auto/type-assertions -> supersede "absorbed into cluster task-typing" ## Skip Decisions Skip "false-positive-current-code" (false positive per observe) @@ -215,9 +237,10 @@ def _organize_instructions(mode: PromptMode = "self_record") -> str: 3. Create clusters as specified in the blueprint: `desloppify plan cluster create --description "..."` 4. Add issues: `desloppify plan cluster add ` -5. 
Promote any mechanical backlog items that reflect explicitly selected: - - Auto-clusters: `desloppify plan promote auto/<name>` - - Individual items: `desloppify plan promote <id>` +5. Execute ALL backlog decisions from the reflect stage's `## Backlog Decisions` section: + - **promote**: `desloppify plan promote auto/<name>` + - **skip**: no CLI action needed — the cluster stays in backlog, skip is documented + - **supersede**: absorb into the named review cluster (already handled by clustering above) - With placement: `desloppify plan promote before -t ` 6. Add steps that consolidate: one step per file or logical change, NOT one step per issue 7. Set `--effort` on each step individually (trivial/small/medium/large) @@ -243,9 +266,10 @@ def _organize_instructions(mode: PromptMode = "self_record") -> str: If reflect skipped additional issues (over-engineering/not-worth-it), include those skip decisions. 3. Define the clusters exactly as they should be created. 4. Assign every kept issue to a cluster. -5. Consolidate steps: one step per file or logical change, NOT one step per issue. -6. Assign an effort level to each planned step (trivial/small/medium/large). -7. Call out cross-cluster dependencies when clusters touch overlapping files. +5. Execute ALL backlog decisions from reflect's `## Backlog Decisions` section (promote/skip/supersede). +6. Consolidate steps: one step per file or logical change, NOT one step per issue. +7. Assign an effort level to each planned step (trivial/small/medium/large). +8. Call out cross-cluster dependencies when clusters touch overlapping files. 
""" tail = """\ When done, write a plain-text organize report that names the clusters, their issue membership, diff --git a/desloppify/app/commands/plan/triage/runner/stage_validation.py b/desloppify/app/commands/plan/triage/runner/stage_validation.py index d73d9309b..22533c213 100644 --- a/desloppify/app/commands/plan/triage/runner/stage_validation.py +++ b/desloppify/app/commands/plan/triage/runner/stage_validation.py @@ -31,7 +31,6 @@ from ..stages.helpers import ( active_triage_issue_scope, scoped_manual_clusters_with_issues, - triage_scoped_plan, unclustered_review_issues, unenriched_clusters, value_check_targets, diff --git a/desloppify/app/commands/plan/triage/stages/evidence_parsing.py b/desloppify/app/commands/plan/triage/stages/evidence_parsing.py index 2ed5c2335..1197da778 100644 --- a/desloppify/app/commands/plan/triage/stages/evidence_parsing.py +++ b/desloppify/app/commands/plan/triage/stages/evidence_parsing.py @@ -58,6 +58,17 @@ class ObserveEvidence: has_parseable_ids: bool = True # False if valid_ids had no hex-hash IDs +@dataclass +class ClusterVerdict: + """A cluster-level verdict from observe-stage sampling.""" + + cluster_name: str + verdict: str # "actionable", "mostly-false-positives", "mixed", "low-value" + sample_count: int = 0 + false_positive_rate: float = 0.0 + recommendation: str = "" # "promote", "skip", "break_up" + + @dataclass class DecisionLedger: """Parsed keep/tighten/skip coverage from a value-check report.""" @@ -430,6 +441,96 @@ def validate_report_has_file_paths(report: str) -> list[EvidenceFailure]: )] +# --------------------------------------------------------------------------- +# Cluster-level verdict parsing (observe stage) +# --------------------------------------------------------------------------- + +_CLUSTER_VERDICT_KEYWORDS = frozenset({ + "actionable", "mostly-false-positives", "mostly false positives", + "mixed", "low-value", "low value", +}) + +# Matches: - cluster: auto/security-B602 +_YAML_CLUSTER_RE = 
re.compile(r"^\s*-?\s*cluster\s*:\s*(\S+)", re.IGNORECASE) +_YAML_SAMPLE_COUNT_RE = re.compile(r"^\s*sample_count\s*:\s*(\d+)", re.IGNORECASE) +_YAML_FP_RATE_RE = re.compile(r"^\s*false_positive_rate\s*:\s*([\d.]+)", re.IGNORECASE) + + +def parse_cluster_verdicts(report: str) -> list[ClusterVerdict]: + """Parse cluster-level verdicts from an observe-stage report. + + Supports YAML-like format: + - cluster: auto/security-B602 + verdict: mostly-false-positives + sample_count: 5 + false_positive_rate: 0.8 + recommendation: skip + """ + verdicts: list[ClusterVerdict] = [] + current: dict | None = None + + for line in report.splitlines(): + m_cluster = _YAML_CLUSTER_RE.match(line) + if m_cluster: + if current is not None: + v = _flush_cluster_verdict(current) + if v: + verdicts.append(v) + current = {"cluster": m_cluster.group(1).strip()} + continue + + if current is None: + continue + + m_verdict = _YAML_VERDICT_RE.match(line) + if m_verdict: + current["verdict"] = m_verdict.group(1).strip() + continue + + m_sample = _YAML_SAMPLE_COUNT_RE.match(line) + if m_sample: + current["sample_count"] = int(m_sample.group(1)) + continue + + m_fp = _YAML_FP_RATE_RE.match(line) + if m_fp: + try: + current["false_positive_rate"] = float(m_fp.group(1)) + except ValueError: + pass + continue + + m_rec = _YAML_RECOMMENDATION_RE.match(line) + if m_rec: + current["recommendation"] = m_rec.group(1).strip() + continue + + if current is not None: + v = _flush_cluster_verdict(current) + if v: + verdicts.append(v) + + return verdicts + + +def _flush_cluster_verdict(current: dict) -> ClusterVerdict | None: + """Convert a collected cluster verdict dict into a ClusterVerdict.""" + cluster_name = current.get("cluster", "") + if not cluster_name: + return None + raw_verdict = current.get("verdict", "") + # Accept any verdict text (don't enforce keywords — let the LLM express itself) + if not raw_verdict: + return None + return ClusterVerdict( + cluster_name=cluster_name, + 
verdict=raw_verdict.lower().strip(), + sample_count=current.get("sample_count", 0), + false_positive_rate=current.get("false_positive_rate", 0.0), + recommendation=current.get("recommendation", ""), + ) + + _VALUE_LEDGER_RE = re.compile( r"^\s*-\s*(?P.+?)\s*->\s*(?Pkeep|tighten|skip)\s*$", re.IGNORECASE, @@ -497,12 +598,14 @@ def resolve_short_hash_to_full_id(short_hash: str, valid_ids: set[str]) -> str | __all__ = [ + "ClusterVerdict", "DecisionLedger", "EvidenceFailure", "ObserveAssessment", "ObserveEvidence", "VERDICT_KEYWORDS", "format_evidence_failures", + "parse_cluster_verdicts", "parse_value_check_decision_ledger", "parse_observe_evidence", "resolve_short_hash_to_full_id", diff --git a/desloppify/app/commands/plan/triage/stages/helpers.py b/desloppify/app/commands/plan/triage/stages/helpers.py index 880292aa7..bf1860d8f 100644 --- a/desloppify/app/commands/plan/triage/stages/helpers.py +++ b/desloppify/app/commands/plan/triage/stages/helpers.py @@ -4,7 +4,7 @@ from desloppify.base.output.terminal import colorize from desloppify.engine._plan.constants import is_synthetic_id -from desloppify.engine._state.issue_semantics import is_triage_finding +from desloppify.engine._state.issue_semantics import is_review_work_item, is_triage_finding from desloppify.engine.plan_triage import TRIAGE_IDS from ..review_coverage import ( @@ -155,10 +155,13 @@ def unclustered_review_issues(plan: dict, state: dict | None = None) -> list[str } if state is not None: + # Only count review-type issues for ledger purposes — mechanical + # defects are covered by cluster-level backlog decisions, not + # per-item ledger entries. 
review_ids = [ fid for fid, finding in (state.get("work_items") or state.get("issues", {})).items() if finding.get("status") == "open" - and is_triage_finding(finding) + and is_review_work_item(finding) ] frozen_ids = (plan.get("epic_triage_meta", {}) or {}).get("active_triage_issue_ids") if isinstance(frozen_ids, list) and frozen_ids: diff --git a/desloppify/app/commands/plan/triage/stages/observe.py b/desloppify/app/commands/plan/triage/stages/observe.py index 081089c2a..0ee922e94 100644 --- a/desloppify/app/commands/plan/triage/stages/observe.py +++ b/desloppify/app/commands/plan/triage/stages/observe.py @@ -96,6 +96,7 @@ def cmd_stage_observe( from .evidence_parsing import ( format_evidence_failures, + parse_cluster_verdicts, parse_observe_evidence, resolve_short_hash_to_full_id, validate_observe_evidence, @@ -104,6 +105,7 @@ def cmd_stage_observe( valid_ids = set(review_issues.keys()) cited = resolved_services.extract_issue_citations(report, valid_ids) evidence = parse_observe_evidence(report, valid_ids) + cluster_verdicts = parse_cluster_verdicts(report) evidence_failures = validate_observe_evidence(evidence, issue_count) blocking = [failure for failure in evidence_failures if failure.blocking] advisory = [failure for failure in evidence_failures if not failure.blocking] @@ -144,6 +146,19 @@ def cmd_stage_observe( } meta["issue_dispositions"] = dispositions + # Store cluster-level verdicts from auto-cluster sampling + if cluster_verdicts: + meta["cluster_verdicts"] = [ + { + "cluster": v.cluster_name, + "verdict": v.verdict, + "sample_count": v.sample_count, + "false_positive_rate": v.false_positive_rate, + "recommendation": v.recommendation, + } + for v in cluster_verdicts + ] + cleared = record_observe_stage( stages, report=report, diff --git a/desloppify/app/commands/plan/triage/stages/organize.py b/desloppify/app/commands/plan/triage/stages/organize.py index 5c007af8e..6f6b137e8 100644 --- a/desloppify/app/commands/plan/triage/stages/organize.py +++ 
b/desloppify/app/commands/plan/triage/stages/organize.py
@@ -21,6 +21,7 @@
     _organize_report_or_error,
     _unclustered_review_issues_or_error,
     _validate_organize_against_ledger_or_error,
+    validate_backlog_promotions_executed,
 )
 from ..validation.stage_policy import require_prerequisite
 from .records import record_organize_stage
@@ -135,6 +136,12 @@ def _validate_organize_submission(
         plan=plan,
         stages=stages,
     ):
         return None
+
+    # Warn (non-blocking) when reflect requested backlog promotions that weren't executed
+    backlog_warnings = validate_backlog_promotions_executed(plan=plan, stages=stages)
+    for warning in backlog_warnings:
+        print(colorize(f"  Warning: {warning}", "yellow"))
+
     if not _enforce_cluster_activity_for_organize(
         plan=plan,
         stages=stages,
diff --git a/desloppify/app/commands/plan/triage/stages/reflect.py b/desloppify/app/commands/plan/triage/stages/reflect.py
index 52f512bac..99db03752 100644
--- a/desloppify/app/commands/plan/triage/stages/reflect.py
+++ b/desloppify/app/commands/plan/triage/stages/reflect.py
@@ -11,8 +11,11 @@
 from ..stage_queue import cascade_clear_dispositions, cascade_clear_later_confirmations, has_triage_in_queue
 from ..services import TriageServices, default_triage_services
 from ..validation.reflect_accounting import (
+    BacklogDecision,
     ReflectDisposition,
+    parse_backlog_decisions,
     parse_reflect_dispositions,
+    validate_backlog_decisions,
     validate_reflect_accounting,
 )
 from ..validation.stage_policy import auto_confirm_observe_if_attested
@@ -55,7 +58,7 @@ def _validate_reflect_submission(
     stages: dict,
     attestation: str | None,
     services: TriageServices,
-) -> tuple[object, int, dict, list[str], set[str], list[str], list[str], list[ReflectDisposition]] | None:
+) -> tuple[object, int, dict, list[str], set[str], list[str], list[str], list[ReflectDisposition], list[BacklogDecision]] | None:
     if "observe" not in stages:
         print(colorize("  Cannot reflect: observe stage not complete.", "red"))
         print(colorize('  Run: desloppify plan triage --stage observe --report "..."', "dim"))
@@ -127,6 +130,18 @@ def _validate_reflect_submission(
     # Parse structured disposition ledger from Coverage Ledger section
     disposition_ledger = parse_reflect_dispositions(report, valid_ids)
 
+    # Validate backlog decisions for auto-clusters (warn, don't block)
+    auto_clusters = getattr(triage_input, "auto_clusters", None) or {}
+    auto_cluster_names = sorted(auto_clusters.keys())
+    _, backlog_warnings = validate_backlog_decisions(
+        report=report,
+        auto_cluster_names=auto_cluster_names,
+    )
+    for warning in backlog_warnings:
+        print(colorize(f"  Warning: {warning}", "yellow"))
+
+    backlog_decisions = parse_backlog_decisions(report)
+
     return (
         triage_input,
         issue_count,
@@ -136,6 +151,7 @@ def _validate_reflect_submission(
         missing_ids,
         duplicate_ids,
         disposition_ledger,
+        backlog_decisions,
     )
 
 
@@ -151,6 +167,7 @@ def _persist_reflect_stage(
     duplicate_ids: list[str],
     recurring_dims: list[str],
     disposition_ledger: list[ReflectDisposition],
+    backlog_decisions: list[BacklogDecision],
     existing_stage: dict | None,
     is_reuse: bool,
     services: TriageServices,
@@ -182,6 +199,9 @@ def _persist_reflect_stage(
             entry["target"] = d.target
             entry["decision_source"] = "reflect"
 
+    if backlog_decisions:
+        reflect_stage["backlog_decisions"] = [d.to_dict() for d in backlog_decisions]
+
     stages["reflect"] = reflect_stage
     if is_reuse and existing_stage and existing_stage.get("confirmed_at"):
         reflect_stage["confirmed_at"] = existing_stage["confirmed_at"]
@@ -239,6 +259,7 @@ def _cmd_stage_reflect(
     (
         triage_input, issue_count, recurring, recurring_dims,
         cited_ids, missing_ids, duplicate_ids, disposition_ledger,
+        backlog_decisions,
     ) = submission
     reflect_stage, cleared = _persist_reflect_stage(
         plan=plan,
@@ -251,6 +272,7 @@ def _cmd_stage_reflect(
         duplicate_ids=duplicate_ids,
         recurring_dims=recurring_dims,
         disposition_ledger=disposition_ledger,
+        backlog_decisions=backlog_decisions,
         existing_stage=existing_stage,
         is_reuse=is_reuse,
         services=resolved_services,
diff --git a/desloppify/app/commands/plan/triage/validation/organize_policy.py b/desloppify/app/commands/plan/triage/validation/organize_policy.py
index cf5b4f89b..d46f33d73 100644
--- a/desloppify/app/commands/plan/triage/validation/organize_policy.py
+++ b/desloppify/app/commands/plan/triage/validation/organize_policy.py
@@ -9,7 +9,7 @@
 from ..review_coverage import cluster_issue_ids, manual_clusters_with_issues
 from ..stages.helpers import unclustered_review_issues, unenriched_clusters
 
-from .reflect_accounting import ReflectDisposition
+from .reflect_accounting import BacklogDecision, ReflectDisposition
 
 
 @dataclass(frozen=True)
@@ -275,6 +275,43 @@ def _validate_organize_against_ledger_or_error(
     return False
 
 
+def validate_backlog_promotions_executed(
+    *,
+    plan: dict,
+    stages: dict,
+) -> list[str]:
+    """Warn when reflect requested backlog promotions that organize didn't execute.
+
+    Returns a list of warning strings (non-blocking). Empty means all good.
+    """
+    reflect_data = stages.get("reflect", {})
+    raw_decisions = reflect_data.get("backlog_decisions", [])
+    if not raw_decisions:
+        return []
+
+    decisions = [BacklogDecision.from_dict(d) for d in raw_decisions]
+    promote_decisions = [d for d in decisions if d.decision == "promote"]
+    if not promote_decisions:
+        return []
+
+    # Check which promoted clusters actually got promoted (are in queue_order
+    # or have execution_status set to active)
+    clusters = plan.get("clusters", {})
+    warnings: list[str] = []
+    for decision in promote_decisions:
+        cluster = clusters.get(decision.cluster_name)
+        if cluster is None:
+            continue
+        # A promoted cluster should have been activated
+        execution_status = cluster.get("execution_status", "")
+        if execution_status not in ("active", "in_progress"):
+            warnings.append(
+                f"Reflect requested promoting {decision.cluster_name} "
+                f"but it was not promoted during organize."
+            )
+    return warnings
+
+
 __all__ = [
     "ActualDisposition",
     "LedgerMismatch",
@@ -283,6 +320,7 @@ def _validate_organize_against_ledger_or_error(
     "_organize_report_or_error",
     "_unclustered_review_issues_or_error",
     "_validate_organize_against_ledger_or_error",
+    "validate_backlog_promotions_executed",
     "validate_organize_against_dispositions",
     "validate_organize_against_reflect_ledger",
 ]
diff --git a/desloppify/app/commands/plan/triage/validation/reflect_accounting.py b/desloppify/app/commands/plan/triage/validation/reflect_accounting.py
index 6aa031509..b9b562047 100644
--- a/desloppify/app/commands/plan/triage/validation/reflect_accounting.py
+++ b/desloppify/app/commands/plan/triage/validation/reflect_accounting.py
@@ -366,9 +366,121 @@ def validate_reflect_accounting(
     return False, cited, missing, duplicates
 
 
+BacklogDecisionKind = Literal["promote", "skip", "supersede"]
+
+
+@dataclass(frozen=True)
+class BacklogDecision:
+    """One auto-cluster's intended disposition as declared by the reflect stage."""
+
+    cluster_name: str
+    decision: BacklogDecisionKind
+    reason: str = ""
+
+    def to_dict(self) -> dict:
+        """Serialize for JSON persistence."""
+        d: dict = {"cluster_name": self.cluster_name, "decision": self.decision}
+        if self.reason:
+            d["reason"] = self.reason
+        return d
+
+    @classmethod
+    def from_dict(cls, data: dict | BacklogDecision) -> BacklogDecision:
+        """Deserialize from persisted plan data, or pass through unchanged."""
+        if isinstance(data, cls):
+            return data
+        return cls(
+            cluster_name=data.get("cluster_name", ""),
+            decision=data.get("decision", "skip"),  # type: ignore[arg-type]
+            reason=data.get("reason", ""),
+        )
+
+
+_BACKLOG_DECISION_RE = re.compile(
+    r"-\s*(\S+)\s*->\s*(promote|skip|supersede)\b\s*(.*)",
+    re.IGNORECASE,
+)
+
+
+def _iter_backlog_decisions_lines(report: str) -> tuple[bool, list[str]]:
+    """Extract lines from the ## Backlog Decisions section of a reflect report."""
+    found_section = False
+    in_section = False
+    lines: list[str] = []
+    for raw_line in report.splitlines():
+        line = raw_line.strip()
+        if re.fullmatch(r"##\s+Backlog Decisions", line, re.IGNORECASE):
+            found_section = True
+            in_section = True
+            continue
+        if in_section and re.match(r"##\s+", line):
+            break
+        if in_section:
+            lines.append(line)
+    return found_section, lines
+
+
+def parse_backlog_decisions(report: str) -> list[BacklogDecision]:
+    """Parse structured backlog decisions from the ## Backlog Decisions section."""
+    _, lines = _iter_backlog_decisions_lines(report)
+    decisions: list[BacklogDecision] = []
+    for line in lines:
+        match = _BACKLOG_DECISION_RE.match(line)
+        if not match:
+            continue
+        cluster_name = match.group(1).strip().strip("`")
+        decision_raw = match.group(2).strip().lower()
+        reason = match.group(3).strip().strip('"\'')
+        if decision_raw in ("promote", "skip", "supersede"):
+            decisions.append(BacklogDecision(
+                cluster_name=cluster_name,
+                decision=decision_raw,  # type: ignore[arg-type]
+                reason=reason,
+            ))
+    return decisions
+
+
+def validate_backlog_decisions(
+    *,
+    report: str,
+    auto_cluster_names: list[str],
+) -> tuple[bool, list[str]]:
+    """Require every auto-cluster to have an explicit backlog decision.
+
+    Returns ``(ok, messages)`` — ``ok=False`` (blocking) when auto-clusters
+    are missing decisions. Every auto-cluster must be accounted for.
+    """
+    if not auto_cluster_names:
+        return True, []
+
+    found_section, _ = _iter_backlog_decisions_lines(report)
+    if not found_section:
+        return False, [
+            f"Reflect report has {len(auto_cluster_names)} auto-cluster(s) "
+            "but no `## Backlog Decisions` section. Every auto-cluster must have an "
+            "explicit decision: promote, skip (with reason), or supersede."
+        ]
+
+    decisions = parse_backlog_decisions(report)
+    decided_names = {d.cluster_name for d in decisions}
+    missing = [name for name in auto_cluster_names if name not in decided_names]
+    if missing:
+        missing_str = ", ".join(missing[:10])
+        suffix = f" (and {len(missing) - 10} more)" if len(missing) > 10 else ""
+        return False, [
+            f"Backlog Decisions section is missing decisions for {len(missing)} "
+            f"auto-cluster(s): {missing_str}{suffix}"
+        ]
+
+    return True, []
+
+
 __all__ = [
+    "BacklogDecision",
     "ReflectDisposition",
     "analyze_reflect_issue_accounting",
+    "parse_backlog_decisions",
     "parse_reflect_dispositions",
+    "validate_backlog_decisions",
     "validate_reflect_accounting",
 ]
diff --git a/desloppify/app/commands/registry.py b/desloppify/app/commands/registry.py
index 861d7133d..e8fb46413 100644
--- a/desloppify/app/commands/registry.py
+++ b/desloppify/app/commands/registry.py
@@ -15,6 +15,7 @@ def _build_handlers() -> dict[str, CommandHandler]:
     from desloppify.app.commands.backlog import cmd_backlog
     from desloppify.app.commands.config import cmd_config
     from desloppify.app.commands.detect import cmd_detect
+    from desloppify.app.commands.directives import cmd_directives
     from desloppify.app.commands.dev import cmd_dev
     from desloppify.app.commands.exclude import cmd_exclude
     from desloppify.app.commands.langs import cmd_langs
@@ -47,6 +48,7 @@ def _build_handlers() -> dict[str, CommandHandler]:
         "zone": cmd_zone,
         "review": cmd_review,
         "config": cmd_config,
+        "directives": cmd_directives,
         "dev": cmd_dev,
         "langs": cmd_langs,
         "update-skill": cmd_update_skill,
diff --git a/desloppify/app/commands/resolve/living_plan.py b/desloppify/app/commands/resolve/living_plan.py
index 17044eee9..fcf301e27 100644
--- a/desloppify/app/commands/resolve/living_plan.py
+++ b/desloppify/app/commands/resolve/living_plan.py
@@ -7,6 +7,7 @@
 from pathlib import Path
 from typing import NamedTuple
 
+from desloppify.app.commands.helpers.transition_messages import emit_transition_message
 from desloppify.base.config import target_strict_score_from_config
 from desloppify.base.exception_sets import PLAN_LOAD_EXCEPTIONS
 from desloppify.base.output.terminal import colorize
@@ -17,7 +18,10 @@
     auto_complete_steps,
     purge_ids,
 )
-from desloppify.engine._plan.refresh_lifecycle import clear_postflight_scan_completion
+from desloppify.engine._plan.refresh_lifecycle import (
+    LIFECYCLE_PHASE_EXECUTE,
+    clear_postflight_scan_completion,
+)
 from desloppify.engine.plan_state import (
     add_uncommitted_issues,
     has_living_plan,
@@ -36,12 +40,19 @@ class ClusterContext(NamedTuple):
     cluster_remaining: int
 
 
-def _reconcile_if_queue_drained(plan: dict, state: dict | None) -> None:
-    """Advance the living plan when a resolve drains the explicit live queue."""
+def _reconcile_if_queue_drained(plan: dict, state: dict | None) -> str | None:
+    """Advance the living plan when a resolve drains the explicit live queue.
+
+    Returns the new lifecycle phase if a transition occurred, so the caller
+    can emit the directive after all other output.
+ """ if state is None or not live_planned_queue_empty(plan): - return + return None target_strict = target_strict_score_from_config(state.get("config")) - reconcile_plan(plan, state, target_strict=target_strict) + result = reconcile_plan(plan, state, target_strict=target_strict) + if result is not None and result.lifecycle_phase_changed: + return result.lifecycle_phase + return None def capture_cluster_context(plan: dict, resolved_ids: list[str]) -> ClusterContext: @@ -102,15 +113,27 @@ def update_living_plan_after_resolve( cluster_name=ctx.cluster_name, actor="user", ) + # Clear focus when cluster is done + if plan.get("active_cluster") == ctx.cluster_name: + plan["active_cluster"] = None + elif ctx.cluster_name and ctx.cluster_remaining > 0: + # Auto-focus on the cluster while there's still work in it + plan["active_cluster"] = ctx.cluster_name if args.status == "fixed": add_uncommitted_issues(plan, all_resolved) elif args.status == "open": purge_uncommitted_ids(plan, all_resolved) - clear_postflight_scan_completion(plan, issue_ids=all_resolved, state=state) - _reconcile_if_queue_drained(plan, state) + transition_phase: str | None = None + if clear_postflight_scan_completion(plan, issue_ids=all_resolved, state=state): + transition_phase = LIFECYCLE_PHASE_EXECUTE + reconcile_phase = _reconcile_if_queue_drained(plan, state) + if reconcile_phase: + transition_phase = reconcile_phase save_plan(plan, plan_path) if purged: print(colorize(f" Plan updated: {purged} item(s) removed from queue.", "dim")) + if transition_phase: + emit_transition_message(transition_phase) except PLAN_LOAD_EXCEPTIONS as exc: _logger.debug("plan update failed after resolve", exc_info=True) warn_plan_load_degraded_once( diff --git a/desloppify/app/commands/resolve/messages.py b/desloppify/app/commands/resolve/messages.py index f9326bc24..a984dcddc 100644 --- a/desloppify/app/commands/resolve/messages.py +++ b/desloppify/app/commands/resolve/messages.py @@ -3,12 +3,68 @@ from __future__ import 
annotations import argparse +import logging from desloppify.base.output.terminal import colorize from desloppify.base.output.user_message import print_user_message from .living_plan import ClusterContext +logger = logging.getLogger(__name__) + +_NEXT_TASK_INSTRUCTIONS = ( + "A desloppify task was just marked complete. Here's what to do next:\n" + "\n" + "1. Run `desloppify next` to see the next task in the queue\n" + "2. Read and understand the issue — explore the relevant files and scope\n" + "3. Execute the fix thoroughly and verify it works\n" + "4. Once you're happy with it, commit and push:\n" + " `git add -A && git commit -m '' && git push`\n" + "5. Record the commit: `desloppify plan commit-log record`\n" + "6. Mark it resolved: `desloppify resolve --fixed --attest ''`" +) + + +def _hermes_reset_and_instruct( + *, + cluster_name: str | None = None, + cluster_remaining: int = 0, +) -> None: + """Reset Hermes context and inject next-task instructions via control API.""" + from desloppify.app.commands.helpers.transition_messages import ( + _hermes_available, + _hermes_send_message, + ) + + if not _hermes_available(): + return + try: + # Reset conversation to clear stale context from the previous task + result = _hermes_send_message("/reset", mode="interrupt") + if not result.get("success"): + return + + # Build context-aware instructions + if cluster_name and cluster_remaining > 0: + instructions = ( + f"A desloppify task was just marked complete. You're working through " + f"cluster '{cluster_name}' — {cluster_remaining} item(s) remaining.\n" + f"\n" + f"1. Run `desloppify next` to see the next task (focus is on '{cluster_name}')\n" + f"2. Read the step detail shown under 'Your step(s):' — it has exact file paths and line numbers\n" + f"3. Execute the fix and verify it works\n" + f"4. Commit: `git add -A && git commit -m '' && git push`\n" + f"5. Record: `desloppify plan commit-log record`\n" + f"6. 
Resolve: `desloppify resolve --fixed --attest ''`\n" + f"\nKeep going until the cluster is finished." + ) + else: + instructions = _NEXT_TASK_INSTRUCTIONS + + _hermes_send_message(instructions, mode="queue") + except Exception as exc: + logger.debug("Hermes next-task injection skipped: %s", exc) + def print_no_match_warning(args: argparse.Namespace) -> None: status_label = "resolved" if args.status == "open" else "open" @@ -49,5 +105,11 @@ def print_fixed_next_user_message( " to commit and push. Otherwise just keep going." ) + # Also inject via Hermes control API for a clean context switch + _hermes_reset_and_instruct( + cluster_name=cluster_ctx.cluster_name if mid_cluster else None, + cluster_remaining=cluster_ctx.cluster_remaining if mid_cluster else 0, + ) + __all__ = ["print_fixed_next_user_message", "print_no_match_warning"] diff --git a/desloppify/app/commands/review/batch/execution_phases.py b/desloppify/app/commands/review/batch/execution_phases.py index b38fe6ff0..99f25a819 100644 --- a/desloppify/app/commands/review/batch/execution_phases.py +++ b/desloppify/app/commands/review/batch/execution_phases.py @@ -553,6 +553,12 @@ def execute_batch_run(*, prepared: PreparedBatchRunContext, deps: BatchRunDeps) ) +def _is_partial_batch_retry(prepared: PreparedBatchRunContext) -> bool: + """Return True when the current run targets a subset of the packet's batches.""" + all_indexes = set(range(len(prepared.batches))) + return set(prepared.selected_indexes) != all_indexes + + def merge_and_import_batch_run( *, prepared: PreparedBatchRunContext, @@ -578,10 +584,19 @@ def merge_and_import_batch_run( safe_write_text_fn=deps.safe_write_text_fn, colorize_fn=deps.colorize_fn, ) + + # When retrying a subset of batches (--only-batches), the merged output + # only contains the retried dimensions. Skip the coverage gate so the + # partial result can be imported — the original run already covered the + # remaining dimensions. 
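The gate relaxation above hinges on a single set comparison. A standalone sketch of that check, with plain `num_batches`/`selected_indexes` arguments standing in for the real `PreparedBatchRunContext` fields (assumed names, not the project's API):

```python
# Minimal sketch of the partial-retry detection behind _is_partial_batch_retry:
# a run is "partial" when the selected batch indexes don't cover every batch
# in the packet, so the coverage gate must tolerate missing dimensions.

def is_partial_batch_retry(num_batches: int, selected_indexes: list[int]) -> bool:
    """True when the run targets only a subset of the packet's batches."""
    return set(selected_indexes) != set(range(num_batches))

print(is_partial_batch_retry(3, [0, 1, 2]))  # full run → False, gate stays strict
print(is_partial_batch_retry(3, [1]))        # subset retry → True, partial import allowed
```

Using sets makes the check order-insensitive: retrying batches `[2, 0, 1]` of a three-batch packet still counts as a full run.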
+    allow_partial = prepared.allow_partial
+    if _is_partial_batch_retry(prepared):
+        allow_partial = True
+
     enforce_import_coverage(
         missing_after_import=missing_after_import,
         packet_dimensions=prepared.packet_dimensions,
-        allow_partial=prepared.allow_partial,
+        allow_partial=allow_partial,
         scan_path=prepared.scan_path,
         colorize_fn=deps.colorize_fn,
     )
@@ -606,6 +621,7 @@
     "PreparedPacketScope",
     "PreparedBatchRunContext",
     "PreparedRunArtifacts",
+    "_is_partial_batch_retry",
     "_prepare_packet_scope",
     "_prepare_run_runtime",
     "_print_runtime_expectation",
diff --git a/desloppify/app/commands/review/batch/orchestrator.py b/desloppify/app/commands/review/batch/orchestrator.py
index 0d6ea09ca..1902a7005 100644
--- a/desloppify/app/commands/review/batch/orchestrator.py
+++ b/desloppify/app/commands/review/batch/orchestrator.py
@@ -582,7 +582,7 @@ def do_import_run(
         lang_name=lang_name,
         scan_path=scan_path,
         deps=FollowupScanDeps(
-            project_root=_runtime_project_root(),
+            project_root=Path(state_file).parent.parent,
             timeout_seconds=FOLLOWUP_SCAN_TIMEOUT_SECONDS,
             python_executable=sys.executable,
             subprocess_run=subprocess.run,
diff --git a/desloppify/app/commands/review/coordinator.py b/desloppify/app/commands/review/coordinator.py
index 5d75580f0..f7c4bed28 100644
--- a/desloppify/app/commands/review/coordinator.py
+++ b/desloppify/app/commands/review/coordinator.py
@@ -66,19 +66,22 @@ def git_baseline(
     if head_proc.returncode != 0:
         return None, None
     head = head_proc.stdout.strip() or None
-    status_proc = subprocess_run(
-        [
-            "git",
-            "-C",
-            str(project_root),
-            "status",
-            "--porcelain",
-            "--untracked-files=normal",
-        ],
-        capture_output=True,
-        text=True,
-        check=False,
-    )
+    try:
+        status_proc = subprocess_run(
+            [
+                "git",
+                "-C",
+                str(project_root),
+                "status",
+                "--porcelain",
+                "--untracked-files=normal",
+            ],
+            capture_output=True,
+            text=True,
+            check=False,
+        )
+    except OSError:
+        return None, None
     status_raw = status_proc.stdout if status_proc.returncode == 0 else ""
     status_hash = _stable_json_sha256(status_raw)
     return head, status_hash
diff --git a/desloppify/app/commands/review/importing/plan_sync.py b/desloppify/app/commands/review/importing/plan_sync.py
index d54ef012b..bf71e3fb6 100644
--- a/desloppify/app/commands/review/importing/plan_sync.py
+++ b/desloppify/app/commands/review/importing/plan_sync.py
@@ -6,6 +6,7 @@
 from pathlib import Path
 
 from desloppify.app.commands.helpers.issue_id_display import short_issue_id
+from desloppify.app.commands.helpers.transition_messages import emit_transition_message
 from desloppify.app.commands.review.importing.flags import imported_assessment_keys
 from desloppify.base.config import target_strict_score_from_config
 from desloppify.base.exception_sets import PLAN_LOAD_EXCEPTIONS
@@ -27,7 +28,10 @@
     reconcile_plan,
 )
 from desloppify.engine._plan.sync.workflow_gates import sync_import_scores_needed
-from desloppify.engine._plan.sync.workflow import clear_score_communicated_sentinel
+from desloppify.engine._plan.sync.workflow import (
+    clear_create_plan_sentinel,
+    clear_score_communicated_sentinel,
+)
 from desloppify.engine._plan.refresh_lifecycle import mark_subjective_review_completed
 from desloppify.engine.plan_triage import (
     TRIAGE_CMD_RUN_STAGES_CLAUDE,
@@ -69,6 +73,7 @@ class _ImportPlanTransition:
     covered_pruned: list[str]
     import_scores_result: object
     reconcile_result: ReconcileResult
+    transition_phase: str | None = None
 
 
 def _print_review_import_sync(
@@ -245,6 +250,7 @@ def _apply_import_plan_transitions(
     )
     if trusted:
         clear_score_communicated_sentinel(plan)
+        clear_create_plan_sentinel(plan)
     if sync_inputs.covered_ids:
         mark_subjective_review_completed(
             plan,
@@ -270,11 +276,17 @@ def _apply_import_plan_transitions(
         triage_deferred=import_result.triage_deferred,
     )
 
+    transition_phase = (
+        reconcile_result.lifecycle_phase
+        if reconcile_result.lifecycle_phase_changed
+        else None
+    )
     return _ImportPlanTransition(
         import_result=import_result,
         covered_pruned=covered_pruned,
         import_scores_result=import_scores_result,
         reconcile_result=reconcile_result,
+        transition_phase=transition_phase,
     )
 
 
@@ -445,6 +457,8 @@ def sync_plan_after_import(
             outcome=outcome,
         )
         _print_workflow_injected_message(result.workflow_injected_ids)
+        if transition.transition_phase:
+            emit_transition_message(transition.transition_phase)
         return outcome
     except PLAN_LOAD_EXCEPTIONS as exc:
         message = f"skipped plan sync after review import ({exc})"
diff --git a/desloppify/app/commands/review/runner_process_impl/io.py b/desloppify/app/commands/review/runner_process_impl/io.py
index 28b918b78..9a4b03f97 100644
--- a/desloppify/app/commands/review/runner_process_impl/io.py
+++ b/desloppify/app/commands/review/runner_process_impl/io.py
@@ -36,7 +36,7 @@ def _output_file_has_json_payload(output_file: Path) -> bool:
     if not output_file.exists():
         return False
     try:
-        payload = json.loads(output_file.read_text())
+        payload = json.loads(output_file.read_text(encoding="utf-8", errors="replace"))
     except (OSError, json.JSONDecodeError):
         return False
     return isinstance(payload, dict)
@@ -52,7 +52,7 @@ def extract_payload_from_log(
     if not log_path.exists():
         return None
     try:
-        log_text = log_path.read_text()
+        log_text = log_path.read_text(encoding="utf-8", errors="replace")
     except OSError:
         return None
diff --git a/desloppify/app/commands/runner/codex_batch.py b/desloppify/app/commands/runner/codex_batch.py
index b7bad99c8..0cbc70cea 100644
--- a/desloppify/app/commands/runner/codex_batch.py
+++ b/desloppify/app/commands/runner/codex_batch.py
@@ -4,6 +4,7 @@
 import os
 import shutil
+import subprocess
 import sys
 from pathlib import Path
@@ -32,23 +33,55 @@ def _resolve_executable(name: str) -> list[str]:
     When ``shutil.which()`` cannot locate the executable on Windows, we
     still route through ``cmd /c`` so the shell's own PATH resolution can
     find ``.cmd``/``.bat`` wrappers that Python's ``which`` missed.
+
+    Returns the command prefix tokens. On Windows, this will be
+    ``["cmd", "/c", executable]``; the caller should pass the final
+    assembled command through :func:`_wrap_cmd_c` to collapse everything
+    after ``/c`` into a single properly-quoted string.
     """
     resolved = shutil.which(name)
     if sys.platform == "win32":
+        target = resolved or name
         if resolved is not None and resolved.lower().endswith((".cmd", ".bat")):
-            return ["cmd", "/c", resolved]
+            return ["cmd", "/c", target]
         # shutil.which may miss .cmd/.bat wrappers — let cmd.exe resolve it
-        return ["cmd", "/c", resolved or name]
+        return ["cmd", "/c", target]
     return [resolved or name]
 
 
+def _wrap_cmd_c(cmd: list[str]) -> list[str]:
+    """Collapse a ``cmd /c`` list into proper form.
+
+    ``cmd /c`` concatenates everything after ``/c`` into a single string and
+    re-parses it with its own tokeniser. When arguments contain spaces
+    (e.g. repo paths like ``core_project - Copy``), passing them as separate
+    list elements causes ``subprocess.list2cmdline()`` to quote them
+    individually, but ``cmd``'s re-parsing can still split on spaces in
+    certain edge cases.
+
+    The reliable approach is to build the real command string ourselves with
+    ``subprocess.list2cmdline()`` and pass that as a **single** token after
+    ``/c``::
+
+        ["cmd", "/c", "codex exec -C \"path with spaces\" ..."]
+
+    ``list2cmdline`` on the outer list then leaves the inner string untouched
+    (it contains no special characters that need additional quoting), and
+    ``cmd /c`` receives exactly the string we intended.
+ """ + if len(cmd) >= 3 and cmd[0].lower() == "cmd" and cmd[1].lower() == "/c": + inner = subprocess.list2cmdline(cmd[2:]) + return ["cmd", "/c", inner] + return cmd + + def codex_batch_command(*, prompt: str, repo_root: Path, output_file: Path) -> list[str]: """Build one codex exec command line for a batch prompt.""" effort = os.environ.get("DESLOPPIFY_CODEX_REASONING_EFFORT", "low").strip().lower() if effort not in {"low", "medium", "high", "xhigh"}: effort = "low" prefix = _resolve_executable("codex") - return [ + cmd = [ *prefix, "exec", "--ephemeral", @@ -64,6 +97,7 @@ def codex_batch_command(*, prompt: str, repo_root: Path, output_file: Path) -> l str(output_file), prompt, ] + return _wrap_cmd_c(cmd) def run_codex_batch( diff --git a/desloppify/app/commands/scan/helpers.py b/desloppify/app/commands/scan/helpers.py index 5469bce51..efc588aa2 100644 --- a/desloppify/app/commands/scan/helpers.py +++ b/desloppify/app/commands/scan/helpers.py @@ -77,16 +77,29 @@ def audit_excluded_dirs( return stale_issues -def collect_codebase_metrics(lang, path: Path) -> dict | None: +def collect_codebase_metrics( + lang, + path: Path, + *, + files: list[str] | None = None, +) -> dict | None: """Collect LOC/file/directory counts for the configured language.""" - if not lang or not lang.file_finder: + if not lang: return None - files = lang.file_finder(path) + if files is None and not lang.file_finder: + return None + scan_root = Path(path) + files = _resolve_scan_files(lang, scan_root, files=files) total_loc = 0 dirs = set() for filepath in files: try: - total_loc += count_lines(Path(filepath)) + abs_path = _resolve_scan_file_path(filepath, project_root=scan_root) + content = read_file_text(abs_path) + if content is not None: + total_loc += len(content.splitlines()) + else: + total_loc += count_lines(Path(abs_path)) dirs.add(str(Path(filepath).parent)) except (OSError, UnicodeDecodeError) as exc: logger.debug( @@ -101,6 +114,21 @@ def collect_codebase_metrics(lang, path: Path) 
-> dict | None: } +def _resolve_scan_files(lang, path: Path, *, files: list[str] | None = None) -> list[str]: + """Return discovered source files, preferring an explicit precomputed list.""" + if files is not None: + return files + return lang.file_finder(path) + + +def _resolve_scan_file_path(filepath: str, *, project_root: Path) -> str: + """Resolve relative scan filepaths against the active scan path.""" + file_path = Path(filepath) + if file_path.is_absolute(): + return str(file_path) + return str((project_root / file_path).resolve()) + + def resolve_scan_profile(profile: str | None, lang) -> str: """Resolve effective scan profile from CLI and language defaults.""" if profile in {"objective", "full", "ci"}: diff --git a/desloppify/app/commands/scan/plan_reconcile.py b/desloppify/app/commands/scan/plan_reconcile.py index 95fbd1257..630194ce2 100644 --- a/desloppify/app/commands/scan/plan_reconcile.py +++ b/desloppify/app/commands/scan/plan_reconcile.py @@ -9,6 +9,7 @@ from desloppify.base.exception_sets import PLAN_LOAD_EXCEPTIONS from desloppify.base.output.fallbacks import log_best_effort_failure from desloppify.base.output.terminal import colorize +from desloppify.app.commands.helpers.transition_messages import emit_transition_message from desloppify.base.config import target_strict_score_from_config from desloppify.engine._plan.constants import ( WORKFLOW_COMMUNICATE_SCORE_ID, @@ -27,23 +28,30 @@ ) from desloppify.engine._plan.sync.dimensions import current_unscored_ids from desloppify.engine._plan.sync.context import is_mid_cycle -from desloppify.engine._plan.sync.workflow import clear_score_communicated_sentinel +from desloppify.engine._plan.sync.workflow import ( + clear_create_plan_sentinel, + clear_score_communicated_sentinel, +) from desloppify.engine.work_queue import build_deferred_disposition_item logger = logging.getLogger(__name__) def _reset_cycle_for_force_rescan(plan: dict[str, object]) -> bool: - """Clear all cycle state when --force-rescan is 
used.""" + """Clear synthetic queue items when --force-rescan is used. + + Preserves ``plan_start_scores`` so that ``is_mid_cycle()`` still + returns True — this prevents ``auto_cluster_issues()`` from running + full cluster regeneration, which would wipe manual cluster items. + """ order: list[str] = plan.get("queue_order", []) synthetic = [item for item in order if is_synthetic_id(item)] - if not synthetic and not plan.get("plan_start_scores"): + if not synthetic: return False for item in synthetic: order.remove(item) - plan["plan_start_scores"] = {} clear_score_communicated_sentinel(plan) - plan.pop("scan_count_at_plan_start", None) + clear_create_plan_sentinel(plan) meta = plan.get("epic_triage_meta", {}) if isinstance(meta, dict): meta.pop("triage_recommended", None) @@ -99,6 +107,7 @@ def _seed_plan_start_scores(plan: dict[str, object], state: state_mod.StateModel "verified": scores.verified, } clear_score_communicated_sentinel(plan) + clear_create_plan_sentinel(plan) plan["scan_count_at_plan_start"] = int(state.get("scan_count", 0) or 0) return True @@ -145,6 +154,7 @@ def _clear_plan_start_scores_if_queue_empty( state["_plan_start_scores_for_reveal"] = dict(plan["plan_start_scores"]) plan["plan_start_scores"] = {} clear_score_communicated_sentinel(plan) + clear_create_plan_sentinel(plan) return True @@ -295,13 +305,14 @@ def reconcile_plan_post_scan(runtime: Any) -> None: plan, mid_cycle=_is_mid_cycle_scan(plan, runtime.state) or force_rescan, ) + if result.lifecycle_phase_changed: + emit_transition_message(result.lifecycle_phase) dirty = result.dirty or dirty - if not force_rescan: - if _sync_plan_start_scores_and_log(plan, runtime.state): - dirty = True - if _sync_postflight_scan_completion_and_log(plan, runtime.state): - dirty = True + if not force_rescan and _sync_plan_start_scores_and_log(plan, runtime.state): + dirty = True + if _sync_postflight_scan_completion_and_log(plan, runtime.state): + dirty = True if dirty: try: diff --git 
a/desloppify/app/commands/scan/workflow.py b/desloppify/app/commands/scan/workflow.py index ceb37a6eb..e04ca411a 100644 --- a/desloppify/app/commands/scan/workflow.py +++ b/desloppify/app/commands/scan/workflow.py @@ -119,6 +119,20 @@ def _ensure_state_lang_capabilities( ) +def _state_review_cache(state: StateModel) -> dict[str, object]: + """Return language review cache payload, creating storage when missing.""" + review_cache = state.get("review_cache") + if review_cache is None: + normalized: dict[str, object] = {} + state["review_cache"] = normalized + return normalized + if isinstance(review_cache, dict): + return review_cache + raise ScanStateContractError( + "state.review_cache must be an object when present" + ) + + def _state_issues(state: StateModel) -> dict[str, dict[str, Any]]: """Return normalized issue map from state.""" issues = state.get("work_items") @@ -191,7 +205,7 @@ def _configure_lang_runtime( runtime_lang = make_lang_run( lang, overrides=LangRunOverrides( - review_cache=state.get("review_cache", {}), + review_cache=_state_review_cache(state), review_max_age_days=config.get("review_max_age_days", 30), subjective_assessments=_state_subjective_assessments(state), runtime_settings=lang_settings, @@ -308,13 +322,16 @@ def prepare_scan_runtime(args: argparse.Namespace) -> ScanRuntime: def _augment_with_stale_exclusion_issues( issues: list[dict[str, Any]], runtime: ScanRuntime, + *, + scanned_files: list[str] | None = None, ) -> list[dict[str, Any]]: """Append stale exclude issues when excluded dirs are unreferenced.""" extra_exclusions = get_exclusions() if not (extra_exclusions and runtime.lang and runtime.lang.file_finder): return issues - scanned_files = runtime.lang.file_finder(runtime.path) + if scanned_files is None: + scanned_files = runtime.lang.file_finder(runtime.path) stale = audit_excluded_dirs( extra_exclusions, scanned_files, get_project_root() ) @@ -328,6 +345,21 @@ def _augment_with_stale_exclusion_issues( return augmented +def 
_resolve_scanned_files(runtime: ScanRuntime) -> list[str]: + """Resolve scan file list once for post-generation lifecycle steps.""" + if not runtime.lang: + return [] + zone_map = getattr(runtime.lang, "zone_map", None) + if zone_map is not None and hasattr(zone_map, "all_files"): + files = zone_map.all_files() + if isinstance(files, list): + return files + file_finder = getattr(runtime.lang, "file_finder", None) + if not file_finder: + return [] + return file_finder(runtime.path) + + def _augment_with_stale_wontfix_issues( issues: list[dict[str, Any]], runtime: ScanRuntime, @@ -360,27 +392,35 @@ def run_scan_generation( profile=runtime.profile, ), ) + scanned_files = _resolve_scanned_files(runtime) + codebase_metrics = collect_codebase_metrics( + runtime.lang, + runtime.path, + files=scanned_files, + ) + warn_explicit_lang_with_no_files( + runtime.args, runtime.lang, runtime.path, codebase_metrics + ) + issues = _augment_with_stale_exclusion_issues( + issues, + runtime, + scanned_files=scanned_files, + ) + decay_scans = _coerce_int( + runtime.config.get("wontfix_decay_scans"), + default=_WONTFIX_DECAY_SCANS_DEFAULT, + ) + issues, monitored_wontfix = _augment_with_stale_wontfix_issues( + issues, + runtime, + decay_scans=max(decay_scans, 0), + ) + potentials["stale_wontfix"] = monitored_wontfix + return issues, potentials, codebase_metrics finally: disable_parse_cache() disable_file_cache() - codebase_metrics = collect_codebase_metrics(runtime.lang, runtime.path) - warn_explicit_lang_with_no_files( - runtime.args, runtime.lang, runtime.path, codebase_metrics - ) - issues = _augment_with_stale_exclusion_issues(issues, runtime) - decay_scans = _coerce_int( - runtime.config.get("wontfix_decay_scans"), - default=_WONTFIX_DECAY_SCANS_DEFAULT, - ) - issues, monitored_wontfix = _augment_with_stale_wontfix_issues( - issues, - runtime, - decay_scans=max(decay_scans, 0), - ) - potentials["stale_wontfix"] = monitored_wontfix - return issues, potentials, codebase_metrics - def 
merge_scan_results( runtime: ScanRuntime, diff --git a/desloppify/app/commands/status/flow.py b/desloppify/app/commands/status/flow.py index 431c03837..e61683c93 100644 --- a/desloppify/app/commands/status/flow.py +++ b/desloppify/app/commands/status/flow.py @@ -52,6 +52,12 @@ def _print_status_warnings(config: dict) -> None: + if config.get("hermes_enabled"): + print(colorize( + ' ⚕ Hermes agent mode — model switching, autoreply, task handoff active' + '\n To disable: set "hermes_enabled": false in config.json', + "cyan", + )) skill_warning = check_skill_version() if skill_warning: print(colorize(f" {skill_warning}", "yellow")) diff --git a/desloppify/base/config/schema.py b/desloppify/base/config/schema.py index 1497c412c..03e8fa004 100644 --- a/desloppify/base/config/schema.py +++ b/desloppify/base/config/schema.py @@ -100,6 +100,24 @@ class ConfigKey: False, "Allow loading user plugins from .desloppify/plugins/ (security opt-in)", ), + "transition_messages": ConfigKey( + dict, + {}, + "Messages shown to agents at lifecycle phase transitions {phase: message}", + ), + "hermes_enabled": ConfigKey( + bool, + False, + "Enable Hermes agent integration (model switching, autoreply, task handoff)", + ), + "hermes_models": ConfigKey( + dict, + { + "execute": "openrouter:x-ai/grok-4.20-beta", + "review": "openrouter:google/gemini-3.1-pro-preview", + }, + "Phase → provider:model mapping for Hermes model switching", + ), } diff --git a/desloppify/base/discovery/source.py b/desloppify/base/discovery/source.py index b0b564ee8..25e2ce934 100644 --- a/desloppify/base/discovery/source.py +++ b/desloppify/base/discovery/source.py @@ -286,6 +286,30 @@ def find_tsx_files(path: str | Path, *, runtime: RuntimeContext | None = None) - return find_source_files(path, [".tsx"], runtime=runtime) +def find_js_and_jsx_files( + path: str | Path, + *, + runtime: RuntimeContext | None = None, +) -> list[str]: + """Find JavaScript source files across common extensions.""" + exts = [".js", 
".jsx", ".mjs", ".cjs"] + if runtime is None: + return find_source_files(path, exts) + return find_source_files(path, exts, runtime=runtime) + + +def find_js_ts_and_tsx_files( + path: str | Path, + *, + runtime: RuntimeContext | None = None, +) -> list[str]: + """Find JavaScript + TypeScript source files across common extensions.""" + exts = [".js", ".jsx", ".mjs", ".cjs", ".ts", ".tsx"] + if runtime is None: + return find_source_files(path, exts) + return find_source_files(path, exts, runtime=runtime) + + def find_py_files(path: str | Path, *, runtime: RuntimeContext | None = None) -> list[str]: if runtime is None: return find_source_files(path, [".py"]) @@ -309,5 +333,7 @@ def find_py_files(path: str | Path, *, runtime: RuntimeContext | None = None) -> "find_ts_files", "find_ts_and_tsx_files", "find_tsx_files", + "find_js_and_jsx_files", + "find_js_ts_and_tsx_files", "find_py_files", ] diff --git a/desloppify/base/registry/catalog_entries.py b/desloppify/base/registry/catalog_entries.py index d2015525d..f5b6224aa 100644 --- a/desloppify/base/registry/catalog_entries.py +++ b/desloppify/base/registry/catalog_entries.py @@ -189,6 +189,28 @@ tier=3, subjective_dimensions=("design_coherence",), ), + "nextjs": DetectorMeta( + "nextjs", + "nextjs", + "Code quality", + "refactor", + "fix Next.js framework smells (RSC/client boundaries, routing, middleware, env leakage)", + needs_judgment=True, + standalone_threshold="medium", + tier=3, + marks_dims_stale=True, + subjective_dimensions=("design_coherence", "logic_clarity"), + ), + "next_lint": DetectorMeta( + "next_lint", + "next lint", + "Code quality", + "manual_fix", + "run `next lint` and fix reported ESLint issues", + tier=2, + marks_dims_stale=True, + subjective_dimensions=("convention_outlier",), + ), "dupes": DetectorMeta( "dupes", "dupes", diff --git a/desloppify/base/registry/catalog_models.py b/desloppify/base/registry/catalog_models.py index 1c3b0b767..31c273c9e 100644 --- 
a/desloppify/base/registry/catalog_models.py +++ b/desloppify/base/registry/catalog_models.py @@ -22,6 +22,8 @@ "naming", "smells", "react", + "nextjs", + "next_lint", "dupes", "stale_exclude", "dict_keys", diff --git a/desloppify/engine/_plan/cluster_strategy.py b/desloppify/engine/_plan/cluster_strategy.py index baae1ac3a..0c6c25051 100644 --- a/desloppify/engine/_plan/cluster_strategy.py +++ b/desloppify/engine/_plan/cluster_strategy.py @@ -32,8 +32,16 @@ def grouping_key(issue: dict, meta: DetectorMeta | None) -> str | None: return None if meta.action_type == "auto_fix": + detail = issue.get("detail") or {} + kind = detail.get("kind", "") + if kind: + return f"auto::{detector}::{kind}" return f"auto::{detector}" + detail = issue.get("detail") or {} + kind = detail.get("kind", "") + if kind: + return f"detector::{detector}::{kind}" return f"detector::{detector}" diff --git a/desloppify/engine/_plan/operations/cluster.py b/desloppify/engine/_plan/operations/cluster.py index 1c9fd6632..2dd22e3e0 100644 --- a/desloppify/engine/_plan/operations/cluster.py +++ b/desloppify/engine/_plan/operations/cluster.py @@ -98,16 +98,25 @@ def add_to_cluster( cluster_name: str, issue_ids: list[str], ) -> int: - """Add issue IDs to a cluster. Returns count added.""" + """Add issue IDs to a cluster. Returns count added. + + For non-auto clusters, also ensures the issue IDs appear in + ``queue_order`` so they're visible in ``desloppify next``. 
+ """ ensure_plan_defaults(plan) cluster = _cluster_or_raise(plan, cluster_name) member_ids: list[str] = cluster["issue_ids"] + queue_order: list[str] = plan.get("queue_order", []) + is_manual = not cluster.get("auto") count = 0 now = utc_now() for fid in issue_ids: if fid not in member_ids: member_ids.append(fid) count += 1 + # Ensure manual cluster members are in the queue + if is_manual and fid not in queue_order: + queue_order.append(fid) _upsert_cluster_override( plan, fid, diff --git a/desloppify/engine/_plan/policy/stale.py b/desloppify/engine/_plan/policy/stale.py index 17f12f8d0..a6dd32b74 100644 --- a/desloppify/engine/_plan/policy/stale.py +++ b/desloppify/engine/_plan/policy/stale.py @@ -5,20 +5,36 @@ import hashlib from desloppify.base.config import DEFAULT_TARGET_STRICT_SCORE -from desloppify.engine._state.issue_semantics import is_triage_finding from desloppify.engine._state.schema import StateModel from desloppify.engine._work_queue.helpers import slugify from desloppify.engine.planning.scorecard_projection import all_subjective_entries def open_review_ids(state: StateModel) -> set[str]: - """Return IDs of open review/concerns issues from state.""" + """Return IDs of open review/concerns issues from state. + + With the unified pipeline, ``is_triage_finding`` now includes mechanical + defects. For staleness/snapshot purposes we still track only review-type + issues — mechanical changes use threshold-based staleness via + ``is_triage_stale``. 
+ """ + from desloppify.engine._state.issue_semantics import is_review_work_item return { fid for fid, f in (state.get("work_items") or state.get("issues", {})).items() - if f.get("status") == "open" and is_triage_finding(f) + if f.get("status") == "open" and is_review_work_item(f) } +def open_mechanical_count(state: StateModel) -> int: + """Return the count of open mechanical defects from state.""" + from desloppify.engine._state.issue_semantics import is_objective_finding + return sum( + 1 + for f in (state.get("work_items") or state.get("issues", {})).values() + if f.get("status") == "open" and is_objective_finding(f) + ) + + def _subjective_entry_id( dimension_key: object, *, @@ -154,8 +170,18 @@ def compute_new_issue_ids(plan: dict, state: StateModel) -> set[str]: def is_triage_stale( plan: dict, state: StateModel, + *, + mechanical_growth_threshold: float = 0.10, ) -> bool: - """Return True when genuinely new review issues appeared since last triage. + """Return True when triage should be re-run. + + Stale when: + - ANY new review issues appeared since last triage, OR + - Mechanical defect count grew by more than *mechanical_growth_threshold* + (default 10%) since last triage. + + The threshold prevents trivial scan changes from forcing full re-triage + while still catching significant new mechanical findings. 
In-progress triage (confirmed stages + stage IDs in queue) is NOT considered stale — the lifecycle filter in the work queue already @@ -165,7 +191,23 @@ def is_triage_stale( triaged_ids = set(meta.get("triaged_ids", [])) active_ids = set(meta.get("active_triage_issue_ids", [])) known = triaged_ids | active_ids - return bool(open_review_ids(state) - known) + + # Any new review issue → stale + if open_review_ids(state) - known: + return True + + # Check mechanical growth threshold + last_mechanical = meta.get("last_mechanical_count", 0) + current_mechanical = open_mechanical_count(state) + if last_mechanical > 0: + growth = (current_mechanical - last_mechanical) / last_mechanical + if growth > mechanical_growth_threshold: + return True + elif current_mechanical > 0 and not known: + # First triage with mechanical issues + return True + + return False __all__ = [ @@ -174,6 +216,7 @@ def is_triage_stale( "current_under_target_ids", "current_unscored_ids", "is_triage_stale", + "open_mechanical_count", "open_review_ids", "review_issue_snapshot_hash", ] diff --git a/desloppify/engine/_plan/sync/workflow.py b/desloppify/engine/_plan/sync/workflow.py index d3894f299..480a5553a 100644 --- a/desloppify/engine/_plan/sync/workflow.py +++ b/desloppify/engine/_plan/sync/workflow.py @@ -303,6 +303,16 @@ def clear_score_communicated_sentinel(plan: PlanModel) -> None: plan.pop("previous_plan_start_scores", None) +def clear_create_plan_sentinel(plan: PlanModel) -> None: + """Clear the ``create_plan_resolved_this_cycle`` sentinel. + + Call this at the same cycle-boundary points as + ``clear_score_communicated_sentinel`` so that ``sync_create_plan_needed`` + can re-inject ``workflow::create-plan`` in the next cycle. 
+ """ + plan.pop("create_plan_resolved_this_cycle", None) + + _EMPTY = QueueSyncResult @@ -343,6 +353,7 @@ def sync_create_plan_needed( - At least one objective issue exists - ``workflow::create-plan`` is not already in the queue - No triage stages are pending + - ``workflow::create-plan`` has not already been resolved this cycle Front-loads it into the workflow prefix so it stays ahead of triage. """ @@ -351,6 +362,11 @@ def sync_create_plan_needed( if WORKFLOW_CREATE_PLAN_ID in order: return _EMPTY() + # Already resolved this cycle — sentinel is set when injected and + # cleared at cycle boundaries (force-rescan, score seeding, queue + # drain, trusted import). + if plan.get("create_plan_resolved_this_cycle"): + return _EMPTY() if any(sid in order for sid in TRIAGE_IDS): return _EMPTY() if not _subjective_review_current_for_cycle(plan, state, policy=policy): @@ -359,6 +375,7 @@ def sync_create_plan_needed( if not has_objective_backlog(state, policy): return _EMPTY() + plan["create_plan_resolved_this_cycle"] = True return _inject(plan, WORKFLOW_CREATE_PLAN_ID) @@ -503,6 +520,7 @@ def _rebaseline_plan_start_scores( __all__ = [ "PendingImportScoresMeta", "ScoreSnapshot", + "clear_create_plan_sentinel", "clear_score_communicated_sentinel", "import_scores_meta_matches", "pending_import_scores_meta", diff --git a/desloppify/engine/_plan/triage/apply.py b/desloppify/engine/_plan/triage/apply.py index 0f8177878..2b2004dbc 100644 --- a/desloppify/engine/_plan/triage/apply.py +++ b/desloppify/engine/_plan/triage/apply.py @@ -17,7 +17,7 @@ from desloppify.engine._state.schema import StateModel, ensure_state_defaults, utc_now from .dismiss import dismiss_triage_issues -from .prompt import TriageResult +from .prompt import AutoClusterDecision, TriageResult @dataclass @@ -32,6 +32,9 @@ class TriageMutationResult: strategy_summary: str = "" triage_version: int = 0 dry_run: bool = False + auto_clusters_promoted: int = 0 + auto_clusters_skipped: int = 0 + auto_clusters_broken_up: 
int = 0 @property def clusters_created(self) -> int: @@ -189,6 +192,90 @@ def _set_triage_meta( } +def _apply_auto_cluster_decisions( + *, + plan: PlanModel, + decisions: list[AutoClusterDecision], + order: list[str], + now: str, + version: int, + result: TriageMutationResult, +) -> None: + """Process auto_cluster_decisions from the triage result. + + - promote: add cluster issue IDs to queue_order + - skip: mark the cluster as skipped in the plan + - break_up: record the decision for downstream processing + """ + clusters = plan["clusters"] + + for decision in decisions: + cluster_name = decision.cluster + cluster = clusters.get(cluster_name) + if cluster is None: + continue + + action = decision.action + + if action == "promote": + issue_ids = cluster.get("issue_ids", []) + existing_in_order = set(order) + new_ids = [ + fid for fid in issue_ids + if isinstance(fid, str) and fid not in existing_in_order + ] + # Determine insertion position based on priority hint + priority = (decision.priority or "").lower().strip() + if priority == "first": + for i, fid in enumerate(new_ids): + order.insert(i, fid) + elif priority.startswith("after "): + target = priority[len("after "):] + insert_idx = len(order) + for idx, item in enumerate(order): + if target in item: + insert_idx = idx + 1 + break + for i, fid in enumerate(new_ids): + order.insert(insert_idx + i, fid) + elif priority.startswith("before "): + target = priority[len("before "):] + insert_idx = len(order) + for idx, item in enumerate(order): + if target in item: + insert_idx = idx + break + for i, fid in enumerate(new_ids): + order.insert(insert_idx + i, fid) + else: + # "last" or unrecognized: append to end + order.extend(new_ids) + + cluster["execution_status"] = EXECUTION_STATUS_ACTIVE + cluster["updated_at"] = now + cluster["triage_version"] = version + result.auto_clusters_promoted += 1 + + elif action == "skip": + cluster["triage_skip"] = { + "reason": decision.reason, + "skipped_at": now, + 
"triage_version": version, + } + cluster["updated_at"] = now + result.auto_clusters_skipped += 1 + + elif action == "break_up": + cluster["triage_break_up"] = { + "reason": decision.reason, + "sub_clusters": decision.sub_clusters, + "decided_at": now, + "triage_version": version, + } + cluster["updated_at"] = now + result.auto_clusters_broken_up += 1 + + def apply_triage_to_plan( plan: PlanModel, state: StateModel, @@ -250,6 +337,17 @@ def apply_triage_to_plan( dismissed_ids=dismissed_ids, ) + # Process auto-cluster decisions (backward-compatible: no-op if empty) + if triage.auto_cluster_decisions: + _apply_auto_cluster_decisions( + plan=plan, + decisions=triage.auto_cluster_decisions, + order=order, + now=now, + version=version, + result=result, + ) + _set_triage_meta( plan=plan, state=state, diff --git a/desloppify/engine/_plan/triage/prompt.py b/desloppify/engine/_plan/triage/prompt.py index 0284df187..df186a4b0 100644 --- a/desloppify/engine/_plan/triage/prompt.py +++ b/desloppify/engine/_plan/triage/prompt.py @@ -106,6 +106,16 @@ class ContradictionNote: dismissed: str reason: str +@dataclass +class AutoClusterDecision: + """A triage decision for an auto-cluster.""" + + cluster: str + action: str # "promote", "skip", or "break_up" + reason: str = "" + priority: str = "" # e.g. 
"after dead-code-fixes", "last", "first" + sub_clusters: list[str] = field(default_factory=list) # for break_up action + @dataclass class TriageResult: """Parsed and validated LLM triage output.""" @@ -115,6 +125,7 @@ class TriageResult: dismissed_issues: list[DismissedIssue] = field(default_factory=list) contradiction_notes: list[ContradictionNote] = field(default_factory=list) priority_rationale: str = "" + auto_cluster_decisions: list[AutoClusterDecision] = field(default_factory=list) @property def clusters(self) -> list[dict]: @@ -159,15 +170,29 @@ def _recurring_dimensions( def _split_open_issue_buckets( issues: dict[str, dict], ) -> tuple[dict[str, dict], dict[str, dict]]: + """Split open issues into review (individual triage) and mechanical (cluster triage). + + With the unified pipeline, all defects are triage findings. Review issues + get individual treatment; mechanical defects flow through auto-cluster + summaries. + """ open_review: dict[str, dict] = {} open_mechanical: dict[str, dict] = {} for issue_id, issue in issues.items(): if issue.get("status") != "open": continue - if is_triage_finding(issue): + kind = issue.get("work_item_kind", issue.get("issue_kind", "")) + if kind in ("review_defect", "review_concern"): open_review[issue_id] = issue - continue - open_mechanical[issue_id] = issue + elif kind == "mechanical_defect": + open_mechanical[issue_id] = issue + elif is_triage_finding(issue): + # Fallback for items without explicit kind — infer from semantics + from desloppify.engine._state.issue_semantics import is_review_work_item + if is_review_work_item(issue): + open_review[issue_id] = issue + else: + open_mechanical[issue_id] = issue return open_review, open_mechanical @@ -251,10 +276,10 @@ def collect_triage_input(plan: PlanModel, state: StateModel) -> TriageInput: - `desloppify scan` — re-scan after making changes to verify progress - `desloppify show review --status open` — see all open review issues -Your output defines the active work plan 
for review findings and any explicitly -promoted backlog work. Mechanical backlog items you do not mention remain in -backlog by default. Dismissed issues will be removed from the queue with your -stated reason. +Your output defines the active work plan for all open defects. Review issues are +triaged individually; auto-clusters are triaged at the cluster level (promote/skip/break_up). +Every auto-cluster must have an explicit decision. Dismissed issues will be removed from +the queue with your stated reason. Respond with a single JSON object matching this schema: { @@ -285,7 +310,12 @@ def collect_triage_input(plan: PlanModel, state: StateModel) -> TriageInput: "contradiction_notes": [ {"kept": "issue_id", "dismissed": "issue_id", "reason": "why"} ], - "priority_rationale": "why the dependency_order is what it is" + "priority_rationale": "why the dependency_order is what it is", + "auto_cluster_decisions": [ + {"cluster": "auto/security", "action": "promote", "priority": "first", "reason": "high-value security fixes"}, + {"cluster": "auto/unused", "action": "skip", "reason": "mostly test assert noise"}, + {"cluster": "auto/test_coverage", "action": "break_up", "reason": "split by module", "sub_clusters": ["auto/test_coverage_api", "auto/test_coverage_core"]} + ] } """ @@ -456,14 +486,21 @@ def _append_mechanical_backlog_section( ) parts.append( - "## Mechanical backlog " + "## Auto-cluster candidates " f"({len(objective_backlog_issues)} items: {clustered_issue_count} in " f"{auto_cluster_count} auto-clusters, {len(unclustered)} unclustered)" ) parts.append( - "These detector-created items stay in backlog unless you explicitly promote them into the active queue." + "These are detector-created findings grouped by rule type. Each auto-cluster " + "is a first-class triage candidate — decide its fate just like review issues." + ) + parts.append( + "You MUST make an explicit decision for each auto-cluster listed below. 
" + "Include every auto-cluster in your `auto_cluster_decisions` output with one of: " + "promote (add to active queue with a priority position), " + "skip (with a specific reason — e.g. 'mostly false positives per sampling'), or " + "break_up (split into smaller sub-clusters with a reason)." ) - parts.append("Silence means leave the item or cluster in backlog.") rendered_clusters: list[tuple[str, dict, int]] = [] for name, cluster in auto_clusters.items(): @@ -479,10 +516,10 @@ def _append_mechanical_backlog_section( rendered_clusters.append((name, cluster, member_count)) if rendered_clusters: - parts.append("### Auto-clusters") + parts.append("### Auto-clusters (decision required for each)") parts.append( - "These are pre-grouped detector findings. Promote whole clusters with " - "`desloppify plan promote auto/`." + "Each cluster below includes a statistical summary with severity breakdown " + "and sample issues. Decide for each: promote, skip (with reason), or break_up." ) rendered_clusters.sort(key=lambda item: (-item[2], item[0])) visible_clusters = rendered_clusters[:15] @@ -492,6 +529,10 @@ def _append_mechanical_backlog_section( summary = _cluster_backlog_summary(name, cluster, member_count) parts.append(f"- {name} ({member_count} items){hint_suffix}") parts.append(f" {summary}") + # Statistical summary: severity/confidence breakdown + samples + stats = _cluster_stats(cluster, objective_backlog_issues) + if stats: + parts.append(f" {stats}") if len(rendered_clusters) > len(visible_clusters): remaining = rendered_clusters[len(visible_clusters):] remaining_issues = sum(item[2] for item in remaining) @@ -529,6 +570,57 @@ def _append_mechanical_backlog_section( parts.append("") +def _cluster_stats(cluster: dict[str, Any], all_issues: dict[str, dict]) -> str: + """Build a compact statistical summary for an auto-cluster.""" + issue_ids = cluster.get("issue_ids", []) + if not isinstance(issue_ids, list): + return "" + members = [ + all_issues[iid] for iid in 
issue_ids + if isinstance(iid, str) and iid in all_issues + ] + if not members: + return "" + + # Severity/confidence breakdown + from collections import Counter + severities: Counter[str] = Counter() + confidences: Counter[str] = Counter() + rules: Counter[str] = Counter() + for m in members: + detail = m.get("detail") or {} + if isinstance(detail, dict): + severities[detail.get("severity", "unknown")] += 1 + rules[detail.get("kind", detail.get("test_name", "unknown"))] += 1 + confidences[str(m.get("confidence", "medium"))] += 1 + + parts: list[str] = [] + if severities: + sev_str = ", ".join(f"{k}: {v}" for k, v in severities.most_common(3)) + parts.append(f"severity=[{sev_str}]") + if confidences: + conf_str = ", ".join(f"{k}: {v}" for k, v in confidences.most_common(3)) + parts.append(f"confidence=[{conf_str}]") + if rules: + top_rules = rules.most_common(5) + rule_str = ", ".join(f"{k}({v})" for k, v in top_rules) + if len(rules) > 5: + rule_str += f", +{len(rules) - 5} more" + parts.append(f"top_rules=[{rule_str}]") + + # Sample issues (up to 5) + samples = members[:5] + sample_strs: list[str] = [] + for s in samples: + f = s.get("file", "") + summary = str(s.get("summary", ""))[:80] + sample_strs.append(f"{f}: {summary}") + if sample_strs: + parts.append("samples: " + " | ".join(sample_strs)) + + return " ".join(parts) + + def _cluster_autofix_hint(cluster: dict[str, Any]) -> str: return cluster_autofix_hint(cluster) or "" @@ -589,6 +681,7 @@ def build_triage_prompt(si: TriageInput) -> str: __all__ = [ "_TRIAGE_SYSTEM_PROMPT", + "AutoClusterDecision", "ContradictionNote", "DismissedIssue", "TriageInput", diff --git a/desloppify/engine/_scoring/policy/core.py b/desloppify/engine/_scoring/policy/core.py index 8835a0b9f..b44ffceb4 100644 --- a/desloppify/engine/_scoring/policy/core.py +++ b/desloppify/engine/_scoring/policy/core.py @@ -41,7 +41,16 @@ class DetectorScoringPolicy: # Keep policy details that are independent of tier/dimension wiring.
_FILE_BASED_POLICY_DETECTORS = frozenset( - {"smells", "dict_keys", "test_coverage", "security", "concerns", "review"} + { + "smells", + "dict_keys", + "test_coverage", + "security", + "concerns", + "review", + "nextjs", + "next_lint", + } ) _LOC_WEIGHT_POLICY_DETECTORS = frozenset({"test_coverage"}) _EXCLUDED_ZONE_OVERRIDES: dict[str, frozenset[str]] = { diff --git a/desloppify/engine/_scoring/subjective/core.py b/desloppify/engine/_scoring/subjective/core.py index ca7ed9c62..9df373d97 100644 --- a/desloppify/engine/_scoring/subjective/core.py +++ b/desloppify/engine/_scoring/subjective/core.py @@ -10,7 +10,7 @@ ) from desloppify.base.text_utils import is_numeric from desloppify.engine._scoring.policy.core import SUBJECTIVE_CHECKS -from desloppify.engine._state.issue_semantics import is_triage_finding +from desloppify.engine._state.issue_semantics import is_review_work_item def _display_fallback(dim_name: str) -> str: @@ -173,7 +173,7 @@ def _subjective_issue_count( return sum( 1 for issue in issues.values() - if is_triage_finding(issue) + if is_review_work_item(issue) and issue.get("status") in failure_set and _normalize_dimension_key(issue.get("detail", {}).get("dimension")) == dim_name ) diff --git a/desloppify/engine/_state/issue_semantics.py b/desloppify/engine/_state/issue_semantics.py index 1da50dba0..de832f280 100644 --- a/desloppify/engine/_state/issue_semantics.py +++ b/desloppify/engine/_state/issue_semantics.py @@ -166,7 +166,7 @@ def is_review_work_item(issue: Mapping[str, Any]) -> bool: def is_triage_finding(issue: Mapping[str, Any]) -> bool: - return is_review_work_item(issue) + return is_defect_work_item(issue) def is_assessment_request(issue: Mapping[str, Any]) -> bool: diff --git a/desloppify/engine/_work_queue/synthetic.py b/desloppify/engine/_work_queue/synthetic.py index e7997d7b8..7bfdd6d63 100644 --- a/desloppify/engine/_work_queue/synthetic.py +++ b/desloppify/engine/_work_queue/synthetic.py @@ -10,7 +10,7 @@ from 
desloppify.engine.plan_triage import TRIAGE_STAGE_SPECS from desloppify.engine._scoring.subjective.core import DISPLAY_NAMES -from desloppify.engine._state.issue_semantics import is_triage_finding +from desloppify.engine._state.issue_semantics import is_review_work_item, is_triage_finding from desloppify.engine._state.schema import StateModel from desloppify.engine._work_queue.helpers import ( detail_dict, @@ -221,11 +221,13 @@ def build_subjective_items( } # Review issues are keyed by raw dimension name (snake_case). + # Only review-type issues contribute to subjective dimension counts, + # not mechanical defects (even those with a dimension field). review_open_by_dim: dict[str, int] = {} for issue in issues.values(): if issue.get("status") != "open": continue - if is_triage_finding(issue): + if is_review_work_item(issue): dim_key = str(detail_dict(issue).get("dimension", "")).strip().lower() if dim_key: review_open_by_dim[dim_key] = review_open_by_dim.get(dim_key, 0) + 1 diff --git a/desloppify/engine/detectors/dupes.py b/desloppify/engine/detectors/dupes.py index 58edfc7b7..688fe3d76 100644 --- a/desloppify/engine/detectors/dupes.py +++ b/desloppify/engine/detectors/dupes.py @@ -17,6 +17,10 @@ PairKey: TypeAlias = tuple[str, str] MatchedPair: TypeAlias = tuple[int, int, float, str] +_DUPES_CACHE_VERSION = 1 +_DUPES_CACHE_MAX_NEAR_PAIRS = 20_000 +_DUPES_AUTOJUNK_MIN_LINES = 80 + class DuplicateMember(TypedDict): file: str @@ -34,6 +38,17 @@ class DuplicateEntry(TypedDict): cluster: list[DuplicateMember] +class _CachedFunctionMeta(TypedDict): + body_hash: str + loc: int + + +class _CachedNearPair(TypedDict): + a: str + b: str + similarity: float + + def _build_clusters( pairs: list[MatchedPair], n: int ) -> list[list[int]]: @@ -80,13 +95,146 @@ def _dupes_debug_settings() -> tuple[bool, int]: def _pair_key(fn_a: FunctionInfo, fn_b: FunctionInfo) -> PairKey: """Build a stable pair key for duplicate tracking.""" - def _identity(fn: FunctionInfo) -> str: - end_line = 
getattr(fn, "end_line", None) - if not isinstance(end_line, int): - end_line = int(getattr(fn, "line", 0)) + int(getattr(fn, "loc", 0)) - return f"{fn.file}:{fn.name}:{fn.line}:{end_line}" + return (_function_identity(fn_a), _function_identity(fn_b)) + + +def _function_identity(fn: FunctionInfo) -> str: + """Build a stable identity token for one function.""" + end_line = getattr(fn, "end_line", None) + if not isinstance(end_line, int): + end_line = int(getattr(fn, "line", 0)) + int(getattr(fn, "loc", 0)) + return f"{fn.file}:{fn.name}:{fn.line}:{end_line}" + + +def _build_function_cache_map( + functions: list[FunctionInfo], +) -> tuple[dict[str, _CachedFunctionMeta], dict[str, int]]: + """Build cache metadata and index map for function identities.""" + meta_by_id: dict[str, _CachedFunctionMeta] = {} + index_by_id: dict[str, int] = {} + for idx, fn in enumerate(functions): + func_id = _function_identity(fn) + meta_by_id[func_id] = { + "body_hash": fn.body_hash, + "loc": int(fn.loc), + } + index_by_id[func_id] = idx + return meta_by_id, index_by_id - return (_identity(fn_a), _identity(fn_b)) + +def _load_cached_near_pairs( + *, + cache: dict[str, object], + threshold: float, + functions: list[FunctionInfo], + function_meta: dict[str, _CachedFunctionMeta], + index_by_id: dict[str, int], + seen_pairs: set[PairKey], +) -> tuple[list[MatchedPair], set[str] | None]: + """Return reusable near-duplicate pairs and changed function identities. + + Returns ``([], None)`` when the cache is missing or incompatible, signaling + that the near-duplicate pass should run in full mode. 
+ """ + if cache.get("version") != _DUPES_CACHE_VERSION: + return [], None + + cached_threshold = cache.get("threshold") + if not isinstance(cached_threshold, int | float): + return [], None + if float(cached_threshold) != float(threshold): + return [], None + + cached_functions = cache.get("functions") + cached_near_pairs = cache.get("near_pairs") + if not isinstance(cached_functions, dict) or not isinstance(cached_near_pairs, list): + return [], None + + changed_ids: set[str] = set() + for func_id, meta in function_meta.items(): + previous = cached_functions.get(func_id) + if not isinstance(previous, dict): + changed_ids.add(func_id) + continue + if previous.get("body_hash") != meta["body_hash"]: + changed_ids.add(func_id) + continue + prev_loc = previous.get("loc") + if not isinstance(prev_loc, int): + changed_ids.add(func_id) + continue + if prev_loc != meta["loc"]: + changed_ids.add(func_id) + + reusable_pairs: list[MatchedPair] = [] + for raw_pair in cached_near_pairs: + if not isinstance(raw_pair, dict): + continue + left_id = raw_pair.get("a") + right_id = raw_pair.get("b") + similarity = raw_pair.get("similarity") + if ( + not isinstance(left_id, str) + or not isinstance(right_id, str) + or not isinstance(similarity, int | float) + ): + continue + if left_id in changed_ids or right_id in changed_ids: + continue + left_idx = index_by_id.get(left_id) + right_idx = index_by_id.get(right_id) + if left_idx is None or right_idx is None or left_idx == right_idx: + continue + + left_fn = functions[left_idx] + right_fn = functions[right_idx] + if left_fn.body_hash == right_fn.body_hash: + continue + pair_key = _pair_key(left_fn, right_fn) + if pair_key in seen_pairs: + continue + seen_pairs.add(pair_key) + reusable_pairs.append((left_idx, right_idx, float(similarity), "near-duplicate")) + + return reusable_pairs, changed_ids + + +def _store_dupes_cache( + *, + cache: dict[str, object], + threshold: float, + functions: list[FunctionInfo], + function_meta: dict[str, 
_CachedFunctionMeta], + pairs: list[MatchedPair], +) -> None: + """Persist near-duplicate cache payload for reuse on next scan.""" + near_pairs: list[_CachedNearPair] = [] + for left_idx, right_idx, similarity, kind in pairs: + if kind != "near-duplicate": + continue + left_id = _function_identity(functions[left_idx]) + right_id = _function_identity(functions[right_idx]) + near_pairs.append( + { + "a": left_id, + "b": right_id, + "similarity": round(float(similarity), 6), + } + ) + + near_pairs.sort(key=lambda pair: (-pair["similarity"], pair["a"], pair["b"])) + if len(near_pairs) > _DUPES_CACHE_MAX_NEAR_PAIRS: + near_pairs = near_pairs[:_DUPES_CACHE_MAX_NEAR_PAIRS] + + cache.clear() + cache.update( + { + "version": _DUPES_CACHE_VERSION, + "threshold": float(threshold), + "functions": function_meta, + "near_pairs": near_pairs, + } + ) def _collect_exact_duplicate_pairs( @@ -119,10 +267,13 @@ def _collect_near_duplicate_pairs( threshold: float, *, seen_pairs: set[PairKey], + active_indices: set[int] | None, debug: bool, debug_every: int, ) -> list[MatchedPair]: """Collect near-duplicate pairs using SequenceMatcher with pruning.""" + if active_indices is not None and not active_indices: + return [] large_idx = [(idx, fn) for idx, fn in enumerate(functions) if fn.loc >= 15] large_idx.sort(key=lambda item: item[1].loc) normalized_lines = [fn.normalized.splitlines() for fn in functions] @@ -152,6 +303,9 @@ def _collect_near_duplicate_pairs( pair_key = _pair_key(fn_a, fn_b) if pair_key in seen_pairs or fn_a.body_hash == fn_b.body_hash: continue + if active_indices is not None: + if idx_a not in active_indices and idx_b not in active_indices: + continue # ratio = 2*M/(len_a+len_b), with M <= min(len_a, len_b) len_a = normalized_line_counts[idx_a] @@ -168,7 +322,7 @@ def _collect_near_duplicate_pairs( None, normalized_lines[idx_a], normalized_lines[idx_b], - autojunk=False, + autojunk=len_a >= _DUPES_AUTOJUNK_MIN_LINES and len_b >= _DUPES_AUTOJUNK_MIN_LINES, ) if 
matcher.real_quick_ratio() < threshold: continue @@ -261,6 +415,8 @@ def _build_duplicate_entries( def detect_duplicates( functions: list[FunctionInfo], threshold: float = 0.9, + *, + cache: dict[str, object] | None = None, ) -> tuple[list[DuplicateEntry], int]: """Find duplicate or near-duplicate functions clustered by similarity.""" if not functions: @@ -269,11 +425,31 @@ def detect_duplicates( seen_pairs: set[PairKey] = set() pairs = _collect_exact_duplicate_pairs(functions, seen_pairs) + function_meta, index_by_id = _build_function_cache_map(functions) + active_indices: set[int] | None = None + if isinstance(cache, dict): + cached_pairs, changed_ids = _load_cached_near_pairs( + cache=cache, + threshold=threshold, + functions=functions, + function_meta=function_meta, + index_by_id=index_by_id, + seen_pairs=seen_pairs, + ) + pairs.extend(cached_pairs) + if changed_ids is not None: + active_indices = { + index_by_id[func_id] + for func_id in changed_ids + if func_id in index_by_id + } + pairs.extend( _collect_near_duplicate_pairs( functions, threshold, seen_pairs=seen_pairs, + active_indices=active_indices, debug=debug, debug_every=debug_every, ) @@ -281,6 +457,14 @@ def detect_duplicates( clusters = _build_clusters(pairs, len(functions)) entries = _build_duplicate_entries(functions, pairs, clusters) + if isinstance(cache, dict): + _store_dupes_cache( + cache=cache, + threshold=threshold, + functions=functions, + function_meta=function_meta, + pairs=pairs, + ) return sorted(entries, key=lambda e: (-e["similarity"], -e["cluster_size"])), len( functions ) diff --git a/desloppify/engine/planning/scan.py b/desloppify/engine/planning/scan.py index e00fbaae7..656a0cf22 100644 --- a/desloppify/engine/planning/scan.py +++ b/desloppify/engine/planning/scan.py @@ -12,6 +12,7 @@ from desloppify.engine.planning.helpers import is_subjective_phase from desloppify.engine.policy.zones import ZONE_POLICIES, FileZoneMap from desloppify.languages.framework import ( + 
clear_review_phase_prefetch, DetectorPhase, LangConfig, LangRun, @@ -20,6 +21,7 @@ capability_report, get_lang, make_lang_run, + prewarm_review_phase_detectors, ) from desloppify.state_io import Issue @@ -130,7 +132,11 @@ def _generate_issues_from_lang( """Run detector phases from a LangRun.""" _build_zone_map(path, lang, zone_overrides) phases = _select_phases(lang, include_slow=include_slow, profile=profile) - issues, all_potentials = _run_phases(path, lang, phases) + prewarm_review_phase_detectors(path, lang, phases) + try: + issues, all_potentials = _run_phases(path, lang, phases) + finally: + clear_review_phase_prefetch(lang) _stamp_issue_context(issues, lang) _stderr(f"\n Total: {len(issues)} issues") return issues, all_potentials diff --git a/desloppify/intelligence/review/context_holistic/budget/scan.py b/desloppify/intelligence/review/context_holistic/budget/scan.py index f0f55956c..33a1dafbd 100644 --- a/desloppify/intelligence/review/context_holistic/budget/scan.py +++ b/desloppify/intelligence/review/context_holistic/budget/scan.py @@ -9,7 +9,6 @@ from pathlib import Path from desloppify.base.discovery.file_paths import rel -from desloppify.intelligence.review.context import file_excerpt from .analysis import _count_signature_params, _extract_type_names from .axes import _assemble_context, _compute_sub_axes @@ -55,6 +54,14 @@ ) +def _excerpt_from_content(content: str, *, max_lines: int = 30) -> str: + """Return a short leading excerpt directly from in-memory file content.""" + lines = content.splitlines(keepends=True) + if len(lines) <= max_lines: + return content + return "".join(lines[:max_lines]) + f"\n... 
({len(lines) - max_lines} more lines)" + + @dataclasses.dataclass class _AbstractionsCollector: """Accumulated state for the abstractions scan pass.""" @@ -90,7 +97,7 @@ def _scan_file( basename = Path(rpath).stem.lower() if basename in {"utils", "helpers", "util", "helper", "common", "misc"}: col.util_files.append( - {"file": rpath, "loc": loc, "excerpt": file_excerpt(filepath) or ""} + {"file": rpath, "loc": loc, "excerpt": _excerpt_from_content(content)} ) signatures = _DEF_SIGNATURE_RE.findall(content) diff --git a/desloppify/intelligence/review/prepare_holistic_orchestration.py b/desloppify/intelligence/review/prepare_holistic_orchestration.py index 35c8ffb44..18e7120ba 100644 --- a/desloppify/intelligence/review/prepare_holistic_orchestration.py +++ b/desloppify/intelligence/review/prepare_holistic_orchestration.py @@ -217,12 +217,52 @@ def prepare_holistic_review_payload( continue batch_dims = batch_item.get("dimensions", []) if isinstance(batch_dims, list): - batch_item["dimension_contexts"] = { - d: dim_contexts[d] for d in batch_dims if d in dim_contexts - } + compact_contexts = _compact_batch_dimension_contexts( + dimensions=batch_dims, + all_contexts=dim_contexts, + ) + if compact_contexts: + batch_item["dimension_contexts"] = compact_contexts payload["investigation_batches"] = batches return payload +def _compact_batch_dimension_contexts( + *, + dimensions: list[str], + all_contexts: dict[str, Any], +) -> dict[str, dict[str, list[dict[str, object]]]]: + """Attach a prompt-facing slice of dimension contexts for each batch. + + Batch prompts only need insight headers and settled/positive flags. Keeping + this payload compact avoids duplicating full insight descriptions across all + batches while preserving packet-level full context in payload["dimension_contexts"]. 
+ """ + compact: dict[str, dict[str, list[dict[str, object]]]] = {} + for dimension in dimensions: + raw_context = all_contexts.get(dimension) + if not isinstance(raw_context, dict): + continue + raw_insights = raw_context.get("insights") + if not isinstance(raw_insights, list): + continue + insights: list[dict[str, object]] = [] + for item in raw_insights: + if not isinstance(item, dict): + continue + header = str(item.get("header", "")).strip() + if not header: + continue + insight: dict[str, object] = {"header": header} + if bool(item.get("settled", False)): + insight["settled"] = True + if bool(item.get("positive", False)): + insight["positive"] = True + insights.append(insight) + if insights: + compact[dimension] = {"insights": insights} + return compact + + __all__ = ["HolisticPrepareDependencies", "prepare_holistic_review_payload"] diff --git a/desloppify/languages/_framework/base/shared_phases_review.py b/desloppify/languages/_framework/base/shared_phases_review.py index a77419d35..1af09e747 100644 --- a/desloppify/languages/_framework/base/shared_phases_review.py +++ b/desloppify/languages/_framework/base/shared_phases_review.py @@ -2,8 +2,15 @@ from __future__ import annotations +import concurrent.futures +import hashlib +import logging +import os from collections.abc import Callable from pathlib import Path +from typing import Any + +logger = logging.getLogger(__name__) from desloppify.base.discovery.file_paths import rel from desloppify.base.output.terminal import log @@ -15,11 +22,17 @@ from desloppify.engine.detectors.test_coverage.detector import detect_test_coverage from desloppify.engine._state.filtering import make_issue from desloppify.engine.policy.zones import EXCLUDED_ZONES, filter_entries -from desloppify.languages._framework.base.types import DetectorEntry, LangRuntimeContract +from desloppify.languages._framework.base.types import ( + DetectorCoverageStatus, + DetectorEntry, + LangRuntimeContract, + LangSecurityResult, +) from 
desloppify.languages._framework.issue_factories import make_dupe_issues from desloppify.state_io import Issue from .shared_phases_helpers import ( + _coverage_to_dict, _entries_to_issues, _filter_boilerplate_entries_by_zone, _find_external_test_files, @@ -31,10 +44,367 @@ # security detector symbol from this module. detect_security_issues = _detect_security_issues_default +_DETECTOR_CACHE_VERSION = 1 +_PREFETCH_ATTR = "_shared_review_prefetch_futures" +_FUNCTION_CACHE_ATTR = "_shared_review_function_cache" +_PREFETCH_BOILERPLATE_KEY = "boilerplate" +_PREFETCH_SECURITY_KEY = "security_lang" +_PREFETCH_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=2) + + +def _detector_cache(review_cache: object, detector: str) -> dict[str, object] | None: + """Return mutable detector cache payload from review cache.""" + if not isinstance(review_cache, dict): + return None + detectors = review_cache.get("detectors") + if not isinstance(detectors, dict): + detectors = {} + review_cache["detectors"] = detectors + payload = detectors.get(detector) + if not isinstance(payload, dict): + payload = {} + detectors[detector] = payload + return payload + + +def _dupes_cache(review_cache: object) -> dict[str, object] | None: + return _detector_cache(review_cache, "dupes") + + +def _boilerplate_cache(review_cache: object) -> dict[str, object] | None: + return _detector_cache(review_cache, "boilerplate") + + +def _security_cache(review_cache: object) -> dict[str, object] | None: + return _detector_cache(review_cache, "security") + + +def _get_prefetch_futures( + lang: object, + *, + create: bool, +) -> dict[str, concurrent.futures.Future[Any]]: + """Read/write in-memory review prefetch futures attached to LangRun.""" + payload = getattr(lang, _PREFETCH_ATTR, None) + if isinstance(payload, dict): + # Filter to only valid str->Future entries; rebuild only when needed. 
+ bad_keys = [ + k for k, v in payload.items() + if not isinstance(k, str) or not isinstance(v, concurrent.futures.Future) + ] + if bad_keys: + for k in bad_keys: + payload.pop(k, None) + return payload + if not create: + return {} + initialized: dict[str, concurrent.futures.Future[Any]] = {} + setattr(lang, _PREFETCH_ATTR, initialized) + return initialized + + +def _pop_prefetch_future( + lang: object, + key: str, +) -> concurrent.futures.Future[Any] | None: + """Detach and return one prefetch future.""" + futures = _get_prefetch_futures(lang, create=False) + future = futures.pop(key, None) + if not futures: + try: + delattr(lang, _PREFETCH_ATTR) + except AttributeError: + pass + if isinstance(future, concurrent.futures.Future): + return future + return None + + +def _consume_prefetch_result( + lang: object, + key: str, +) -> object | None: + """Return completed prefetch result, swallowing async failures.""" + future = _pop_prefetch_future(lang, key) + if future is None: + return None + try: + return future.result() + except Exception: + logger.debug("prefetch %s failed, falling back to synchronous run", key, exc_info=True) + return None + + +def _has_phase( + phases: list[object], + *, + labels: set[str], + run_names: set[str], +) -> bool: + for phase in phases: + label = str(getattr(phase, "label", "")).strip().lower() + run = getattr(phase, "run", None) + run_name = str(getattr(run, "__name__", "")).strip().lower() + run_func = getattr(run, "func", None) + run_func_name = str(getattr(run_func, "__name__", "")).strip().lower() + if label in labels or run_name in run_names or run_func_name in run_names: + return True + return False + + +def _resolve_review_functions(path: Path, lang: LangRuntimeContract): + """Resolve language function extraction once per scan path.""" + cache = getattr(lang, _FUNCTION_CACHE_ATTR, None) + if not isinstance(cache, dict): + cache = {} + setattr(lang, _FUNCTION_CACHE_ATTR, cache) + cache_key = str(path.resolve()) + cached = 
cache.get(cache_key) + if isinstance(cached, list): + return cached + extracted = lang.extract_functions(path) + cache[cache_key] = extracted + return extracted + + +def _resolve_detector_files(path: Path, lang: LangRuntimeContract) -> list[str]: + """Resolve a detector file list for cache fingerprinting.""" + zone_map = getattr(lang, "zone_map", None) + if zone_map is not None and hasattr(zone_map, "all_files"): + zone_files = zone_map.all_files() + if isinstance(zone_files, list): + return zone_files + file_finder = getattr(lang, "file_finder", None) + if file_finder: + return file_finder(path) + return [] + + +def _resolve_detector_file_path(scan_root: Path, filepath: str) -> Path: + """Resolve a detector file path against the active scan root.""" + file_path = Path(filepath) + if file_path.is_absolute(): + return file_path + return (scan_root / file_path).resolve() + + +def _file_fingerprint( + *, + scan_root: Path, + files: list[str], + zone_map=None, + include_zone: bool = False, + salt: str = "", +) -> str: + """Build a stable file-signature hash from path + mtime + size + zone.""" + hasher = hashlib.blake2b(digest_size=20) + hasher.update(str(scan_root.resolve()).encode("utf-8", errors="replace")) + hasher.update(b"\0") + hasher.update(salt.encode("utf-8", errors="replace")) + hasher.update(b"\0") + for filepath in sorted({str(item) for item in files}): + resolved = _resolve_detector_file_path(scan_root, filepath) + normalized = filepath.replace("\\", "/") + hasher.update(normalized.encode("utf-8", errors="replace")) + hasher.update(b"\0") + try: + stats = os.stat(resolved) + hasher.update(str(stats.st_size).encode("ascii", errors="ignore")) + hasher.update(b"\0") + hasher.update(str(stats.st_mtime_ns).encode("ascii", errors="ignore")) + hasher.update(b"\0") + except OSError: + hasher.update(b"-1\0-1\0") + if include_zone and zone_map is not None: + zone = zone_map.get(filepath) + zone_value = getattr(zone, "value", zone) + hasher.update(str(zone_value or 
"").encode("utf-8", errors="replace")) + hasher.update(b"\0") + return hasher.hexdigest() + + +def _load_cached_boilerplate_entries( + cache: dict[str, object], + *, + fingerprint: str, +) -> list[dict] | None: + """Load cached boilerplate entries when fingerprint is unchanged.""" + if cache.get("version") != _DETECTOR_CACHE_VERSION: + return None + if cache.get("fingerprint") != fingerprint: + return None + entries = cache.get("entries") + if not isinstance(entries, list): + return None + return [entry for entry in entries if isinstance(entry, dict)] + + +def _store_cached_boilerplate_entries( + cache: dict[str, object], + *, + fingerprint: str, + entries: list[dict], +) -> None: + """Persist boilerplate detector entries for unchanged scans.""" + cache.clear() + cache.update( + { + "version": _DETECTOR_CACHE_VERSION, + "fingerprint": fingerprint, + "entries": [entry for entry in entries if isinstance(entry, dict)], + } + ) + + +def _coverage_from_record(payload: object) -> DetectorCoverageStatus | None: + """Rebuild coverage dataclass from serialized cache payload.""" + if not isinstance(payload, dict): + return None + detector = str(payload.get("detector", "")).strip() + status = str(payload.get("status", "")).strip() + if not detector or status not in {"full", "reduced"}: + return None + confidence_raw = payload.get("confidence", 1.0) + try: + confidence = float(confidence_raw) + except (TypeError, ValueError): + confidence = 1.0 + return DetectorCoverageStatus( + detector=detector, + status=status, + confidence=confidence, + summary=str(payload.get("summary", "") or ""), + impact=str(payload.get("impact", "") or ""), + remediation=str(payload.get("remediation", "") or ""), + tool=str(payload.get("tool", "") or ""), + reason=str(payload.get("reason", "") or ""), + ) + + +def _load_cached_security_result( + cache: dict[str, object], + *, + fingerprint: str, +) -> LangSecurityResult | None: + """Load cached language-specific security result when unchanged.""" + if 
cache.get("version") != _DETECTOR_CACHE_VERSION: + return None + if cache.get("fingerprint") != fingerprint: + return None + entries = cache.get("entries") + files_scanned = cache.get("files_scanned") + if not isinstance(entries, list) or not isinstance(files_scanned, int): + return None + normalized_entries = [entry for entry in entries if isinstance(entry, dict)] + return LangSecurityResult( + entries=normalized_entries, + files_scanned=max(0, files_scanned), + coverage=_coverage_from_record(cache.get("coverage")), + ) + + +def _store_cached_security_result( + cache: dict[str, object], + *, + fingerprint: str, + result: LangSecurityResult, +) -> None: + """Persist language-specific security results for unchanged scans.""" + cache.clear() + cache.update( + { + "version": _DETECTOR_CACHE_VERSION, + "fingerprint": fingerprint, + "entries": [entry for entry in result.entries if isinstance(entry, dict)], + "files_scanned": max(0, int(result.files_scanned)), + "coverage": ( + _coverage_to_dict(result.coverage) if result.coverage is not None else None + ), + } + ) + + +def prewarm_review_phase_detectors( + path: Path, + lang: LangRuntimeContract, + phases: list[object], +) -> None: + """Start expensive shared review detectors in background for overlap.""" + futures = _get_prefetch_futures(lang, create=True) + + if _has_phase( + phases, + labels={"boilerplate duplication"}, + run_names={"phase_boilerplate_duplication"}, + ): + boilerplate_cache = _boilerplate_cache(getattr(lang, "review_cache", None)) + detector_files = _resolve_detector_files(path, lang) + fingerprint = _file_fingerprint( + scan_root=path, + files=detector_files, + salt=f"boilerplate:{getattr(lang, 'name', '')}", + ) + cached_entries = ( + _load_cached_boilerplate_entries(boilerplate_cache, fingerprint=fingerprint) + if isinstance(boilerplate_cache, dict) + else None + ) + if cached_entries is None and _PREFETCH_BOILERPLATE_KEY not in futures: + futures[_PREFETCH_BOILERPLATE_KEY] = 
_PREFETCH_EXECUTOR.submit( + detect_with_jscpd, + path, + ) + + if _has_phase( + phases, + labels={"security"}, + run_names={"phase_security"}, + ): + file_finder = getattr(lang, "file_finder", None) + files = file_finder(path) if file_finder else [] + zone_map = getattr(lang, "zone_map", None) + security_cache = _security_cache(getattr(lang, "review_cache", None)) + fingerprint = _file_fingerprint( + scan_root=path, + files=files, + zone_map=zone_map, + include_zone=True, + salt=f"security:{getattr(lang, 'name', '')}", + ) + cached_result = ( + _load_cached_security_result(security_cache, fingerprint=fingerprint) + if isinstance(security_cache, dict) + else None + ) + if cached_result is None and _PREFETCH_SECURITY_KEY not in futures: + futures[_PREFETCH_SECURITY_KEY] = _PREFETCH_EXECUTOR.submit( + lang.detect_lang_security_detailed, + files, + zone_map, + ) + + +def clear_review_phase_prefetch(lang: object) -> None: + """Drop in-memory prefetch futures and function caches after scan run.""" + futures = _get_prefetch_futures(lang, create=False) + for future in futures.values(): + if isinstance(future, concurrent.futures.Future) and not future.done(): + future.cancel() + if hasattr(lang, _PREFETCH_ATTR): + try: + delattr(lang, _PREFETCH_ATTR) + except AttributeError: + pass + if hasattr(lang, _FUNCTION_CACHE_ATTR): + try: + delattr(lang, _FUNCTION_CACHE_ATTR) + except AttributeError: + pass + def phase_dupes(path: Path, lang: LangRuntimeContract) -> tuple[list[Issue], dict[str, int]]: """Shared phase runner: detect duplicate functions via lang.extract_functions.""" - functions = lang.extract_functions(path) + functions = _resolve_review_functions(path, lang) if lang.zone_map is not None: before = len(functions) @@ -47,7 +417,10 @@ def phase_dupes(path: Path, lang: LangRuntimeContract) -> tuple[list[Issue], dic if excluded: log(f" zones: {excluded} functions excluded (non-production)") - entries, total_functions = detect_duplicates(functions) + entries, 
total_functions = detect_duplicates( + functions, + cache=_dupes_cache(getattr(lang, "review_cache", None)), + ) issues = make_dupe_issues(entries, log) return issues, {"dupes": total_functions} @@ -57,7 +430,29 @@ def phase_boilerplate_duplication( lang: LangRuntimeContract, ) -> tuple[list[Issue], dict[str, int]]: """Shared phase runner: detect repeated boilerplate code via jscpd.""" - entries = detect_with_jscpd(path) + cache = _boilerplate_cache(getattr(lang, "review_cache", None)) + detector_files = _resolve_detector_files(path, lang) + fingerprint = _file_fingerprint( + scan_root=path, + files=detector_files, + salt=f"boilerplate:{getattr(lang, 'name', '')}", + ) + entries = ( + _load_cached_boilerplate_entries(cache, fingerprint=fingerprint) + if isinstance(cache, dict) + else None + ) + if entries is None: + prefetched = _consume_prefetch_result(lang, _PREFETCH_BOILERPLATE_KEY) + entries = prefetched if isinstance(prefetched, list) else None + if entries is None: + entries = detect_with_jscpd(path) + if isinstance(cache, dict) and entries is not None: + _store_cached_boilerplate_entries( + cache, + fingerprint=fingerprint, + entries=entries, + ) if entries is None: return [], {} entries = _filter_boilerplate_entries_by_zone(entries, lang.zone_map) @@ -116,7 +511,33 @@ def phase_security( ) lang_scanned = 0 - lang_result = lang.detect_lang_security_detailed(files, zone_map) + security_cache = _security_cache(getattr(lang, "review_cache", None)) + security_fingerprint = _file_fingerprint( + scan_root=path, + files=files, + zone_map=zone_map, + include_zone=True, + salt=f"security:{getattr(lang, 'name', '')}", + ) + lang_result = ( + _load_cached_security_result( + security_cache, + fingerprint=security_fingerprint, + ) + if isinstance(security_cache, dict) + else None + ) + if lang_result is None: + prefetched = _consume_prefetch_result(lang, _PREFETCH_SECURITY_KEY) + lang_result = prefetched if isinstance(prefetched, LangSecurityResult) else None + if 
lang_result is None: + lang_result = lang.detect_lang_security_detailed(files, zone_map) + if isinstance(security_cache, dict): + _store_cached_security_result( + security_cache, + fingerprint=security_fingerprint, + result=lang_result, + ) lang_entries = lang_result.entries lang_scanned = max(0, int(lang_result.files_scanned)) _record_detector_coverage(lang, lang_result.coverage) @@ -255,7 +676,7 @@ def phase_signature(path: Path, lang: LangRuntimeContract) -> tuple[list[Issue], """Shared phase runner: detect signature variance via lang.extract_functions.""" from desloppify.engine.detectors.signature import detect_signature_variance - functions = lang.extract_functions(path) + functions = _resolve_review_functions(path, lang) issues: list[Issue] = [] potentials: dict[str, int] = {} @@ -286,9 +707,11 @@ def phase_signature(path: Path, lang: LangRuntimeContract) -> tuple[list[Issue], __all__ = [ + "clear_review_phase_prefetch", "phase_boilerplate_duplication", "phase_dupes", "phase_private_imports", + "prewarm_review_phase_detectors", "phase_security", "phase_signature", "phase_subjective_review", diff --git a/desloppify/languages/_framework/frameworks/__init__.py b/desloppify/languages/_framework/frameworks/__init__.py new file mode 100644 index 000000000..fa703ff71 --- /dev/null +++ b/desloppify/languages/_framework/frameworks/__init__.py @@ -0,0 +1,41 @@ +"""Framework horizontal layer (spec-driven, like tree-sitter/tool specs). + +Framework support is intentionally spec-driven so it can be enabled from both +deep language plugins (LangConfig classes) and shallow generic_lang plugins. 
+ +Public entrypoints: +- framework_phases(lang_name): build DetectorPhase objects +- detect_ecosystem_frameworks(scan_path, lang, ecosystem): framework presence + evidence +""" + +from __future__ import annotations + +from .detection import detect_ecosystem_frameworks +from .phases import framework_phases +from .registry import ( + FRAMEWORK_SPECS, + get_framework_spec, + list_framework_specs, + register_framework_spec, +) +from .types import ( + DetectionConfig, + EcosystemFrameworkDetection, + FrameworkSpec, + ScannerRule, + ToolIntegration, +) + +__all__ = [ + "DetectionConfig", + "EcosystemFrameworkDetection", + "FRAMEWORK_SPECS", + "FrameworkSpec", + "ScannerRule", + "ToolIntegration", + "detect_ecosystem_frameworks", + "framework_phases", + "get_framework_spec", + "list_framework_specs", + "register_framework_spec", +] diff --git a/desloppify/languages/_framework/frameworks/detection.py b/desloppify/languages/_framework/frameworks/detection.py new file mode 100644 index 000000000..1f9877199 --- /dev/null +++ b/desloppify/languages/_framework/frameworks/detection.py @@ -0,0 +1,212 @@ +"""Ecosystem-specific framework presence detection (deterministic, evidence-based).""" + +from __future__ import annotations + +import json +import re +from pathlib import Path +from typing import Any + +from desloppify.base.discovery.paths import get_project_root +from desloppify.languages._framework.base.types import LangRuntimeContract + +from .registry import ensure_builtin_specs_loaded, list_framework_specs +from .types import DetectionConfig, EcosystemFrameworkDetection, FrameworkEvidence + +_CACHE_PREFIX = "frameworks.ecosystem.present" + + +def _find_nearest_package_json(scan_path: Path, project_root: Path) -> Path | None: + resolved = scan_path if scan_path.is_absolute() else (project_root / scan_path) + resolved = resolved.resolve() + if resolved.is_file(): + resolved = resolved.parent + + # If scan_path is inside runtime project root, cap traversal there. 
+ # Otherwise (e.g. --path /tmp/other-repo), traverse from scan_path upward. + limit_to_project_root = False + try: + resolved.relative_to(project_root) + limit_to_project_root = True + except ValueError: + limit_to_project_root = False + + cur = resolved + while True: + candidate = cur / "package.json" + if candidate.is_file(): + return candidate + if (limit_to_project_root and cur == project_root) or cur.parent == cur: + break + cur = cur.parent + + # Fallback only when no package.json exists in the scanned tree. + candidate = project_root / "package.json" + return candidate if candidate.is_file() else None + + +def _read_package_json(package_json: Path) -> dict[str, Any]: + try: + payload = json.loads(package_json.read_text()) + except (OSError, UnicodeDecodeError, json.JSONDecodeError): + return {} + return payload if isinstance(payload, dict) else {} + + +def _dep_set(payload: dict[str, Any], key: str) -> set[str]: + deps = payload.get(key) + if not isinstance(deps, dict): + return set() + return {str(k) for k in deps.keys()} + + +def _script_values(payload: dict[str, Any]) -> list[str]: + scripts = payload.get("scripts") + if not isinstance(scripts, dict): + return [] + return [v for v in scripts.values() if isinstance(v, str)] + + +def _existing_relpaths( + package_root: Path, + project_root: Path, + candidates: tuple[str, ...], + *, + kind: str, +) -> tuple[str, ...]: + hits: list[str] = [] + for relpath in candidates: + path = (package_root / relpath).resolve() + ok = path.is_dir() if kind == "dir" else path.is_file() + if not ok: + continue + try: + hits.append(path.relative_to(project_root).as_posix()) + except ValueError: + hits.append(path.as_posix()) + return tuple(hits) + + +def _node_framework_evidence( + *, + cfg: DetectionConfig, + package_root: Path, + project_root: Path, + deps: set[str], + dev_deps: set[str], + scripts: list[str], +) -> tuple[bool, FrameworkEvidence]: + dep_hits = tuple(sorted(set(cfg.dependencies).intersection(deps))) + 
dev_dep_hits = tuple(sorted(set(cfg.dev_dependencies).intersection(dev_deps))) + config_hits = _existing_relpaths(package_root, project_root, cfg.config_files, kind="file") + marker_file_hits = _existing_relpaths(package_root, project_root, cfg.marker_files, kind="file") + marker_dir_hits = _existing_relpaths(package_root, project_root, cfg.marker_dirs, kind="dir") + + script_hits: list[str] = [] + if scripts and cfg.script_pattern: + pat = re.compile(cfg.script_pattern) + script_hits = [s for s in scripts if pat.search(s)] + + # Presence is deterministic: deps/config/scripts imply presence. Marker dirs are context by default. + present = bool(dep_hits or dev_dep_hits or config_hits or marker_file_hits or script_hits) + if cfg.marker_dirs_imply_presence and marker_dir_hits: + present = True + + evidence: FrameworkEvidence = { + "dep_hits": list(dep_hits), + "dev_dep_hits": list(dev_dep_hits), + "config_hits": list(config_hits), + "marker_file_hits": list(marker_file_hits), + "marker_dir_hits": list(marker_dir_hits), + "script_hits": script_hits[:5], + } + return present, evidence + + +def detect_ecosystem_frameworks( + scan_path: Path, + lang: LangRuntimeContract | None, + ecosystem: str, +) -> EcosystemFrameworkDetection: + """Detect framework presence for an ecosystem and scan path (cached per run).""" + ensure_builtin_specs_loaded() + eco = str(ecosystem or "").strip().lower() + resolved_scan_path = Path(scan_path).resolve() + cache_key = f"{_CACHE_PREFIX}:{eco}:{resolved_scan_path.as_posix()}" + + if lang is not None: + cache = getattr(lang, "review_cache", None) + if isinstance(cache, dict): + cached = cache.get(cache_key) + if isinstance(cached, EcosystemFrameworkDetection): + return cached + + project_root = get_project_root() + + if eco != "node": + result = EcosystemFrameworkDetection( + ecosystem=eco, + package_root=project_root, + package_json_relpath=None, + present={}, + ) + if lang is not None and isinstance(getattr(lang, "review_cache", None), dict): 
+ lang.review_cache[cache_key] = result + return result + + package_json = _find_nearest_package_json(resolved_scan_path, project_root) + package_root = (package_json.parent if package_json else project_root).resolve() + payload = _read_package_json(package_json) if package_json else {} + + deps = _dep_set(payload, "dependencies") | _dep_set(payload, "peerDependencies") | _dep_set( + payload, "optionalDependencies" + ) + dev_deps = _dep_set(payload, "devDependencies") + scripts = _script_values(payload) + + specs = list_framework_specs(ecosystem=eco) + present: dict[str, FrameworkEvidence] = {} + for framework_id, spec in specs.items(): + ok, evidence = _node_framework_evidence( + cfg=spec.detection, + package_root=package_root, + project_root=project_root, + deps=deps, + dev_deps=dev_deps, + scripts=scripts, + ) + if ok: + present[framework_id] = evidence + + # Apply mutual exclusions deterministically: present frameworks can suppress others. + present_ids = set(present.keys()) + for framework_id, spec in specs.items(): + if framework_id not in present_ids: + continue + for excluded in spec.excludes: + present.pop(str(excluded), None) + + result = EcosystemFrameworkDetection( + ecosystem=eco, + package_root=package_root, + package_json_relpath=( + ( + package_json.relative_to(project_root).as_posix() + if package_json and package_json.is_relative_to(project_root) + else package_json.as_posix() + ) + if package_json + else None + ), + present=present, + ) + + if lang is not None: + cache = getattr(lang, "review_cache", None) + if isinstance(cache, dict): + cache[cache_key] = result + + return result + + +__all__ = ["detect_ecosystem_frameworks"] diff --git a/desloppify/languages/_framework/frameworks/phases.py b/desloppify/languages/_framework/frameworks/phases.py new file mode 100644 index 000000000..65f6c75cf --- /dev/null +++ b/desloppify/languages/_framework/frameworks/phases.py @@ -0,0 +1,169 @@ +"""DetectorPhase factories for framework specs.""" + +from 
__future__ import annotations + +from pathlib import Path +from typing import Any + +from desloppify.base.output.terminal import log as _log +from desloppify.languages._framework.base.types import DetectorPhase, LangRuntimeContract +from desloppify.languages._framework.generic_support.core import make_tool_phase +from desloppify.state_io import Issue + +from .detection import detect_ecosystem_frameworks +from .registry import ensure_builtin_specs_loaded, list_framework_specs +from .types import FrameworkSpec, ScannerRule, ToolIntegration + + +def _has_capability(lang: LangRuntimeContract, cap: str) -> bool: + key = str(cap or "").strip() + if not key: + return True + if key == "dep_graph": + return getattr(lang, "dep_graph", None) is not None + if key == "zone_map": + return getattr(lang, "zone_map", None) is not None + if key == "file_finder": + return callable(getattr(lang, "file_finder", None)) + return bool(getattr(lang, key, None)) + + +def _record_capability_degradation( + lang: Any, + *, + detector: str, + rule_id: str, + missing: list[str], +) -> None: + """Record reduced coverage metadata when a framework rule cannot run.""" + if not missing: + return + summary = ( + f"Skipped {detector} framework rule '{rule_id}' (missing: {', '.join(missing)})." 
+ ) + record = { + "detector": detector, + "status": "reduced", + "confidence": 0.5, + "summary": summary, + "impact": "Some framework-specific issues may be under-reported for this scan.", + "remediation": "Enable the required language capabilities and rerun scan.", + "tool": "", + "reason": "missing_capability", + } + detector_coverage = getattr(lang, "detector_coverage", None) + if isinstance(detector_coverage, dict): + existing = detector_coverage.get(detector) + if isinstance(existing, dict): + merged = dict(existing) + merged["status"] = "reduced" + merged["confidence"] = min(float(existing.get("confidence", 1.0)), 0.5) + merged_summary = str(merged.get("summary", "") or "").strip() + if merged_summary and summary not in merged_summary: + merged["summary"] = f"{merged_summary} | {summary}" + elif not merged_summary: + merged["summary"] = summary + detector_coverage[detector] = merged + else: + detector_coverage[detector] = dict(record) + + coverage_warnings = getattr(lang, "coverage_warnings", None) + if isinstance(coverage_warnings, list): + if not any( + isinstance(entry, dict) and entry.get("detector") == detector for entry in coverage_warnings + ): + coverage_warnings.append(dict(record)) + + +def _run_scanner_rules( + scan_root: Path, + lang: LangRuntimeContract, + *, + detector: str, + rules: tuple[ScannerRule, ...], +) -> tuple[list[Issue], int]: + issues: list[Issue] = [] + potential = 0 + + for rule in rules: + scan_fn = rule.scan + issue_factory = rule.issue_factory + if scan_fn is None or issue_factory is None: + continue + + missing = [cap for cap in rule.requires if not _has_capability(lang, cap)] + if missing: + _record_capability_degradation( + lang, + detector=detector, + rule_id=rule.id, + missing=missing, + ) + continue + + entries, scanned = scan_fn(scan_root, lang) + potential = max(potential, int(scanned or 0)) + for entry in entries: + issues.append(issue_factory(entry)) + if entries and rule.log_message: + 
_log(rule.log_message(len(entries))) + + return issues, potential + + +def _framework_smells_phase(spec: FrameworkSpec) -> DetectorPhase: + label = f"{spec.label} framework smells" + + def run(path: Path, lang: LangRuntimeContract) -> tuple[list[Issue], dict[str, int]]: + detection = detect_ecosystem_frameworks(path, lang, spec.ecosystem) + if spec.id not in detection.present: + return [], {} + + scan_root = detection.package_root + issues, potential = _run_scanner_rules( + scan_root, + lang, + detector=spec.id, + rules=spec.scanners, + ) + return issues, ({spec.id: potential} if potential > 0 else {}) + + return DetectorPhase(label, run) + + +def _framework_tool_phase(spec: FrameworkSpec, tool: ToolIntegration) -> DetectorPhase: + tool_phase = make_tool_phase( + tool.label, + tool.cmd, + tool.fmt, + tool.id, + tool.tier, + confidence=tool.confidence, + ) + tool_phase.slow = bool(tool.slow) + + def run(path: Path, lang: LangRuntimeContract) -> tuple[list[Issue], dict[str, int]]: + detection = detect_ecosystem_frameworks(path, lang, spec.ecosystem) + if spec.id not in detection.present: + return [], {} + + scan_root = detection.package_root + return tool_phase.run(scan_root, lang) + + return DetectorPhase(tool_phase.label, run, slow=tool_phase.slow) + + +def framework_phases(lang_name: str) -> list[DetectorPhase]: + """Return all framework phases for a language plugin.""" + del lang_name + ensure_builtin_specs_loaded() + + phases: list[DetectorPhase] = [] + for spec in list_framework_specs().values(): + phases.append(_framework_smells_phase(spec)) + for tool in spec.tools: + phases.append(_framework_tool_phase(spec, tool)) + return phases + + +__all__ = ["framework_phases"] diff --git a/desloppify/languages/_framework/frameworks/registry.py b/desloppify/languages/_framework/frameworks/registry.py new file mode 100644 index 000000000..fe2bb5fb0 --- /dev/null +++ b/desloppify/languages/_framework/frameworks/registry.py @@ -0,0 +1,58 @@ +"""Framework spec registry 
(analogous to tree-sitter spec registry).""" + +from __future__ import annotations + +from collections.abc import Iterable + +from .types import FrameworkSpec + +FRAMEWORK_SPECS: dict[str, FrameworkSpec] = {} + + +def register_framework_spec(spec: FrameworkSpec) -> None: + """Register a framework spec by id.""" + key = str(spec.id or "").strip() + if not key: + raise ValueError("FrameworkSpec.id must be non-empty") + FRAMEWORK_SPECS[key] = spec + + +def get_framework_spec(framework_id: str) -> FrameworkSpec | None: + """Return a registered framework spec by id.""" + key = str(framework_id or "").strip() + if not key: + return None + return FRAMEWORK_SPECS.get(key) + + +def list_framework_specs(*, ecosystem: str | None = None) -> dict[str, FrameworkSpec]: + """Return a copy of the framework registry, optionally filtered by ecosystem.""" + if ecosystem is None: + return dict(FRAMEWORK_SPECS) + eco = str(ecosystem or "").strip().lower() + if not eco: + return dict(FRAMEWORK_SPECS) + return {k: v for k, v in FRAMEWORK_SPECS.items() if str(v.ecosystem).lower() == eco} + + +def _register_builtin_specs() -> None: + """Register built-in framework specs shipped with the repo.""" + if FRAMEWORK_SPECS: + return + from .specs.nextjs import NEXTJS_SPEC + + register_framework_spec(NEXTJS_SPEC) + + +def ensure_builtin_specs_loaded() -> None: + """Idempotently load built-in framework specs.""" + _register_builtin_specs() + + +__all__ = [ + "FRAMEWORK_SPECS", + "ensure_builtin_specs_loaded", + "get_framework_spec", + "list_framework_specs", + "register_framework_spec", +] diff --git a/desloppify/languages/_framework/frameworks/specs/__init__.py b/desloppify/languages/_framework/frameworks/specs/__init__.py new file mode 100644 index 000000000..8c4c32a85 --- /dev/null +++ b/desloppify/languages/_framework/frameworks/specs/__init__.py @@ -0,0 +1,5 @@ +"""Built-in framework specs.""" + +from __future__ import annotations + +__all__ = [] diff --git 
a/desloppify/languages/_framework/frameworks/specs/nextjs.py b/desloppify/languages/_framework/frameworks/specs/nextjs.py new file mode 100644 index 000000000..0180d2206 --- /dev/null +++ b/desloppify/languages/_framework/frameworks/specs/nextjs.py @@ -0,0 +1,557 @@ +"""Next.js framework spec (Node ecosystem).""" + +from __future__ import annotations + +from collections.abc import Callable +from pathlib import Path +from typing import Any + +from desloppify.engine._state.filtering import make_issue +from desloppify.languages._framework.base.types import LangRuntimeContract +from desloppify.languages._framework.node.frameworks.nextjs.info import ( + NextjsFrameworkInfo, + nextjs_info_from_evidence, +) +from desloppify.languages._framework.node.frameworks.nextjs.scanners import ( + scan_mixed_router_layout, + scan_next_router_imports_in_app_router, + scan_nextjs_app_router_exports_in_pages_router, + scan_nextjs_async_client_components, + scan_nextjs_browser_globals_missing_use_client, + scan_nextjs_client_layouts, + scan_nextjs_env_leaks_in_client, + scan_nextjs_error_files_missing_use_client, + scan_nextjs_navigation_hooks_missing_use_client, + scan_nextjs_next_document_misuse, + scan_nextjs_next_head_in_app_router, + scan_nextjs_pages_api_route_handlers, + scan_nextjs_pages_router_apis_in_app_router, + scan_nextjs_pages_router_artifacts_in_app_router, + scan_nextjs_route_handlers_and_middleware_misuse, + scan_nextjs_server_exports_in_client, + scan_nextjs_server_imports_in_client, + scan_nextjs_server_modules_in_pages_router, + scan_nextjs_server_navigation_apis_in_client, + scan_nextjs_use_client_not_first, + scan_nextjs_use_server_in_client, + scan_nextjs_use_server_not_first, + scan_rsc_missing_use_client, +) +from desloppify.state_io import Issue + +from ..types import DetectionConfig, FrameworkSpec, ScannerRule, ToolIntegration + +_NEXTJS_INFO_CACHE_PREFIX = "framework.nextjs.info" + + +def _nextjs_info(scan_root: Path, lang: LangRuntimeContract) -> 
NextjsFrameworkInfo: + key = f"{_NEXTJS_INFO_CACHE_PREFIX}:{scan_root.resolve().as_posix()}" + cache = getattr(lang, "review_cache", None) + if isinstance(cache, dict): + cached = cache.get(key) + if isinstance(cached, NextjsFrameworkInfo): + return cached + + from desloppify.languages._framework.frameworks.detection import ( + detect_ecosystem_frameworks, + ) + + detection = detect_ecosystem_frameworks(scan_root, lang, "node") + evidence = detection.present.get("nextjs", {}) + info = nextjs_info_from_evidence( + evidence, + package_root=detection.package_root, + package_json_relpath=detection.package_json_relpath, + ) + + if isinstance(cache, dict): + cache[key] = info + return info + + +def _wrap_scan( + scan_fn: Callable[[Path, NextjsFrameworkInfo], tuple[list[dict[str, Any]], int]], +) -> Callable[[Path, LangRuntimeContract], tuple[list[dict[str, Any]], int]]: + def scan(scan_root: Path, lang: LangRuntimeContract) -> tuple[list[dict[str, Any]], int]: + info = _nextjs_info(scan_root, lang) + return scan_fn(scan_root, info) + + return scan + + +def _wrap_info_scan( + scan_fn: Callable[[NextjsFrameworkInfo], list[dict[str, Any]]], +) -> Callable[[Path, LangRuntimeContract], tuple[list[dict[str, Any]], int]]: + def scan(scan_root: Path, lang: LangRuntimeContract) -> tuple[list[dict[str, Any]], int]: + info = _nextjs_info(scan_root, lang) + return list(scan_fn(info)), 0 + + return scan + + +def _make_line_issue( + detector: str, + issue_id: str, + *, + tier: int, + confidence: str, + summary: str, +) -> Callable[[dict[str, Any]], Issue]: + return lambda entry: make_issue( + detector, + entry["file"], + issue_id, + tier=tier, + confidence=confidence, + summary=summary, + detail={"line": entry["line"]}, + ) + + +NEXTJS_SCANNERS: tuple[ScannerRule, ...] 
= ( + ScannerRule( + id="use_client_not_first", + scan=_wrap_scan(scan_nextjs_use_client_not_first), + issue_factory=_make_line_issue( + "nextjs", + "use_client_not_first", + tier=2, + confidence="high", + summary="'use client' directive is present but not the first meaningful line (invalid in Next.js).", + ), + log_message=lambda count: ( + " nextjs: " + f"{count} App Router files contain a non-top-level 'use client' directive" + ), + ), + ScannerRule( + id="error_file_missing_use_client", + scan=_wrap_scan(scan_nextjs_error_files_missing_use_client), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"error_file_missing_use_client::{entry.get('name','error')}", + tier=2, + confidence="high", + summary="App Router error boundary module is missing 'use client' (required for error.js/error.tsx).", + detail={"line": entry["line"], "name": entry.get("name")}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} App Router error boundary files missing 'use client'" + ), + ), + ScannerRule( + id="pages_router_artifact_in_app_router", + scan=_wrap_scan(scan_nextjs_pages_router_artifacts_in_app_router), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"pages_router_artifact_in_app_router::{entry.get('name','artifact')}", + tier=3, + confidence="high", + summary=( + "App Router tree contains Pages Router artifact file " + f"{entry.get('name')} (likely migration artifact)." 
+ ), + detail={"line": entry["line"], "name": entry.get("name")}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} Pages Router artifact files found under app/" + ), + ), + ScannerRule( + id="missing_use_client", + scan=_wrap_scan(scan_rsc_missing_use_client), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"missing_use_client::{entry['hook']}", + tier=2, + confidence="medium", + summary=f"Missing 'use client' directive: App Router module uses {entry['hook']}()", + detail={"line": entry["line"], "hook": entry["hook"]}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} App Router files missing 'use client'" + ), + ), + ScannerRule( + id="nav_hook_missing_use_client", + scan=_wrap_scan(scan_nextjs_navigation_hooks_missing_use_client), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"nav_hook_missing_use_client::{entry['hook']}", + tier=2, + confidence="high", + summary=f"Missing 'use client' directive: App Router module uses {entry['hook']}()", + detail={"line": entry["line"], "hook": entry["hook"]}, + ), + log_message=lambda count: ( + " nextjs: " + f"{count} App Router files use next/navigation hooks without 'use client'" + ), + ), + ScannerRule( + id="server_import_in_client", + scan=_wrap_scan(scan_nextjs_server_imports_in_client), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + "server_import_in_client", + tier=2, + confidence="high", + summary=( + ( + "Client component imports server-only modules (" + + ", ".join(entry.get("modules", [])[:4]) + + ")." + ) + if entry.get("modules") + else "Client component imports server-only modules." 
+ ), + detail={ + "line": entry["line"], + "modules": entry.get("modules", []), + "imports": entry.get("imports", []), + }, + ), + log_message=lambda count: ( + " nextjs: " f"{count} client components import server-only modules" + ), + ), + ScannerRule( + id="server_export_in_client", + scan=_wrap_scan(scan_nextjs_server_exports_in_client), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"server_export_in_client::{entry.get('export','export')}", + tier=3, + confidence="high", + summary=( + "Client component exports server-only Next.js module exports " + f"({entry.get('export')})." + ), + detail={"line": entry["line"], "export": entry.get("export")}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} client components export server-only Next.js exports" + ), + ), + ScannerRule( + id="pages_router_api_in_app_router", + scan=_wrap_scan(scan_nextjs_pages_router_apis_in_app_router), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"pages_router_api_in_app_router::{entry.get('api','api')}", + tier=3, + confidence="high", + summary=( + "App Router module uses Pages Router data-fetching API " + f"({entry.get('api')})." 
+ ), + detail={"line": entry["line"], "api": entry.get("api")}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} App Router files use Pages Router APIs" + ), + ), + ScannerRule( + id="next_head_in_app_router", + scan=_wrap_scan(scan_nextjs_next_head_in_app_router), + issue_factory=_make_line_issue( + "nextjs", + "next_head_in_app_router", + tier=3, + confidence="high", + summary="App Router module imports next/head (unsupported in App Router).", + ), + log_message=lambda count: ( + " nextjs: " f"{count} App Router files import next/head" + ), + ), + ScannerRule( + id="next_document_misuse", + scan=_wrap_scan(scan_nextjs_next_document_misuse), + issue_factory=_make_line_issue( + "nextjs", + "next_document_misuse", + tier=3, + confidence="high", + summary="next/document import outside valid Pages Router _document.* file.", + ), + log_message=lambda count: ( + " nextjs: " f"{count} files import next/document outside _document.*" + ), + ), + ScannerRule( + id="browser_global_missing_use_client", + scan=_wrap_scan(scan_nextjs_browser_globals_missing_use_client), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"browser_global_missing_use_client::{entry.get('global','global')}", + tier=2, + confidence="medium", + summary=( + "App Router module accesses browser globals " + f"({entry.get('global')}) but is missing 'use client'." 
+ ), + detail={"line": entry["line"], "global": entry.get("global")}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} App Router files access browser globals without 'use client'" + ), + ), + ScannerRule( + id="client_layout_smell", + scan=_wrap_scan(scan_nextjs_client_layouts), + issue_factory=_make_line_issue( + "nextjs", + "client_layout_smell", + tier=3, + confidence="low", + summary="Client layout detected (layout.* marked 'use client') — consider isolating interactivity to leaf components.", + ), + log_message=lambda count: ( + " nextjs: " f"{count} client layouts detected" + ), + ), + ScannerRule( + id="async_client_component", + scan=_wrap_scan(scan_nextjs_async_client_components), + issue_factory=_make_line_issue( + "nextjs", + "async_client_component", + tier=3, + confidence="high", + summary="Client component is async (invalid in Next.js).", + ), + log_message=lambda count: ( + " nextjs: " f"{count} async client components detected" + ), + ), + ScannerRule( + id="env_leak_in_client", + scan=_wrap_scan(scan_nextjs_env_leaks_in_client), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"env_leak_in_client::{entry.get('var','env')}", + tier=2, + confidence="high", + summary=( + "Client module accesses non-public env var " + f"process.env.{entry.get('var')} (only NEXT_PUBLIC_* should be used in client)." + ), + detail={"line": entry["line"], "var": entry.get("var")}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} client modules access non-public env vars" + ), + ), + ScannerRule( + id="pages_api_route_handlers", + scan=_wrap_scan(scan_nextjs_pages_api_route_handlers), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + "pages_api_route_handlers", + tier=3, + confidence="high", + summary=( + "Pages Router API route exports App Router route-handler HTTP functions " + f"({', '.join(entry.get('exports', [])[:4])})." 
+ ), + detail={"line": entry["line"], "exports": entry.get("exports", [])}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} Pages Router API routes export App Router handlers" + ), + ), + ScannerRule( + id="middleware_misuse", + scan=_wrap_scan(scan_nextjs_route_handlers_and_middleware_misuse), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"middleware_misuse::{entry.get('kind','route')}", + tier=3, + confidence="medium", + summary=( + f"Next.js {entry.get('kind')} misuses route context " + f"({entry.get('reason')})." + ), + detail={ + "line": entry.get("line", 1), + "kind": entry.get("kind"), + "reason": entry.get("reason"), + "findings": entry.get("findings", []), + }, + ), + log_message=lambda count: ( + " nextjs: " f"{count} route handler/middleware context misuse findings" + ), + ), + ScannerRule( + id="server_api_in_client", + scan=_wrap_scan(scan_nextjs_server_navigation_apis_in_client), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"server_api_in_client::{entry.get('api','api')}", + tier=2, + confidence="high", + summary=( + "Client module calls server-only next/navigation API " + f"({entry.get('api')})." 
+ ), + detail={"line": entry["line"], "api": entry.get("api")}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} client modules call server-only next/navigation APIs" + ), + ), + ScannerRule( + id="use_server_in_client", + scan=_wrap_scan(scan_nextjs_use_server_in_client), + issue_factory=_make_line_issue( + "nextjs", + "use_server_in_client", + tier=2, + confidence="high", + summary="'use server' directive in a client module (invalid in Next.js).", + ), + log_message=lambda count: ( + " nextjs: " f"{count} client modules contain a module-level 'use server' directive" + ), + ), + ScannerRule( + id="use_server_not_first", + scan=_wrap_scan(scan_nextjs_use_server_not_first), + issue_factory=_make_line_issue( + "nextjs", + "use_server_not_first", + tier=2, + confidence="high", + summary="'use server' directive is present but not the first meaningful line (invalid in Next.js).", + ), + log_message=lambda count: ( + " nextjs: " f"{count} modules contain a non-top-level 'use server' directive" + ), + ), + ScannerRule( + id="app_router_exports_in_pages_router", + scan=_wrap_scan(scan_nextjs_app_router_exports_in_pages_router), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + f"app_router_exports_in_pages_router::{entry.get('export','export')}", + tier=3, + confidence="high", + summary=( + "Pages Router module exports App Router-only module export " + f"({entry.get('export')})." 
+ ), + detail={"line": entry["line"], "export": entry.get("export")}, + ), + log_message=lambda count: ( + " nextjs: " f"{count} Pages Router files export App Router-only module exports" + ), + ), + ScannerRule( + id="server_modules_in_pages_router", + scan=_wrap_scan(scan_nextjs_server_modules_in_pages_router), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + "server_modules_in_pages_router", + tier=3, + confidence="high", + summary=( + ( + "Pages Router module imports App Router server-only modules (" + + ", ".join(entry.get("modules", [])[:4]) + + ")." + ) + if entry.get("modules") + else "Pages Router module imports App Router server-only modules." + ), + detail={ + "line": entry["line"], + "modules": entry.get("modules", []), + "imports": entry.get("imports", []), + }, + ), + log_message=lambda count: ( + " nextjs: " f"{count} Pages Router files import App Router server-only modules" + ), + ), + ScannerRule( + id="next_router_in_app_router", + scan=_wrap_scan(scan_next_router_imports_in_app_router), + issue_factory=_make_line_issue( + "nextjs", + "next_router_in_app_router", + tier=3, + confidence="high", + summary="App Router file imports legacy next/router (prefer next/navigation).", + ), + log_message=lambda count: ( + f" nextjs: {count} App Router files import next/router" + ), + ), + ScannerRule( + id="mixed_routers", + scan=_wrap_info_scan(scan_mixed_router_layout), + issue_factory=lambda entry: make_issue( + "nextjs", + entry["file"], + "mixed_routers", + tier=4, + confidence="low", + summary="Project contains both App Router (app/) and Pages Router (pages/) trees.", + detail={ + "app_roots": entry.get("app_roots", []), + "pages_roots": entry.get("pages_roots", []), + }, + ), + ), +) + + +NEXTJS_SPEC = FrameworkSpec( + id="nextjs", + label="Next.js", + ecosystem="node", + detection=DetectionConfig( + dependencies=("next",), + config_files=( + "next.config.js", + "next.config.mjs", + "next.config.cjs", + "next.config.ts", + ), + 
marker_dirs=("app", "src/app", "pages", "src/pages"), + script_pattern=r"(?:^|\s)next(?:\s|$)", + marker_dirs_imply_presence=False, + ), + excludes=(), + scanners=NEXTJS_SCANNERS, + tools=( + ToolIntegration( + id="next_lint", + label="next lint", + cmd="npx --no-install next lint --format json", + fmt="next_lint", + tier=2, + slow=True, + confidence="high", + ), + ), +) + + +__all__ = ["NEXTJS_SPEC"] diff --git a/desloppify/languages/_framework/frameworks/types.py b/desloppify/languages/_framework/frameworks/types.py new file mode 100644 index 000000000..183ad9c91 --- /dev/null +++ b/desloppify/languages/_framework/frameworks/types.py @@ -0,0 +1,87 @@ +"""Framework spec contracts (detection + scanners + tool integrations).""" + +from __future__ import annotations + +from collections.abc import Callable +from dataclasses import dataclass +from pathlib import Path +from typing import Any + +from desloppify.languages._framework.base.types import LangRuntimeContract +from desloppify.state_io import Issue + +FrameworkEvidence = dict[str, Any] + + +@dataclass(frozen=True) +class DetectionConfig: + """Deterministic framework presence detection hints for an ecosystem.""" + + dependencies: tuple[str, ...] = () + dev_dependencies: tuple[str, ...] = () + config_files: tuple[str, ...] = () + marker_files: tuple[str, ...] = () + marker_dirs: tuple[str, ...] = () + script_pattern: str | None = None + + # Markers are valuable for routing context, but default to "context only" + # so frameworks don't light up purely from directory shape. + marker_dirs_imply_presence: bool = False + + +@dataclass(frozen=True) +class ScannerRule: + """A single scanner rule within a FrameworkSpec.""" + + id: str + requires: tuple[str, ...] 
= () + scan: Callable[[Path, LangRuntimeContract], tuple[list[dict[str, Any]], int]] | None = None + issue_factory: Callable[[dict[str, Any]], Issue] | None = None + log_message: Callable[[int], str] | None = None + + +@dataclass(frozen=True) +class ToolIntegration: + """Framework tool integration (ToolSpec-like + phase semantics).""" + + id: str # detector id (e.g. "next_lint") + label: str + cmd: str + fmt: str + tier: int + slow: bool = False + confidence: str = "medium" + + +@dataclass(frozen=True) +class FrameworkSpec: + """A framework "horizontal layer" spec, analogous to tree-sitter specs.""" + + id: str + label: str + ecosystem: str + detection: DetectionConfig + + excludes: tuple[str, ...] = () + scanners: tuple[ScannerRule, ...] = () + tools: tuple[ToolIntegration, ...] = () + + +@dataclass(frozen=True) +class EcosystemFrameworkDetection: + """Framework presence detection result for a scan path within an ecosystem.""" + + ecosystem: str + package_root: Path + package_json_relpath: str | None + present: dict[str, FrameworkEvidence] + + +__all__ = [ + "DetectionConfig", + "EcosystemFrameworkDetection", + "FrameworkEvidence", + "FrameworkSpec", + "ScannerRule", + "ToolIntegration", +] diff --git a/desloppify/languages/_framework/generic_parts/parsers.py b/desloppify/languages/_framework/generic_parts/parsers.py index 271f764a2..313683ab8 100644 --- a/desloppify/languages/_framework/generic_parts/parsers.py +++ b/desloppify/languages/_framework/generic_parts/parsers.py @@ -181,25 +181,139 @@ def parse_eslint(output: str, scan_path: Path) -> list[dict]: return entries -PARSERS: dict[str, Callable[[str, Path], list[dict]]] = { +def parse_phpstan(output: str, scan_path: Path) -> list[dict]: + """Parse PHPStan JSON: ``{"files": {"<path>": {"messages": [{"message": "...", "line": 42}]}}}``.""" + del scan_path + entries: list[dict] = [] + data = _load_json_output(output, parser_name="phpstan") + files = data.get("files") if isinstance(data, dict) else {} + for filepath,
fdata in (files or {}).items(): + if not isinstance(fdata, dict): + continue + for msg in fdata.get("messages") or []: + if not isinstance(msg, dict): + continue + line = _coerce_line(msg.get("line", 0)) + message = msg.get("message", "") + if filepath and message and line is not None: + entries.append({"file": str(filepath), "line": line, "message": str(message)}) + return entries + + +def _extract_json_array(text: str) -> str | None: + """Best-effort: return the first JSON array substring in *text*.""" + start = text.find("[") + if start == -1: + return None + end = text.rfind("]") + if end == -1 or end <= start: + return None + return text[start : end + 1] + + +def _relativize_to_project_root(filepath: str, *, scan_path: Path) -> str: + """Resolve a tool-emitted path to a project-root-relative string when possible.""" + from desloppify.base.discovery.paths import get_project_root + + project_root = get_project_root().resolve() + try: + p = Path(filepath) + abs_path = p.resolve() if p.is_absolute() else (scan_path / p).resolve() + try: + return str(abs_path.relative_to(project_root)).replace("\\", "/") + except ValueError: + return str(abs_path) + except Exception: # pragma: no cover + return filepath + + +def parse_next_lint(output: str, scan_path: Path) -> tuple[list[dict], dict]: + """Parse Next.js `next lint --format json` output. 
+ + Returns ``(entries, meta)`` where: + - entries are *per-file* aggregates: {file, line, message, id, detail} + - meta includes ``potential`` (number of files lint reported on) + """ + raw = (output or "").strip() + json_text = _extract_json_array(raw) + if not json_text: + raise ToolParserError("next_lint parser could not find JSON output array") + + data = _load_json_output(json_text, parser_name="next_lint") + if not isinstance(data, list): + raise ToolParserError("next_lint parser expected a JSON array") + + potential = len(data) + entries: list[dict] = [] + for fobj in data: + if not isinstance(fobj, dict): + continue + file_path = fobj.get("filePath") or "" + messages = fobj.get("messages") or [] + if not file_path or not isinstance(messages, list) or not messages: + continue + + rel = _relativize_to_project_root(str(file_path), scan_path=scan_path) + first = next((m for m in messages if isinstance(m, dict)), None) + if first is None: + continue + line = _coerce_line(first.get("line", 0)) or 1 + msg = first.get("message") if isinstance(first.get("message"), str) else "Lint issue" + entries.append( + { + "file": rel, + "line": line if line > 0 else 1, + "id": "lint", + "message": f"next lint: {msg} ({len(messages)} issue(s) in file)", + "detail": { + "count": len(messages), + "messages": [ + { + "line": _coerce_line(m.get("line", 0)) or 0, + "column": _coerce_line(m.get("column", 0)) or 0, + "ruleId": m.get("ruleId", "") if isinstance(m.get("ruleId", ""), str) else "", + "message": m.get("message", "") if isinstance(m.get("message", ""), str) else "", + "severity": _coerce_line(m.get("severity", 0)) or 0, + } + for m in messages + if isinstance(m, dict) + ][:50], + }, + } + ) + + return entries, {"potential": potential} + + +ToolParseResult = list[dict] | tuple[list[dict], dict] +ToolParser = Callable[[str, Path], ToolParseResult] + + +PARSERS: dict[str, ToolParser] = { "gnu": parse_gnu, "golangci": parse_golangci, "json": parse_json, "credo": parse_credo, + 
"phpstan": parse_phpstan, "rubocop": parse_rubocop, "cargo": parse_cargo, "eslint": parse_eslint, + "next_lint": parse_next_lint, } __all__ = [ "PARSERS", "ToolParserError", + "ToolParseResult", + "ToolParser", "parse_cargo", "parse_credo", "parse_eslint", "parse_gnu", "parse_golangci", "parse_json", + "parse_phpstan", + "parse_next_lint", "parse_rubocop", ] diff --git a/desloppify/languages/_framework/generic_parts/tool_factories.py b/desloppify/languages/_framework/generic_parts/tool_factories.py index b6722a53c..4d7c518b3 100644 --- a/desloppify/languages/_framework/generic_parts/tool_factories.py +++ b/desloppify/languages/_framework/generic_parts/tool_factories.py @@ -63,12 +63,16 @@ def make_tool_phase( fmt: str, smell_id: str, tier: int, + *, + confidence: str = "medium", + cwd_fn: Callable[[Path, Any], Path] | None = None, ) -> DetectorPhase: """Create a DetectorPhase that runs an external tool and parses output.""" parser = PARSERS[fmt] def run(path: Path, lang: Any) -> tuple[list[dict[str, Any]], dict[str, int]]: - run_result = run_tool_result(cmd, path, parser) + run_path = cwd_fn(path, lang).resolve() if cwd_fn is not None else path + run_result = run_tool_result(cmd, run_path, parser) if run_result.status == "error": _record_tool_failure_coverage( lang, @@ -78,20 +82,28 @@ def run(path: Path, lang: Any) -> tuple[list[dict[str, Any]], dict[str, int]]: ) return [], {} entries = list(run_result.entries) + meta = run_result.meta if isinstance(run_result.meta, dict) else {} + meta_potential = meta.get("potential") + potential = meta_potential if isinstance(meta_potential, int) else 0 + + if run_result.status == "empty": + return [], ({smell_id: potential} if potential > 0 else {}) + if not entries: - return [], {} + return [], ({smell_id: potential} if potential > 0 else {}) issues = [ make_issue( smell_id, entry["file"], - f"{smell_id}::{entry['line']}", + str(entry.get("id") or f"{smell_id}::{entry['line']}"), tier=tier, - confidence="medium", - 
summary=entry["message"], + confidence=str(entry.get("confidence") or confidence), + summary=str(entry.get("summary") or entry["message"]), + detail=entry.get("detail") if isinstance(entry.get("detail"), dict) else None, ) for entry in entries ] - return issues, {smell_id: len(entries)} + return issues, {smell_id: potential if potential > 0 else len(entries)} return DetectorPhase(label, run) diff --git a/desloppify/languages/_framework/generic_parts/tool_runner.py b/desloppify/languages/_framework/generic_parts/tool_runner.py index 90508e3db..095f6db15 100644 --- a/desloppify/languages/_framework/generic_parts/tool_runner.py +++ b/desloppify/languages/_framework/generic_parts/tool_runner.py @@ -15,6 +15,7 @@ from desloppify.languages._framework.generic_parts.parsers import ToolParserError SubprocessRun = Callable[..., subprocess.CompletedProcess[str]] +ToolParser = Callable[[str, Path], list[dict] | tuple[list[dict], dict]] _SHELL_META_CHARS = re.compile(r"[|&;<>()$`\n]") logger = logging.getLogger(__name__) @@ -26,6 +27,7 @@ class ToolRunResult: entries: list[dict] status: Literal["ok", "empty", "error"] + meta: dict | None = None error_kind: str | None = None message: str | None = None returncode: int | None = None @@ -62,7 +64,7 @@ def _output_preview(output: str, *, limit: int = 160) -> str: def run_tool_result( cmd: str, path: Path, - parser: Callable[[str, Path], list[dict]], + parser: ToolParser, *, run_subprocess: SubprocessRun | None = None, ) -> ToolRunResult: @@ -91,8 +93,15 @@ def run_tool_result( error_kind="tool_timeout", message=str(exc), ) - output = (result.stdout or "") + (result.stderr or "") - if not output.strip(): + stdout = result.stdout or "" + stderr = result.stderr or "" + # Parse stdout when it has content (structured JSON tools always write + # there). 
Fall back to combined stdout+stderr only when stdout is empty, + # so that tools which emit diagnostics to stderr don't corrupt the JSON + # parse input while still being treated as "no output" when truly silent. + parse_input = stdout if stdout.strip() else (stdout + stderr) + combined = stdout + stderr + if not combined.strip(): if result.returncode not in (0, None): return ToolRunResult( entries=[], @@ -107,7 +116,7 @@ def run_tool_result( returncode=result.returncode, ) try: - parsed = parser(output, path) + parsed = parser(parse_input, path) except ToolParserError as exc: logger.debug("Parser decode error for tool output: %s", exc) return ToolRunResult( @@ -126,7 +135,25 @@ def run_tool_result( message=str(exc), returncode=result.returncode, ) - if not isinstance(parsed, list): + meta: dict | None = None + parsed_entries = parsed + if isinstance(parsed, tuple): + if ( + len(parsed) != 2 + or not isinstance(parsed[0], list) + or not isinstance(parsed[1], dict) + ): + return ToolRunResult( + entries=[], + status="error", + error_kind="parser_shape_error", + message="parser returned invalid (entries, meta) tuple", + returncode=result.returncode, + ) + parsed_entries = parsed[0] + meta = dict(parsed[1]) + + if not isinstance(parsed_entries, list): return ToolRunResult( entries=[], status="error", @@ -134,9 +161,9 @@ def run_tool_result( message="parser returned non-list output", returncode=result.returncode, ) - if not parsed: + if not parsed_entries: if result.returncode not in (0, None): - preview = _output_preview(output) + preview = _output_preview(combined) return ToolRunResult( entries=[], status="error", @@ -150,11 +177,13 @@ def run_tool_result( return ToolRunResult( entries=[], status="empty", + meta=meta, returncode=result.returncode, ) return ToolRunResult( - entries=parsed, + entries=parsed_entries, status="ok", + meta=meta, returncode=result.returncode, ) diff --git a/desloppify/languages/_framework/generic_support/core.py 
b/desloppify/languages/_framework/generic_support/core.py index c5a279833..1cb66154d 100644 --- a/desloppify/languages/_framework/generic_support/core.py +++ b/desloppify/languages/_framework/generic_support/core.py @@ -59,6 +59,7 @@ def generic_lang( entry_patterns: list[str] | None = None, external_test_dirs: list[str] | None = None, test_file_extensions: list[str] | None = None, + frameworks: bool = False, ) -> LangConfig: """Build and register a generic language plugin from tool specs. @@ -131,6 +132,20 @@ def generic_lang( zone_rules=opts.zone_rules if opts.zone_rules is not None else generic_zone_rules(extensions), ) + if frameworks: + from desloppify.languages._framework.frameworks.phases import framework_phases + + phases = list(cfg.phases) + fw_phases = framework_phases(name) + + insert_at = len(phases) + for idx, phase in enumerate(phases): + if getattr(phase, "label", "") == "Structural analysis": + insert_at = idx + 1 + break + phases[insert_at:insert_at] = fw_phases + cfg.phases = phases + # Set integration depth — upgrade when tree-sitter provides capabilities. if has_treesitter and opts.depth in ("shallow", "minimal"): cfg.integration_depth = "standard" diff --git a/desloppify/languages/_framework/node/__init__.py b/desloppify/languages/_framework/node/__init__.py new file mode 100644 index 000000000..c86041049 --- /dev/null +++ b/desloppify/languages/_framework/node/__init__.py @@ -0,0 +1,3 @@ +"""Node/JavaScript ecosystem shared helpers (package.json, framework tooling).""" + +from __future__ import annotations diff --git a/desloppify/languages/_framework/node/frameworks/__init__.py b/desloppify/languages/_framework/node/frameworks/__init__.py new file mode 100644 index 000000000..51183dc09 --- /dev/null +++ b/desloppify/languages/_framework/node/frameworks/__init__.py @@ -0,0 +1,13 @@ +"""Node ecosystem framework scanners (Next.js, etc). + +Framework presence detection now lives under +``desloppify.languages._framework.frameworks``. 
+ +This package remains the shared home for framework scanners and helper code so +JS/TS language plugins can reuse the same framework checks without duplicating +logic or importing across plugins. +""" + +from __future__ import annotations + +__all__: list[str] = [] diff --git a/desloppify/languages/_framework/node/frameworks/nextjs/README.md b/desloppify/languages/_framework/node/frameworks/nextjs/README.md new file mode 100644 index 000000000..8a6857fd9 --- /dev/null +++ b/desloppify/languages/_framework/node/frameworks/nextjs/README.md @@ -0,0 +1,158 @@ +# Next.js Framework Support (Scanners + Spec) + +This document explains the Next.js framework support used by Desloppify's TypeScript and JavaScript plugins. + +It covers: + +- What the Next.js framework module does +- How framework detection and scanning flow works +- What each file in `desloppify/languages/_framework/node/frameworks/nextjs/` is responsible for +- Which shared files outside this folder affect behavior +- Current limits and safe extension points + +If you are new to this code, start with the "Spec + scan flow" section, then read `scanners.py`. + +## High-level purpose + +The Next.js framework module adds framework-aware smells that generic code-quality detectors do not catch. + +Current scope includes: + +- App Router vs Pages Router migration and misuse signals +- Client/server boundary misuse (`"use client"`, `"use server"`, server-only imports/exports) +- Route handler and middleware context misuse +- Next.js API misuse in wrong router contexts +- Environment variable leakage in client modules +- `next lint` integration as a framework quality gate (`next_lint` detector) + +This module is intentionally heuristic-heavy (regex/file-structure based) so scans remain fast and robust without requiring full compiler semantics. 
+ +## Module map + +Files in this folder: + +- `desloppify/languages/_framework/node/frameworks/nextjs/__init__.py` +- `desloppify/languages/_framework/node/frameworks/nextjs/info.py` +- `desloppify/languages/_framework/node/frameworks/nextjs/scanners.py` + +Spec + orchestration lives outside this folder: + +- `desloppify/languages/_framework/frameworks/specs/nextjs.py` +- `desloppify/languages/_framework/frameworks/phases.py` + +### What each file does + +`__init__.py`: + +- Exposes the framework info contract and shared phase entrypoint for imports + +`info.py`: + +- Defines `NextjsFrameworkInfo` +- Converts ecosystem detection evidence into Next.js-specific router roots and flags + +`scanners.py`: + +- Implements all Next.js smell scanners +- Performs fast source-file discovery and content heuristics +- Returns normalized scanner entries for the framework spec adapter + +## Shared surfaces outside this folder + +These files are part of the same feature boundary and should be considered together: + +- `desloppify/languages/_framework/frameworks/detection.py` +- `desloppify/languages/_framework/frameworks/phases.py` +- `desloppify/languages/_framework/frameworks/specs/nextjs.py` +- `desloppify/languages/typescript/__init__.py` +- `desloppify/languages/javascript/__init__.py` +- `desloppify/languages/_framework/generic_parts/parsers.py` (parser: `parse_next_lint`) +- `desloppify/languages/_framework/generic_parts/tool_factories.py` (tool phase: `make_tool_phase`) +- `desloppify/base/discovery/source.py` + +### Responsibility split + +- `frameworks/detection.py` decides whether Next.js is present for a scan path and where package roots are. +- `frameworks/specs/nextjs.py` defines the Next.js FrameworkSpec (detection config + scanners + tool integrations). +- `frameworks/phases.py` adapts specs into `DetectorPhase` objects. +- `nextjs/info.py` derives routing context (`app_roots`, `pages_roots`) from detection evidence. 
+- `nextjs/scanners.py` only finds smell candidates (fast, heuristic). + +## Detectors + +This module emits findings under: + +- `nextjs` +- `next_lint` + +Registry/scoring wiring lives outside this folder in: + +- `desloppify/base/registry/catalog_entries.py` +- `desloppify/base/registry/catalog_models.py` +- `desloppify/engine/_scoring/policy/core.py` + +## Spec + scan flow in plain language + +When TypeScript or JavaScript scans run for a Next.js project, the flow is: + +1. Ecosystem framework detection (Node) evaluates deterministic presence signals from `package.json`. +2. Next.js info derives App/Pages router roots from detection evidence (`marker_dir_hits`). +3. The Next.js framework smells phase runs all scanner functions and maps entries into normalized `nextjs` issues. +4. The `next lint` tool phase runs (slow) and maps ESLint JSON output into `next_lint` issues. +5. Potentials are returned for scoring and state merge. + +## `next lint` behavior + +The Next.js spec runs: + +- `npx --no-install next lint --format json` + +Behavior: + +- If lint runs and returns JSON, file-level lint findings are emitted (one issue per file). +- If lint cannot run or output cannot be parsed, coverage is degraded for `next_lint` (shown as a scan coverage warning). + +`next lint` runs as a slow phase (`DetectorPhase.slow=True`) so `--skip-slow` skips it automatically. + +## Smell families covered + +Current high-value families include: + +- `"use client"` placement and missing directive checks +- `"use server"` placement checks (module-level misuse only) +- Server-only imports in client modules (`next/headers`, `next/server`, `next/cache`, `server-only`, Node built-ins) +- Server-only Next exports from client modules (`metadata`, `generateMetadata`, `revalidate`, `dynamic`, etc.) +- Pages Router APIs used under App Router (`getServerSideProps`, `getStaticProps`, etc.)
+- `next/navigation` usage in Pages Router files +- App Router metadata/config exports in Pages Router files +- Pages API route files exporting App Router route-handler HTTP functions +- App Router route handler and middleware misuse +- `next/head` usage in App Router +- `next/document` imports outside valid `_document.*` pages context +- Browser global usage in App Router modules missing `"use client"` +- Client layout smell and async client component smell +- Mixed `app/` and `pages/` router project smell +- Env leakage in client modules via non-`NEXT_PUBLIC_*` `process.env` usage + +## Extending this module safely + +When adding a new smell: + +1. Add scanner logic in `scanners.py`. +2. Return compact entries (`file`, `line`, and minimal structured detail). +3. Map entries to `make_issue(...)` in `frameworks/phases.py` with clear `id`, `summary`, and `detail`. +4. Update/extend tests in: + - `desloppify/languages/typescript/tests/test_ts_nextjs_framework.py` + - `desloppify/languages/javascript/tests/test_js_nextjs_framework.py` (if JS parity applies) +5. Keep logic shared (do not duplicate TS vs JS framework smell rules). + +## Limits and tradeoffs + +- Scanners are heuristic, not compiler-accurate. +- Some patterns are intentionally conservative to avoid noisy false positives. +- Router/middleware checks rely on conventional Next.js file placement. +- `next lint` requires project dependencies to be present for full lint execution. + +These tradeoffs are deliberate: fast scans with high-signal framework smells, while preserving a clear extension path when stronger analysis is needed.
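As a companion to the extension steps above, here is a hypothetical skeleton of the scanner contract: each scanner in `scanners.py` returns an `(entries, scanned)` tuple of compact entry dicts plus a scanned-file count. The dict-of-files input and the `next/head` substring check are illustrative stand-ins for the real file discovery and smell logic.

```python
def scan_example_smell(files: dict[str, str]) -> tuple[list[dict], int]:
    """Illustrative scanner: flag files that mention next/head.

    Returns (compact entries, files scanned) like the real scanners;
    entries carry only `file`, `line`, and minimal structured detail.
    """
    entries: list[dict] = []
    scanned = 0
    for filepath, content in files.items():
        scanned += 1
        for line_no, line in enumerate(content.splitlines(), start=1):
            if "next/head" in line:  # stand-in heuristic, not the real check
                entries.append({"file": filepath, "line": line_no})
                break  # one entry per file keeps findings compact
    return entries, scanned
```

Keeping entries compact matters because the spec adapter, not the scanner, is responsible for turning them into scored issues.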
diff --git a/desloppify/languages/_framework/node/frameworks/nextjs/__init__.py b/desloppify/languages/_framework/node/frameworks/nextjs/__init__.py new file mode 100644 index 000000000..8dc1fc8f9 --- /dev/null +++ b/desloppify/languages/_framework/node/frameworks/nextjs/__init__.py @@ -0,0 +1,58 @@ +"""Next.js framework support shared across JS/TS scans.""" + +from __future__ import annotations + +from .info import NextjsFrameworkInfo, nextjs_info_from_evidence +from .scanners import ( + scan_nextjs_app_router_exports_in_pages_router, + scan_nextjs_async_client_components, + scan_nextjs_browser_globals_missing_use_client, + scan_nextjs_client_layouts, + scan_nextjs_error_files_missing_use_client, + scan_mixed_router_layout, + scan_next_router_imports_in_app_router, + scan_nextjs_env_leaks_in_client, + scan_nextjs_navigation_hooks_missing_use_client, + scan_nextjs_next_document_misuse, + scan_nextjs_next_head_in_app_router, + scan_nextjs_pages_api_route_handlers, + scan_nextjs_pages_router_apis_in_app_router, + scan_nextjs_pages_router_artifacts_in_app_router, + scan_nextjs_route_handlers_and_middleware_misuse, + scan_nextjs_server_navigation_apis_in_client, + scan_nextjs_server_modules_in_pages_router, + scan_nextjs_server_exports_in_client, + scan_nextjs_server_imports_in_client, + scan_nextjs_use_client_not_first, + scan_nextjs_use_server_not_first, + scan_nextjs_use_server_in_client, + scan_rsc_missing_use_client, +) + +__all__ = [ + "NextjsFrameworkInfo", + "nextjs_info_from_evidence", + "scan_nextjs_app_router_exports_in_pages_router", + "scan_nextjs_async_client_components", + "scan_nextjs_browser_globals_missing_use_client", + "scan_nextjs_client_layouts", + "scan_nextjs_error_files_missing_use_client", + "scan_mixed_router_layout", + "scan_next_router_imports_in_app_router", + "scan_nextjs_env_leaks_in_client", + "scan_nextjs_navigation_hooks_missing_use_client", + "scan_nextjs_next_document_misuse", + "scan_nextjs_next_head_in_app_router", + 
"scan_nextjs_pages_api_route_handlers", + "scan_nextjs_pages_router_apis_in_app_router", + "scan_nextjs_pages_router_artifacts_in_app_router", + "scan_nextjs_route_handlers_and_middleware_misuse", + "scan_nextjs_server_navigation_apis_in_client", + "scan_nextjs_server_modules_in_pages_router", + "scan_nextjs_server_exports_in_client", + "scan_nextjs_server_imports_in_client", + "scan_nextjs_use_client_not_first", + "scan_nextjs_use_server_not_first", + "scan_nextjs_use_server_in_client", + "scan_rsc_missing_use_client", +] diff --git a/desloppify/languages/_framework/node/frameworks/nextjs/info.py b/desloppify/languages/_framework/node/frameworks/nextjs/info.py new file mode 100644 index 000000000..1a02b51ea --- /dev/null +++ b/desloppify/languages/_framework/node/frameworks/nextjs/info.py @@ -0,0 +1,49 @@ +"""Next.js framework info derived from ecosystem-level detection evidence.""" + +from __future__ import annotations + +from dataclasses import dataclass +from pathlib import Path +from typing import Any + + +@dataclass(frozen=True) +class NextjsFrameworkInfo: + package_root: Path + package_json_relpath: str | None + app_roots: tuple[str, ...] + pages_roots: tuple[str, ...] 
+ + @property + def uses_app_router(self) -> bool: + return bool(self.app_roots) + + @property + def uses_pages_router(self) -> bool: + return bool(self.pages_roots) + + +def _tuple_str(value: Any) -> tuple[str, ...]: + if not isinstance(value, list): + return () + return tuple(str(v) for v in value if isinstance(v, str)) + + +def nextjs_info_from_evidence( + evidence: dict[str, Any] | None, + *, + package_root: Path, + package_json_relpath: str | None, +) -> NextjsFrameworkInfo: + """Convert generic framework evidence into Next.js routing context.""" + evidence_dict = evidence if isinstance(evidence, dict) else {} + marker_dirs = _tuple_str(evidence_dict.get("marker_dir_hits")) + return NextjsFrameworkInfo( + package_root=package_root, + package_json_relpath=package_json_relpath, + app_roots=tuple(p for p in marker_dirs if p.endswith("/app") or p == "app"), + pages_roots=tuple(p for p in marker_dirs if p.endswith("/pages") or p == "pages"), + ) + + +__all__ = ["NextjsFrameworkInfo", "nextjs_info_from_evidence"] diff --git a/desloppify/languages/_framework/node/frameworks/nextjs/scanners.py b/desloppify/languages/_framework/node/frameworks/nextjs/scanners.py new file mode 100644 index 000000000..6ee3335be --- /dev/null +++ b/desloppify/languages/_framework/node/frameworks/nextjs/scanners.py @@ -0,0 +1,1209 @@ +"""Next.js-specific scanners. + +These scanners are intentionally lightweight (regex/heuristic-based) so they +can run as part of the normal smell phase without requiring a full TS AST. 
+""" + +from __future__ import annotations + +import logging +import re +from pathlib import Path + +from desloppify.base.discovery.paths import get_project_root +from desloppify.base.discovery.source import find_js_ts_and_tsx_files +from desloppify.languages._framework.node.js_text import ( + code_text as _code_text, + strip_js_ts_comments as _strip_ts_comments, +) + +from .info import NextjsFrameworkInfo + +logger = logging.getLogger(__name__) + +_USE_CLIENT_RE = re.compile( + r"""^(?:'use client'|"use client")\s*;?\s*(?://.*)?$""" +) +_MODULE_SPECIFIER_RE = re.compile( + r"""(?:from\s+['"](?P[^'"]+)['"]|require\(\s*['"](?P[^'"]+)['"]\s*\)|import\(\s*['"](?P[^'"]+)['"]\s*\))""" +) +_NEXT_ROUTER_IMPORT_RE = re.compile( + r"""(?:from\s+['"]next/router['"]|require\(\s*['"]next/router['"]\s*\))""" +) +_NEXT_NAV_IMPORT_RE = re.compile( + r"""(?:from\s+['"]next/navigation['"]|require\(\s*['"]next/navigation['"]\s*\)|import\(\s*['"]next/navigation['"]\s*\))""" +) +_NEXT_NAV_HOOK_CALL_RE = re.compile( + r"""\b(?:useRouter|usePathname|useSearchParams|useParams|useSelectedLayoutSegments|useSelectedLayoutSegment)\s*\(""" +) +_CLIENT_HOOK_CALL_RE = re.compile( + r"""\b(?:useState|useEffect|useLayoutEffect|useReducer|useRef|useContext|useTransition|useDeferredValue|useImperativeHandle|useSyncExternalStore|useMemo|useCallback|useId|useInsertionEffect)\s*\(""" +) +_REACT_NAMESPACE_HOOK_CALL_RE = re.compile( + r"""\bReact\.(?:useState|useEffect|useLayoutEffect|useReducer|useRef|useContext|useTransition|useDeferredValue|useImperativeHandle|useSyncExternalStore|useMemo|useCallback|useId|useInsertionEffect)\s*\(""" +) + +_NEXTJS_SERVER_ONLY_IMPORTS: set[str] = { + "next/headers", + "next/server", + "next/cache", + "server-only", +} + +# Heuristic list (not exhaustive). These are commonly invalid in client bundles. 
+_NODE_BUILTIN_MODULES: set[str] = { + "assert", + "buffer", + "child_process", + "cluster", + "crypto", + "dgram", + "dns", + "events", + "fs", + "http", + "https", + "module", + "net", + "os", + "path", + "perf_hooks", + "process", + "stream", + "timers", + "tls", + "tty", + "url", + "util", + "vm", + "worker_threads", + "zlib", +} + +_NEXTJS_SERVER_EXPORT_RE = re.compile( + r"""\bexport\s+(?:(?:const|let|var)\s+(?P<name>metadata|revalidate|dynamic|runtime|fetchCache|preferredRegion|maxDuration|dynamicParams|metadataBase|viewport|experimental_ppr)\b|(?:async\s+)?function\s+(?P<fn>generateMetadata|generateStaticParams|generateViewport)\b)""" +) + +_NEXTJS_PAGES_ROUTER_API_RE = re.compile( + r"""\b(?:export\s+(?:async\s+)?function\s+(?P<fn>getServerSideProps|getStaticProps|getStaticPaths|getInitialProps)\b|export\s+const\s+(?P<const>getServerSideProps|getStaticProps|getStaticPaths|getInitialProps)\b|\b(?P<assign>getServerSideProps|getStaticProps|getStaticPaths|getInitialProps)\s*=)""" +) + +_PROCESS_ENV_DOT_RE = re.compile(r"""\bprocess\.env\.([A-Z0-9_]+)\b""") +_PROCESS_ENV_BRACKET_RE = re.compile(r"""\bprocess\.env\[\s*['"]([A-Z0-9_]+)['"]\s*\]""") +_CLIENT_ENV_ALLOWLIST: set[str] = {"NODE_ENV"} + +_USE_SERVER_LINE_RE = re.compile(r"""^\s*(?:'use server'|"use server")\s*;?\s*(?://.*)?$""") +_DIRECTIVE_LINE_RE = re.compile(r"""^\s*(?:'[^']*'|"[^"]*")\s*;?\s*(?://.*)?$""") +_ASYNC_EXPORT_DEFAULT_RE = re.compile(r"""\bexport\s+default\s+async\s+(?:function\b|\()""") +_BROWSER_GLOBAL_ACCESS_RE = re.compile( + r"""\b(?P<global>window|document|localStorage|sessionStorage|navigator)\s*(?:\.|\[)""" +) +_INVALID_REACTY_MODULES_IN_ROUTE_CONTEXT: set[str] = { + "next/link", + "next/image", + "next/head", + "next/script", +} + +# NOTE: `redirect()` and `permanentRedirect()` can be called from Client +# Components during the render phase (not event handlers). We intentionally do +# not flag those patterns here.
+_NEXT_NAV_SERVER_API_CALL_RE = re.compile(r"""\b(?P<api>notFound)\s*\(""") +_ROUTE_HANDLER_HTTP_EXPORT_RE = re.compile( + r"""\bexport\s+(?:async\s+)?function\s+(GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS)\b""" +) +_EXPORT_DEFAULT_RE = re.compile(r"""\bexport\s+default\b""") +_NEXTAPI_TYPES_RE = re.compile(r"""\bNextApi(?:Request|Response)\b""") +_RES_STATUS_RE = re.compile(r"""\bres\.status\s*\(""") +_RUNTIME_EDGE_RE = re.compile(r"""\bexport\s+const\s+runtime\s*=\s*['"]edge['"]""") + + +def _has_use_server_directive_at_top(content: str) -> bool: + first = _first_meaningful_line(content.splitlines()) + return bool(first and _USE_SERVER_LINE_RE.match(first)) + + +def _find_use_server_directive_line_anywhere(content: str) -> int | None: + for idx, line in enumerate(content.splitlines()[:200], start=1): + if _USE_SERVER_LINE_RE.match(line.strip()): + return idx + return None + + +def _first_meaningful_line(lines: list[str]) -> str | None: + """Return the first non-empty, non-comment-only line.""" + in_block_comment = False + for line in lines[:80]: + s = line.strip() + if not s: + continue + if in_block_comment: + end = s.find("*/") + if end == -1: + continue + s = s[end + 2 :].strip() + in_block_comment = False + if not s: + continue + if s.startswith("//"): + continue + if s.startswith("/*"): + end = s.find("*/", 2) + if end == -1: + in_block_comment = True + continue + s = s[end + 2 :].strip() + if not s: + continue + return s + return None + + +def _has_use_client_directive(content: str) -> bool: + first = _first_meaningful_line(content.splitlines()) + return bool(first and _USE_CLIENT_RE.match(first)) + + +def _find_use_client_directive_anywhere(content: str) -> int | None: + for idx, line in enumerate(content.splitlines()[:120], start=1): + if _USE_CLIENT_RE.match(line.strip()): + return idx + return None + + +def _is_under_any_root(filepath: str, roots: tuple[str, ...]) -> bool: + return any(filepath == root or filepath.startswith(root.rstrip("/") + "/") for root in
roots) + + +def _iter_import_specifiers(search_text: str) -> list[dict]: + matches: list[dict] = [] + for match in _MODULE_SPECIFIER_RE.finditer(search_text): + module = match.group("from") or match.group("require") or match.group("import") or "" + if not module: + continue + line_no = search_text[: match.start()].count("\n") + 1 + matches.append({"module": module, "line": line_no}) + return matches + + +def _is_node_builtin(module: str) -> bool: + raw = module[5:] if module.startswith("node:") else module + base = raw.split("/", 1)[0] + return base in _NODE_BUILTIN_MODULES + + +def _find_misplaced_module_use_server_directive(content: str) -> int | None: + """Find module-level 'use server' directives that are not first. + + Intentionally ignores nested inline server actions where `'use server'` is + inside a function body (valid Next.js pattern). + """ + search_text = _strip_ts_comments(content) + first_directive: str | None = None + in_prologue = True + + for idx, line in enumerate(search_text.splitlines()[:300], start=1): + if not line.strip(): + continue + + stripped = line.strip() + is_directive = bool(_DIRECTIVE_LINE_RE.match(stripped)) + is_use_server = bool(_USE_SERVER_LINE_RE.match(stripped)) + + if in_prologue: + if is_directive: + if first_directive is None: + first_directive = stripped + if is_use_server and first_directive != stripped: + return idx + continue + in_prologue = False + + # Top-level misplaced directive after code starts. 
+ if line == line.lstrip() and is_use_server: + return idx + + return None + + +def _is_layout_module(filepath: str) -> bool: + name = Path(filepath).name + return name in {"layout.tsx", "layout.ts", "layout.jsx", "layout.js"} + + +def _is_pages_document_module(filepath: str) -> bool: + name = Path(filepath).name + return name.startswith("_document.") + + +def scan_nextjs_error_files_missing_use_client( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find App Router error boundary modules missing a 'use client' directive.""" + if not info.uses_app_router: + return [], 0 + + targets = { + "error.tsx", + "error.ts", + "error.jsx", + "error.js", + "global-error.tsx", + "global-error.ts", + "global-error.jsx", + "global-error.js", + } + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + if Path(filepath).name not in targets: + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if _has_use_client_directive(content): + continue + if _find_use_client_directive_anywhere(content) is not None: + continue + + entries.append({"file": filepath, "line": 1, "name": Path(filepath).name}) + + return entries, scanned + + +def scan_nextjs_pages_router_artifacts_in_app_router( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find Pages Router artifact filenames (e.g. 
_app.tsx) under app/ trees.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + + scanned += 1 + name = Path(filepath).name + if not (name.startswith("_app.") or name.startswith("_document.") or name.startswith("_error.")): + continue + + entries.append({"file": filepath, "line": 1, "name": name}) + + return entries, scanned + + +def scan_nextjs_use_server_not_first( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find modules where 'use server' exists but is not the first meaningful line.""" + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if _has_use_server_directive_at_top(content): + continue + + line_no = _find_misplaced_module_use_server_directive(content) + if line_no is None: + continue + + entries.append({"file": filepath, "line": line_no}) + + return entries, scanned + + +def scan_nextjs_next_head_in_app_router( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find App Router modules importing legacy next/head.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + search_text = 
_strip_ts_comments(content) + for imp in _iter_import_specifiers(search_text): + if imp["module"] == "next/head": + entries.append({"file": filepath, "line": imp["line"]}) + break + + return entries, scanned + + +def scan_nextjs_use_client_not_first( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find App Router modules where 'use client' exists but is not the first meaningful line.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if _has_use_client_directive(content): + continue + + line_no = _find_use_client_directive_anywhere(content) + if line_no is None: + continue + + entries.append({"file": filepath, "line": line_no}) + + return entries, scanned + + +def scan_nextjs_next_document_misuse( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find next/document imports outside Pages Router `_document.*`.""" + entries: list[dict] = [] + scanned = 0 + + for filepath in find_js_ts_and_tsx_files(path): + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + search_text = _strip_ts_comments(content) + bad_lines: list[int] = [] + for imp in _iter_import_specifiers(search_text): + if imp["module"] != "next/document": + continue + + allowed = info.uses_pages_router and _is_under_any_root(filepath, info.pages_roots) and _is_pages_document_module(filepath) + if not 
allowed: + bad_lines.append(imp["line"]) + + if bad_lines: + entries.append({"file": filepath, "line": min(bad_lines)}) + + return entries, scanned + + +def scan_nextjs_server_navigation_apis_in_client( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find 'use client' modules calling server-only next/navigation APIs (notFound).""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if not _has_use_client_directive(content): + continue + + search_text = _strip_ts_comments(content) + if not _NEXT_NAV_IMPORT_RE.search(search_text): + continue + + code = _code_text(search_text) + match = _NEXT_NAV_SERVER_API_CALL_RE.search(code) + if not match: + continue + + line_no = code[: match.start()].count("\n") + 1 + entries.append({"file": filepath, "line": line_no, "api": match.group("api")}) + + return entries, scanned + + +def scan_nextjs_browser_globals_missing_use_client( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find App Router modules using browser globals without a 'use client' directive.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + if filepath.endswith("/route.ts") or filepath.endswith("/route.tsx"): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue 
+ + if _has_use_client_directive(content): + continue + + code = _code_text(_strip_ts_comments(content)) + match = _BROWSER_GLOBAL_ACCESS_RE.search(code) + if not match: + continue + + line_no = code[: match.start()].count("\n") + 1 + entries.append({"file": filepath, "line": line_no, "global": match.group("global")}) + + return entries, scanned + + +def scan_nextjs_client_layouts( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find App Router layout modules that are marked as client components.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + if not _is_layout_module(filepath): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if _has_use_client_directive(content): + entries.append({"file": filepath, "line": 1}) + + return entries, scanned + + +def scan_nextjs_async_client_components( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find 'use client' modules exporting async default components (invalid in Next.js).""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if not _has_use_client_directive(content): + continue + + code = _code_text(_strip_ts_comments(content)) + match = _ASYNC_EXPORT_DEFAULT_RE.search(code) + if not match: + continue + + 
line_no = code[: match.start()].count("\n") + 1 + entries.append({"file": filepath, "line": line_no}) + + return entries, scanned + + +def scan_nextjs_use_server_in_client( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find 'use client' modules that include a 'use server' directive.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if not _has_use_client_directive(content): + continue + + # Only module-level 'use server' directives are invalid in 'use client' modules. + # Inline server actions (e.g. inside a function body) are valid and should not be flagged. + line_no = _find_misplaced_module_use_server_directive(content) + if line_no is None: + continue + + entries.append({"file": filepath, "line": line_no}) + + return entries, scanned + + +def scan_nextjs_server_modules_in_pages_router( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find Pages Router modules importing App Router server-only Next.js modules.""" + if not info.uses_pages_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + + def _is_pages_api_route(fp: str) -> bool: + for root in info.pages_roots: + prefix = root.rstrip("/") + "/api/" + if fp.startswith(prefix) or fp == (root.rstrip("/") + "/api"): + return True + return False + + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.pages_roots): + continue + if _is_pages_api_route(filepath): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, 
UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + search_text = _strip_ts_comments(content) + imports = _iter_import_specifiers(search_text) + bad = [imp for imp in imports if imp["module"] in _NEXTJS_SERVER_ONLY_IMPORTS] + if not bad: + continue + + entries.append( + { + "file": filepath, + "line": bad[0]["line"], + "imports": bad, + "modules": sorted({b["module"] for b in bad}), + } + ) + + return entries, scanned + + +def scan_nextjs_pages_api_route_handlers( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find Pages Router API routes using App Router route handler patterns (export GET/POST/etc).""" + if not info.uses_pages_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + + def _is_pages_api_route(fp: str) -> bool: + for root in info.pages_roots: + prefix = root.rstrip("/") + "/api/" + if fp.startswith(prefix) or fp == (root.rstrip("/") + "/api"): + return True + return False + + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.pages_roots): + continue + if not _is_pages_api_route(filepath): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + search_text = _strip_ts_comments(content) + code = _code_text(search_text) + match = _ROUTE_HANDLER_HTTP_EXPORT_RE.search(code) + if not match: + continue + + line_no = code[: match.start()].count("\n") + 1 + entries.append({"file": filepath, "line": line_no, "method": match.group(1)}) + + return entries, scanned + + +def scan_nextjs_app_router_exports_in_pages_router( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find Pages Router modules exporting App Router metadata/config exports.""" + if not 
info.uses_pages_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.pages_roots): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + search_text = _strip_ts_comments(content) + matches: list[dict] = [] + for m in _NEXTJS_SERVER_EXPORT_RE.finditer(search_text): + name = m.group("const_name") or m.group("fn_name") or "" + if not name: + continue + line_no = search_text[: m.start()].count("\n") + 1 + matches.append({"name": name, "line": line_no}) + if not matches: + continue + + entries.append( + { + "file": filepath, + "line": matches[0]["line"], + "exports": matches, + "names": sorted({mm["name"] for mm in matches}), + } + ) + + return entries, scanned + + +def scan_rsc_missing_use_client(path: Path, info: NextjsFrameworkInfo) -> tuple[list[dict], int]: + """Find App Router modules that appear to use client-only React hooks without 'use client'.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if _has_use_client_directive(content): + continue + if _find_use_client_directive_anywhere(content) is not None: + continue + + # Strip block comments first (as the other scanners do) so hooks named + # inside /* */ comments are not flagged; _code_text only blanks strings + # and // comments. + code = _code_text(_strip_ts_comments(content)) + match = _CLIENT_HOOK_CALL_RE.search(code) or _REACT_NAMESPACE_HOOK_CALL_RE.search(code) + if not match: + continue + + line_no = code[:
match.start()].count("\n") + 1 + hook = match.group(0).split("(")[0].strip() + entries.append( + { + "file": filepath, + "line": line_no, + "hook": hook, + } + ) + + return entries, scanned + + +def scan_nextjs_navigation_hooks_missing_use_client( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find App Router modules using next/navigation hooks without 'use client'.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if _has_use_client_directive(content): + continue + if _find_use_client_directive_anywhere(content) is not None: + continue + + code = _code_text(_strip_ts_comments(content)) + match = _NEXT_NAV_HOOK_CALL_RE.search(code) + if not match: + continue + + line_no = code[: match.start()].count("\n") + 1 + hook = match.group(0).split("(")[0].strip() + entries.append({"file": filepath, "line": line_no, "hook": hook}) + + return entries, scanned + + +def scan_nextjs_server_imports_in_client( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find 'use client' modules importing server-only APIs (Next server modules, node built-ins).""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if not 
_has_use_client_directive(content): + continue + + search_text = _strip_ts_comments(content) + imports = _iter_import_specifiers(search_text) + bad: list[dict] = [] + for imp in imports: + module = imp["module"] + if module in _NEXTJS_SERVER_ONLY_IMPORTS or _is_node_builtin(module): + bad.append(imp) + + if not bad: + continue + + entries.append( + { + "file": filepath, + "line": bad[0]["line"], + "imports": bad, + "modules": sorted({b["module"] for b in bad}), + } + ) + + return entries, scanned + + +def scan_next_router_imports_in_app_router( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find App Router files importing legacy `next/router`.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + search_text = _strip_ts_comments(content) + match = _NEXT_ROUTER_IMPORT_RE.search(search_text) + if not match: + continue + + line_no = search_text[: match.start()].count("\n") + 1 + entries.append({"file": filepath, "line": line_no}) + + return entries, scanned + + +def scan_nextjs_server_exports_in_client( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find 'use client' modules exporting server-only Next.js metadata/config exports.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping 
unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if not _has_use_client_directive(content): + continue + + search_text = _strip_ts_comments(content) + matches: list[dict] = [] + for m in _NEXTJS_SERVER_EXPORT_RE.finditer(search_text): + name = m.group("const_name") or m.group("fn_name") or "" + if not name: + continue + line_no = search_text[: m.start()].count("\n") + 1 + matches.append({"name": name, "line": line_no}) + + if not matches: + continue + + entries.append( + { + "file": filepath, + "line": matches[0]["line"], + "exports": matches, + "names": sorted({mm["name"] for mm in matches}), + } + ) + + return entries, scanned + + +def scan_nextjs_pages_router_apis_in_app_router( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find Pages Router data fetching APIs used under the App Router tree.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + if not _is_under_any_root(filepath, info.app_roots): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + code = _code_text(_strip_ts_comments(content)) + matches: list[dict] = [] + for m in _NEXTJS_PAGES_ROUTER_API_RE.finditer(code): + name = m.group("fn") or m.group("const") or m.group("assign") or "" + if not name: + continue + line_no = code[: m.start()].count("\n") + 1 + matches.append({"name": name, "line": line_no}) + + if not matches: + continue + + entries.append( + { + "file": filepath, + "line": matches[0]["line"], + "apis": matches, + "names": sorted({mm["name"] for mm in matches}), + } + ) + + return entries, scanned + + +def scan_nextjs_env_leaks_in_client( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Find 
'use client' modules that reference non-NEXT_PUBLIC_* env vars via process.env.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + for filepath in find_js_ts_and_tsx_files(path): + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + if not _has_use_client_directive(content): + continue + + code = _code_text(_strip_ts_comments(content)) + occurrences: list[tuple[str, int]] = [] + for m in _PROCESS_ENV_DOT_RE.finditer(code): + name = m.group(1) + line_no = code[: m.start()].count("\n") + 1 + occurrences.append((name, line_no)) + for m in _PROCESS_ENV_BRACKET_RE.finditer(code): + name = m.group(1) + line_no = code[: m.start()].count("\n") + 1 + occurrences.append((name, line_no)) + + bad_occurrences = [ + (name, line_no) + for (name, line_no) in occurrences + if not name.startswith("NEXT_PUBLIC_") and name not in _CLIENT_ENV_ALLOWLIST + ] + if not bad_occurrences: + continue + + bad_vars = sorted({name for (name, _) in bad_occurrences}) + first_line = min(line_no for (_, line_no) in bad_occurrences) + entries.append({"file": filepath, "line": first_line, "vars": bad_vars}) + + return entries, scanned + + +def scan_nextjs_route_handlers_and_middleware_misuse( + path: Path, info: NextjsFrameworkInfo +) -> tuple[list[dict], int]: + """Special-case checks for `app/**/route.ts(x)` and `middleware.ts`.""" + if not info.uses_app_router: + return [], 0 + + entries: list[dict] = [] + scanned = 0 + + def _is_route_handler(fp: str) -> bool: + if not _is_under_any_root(fp, info.app_roots): + return False + return fp.endswith(("/route.ts", "/route.tsx", "/route.js", "/route.jsx")) or fp in { + "route.ts", + "route.tsx", + "route.js", + "route.jsx", + } + + def _is_middleware(fp: str) -> bool: + return fp in { + 
"middleware.ts", + "middleware.tsx", + "middleware.js", + "middleware.jsx", + "src/middleware.ts", + "src/middleware.tsx", + "src/middleware.js", + "src/middleware.jsx", + } + + for filepath in find_js_ts_and_tsx_files(path): + if not (_is_route_handler(filepath) or _is_middleware(filepath)): + continue + + scanned += 1 + try: + full = Path(filepath) if Path(filepath).is_absolute() else get_project_root() / filepath + content = full.read_text() + except (OSError, UnicodeDecodeError) as exc: + logger.debug("Skipping unreadable Next.js candidate %s: %s", filepath, exc) + continue + + search_text = _strip_ts_comments(content) + code_text = _code_text(search_text) + findings: list[dict] = [] + + if _has_use_client_directive(content): + findings.append({"kind": "use_client", "line": 1}) + + react_import = re.search( + r"""(?:from\s+['"]react['"]|require\(\s*['"]react['"]\s*\))""", + search_text, + ) + if react_import: + findings.append( + { + "kind": "react_import", + "line": search_text[: react_import.start()].count("\n") + 1, + } + ) + + hook_call = _CLIENT_HOOK_CALL_RE.search(code_text) or _REACT_NAMESPACE_HOOK_CALL_RE.search(code_text) + if hook_call: + findings.append( + { + "kind": "react_hook_call", + "line": search_text[: hook_call.start()].count("\n") + 1, + } + ) + + nav_import = _NEXT_NAV_IMPORT_RE.search(search_text) + if nav_import: + findings.append( + { + "kind": "next_navigation_import", + "line": search_text[: nav_import.start()].count("\n") + 1, + } + ) + + if _is_route_handler(filepath): + default_export = _EXPORT_DEFAULT_RE.search(code_text) + if default_export: + findings.append( + { + "kind": "default_export", + "line": code_text[: default_export.start()].count("\n") + 1, + } + ) + + nextapi = _NEXTAPI_TYPES_RE.search(code_text) + if nextapi: + findings.append( + { + "kind": "next_api_types", + "line": code_text[: nextapi.start()].count("\n") + 1, + } + ) + + res_status = _RES_STATUS_RE.search(code_text) + if res_status: + findings.append( + { + 
"kind": "res_status_usage", + "line": code_text[: res_status.start()].count("\n") + 1, + } + ) + + imports = _iter_import_specifiers(search_text) + for imp in imports: + module = imp["module"] + if module in _INVALID_REACTY_MODULES_IN_ROUTE_CONTEXT: + findings.append({"kind": f"invalid_import::{module}", "line": imp["line"]}) + + if _is_middleware(filepath): + for imp in imports: + module = imp["module"] + if _is_node_builtin(module): + findings.append({"kind": f"node_builtin_import::{module}", "line": imp["line"]}) + elif _is_route_handler(filepath): + runtime_edge = _RUNTIME_EDGE_RE.search(code_text) + if runtime_edge: + for imp in imports: + module = imp["module"] + if _is_node_builtin(module): + findings.append( + {"kind": f"edge_runtime_node_builtin_import::{module}", "line": imp["line"]} + ) + + if filepath.endswith(".tsx"): + jsx_return = re.search(r"""\breturn\s*<""", code_text) + if jsx_return: + findings.append( + { + "kind": "jsx_return", + "line": code_text[: jsx_return.start()].count("\n") + 1, + } + ) + + if not findings: + continue + + kind = "route_handler" if _is_route_handler(filepath) else "middleware" + entries.append( + {"file": filepath, "line": findings[0]["line"], "kind": kind, "findings": findings} + ) + + return entries, scanned + + +def scan_mixed_router_layout(info: NextjsFrameworkInfo) -> list[dict]: + """Project-level check: both App Router and Pages Router present.""" + if not (info.uses_app_router and info.uses_pages_router): + return [] + return [ + { + "file": info.package_json_relpath or "package.json", + "app_roots": list(info.app_roots), + "pages_roots": list(info.pages_roots), + } + ] + + +__all__ = [ + "scan_nextjs_app_router_exports_in_pages_router", + "scan_nextjs_async_client_components", + "scan_nextjs_browser_globals_missing_use_client", + "scan_nextjs_client_layouts", + "scan_nextjs_error_files_missing_use_client", + "scan_mixed_router_layout", + "scan_next_router_imports_in_app_router", + 
"scan_nextjs_env_leaks_in_client", + "scan_nextjs_navigation_hooks_missing_use_client", + "scan_nextjs_next_document_misuse", + "scan_nextjs_next_head_in_app_router", + "scan_nextjs_pages_router_apis_in_app_router", + "scan_nextjs_pages_api_route_handlers", + "scan_nextjs_pages_router_artifacts_in_app_router", + "scan_nextjs_route_handlers_and_middleware_misuse", + "scan_nextjs_server_navigation_apis_in_client", + "scan_nextjs_server_modules_in_pages_router", + "scan_nextjs_server_exports_in_client", + "scan_nextjs_server_imports_in_client", + "scan_nextjs_use_client_not_first", + "scan_nextjs_use_server_not_first", + "scan_nextjs_use_server_in_client", + "scan_rsc_missing_use_client", +] diff --git a/desloppify/languages/_framework/node/js_text.py b/desloppify/languages/_framework/node/js_text.py new file mode 100644 index 000000000..a1790d55a --- /dev/null +++ b/desloppify/languages/_framework/node/js_text.py @@ -0,0 +1,76 @@ +"""JavaScript/TypeScript-oriented text helpers. + +These helpers are intentionally framework-agnostic and live under the shared +Node layer so they can be used by framework scanners across JS/TS plugins. 
+""" + +from __future__ import annotations + +from collections.abc import Generator + +from desloppify.base.text_utils import strip_c_style_comments + + +def strip_js_ts_comments(text: str) -> str: + """Strip // and /* */ comments while preserving string literals.""" + return strip_c_style_comments(text) + + +def scan_code(text: str) -> Generator[tuple[int, str, bool], None, None]: + """Yield ``(index, char, in_string)`` tuples while handling escapes.""" + i = 0 + in_str = None + while i < len(text): + ch = text[i] + if in_str: + if ch == "\\" and i + 1 < len(text): + yield (i, ch, True) + i += 1 + yield (i, text[i], True) + i += 1 + continue + if ch == in_str: + in_str = None + yield (i, ch, in_str is not None) + else: + if ch in ("'", '"', "`"): + in_str = ch + yield (i, ch, True) + else: + yield (i, ch, False) + i += 1 + + +def code_text(text: str) -> str: + """Blank string literals and ``//`` comments to spaces, preserving positions.""" + out = list(text) + in_line_comment = False + prev_code_idx = -2 + prev_code_ch = "" + for i, ch, in_s in scan_code(text): + if ch == "\n": + in_line_comment = False + prev_code_ch = "" + continue + if in_line_comment: + out[i] = " " + continue + if in_s: + out[i] = " " + continue + if ch == "/" and prev_code_ch == "/" and prev_code_idx == i - 1: + out[prev_code_idx] = " " + out[i] = " " + in_line_comment = True + prev_code_ch = "" + continue + prev_code_idx = i + prev_code_ch = ch + return "".join(out) + + +__all__ = [ + "code_text", + "scan_code", + "strip_js_ts_comments", +] diff --git a/desloppify/languages/_framework/runtime_support/runtime.py b/desloppify/languages/_framework/runtime_support/runtime.py index 27965b1a8..b0a5c387f 100644 --- a/desloppify/languages/_framework/runtime_support/runtime.py +++ b/desloppify/languages/_framework/runtime_support/runtime.py @@ -150,9 +150,9 @@ def __dir__(self): def _coerce_lang_override(field_name: str, value: object) -> object: """Normalize override values to 
LangRuntimeState-compatible payloads.""" if field_name in _LANG_OVERRIDE_DICT_FIELDS: - return value or {} + return value if isinstance(value, dict) else {} if field_name in _LANG_OVERRIDE_LIST_FIELDS: - return value or [] + return value if isinstance(value, list) else [] if field_name in _LANG_OVERRIDE_INT_FIELDS: return int(value or 0) if field_name == "review_max_age_days": diff --git a/desloppify/languages/_framework/treesitter/analysis/extractors.py b/desloppify/languages/_framework/treesitter/analysis/extractors.py index 1a36f313d..02530b22e 100644 --- a/desloppify/languages/_framework/treesitter/analysis/extractors.py +++ b/desloppify/languages/_framework/treesitter/analysis/extractors.py @@ -141,10 +141,10 @@ def ts_extract_functions( for _pattern_idx, captures in matches: func_node = _unwrap_node(captures.get("func")) name_node = _unwrap_node(captures.get("name")) - if not func_node or not name_node: + if not func_node: continue - name_text = _node_text(name_node) + name_text = _node_text(name_node) if name_node else "" line = func_node.start_point[0] + 1 # 1-indexed end_line = func_node.end_point[0] + 1 diff --git a/desloppify/languages/_framework/treesitter/analysis/unused_imports.py b/desloppify/languages/_framework/treesitter/analysis/unused_imports.py index f06899918..0064064ea 100644 --- a/desloppify/languages/_framework/treesitter/analysis/unused_imports.py +++ b/desloppify/languages/_framework/treesitter/analysis/unused_imports.py @@ -19,6 +19,34 @@ logger = logging.getLogger(__name__) +_ECMASCRIPT_IMPORT_NODE_TYPE = "import_statement" + +# Identifier-ish nodes that represent a reference to a binding in JavaScript/TypeScript. +# JSX tag names are typically represented as `identifier` in tree-sitter-javascript/tsx, +# but we include `jsx_identifier` as well for compatibility with grammar variants. 
+_ECMASCRIPT_REFERENCE_NODE_TYPES = frozenset({ + "identifier", + "jsx_identifier", + "type_identifier", + "shorthand_property_identifier", +}) + +_ECMASCRIPT_ASSIGNMENT_PATTERN_NODE_TYPES = frozenset({ + "assignment_pattern", + "object_assignment_pattern", + "array_assignment_pattern", +}) + +_ECMASCRIPT_DECLARATION_NAME_NODE_TYPES = frozenset({ + # JS + "function_declaration", + "class_declaration", + # TS/TSX + "type_alias_declaration", + "interface_declaration", + "enum_declaration", +}) + def detect_unused_imports( file_list: list[str], @@ -37,6 +65,11 @@ def detect_unused_imports( logger.debug("tree-sitter init failed: %s", exc) return [] + # JavaScript/JSX: extract imported *local bindings* and check whether each + # binding is referenced in the file body. This avoids module-path heuristics. + if spec.grammar in ("javascript", "tsx"): + return _detect_unused_imports_ecmascript(file_list, spec, parser, language) + query = _make_query(language, spec.import_query) entries: list[dict] = [] @@ -89,6 +122,267 @@ def detect_unused_imports( return entries +def _detect_unused_imports_ecmascript( + file_list: list[str], + spec: TreeSitterLangSpec, + parser, + language, +) -> list[dict]: + """Binding-aware unused import detection for JavaScript/TypeScript (JSX/TSX). + + Emits one entry per unused imported local binding: + {file, line, name, symbol} + + Side-effect-only imports (e.g. `import "x"`) are ignored. + """ + query = _make_query(language, f"({_ECMASCRIPT_IMPORT_NODE_TYPE}) @import") + entries: list[dict] = [] + + for filepath in file_list: + cached = get_or_parse_tree(filepath, parser, spec.grammar) + if cached is None: + continue + source, tree = cached + + # Some real-world repos contain stray NUL bytes (e.g. broken fixtures). + # Tree-sitter can treat these as parse-stopping errors, leading to false + # positives due to missing references. Replace NUL with space (same length) + # and re-parse for analysis. 
+ if b"\x00" in source: + source = source.replace(b"\x00", b" ") + tree = parser.parse(source) + + # If the parse is still errorful, be conservative and skip this file to + # avoid false positives from incomplete trees. + if getattr(tree.root_node, "has_error", False): + continue + + matches = _run_query(query, tree.root_node) + if not matches: + continue + + referenced = _collect_ecmascript_references(tree.root_node) + + for _pattern_idx, captures in matches: + import_node = _unwrap_node(captures.get("import")) + if not import_node: + continue + + bindings = _extract_ecmascript_import_bindings(import_node) + if not bindings: + # Side-effect import (`import "x"`) or empty named import (`import {} from "x"`). + continue + + line = import_node.start_point[0] + 1 + for symbol in bindings: + if symbol not in referenced: + entries.append({ + "file": filepath, + "line": line, + "name": symbol, + "symbol": symbol, + }) + + return entries + + +def _extract_ecmascript_import_bindings(import_node) -> list[str]: + """Extract local binding names from an ECMAScript import_statement node.""" + import_clause = None + for child in import_node.named_children: + if child.type == "import_clause": + import_clause = child + break + if import_clause is None: + return [] + + bindings: list[str] = [] + seen: set[str] = set() + + def add(name: str | None) -> None: + if not name or name in seen: + return + seen.add(name) + bindings.append(name) + + for child in import_clause.named_children: + # Default import: `import Foo from "x"` + if child.type == "identifier": + add(_node_text(child)) + continue + + # Namespace import: `import * as ns from "x"` + if child.type == "namespace_import": + for grand in child.named_children: + if grand.type == "identifier": + add(_node_text(grand)) + break + continue + + # Named imports: `import { a, b as c } from "x"` + if child.type == "named_imports": + for spec in child.named_children: + if spec.type != "import_specifier": + continue + alias = 
spec.child_by_field_name("alias") + name = spec.child_by_field_name("name") + add(_node_text(alias) if alias is not None else _node_text(name)) + continue + + return bindings + + +def _collect_ecmascript_references(root_node) -> set[str]: + """Collect identifier-like references outside ECMAScript import statements.""" + referenced: set[str] = set() + stack = [root_node] + + while stack: + node = stack.pop() + if node.type in _ECMASCRIPT_REFERENCE_NODE_TYPES and not _has_ancestor_type( + node, {_ECMASCRIPT_IMPORT_NODE_TYPE} + ): + if not _is_ecmascript_declaration_occurrence(node): + text = _node_text(node) + if text: + referenced.add(text) + + for child in reversed(node.named_children): + stack.append(child) + + return referenced + + +def _has_ancestor_type(node, ancestor_types: set[str]) -> bool: + parent = node.parent + while parent is not None: + if parent.type in ancestor_types: + return True + parent = parent.parent + return False + + +def _is_ecmascript_declaration_occurrence(node) -> bool: + """Return True when `node` appears in a declaration/binding position. + + This prevents counting declarations as references (e.g. destructuring patterns, + parameter names, catch parameters, type names). + + Not a full scope resolver; it is a conservative structural filter. + """ + # If we're on the right side of an assignment pattern, treat as an expression reference. + if _is_within_assignment_pattern_right(node): + return False + + cur = node + while cur is not None: + # Variable declarators: `const foo = ...`, `const {a: b} = ...` + if cur.type == "variable_declarator": + name = cur.child_by_field_name("name") + if name is not None and _is_descendant(name, node): + return True + + # TS/TSX params: `required_parameter` / `optional_parameter` pattern field. 
+ if cur.type in ("required_parameter", "optional_parameter"): + pattern = cur.child_by_field_name("pattern") + if pattern is not None and _is_descendant(pattern, node): + return True + + # JS params: patterns live directly under `formal_parameters`. + if cur.type == "formal_parameters": + param_root = _direct_child_under(cur, node) + if param_root is not None: + # TS/TSX wraps params in required/optional_parameter; handled above. + if param_root.type not in ("required_parameter", "optional_parameter"): + if _is_param_binding_occurrence(param_root, node): + return True + + # Catch binding: `catch (e) { ... }` + if cur.type == "catch_clause": + param = cur.child_by_field_name("parameter") + if param is not None and _is_descendant(param, node): + return True + + # Declaration names (function/class/type/interface/enum) + if cur.type in _ECMASCRIPT_DECLARATION_NAME_NODE_TYPES: + name = cur.child_by_field_name("name") + if name is not None and _is_descendant(name, node): + return True + + # `for (const x of xs)` / `for (let x in xs)` binding. + if cur.type == "for_in_statement": + left = cur.child_by_field_name("left") + if left is not None and _is_descendant(left, node): + # Only treat as a declaration if preceded by a declaration keyword. 
+ prev = left.prev_sibling + if prev is not None and prev.type in ("const", "let", "var"): + return True + + cur = cur.parent + + return False + + +def _is_within_assignment_pattern_right(node) -> bool: + """Return True if node appears within the `right` field of an assignment pattern.""" + cur = node + while cur is not None: + parent = cur.parent + if parent is None: + return False + if parent.type in _ECMASCRIPT_ASSIGNMENT_PATTERN_NODE_TYPES: + right = parent.child_by_field_name("right") + if right is not None and _is_descendant(right, node): + return True + cur = parent + return False + + +def _is_descendant(ancestor, node) -> bool: + cur = node + while cur is not None: + if cur == ancestor: + return True + cur = cur.parent + return False + + +def _direct_child_under(ancestor, node): + """Return the direct child of `ancestor` that contains `node`, if any.""" + cur = node + while cur is not None and cur.parent is not None and cur.parent != ancestor: + cur = cur.parent + if cur is not None and cur.parent == ancestor: + return cur + return None + + +def _is_param_binding_occurrence(param_root, node) -> bool: + """Return True if `node` is part of the parameter binding pattern. + + `param_root` is the direct child of `formal_parameters` that contains `node`. + """ + if _is_within_assignment_pattern_right(node): + return False + + # `x` in `(x)` or `...rest` in `(...rest)` are bindings. + if param_root.type in ("identifier", "rest_pattern"): + return True + + # `x=Default` binds `x` on the left; right side is an expression. + if param_root.type == "assignment_pattern": + left = param_root.child_by_field_name("left") + if left is not None and _is_descendant(left, node): + return True + return False + + # Destructuring patterns (object/array) bind identifiers inside them. + if param_root.type in ("object_pattern", "array_pattern", "pair_pattern"): + return True + + return False + + def _extract_alias(import_node) -> str | None: """Extract alias name from import nodes. 
diff --git a/desloppify/languages/_framework/treesitter/phases.py b/desloppify/languages/_framework/treesitter/phases.py index 7438a750a..e05273a59 100644 --- a/desloppify/languages/_framework/treesitter/phases.py +++ b/desloppify/languages/_framework/treesitter/phases.py @@ -126,8 +126,10 @@ def run(path: Path, lang: LangRuntimeContract) -> tuple[list[Issue], dict[str, i entries = detect_unused_imports(file_list, spec) for e in entries: + symbol = e.get("symbol") + issue_name = f"unused_import::{e['line']}" + (f"::{symbol}" if symbol else "") issues.append(make_issue( - "unused", e["file"], f"unused_import::{e['line']}", + "unused", e["file"], issue_name, tier=3, confidence="medium", summary=f"Unused import: {e['name']}", )) diff --git a/desloppify/languages/framework.py b/desloppify/languages/framework.py index 3430d5118..508738c80 100644 --- a/desloppify/languages/framework.py +++ b/desloppify/languages/framework.py @@ -75,6 +75,24 @@ def reset_script_import_caches(scan_path: str | None = None) -> None: _reset_script_import_caches(scan_path) +def prewarm_review_phase_detectors(path, lang, phases) -> None: + """Prime expensive shared review detectors for overlap during scan.""" + from desloppify.languages._framework.base.shared_phases_review import ( + prewarm_review_phase_detectors as _prewarm_review_phase_detectors, + ) + + _prewarm_review_phase_detectors(path, lang, phases) + + +def clear_review_phase_prefetch(lang) -> None: + """Clear in-memory shared review detector prefetch state.""" + from desloppify.languages._framework.base.shared_phases_review import ( + clear_review_phase_prefetch as _clear_review_phase_prefetch, + ) + + _clear_review_phase_prefetch(lang) + + __all__ = [ "BoundaryRule", "LangConfig", @@ -90,12 +108,14 @@ def reset_script_import_caches(scan_path: str | None = None) -> None: "auto_detect_lang", "available_langs", "capability_report", + "clear_review_phase_prefetch", "disable_parse_cache", "enable_parse_cache", "get_lang", "load_all", 
"make_lang_run", "make_lang_config", + "prewarm_review_phase_detectors", "reset_script_import_caches", "registry_state", "shared_phase_labels", diff --git a/desloppify/languages/javascript/README.md b/desloppify/languages/javascript/README.md index 3b0a04d53..14c1d0b33 100644 --- a/desloppify/languages/javascript/README.md +++ b/desloppify/languages/javascript/README.md @@ -21,11 +21,11 @@ Activates on projects containing a `package.json` file. # Scan for issues desloppify scan --path -# Scan and auto-fix -desloppify scan --path --fix +# Auto-fix ESLint issues +desloppify autofix --path ``` -Autofix is supported — ESLint's `--fix` flag is used to apply safe automatic corrections. +Autofix is supported — ESLint's `--fix` flag is used via `desloppify autofix` to apply safe automatic corrections. ## Exclusions diff --git a/desloppify/languages/javascript/__init__.py b/desloppify/languages/javascript/__init__.py index 9f5673357..d885ee8ef 100644 --- a/desloppify/languages/javascript/__init__.py +++ b/desloppify/languages/javascript/__init__.py @@ -1,9 +1,12 @@ """JavaScript/JSX language plugin — ESLint.""" +from __future__ import annotations + from desloppify.languages._framework.generic_support.core import generic_lang from desloppify.languages._framework.treesitter import JS_SPEC -generic_lang( + +cfg = generic_lang( name="javascript", extensions=[".js", ".jsx", ".mjs", ".cjs"], tools=[ @@ -21,6 +24,7 @@ detect_markers=["package.json"], default_src="src", treesitter_spec=JS_SPEC, + frameworks=True, ) __all__ = [ diff --git a/desloppify/languages/javascript/tests/test_js_nextjs_framework.py b/desloppify/languages/javascript/tests/test_js_nextjs_framework.py new file mode 100644 index 000000000..9c9523866 --- /dev/null +++ b/desloppify/languages/javascript/tests/test_js_nextjs_framework.py @@ -0,0 +1,83 @@ +"""Tests for JavaScript Next.js framework smells integration.""" + +from __future__ import annotations + +from pathlib import Path +from types import 
SimpleNamespace + +import pytest + +import desloppify.languages.javascript # noqa: F401 (registration side effect) +from desloppify.languages.framework import get_lang + + +@pytest.fixture(autouse=True) +def _root(tmp_path, set_project_root): + """Point PROJECT_ROOT at the tmp directory via RuntimeContext.""" + + +def _write(tmp_path: Path, name: str, content: str) -> Path: + p = tmp_path / name + p.parent.mkdir(parents=True, exist_ok=True) + p.write_text(content) + return p + + +class _FakeLang(SimpleNamespace): + zone_map = None + dep_graph = None + file_finder = None + + def __init__(self): + super().__init__(review_cache={}, detector_coverage={}, coverage_warnings=[]) + + +def test_javascript_plugin_includes_nextjs_framework_phases_and_next_lint_is_slow(): + cfg = get_lang("javascript") + labels = [getattr(p, "label", "") for p in cfg.phases] + assert "Next.js framework smells" in labels + lint = next(p for p in cfg.phases if getattr(p, "label", "") == "next lint") + assert lint.slow is True + + +def test_nextjs_smells_phase_emits_smells_when_next_is_present(tmp_path: Path): + _write( + tmp_path, + "package.json", + '{"dependencies": {"next": "14.0.0", "react": "18.3.0"}}\n', + ) + _write( + tmp_path, + "app/server-in-client.jsx", + "'use client'\nimport fs from 'node:fs'\nexport default function X(){return null}\n", + ) + + cfg = get_lang("javascript") + phase = next(p for p in cfg.phases if getattr(p, "label", "") == "Next.js framework smells") + issues, potentials = phase.run(tmp_path, _FakeLang()) + detectors = {issue.get("detector") for issue in issues} + assert "nextjs" in detectors + assert potentials.get("nextjs", 0) >= 1 + assert any("server_import_in_client" in str(issue.get("id", "")) for issue in issues) + + +def test_nextjs_smells_phase_scans_jsx_error_and_js_middleware(tmp_path: Path): + _write( + tmp_path, + "package.json", + '{"dependencies": {"next": "14.0.0", "react": "18.3.0"}}\n', + ) + _write(tmp_path, "app/error.jsx", "export default 
function Error(){ return null }\n") + _write( + tmp_path, + "middleware.js", + "'use client'\nimport React from 'react'\nexport function middleware(){ return null }\n", + ) + + cfg = get_lang("javascript") + phase = next(p for p in cfg.phases if getattr(p, "label", "") == "Next.js framework smells") + issues, potentials = phase.run(tmp_path, _FakeLang()) + ids = {issue["id"] for issue in issues} + assert any("error_file_missing_use_client" in issue_id for issue_id in ids) + assert any("middleware_misuse" in issue_id for issue_id in ids) + assert potentials.get("nextjs", 0) >= 1 diff --git a/desloppify/languages/php/__init__.py b/desloppify/languages/php/__init__.py index 08031ac17..7b6edda24 100644 --- a/desloppify/languages/php/__init__.py +++ b/desloppify/languages/php/__init__.py @@ -73,7 +73,7 @@ { "label": "phpstan", "cmd": "phpstan analyse --error-format=json --no-progress", - "fmt": "json", + "fmt": "phpstan", "id": "phpstan_error", "tier": 2, "fix_cmd": None, diff --git a/desloppify/languages/python/_security.py b/desloppify/languages/python/_security.py index 9930b89f0..9b623657d 100644 --- a/desloppify/languages/python/_security.py +++ b/desloppify/languages/python/_security.py @@ -4,6 +4,7 @@ import shutil +from desloppify.base.config import load_config from desloppify.base.discovery.source import collect_exclude_dirs from desloppify.languages._framework.base.types import DetectorCoverageStatus, LangSecurityResult from desloppify.languages.python.detectors.bandit_adapter import detect_with_bandit @@ -32,13 +33,27 @@ def python_scan_coverage_prerequisites() -> list[DetectorCoverageStatus]: return [missing_bandit_coverage()] +def _load_bandit_skip_tests() -> list[str] | None: + """Read ``languages.python.bandit_skip_tests`` from project config.""" + cfg = load_config() + lang_cfg = cfg.get("languages", {}) + py_cfg = lang_cfg.get("python", {}) if isinstance(lang_cfg, dict) else {} + raw = py_cfg.get("bandit_skip_tests") if isinstance(py_cfg, dict) else 
None + if isinstance(raw, list) and all(isinstance(t, str) for t in raw): + return raw + return None + + def detect_python_security(files, zone_map) -> LangSecurityResult: scan_root = scan_root_from_files(files) if scan_root is None: return LangSecurityResult(entries=[], files_scanned=0) exclude_dirs = collect_exclude_dirs(scan_root) - result = detect_with_bandit(scan_root, zone_map, exclude_dirs=exclude_dirs) + skip_tests = _load_bandit_skip_tests() + result = detect_with_bandit( + scan_root, zone_map, exclude_dirs=exclude_dirs, skip_tests=skip_tests, + ) coverage = result.status.coverage() return LangSecurityResult( entries=result.entries, diff --git a/desloppify/languages/python/detectors/bandit_adapter.py b/desloppify/languages/python/detectors/bandit_adapter.py index 3c66f2c12..361daed3f 100644 --- a/desloppify/languages/python/detectors/bandit_adapter.py +++ b/desloppify/languages/python/detectors/bandit_adapter.py @@ -155,8 +155,10 @@ def _to_security_entry( raw_severity = result.get("issue_severity", "MEDIUM").upper() raw_confidence = result.get("issue_confidence", "MEDIUM").upper() - # Suppress LOW-severity + LOW-confidence (very noisy, low signal). - if raw_severity == "LOW" and raw_confidence == "LOW": + # Suppress noisy low-signal combinations: + # - LOW severity + LOW confidence (very noisy, low signal) + # - MEDIUM severity + LOW confidence (e.g. "tokenizer_name" flagged as hardcoded secret) + if raw_confidence == "LOW" and raw_severity in ("LOW", "MEDIUM"): return None tier = _SEVERITY_TO_TIER.get(raw_severity, 3) @@ -188,6 +190,7 @@ def detect_with_bandit( zone_map: FileZoneMap | None, timeout: int = 120, exclude_dirs: list[str] | None = None, + skip_tests: list[str] | None = None, ) -> BanditScanResult: """Run bandit on *path* and return issues + typed execution status. @@ -197,6 +200,9 @@ def detect_with_bandit( Absolute directory paths to pass to bandit's ``--exclude`` flag. 
When non-empty, bandit will skip these directories during its recursive scan. + skip_tests: + Bandit test IDs to suppress via ``--skip`` (e.g. ``["B101", "B601"]``). + Allows users to disable entire rule families from ``config.json``. """ cmd = [ sys.executable, @@ -209,6 +215,8 @@ def detect_with_bandit( ] if exclude_dirs: cmd.extend(["--exclude", ",".join(exclude_dirs)]) + if skip_tests: + cmd.extend(["--skip", ",".join(skip_tests)]) cmd.append(str(path.resolve())) try: diff --git a/desloppify/languages/r/__init__.py b/desloppify/languages/r/__init__.py index 39677c7cb..3a28cf1ed 100644 --- a/desloppify/languages/r/__init__.py +++ b/desloppify/languages/r/__init__.py @@ -1,4 +1,4 @@ -"""R language plugin — lintr + tree-sitter.""" +"""R language plugin — Jarl, lintr + tree-sitter.""" from desloppify.languages._framework.generic_support.core import generic_lang from desloppify.languages._framework.treesitter import R_SPEC @@ -7,16 +7,22 @@ name="r", extensions=[".R", ".r"], tools=[ + { + "label": "jarl", + "cmd": "jarl check .", + "fmt": "gnu", + "id": "jarl_lint", + "tier": 2, + "fix_cmd": "jarl check . --fix --allow-dirty", + }, { "label": "lintr", "cmd": ( - 'Rscript -e \'cat(paste(capture.output(' - 'lintr::lint_dir(".", show_notifications=FALSE)' - '), collapse="\\n"))\'' + "Rscript -e \"lintr::lint_dir('.')\"" ), "fmt": "gnu", "id": "lintr_lint", - "tier": 2, + "tier": 3, "fix_cmd": None, }, ], diff --git a/desloppify/languages/ruby/README.md b/desloppify/languages/ruby/README.md index d15d6e2a7..fdd3f35b5 100644 --- a/desloppify/languages/ruby/README.md +++ b/desloppify/languages/ruby/README.md @@ -33,7 +33,7 @@ desloppify scan --path desloppify scan --path --profile full # Auto-correct RuboCop offenses -desloppify fix --path +desloppify autofix --path ``` ## What gets analysed @@ -51,7 +51,7 @@ desloppify fix --path ## Autofix -RuboCop's `--auto-correct` is wired to `desloppify fix`. 
Only offenses that +RuboCop's `--auto-correct` is wired to `desloppify autofix`. Only offenses that RuboCop marks as safe to auto-correct will be changed. ## Exclusions diff --git a/desloppify/languages/rust/detectors/deps.py b/desloppify/languages/rust/detectors/deps.py index 55bbbf004..1f64df3cc 100644 --- a/desloppify/languages/rust/detectors/deps.py +++ b/desloppify/languages/rust/detectors/deps.py @@ -7,6 +7,7 @@ from desloppify.engine.detectors.graph import finalize_graph from desloppify.languages.rust.support import ( + build_production_file_index, build_workspace_package_index, find_rust_files, iter_mod_targets, @@ -31,6 +32,7 @@ def build_dep_graph( return {} file_set = set(graph.keys()) + production_index = build_production_file_index(file_set) package_index = build_workspace_package_index() for filepath in files: content = read_text_or_none(filepath) @@ -44,6 +46,7 @@ def build_dep_graph( filepath, file_set, declared_path=declared_path, + production_index=production_index, ) if resolved and resolved != filepath: graph[filepath]["imports"].add(resolved) @@ -56,6 +59,7 @@ def build_dep_graph( file_set, package_index, allow_crate_root_fallback=False, + production_index=production_index, ) if resolved and resolved != filepath: graph[filepath]["imports"].add(resolved) diff --git a/desloppify/languages/rust/support.py b/desloppify/languages/rust/support.py index 71bca2afe..fb5afc20f 100644 --- a/desloppify/languages/rust/support.py +++ b/desloppify/languages/rust/support.py @@ -2,6 +2,7 @@ from __future__ import annotations +import functools import re import tomllib from dataclasses import dataclass @@ -34,6 +35,15 @@ class RustFileContext: module_segments: tuple[str, ...] 
+@dataclass(frozen=True) +class RustProductionFileIndex: + """Precomputed lookup tables for production-file resolution.""" + + project_root: Path + by_absolute: dict[str, str] + by_relative: dict[str, str] + + def normalize_crate_name(name: str | None) -> str | None: """Normalize Cargo package names to Rust crate names.""" if not name: @@ -53,6 +63,35 @@ def find_rust_files(path: Path | str) -> list[str]: ) +def build_production_file_index( + production_files: set[str], + *, + project_root: Path | None = None, +) -> RustProductionFileIndex: + """Build O(1) absolute/relative lookup maps for production files.""" + root = (project_root or get_project_root()).resolve() + by_absolute: dict[str, str] = {} + by_relative: dict[str, str] = {} + for production_file in production_files: + prod_path = Path(production_file) + resolved = ( + prod_path.resolve() if prod_path.is_absolute() else (root / prod_path).resolve() + ) + resolved_str = str(resolved) + by_absolute.setdefault(resolved_str, production_file) + try: + rel_path = rel(resolved, project_root=root) + except (TypeError, ValueError, OSError): + rel_path = None + if rel_path is not None: + by_relative.setdefault(rel_path, production_file) + return RustProductionFileIndex( + project_root=root, + by_absolute=by_absolute, + by_relative=by_relative, + ) + + def read_text_or_none(path: Path | str, *, errors: str = "replace") -> str | None: """Read a file as text, returning ``None`` when the file is unavailable.""" try: @@ -181,22 +220,119 @@ def iter_mod_targets(content: str) -> list[tuple[str, str | None]]: def iter_use_specs(content: str) -> list[str]: """Return normalized Rust `use` / `pub use` specs from a file.""" - stripped = strip_rust_comments(content) - specs: list[str] = [] - for match in USE_STATEMENT_RE.finditer(stripped): - specs.extend(_expand_use_tree(match.group(1))) - return specs + return _iter_use_specs_with_pattern(content, USE_STATEMENT_RE) def iter_pub_use_specs(content: str) -> list[str]: """Return 
normalized `pub use` specs from a file.""" - stripped = strip_rust_comments(content) + return _iter_use_specs_with_pattern(content, PUB_USE_STATEMENT_RE) + + +def _iter_use_specs_with_pattern(content: str, pattern: re.Pattern[str]) -> list[str]: + """Extract `use` specs while ignoring string-literal contents. + + Rust files can contain natural-language strings (for example JSON tool + descriptions) with lines that begin with "use ...". We mask string literal + content before regex matching so import extraction only sees real code. + """ + stripped = strip_rust_comments(content, preserve_lines=True) + masked = _mask_rust_string_literals_preserve_lines(stripped) specs: list[str] = [] - for match in PUB_USE_STATEMENT_RE.finditer(stripped): - specs.extend(_expand_use_tree(match.group(1))) + for match in pattern.finditer(masked): + start, end = match.span(1) + specs.extend(_expand_use_tree(stripped[start:end])) return specs +def _mask_rust_string_literals_preserve_lines(content: str) -> str: + """Replace string literal contents with spaces while preserving newlines.""" + chars = list(content) + result = chars[:] + length = len(chars) + i = 0 + in_normal_string = False + raw_hash_count: int | None = None + while i < length: + ch = chars[i] + + if raw_hash_count is not None: + if ch == '"': + if raw_hash_count == 0: + result[i] = " " + raw_hash_count = None + i += 1 + continue + hash_count = raw_hash_count + hashes = "#" * hash_count + if content.startswith(hashes, i + 1): + result[i] = " " + for j in range(i + 1, i + 1 + hash_count): + result[j] = " " + raw_hash_count = None + i += 1 + hash_count + continue + result[i] = "\n" if ch == "\n" else " " + i += 1 + continue + + if in_normal_string: + if ch == "\\" and i + 1 < length: + result[i] = " " + result[i + 1] = "\n" if chars[i + 1] == "\n" else " " + i += 2 + continue + result[i] = "\n" if ch == "\n" else " " + if ch == '"': + in_normal_string = False + i += 1 + continue + + raw_prefix = 
_raw_string_prefix_length(chars, i) + if raw_prefix is not None: + prefix_len, hashes = raw_prefix + for j in range(i, i + prefix_len): + result[j] = " " + raw_hash_count = hashes + i += prefix_len + continue + + if ch == '"': + result[i] = " " + in_normal_string = True + i += 1 + continue + + if ch == "b" and i + 1 < length and chars[i + 1] == '"': + result[i] = " " + result[i + 1] = " " + in_normal_string = True + i += 2 + continue + + i += 1 + return "".join(result) + + +def _raw_string_prefix_length(chars: list[str], index: int) -> tuple[int, int] | None: + """Return (prefix_length, hash_count) for raw string prefixes at index.""" + length = len(chars) + j = index + if chars[j] == "b": + j += 1 + if j >= length: + return None + if chars[j] != "r": + return None + j += 1 + hash_count = 0 + while j < length and chars[j] == "#": + hash_count += 1 + j += 1 + if j >= length or chars[j] != '"': + return None + return (j - index + 1, hash_count) + + def find_manifest_dir(path: Path | str) -> Path | None: """Walk up from path to the nearest Cargo.toml root.""" candidate = Path(resolve_path(str(path))) @@ -240,9 +376,16 @@ def _read_manifest_data(manifest_dir: Path) -> dict[str, Any]: def build_workspace_package_index(scan_root: Path | None = None) -> dict[str, Path]: """Return local crate-name -> Cargo manifest dir for the active project root.""" root = find_workspace_root(scan_root) if scan_root is not None else get_project_root() + return _build_workspace_package_index_cached(root) + + +@functools.lru_cache(maxsize=8) +def _build_workspace_package_index_cached(root: Path) -> dict[str, Path]: + """Cached inner implementation of workspace package index building.""" + _exclusions = set(RUST_FILE_EXCLUSIONS) packages: dict[str, Path] = {} for manifest in root.rglob("Cargo.toml"): - if any(part in RUST_FILE_EXCLUSIONS for part in manifest.parts): + if any(part in _exclusions for part in manifest.relative_to(root).parts[:-1]): continue manifest_dir = 
manifest.parent.resolve() for name in { @@ -254,14 +397,21 @@ def build_workspace_package_index(scan_root: Path | None = None) -> dict[str, Pa return packages -def build_local_dependency_alias_index( - manifest_dir: Path, - package_index: dict[str, Path] | None = None, +@functools.lru_cache(maxsize=64) +def _build_local_dependency_alias_index_cached( + normalized_manifest_dir: Path, + workspace_root: Path, + package_index_items: tuple[tuple[str, str], ...], ) -> dict[str, Path]: - """Map local dependency aliases usable from one manifest to their crate roots.""" - normalized_manifest_dir = manifest_dir.resolve() - workspace_root = find_workspace_root(normalized_manifest_dir) - package_index = package_index or build_workspace_package_index(workspace_root) + """Cached implementation of local dependency alias extraction. + + Keying on primitive tuples keeps the cache hashable while preserving exact + package_index state for correctness. + """ + package_index: dict[str, Path] = { + name: Path(manifest_dir) + for name, manifest_dir in package_index_items + } workspace_aliases = _workspace_dependency_alias_index(workspace_root, package_index) aliases: dict[str, Path] = {} data = _read_manifest_data(normalized_manifest_dir) @@ -281,6 +431,24 @@ def build_local_dependency_alias_index( return aliases +def build_local_dependency_alias_index( + manifest_dir: Path, + package_index: dict[str, Path] | None = None, +) -> dict[str, Path]: + """Map local dependency aliases usable from one manifest to their crate roots.""" + normalized_manifest_dir = manifest_dir.resolve() + workspace_root = find_workspace_root(normalized_manifest_dir) + package_index = package_index or build_workspace_package_index(workspace_root) + package_index_items = tuple( + sorted((name, str(path.resolve())) for name, path in package_index.items()) + ) + return _build_local_dependency_alias_index_cached( + normalized_manifest_dir, + workspace_root, + package_index_items, + ) + + def 
_workspace_dependency_alias_index( workspace_root: Path, package_index: dict[str, Path], @@ -471,6 +639,7 @@ def resolve_mod_declaration( production_files: set[str], *, declared_path: str | None = None, + production_index: RustProductionFileIndex | None = None, ) -> str | None: """Resolve `mod foo;` to `foo.rs` or `foo/mod.rs` relative to the file's module dir.""" source = Path(resolve_path(str(source_file))).resolve() @@ -484,7 +653,11 @@ def resolve_mod_declaration( candidates.append(source.parent / declared_path) candidates.extend((base_dir / f"{module_name}.rs", base_dir / module_name / "mod.rs")) for candidate in candidates: - matched = _candidate_matches(candidate, production_files) + matched = _candidate_matches( + candidate, + production_files, + production_index=production_index, + ) if matched: return matched return None @@ -497,6 +670,7 @@ def resolve_use_spec( package_index: dict[str, Path] | None = None, *, allow_crate_root_fallback: bool = True, + production_index: RustProductionFileIndex | None = None, ) -> str | None: """Resolve a Rust `use` spec to a local module file when possible.""" cleaned = _normalize_use_spec(spec) @@ -521,6 +695,7 @@ def resolve_use_spec( context.root_files, segments[1:], production_files, + production_index=production_index, allow_root_fallback=allow_crate_root_fallback, ) ) @@ -532,6 +707,7 @@ def resolve_use_spec( context.root_files, resolved_segments, production_files, + production_index=production_index, allow_root_fallback=allow_crate_root_fallback, ) ) @@ -546,6 +722,7 @@ def resolve_use_spec( (manifest_dir / "src" / "lib.rs", manifest_dir / "src" / "main.rs"), segments[1:], production_files, + production_index=production_index, allow_root_fallback=allow_crate_root_fallback, ) ) @@ -555,6 +732,7 @@ def resolve_use_spec( context.root_files, list(context.module_segments) + segments, production_files, + production_index=production_index, allow_root_fallback=False, ) ) @@ -564,6 +742,7 @@ def resolve_use_spec( 
context.root_files, segments, production_files, + production_index=production_index, allow_root_fallback=allow_crate_root_fallback, ) ) @@ -578,6 +757,7 @@ def resolve_barrel_targets( filepath: str | Path, production_files: set[str], package_index: dict[str, Path] | None = None, + production_index: RustProductionFileIndex | None = None, ) -> set[str]: """Resolve `pub use` / `pub mod` targets from a Rust facade file.""" try: @@ -594,6 +774,7 @@ def resolve_barrel_targets( production_files, package_index, allow_crate_root_fallback=False, + production_index=production_index, ) if resolved: targets.add(resolved) @@ -603,6 +784,7 @@ def resolve_barrel_targets( filepath, production_files, declared_path=declared_path, + production_index=production_index, ) if resolved: targets.add(resolved) @@ -643,10 +825,15 @@ def _resolve_from_source_root( segments: list[str], production_files: set[str], *, + production_index: RustProductionFileIndex | None, allow_root_fallback: bool, ) -> str | None: if not segments: - return _match_root_files(root_files, production_files) + return _match_root_files( + root_files, + production_files, + production_index=production_index, + ) for width in range(len(segments), 0, -1): module_parts = segments[:width] @@ -655,42 +842,60 @@ def _resolve_from_source_root( file_candidate = source_root.joinpath(*module_parts).with_suffix(".rs") mod_candidate = source_root.joinpath(*module_parts, "mod.rs") for candidate in (file_candidate, mod_candidate): - matched = _candidate_matches(candidate, production_files) + matched = _candidate_matches( + candidate, + production_files, + production_index=production_index, + ) if matched: return matched if allow_root_fallback: - return _match_root_files(root_files, production_files) + return _match_root_files( + root_files, + production_files, + production_index=production_index, + ) return None -def _match_root_files(root_files: tuple[Path, ...], production_files: set[str]) -> str | None: +def _match_root_files( + 
root_files: tuple[Path, ...], + production_files: set[str], + *, + production_index: RustProductionFileIndex | None, +) -> str | None: for root_file in root_files: - matched = _candidate_matches(root_file, production_files) + matched = _candidate_matches( + root_file, + production_files, + production_index=production_index, + ) if matched: return matched return None -def _candidate_matches(candidate: Path, production_files: set[str]) -> str | None: +def _candidate_matches( + candidate: Path, + production_files: set[str], + *, + production_index: RustProductionFileIndex | None = None, +) -> str | None: + index = production_index or build_production_file_index(production_files) resolved_candidate = candidate.resolve() - project_root = get_project_root() candidate_abs = str(resolved_candidate) + absolute_match = index.by_absolute.get(candidate_abs) + if absolute_match is not None: + return absolute_match try: - candidate_rel = rel(resolved_candidate, project_root=project_root) + candidate_rel = rel(resolved_candidate, project_root=index.project_root) except (TypeError, ValueError, OSError): candidate_rel = None - - for production_file in production_files: - prod_path = Path(production_file) - if prod_path.is_absolute(): - normalized = str(prod_path.resolve()) - else: - normalized = str((project_root / prod_path).resolve()) - if normalized == candidate_abs: - return production_file - if candidate_rel is not None and production_file == candidate_rel: - return production_file + if candidate_rel is not None: + relative_match = index.by_relative.get(candidate_rel) + if relative_match is not None: + return relative_match return None @@ -804,7 +1009,9 @@ def _load_toml_dict(path: Path) -> dict[str, Any] | None: "RUST_FILE_EXCLUSIONS", "PUB_USE_STATEMENT_RE", "RustFileContext", + "RustProductionFileIndex", "USE_STATEMENT_RE", + "build_production_file_index", "build_workspace_package_index", "build_local_dependency_alias_index", "describe_rust_file", diff --git 
a/desloppify/languages/rust/tests/test_support.py b/desloppify/languages/rust/tests/test_support.py index f490fa303..5ca88e58c 100644 --- a/desloppify/languages/rust/tests/test_support.py +++ b/desloppify/languages/rust/tests/test_support.py @@ -5,7 +5,10 @@ from pathlib import Path from desloppify.languages.rust.support import ( + build_production_file_index, find_workspace_root, + iter_use_specs, + match_production_candidate, read_text_or_none, strip_rust_comments, ) @@ -80,3 +83,30 @@ def test_find_workspace_root_skips_invalid_nested_manifest(tmp_path): source = _write(tmp_path, "app/src/lib.rs", "pub fn run() {}\n") assert find_workspace_root(source) == tmp_path.resolve() + + +def test_iter_use_specs_ignores_use_text_inside_strings(): + content = r''' +fn registry() { + let description = r#" + use this wording in docs only; still not an import. + "#; +} +use crate::real::Thing; +''' + + specs = iter_use_specs(content) + + assert specs == ["crate::real::Thing"] + + +def test_match_production_candidate_uses_relative_index_key(tmp_path): + prod = _write(tmp_path, "src/lib.rs", "pub fn run() {}\n") + production_files = {"src/lib.rs"} + + from desloppify.base.runtime_state import RuntimeContext, runtime_scope + + with runtime_scope(RuntimeContext(project_root=tmp_path)): + index = build_production_file_index(production_files) + assert match_production_candidate(prod, production_files) == "src/lib.rs" + assert index.by_relative["src/lib.rs"] == "src/lib.rs" diff --git a/desloppify/languages/rust/tools.py b/desloppify/languages/rust/tools.py index 66c31c4c8..c1760256c 100644 --- a/desloppify/languages/rust/tools.py +++ b/desloppify/languages/rust/tools.py @@ -560,11 +560,6 @@ def _strip_comments_preserve_lines(text: str) -> str: return "".join(result) -def _strip_c_style_comments_preserve_lines(text: str) -> str: - """Backwards-compatible shim for tests/imports expecting the old helper name.""" - return _strip_comments_preserve_lines(text) - - def 
_line_number(content: str, offset: int) -> int: return content.count("\n", 0, offset) + 1 diff --git a/desloppify/languages/scss/__init__.py b/desloppify/languages/scss/__init__.py new file mode 100644 index 000000000..149409a23 --- /dev/null +++ b/desloppify/languages/scss/__init__.py @@ -0,0 +1,25 @@ +"""SCSS language plugin -- stylelint.""" + +from desloppify.languages._framework.generic_support.core import generic_lang + +generic_lang( + name="scss", + extensions=[".scss", ".sass"], + tools=[ + { + "label": "stylelint", + "cmd": "stylelint '**/*.scss' '**/*.sass' --formatter unix --max-warnings 1000", + "fmt": "gnu", + "id": "stylelint_issue", + "tier": 2, + "fix_cmd": "stylelint --fix '**/*.scss' '**/*.sass'", + }, + ], + exclude=["node_modules", "_output", ".quarto", "vendor"], + detect_markers=["_scss", ".stylelintrc"], + treesitter_spec=None, +) + +__all__ = [ + "generic_lang", +] diff --git a/desloppify/languages/typescript/__init__.py b/desloppify/languages/typescript/__init__.py index aedfcb57a..97a4a86cb 100644 --- a/desloppify/languages/typescript/__init__.py +++ b/desloppify/languages/typescript/__init__.py @@ -15,6 +15,7 @@ LangConfig, LangSecurityResult, ) +from desloppify.languages._framework.frameworks.phases import framework_phases from desloppify.languages._framework.registry.registration import register_full_plugin from desloppify.languages._framework.registry.state import register_lang_hooks from desloppify.languages.typescript import test_coverage as ts_test_coverage_hooks @@ -118,6 +119,7 @@ def __init__(self): detector_phase_signature(), detector_phase_test_coverage(), DetectorPhase("Code smells", phase_smells), + *framework_phases("typescript"), detector_phase_security(), *shared_subjective_duplicates_tail(), ], diff --git a/desloppify/languages/typescript/detectors/unused.py b/desloppify/languages/typescript/detectors/unused.py index fee3d4ab3..16cfca3fb 100644 --- a/desloppify/languages/typescript/detectors/unused.py +++ 
b/desloppify/languages/typescript/detectors/unused.py @@ -46,14 +46,27 @@ def _run_tsc_unused_check( project_root: Path, tsconfig_path: Path, ) -> subprocess.CompletedProcess[str]: - """Run the fixed `npx tsc` unused-symbol check for one project root.""" + """Run the unused-symbol check for one project root. + + Prefers `npx tsc` (project-local), then `node_modules/.bin/tsc`, then `tsc`. + """ npx_path = shutil.which("npx") - if not npx_path: - raise OSError("npx executable not found in PATH") + if npx_path: + cmd = [npx_path, "tsc"] + else: + local_tsc = project_root / "node_modules" / ".bin" / "tsc" + if local_tsc.is_file(): + cmd = [str(local_tsc)] + else: + tsc_path = shutil.which("tsc") + if tsc_path: + cmd = [tsc_path] + else: + raise OSError("TypeScript compiler not found (npx/tsc)") + return _proc_runtime.run( # nosec B603 [ - npx_path, - "tsc", + *cmd, "--project", str(tsconfig_path), "--noEmit", diff --git a/desloppify/languages/typescript/phases_smells.py b/desloppify/languages/typescript/phases_smells.py index e5ea4e741..2d0a4aad5 100644 --- a/desloppify/languages/typescript/phases_smells.py +++ b/desloppify/languages/typescript/phases_smells.py @@ -106,10 +106,12 @@ def phase_smells(path: Path, lang: LangRuntimeContract) -> tuple[list[Issue], di if bool_entries: log(f" react: {len(bool_entries)} boolean state explosions") - return results, { + potentials: dict[str, int] = { "smells": adjust_potential(lang.zone_map, total_smell_files), "react": total_effects, } + return results, potentials + __all__ = ["phase_smells"] diff --git a/desloppify/languages/typescript/tests/test_ts_nextjs_framework.py b/desloppify/languages/typescript/tests/test_ts_nextjs_framework.py new file mode 100644 index 000000000..57d48d5fc --- /dev/null +++ b/desloppify/languages/typescript/tests/test_ts_nextjs_framework.py @@ -0,0 +1,229 @@ +"""Tests for Next.js framework spec integration (TypeScript).""" + +from __future__ import annotations + +from pathlib import Path +from types 
import SimpleNamespace + +import pytest + +from desloppify.engine.planning import scan as plan_scan_mod +from desloppify.languages._framework.frameworks.detection import detect_ecosystem_frameworks +from desloppify.languages._framework.node.frameworks.nextjs.info import ( + nextjs_info_from_evidence, +) +from desloppify.languages._framework.node.frameworks.nextjs.scanners import ( + scan_nextjs_server_modules_in_pages_router, + scan_nextjs_server_navigation_apis_in_client, + scan_nextjs_use_server_in_client, + scan_nextjs_use_server_not_first, +) +from desloppify.languages.framework import make_lang_run +from desloppify.languages.typescript import TypeScriptConfig + + +@pytest.fixture(autouse=True) +def _root(tmp_path, set_project_root): + """Point PROJECT_ROOT at the tmp directory via RuntimeContext.""" + + +def _write(tmp_path: Path, name: str, content: str) -> Path: + p = tmp_path / name + p.parent.mkdir(parents=True, exist_ok=True) + p.write_text(content) + return p + + +class _FakeLang(SimpleNamespace): + zone_map = None + dep_graph = None + file_finder = None + + def __init__(self): + super().__init__(review_cache={}, detector_coverage={}, coverage_warnings=[]) + + +def test_detect_nextjs_present_when_next_dependency_and_app_present(tmp_path: Path): + _write(tmp_path, "package.json", '{"dependencies": {"next": "14.0.0"}}\n') + _write(tmp_path, "app/page.tsx", "export default function Page() { return
}\n") + + detection = detect_ecosystem_frameworks(tmp_path, None, "node") + assert detection.package_root == tmp_path.resolve() + assert detection.package_json_relpath == "package.json" + assert "nextjs" in detection.present + assert "app" in (detection.present["nextjs"].get("marker_dir_hits") or []) + + +def test_detect_nextjs_absent_when_only_app_tree_exists(tmp_path: Path): + _write(tmp_path, "package.json", '{"dependencies": {"react": "18.3.0"}}\n') + _write(tmp_path, "app/page.tsx", "export default function Page() { return
<div />
}\n") + + detection = detect_ecosystem_frameworks(tmp_path, None, "node") + assert "nextjs" not in detection.present + + +def test_detect_nextjs_package_root_for_external_scan_path(tmp_path: Path): + external = tmp_path.parent / f"{tmp_path.name}-external-next" + external.mkdir(parents=True, exist_ok=True) + (external / "package.json").write_text('{"dependencies": {"next": "14.0.0"}}\n') + (external / "app").mkdir(parents=True, exist_ok=True) + (external / "app" / "page.tsx").write_text("export default function Page(){return
<div />
}\n") + + detection = detect_ecosystem_frameworks(external, None, "node") + assert detection.package_root == external.resolve() + assert detection.package_json_relpath is not None + assert detection.package_json_relpath.endswith("package.json") + assert "nextjs" in detection.present + + +def test_use_server_not_first_ignores_nested_inline_actions(tmp_path: Path): + _write(tmp_path, "package.json", '{"dependencies": {"next": "14.0.0"}}\n') + _write( + tmp_path, + "app/inline-action.tsx", + ( + "export default async function Page() {\n" + " async function doAction() {\n" + " 'use server'\n" + " return 1\n" + " }\n" + " return
<div>{String(!!doAction)}</div>
\n" + "}\n" + ), + ) + _write( + tmp_path, + "app/misplaced.ts", + "export const x = 1\n'use server'\nexport async function action(){ return 1 }\n", + ) + + info = nextjs_info_from_evidence( + {"marker_dir_hits": ["app"]}, + package_root=tmp_path.resolve(), + package_json_relpath="package.json", + ) + entries, _ = scan_nextjs_use_server_not_first(tmp_path, info) + files = {entry["file"] for entry in entries} + assert "app/misplaced.ts" in files + assert "app/inline-action.tsx" not in files + + +def test_use_server_in_client_ignores_comments_and_string_literals(tmp_path: Path): + _write(tmp_path, "package.json", '{"dependencies": {"next": "14.0.0"}}\n') + _write( + tmp_path, + "app/page.tsx", + ( + "'use client'\n" + 'console.log("use server")\n' + "// 'use server'\n" + "export default function X(){return null}\n" + ), + ) + + info = nextjs_info_from_evidence( + {"marker_dir_hits": ["app"]}, + package_root=tmp_path.resolve(), + package_json_relpath="package.json", + ) + entries, _ = scan_nextjs_use_server_in_client(tmp_path, info) + assert not entries + + +def test_server_navigation_apis_in_client_only_flags_not_found(tmp_path: Path): + _write(tmp_path, "package.json", '{"dependencies": {"next": "14.0.0"}}\n') + _write( + tmp_path, + "app/client-redirect.tsx", + ( + "'use client'\n" + "import { redirect } from 'next/navigation'\n" + "export default function X(){ redirect('/'); return null }\n" + ), + ) + _write( + tmp_path, + "app/client-notfound.tsx", + ( + "'use client'\n" + "import { notFound } from 'next/navigation'\n" + "export default function X(){ notFound(); return null }\n" + ), + ) + + info = nextjs_info_from_evidence( + {"marker_dir_hits": ["app"]}, + package_root=tmp_path.resolve(), + package_json_relpath="package.json", + ) + entries, _ = scan_nextjs_server_navigation_apis_in_client(tmp_path, info) + files = {entry["file"] for entry in entries} + assert "app/client-notfound.tsx" in files + assert "app/client-redirect.tsx" not in files + + +def 
test_server_modules_in_pages_router_skips_pages_api_routes(tmp_path: Path): + _write(tmp_path, "package.json", '{"dependencies": {"next": "14.0.0"}}\n') + _write( + tmp_path, + "pages/api/edge.ts", + ( + "import { NextResponse } from 'next/server'\n" + "export const config = { runtime: 'edge' }\n" + "export default function handler(){ return NextResponse.json({ ok: true }) }\n" + ), + ) + + info = nextjs_info_from_evidence( + {"marker_dir_hits": ["pages"]}, + package_root=tmp_path.resolve(), + package_json_relpath="package.json", + ) + entries, _ = scan_nextjs_server_modules_in_pages_router(tmp_path, info) + assert not entries + + +def test_typescript_config_includes_nextjs_framework_phases_and_next_lint_is_slow(): + cfg = TypeScriptConfig() + labels = [getattr(p, "label", "") for p in cfg.phases] + assert "Next.js framework smells" in labels + lint = next(p for p in cfg.phases if getattr(p, "label", "") == "next lint") + assert lint.slow is True + + +def test_nextjs_smells_phase_emits_issues_when_next_present(tmp_path: Path): + _write( + tmp_path, + "package.json", + '{"dependencies": {"next": "14.0.0", "react": "18.3.0"}}\n', + ) + _write( + tmp_path, + "app/legacy.tsx", + "import { useRouter } from 'next/router'\nexport default function X(){return null}\n", + ) + _write( + tmp_path, + "app/server-in-client.tsx", + ( + "'use client'\n" + "import { cookies } from 'next/headers'\n" + "import fs from 'node:fs'\n" + "export default function X(){return null}\n" + ), + ) + + cfg = TypeScriptConfig() + phase = next(p for p in cfg.phases if getattr(p, "label", "") == "Next.js framework smells") + issues, potentials = phase.run(tmp_path, _FakeLang()) + assert potentials.get("nextjs", 0) >= 1 + assert any(issue.get("detector") == "nextjs" for issue in issues) + assert any("next_router_in_app_router" in str(issue.get("id", "")) for issue in issues) + + +def test_next_lint_phase_is_skipped_when_include_slow_false(): + run = make_lang_run(TypeScriptConfig()) + selected = 
plan_scan_mod._select_phases(run, include_slow=False, profile="full") + labels = [getattr(p, "label", "") for p in selected] + assert "Next.js framework smells" in labels + assert "next lint" not in labels diff --git a/desloppify/languages/typescript/tests/test_ts_unused.py b/desloppify/languages/typescript/tests/test_ts_unused.py index a954a2da2..d9ba376df 100644 --- a/desloppify/languages/typescript/tests/test_ts_unused.py +++ b/desloppify/languages/typescript/tests/test_ts_unused.py @@ -168,7 +168,7 @@ def _fake_run(*args, **kwargs): def test_run_tsc_unused_check_raises_without_npx(self, tmp_path, monkeypatch): monkeypatch.setattr(ts_unused_mod.shutil, "which", lambda _name: None) - with pytest.raises(OSError, match="npx executable not found"): + with pytest.raises(OSError, match="TypeScript compiler not found"): ts_unused_mod._run_tsc_unused_check(tmp_path, tmp_path / "tsconfig.json") def test_detect_unused_uses_deno_fallback_for_url_imports(self, tmp_path, monkeypatch): @@ -238,6 +238,11 @@ def _fake_run(*args, **kwargs): calls["count"] += 1 return _Result() + monkeypatch.setattr( + ts_unused_mod.shutil, + "which", + lambda name: "/opt/homebrew/bin/npx" if name == "npx" else None, + ) monkeypatch.setattr(ts_unused_mod._proc_runtime, "run", _fake_run) entries, total = detect_unused(tmp_path / "src") assert calls["count"] == 1 @@ -263,6 +268,11 @@ def _fake_run(*args, **kwargs): calls["count"] += 1 return _Result() + monkeypatch.setattr( + ts_unused_mod.shutil, + "which", + lambda name: "/opt/homebrew/bin/npx" if name == "npx" else None, + ) monkeypatch.setattr(ts_unused_mod._proc_runtime, "run", _fake_run) entries, total = detect_unused(tmp_path / "src") assert calls["count"] == 1 diff --git a/desloppify/tests/commands/plan/test_cluster_ops_direct.py b/desloppify/tests/commands/plan/test_cluster_ops_direct.py index 9a7e1e8e6..42ccfc29c 100644 --- a/desloppify/tests/commands/plan/test_cluster_ops_direct.py +++ 
b/desloppify/tests/commands/plan/test_cluster_ops_direct.py @@ -33,7 +33,7 @@ def test_cluster_steps_print_step_variants(capsys) -> None: assert "1. [ ] Structured" in out assert "line one" in out assert "Refs: x, y" in out - assert "(completed)" in out + assert "2. [x] Done step" in out def test_cluster_display_helpers_and_renderers(monkeypatch, capsys) -> None: @@ -71,14 +71,13 @@ def test_cluster_display_helpers_and_renderers(monkeypatch, capsys) -> None: cluster_display_mod._cmd_cluster_show(argparse.Namespace(cluster_name="alpha")) out_show = capsys.readouterr().out assert "Cluster: alpha" in out_show - assert "Members (1)" in out_show - assert "File: src/a.py at lines: 3, 7" in out_show + assert "Members (1): i1" in out_show cluster_display_mod._cmd_cluster_list( argparse.Namespace(verbose=True, missing_steps=False) ) out_list = capsys.readouterr().out - assert "Clusters (2 total" in out_list + assert "2 clusters" in out_list assert "alpha" in out_list assert "beta" in out_list diff --git a/desloppify/tests/commands/review/test_review_batch_execution_phases_direct.py b/desloppify/tests/commands/review/test_review_batch_execution_phases_direct.py index c959f451b..add500463 100644 --- a/desloppify/tests/commands/review/test_review_batch_execution_phases_direct.py +++ b/desloppify/tests/commands/review/test_review_batch_execution_phases_direct.py @@ -222,3 +222,110 @@ def test_merge_and_import_batch_run_calls_all_pipeline_steps() -> None: phases_mod.import_and_finalize = original_import assert calls == ["enforce", "import"] + + +def test_is_partial_batch_retry_detects_subset() -> None: + """Selected indexes that are a strict subset of all batches is a partial retry.""" + ctx = _prepared_context( + batches=[{"dimensions": ["a"]}, {"dimensions": ["b"]}, {"dimensions": ["c"]}], + selected_indexes=[1], + ) + assert phases_mod._is_partial_batch_retry(ctx) is True + + +def test_is_partial_batch_retry_false_for_full_run() -> None: + ctx = _prepared_context( + 
batches=[{"dimensions": ["a"]}, {"dimensions": ["b"]}], + selected_indexes=[0, 1], + ) + assert phases_mod._is_partial_batch_retry(ctx) is False + + +def test_partial_retry_bypasses_coverage_gate() -> None: + """When --only-batches selects a subset, the coverage gate gets allow_partial=True.""" + captured_kwargs: dict = {} + original_merge = phases_mod.merge_and_write_results + original_enforce = phases_mod.enforce_import_coverage + original_import = phases_mod.import_and_finalize + + # Return missing dims so the gate would normally block. + phases_mod.merge_and_write_results = lambda **_k: (Path("merged.json"), ["missing_dim"]) + + def capture_enforce(**kwargs): + captured_kwargs.update(kwargs) + + phases_mod.enforce_import_coverage = capture_enforce + phases_mod.import_and_finalize = lambda **_k: None + try: + # 3 batches but only batch 1 selected => partial retry + phases_mod.merge_and_import_batch_run( + prepared=_prepared_context( + allow_partial=False, + batches=[{"dimensions": ["a"]}, {"dimensions": ["b"]}, {"dimensions": ["c"]}], + selected_indexes=[1], + append_run_log=lambda *_a, **_k: None, + args=SimpleNamespace(), + ), + executed=_executed_context(), + state_file=Path("state.json"), + deps=SimpleNamespace( + merge_batch_results_fn=lambda *_a, **_k: {"issues": []}, + build_import_provenance_fn=lambda **_k: {}, + safe_write_text_fn=lambda *_a, **_k: None, + colorize_fn=lambda text, _tone=None: text, + do_import_fn=lambda *_a, **_k: None, + run_followup_scan_fn=lambda **_k: 0, + ), + ) + finally: + phases_mod.merge_and_write_results = original_merge + phases_mod.enforce_import_coverage = original_enforce + phases_mod.import_and_finalize = original_import + + # The gate should have been called with allow_partial=True despite + # the prepared context having allow_partial=False. 
+ assert captured_kwargs["allow_partial"] is True + + +def test_full_run_does_not_bypass_coverage_gate() -> None: + """A full run (all batches selected) should NOT override allow_partial.""" + captured_kwargs: dict = {} + original_merge = phases_mod.merge_and_write_results + original_enforce = phases_mod.enforce_import_coverage + original_import = phases_mod.import_and_finalize + + phases_mod.merge_and_write_results = lambda **_k: (Path("merged.json"), ["missing_dim"]) + + def capture_enforce(**kwargs): + captured_kwargs.update(kwargs) + + phases_mod.enforce_import_coverage = capture_enforce + phases_mod.import_and_finalize = lambda **_k: None + try: + # All 2 batches selected => full run + phases_mod.merge_and_import_batch_run( + prepared=_prepared_context( + allow_partial=False, + batches=[{"dimensions": ["a"]}, {"dimensions": ["b"]}], + selected_indexes=[0, 1], + append_run_log=lambda *_a, **_k: None, + args=SimpleNamespace(), + ), + executed=_executed_context(), + state_file=Path("state.json"), + deps=SimpleNamespace( + merge_batch_results_fn=lambda *_a, **_k: {"issues": []}, + build_import_provenance_fn=lambda **_k: {}, + safe_write_text_fn=lambda *_a, **_k: None, + colorize_fn=lambda text, _tone=None: text, + do_import_fn=lambda *_a, **_k: None, + run_followup_scan_fn=lambda **_k: 0, + ), + ) + finally: + phases_mod.merge_and_write_results = original_merge + phases_mod.enforce_import_coverage = original_enforce + phases_mod.import_and_finalize = original_import + + # Full run should preserve the original allow_partial=False. 
+ assert captured_kwargs["allow_partial"] is False diff --git a/desloppify/tests/commands/scan/test_cmd_scan.py b/desloppify/tests/commands/scan/test_cmd_scan.py index da2499273..03960276b 100644 --- a/desloppify/tests/commands/scan/test_cmd_scan.py +++ b/desloppify/tests/commands/scan/test_cmd_scan.py @@ -527,6 +527,23 @@ def file_finder(self, path): assert result["total_loc"] == 6 # 2 + 1 + 3 assert result["total_directories"] == 2 # tmp_path and sub + def test_uses_precomputed_file_list_when_provided(self, tmp_path): + file_path = tmp_path / "a.py" + file_path.write_text("line1\nline2\n") + + class FakeLang: + def file_finder(self, _path): + raise AssertionError("file_finder should not run when files are provided") + + result = collect_codebase_metrics( + FakeLang(), + tmp_path, + files=[str(file_path)], + ) + assert result is not None + assert result["total_files"] == 1 + assert result["total_loc"] == 2 + # --------------------------------------------------------------------------- # warn_explicit_lang_with_no_files @@ -581,4 +598,3 @@ class FakeLang: # show_post_scan_analysis # --------------------------------------------------------------------------- - diff --git a/desloppify/tests/commands/scan/test_plan_reconcile.py b/desloppify/tests/commands/scan/test_plan_reconcile.py index 617cd1f31..a3ab7b5d7 100644 --- a/desloppify/tests/commands/scan/test_plan_reconcile.py +++ b/desloppify/tests/commands/scan/test_plan_reconcile.py @@ -19,11 +19,12 @@ # Helpers # --------------------------------------------------------------------------- -def _runtime(*, state=None, config=None) -> SimpleNamespace: +def _runtime(*, state=None, config=None, force_rescan=False) -> SimpleNamespace: return SimpleNamespace( state=state or {}, state_path=Path("/tmp/fake-state.json"), config=config or {}, + force_rescan=force_rescan, ) diff --git a/desloppify/tests/commands/scan/test_plan_reconcile_postflight_and_reconcile.py 
b/desloppify/tests/commands/scan/test_plan_reconcile_postflight_and_reconcile.py index e5dd2e256..9960a3813 100644 --- a/desloppify/tests/commands/scan/test_plan_reconcile_postflight_and_reconcile.py +++ b/desloppify/tests/commands/scan/test_plan_reconcile_postflight_and_reconcile.py @@ -215,6 +215,22 @@ def test_marks_postflight_scan_on_empty_plan(self, monkeypatch): assert isinstance(saved[0]["plan_start_scores"].get("strict"), float) assert saved[0]["refresh_state"]["postflight_scan_completed_at_scan_count"] == 1 + def test_force_rescan_marks_postflight_scan_complete(self, monkeypatch): + plan = empty_plan() + plan["queue_order"] = ["workflow::run-scan"] + plan["plan_start_scores"] = {"strict": 86.4} + state = _make_state(scan_count=5) + + saved: list[dict] = [] + monkeypatch.setattr(reconcile_mod, "load_plan", lambda _path=None: plan) + monkeypatch.setattr(reconcile_mod, "save_plan", lambda p, _path=None: saved.append(p)) + + reconcile_mod.reconcile_plan_post_scan(_runtime(state=state, force_rescan=True)) + + assert len(saved) == 1 + assert saved[0]["plan_start_scores"] == {"strict": 86.4} + assert saved[0]["refresh_state"]["postflight_scan_completed_at_scan_count"] == 5 + def test_superseded_issue_removed_from_clusters(self, monkeypatch): plan = empty_plan() plan["queue_order"] = ["issue-1", "issue-2"] diff --git a/desloppify/tests/commands/scan/test_scan_orchestrator_direct.py b/desloppify/tests/commands/scan/test_scan_orchestrator_direct.py index 3a4e3d094..267a3f2fa 100644 --- a/desloppify/tests/commands/scan/test_scan_orchestrator_direct.py +++ b/desloppify/tests/commands/scan/test_scan_orchestrator_direct.py @@ -83,14 +83,24 @@ def test_run_scan_generation_uses_planning_scan_surface(monkeypatch) -> None: ([{"id": "open-1"}], {"smells": 1}), )[1], ) - monkeypatch.setattr(scan_workflow_mod, "collect_codebase_metrics", lambda _lang, _path: {"loc": 10}) + monkeypatch.setattr( + scan_workflow_mod, + "collect_codebase_metrics", + lambda _lang, _path, **_kwargs: 
( + calls.setdefault("metrics_kwargs", _kwargs), + {"loc": 10}, + )[1], + ) monkeypatch.setattr(scan_workflow_mod, "warn_explicit_lang_with_no_files", lambda *_a, **_k: None) monkeypatch.setattr(scan_workflow_mod, "get_exclusions", lambda: []) monkeypatch.setattr(scan_workflow_mod, "_augment_stale_wontfix_impl", lambda issues, **_k: (issues, 0)) runtime = SimpleNamespace( path=".", - lang=SimpleNamespace(file_finder=None), + lang=SimpleNamespace( + file_finder=None, + zone_map=SimpleNamespace(all_files=lambda: ["src/a.py"]), + ), effective_include_slow=True, zone_overrides={"src": "prod"}, profile="full", @@ -110,6 +120,7 @@ def test_run_scan_generation_uses_planning_scan_surface(monkeypatch) -> None: assert calls["generate"][2].include_slow is True assert calls["generate"][2].zone_overrides == {"src": "prod"} assert calls["generate"][2].profile == "full" + assert calls["metrics_kwargs"] == {"files": ["src/a.py"]} assert calls["file_cache_on"] is True assert calls["file_cache_off"] is True assert calls["parse_cache_on"] is True diff --git a/desloppify/tests/commands/scan/test_scan_preflight.py b/desloppify/tests/commands/scan/test_scan_preflight.py index d1fa4c6d9..91bdaf2bb 100644 --- a/desloppify/tests/commands/scan/test_scan_preflight.py +++ b/desloppify/tests/commands/scan/test_scan_preflight.py @@ -76,6 +76,34 @@ def test_queue_clear_allows_scan(): scan_queue_preflight(args) +def test_queue_drained_with_non_scan_lifecycle_allows_scan(): + """When queue is fully drained but lifecycle phase hasn't advanced to scan, + scan should still be allowed. 
Regression test for #441.""" + from desloppify.app.commands.helpers.queue_progress import QueueBreakdown + + args = SimpleNamespace(profile=None, force_rescan=False, state=None, lang="python") + plan = {"plan_start_scores": {"strict": 80.0}} + # lifecycle_phase stuck on "review" even though queue_total is 0 + breakdown = QueueBreakdown(queue_total=0, workflow=0, lifecycle_phase="review") + with ( + patch( + "desloppify.app.commands.scan.preflight.resolve_plan_load_status", + return_value=_plan_status(plan), + ), + patch( + "desloppify.app.commands.scan.preflight.state_path", + return_value="/tmp/test-state.json", + ), + patch("desloppify.app.commands.scan.preflight.state_mod") as mock_state_mod, + patch( + "desloppify.app.commands.scan.preflight.plan_aware_queue_breakdown", + return_value=breakdown, + ), + ): + mock_state_mod.load_state.return_value = {"issues": {}} + scan_queue_preflight(args) + + # ── Queue remaining = gate ────────────────────────────────── diff --git a/desloppify/tests/commands/test_queue_progress.py b/desloppify/tests/commands/test_queue_progress.py index e7cc467b0..7c9a818b0 100644 --- a/desloppify/tests/commands/test_queue_progress.py +++ b/desloppify/tests/commands/test_queue_progress.py @@ -162,6 +162,13 @@ def test_score_display_mode_live_when_queue_empty(): assert score_display_mode(b, 80.0) is ScoreDisplayMode.LIVE +@pytest.mark.parametrize("phase", ["review", "execute", "workflow"]) +def test_score_display_mode_live_when_queue_drained_any_lifecycle(phase: str): + """queue_total=0 always returns LIVE regardless of lifecycle phase (#441).""" + b = QueueBreakdown(queue_total=0, lifecycle_phase=phase) + assert score_display_mode(b, 80.0) is ScoreDisplayMode.LIVE + + # ── format_queue_headline ──────────────────────────────────── diff --git a/desloppify/tests/commands/test_review_coordinator.py b/desloppify/tests/commands/test_review_coordinator.py new file mode 100644 index 000000000..ffeca3841 --- /dev/null +++ 
b/desloppify/tests/commands/test_review_coordinator.py @@ -0,0 +1,33 @@ +"""Tests for review coordinator baseline helpers.""" + +from __future__ import annotations + +from pathlib import Path +from types import SimpleNamespace + +from desloppify.app.commands.review import coordinator as mod + + +def test_git_baseline_returns_none_tuple_when_status_raises_oserror(): + calls: list[list[str]] = [] + + def _run(command, **_kwargs): + calls.append(command) + if command[-2:] == ["rev-parse", "HEAD"]: + return SimpleNamespace(returncode=0, stdout="abc123\n") + raise OSError("git unavailable") + + assert mod.git_baseline(Path("/tmp/project"), subprocess_run=_run) == (None, None) + assert len(calls) == 2 + + +def test_git_baseline_hashes_status_output_when_both_commands_succeed(): + def _run(command, **_kwargs): + if command[-2:] == ["rev-parse", "HEAD"]: + return SimpleNamespace(returncode=0, stdout="abc123\n") + return SimpleNamespace(returncode=0, stdout=" M foo.py\n?? bar.py\n") + + head, status_hash = mod.git_baseline(Path("/tmp/project"), subprocess_run=_run) + + assert head == "abc123" + assert status_hash == mod._stable_json_sha256(" M foo.py\n?? 
bar.py\n") diff --git a/desloppify/tests/commands/test_runner_modules_direct.py b/desloppify/tests/commands/test_runner_modules_direct.py index c6c65fa98..1d383306b 100644 --- a/desloppify/tests/commands/test_runner_modules_direct.py +++ b/desloppify/tests/commands/test_runner_modules_direct.py @@ -9,6 +9,57 @@ import desloppify.app.commands.runner.run_logs as run_logs_mod +def test_wrap_cmd_c_collapses_arguments_into_single_string() -> None: + """_wrap_cmd_c should join everything after /c into one quoted string.""" + wrap = codex_batch_mod._wrap_cmd_c + + # cmd /c with a path containing spaces — arguments are collapsed + cmd = ["cmd", "/c", "C:\\Program Files\\codex.cmd", "exec", "-C", "C:\\my project - Copy"] + result = wrap(cmd) + assert result[:2] == ["cmd", "/c"] + assert len(result) == 3 # exactly three elements + inner = result[2] + # The inner string should contain the quoted path + assert '"C:\\my project - Copy"' in inner + assert "exec" in inner + assert '"C:\\Program Files\\codex.cmd"' in inner + + # Non-cmd command — returned unchanged + assert wrap(["codex", "exec", "-C", "path"]) == ["codex", "exec", "-C", "path"] + + # cmd /c with simple paths (no spaces) — still collapses, no quotes needed + simple = wrap(["cmd", "/c", "codex", "exec", "-C", "repo"]) + assert len(simple) == 3 + assert simple[2] == "codex exec -C repo" + + +def test_codex_batch_command_on_windows_collapses_cmd_c(monkeypatch, tmp_path: Path) -> None: + """On Windows with a .cmd wrapper, paths with spaces must be quoted inside a single /c arg.""" + monkeypatch.setattr("sys.platform", "win32") + monkeypatch.setattr( + "shutil.which", + lambda _name: "C:\\Program Files\\npm\\codex.CMD", + ) + repo = tmp_path / "core_project - Copy" + repo.mkdir() + output = repo / ".desloppify" / "out.json" + + cmd = codex_batch_mod.codex_batch_command( + prompt="review prompt", + repo_root=repo, + output_file=output, + ) + # Should be exactly ["cmd", "/c", ""] + assert cmd[0] == "cmd" + assert cmd[1] 
== "/c" + assert len(cmd) == 3 + inner = cmd[2] + # The repo path with spaces must be quoted + assert f'"{repo}"' in inner or f'"{str(repo)}"' in inner + assert "exec" in inner + assert "--ephemeral" in inner + + def test_codex_batch_command_uses_sanitized_reasoning_effort(monkeypatch, tmp_path: Path) -> None: monkeypatch.setenv("DESLOPPIFY_CODEX_REASONING_EFFORT", "HIGH") diff --git a/desloppify/tests/commands/test_transition_messages.py b/desloppify/tests/commands/test_transition_messages.py new file mode 100644 index 000000000..088198b2a --- /dev/null +++ b/desloppify/tests/commands/test_transition_messages.py @@ -0,0 +1,100 @@ +"""Tests for lifecycle transition messages.""" + +from __future__ import annotations + +import pytest + +from desloppify.app.commands.helpers import transition_messages as mod + + +@pytest.fixture() +def _config_with_messages(monkeypatch): + """Patch load_config to return transition messages.""" + def _make(messages: dict): + monkeypatch.setattr(mod, "load_config", lambda: {"transition_messages": messages}) + return _make + + +def test_emit_exact_phase_match(_config_with_messages, capsys): + _config_with_messages({"execute": "Switch to Sonnet for speed."}) + assert mod.emit_transition_message("execute") is True + assert "Switch to Sonnet for speed." in capsys.readouterr().out + + +def test_emit_coarse_fallback(_config_with_messages, capsys): + _config_with_messages({"review": "Use blind packet."}) + assert mod.emit_transition_message("review_initial") is True + assert "Use blind packet." in capsys.readouterr().out + + +def test_exact_phase_takes_priority_over_coarse(_config_with_messages, capsys): + _config_with_messages({ + "review_initial": "Exact message.", + "review": "Coarse message.", + }) + assert mod.emit_transition_message("review_initial") is True + out = capsys.readouterr().out + assert "Exact message." in out + assert "Coarse message." 
not in out + + +def test_no_message_configured(_config_with_messages, capsys): + _config_with_messages({}) + assert mod.emit_transition_message("execute") is False + assert capsys.readouterr().out == "" + + +def test_no_transition_messages_key(monkeypatch, capsys): + monkeypatch.setattr(mod, "load_config", lambda: {}) + assert mod.emit_transition_message("execute") is False + assert capsys.readouterr().out == "" + + +def test_empty_string_message_skipped(_config_with_messages, capsys): + _config_with_messages({"execute": " "}) + assert mod.emit_transition_message("execute") is False + assert capsys.readouterr().out == "" + + +def test_non_string_message_skipped(_config_with_messages, capsys): + _config_with_messages({"execute": 42}) + assert mod.emit_transition_message("execute") is False + assert capsys.readouterr().out == "" + + +def test_postflight_fallback_for_review_phase(_config_with_messages, capsys): + _config_with_messages({"postflight": "Summarise what you fixed."}) + assert mod.emit_transition_message("review_initial") is True + assert "Summarise what you fixed." in capsys.readouterr().out + + +def test_postflight_fallback_for_triage_phase(_config_with_messages, capsys): + _config_with_messages({"postflight": "Stop and review."}) + assert mod.emit_transition_message("triage_postflight") is True + assert "Stop and review." 
in capsys.readouterr().out + + +def test_postflight_does_not_fire_for_execute(_config_with_messages, capsys): + _config_with_messages({"postflight": "Should not appear."}) + assert mod.emit_transition_message("execute") is False + assert capsys.readouterr().out == "" + + +def test_postflight_does_not_fire_for_scan(_config_with_messages, capsys): + _config_with_messages({"postflight": "Should not appear."}) + assert mod.emit_transition_message("scan") is False + assert capsys.readouterr().out == "" + + +def test_coarse_takes_priority_over_postflight(_config_with_messages, capsys): + _config_with_messages({"review": "Specific.", "postflight": "Generic."}) + assert mod.emit_transition_message("review_initial") is True + out = capsys.readouterr().out + assert "Specific." in out + assert "Generic." not in out + + +def test_config_load_failure_returns_false(monkeypatch, capsys): + monkeypatch.setattr(mod, "load_config", lambda: (_ for _ in ()).throw(OSError("nope"))) + assert mod.emit_transition_message("execute") is False + assert capsys.readouterr().out == "" diff --git a/desloppify/tests/detectors/test_dupes.py b/desloppify/tests/detectors/test_dupes.py index ff91c2645..6c4f06be6 100644 --- a/desloppify/tests/detectors/test_dupes.py +++ b/desloppify/tests/detectors/test_dupes.py @@ -2,6 +2,7 @@ import hashlib +import desloppify.engine.detectors.dupes as dupes_mod from desloppify.engine.detectors.base import FunctionInfo from desloppify.engine.detectors.dupes import detect_duplicates @@ -231,3 +232,68 @@ def test_same_file_same_name_pairs_do_not_collapse(self): assert total == 4 assert len(entries) == 2 assert all(entry["kind"] == "exact" for entry in entries) + + def test_near_duplicate_cache_reuses_pairs_without_matcher(self, monkeypatch): + base_lines = [f" result = compute_value_{i}(x, y, z)" for i in range(20)] + body_a = "\n".join(base_lines) + changed = base_lines.copy() + changed[-1] = " result = compute_value_19(x, y, w)" + body_b = "\n".join(changed) + fns = 
[ + _make_fn("foo", "a.py", body_a, loc=20), + _make_fn("bar", "b.py", body_b, loc=20), + ] + cache: dict[str, object] = {} + first_entries, total = detect_duplicates(fns, threshold=0.8, cache=cache) + assert total == 2 + assert len(first_entries) == 1 + assert isinstance(cache.get("near_pairs"), list) + assert cache.get("near_pairs") + + class _NoMatcher: + def __init__(self, *_args, **_kwargs): + raise AssertionError("near matcher should not run for unchanged cached pairs") + + monkeypatch.setattr( + "desloppify.engine.detectors.dupes.difflib.SequenceMatcher", + _NoMatcher, + ) + second_entries, second_total = detect_duplicates( + fns, + threshold=0.8, + cache=cache, + ) + assert second_total == 2 + assert len(second_entries) == 1 + assert second_entries[0]["kind"] == "near-duplicate" + + def test_cache_threshold_mismatch_falls_back_to_full_near_pass(self, monkeypatch): + base_lines = [f" result = compute_value_{i}(x, y, z)" for i in range(20)] + body_a = "\n".join(base_lines) + changed = base_lines.copy() + changed[-1] = " result = compute_value_19(x, y, w)" + body_b = "\n".join(changed) + fns = [ + _make_fn("foo", "a.py", body_a, loc=20), + _make_fn("bar", "b.py", body_b, loc=20), + ] + cache: dict[str, object] = {} + detect_duplicates(fns, threshold=0.8, cache=cache) + + real_matcher = dupes_mod.difflib.SequenceMatcher + calls = {"count": 0} + + class _CountingMatcher(real_matcher): + def __init__(self, *args, **kwargs): + calls["count"] += 1 + super().__init__(*args, **kwargs) + + monkeypatch.setattr( + "desloppify.engine.detectors.dupes.difflib.SequenceMatcher", + _CountingMatcher, + ) + entries, total = detect_duplicates(fns, threshold=0.95, cache=cache) + assert total == 2 + assert calls["count"] > 0 + assert len(entries) == 1 + assert cache["threshold"] == 0.95 diff --git a/desloppify/tests/detectors/test_next_lint.py b/desloppify/tests/detectors/test_next_lint.py new file mode 100644 index 000000000..de64a6af8 --- /dev/null +++ 
b/desloppify/tests/detectors/test_next_lint.py @@ -0,0 +1,186 @@ +"""Tests for Next.js `next lint` parser + tool-phase integration.""" + +from __future__ import annotations + +import json +import subprocess # nosec B404 +from pathlib import Path +from types import SimpleNamespace + +import pytest + +from desloppify.languages._framework.generic_parts.parsers import ( + ToolParserError, + parse_next_lint, +) +from desloppify.languages._framework.generic_support.core import make_tool_phase +from desloppify.languages._framework.generic_parts import tool_runner as tool_runner_mod + + +def test_parse_next_lint_aggregates_per_file_and_relativizes_paths(monkeypatch, tmp_path): + monkeypatch.chdir(tmp_path) + scan_path = tmp_path / "apps" / "web" + scan_path.mkdir(parents=True, exist_ok=True) + + payload = [ + { + "filePath": str(tmp_path / "app" / "page.tsx"), + "messages": [ + { + "line": 10, + "column": 2, + "ruleId": "rule-a", + "message": "Bad thing", + "severity": 2, + }, + { + "line": 11, + "column": 1, + "ruleId": "rule-b", + "message": "Another thing", + "severity": 1, + }, + ], + }, + { + "filePath": "relative.js", + "messages": [{"line": 0, "message": "Line defaults to 1", "severity": 1}], + }, + {"filePath": "empty.js", "messages": []}, + ] + + raw = "eslint noise\n" + json.dumps(payload) + "\nmore noise" + entries, meta = parse_next_lint(raw, scan_path) + assert meta == {"potential": 3} + assert len(entries) == 2 + + first = next(e for e in entries if e["file"] == "app/page.tsx") + assert first["line"] == 10 + assert first["id"] == "lint" + assert first["message"].startswith("next lint: Bad thing") + assert first["detail"]["count"] == 2 + assert len(first["detail"]["messages"]) == 2 + + second = next(e for e in entries if e["file"] == "apps/web/relative.js") + assert second["line"] == 1 + assert second["id"] == "lint" + assert second["message"].startswith("next lint: Line defaults to 1") + assert second["detail"]["count"] == 1 + +def 
test_parse_next_lint_raises_on_missing_json_array(tmp_path): + with pytest.raises(ToolParserError): + parse_next_lint("not json output", tmp_path) + + +def test_next_lint_tool_phase_emits_issues_and_potential(monkeypatch, tmp_path): + monkeypatch.chdir(tmp_path) + scan_path = tmp_path / "apps" / "web" + scan_path.mkdir(parents=True, exist_ok=True) + + payload = [ + {"filePath": "a.js", "messages": [{"line": 3, "message": "x", "severity": 1}]}, + {"filePath": "b.js", "messages": []}, + ] + output = json.dumps(payload) + + def fake_run(argv, *, shell, cwd, capture_output, text, timeout): + assert shell is False + assert capture_output is True + assert text is True + assert timeout == 120 + assert Path(cwd).resolve() == scan_path.resolve() + return subprocess.CompletedProcess(argv, 0, stdout=output, stderr="") + + monkeypatch.setattr(tool_runner_mod.subprocess, "run", fake_run) + + phase = make_tool_phase( + "next lint", + "npx --no-install next lint --format json", + "next_lint", + "next_lint", + 2, + ) + lang = SimpleNamespace(detector_coverage={}, coverage_warnings=[]) + + issues, signals = phase.run(scan_path, lang) + assert signals == {"next_lint": 2} + assert len(issues) == 1 + assert issues[0]["detector"] == "next_lint" + assert issues[0]["file"] == "apps/web/a.js" + assert issues[0]["tier"] == 2 + assert issues[0]["detail"]["count"] == 1 + + +def test_next_lint_tool_phase_reports_potential_when_clean(monkeypatch, tmp_path): + monkeypatch.chdir(tmp_path) + scan_path = tmp_path / "apps" / "web" + scan_path.mkdir(parents=True, exist_ok=True) + + payload = [{"filePath": "a.js", "messages": []}] + output = json.dumps(payload) + + def fake_run(argv, *, shell, cwd, capture_output, text, timeout): + return subprocess.CompletedProcess(argv, 0, stdout=output, stderr="") + + monkeypatch.setattr(tool_runner_mod.subprocess, "run", fake_run) + + phase = make_tool_phase( + "next lint", + "npx --no-install next lint --format json", + "next_lint", + "next_lint", + 2, + ) + 
lang = SimpleNamespace(detector_coverage={}, coverage_warnings=[]) + + issues, signals = phase.run(scan_path, lang) + assert issues == [] + assert signals == {"next_lint": 1} + + +def test_next_lint_tool_phase_records_coverage_warning_on_tool_missing(monkeypatch, tmp_path): + monkeypatch.chdir(tmp_path) + + def fake_run(*_args, **_kwargs): + raise FileNotFoundError("missing tool") + + monkeypatch.setattr(tool_runner_mod.subprocess, "run", fake_run) + + phase = make_tool_phase( + "next lint", + "npx --no-install next lint --format json", + "next_lint", + "next_lint", + 2, + ) + lang = SimpleNamespace(detector_coverage={}, coverage_warnings=[]) + + issues, signals = phase.run(tmp_path, lang) + assert issues == [] + assert signals == {} + assert lang.detector_coverage["next_lint"]["reason"] == "tool_not_found" + assert lang.coverage_warnings and lang.coverage_warnings[0]["detector"] == "next_lint" + + +def test_next_lint_tool_phase_records_coverage_warning_on_parser_error(monkeypatch, tmp_path): + monkeypatch.chdir(tmp_path) + + def fake_run(argv, *, shell, cwd, capture_output, text, timeout): + return subprocess.CompletedProcess(argv, 0, stdout="not json", stderr="") + + monkeypatch.setattr(tool_runner_mod.subprocess, "run", fake_run) + + phase = make_tool_phase( + "next lint", + "npx --no-install next lint --format json", + "next_lint", + "next_lint", + 2, + ) + lang = SimpleNamespace(detector_coverage={}, coverage_warnings=[]) + + issues, signals = phase.run(tmp_path, lang) + assert issues == [] + assert signals == {} + assert lang.detector_coverage["next_lint"]["reason"] == "parser_error" + assert lang.coverage_warnings and lang.coverage_warnings[0]["detector"] == "next_lint" diff --git a/desloppify/tests/intelligence/test_review_import_prepare_split_direct.py b/desloppify/tests/intelligence/test_review_import_prepare_split_direct.py index f32ff6a67..088f6b52e 100644 --- a/desloppify/tests/intelligence/test_review_import_prepare_split_direct.py +++ 
b/desloppify/tests/intelligence/test_review_import_prepare_split_direct.py @@ -296,6 +296,78 @@ def test_authorization_collector_includes_with_auth_siblings_same_directory() -> ] +def test_prepare_holistic_payload_compacts_batch_dimension_contexts() -> None: + deps = orchestration_mod.HolisticPrepareDependencies( + is_file_cache_enabled_fn=lambda: False, + enable_file_cache_fn=lambda: None, + disable_file_cache_fn=lambda: None, + build_holistic_context_fn=lambda *_args, **_kwargs: {"codebase_stats": {"total_files": 1}}, + build_review_context_fn=lambda *_args, **_kwargs: SimpleNamespace(), + load_dimensions_for_lang_fn=lambda _name: ( + ["naming_quality"], + {"naming_quality": {"prompt": "Assess naming"}}, + "sys", + ), + resolve_dimensions_fn=lambda cli_dimensions, default_dimensions: cli_dimensions or default_dimensions, + get_lang_guidance_fn=lambda _name: "guide", + assemble_holistic_batches_fn=lambda *_args, **_kwargs: [ + { + "name": "Naming", + "dimensions": ["naming_quality"], + "files_to_read": ["src/a.py"], + "why": "seed", + } + ], + holistic_batch_deps=holistic_batches_mod.HolisticBatchAssemblyDependencies( + build_investigation_batches_fn=lambda *_args, **_kwargs: [], + batch_concerns_fn=lambda *_args, **_kwargs: None, + filter_batches_to_dimensions_fn=lambda batches, _dims, **_kwargs: batches, + append_full_sweep_batch_fn=lambda **_kwargs: None, + log_best_effort_failure_fn=lambda *_args, **_kwargs: None, + logger=object(), + ), + serialize_context_fn=lambda _ctx: {}, + ) + payload = orchestration_mod.prepare_holistic_review_payload( + Path("."), + SimpleNamespace(name="python", file_finder=lambda _path: ["src/a.py"], zone_map=None), + state={ + "dimension_contexts": { + "naming_quality": { + "insights": [ + { + "header": "Names map to command intent", + "description": "Full rationale should remain packet-level only.", + "settled": True, + "positive": True, + } + ] + } + } + }, + options=SimpleNamespace( + files=["src/a.py"], + 
dimensions=["naming_quality"], + include_full_sweep=False, + max_files_per_batch=10, + include_issue_history=False, + issue_history_max_issues=10, + issue_history_max_batch_items=5, + ), + deps=deps, + ) + + assert payload["dimension_contexts"]["naming_quality"]["insights"][0]["description"] + batch_ctx = payload["investigation_batches"][0]["dimension_contexts"]["naming_quality"] + assert batch_ctx["insights"] == [ + { + "header": "Names map to command intent", + "settled": True, + "positive": True, + } + ] + + def test_holistic_batch_assembly_skips_concerns_for_inactive_dimension() -> None: deps = holistic_batches_mod.HolisticBatchAssemblyDependencies( build_investigation_batches_fn=lambda *_args, **_kwargs: [ diff --git a/desloppify/tests/lang/common/test_framework_shared_phases_and_structural_split_direct.py b/desloppify/tests/lang/common/test_framework_shared_phases_and_structural_split_direct.py index 336448758..5b2bc4cf5 100644 --- a/desloppify/tests/lang/common/test_framework_shared_phases_and_structural_split_direct.py +++ b/desloppify/tests/lang/common/test_framework_shared_phases_and_structural_split_direct.py @@ -2,6 +2,7 @@ from __future__ import annotations +import concurrent.futures from pathlib import Path from types import SimpleNamespace @@ -9,6 +10,7 @@ import desloppify.languages._framework.base.shared_phases_structural as structural_mod import desloppify.languages._framework.generic_support.structural as generic_structural_mod from desloppify.engine.policy.zones import Zone +from desloppify.languages._framework.base.types import LangSecurityResult def test_phase_dupes_filters_non_production_functions(monkeypatch) -> None: @@ -21,12 +23,16 @@ def test_phase_dupes_filters_non_production_functions(monkeypatch) -> None: zone_map=SimpleNamespace( get=lambda file_path: Zone.TEST if "tests/" in str(file_path) else Zone.PRODUCTION ), + review_cache={}, ) captured: dict[str, int] = {} - def _fake_detect(filtered): + def _fake_detect(filtered, **kwargs): 
captured["count"] = len(filtered) + cache_payload = kwargs.get("cache") + assert isinstance(cache_payload, dict) + captured["cache_size"] = len(cache_payload) return [{"id": "pair"}], len(filtered) monkeypatch.setattr(review_mod, "detect_duplicates", _fake_detect) @@ -36,6 +42,8 @@ def _fake_detect(filtered): assert len(issues) == 1 assert captured["count"] == 1 + assert "detectors" in lang.review_cache + assert "dupes" in lang.review_cache["detectors"] assert potentials == {"dupes": 1} @@ -68,6 +76,96 @@ def test_phase_boilerplate_duplication_handles_none_and_entries(monkeypatch) -> assert potentials == {"boilerplate_duplication": 2} +def test_phase_boilerplate_duplication_reuses_cached_entries(monkeypatch, tmp_path) -> None: + (tmp_path / "src").mkdir() + (tmp_path / "src" / "a.py").write_text("print('a')\n") + + calls = {"count": 0} + entries = [ + { + "id": "cluster-1", + "distinct_files": 2, + "window_size": 6, + "sample": ["x = 1"], + "locations": [ + {"file": "src/a.py", "line": 1}, + {"file": "src/b.py", "line": 2}, + ], + } + ] + lang = SimpleNamespace( + zone_map=None, + name="python", + review_cache={}, + file_finder=lambda _path: ["src/a.py"], + ) + + def _fake_detect(_path): + calls["count"] += 1 + return entries + + monkeypatch.setattr(review_mod, "detect_with_jscpd", _fake_detect) + monkeypatch.setattr(review_mod, "_filter_boilerplate_entries_by_zone", lambda items, _zone: items) + + first_issues, first_potentials = review_mod.phase_boilerplate_duplication(tmp_path, lang) + second_issues, second_potentials = review_mod.phase_boilerplate_duplication(tmp_path, lang) + + assert calls["count"] == 1 + assert len(first_issues) == 1 + assert len(second_issues) == 1 + assert first_potentials == {"boilerplate_duplication": 2} + assert second_potentials == {"boilerplate_duplication": 2} + + +def test_phase_boilerplate_duplication_uses_prefetched_result(monkeypatch, tmp_path) -> None: + class _ImmediateExecutor: + def submit(self, fn, *args, **kwargs): + 
future: concurrent.futures.Future = concurrent.futures.Future() + future.set_result(fn(*args, **kwargs)) + return future + + (tmp_path / "src").mkdir() + (tmp_path / "src" / "a.py").write_text("print('a')\n") + calls = {"count": 0} + entries = [ + { + "id": "cluster-1", + "distinct_files": 2, + "window_size": 6, + "sample": ["x = 1"], + "locations": [ + {"file": "src/a.py", "line": 1}, + {"file": "src/b.py", "line": 2}, + ], + } + ] + lang = SimpleNamespace( + zone_map=None, + name="python", + review_cache={}, + file_finder=lambda _path: ["src/a.py"], + ) + + monkeypatch.setattr(review_mod, "_PREFETCH_EXECUTOR", _ImmediateExecutor()) + monkeypatch.setattr( + review_mod, + "detect_with_jscpd", + lambda _path: (calls.__setitem__("count", calls["count"] + 1), entries)[1], + ) + monkeypatch.setattr(review_mod, "_filter_boilerplate_entries_by_zone", lambda items, _zone: items) + + review_mod.prewarm_review_phase_detectors( + tmp_path, + lang, + [SimpleNamespace(label="Boilerplate duplication", run=review_mod.phase_boilerplate_duplication)], + ) + issues, potentials = review_mod.phase_boilerplate_duplication(tmp_path, lang) + + assert calls["count"] == 1 + assert len(issues) == 1 + assert potentials == {"boilerplate_duplication": 2} + + def test_phase_security_records_default_coverage_when_missing(monkeypatch) -> None: lang = SimpleNamespace( zone_map=None, @@ -119,6 +217,153 @@ def test_phase_security_records_default_coverage_when_missing(monkeypatch) -> No assert lang.detector_coverage["security"]["status"] == "full" +def test_phase_security_reuses_lang_security_cache(monkeypatch, tmp_path) -> None: + (tmp_path / "src").mkdir() + (tmp_path / "src" / "a.py").write_text("print('a')\n") + + calls = {"count": 0} + lang = SimpleNamespace( + zone_map=None, + file_finder=lambda _path: ["src/a.py"], + name="python", + review_cache={}, + detector_coverage={}, + detect_lang_security_detailed=lambda _files, _zones: ( + calls.__setitem__("count", calls["count"] + 1), + 
LangSecurityResult( + entries=[ + { + "file": "src/lang.py", + "tier": 2, + "confidence": "medium", + "summary": "lang issue", + "name": "lang", + } + ], + files_scanned=3, + ), + )[1], + ) + + monkeypatch.setattr(review_mod, "filter_entries", lambda _zones, entries, _detector: entries) + monkeypatch.setattr( + review_mod, + "_entries_to_issues", + lambda detector, entries, **_kwargs: [{"detector": detector, "file": e["file"]} for e in entries], + ) + monkeypatch.setattr(review_mod, "_log_phase_summary", lambda *_args, **_kwargs: None) + + first_issues, first_potentials = review_mod.phase_security( + tmp_path, + lang, + detect_security_issues=lambda _files, _zones, _name, scan_root: ( + [ + { + "file": str(scan_root / "src" / "cross.py"), + "tier": 2, + "confidence": "high", + "summary": "cross issue", + "name": "cross", + } + ], + 1, + ), + ) + second_issues, second_potentials = review_mod.phase_security( + tmp_path, + lang, + detect_security_issues=lambda _files, _zones, _name, scan_root: ( + [ + { + "file": str(scan_root / "src" / "cross.py"), + "tier": 2, + "confidence": "high", + "summary": "cross issue", + "name": "cross", + } + ], + 1, + ), + ) + + assert calls["count"] == 1 + assert len(first_issues) == 2 + assert len(second_issues) == 2 + assert first_potentials == {"security": 3} + assert second_potentials == {"security": 3} + + +def test_phase_security_uses_prefetched_lang_result(monkeypatch, tmp_path) -> None: + class _ImmediateExecutor: + def submit(self, fn, *args, **kwargs): + future: concurrent.futures.Future = concurrent.futures.Future() + future.set_result(fn(*args, **kwargs)) + return future + + (tmp_path / "src").mkdir() + (tmp_path / "src" / "a.py").write_text("print('a')\n") + calls = {"count": 0} + lang = SimpleNamespace( + zone_map=None, + file_finder=lambda _path: ["src/a.py"], + name="python", + review_cache={}, + detector_coverage={}, + detect_lang_security_detailed=lambda _files, _zones: ( + calls.__setitem__("count", calls["count"] + 
1), + LangSecurityResult( + entries=[], + files_scanned=4, + ), + )[1], + ) + + monkeypatch.setattr(review_mod, "_PREFETCH_EXECUTOR", _ImmediateExecutor()) + monkeypatch.setattr(review_mod, "filter_entries", lambda _zones, entries, _detector: entries) + monkeypatch.setattr(review_mod, "_entries_to_issues", lambda *_a, **_k: []) + monkeypatch.setattr(review_mod, "_log_phase_summary", lambda *_args, **_kwargs: None) + + review_mod.prewarm_review_phase_detectors( + tmp_path, + lang, + [SimpleNamespace(label="Security", run=review_mod.phase_security)], + ) + issues, potentials = review_mod.phase_security( + tmp_path, + lang, + detect_security_issues=lambda _files, _zones, _name, **_kwargs: ([], 1), + ) + + assert calls["count"] == 1 + assert issues == [] + assert potentials == {"security": 4} + + +def test_review_function_extraction_cached_across_signature_and_dupes(monkeypatch, tmp_path) -> None: + calls = {"count": 0} + functions = [SimpleNamespace(file="src/a.py", name="foo")] + lang = SimpleNamespace( + extract_functions=lambda _path: ( + calls.__setitem__("count", calls["count"] + 1), + functions, + )[1], + zone_map=None, + review_cache={}, + ) + + monkeypatch.setattr( + "desloppify.engine.detectors.signature.detect_signature_variance", + lambda _functions, **_kwargs: ([], 0), + ) + monkeypatch.setattr(review_mod, "detect_duplicates", lambda _functions, **_kwargs: ([], 1)) + monkeypatch.setattr(review_mod, "make_dupe_issues", lambda *_args, **_kwargs: []) + + review_mod.phase_signature(tmp_path, lang) + review_mod.phase_dupes(tmp_path, lang) + + assert calls["count"] == 1 + + def test_phase_test_coverage_and_private_imports_paths(monkeypatch) -> None: lang_without_zones = SimpleNamespace(zone_map=None) assert review_mod.phase_test_coverage(Path("."), lang_without_zones) == ([], {}) diff --git a/desloppify/tests/lang/common/test_generic_plugin.py b/desloppify/tests/lang/common/test_generic_plugin.py index 8958d65ee..f4eeb667c 100644 --- 
a/desloppify/tests/lang/common/test_generic_plugin.py +++ b/desloppify/tests/lang/common/test_generic_plugin.py @@ -22,7 +22,7 @@ parse_json, parse_rubocop, ) -from desloppify.languages._framework.generic_parts.parsers import ToolParserError +from desloppify.languages._framework.generic_parts.parsers import ToolParserError, parse_phpstan from desloppify.languages._framework.generic_parts.tool_runner import ( resolve_command_argv, run_tool_result, @@ -233,6 +233,71 @@ def test_invalid_json(self): parse_eslint("not json", Path(".")) +class TestParsePhpstan: + def test_extracts_messages(self): + data = { + "totals": {"errors": 2}, + "files": { + "/app/src/Foo.php": { + "messages": [ + {"message": "Call to undefined method Bar::baz().", "line": 10}, + {"message": "Variable $x might not be defined.", "line": 25}, + ] + }, + "/app/src/Bar.php": { + "messages": [ + {"message": "Parameter #1 expects int, string given.", "line": 5}, + ] + }, + }, + } + entries = parse_phpstan(json.dumps(data), Path(".")) + assert len(entries) == 3 + assert entries[0] == { + "file": "/app/src/Foo.php", + "line": 10, + "message": "Call to undefined method Bar::baz().", + } + assert entries[2] == { + "file": "/app/src/Bar.php", + "line": 5, + "message": "Parameter #1 expects int, string given.", + } + + def test_empty_files(self): + assert parse_phpstan(json.dumps({"files": {}}), Path(".")) == [] + + def test_empty_messages_list(self): + data = {"files": {"/app/src/Foo.php": {"messages": []}}} + assert parse_phpstan(json.dumps(data), Path(".")) == [] + + def test_skips_non_dict_messages(self): + data = { + "files": { + "/app/src/Foo.php": { + "messages": [ + "unexpected string", + {"message": "Real error", "line": 7}, + ] + } + } + } + entries = parse_phpstan(json.dumps(data), Path(".")) + assert len(entries) == 1 + assert entries[0]["message"] == "Real error" + + def test_skips_non_dict_file_values(self): + data = {"files": {"/app/src/Foo.php": "not a dict"}} + assert 
parse_phpstan(json.dumps(data), Path(".")) == [] + + def test_invalid_json(self): + with pytest.raises(ToolParserError): + parse_phpstan("not json", Path(".")) + + def test_non_dict_root(self): + assert parse_phpstan(json.dumps([1, 2, 3]), Path(".")) == [] + + # ── make_tool_phase tests ───────────────────────────────── @@ -360,6 +425,44 @@ def test_run_tool_result_parser_decode_error_is_error(self, tmp_path): assert failed_result.status == "error" assert failed_result.error_kind == "parser_error" + def test_run_tool_result_parses_stdout_ignoring_stderr_preamble(self, tmp_path): + """stdout JSON should parse successfully even when stderr has non-JSON diagnostics.""" + valid_json = json.dumps([{"file": "a.php", "line": 1, "message": "err"}]) + result_with_stderr_noise = subprocess.CompletedProcess( + args="fake", + returncode=1, + stdout=valid_json, + stderr="Note: Using configuration file /app/phpstan.neon.dist.\n", + ) + result = run_tool_result( + "fake", + tmp_path, + parse_json, + run_subprocess=lambda *_a, **_k: result_with_stderr_noise, + ) + assert result.status == "ok" + assert len(result.entries) == 1 + + def test_run_tool_result_error_preview_uses_combined_output(self, tmp_path): + """Error preview in tool_failed_unparsed_output message includes both stdout and stderr.""" + result_bad = subprocess.CompletedProcess( + args="fake", + returncode=2, + stdout='{"not": "an array"}', + stderr="Note: some diagnostic from stderr\n", + ) + result = run_tool_result( + "fake", + tmp_path, + parse_json, + run_subprocess=lambda *_a, **_k: result_bad, + ) + assert result.status == "error" + assert result.error_kind == "tool_failed_unparsed_output" + assert result.message is not None + assert "not" in result.message # from stdout + assert "diagnostic from stderr" in result.message # from stderr + def test_resolve_command_argv_plain_command_does_not_shell_fallback(self): argv = resolve_command_argv("nonexistent_tool_xyz_123 --version") assert argv == 
["nonexistent_tool_xyz_123", "--version"] diff --git a/desloppify/tests/lang/common/test_lang_runtime_isolation.py b/desloppify/tests/lang/common/test_lang_runtime_isolation.py index 2d3d6a5d4..91b6ed98b 100644 --- a/desloppify/tests/lang/common/test_lang_runtime_isolation.py +++ b/desloppify/tests/lang/common/test_lang_runtime_isolation.py @@ -97,6 +97,16 @@ def test_lang_run_does_not_auto_forward_unknown_config_attrs() -> None: _ = run.future_runtime_attr +def test_make_lang_run_preserves_empty_review_cache_reference() -> None: + config = PythonConfig() + review_cache: dict[str, object] = {} + run = make_lang_run( + config, + overrides=LangRunOverrides(review_cache=review_cache), + ) + assert run.review_cache is review_cache + + def test_lang_run_props_threshold_defaults_to_lang_config() -> None: config = PythonConfig() config.props_threshold = 23 diff --git a/desloppify/tests/lang/common/test_treesitter.py b/desloppify/tests/lang/common/test_treesitter.py index 1daf74ea1..6efa1c083 100644 --- a/desloppify/tests/lang/common/test_treesitter.py +++ b/desloppify/tests/lang/common/test_treesitter.py @@ -1163,6 +1163,68 @@ def test_normalization_strips_console(self, js_file, tmp_path): assert "return" in greet.normalized +# ── R extraction tests ───────────────────────────────────────── + + +class TestRExtraction: + @pytest.fixture + def r_file(self, tmp_path): + code = """\ +# Named function assigned with <- +my_func <- function(x, y) { + result <- x + y + return(result) +} + +# Anonymous function inside lapply +result <- lapply(items, function(i) { + value <- process(i) + transform(value) + return(value) +}) +""" + f = tmp_path / "analysis.R" + f.write_text(code) + return str(f) + + def test_named_function_extraction(self, r_file, tmp_path): + from desloppify.languages._framework.treesitter.analysis.extractors import ( + ts_extract_functions, + ) + from desloppify.languages._framework.treesitter.specs.scripting import R_SPEC + + functions = 
ts_extract_functions(tmp_path, R_SPEC, [r_file]) + names = [f.name for f in functions] + assert "my_func" in names + + def test_anonymous_function_extraction(self, r_file, tmp_path): + from desloppify.languages._framework.treesitter.analysis.extractors import ( + ts_extract_functions, + ) + from desloppify.languages._framework.treesitter.specs.scripting import R_SPEC + + functions = ts_extract_functions(tmp_path, R_SPEC, [r_file]) + names = [f.name for f in functions] + assert "<anonymous>" in names, ( + "Anonymous functions in lapply() should be extracted with " + "a synthesized name" + ) + + def test_anonymous_function_has_correct_metadata(self, r_file, tmp_path): + from desloppify.languages._framework.treesitter.analysis.extractors import ( + ts_extract_functions, + ) + from desloppify.languages._framework.treesitter.specs.scripting import R_SPEC + + functions = ts_extract_functions(tmp_path, R_SPEC, [r_file]) + anon_fns = [f for f in functions if f.name == "<anonymous>"] + assert len(anon_fns) >= 1 + fn = anon_fns[0] + assert fn.file == r_file + assert fn.line > 0 + assert fn.body_hash is not None + + # ── ESLint parser tests ─────────────────────────────────────── diff --git a/desloppify/tests/lang/common/test_treesitter_complexity_and_integration.py b/desloppify/tests/lang/common/test_treesitter_complexity_and_integration.py index c3cbe82a1..28990a885 100644 --- a/desloppify/tests/lang/common/test_treesitter_complexity_and_integration.py +++ b/desloppify/tests/lang/common/test_treesitter_complexity_and_integration.py @@ -737,6 +737,341 @@ def test_no_import_query_returns_empty(self, tmp_path): entries = detect_unused_imports([], spec) assert entries == [] + def test_js_named_imports_all_used_no_issue(self, tmp_path): + from desloppify.languages._framework.treesitter import JS_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import { rateLimit, RateLimitConfig } from "@/lib/rate-limit"; + 
+console.log(rateLimit, RateLimitConfig); +""" + f = tmp_path / "main.js" + f.write_text(code) + + entries = detect_unused_imports([str(f)], JS_SPEC) + assert entries == [] + + def test_js_default_import_used_in_jsx_no_issue(self, tmp_path): + from desloppify.languages._framework.treesitter import JS_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import ReactMarkdown from "react-markdown"; + +export function App() { + return <ReactMarkdown />; +} +""" + f = tmp_path / "main.jsx" + f.write_text(code) + + entries = detect_unused_imports([str(f)], JS_SPEC) + assert entries == [] + + def test_js_aliased_named_import_used_no_issue(self, tmp_path): + from desloppify.languages._framework.treesitter import JS_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import { foo as bar } from "x"; + +console.log(bar); +""" + f = tmp_path / "main.js" + f.write_text(code) + + entries = detect_unused_imports([str(f)], JS_SPEC) + assert entries == [] + + def test_js_namespace_import_used_no_issue(self, tmp_path): + from desloppify.languages._framework.treesitter import JS_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import * as ns from "x"; + +console.log(ns.foo); +""" + f = tmp_path / "main.js" + f.write_text(code) + + entries = detect_unused_imports([str(f)], JS_SPEC) + assert entries == [] + + def test_js_partially_unused_import_line_flags_only_unused_symbol(self, tmp_path): + from desloppify.languages._framework.treesitter import JS_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import { + used, + unused, +} from "x"; + +console.log(used); +""" + f = tmp_path / "main.js" + f.write_text(code) + + entries = detect_unused_imports([str(f)], JS_SPEC) + names = 
[e["name"] for e in entries] + assert "unused" in names + assert "used" not in names + assert len(entries) == 1 + + def test_js_side_effect_import_only_not_flagged(self, tmp_path): + from desloppify.languages._framework.treesitter import JS_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import "x"; + +console.log("hi"); +""" + f = tmp_path / "main.js" + f.write_text(code) + + entries = detect_unused_imports([str(f)], JS_SPEC) + assert entries == [] + + def test_js_unused_import_issue_id_includes_line_and_symbol(self, tmp_path): + from unittest.mock import MagicMock + + from desloppify.languages._framework.treesitter import JS_SPEC + from desloppify.languages._framework.treesitter.phases import ( + make_unused_imports_phase, + ) + + code = """\ +import { used, unused } from "x"; + +console.log(used); +""" + f = tmp_path / "main.js" + f.write_text(code) + + phase = make_unused_imports_phase(JS_SPEC) + mock_lang = MagicMock() + mock_lang.file_finder.return_value = [str(f)] + + issues, potentials = phase.run(tmp_path, mock_lang) + + assert potentials["unused_imports"] == 1 + assert len(issues) == 1 + assert issues[0]["id"].endswith("::unused_import::1::unused") + assert issues[0]["summary"] == "Unused import: unused" + + def test_ts_named_imports_all_used_no_issue(self, tmp_path): + from desloppify.languages._framework.treesitter import TYPESCRIPT_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import { rateLimit, RateLimitConfig } from "@/lib/rate-limit"; + +export const x: RateLimitConfig = rateLimit(); +""" + f = tmp_path / "main.ts" + f.write_text(code) + + entries = detect_unused_imports([str(f)], TYPESCRIPT_SPEC) + assert entries == [] + + def test_ts_default_import_used_in_tsx_no_issue(self, tmp_path): + from desloppify.languages._framework.treesitter import TYPESCRIPT_SPEC + from 
desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import ReactMarkdown from "react-markdown"; + +export function App() { + return <ReactMarkdown />; +} +""" + f = tmp_path / "main.tsx" + f.write_text(code) + + entries = detect_unused_imports([str(f)], TYPESCRIPT_SPEC) + assert entries == [] + + def test_ts_type_only_named_import_used_in_type_position_no_issue(self, tmp_path): + from desloppify.languages._framework.treesitter import TYPESCRIPT_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import type { Foo } from "x"; + +export type X = Foo; +""" + f = tmp_path / "main.ts" + f.write_text(code) + + entries = detect_unused_imports([str(f)], TYPESCRIPT_SPEC) + assert entries == [] + + def test_ts_partially_unused_import_line_flags_only_unused_symbol(self, tmp_path): + from desloppify.languages._framework.treesitter import TYPESCRIPT_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import { + used, + unused, +} from "x"; + +export const y = used; +""" + f = tmp_path / "main.ts" + f.write_text(code) + + entries = detect_unused_imports([str(f)], TYPESCRIPT_SPEC) + names = [e["name"] for e in entries] + assert "unused" in names + assert "used" not in names + assert len(entries) == 1 + + def test_ts_side_effect_import_only_not_flagged(self, tmp_path): + from desloppify.languages._framework.treesitter import TYPESCRIPT_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import "x"; + +export const x = 1; +""" + f = tmp_path / "main.ts" + f.write_text(code) + + entries = detect_unused_imports([str(f)], TYPESCRIPT_SPEC) + assert entries == [] + + def test_ts_unused_import_issue_id_includes_line_and_symbol(self, tmp_path): + from unittest.mock import MagicMock + + from 
desloppify.languages._framework.treesitter import TYPESCRIPT_SPEC + from desloppify.languages._framework.treesitter.phases import ( + make_unused_imports_phase, + ) + + code = """\ +import { used, unused } from "x"; + +export const z = used; +""" + f = tmp_path / "main.ts" + f.write_text(code) + + phase = make_unused_imports_phase(TYPESCRIPT_SPEC) + mock_lang = MagicMock() + mock_lang.file_finder.return_value = [str(f)] + + issues, potentials = phase.run(tmp_path, mock_lang) + + assert potentials["unused_imports"] == 1 + assert len(issues) == 1 + assert issues[0]["id"].endswith("::unused_import::1::unused") + assert issues[0]["summary"] == "Unused import: unused" + + def test_js_destructuring_default_value_counts_as_usage(self, tmp_path): + """Default values inside destructuring patterns should count as usage.""" + from desloppify.languages._framework.treesitter import JS_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import { imported } from "x"; + +const { x = imported } = obj; +console.log(x); +""" + f = tmp_path / "main.js" + f.write_text(code) + + entries = detect_unused_imports([str(f)], JS_SPEC) + assert entries == [] + + def test_js_param_default_value_counts_as_usage(self, tmp_path): + """Parameter default values should count as usage (JS grammar).""" + from desloppify.languages._framework.treesitter import JS_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import { Bar } from "x"; + +function f(x = Bar) { + return x; +} +""" + f = tmp_path / "main.js" + f.write_text(code) + + entries = detect_unused_imports([str(f)], JS_SPEC) + assert entries == [] + + def test_ts_param_type_annotation_counts_as_usage(self, tmp_path): + """Parameter type annotations should count as usage (TSX grammar).""" + from desloppify.languages._framework.treesitter import TYPESCRIPT_SPEC + from 
desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = """\ +import type { Foo } from "x"; + +export function f(x: Foo) { + return x; +} +""" + f = tmp_path / "main.ts" + f.write_text(code) + + entries = detect_unused_imports([str(f)], TYPESCRIPT_SPEC) + assert entries == [] + + def test_ts_file_with_nul_byte_does_not_false_positive(self, tmp_path): + """Stray NUL bytes should not cause parse-truncation false positives.""" + from desloppify.languages._framework.treesitter import TYPESCRIPT_SPEC + from desloppify.languages._framework.treesitter.analysis.unused_imports import ( + detect_unused_imports, + ) + + code = ( + b'import { jest } from "@jest/globals";\n' + b'jest.spyOn(console, "log");\n' + b'\x00\n' + b'jest.spyOn(console, "warn");\n' + ) + f = tmp_path / "main.ts" + f.write_bytes(code) + + entries = detect_unused_imports([str(f)], TYPESCRIPT_SPEC) + assert entries == [] + # ── Signature variance tests ───────────────────────────────── diff --git a/desloppify/tests/lang/python/test_python_security_dictkeys_and_smells_split_direct.py b/desloppify/tests/lang/python/test_python_security_dictkeys_and_smells_split_direct.py index c667ae242..8870e9dda 100644 --- a/desloppify/tests/lang/python/test_python_security_dictkeys_and_smells_split_direct.py +++ b/desloppify/tests/lang/python/test_python_security_dictkeys_and_smells_split_direct.py @@ -41,7 +41,7 @@ def coverage(self): monkeypatch.setattr( py_security_mod, "detect_with_bandit", - lambda _root, _zone_map, *, exclude_dirs: SimpleNamespace( + lambda _root, _zone_map, *, exclude_dirs, skip_tests=None: SimpleNamespace( entries=[{"file": "a.py", "line": 1}], files_scanned=3, status=_Status(), diff --git a/desloppify/tests/plan/test_epic_triage_apply.py b/desloppify/tests/plan/test_epic_triage_apply.py index 1258b06a8..94a3b4af1 100644 --- a/desloppify/tests/plan/test_epic_triage_apply.py +++ b/desloppify/tests/plan/test_epic_triage_apply.py @@ -575,14 
+575,15 @@ def test_strategy_summary_in_result(self): assert result.strategy_summary == "My strategy" - def test_only_open_review_issues_in_triaged_ids(self): - """triaged_ids should only contain IDs of open review/concerns issues.""" + def test_only_open_defect_issues_in_triaged_ids(self): + """triaged_ids should contain IDs of all open defect issues (unified pipeline).""" state: dict = { "issues": { "r1": {"status": "open", "detector": "review"}, "r2": {"status": "fixed", "detector": "review"}, "u1": {"status": "open", "detector": "unused"}, "c1": {"status": "open", "detector": "concerns"}, + "a1": {"status": "open", "detector": "subjective_review"}, }, "scan_count": 1, "dimension_scores": {}, @@ -595,9 +596,10 @@ def test_only_open_review_issues_in_triaged_ids(self): triaged = plan["epic_triage_meta"]["triaged_ids"] assert "r1" in triaged assert "c1" in triaged - # Fixed review and non-review issues should not appear + assert "u1" in triaged # mechanical defects are now triage findings + # Fixed issues and assessment requests should not appear assert "r2" not in triaged - assert "u1" not in triaged + assert "a1" not in triaged # --------------------------------------------------------------------------- diff --git a/desloppify/tests/plan/test_epic_triage_prompt_direct.py b/desloppify/tests/plan/test_epic_triage_prompt_direct.py index c8e8647d1..072da5be7 100644 --- a/desloppify/tests/plan/test_epic_triage_prompt_direct.py +++ b/desloppify/tests/plan/test_epic_triage_prompt_direct.py @@ -134,8 +134,8 @@ def test_build_triage_prompt_includes_mechanical_backlog_context() -> None: prompt = build_triage_prompt(triage_input) - assert "## Mechanical backlog (2 items: 1 in 1 auto-clusters, 1 unclustered)" in prompt - assert "### Auto-clusters" in prompt + assert "## Auto-cluster candidates (2 items: 1 in 1 auto-clusters, 1 unclustered)" in prompt + assert "### Auto-clusters (decision required for each)" in prompt assert "- auto/unused-imports (1 items) [autofix: 
desloppify autofix import-cleanup --dry-run]" in prompt assert "Remove 1 unused import issue" in prompt assert "### Unclustered items (1 items — needs human judgment or isolated findings)" in prompt diff --git a/desloppify/tests/plan/test_plan_modules_direct.py b/desloppify/tests/plan/test_plan_modules_direct.py index 2513640c1..90db1bf33 100644 --- a/desloppify/tests/plan/test_plan_modules_direct.py +++ b/desloppify/tests/plan/test_plan_modules_direct.py @@ -5,6 +5,8 @@ from pathlib import Path from types import SimpleNamespace +import pytest + import desloppify.engine._state.filtering as filtering_mod from desloppify.engine._work_queue.core import QueueBuildOptions import desloppify.engine.planning.helpers as plan_common_mod @@ -63,6 +65,60 @@ def test_select_phases_and_run_phases_behavior(): assert potentials == {"fast": 1, "slow": 2, "review": 3} +def test_generate_issues_from_lang_primes_and_clears_review_prefetch(monkeypatch): + calls: list[str] = [] + lang = SimpleNamespace(phases=[], zone_map=None, name="python") + + monkeypatch.setattr(plan_scan_mod, "_build_zone_map", lambda *_a, **_k: None) + monkeypatch.setattr(plan_scan_mod, "_select_phases", lambda *_a, **_k: []) + monkeypatch.setattr(plan_scan_mod, "_run_phases", lambda *_a, **_k: ([], {})) + monkeypatch.setattr(plan_scan_mod, "_stamp_issue_context", lambda *_a, **_k: None) + monkeypatch.setattr( + plan_scan_mod, + "prewarm_review_phase_detectors", + lambda *_a, **_k: calls.append("prime"), + ) + monkeypatch.setattr( + plan_scan_mod, + "clear_review_phase_prefetch", + lambda *_a, **_k: calls.append("clear"), + ) + + issues, potentials = plan_scan_mod._generate_issues_from_lang(Path("."), lang) + + assert issues == [] + assert potentials == {} + assert calls == ["prime", "clear"] + + +def test_generate_issues_from_lang_clears_prefetch_on_phase_error(monkeypatch): + calls: list[str] = [] + lang = SimpleNamespace(phases=[], zone_map=None, name="python") + + monkeypatch.setattr(plan_scan_mod, 
"_build_zone_map", lambda *_a, **_k: None) + monkeypatch.setattr(plan_scan_mod, "_select_phases", lambda *_a, **_k: []) + monkeypatch.setattr( + plan_scan_mod, + "_run_phases", + lambda *_a, **_k: (_ for _ in ()).throw(RuntimeError("boom")), + ) + monkeypatch.setattr( + plan_scan_mod, + "prewarm_review_phase_detectors", + lambda *_a, **_k: calls.append("prime"), + ) + monkeypatch.setattr( + plan_scan_mod, + "clear_review_phase_prefetch", + lambda *_a, **_k: calls.append("clear"), + ) + + with pytest.raises(RuntimeError, match="boom"): + plan_scan_mod._generate_issues_from_lang(Path("."), lang) + + assert calls == ["prime", "clear"] + + def test_resolve_lang_prefers_explicit_and_fallbacks(monkeypatch): explicit = object() assert plan_scan_mod._resolve_lang(explicit, Path(".")) is explicit diff --git a/desloppify/tests/plan/test_stale_policy.py b/desloppify/tests/plan/test_stale_policy.py index 9112197fb..701534713 100644 --- a/desloppify/tests/plan/test_stale_policy.py +++ b/desloppify/tests/plan/test_stale_policy.py @@ -512,13 +512,28 @@ def test_closed_review_issues_ignored(self): } assert is_triage_stale(plan, state) is False - def test_non_review_issues_ignored(self): + def test_mechanical_issues_trigger_staleness_on_first_triage(self): + """Mechanical issues trigger staleness when no prior triage exists.""" plan = {"epic_triage_meta": {"triaged_ids": []}} state = { "issues": { "u1": {"status": "open", "detector": "unused"}, } } + assert is_triage_stale(plan, state) is True + + def test_mechanical_within_threshold_not_stale(self): + """Mechanical count growth within threshold does not trigger staleness.""" + plan = {"epic_triage_meta": { + "triaged_ids": ["r1"], + "last_mechanical_count": 100, + }} + state = { + "issues": { + "r1": {"status": "open", "detector": "review"}, + **{f"u{i}": {"status": "open", "detector": "unused"} for i in range(105)}, + } + } assert is_triage_stale(plan, state) is False def test_not_stale_when_stages_in_queue_but_all_triaged(self): 
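The two prefetch tests above pin down an ordering contract: the prewarm runs before the phases, and the clear runs afterwards on both the success and the error path. A minimal sketch of that contract — the function names here are illustrative stand-ins, not the real `_generate_issues_from_lang` signature:

```python
# Sketch of the prime/try/finally shape the tests above verify: the review
# prefetch is primed before phases execute and cleared even when a phase
# raises. Names are hypothetical, not the real API.
def generate_issues(run_phases, prime, clear):
    prime()
    try:
        return run_phases()
    finally:
        clear()  # reached even if run_phases() raises


calls = []
generate_issues(lambda: [], lambda: calls.append("prime"), lambda: calls.append("clear"))
assert calls == ["prime", "clear"]  # success path

calls.clear()


def boom():
    raise RuntimeError("boom")


try:
    generate_issues(boom, lambda: calls.append("prime"), lambda: calls.append("clear"))
except RuntimeError:
    pass
assert calls == ["prime", "clear"]  # clear still ran on the error path
```

The `try/finally` is what makes the second test pass: without it, a raising phase would leave the prefetch cache primed for the next run.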
diff --git a/desloppify/tests/review/work_queue_cases.py b/desloppify/tests/review/work_queue_cases.py index af65a8e5a..cd5bd848d 100644 --- a/desloppify/tests/review/work_queue_cases.py +++ b/desloppify/tests/review/work_queue_cases.py @@ -875,6 +875,7 @@ def test_registry_standalone_threshold_count(): "dict_keys", "dupes", "naming", + "nextjs", "patterns", "props", "react", diff --git a/docs/CLAUDE.md b/docs/CLAUDE.md index bc6d92d90..f9c283a78 100644 --- a/docs/CLAUDE.md +++ b/docs/CLAUDE.md @@ -19,7 +19,7 @@ Run `desloppify review --prepare` first to generate review data, then use Claude - The codebase path and list of dimensions to score - The blind packet path to read - Instruction to score from code evidence only, not from targets -- Each agent writes output to a separate file. Merge assessments (average overlapping dimension scores) and concatenate findings. +- Each agent writes output to `results/batch-N.raw.txt` (matching the batch index). Merge assessments (average overlapping dimension scores) and concatenate findings. ### Subagent rules diff --git a/docs/CURSOR.md b/docs/CURSOR.md index 2de98d04e..1f455796d 100644 --- a/docs/CURSOR.md +++ b/docs/CURSOR.md @@ -20,7 +20,7 @@ tools: Use the prompt from the "Reviewer agent prompt" section above. Launch multiple reviewer subagents, each with a subset of dimensions. -Each agent writes its output to a separate file. +Each agent writes its output to `results/batch-N.raw.txt` (matching the batch index). Merge assessments (average where dimensions overlap) and findings, then import. diff --git a/docs/HERMES.md b/docs/HERMES.md index fbad5d939..3a48eb392 100644 --- a/docs/HERMES.md +++ b/docs/HERMES.md @@ -1,38 +1,65 @@ ## Hermes Agent Overlay -Hermes Agent supports parallel execution via worktree isolation (`hermes -w`). -Use separate worktree sessions for parallel review and triage work. +Hermes has built-in parallel subagent support via `delegate_task` (up to 3 +concurrent children). 
Use `delegate_task(tasks=[...])` for subjective review +batches and per-stage triage support; the older worktree-based guidance +here no longer applies. ### Review workflow -1. Run `desloppify review --prepare` to generate `query.json` and `.desloppify/review_packet_blind.json`. -2. Split dimensions into 3-4 batches by theme (e.g., naming + clarity, - abstraction + error consistency, testing + coverage). -3. Launch parallel Hermes sessions with worktree isolation, one per batch: +1. Prepare review prompts and the blind packet: + ```bash + desloppify review --run-batches --dry-run ``` - hermes -w -q "Score these dimensions: . Read .desloppify/review_packet_blind.json for the blind packet. Score from code evidence only." + This generates one prompt file per batch in + `.desloppify/subagents/runs//prompts/` and prints the run directory. + +2. Launch Hermes subagents in batches of 3 with `delegate_task(tasks=[...])`. + Each subagent should: + - read its prompt file at + `.desloppify/subagents/runs//prompts/batch-N.md` + - read `.desloppify/review_packet_blind.json` + - inspect the repository against the prompt's dimensions + - write ONLY valid JSON to + `.desloppify/subagents/runs//results/batch-N.raw.txt` + + Example task payload: + ```json + { + "goal": "Review batch N. Read the prompt at .desloppify/subagents/runs//prompts/batch-N.md, follow it exactly, inspect the repository, and write ONLY valid JSON to .desloppify/subagents/runs//results/batch-N.raw.txt.", + "context": "Repository root: . Blind packet: .desloppify/review_packet_blind.json. The prompt file defines the required output schema. Do not edit repository source files. Only write the review result file.", + "toolsets": ["terminal", "file"] + }
Import: `desloppify review --import merged.json --manual-override --attest "Hermes agents ran blind reviews against review_packet_blind.json" --scan-after-import`. -Each session must consume `.desloppify/review_packet_blind.json` (not full -`query.json`) to avoid score anchoring. + Repeat for batches 1-3, 4-6, 7-9, etc. Wait for each group of 3 to finish + before launching the next group. -### Triage workflow +3. After all prompt files for that run have matching results, import them: + ```bash + desloppify review --import-run .desloppify/subagents/runs/ --scan-after-import + ``` -Orchestrate triage with per-stage Hermes sessions: +### Key constraints + +- `delegate_task` supports at most 3 concurrent children at a time. +- Subagents do not inherit parent context; the prompt file and blind packet must + provide everything needed. +- Subagents cannot call `delegate_task`, `clarify`, `memory`, or `send_message`. +- The importer expects `results/batch-N.raw.txt` files, not `.json` filenames. +- The blind packet intentionally omits score history to prevent anchoring bias. + +### Triage workflow -1. For each stage (observe → reflect → organize → enrich → sense-check): - - Get prompt: `desloppify plan triage --stage-prompt ` - - Launch a Hermes session with that prompt: `hermes -w -q ""` - - Verify: `desloppify plan triage` (check dashboard) - - Confirm: `desloppify plan triage --confirm --attestation "..."` -2. Complete: `desloppify plan triage --complete --strategy "..." --attestation "..."` +Run triage stages sequentially. For each stage: -Run stages sequentially. Within observe and sense-check, use parallel -worktree sessions (`hermes -w`) for per-dimension-group and per-cluster -batches respectively. +1. Get the stage prompt or use the command suggested by `desloppify next`. +2. If the stage benefits from parallel review work, use `delegate_task(tasks=[...])` + in groups of 3; otherwise run the stage directly in the parent session. +3. 
Record the stage output with `desloppify plan triage --stage --report "..."` + or the corresponding `--run-stages --runner ...` command when available. +4. Confirm with `desloppify plan triage --confirm --attestation "..."`. +5. Finish with `desloppify plan triage --complete --strategy "..." --attestation "..."`. diff --git a/docs/SKILL.md b/docs/SKILL.md index 2287e893f..fd632344c 100644 --- a/docs/SKILL.md +++ b/docs/SKILL.md @@ -6,7 +6,6 @@ description: > duplicate functions, code smells, naming issues, import cycles, or coupling problems. Also use when asked for a health score, what to fix next, or to create a cleanup plan. Supports 29 languages. -allowed-tools: Bash(desloppify *) --- @@ -121,6 +120,8 @@ Four paths to get subjective scores: - **Cloud/external**: `desloppify review --external-start --external-runner claude` → follow session template → `--external-submit`. - **Manual path**: `desloppify review --prepare` → review per dimension → `desloppify review --import file.json`. +**Batch output vs import filenames:** Individual batch outputs from subagents must be named `batch-N.raw.txt` (plain text/JSON content, `.raw.txt` extension). The `.json` filenames in `--import merged.json` or `--import findings.json` refer to the final merged import file, not individual batch outputs. Do not name batch outputs with a `.json` extension. + - Import first, fix after — import creates tracked state entries for correlation. - Target-matching scores trigger auto-reset to prevent gaming. Use the blind-review workflow described in your agent overlay doc (e.g. `docs/CLAUDE.md`, `docs/HERMES.md`). - Even moderate scores (60-80) dramatically improve overall health. @@ -203,6 +204,20 @@ desloppify config set commit_tracking_enabled false # disable guidance After resolving findings as `fixed`, the tool shows uncommitted work, committed history, and a suggested commit message. 
After committing externally, run `record` to move findings from uncommitted to committed and auto-update the linked PR description. +### Agent directives + +Directives are messages shown to agents at lifecycle phase transitions — use them to switch models, set constraints, or give context-specific instructions. + +```bash +desloppify directives # show all configured directives +desloppify directives set execute "Switch to claude-sonnet-4-6. Focus on speed." +desloppify directives set triage "Switch to claude-opus-4-6. Read carefully." +desloppify directives set review "Use blind packet. Do not anchor on previous scores." +desloppify directives unset execute # remove a directive +``` + +Available phases: `execute`, `review`, `triage`, `workflow`, `scan` (and fine-grained variants like `review_initial`, `triage_postflight`, etc.). + ### Quick reference ```bash @@ -261,6 +276,8 @@ If the fix is unclear or the change needs discussion, open an issue at `https:// ## Prerequisite -`command -v desloppify >/dev/null 2>&1 && echo "desloppify: installed" || echo "NOT INSTALLED — run: pip install --upgrade git+https://github.com/peteromallet/desloppify.git"` +`command -v desloppify >/dev/null 2>&1 && echo "desloppify: installed" || echo "NOT INSTALLED — run: uvx --from git+https://github.com/peteromallet/desloppify.git desloppify"` + +If `uvx` is not available: `pip install desloppify[full]` diff --git a/docs/WINDSURF.md b/docs/WINDSURF.md index 295300b51..b52c5fee3 100644 --- a/docs/WINDSURF.md +++ b/docs/WINDSURF.md @@ -12,7 +12,7 @@ multiple Cascade panes manually. in one, abstraction + error consistency in another). 3. Each pane scores its assigned dimensions independently, reading the codebase and `query.json`'s `dimension_prompts` for context. -4. Each pane writes output to a separate file. +4. Each pane writes output to `results/batch-N.raw.txt` (matching the batch index). 5. In the primary pane, merge assessments and findings, then import. 
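The overlay docs above all end with the same merge step — average overlapping dimension scores, concatenate findings — without showing it. A hedged sketch, assuming each `results/batch-N.raw.txt` holds a JSON object with an `assessments` dimension-to-score map and a `findings` list; the actual schema is defined by the generated prompt files, so treat these key names as assumptions:

```python
import json
from collections import defaultdict
from pathlib import Path


def merge_batches(result_paths):
    """Average overlapping dimension scores and concatenate findings.

    Assumes each batch file holds {"assessments": {dim: score}, "findings": [...]};
    the real schema comes from the generated prompt files.
    """
    scores = defaultdict(list)
    findings = []
    for path in result_paths:
        data = json.loads(Path(path).read_text())
        for dim, score in data.get("assessments", {}).items():
            scores[dim].append(score)
        findings.extend(data.get("findings", []))
    return {
        # Dimensions scored by several batches are averaged; single-batch
        # dimensions pass through unchanged.
        "assessments": {dim: sum(vals) / len(vals) for dim, vals in scores.items()},
        "findings": findings,
    }
```

The merged object is what you would write to `merged.json` before `desloppify review --import merged.json`.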
If the user prefers a single-pane workflow, review all dimensions sequentially diff --git a/pyproject.toml b/pyproject.toml index a986a4e29..ca52c6c1c 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] name = "desloppify" -version = "0.9.9" +version = "0.9.10" description = "Multi-language codebase health scanner and technical debt tracker" readme = "README.md" requires-python = ">=3.11" diff --git a/website/index.html b/website/index.html index 3be291215..4cca5c1e3 100644 --- a/website/index.html +++ b/website/index.html @@ -114,6 +114,8 @@

Experimental Token

Our first initiative: a $1,000 bounty to find something poorly engineered in our 91k-line AI-built codebase. 262 comments. Winner: @agustif. + Now live: $1,000 + if Desloppify does something stupid when refactoring your codebase.

Read the full token page → diff --git a/website/main.js b/website/main.js index daba58885..6a9ecafa8 100644 --- a/website/main.js +++ b/website/main.js @@ -158,8 +158,10 @@ async function loadReleases() { } container.innerHTML = releases.map(r => { - const fullHtml = renderMarkdownLight(r.body || 'No release notes.'); - const preview = getFirstParagraph(r.body || ''); + // Strip HTML image blocks (mascot etc.) from release notes + const body = (r.body || 'No release notes.').replace(/]*>[\s\S]*?]*>[\s\S]*?<\/p>/gi, '').trim(); + const fullHtml = renderMarkdownLight(body); + const preview = getFirstParagraph(body); const previewHtml = renderMarkdownLight(preview); const hasMore = (r.body || '').trim().length > preview.length + 10; diff --git a/website/token.css b/website/token.css index d959b2f16..5b3c926a7 100644 --- a/website/token.css +++ b/website/token.css @@ -141,11 +141,16 @@ body #details { color: #fff; } -.status-soon { +.status-active { background: var(--amber); color: #fff; } +.status-soon { + background: #999; + color: #fff; +} + .initiative h3 { font-family: var(--font-hand); font-size: 1.6rem; diff --git a/website/token.html b/website/token.html index 55b6726e8..f1a72b4df 100644 --- a/website/token.html +++ b/website/token.html @@ -114,17 +114,36 @@

$1,000 Bounty: Find Something Poorly Engineered in This ~91k LOC Codebase

-
+
Initiative #2 - Coming Soon + Active
-

More details coming soon

+

$1,000 if Desloppify Does Something Stupid When Refactoring Your Codebase

+

+ Desloppify is an agent harness that refactors and improves code quality. We're + reasonably confident it generally improves codebases — but we want to surface + the cases where it genuinely makes something worse. +

- The next challenge will change the format — less time-based, more focused on - who gets the objectively best response. Follow along on - Discord or - @peterom on X. + If Desloppify refactors your codebase and makes a demonstrably stupid decision — + an abstraction that degrades readability, a change that introduces fragility, a + refactor that makes the code harder to maintain — share your evidence and you + could claim $1,000. +

+
+

Requirements

+

+ Codebase must be 10k+ lines. Run with Claude 4.5/Sonnet 4.6 or GPT-5.3/5.4. + Share your logs and code evidence. One submission per person. + Deadline: March 21st 23:59:59 UTC. +

+

$1,000 in SOL to the winner

+
+