From 2dec96bacbcc14821b6c17fc1d9afc9ec0c25fad Mon Sep 17 00:00:00 2001 From: Zane Wang Date: Tue, 14 Apr 2026 15:14:14 +0800 Subject: [PATCH 1/2] feat: auto-loop + continuity contract + month of iteration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three logical change groups bundled in one PR (see PR description for full breakdown). High-level: 1. Month of local iteration (~72 lines): language/framework detection table, source-evaluation rules, phase-adaptation polish, description tweaks. 2. LOOP INTEGRATION (+192 lines): ScheduleWakeup (/loop dynamic) as first-class tool inside /dev. 4 standard patterns (CI wait, long task, agent merge, deploy health), exit-conditions matrix, cost budget (300s cache-TTL trap documented), anti-patterns, CHECKPOINT interaction. 3. CONTINUITY CONTRACT (+64 lines): /dev owns the task end-to-end; stops only at CHECKPOINT 1, CHECKPOINT 2, or loop-escalation. Auto-loop is ON by default for STANDARD/DEEP tiers on phases with external wait. --no-loop opts out. --loop becomes redundant. File growth: skills/dev-orchestrator/SKILL.md 311 -> 639 lines (+328 net). File growth: commands/dev.md adds --loop/--no-loop/--max-loop-iter/--loop-delay flags. Risk assessment: skill is prompt/markdown, no executable code path — there is no crash/security risk. The only real risks are semantic (loop without exit condition → cost drift; 300s delay → cache miss). Both are mitigated by the new Exit Conditions Matrix (mandatory) and cache-budget table with 300s explicitly flagged as the worst choice. --- commands/dev.md | 4 +- skills/dev-orchestrator/SKILL.md | 342 ++++++++++++++++++++++++++++++- 2 files changed, 337 insertions(+), 9 deletions(-) diff --git a/commands/dev.md b/commands/dev.md index d538d71..004dc8a 100644 --- a/commands/dev.md +++ b/commands/dev.md @@ -1,7 +1,7 @@ --- -description: "End-to-end AI development orchestrator. Classifies task type, auto-detects tier, chains all skills." +description: "End-to-end AI development orchestrator. Auto-detects task tier and chains all skills." allowed-tools: Bash, Read, Edit, Write, Glob, Grep, Agent, WebFetch, Skill, AskUserQuestion -argument-hint: " [--quick|--deep|--no-spec|--no-pr|--research|--cicd]" +argument-hint: " [--quick|--deep|--no-spec|--no-pr|--research|--cicd|--loop|--no-loop|--max-loop-iter=N|--loop-delay=N]" --- Invoke the `dev-orchestrator` skill with the user's task description. diff --git a/skills/dev-orchestrator/SKILL.md b/skills/dev-orchestrator/SKILL.md index 1f45b28..743e02d 100644 --- a/skills/dev-orchestrator/SKILL.md +++ b/skills/dev-orchestrator/SKILL.md @@ -20,6 +20,63 @@ P0-P5 P6-P7 P8-P10 --- +## CONTINUITY CONTRACT (end-to-end auto-run) + +**Once /dev starts, it owns the task until P10 ARCHIVE, or a CHECKPOINT, or an +explicit error escalation.** This contract governs when /dev pauses vs continues. + +### /dev MUST NOT stop between phases just because a phase finished. +Proceed directly from P status block to P without user prompting, +unless one of the three legitimate pause reasons below applies. + +### Three legitimate pauses (the ONLY ones) + +1. **CHECKPOINT 1** (after P5 PLAN) — human approves the plan before BUILD +2. **CHECKPOINT 2** (after P9 SHIP) — human reviews deliverable before push/merge +3. **External wait** (CI, long build, agent teams, deploy health) — enters a + LOOP pattern via `ScheduleWakeup`, automatically resumes on signal + +Any other "stop and ask" is a protocol violation. If you find yourself unsure +→ pick the safer default and note it in the next status block, don't halt. + +### Auto-Loop Default (STANDARD and DEEP tiers) + +- **STANDARD / DEEP**: Loop is **automatically enabled** in every phase that + has an "external wait" signal. User does NOT need to pass `--loop`. +- **QUICK**: Loop is **automatically disabled** — every QUICK-tier work is by + definition short enough to block on without cache cost. +- **Override**: `--no-loop` forces blocking mode even on STANDARD/DEEP (useful + when the user wants to watch CI live rather than cede control to the loop). + +### End-to-End Sequencing Examples + +**DEEP DEVELOP task with CI**: +``` +P0 → P1 → P2 → P3 → P4 → P5 → [CHECKPOINT 1: pause for human] +↓ (approved) +P6 → P7 (TDD per task, Agent Teams may auto-loop at merge point) + → P8 (verify; if long-running → auto-loop Pattern 2) + → P9 (push PR; auto-enter LOOP Pattern 1 for CI wait) + → [CI green] → [CHECKPOINT 2: pause for human] +↓ (approved) +P10 → DONE (session ends gracefully) +``` + +**STANDARD DEVELOP task without CI wait**: +``` +P0 → P1 → P2 → P3 → P4 → P5 → [CHECKPOINT 1] → ... → P9 (push) → [CHECKPOINT 2] → P10 → DONE +``` +No loop invocation needed (no external waits), but the /dev session still runs +end-to-end without user needing to say "continue" between phases. + +### Why this matters + +Without this contract, /dev stops 10+ times in a DEEP task ("P1 done, continue?" +"P3 done, continue?"). With it, /dev stops 2-3 times total: the two CHECKPOINTs +and any loop-failure escalations. User cognitive load drops by 5-10×. + +--- + ## ENFORCEMENT: Phase Status Blocks (NON-NEGOTIABLE) **Every phase MUST output a status block. Silent skipping = protocol violation.** @@ -37,6 +94,24 @@ P0-P5 P6-P7 P8-P10 - No status block = phase not executed = protocol violation. - Output status blocks in conversation, NOT in plan files. +### Loop Wake Status Block (when ScheduleWakeup-driven iteration is active) + +When a phase uses `ScheduleWakeup` (see LOOP INTEGRATION section), each wake +MUST emit a loop-specific block BEFORE the regular phase block: + +``` +┌─ LOOP WAKE: P/ (iter /) ─ +│ Elapsed: m | Next delay: s (cache ) +│ Signal: +│ Exit target: | +│ Action this cycle: +│ Decision: +└──────────────────────────────────────────────── +``` + +If `iter >= max` or `cost cap hit` → Decision MUST be `exit-failure` or +`escalate`. Do NOT silently continue past the declared cap. + --- ## P0: PRE-CHECK (Auto-Setup, runs once per project) @@ -61,11 +136,44 @@ Show one-line status per check. Do NOT ask permission — just do it. | Source | Action | When | |--------|--------|------| | **Memory** | `claudemem search "" --compact --format json --limit 5` | Always | +| **Skills** | Scan available skills list in system context. Identify domain-specific skills whose `description` matches the task (e.g., `social-investigator` for social media, `claude-api` for Anthropic SDK). Add matched skills to P7 execution plan as tool sources. | Always | +| **Search-first** | Invoke ECC `search-first` skill — search for existing tools, libraries, patterns before writing custom code | DEVELOP tasks | | **Web** | Brave (direct lookups) or Exa (exploratory) per CLAUDE.md MCP table | Always | | **Library docs** | context7 for frameworks/libraries involved | If applicable | | **Current state** | API → fetch MCP; DB → postgres/mysql MCP; UI → playwright | If applicable | | **Project context** | Read files, `openspec/specs/`, `git log --oneline -10` | Always | +### Language/Framework Detection (for DEVELOP tasks) + +Detect primary project language to enable language-aware agent delegation in P7/P9: + +| Indicator File | Language | ECC Reviewer Agent | ECC Build Resolver | +|---------------|----------|-------------------|-------------------| +| `package.json` / `tsconfig.json` | TypeScript | `typescript-reviewer` | `build-error-resolver` | +| `pyproject.toml` / `setup.py` / `requirements.txt` | Python | `python-reviewer` | — | +| `go.mod` | Go | `go-reviewer` | `go-build-resolver` | +| `Cargo.toml` | Rust | `rust-reviewer` | `rust-build-resolver` | +| `pom.xml` / `build.gradle.kts` (Java) | Java | `java-reviewer` | `java-build-resolver` | +| `build.gradle.kts` (Kotlin) | Kotlin | `kotlin-reviewer` | `kotlin-build-resolver` | +| `CMakeLists.txt` / `*.cpp` | C++ | `cpp-reviewer` | `cpp-build-resolver` | +| `Package.swift` | Swift | — | — | +| `pubspec.yaml` | Flutter/Dart | `flutter-reviewer` | — | + +Record detected language in P2 status block. If mixed-language, note primary + secondary. + +### Source Evaluation (applied to ALL P1 findings) + +External information (web search, DeepWiki, API responses, LLM summaries) is input, NOT truth. + +**Before any P1 finding influences P2-P10 decisions**: +1. **Verify load-bearing claims** against primary sources (source code > official docs > AI summaries) +2. **Flag source type**: primary (code/docs), community (wikis/tutorials), AI-generated (DeepWiki/LLM), opinion (blogs/forums) +3. **Check for bias/agenda**: Is the source promoting something? Selling something? Competing? +4. **Cross-reference** when claims conflict — trust what you can directly observe over what others describe +5. **Note confidence level** in status block: "P1 finding X (high confidence — verified in source code)" vs "P1 finding Y (medium — from DeepWiki, not verified)" + +If a decision in P3-P7 depends on an unverified external claim → verify first or flag uncertainty. + Save key findings as claudemem notes **during** investigation, not after. --- @@ -135,7 +243,7 @@ QUICK tier outputs status blocks for P1, P2, and a combined P7-P9 block. |-------|------|--------| | **Deep Investigation** | Before P3 | `feature-dev:code-explorer` agents (2-3 parallel), claudemem sessions | | **Agent Teams** | P7 if root cause unclear | 3-5 agents with competing hypotheses, disprove each other | -| **CICD Persistence** | After P9 | `gh run watch`, verify CI passes, fix if fails | +| **CICD Persistence (auto)** | After P9 | **Automatically** enters `ScheduleWakeup` LOOP Pattern 1 (CI wait) — NOT blocking `gh run watch`. User does not need to pass `--loop`. Verify CI passes, auto-spawn fix cycle if CI fails. See LOOP INTEGRATION + CONTINUITY CONTRACT. | | **Deep Wrapup** | P10 | Detailed `/wrapup` session report, Goal/Done/Next by business objectives | --- @@ -146,6 +254,7 @@ QUICK tier outputs status blocks for P1, P2, and a combined P7-P9 block. **This is the FIRST skill invoked after classification. It determines everything downstream.** Invoke `brainstorming` skill. Do NOT skip "because it seems simple." +**DEEP tier**: additionally invoke ECC `architect` agent for architecture review and ADR generation. | Task Type | Brainstorm Focus | |-----------|-----------------| @@ -187,17 +296,31 @@ This is the ONLY human input before BUILD starts. ## P7: EXECUTE -### DEVELOP: TDD Per Task +### DEVELOP: TDD Per Task (Language-Aware) Invoke `subagent-driven-development` per task from plan: -1. **Implementer** — follows TDD Protocol | 2. **Spec reviewer** — compare vs specs -3. **Quality reviewer** — bugs, security (≥80% confidence) | 4. **Per-task verify** — full suite +1. **Implementer** — follows TDD Protocol (use `tdd-guide` agent from ECC when available) +2. **Spec reviewer** — compare vs specs +3. **Quality reviewer** — use language-specific ECC reviewer agent (detected in P1): + - TypeScript → `typescript-reviewer` | Python → `python-reviewer` | Go → `go-reviewer` + - Rust → `rust-reviewer` | Java → `java-reviewer` | Kotlin → `kotlin-reviewer` + - C++ → `cpp-reviewer` | Flutter → `flutter-reviewer` | Mixed/other → `code-reviewer` +4. **Per-task verify** — full suite 5. **Auto-commit** — test + implementation together +6. **On build failure** — use language-specific ECC build resolver agent if available **Agent Teams upgrade** (DEEP): if 3+ independent layers → `dispatching-parallel-agents`. +**Database changes**: if task involves SQL/ORM/migrations → add `database-reviewer` agent to quality team. + **Proactive MCP use**: context7 for APIs, fetch for endpoints, playwright for UI, DB MCPs for data. +**Long-running work → ScheduleWakeup loop** (see LOOP INTEGRATION): +- Docker build >3 min, heavy test suites, background subagent teams, remote pipelines +- Start the work with `run_in_background: true` OR `Task` in background OR `Monitor`, then call `ScheduleWakeup` with LOOP Pattern 2 (Long task) or Pattern 3 (Agent merge) +- Each wake: read `BashOutput` / `TaskOutput`, decide continue vs exit +- Do NOT sit idle blocking on long builds — context dies faster than you wait + ### RESEARCH: Investigate + Notes Per question from research plan: @@ -236,7 +359,10 @@ After all tasks: `bash ~/.claude/scripts/verify-dev.sh [--research|--cicd]` ## P8: VERIFY -Invoke `verification-before-completion` skill, then: +Invoke `/verify` command (9-phase unified verification with Iron Law — supersedes +`verification-before-completion` + `verification-loop` skills). For DEVELOP tasks, +run `/verify` with project auto-detection. For RESEARCH/CICD, run manual checks below. +Then verify: | Check | DEVELOP | RESEARCH | CICD | |-------|---------|----------|------| @@ -248,13 +374,21 @@ Invoke `verification-before-completion` skill, then: If ANY check fails → loop back to P7. +**Long-running verify via ScheduleWakeup** (LOOP Pattern 2): If `/verify` includes +a heavy step (full integration suite, docker build + e2e, browser visual diff) +that runs >2 min: +- Start it in background (`run_in_background: true` or `Task`) +- Call ScheduleWakeup with the `/dev` prompt and reason `"verify wait: "` +- On each wake: poll `BashOutput`, decide continue vs exit +- On completion: proceed to P9; on fail: return to P7 with evidence + ## P9: SHIP | Task Type | Steps | |-----------|-------| -| **DEVELOP** | 1. Code review (`requesting-code-review`) → fix Critical/Important. 2. `verify-dev.sh` final gate. 3. `finishing-a-development-branch` → PR. 4. Push (ask user). 5. DEEP: `gh run watch`. | +| **DEVELOP** | 1. `/codex-review` (Codex + 5 Claude agents + Haiku cross-scorer — supersedes `requesting-code-review`) → fix Critical/Important. 2. STANDARD+: `security-reviewer` agent for auth/crypto/input code. 3. `verify-dev.sh` final gate. 4. `finishing-a-development-branch` → PR. 5. Push (ask user). 6. DEEP: enter LOOP Pattern 1 (CI wait) via ScheduleWakeup — NOT blocking `gh run watch`. | | **RESEARCH** | 1. Self-review report for accuracy. 2. Deliver: report + claudemem notes. 3. Present Goal/Done/Next summary. | -| **CICD** | 1. Apply change (with user confirmation). 2. Monitor (`gh run watch` / health check). 3. Verify health. 4. Document in runbook if new pattern. | +| **CICD** | 1. Apply change (with user confirmation). 2. Monitor via LOOP Pattern 1 or health check. 3. Verify health. 4. Document in runbook if new pattern. | ### CHECKPOINT 2: Developer reviews deliverable - DEVELOP: "PR ready with X/X tests passing, 0 regressions." @@ -309,3 +443,197 @@ When Plan Mode is active simultaneously with /dev: | `--no-pr` | Auto-commit without PR (quick fixes on feature branches) | | `--research` | Force RESEARCH task type | | `--cicd` | Force CICD task type | +| `--loop` | **Redundant on STANDARD/DEEP — loop is auto-enabled by default** (see CONTINUITY CONTRACT). Kept for explicit clarity. | +| `--no-loop` | Force blocking mode even on STANDARD/DEEP (use when you want to watch a long wait live rather than auto-loop) | +| `--max-loop-iter=N` | Cap max wakes per loop pattern (default 20 for CI, 15 for long-task, 10 for agent-merge) | +| `--loop-delay=N` | Default delay seconds for loop cycles (default 270 — stays within prompt cache TTL) | + +--- + +## LOOP INTEGRATION (ScheduleWakeup) + +**`ScheduleWakeup` is a Claude Code harness built-in tool** (not a skill, not a +plugin — provided by the CLI runtime itself). It re-fires the same `/loop` +prompt after N seconds so a single session can "check back" on long-running +work without blocking. Used inside /dev for phases that wait on external +signals (CI, long builds, agent teams, deployments). + +### Core Contract + +``` +ScheduleWakeup( + delaySeconds: int, # clamped to [60, 3600] by the runtime + prompt: str, # MUST be the same /dev input — forwards the loop + reason: str # short specific line shown to user between wakes +) +``` + +Omit the call to end the loop. For an autonomous /loop (no user prompt), +pass the literal sentinel `<>` as `prompt` instead +(do NOT confuse with `<>` which is the CronCreate variant). + +### When /dev SHOULD use it — AUTO by default on STANDARD/DEEP + +Per CONTINUITY CONTRACT, loop is **automatically engaged** in these phases +when the relevant signal exists. User does NOT need to invoke it manually: + +| Phase | Pattern | Signal to watch | Auto-trigger condition | Decision | +|-------|---------|-----------------|----------------------|----------| +| P7 EXECUTE (DEEP) | Agent-merge loop | `TaskList --status running == 0` | 3+ parallel subagents dispatched | All agents done → continue | +| P7 EXECUTE | Long-build loop | `BashOutput` of background task | Build cmd started with `run_in_background=true` and expected >2 min | Build complete → continue; fail → P7 retry | +| P8 VERIFY | Long-verify loop | Full suite / e2e / docker build | `/verify` step elapsed >2 min | Pass → P9; Fail → P7 with evidence | +| P9 SHIP (DEEP) | CI-wait loop | `gh pr checks ` | PR pushed AND at least one check is pending | Green → P10; Red → auto-fix subtask → restart | +| P9 SHIP CICD | Deploy-health loop | `curl /health` or `gh run watch` status | Deploy applied AND health probe defined | Healthy → runbook; Unhealthy → rollback | + +**Auto vs manual**: `--no-loop` disables all of the above (blocking mode). +`--loop` is redundant (already the default). `--max-loop-iter` and +`--loop-delay` tune individual patterns. + +### When /dev MUST NOT use it + +- QUICK tier — no phase is long enough to justify cache cycling +- P3 BRAINSTORM / P5 PLAN — pure reasoning, no external signal +- CHECKPOINT 1 / CHECKPOINT 2 — human decision; session auto-pauses on AskUserQuestion +- Any phase completing in <60s (below delay floor, overhead not worth it) + +### Cost Budget — prompt cache TTL trap + +Anthropic prompt cache TTL = **300 seconds**. This gives the regimes: + +| delaySeconds | Regime | Economics | Use for | +|------|--------|-----------|---------| +| 60–270 | cache-HIT | near-free per wake (incremental context only) | Active polling: CI pending, build mid-flight, recent test run | +| **300** | WORST | full context reread without amortization | **NEVER — do not pick 300** | +| 301–3600 | cache-MISS | 1 full-context reread per wake | Genuinely idle waits: overnight deploy, slow-queue jobs | + +Default for /dev loops: **270s** (last cache-safe value). Use `--loop-delay=N` +to override. Only go above 300s if each wake is expected to find "still +waiting, nothing new" — otherwise cache-HIT cadence is strictly cheaper. + +### Exit Conditions Matrix (MANDATORY — declare before each loop) + +Before the FIRST `ScheduleWakeup` call of a loop, output this block inline: + +``` +┌─ LOOP CONFIG: ───────────────── +│ Phase: P | Pattern: <1-CI | 2-long-task | 3-agent-merge | 4-deploy-health> +│ Success exit: +│ Failure exit: +│ Timeout exit: = max OR elapsed > Xm> +│ Delay: s | Max iter: | Cost cap: ~ full-context reads +└──────────────────────────────────────────────── +``` + +At least one of (success, failure, timeout) MUST fire within max_iter. If you +can't state a concrete signal → don't start the loop (use blocking call instead). + +### Standard Patterns + +#### Pattern 1 — CI Wait (P9 DEEP, DEVELOP + CICD) +``` +Trigger : PR pushed via finishing-a-development-branch +Signal : gh pr checks → pass | pending | fail +Delay : 270s while pending +Max iter : 20 (≈ 90 min — covers most CI runs) +Success : all checks green → proceed to P10 ARCHIVE +Failure : any check red → spawn fix subtask (new P7 cycle), re-push, restart loop iter counter +Timeout : escalate to user ("CI stuck > 90min, investigating") +reason field: "CI wait iter /20 — PR # checks pending" +``` + +#### Pattern 2 — Long Task (P7, P8) +``` +Trigger : Background build/test started (run_in_background=true or Task) +Signal : BashOutput status + exit code +Delay : 120s for first 5 min, 270s after (adaptive — cache still hits in both) +Max iter : 15 (≈ 60 min) +Success : exit code 0 → proceed +Failure : exit code non-zero → extract last 30 lines of log, return to P7 with evidence +Timeout : kill background task, report partial output, escalate +reason field: "build wait iter /15 — elapsed m" +``` + +#### Pattern 3 — Agent Merge (P7 DEEP with Agent Teams) +``` +Trigger : 3+ parallel subagents dispatched via dispatching-parallel-agents +Signal : TaskList where status == "running" → 0 +Delay : 180s +Max iter : 10 (≈ 30 min) +Success : all agents completed → gather TaskOutput, synthesize, continue +Failure : any agent errored → inspect error, decide retry vs abort +Timeout : kill hanging agents, proceed with partial results + flag +reason field: "agent merge iter /10 — agents running, done" +``` + +#### Pattern 4 — Deploy Health (P9 CICD) +``` +Trigger : Deploy applied (k8s rollout, docker compose up, terraform apply) +Signal : curl health endpoint OR gh run status OR platform-specific probe +Delay : 60s for first 3 min (rapid smoke), 270s after +Max iter : 15 +Success : 5 consecutive green health checks → runbook entry + P10 +Failure : any 5xx or status!=healthy for 3 consecutive checks → auto-rollback trigger, return to P7 +Timeout : declare degraded state, escalate +reason field: "deploy health iter /15 — status=" +``` + +### Call Template + +``` +ScheduleWakeup( + delaySeconds=270, + prompt="", + reason="CI wait iter 3/20 — PR #1234 checks pending (typescript, python, e2e)" +) +``` + +- `prompt` must be verbatim the original /dev invocation. Stripping or + rephrasing loses /dev's context-classification. +- `reason` is user-facing — it's the ONLY line shown between wakes — so it + must tell them specifically what is being watched and where in the loop. + +### Anti-Patterns (forbidden inside /dev) + +| Anti-pattern | Why forbidden | +|--------------|---------------| +| Calling ScheduleWakeup without declaring exit conditions | No way to prove the loop terminates — risks cost drift | +| Picking `delaySeconds=300` | Worst-of-both: cache miss + no idle benefit | +| Loop without max_iter cap | Prompt/signal drift can cause infinite cycling | +| Using same loop across two phases | Each wake reruns P0-P2, wastes context; split instead | +| Omitting reason field or making it generic ("continue loop") | User has zero visibility into what /dev is doing | +| Re-entering a loop pattern after `exit-failure` without root-cause fix | Cycling the same error burns cache without progress | + +### Loop Interaction with CHECKPOINTs + +- CHECKPOINT 1 (after P5) and CHECKPOINT 2 (after P9) use `AskUserQuestion` — the + session pauses for the human. **Do NOT call ScheduleWakeup at checkpoints** — + it races against human response. +- If a loop is active when a checkpoint fires, end the loop first (declare + `exit-success` or `exit-escalate`), then present the checkpoint. + +--- + +## ECC (everything-claude-code) INTEGRATION + +ECC plugin provides language-specific agents, hooks, and skills that enhance /dev phases. + +### Auto-Active (no action needed — hooks run via plugin) +- **PostToolUse**: quality-gate, build-analysis, console-log-warning, typecheck +- **PreToolUse**: commit-quality, config-protection, block-no-verify, observe (CL v2) +- **Stop**: cost-tracker, session-persist, evaluate-session (pattern extraction) +- **PreCompact**: save state before compaction + +### Language-Aware Agent Selection (P7/P9) +Detect project language in P1 (via indicator files). Use corresponding ECC agents: +- `tdd-guide`: Enhanced TDD enforcement (replaces generic TDD when available) +- `{lang}-reviewer`: Language-specific code review (typescript, python, go, rust, java, kotlin, cpp, flutter) +- `{lang}-build-resolver`: Language-specific build error resolution +- `database-reviewer`: SQL/ORM change review (any project with DB operations) +- `security-reviewer`: Security-focused review (auth, crypto, input handling) +- `architect`: Architecture review and ADR generation (DEEP tier P3) + +### ECC Hook Profile +Controlled via `ECC_HOOK_PROFILE` env var: +- `minimal`: lifecycle hooks only (session-start, session-end) +- `standard` (default): + quality gates, observers, commit checks +- `strict`: + auto-format, typecheck, full security monitoring From da57e4e76917b4e2a7192d1bf4000ac9e6703ae9 Mon Sep 17 00:00:00 2001 From: Zane Wang Date: Tue, 14 Apr 2026 16:27:38 +0800 Subject: [PATCH 2/2] refactor(v3): zero-checkpoint + requirement clarification gate + loop fixes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Evolves from v2.1 (2 mandatory checkpoints) to v3.0 (zero mandatory checkpoints) per design roadmap from 2026-03-08. Key changes: 1. NEW CONTINUITY CONTRACT v3 (zero-checkpoint): - /dev runs P0 → P10 without mandatory pauses - 3 legitimate pauses only: Requirement Clarification Gate (P3 prefix, one-shot), External Wait (auto-loop), Anomaly Escalation (AI confidence <70%) - Models a senior engineer's workflow — clarify once up-front, then execute autonomously 2. Requirement Clarification Gate: - Fires at P3 prefix if requirement has load-bearing ambiguity AI can't resolve via claudemem + codebase + web + DAO reasoning - Single AskUserQuestion with 2-4 pointed options, no subsequent pauses - Calibration examples for when to clarify vs when to just do it 3. CHECKPOINT 1 + CHECKPOINT 2 removed: - P5 no longer pauses after plan (plan written to disk as artifact) - P9 no longer pauses after PR (user reviews via PR/git log/wrapup) - Safety exception: push to master/main/production triggers anomaly-escalate 4. Loop fixes addressing codex-review findings: - Pattern 1 success: direct → P10 (no CHECKPOINT detour) - Pattern 1 failure: max fix-cycles: 3 outer cap (was unbounded) - Universal unknown-signal circuit breaker: ≥3 consecutive unknown wakes → exit-escalate (prevents 90min burn on probe failures) - LOOP WAKE status block: Elapsed/cache fields marked estimated (Claude can't observe wall-clock or cache state directly) - P9 CICD → Pattern 4 (Deploy Health), not Pattern 1 (PR CI) 5. Language detection fixes: - TypeScript: tsconfig.json OR (.ts files AND package.json) - JavaScript: package.json without .ts files - Kotlin: build.gradle.kts (no longer collides with Java) - Java: pom.xml OR build.gradle (Groovy DSL, no .kts) File growth: skills/dev-orchestrator/SKILL.md 639 → 736 lines (+97 net). --- skills/dev-orchestrator/SKILL.md | 226 ++++++++++++++++++++++--------- 1 file changed, 163 insertions(+), 63 deletions(-) diff --git a/skills/dev-orchestrator/SKILL.md b/skills/dev-orchestrator/SKILL.md index 743e02d..d6c5e2f 100644 --- a/skills/dev-orchestrator/SKILL.md +++ b/skills/dev-orchestrator/SKILL.md @@ -14,66 +14,133 @@ Developer describes what they want → AI classifies, investigates, plans, execu verifies, and delivers — with structural enforcement at every phase. ``` -UNDERSTAND ──[CHECKPOINT 1]──> BUILD ──[GATE]──> DELIVER ──[CHECKPOINT 2]──> -P0-P5 P6-P7 P8-P10 +UNDERSTAND ──[GATE]──> BUILD ──[GATE]──> DELIVER ───> +P0-P5 (1 optional clarify) P6-P7 P8-P10 (full auto) +Zero mandatory checkpoints — see CONTINUITY CONTRACT ``` --- -## CONTINUITY CONTRACT (end-to-end auto-run) - -**Once /dev starts, it owns the task until P10 ARCHIVE, or a CHECKPOINT, or an -explicit error escalation.** This contract governs when /dev pauses vs continues. - -### /dev MUST NOT stop between phases just because a phase finished. -Proceed directly from P status block to P without user prompting, -unless one of the three legitimate pause reasons below applies. - -### Three legitimate pauses (the ONLY ones) - -1. **CHECKPOINT 1** (after P5 PLAN) — human approves the plan before BUILD -2. **CHECKPOINT 2** (after P9 SHIP) — human reviews deliverable before push/merge -3. **External wait** (CI, long build, agent teams, deploy health) — enters a - LOOP pattern via `ScheduleWakeup`, automatically resumes on signal - -Any other "stop and ask" is a protocol violation. If you find yourself unsure -→ pick the safer default and note it in the next status block, don't halt. +## CONTINUITY CONTRACT (zero-checkpoint, full-auto) — v3 + +**Once /dev starts, it owns the task from P0 to P10 ARCHIVE. Zero mandatory +checkpoints.** /dev is modelling a senior engineer's end-to-end workflow: they +don't stop every 15 minutes to ask "should I continue?". They stop only when +something genuinely demands human input. + +### Design evolution (claudemem note 271c6900, 2026-03-08) + +| Version | Checkpoints | Philosophy | +|---------|------------|-----------| +| v0.1 | 2 (Spec + PR) | Conservative: human gates at intent + delivery | +| v0.3 | 1 (Spec only) | PR Review redundant after TDD + verify + regression | +| v0.5 | confidence-based | Auto-approve when AI confidence high | +| **v3 (current)** | **0 mandatory** | **Full auto, interrupt ONLY on genuine anomaly** | + +### Three legitimate pauses (ONLY these — nothing else) + +1. **REQUIREMENT CLARIFICATION** (at start of P3 BRAINSTORM, one-shot) + Fires ONLY when the user's requirement has load-bearing ambiguity that AI + cannot resolve via claudemem + codebase + web + DAO reasoning. See the + "Requirement Clarification Gate" section below for the test. + If requirement is specific enough → skip entirely, no pause. + +2. **EXTERNAL WAIT** (during P7/P8/P9) + CI / long build / agent teams / deploy health — enters a LOOP pattern via + `ScheduleWakeup`, resumes automatically on signal. This is not really a + "pause for human" — it's cached dormancy with automatic resume. + +3. **ANOMALY ESCALATION** (anywhere) + AI encounters a decision it cannot make with confidence ≥70% after using + all available resources (claudemem, codebase, web, DAO, evidence). Covers: + - Hard red-line trigger (secret about to be committed, destructive DB op, + .env modification, deny-listed command) + - Rules-conflict with no override precedent (e.g., user instruction says X, + CLAUDE.md says not-X, and context cannot arbitrate) + - Loop `exit-escalate` verdict from any Pattern 1-4 + +Any other "stop and ask" = **protocol violation**. Examples of forbidden stops: +- "P3 BRAINSTORM done — approve to proceed?" ❌ +- "Plan created — should I start?" ❌ +- "PR ready — want me to push?" ❌ (just push; user reviews in git log / GitHub) +- "Verify passed — archive now?" ❌ +- "I'm not sure whether to X or Y, which do you want?" ❌ (use DAO + evidence; + only escalate if genuinely >30% probability of being wrong) + +### Requirement Clarification Gate (P3 prefix, at most once per /dev invocation) + +**Test — do I need to clarify?** Answer these before P3 BRAINSTORM: + +1. Can I name the target file(s) / module(s) from the user's words + P1 findings? +2. Can I name the success criterion in one concrete sentence (e.g., "function + X returns Y when Z", not "make it better")? +3. Do I have ≥70% confidence on the implementation approach from DAO + + codebase patterns + claudemem? + +If all three are YES → proceed to P3, no pause. +If any is NO → ONE `AskUserQuestion` with 2-4 pointed options to resolve the +specific ambiguity. Phrase it like a senior engineer: "I see three ways this +could be interpreted — A, B, C. Which did you have in mind?" + +**Calibration examples**: + +| User said | Decision | +|-----------|----------| +| "fix the 500 in the video upload endpoint" | Specific → skip gate | +| "add refresh-token support to JWT auth per auth0 pattern" | Specific → skip gate | +| "make the pipeline faster" | Ambiguous: which pipeline? how fast? → ask | +| "add auth" | Ambiguous: OAuth2 vs JWT vs session? → ask | +| "refactor this mess" (with file in context) | Ambiguous: what mess? preserve API? → ask | +| "add tests for the user service" | Specific → skip gate (AI can pick TDD patterns from repo) | ### Auto-Loop Default (STANDARD and DEEP tiers) -- **STANDARD / DEEP**: Loop is **automatically enabled** in every phase that +- **STANDARD / DEEP**: Loop is **automatically enabled** for every phase that has an "external wait" signal. User does NOT need to pass `--loop`. - **QUICK**: Loop is **automatically disabled** — every QUICK-tier work is by definition short enough to block on without cache cost. -- **Override**: `--no-loop` forces blocking mode even on STANDARD/DEEP (useful - when the user wants to watch CI live rather than cede control to the loop). +- **Override**: `--no-loop` forces blocking mode even on STANDARD/DEEP (e.g., + when user wants to watch CI live). -### End-to-End Sequencing Examples +### End-to-End Sequencing Examples (v3 zero-checkpoint) -**DEEP DEVELOP task with CI**: +**DEEP DEVELOP task with CI, requirement clear**: ``` -P0 → P1 → P2 → P3 → P4 → P5 → [CHECKPOINT 1: pause for human] -↓ (approved) -P6 → P7 (TDD per task, Agent Teams may auto-loop at merge point) +P0 → P1 → P2 → P3 (requirement clear, skip clarify gate) → P4 → P5 + → P6 → P7 (TDD per task, Agent Teams may auto-loop at merge) → P8 (verify; if long-running → auto-loop Pattern 2) - → P9 (push PR; auto-enter LOOP Pattern 1 for CI wait) - → [CI green] → [CHECKPOINT 2: pause for human] -↓ (approved) -P10 → DONE (session ends gracefully) + → P9 (push PR; auto-enter LOOP Pattern 1 for CI wait → green) + → P10 ARCHIVE → DONE (session ends) +``` +Zero stops unless loop-escalate or anomaly fires. User sees full /dev run in +one conversational turn (plus async wakes for CI). + +**DEEP DEVELOP task with ambiguous requirement**: ``` +P0 → P1 → P2 → [REQUIREMENT CLARIFICATION: 1 AskUserQuestion] +↓ (answered) +P3 → P4 → P5 → P6 → ... → P10 → DONE +``` +One pause at the start — where clarification is highest-value — then full auto. -**STANDARD DEVELOP task without CI wait**: +**QUICK task (bug fix)**: ``` -P0 → P1 → P2 → P3 → P4 → P5 → [CHECKPOINT 1] → ... → P9 (push) → [CHECKPOINT 2] → P10 → DONE +P0 → P1 → P2 → [root cause] → TDD → verify → commit → DONE ``` -No loop invocation needed (no external waits), but the /dev session still runs -end-to-end without user needing to say "continue" between phases. +Zero pauses. Ever. ### Why this matters -Without this contract, /dev stops 10+ times in a DEEP task ("P1 done, continue?" -"P3 done, continue?"). With it, /dev stops 2-3 times total: the two CHECKPOINTs -and any loop-failure escalations. User cognitive load drops by 5-10×. +v3 removes the 5-10× user-cognitive-load penalty of v2 (which stopped at 2 +checkpoints per task). A user running 5 /dev tasks in a day went from "10+ +interruptions" to "0-2 interruptions" (one clarification if requirement is +ambiguous, zero otherwise). This frees the user to context-switch to other +work while /dev runs, matching how you'd delegate to a trusted senior engineer. + +**Trade-off accepted**: if AI misjudges requirement clarity and proceeds +without asking, it may build the wrong thing and waste one cycle. This cost +is explicit and small vs. the cost of asking at every phase boundary. The +anomaly-escalation rule is the safety net for high-stakes misjudgments. --- @@ -101,7 +168,8 @@ MUST emit a loop-specific block BEFORE the regular phase block: ``` ┌─ LOOP WAKE: P/ (iter /) ─ -│ Elapsed: m | Next delay: s (cache ) +│ Elapsed: ~m (estimated, not wall-clock) +│ Next delay: s (expected-cache: hit if ≤270s, miss if >300s) │ Signal: │ Exit target: | │ Action this cycle: @@ -109,8 +177,12 @@ MUST emit a loop-specific block BEFORE the regular phase block: └──────────────────────────────────────────────── ``` +Fields Claude cannot directly observe (wall-clock across wakes, actual cache +hit/miss state) are marked as estimated/expected — derived from iter × delay. +Do NOT fabricate precise numbers; use the estimates. + If `iter >= max` or `cost cap hit` → Decision MUST be `exit-failure` or -`escalate`. Do NOT silently continue past the declared cap. +`exit-escalate`. Do NOT silently continue past the declared cap. --- @@ -147,20 +219,28 @@ Show one-line status per check. Do NOT ask permission — just do it. Detect primary project language to enable language-aware agent delegation in P7/P9: -| Indicator File | Language | ECC Reviewer Agent | ECC Build Resolver | +Detection rules (evaluate in order — first match wins): + +| Indicator (AND conditions) | Language | ECC Reviewer Agent | ECC Build Resolver | |---------------|----------|-------------------|-------------------| -| `package.json` / `tsconfig.json` | TypeScript | `typescript-reviewer` | `build-error-resolver` | +| `tsconfig.json` present | TypeScript | `typescript-reviewer` | `build-error-resolver` | +| `package.json` present AND `tsconfig.json` absent AND ≥1 `.ts`/`.tsx` file in project | TypeScript | `typescript-reviewer` | `build-error-resolver` | +| `package.json` present AND no `.ts`/`.tsx` files | JavaScript | `typescript-reviewer` (covers JS) | `build-error-resolver` | | `pyproject.toml` / `setup.py` / `requirements.txt` | Python | `python-reviewer` | — | | `go.mod` | Go | `go-reviewer` | `go-build-resolver` | | `Cargo.toml` | Rust | `rust-reviewer` | `rust-build-resolver` | -| `pom.xml` / `build.gradle.kts` (Java) | Java | `java-reviewer` | `java-build-resolver` | -| `build.gradle.kts` (Kotlin) | Kotlin | `kotlin-reviewer` | `kotlin-build-resolver` | +| `build.gradle.kts` present | Kotlin | `kotlin-reviewer` | `kotlin-build-resolver` | +| `pom.xml` OR `build.gradle` (Groovy DSL, no `.kts`) | Java | `java-reviewer` | `java-build-resolver` | | `CMakeLists.txt` / `*.cpp` | C++ | `cpp-reviewer` | `cpp-build-resolver` | | `Package.swift` | Swift | — | — | | `pubspec.yaml` | Flutter/Dart | `flutter-reviewer` | — | Record detected language in P2 status block. If mixed-language, note primary + secondary. +Notes: +- `package.json` alone does NOT imply TypeScript — almost every JS project has it. Require `tsconfig.json` OR `.ts`/`.tsx` files as the disambiguator. +- `build.gradle.kts` is Kotlin DSL, overwhelmingly Kotlin projects. Java projects using Gradle typically use `build.gradle` (Groovy). Assigning `.kts` to Kotlin eliminates the previous Java/Kotlin collision. + ### Source Evaluation (applied to ALL P1 findings) External information (web search, DeepWiki, API responses, LLM summaries) is input, NOT truth. @@ -278,9 +358,10 @@ Invoke `brainstorming` skill. Do NOT skip "because it seems simple." | RESEARCH | Create research plan | Questions to answer, sources per question, deliverable per section | | CICD | Create change plan | Steps, dry-run command, verify command, rollback command | -### CHECKPOINT 1: Developer reviews plan/specs -Present plan summary via AskUserQuestion. -This is the ONLY human input before BUILD starts. + --- @@ -386,12 +467,18 @@ that runs >2 min: | Task Type | Steps | |-----------|-------| -| **DEVELOP** | 1. `/codex-review` (Codex + 5 Claude agents + Haiku cross-scorer — supersedes `requesting-code-review`) → fix Critical/Important. 2. STANDARD+: `security-reviewer` agent for auth/crypto/input code. 3. `verify-dev.sh` final gate. 4. `finishing-a-development-branch` → PR. 5. Push (ask user). 6. DEEP: enter LOOP Pattern 1 (CI wait) via ScheduleWakeup — NOT blocking `gh run watch`. | +| **DEVELOP** | 1. `/codex-review` (Codex + 5 Claude agents + Haiku cross-scorer — supersedes `requesting-code-review`) → fix Critical/Important. 2. STANDARD+: `security-reviewer` agent for auth/crypto/input code. 3. `verify-dev.sh` final gate. 4. `finishing-a-development-branch` → PR. 5. **Push automatically** (no confirmation — user reviews via git log / GitHub PR). Safety exception: if branch is `master` / `main` / `production`, anomaly-escalate to user. 6. DEEP: enter LOOP Pattern 1 (CI wait) via ScheduleWakeup — NOT blocking `gh run watch`. On CI green → proceed directly to P10. | | **RESEARCH** | 1. Self-review report for accuracy. 2. Deliver: report + claudemem notes. 3. Present Goal/Done/Next summary. | -| **CICD** | 1. Apply change (with user confirmation). 2. Monitor via LOOP Pattern 1 or health check. 3. Verify health. 4. Document in runbook if new pattern. | +| **CICD** | 1. Apply change (anomaly-escalate to user only if change affects production infra; else proceed). 2. Monitor via LOOP Pattern 4 (Deploy Health) or blocking health check. 3. Verify health. 4. Document in runbook if new pattern. | + + -### CHECKPOINT 2: Developer reviews deliverable -- DEVELOP: "PR ready with X/X tests passing, 0 regressions." +**Deliverable summary** (emit as part of P9 status block, no pause): +- DEVELOP: "PR # ready with X/X tests passing, 0 regressions." - RESEARCH: "Report complete, X notes saved, all questions answered." - CICD: "Change applied, pipeline healthy, rollback documented." @@ -414,8 +501,8 @@ When Plan Mode is active simultaneously with /dev: 2. Use Plan Mode's file as output location for /dev's plan (P5) 3. Map: Plan Mode "Understand" = /dev P0-P5, Plan Mode "Design" = /dev P5 4. Status blocks go in conversation output (not plan file) -5. Use AskUserQuestion for /dev checkpoints (CHECKPOINT 1 & 2) -6. Call ExitPlanMode after P5 (CHECKPOINT 1 approved) to begin BUILD stage +5. /dev v3 has NO mandatory checkpoints — Plan Mode's own approval flow is the only pause after P5 if Plan Mode is active +6. Call ExitPlanMode after P5 (once Plan Mode approval received) to begin BUILD stage. Without Plan Mode, /dev proceeds directly from P5 to P6 (v3 zero-checkpoint default). --- @@ -527,6 +614,13 @@ Before the FIRST `ScheduleWakeup` call of a loop, output this block inline: At least one of (success, failure, timeout) MUST fire within max_iter. If you can't state a concrete signal → don't start the loop (use blocking call instead). +**Universal "unknown signal" circuit breaker** (applies to ALL patterns): +If the signal probe returns `unknown` for ≥3 consecutive wakes (e.g., network +errors, GitHub rate-limit, health endpoint timing out with no response), +emit `exit-escalate` immediately — do NOT keep polling until max_iter. +Rationale: sustained probe failure is an infrastructure problem, not the +thing the loop is waiting for. Let the user fix the infra. + ### Standard Patterns #### Pattern 1 — CI Wait (P9 DEEP, DEVELOP + CICD) @@ -535,10 +629,14 @@ Trigger : PR pushed via finishing-a-development-branch Signal : gh pr checks → pass | pending | fail Delay : 270s while pending Max iter : 20 (≈ 90 min — covers most CI runs) -Success : all checks green → proceed to P10 ARCHIVE -Failure : any check red → spawn fix subtask (new P7 cycle), re-push, restart loop iter counter -Timeout : escalate to user ("CI stuck > 90min, investigating") -reason field: "CI wait iter /20 — PR # checks pending" +Max fix-cycles : 3 (outer cap across restarts; 4th CI failure → exit-escalate) +Success : all checks green → exit-success → proceed to P10 ARCHIVE (no checkpoint) +Failure : any check red → IF fix-cycles < 3: spawn fix subtask (new P7 cycle), + re-push, restart iter counter, increment fix-cycle counter. + ELSE: exit-escalate (3 fix cycles exhausted — likely flaky infra or real bug) +Timeout : iter >= 20 with no signal change → exit-escalate +Unknown : signal = unknown for ≥3 consecutive wakes (probe errors, network, rate limit) → exit-escalate +reason field: "CI wait iter /20, fix-cycle /3 — PR # checks pending" ``` #### Pattern 2 — Long Task (P7, P8) @@ -605,11 +703,13 @@ ScheduleWakeup( ### Loop Interaction with CHECKPOINTs -- CHECKPOINT 1 (after P5) and CHECKPOINT 2 (after P9) use `AskUserQuestion` — the - session pauses for the human. **Do NOT call ScheduleWakeup at checkpoints** — - it races against human response. -- If a loop is active when a checkpoint fires, end the loop first (declare - `exit-success` or `exit-escalate`), then present the checkpoint. +- v3 has NO mandatory CHECKPOINTs. The only pause points are (1) Requirement + Clarification Gate at P3 prefix (one-shot, at session start) and (2) Anomaly + Escalation (user-invoked by AI when decision confidence <70%). Neither happens + mid-loop — clarification is before loops begin, anomaly is from inside a loop. +- **Do NOT call ScheduleWakeup while requesting human input via `AskUserQuestion`** — + it races against human response. If an anomaly fires mid-loop, emit + `exit-escalate` first, then raise `AskUserQuestion`. ---