refactor(bugfix): convert commands to skills, add CLAUDE.md, clean up#42
refactor(bugfix): convert commands to skills, add CLAUDE.md, clean up#42jwm4 wants to merge 28 commits intoambient-code:mainfrom
Conversation
- Convert all 5 commands (reproduce, diagnose, fix, test, document) into
detailed skills in .claude/skills/{name}/SKILL.md
- Rewrite commands as thin wrappers that delegate to skills, passing
arguments and session context
- Add CLAUDE.md with behavioral guidelines extracted from amber.md
(engineering discipline, safety guardrails, quality standards)
- Remove .claude/agents/amber.md (redundant with systemPrompt persona)
- Remove supplementary files (AMBER_RESEARCH.md, FIELD_REFERENCE.md,
TEMPLATE_FEEDBACK.md)
- Update systemPrompt to reference skills alongside commands
- Update README to reflect new structure
- Add .claude/skills/pr/SKILL.md with systematic fork-based PR workflow - Add .claude/commands/pr.md thin wrapper - Handles pre-flight checks, fork setup, branch/commit/push, and PR creation - Includes error recovery for common failures (no push access, sandbox restrictions, missing fork) - Update systemPrompt and startupPrompt to include /pr as phase 6 - Update README with PR phase documentation
Make CLAUDE.md self-contained by including workflow phases, commands, skill locations, and artifact path directly instead of referencing ambient.json (which is platform plumbing, not model-visible). Also fix markdown lint warnings (emphasis-as-heading, code block language).
- Add .claude/skills/review/SKILL.md — critically evaluates fix and tests, issues verdict (inadequate fix / incomplete tests / solid), recommends next steps - Add .claude/commands/review.md thin wrapper - Update /test skill to report verification.md path and summarize results inline to the user - Update systemPrompt, startupPrompt, CLAUDE.md, README with /review as optional phase 5 between test and document
- systemPrompt: add PHASE TRANSITIONS section requiring agent to stop and summarize at end of each phase instead of auto-advancing - CLAUDE.md: add Flow Control section reinforcing pause behavior, trim to ~70 lines per best practices - pr/SKILL.md: add placeholders table defining GH_USER, FORK_OWNER, UPSTREAM_OWNER/REPO, REPO, and BRANCH_NAME with sources - pr/SKILL.md: rewrite Step 2 to ask user before creating fork, wait for confirmation, never silently skip ahead - pr/SKILL.md: replace flat fallback with 4-rung fallback ladder, patch file is now absolute last resort - ambient.json: remove rubric (deferred to future PR), remove startupPrompt and results fields - Delete ambient.clean.json (no longer needed) - AGENTS.md: add sandbox restrictions table
- All command wrappers now say 'Read the file ... now and follow it step by step' instead of 'Using the X skill from ...' which the agent was interpreting as conceptual guidance rather than a file-read instruction - systemPrompt: add CRITICAL prefix to skill-reading instruction, explicitly say 'use the Read tool to open the referenced SKILL.md file' - pr/SKILL.md: add 'IMPORTANT: Follow This Skill Exactly' section at top with anti-improvisation rules - pr/SKILL.md: add pre-flight summary checkpoint between Steps 1 and 2 - pr/SKILL.md: upgrade 'no fork' path from 'STOP' to 'HARD STOP' with explicit instructions to wait for user twice (before fork attempt, after fork attempt fails) - pr/SKILL.md: add fork verification command before proceeding to Step 3 - Add critical rules: never attempt gh repo fork without asking user first, never fall back to patch files without exhausting all other options
- systemPrompt: add step to tell user which skill is being invoked before reading it, so user can correct if wrong phase was picked - CLAUDE.md: reinforce skill announcement policy
- Step 1a: don't dead-end on missing gh auth — continue pre-flight to gather info from git, then present options - Step 1b: add fallback git config when gh is unavailable - Step 1d: add git-remote-based fallback for upstream repo identification - Add decision point after pre-flight: present 3 clear options to user (set up auth, provide fork URL, or prepare for manual push) and WAIT for response instead of dumping all manual instructions at once
Every skill now ends with a 'When This Phase Is Done' section that: - Tells the agent to summarize findings to the user - Recommends the natural next step in the workflow - Explicitly says to stop and wait for the user This reinforces the phase-transition pause from the systemPrompt and CLAUDE.md at the skill level, so the agent gets the instruction to stop regardless of whether it's following the systemPrompt guidance or reading the skill directly. Next-step recommendations follow the natural flow: reproduce → diagnose → fix → test → review → document → pr
- New /assess phase (phase 1): reads the bug report, presents understanding, identifies gaps, proposes reproduction plan, and waits for user confirmation before taking any action. No code execution or cloning in this phase. - systemPrompt: updated to 8-phase flow starting with ASSESS, added explicit instruction to start with assess when user provides a bug report - CLAUDE.md: updated phase list to include assess - pr/SKILL.md: clarified that 'never push to upstream' applies even to org bots and GitHub Apps — no exceptions based on account type - pr/SKILL.md: added reminder at pre-flight summary not to skip Step 2
- systemPrompt: scope 'start with ASSESS' to initial bug report only - systemPrompt: add 'INTERPRETING USER RESPONSES' section so agent maps phase names and confirmations to the correct next phase - assess skill: allow cloning and reading code (but not executing) - assess skill: add Step 2 to check for repo and clone if missing, read referenced PRs/files to inform assessment - assess skill: update output to note repo is cloned for later phases
- Add awareness of remaining phases (review, document, pr) so agent recommends the full path instead of jumping to PR
… via Task tool Instead of the orchestrator reading and executing skill files directly (which caused context bleed and skill-selection failures), Amber now dispatches each phase to a subagent using the Task tool. Changes: - systemPrompt declares Amber as an orchestrator with dispatch protocol - Command wrappers instruct dispatch via Task tool (not direct execution) - Skill 'When This Phase Is Done' sections return results (not user-facing) - CLAUDE.md reinforces orchestrator role and subagent dispatch pattern
Replace systemPrompt-based orchestration with a dedicated controller skill (.claude/skills/controller/SKILL.md) that the model reads and follows. The controller has: - Phase table mapping commands to skill paths - Explicit dispatch protocol via Task tool - User response interpretation table (yes → last recommendation, not repeat) - Critical rule: never re-dispatch the phase that just finished systemPrompt is now minimal: identity + pointer to controller. Command wrappers all route through the controller. CLAUDE.md references the controller instead of restating flow rules.
…subagents Switch from Task tool dispatch to direct execution so users see full progress in the platform UI. The controller re-reads itself after each phase to keep transition rules fresh and prevent getting stuck. This preserves the correct transition behavior while restoring visibility.
Move all next-step logic out of individual skills and into the controller. Skills now only report findings; the controller decides what to recommend. Recommendations are flexible: the controller considers skipping forward, going back, or ending early based on what actually happened, and presents multiple options instead of a single rigid next step.
… descriptions - Remove next-step recommendations from all 8 skills (they just report findings) - Controller owns all recommendation logic with flexible guidance - Add one-sentence descriptions to each phase in the controller - Remove response-interpretation table (model handles this naturally)
…ncements - systemPrompt and CLAUDE.md: re-read controller when choosing/starting a phase - Controller: explicit phase announcement with example before execution - Avoids re-reading on every message (context budget concern)
- pr/SKILL.md: use .parent.owner.login and .parent.name instead of non-existent .parent.nameWithOwner in fork detection query - ADR: add lesson 13 (test API responses against real data), lesson 5 (controller re-read timing), lesson 6 (phase descriptions improve recommendations); update lesson 2 (drop response-interpretation table); renumber to 15 total lessons
…ontroller - pr/SKILL.md: recognize bot 403 as expected, use GitHub compare URL for manual PR creation instead of bare fork URL, prevent patch-file spiral - review/SKILL.md: remove A/B/C verdict labels (meaningless to users) - controller/SKILL.md: add 'continue to next step' as default recommendation, clarify recommendation timing - ADR: add lesson 14 (GitHub App on user ≠ permission on upstream), now 16 total
- All 8 skills now end with 'Then re-read the controller for next-step guidance' - Removed redundant re-read instructions from controller, systemPrompt, CLAUDE.md - Controller Step 4 now acknowledges the skill-triggered re-read instead of duplicating it - Net effect: one place triggers the re-read (end of skill), one place handles it (controller)
|
This is solid, I'll load it as a custom workflow and see how we do. Thanks! |
|
Oh, I forgot to mention this in the PR description. Here are three things that I think should be in this workflow but are not in this PR because I wanted to break up the work a little:
|
|
@jeremyeder and @jwm4 : The workflow itself worked perfectly well - I used it to run the bugfix for this specific PR ambient-code/agentready#306 to address the ambient-code/agentready#302 using the updated bugfix workflow as a Custom Workflow based on Jeremy's suggestion. Here is the summary of the session and where it can be better. The agent (Amber) successfully navigated a structured, 8-phase bugfix workflow (/assess through /pr) to resolve GitHub Issue #302 in the agentready repository. The bug involved the application ignoring the excluded_attributes field in user configuration files. The agent diagnosed the root cause, implemented the fix, wrote regression tests, generated documentation, and successfully iterated through three rounds of user/CI feedback to polish the pull request. Workflow BreakdownAssess & Reproduce: The agent analyzed the issue and codebase, identifying that Config.is_excluded() was never actually called. It set up a Python 3.12 virtual environment and successfully reproduced the bug. Diagnose: Using git log and git blame, the agent traced the history, discovering that a CLI --exclude flag was added in a later commit without being integrated with the existing configuration file logic. Fix & Test: The agent modified cli/main.py to merge CLI and config exclusions into a single set before filtering. It then added three new regression tests to test_main.py using Python's unittest.mock. Document & PR: The agent automatically generated comprehensive artifacts (release notes, changelog, team announcement, and PR description) and pushed the branch to the user's fork, providing a direct link to open the PR. Iterative Refinement:
Areas for ImprovementWhile the agent ultimately produced a high-quality fix, there were several inefficiencies and oversights during the session:
@jeremyeder I have used the export chat feature from the ACP GUI, like you showed and put that into another LLM, voila, now we have actionable improvements to be plugged back in. I can clearly see the path for automated self-learning just by pointing the Agent from the active session back to this summary and learn from it. |
Bugfix workflow: skill-based architecture with workflow controller
Problem
The bugfix workflow had all process logic inline in command files. This meant users only got the workflow's carefully designed process if they typed the exact slash command — saying "fix this bug" in natural language would bypass all of it. Phase transitions were unreliable, the agent would auto-advance without pausing, and the PR creation process failed in predictable ways that weren't handled.
Changes
Architecture: commands → skills → controller
Refactored from a command-heavy design to a three-layer architecture:
.claude/skills/controller/SKILL.md) — owns all phase transitions, recommendations, and flow logic. The single source of truth for "what happens next.".claude/skills/{name}/SKILL.md) — contain the full multi-step process for each phase. Report findings only; never recommend next steps..claude/commands/*.md) — thin pointers (~5 lines each) that route to the controller.This means users get the workflow's benefit whether they type
/fixor "fix this bug."New phases:
/assess— initial phase that reads the bug report, summarizes understanding, identifies gaps, and proposes a plan before jumping into reproduction/review— optional phase after/testthat critically evaluates the fix and tests, looking for gaps and missed edge cases/pr— systematic, failure-resistant pull request creation with fork workflows, bot identity handling, and a 4-rung fallback ladderNew files:
workflows/bugfix/CLAUDE.md— behavioral guidelines (principles, hard limits, safety, quality, escalation)docs/adr/2026-02-12-bugfix-lessons-learned.md— 16 lessons learned about prompt engineering for multi-phase AI workflowsDeleted files:
.claude/agents/amber.md— redundant with systemPrompt; useful content moved to CLAUDE.mdAMBER_RESEARCH.md,FIELD_REFERENCE.md,TEMPLATE_FEEDBACK.md— supplementary docs not used by the workflow.ambient/ambient.clean.json— unused duplicateSimplified systemPrompt:
Reduced from ~100 lines to ~10 lines. The systemPrompt now just identifies the agent, points at the controller, and lists the workspace layout. All behavioral rules live in
CLAUDE.md; all process logic lives in skills.Key design decisions
/installation/repositories, uses the correctjqfields for fork detection, and treatsgh pr create403 as an expected outcome (provides a GitHub compare URL) rather than an error to debug.Testing
Iteratively tested on the Ambient Code Platform across multiple sessions. The ADR documents all issues encountered and how they were resolved.