Releases: fredchu/claude-automl
v5.7.0 — Environment Gap Research Gate
What's New
Environment Gap Research Gate (Phase 1 Step B')
When SCD C5 (Test Environment Gap) confidence is low, automl now forces a research step before designing evaluators. This prevents the "guess → deploy → fail → repeat" anti-pattern that wastes hours on platform-specific bugs.
Trigger: C5 confidence < 7 (agent admits uncertainty about test-vs-production gap)
Tool priority: NotebookLM deep research (300+ sources + multi-round Q&A) → WebSearch fallback (3-5 rounds)
Origin story: MumbleKey iOS background recording debugging — 6 rounds of guessing failed, one NLM research session found all 6 root causes at once. Lesson learned.
Phase 2 Emergency Research Gate
When a subagent is stuck and the cause can't be diagnosed from code alone, automl pauses the loop and triggers research before continuing.
NLM Pre-Create Routing
Research Gate respects the /notebooklm skill's notebook registry — checks for existing relevant notebooks before creating new ones. No duplicate notebooks.
Recommended: notebooklm-py
`pip install notebooklm-py && notebooklm skill install` — strongly recommended for best Research Gate results. Falls back to WebSearch if not installed.
See CHANGELOG.md for full details and migration guide.
v5.6.0 — System Context Dialogue + Coverage Sanity Check
What's New
System Context Dialogue (Phase 1 Step B, mandatory)
FMEA-driven conversation with user before evaluator design. Five categories: Trigger Paths, Dependencies & Latency, Environmental Constraints, History & Workarounds, Test Environment Gap. Agent self-assesses confidence per category, asks 3-5 targeted questions. Rapid RPN scoring (S×O×D, max 125) prioritizes failure modes.
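The rapid RPN scoring can be illustrated with a short sketch (the 1-5 scale and the max of 125 come from the notes above; the helper and the sample failure modes are invented for illustration):

```python
# Hypothetical sketch of rapid RPN (Risk Priority Number) scoring.
# S, O, D = Severity, Occurrence, Detectability, each on a 1-5 scale,
# so the maximum RPN is 5 * 5 * 5 = 125.
def rpn(severity: int, occurrence: int, detectability: int) -> int:
    for v in (severity, occurrence, detectability):
        assert 1 <= v <= 5, "each factor is scored 1-5"
    return severity * occurrence * detectability


# Illustrative failure modes as (description, S, O, D) tuples.
failure_modes = [
    ("audio session not released",  5, 3, 4),
    ("trigger path never fires",    4, 2, 2),
    ("stale workaround re-applied", 3, 4, 3),
]

# Highest RPN first: these failure modes get evaluators designed first.
ranked = sorted(failure_modes, key=lambda fm: rpn(*fm[1:]), reverse=True)
```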
Red Team Blind Spots
After gaming attempts, red team now identifies production scenarios that would fail but evaluators can't catch. Results flow to injectable tests, risk scenarios, or manual verification checklist.
Phase 1.5b' — Coverage Sanity Check
Fixes structural blind spot where SCD (failure-focused) + red team (gaming-focused) systematically miss happy paths. Three mechanical checks:
- CHECK 1: Happy Path Coverage — at least one test verifying normal successful completion
- CHECK 2: Alternative Path Parity — each trigger path has at least one test
- CHECK 3: State Sequence Robustness — interrupt-then-retry test for state machines
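A minimal sketch of how such mechanical checks might look (the test-metadata shape and tag vocabulary are invented for illustration, not the skill's real schema):

```python
# Hypothetical sketch of the three coverage sanity checks.
# Each test is a dict tagged with the trigger path it exercises and a "kind".
def coverage_sanity_check(tests: list[dict], trigger_paths: set[str],
                          has_state_machine: bool) -> list[str]:
    failures = []
    # CHECK 1: at least one test verifies normal successful completion
    if not any(t.get("kind") == "happy_path" for t in tests):
        failures.append("CHECK 1: no happy-path test")
    # CHECK 2: every trigger path has at least one test
    covered = {t.get("path") for t in tests}
    for path in sorted(trigger_paths - covered):
        failures.append(f"CHECK 2: trigger path '{path}' untested")
    # CHECK 3: state machines need an interrupt-then-retry test
    if has_state_machine and not any(t.get("kind") == "interrupt_retry" for t in tests):
        failures.append("CHECK 3: no interrupt-then-retry test")
    return failures
```

Because the checks are mechanical rather than judgment calls, they catch the happy-path blind spot even when both SCD and red team are focused elsewhere.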
Also includes v5.5 features
- `required_tests` — red team findings become structured test requirements
- `methodology_skill` — TDD as methodology separated from domain skill
- TDD staging (RED→GREEN→REFACTOR) enforced with `required_tests`
See CHANGELOG.md for full details and migration guide.
v4.0.0 — Mandatory Skills + Phase 3 Subagent Architecture
What's New
Mandatory Skills — Each task now requires a specialized skill (e.g., /investigate for debugging, /review for code review). Subagents load skills via the Skill tool before making changes. Default is "skill required" — none is the exception that needs justification.
Phase 3 Subagent Architecture — Phase 3 verification is no longer done by the main session. Three specialized subagents handle it:
- FINAL_VERIFICATION (haiku) — re-runs all evaluators + risk scenario test cases
- RISK_REVIEW (opus) — traces each risk scenario through actual code paths
- CODE_REVIEW (codex-worker / sonnet fallback) — diff-aware review with security analysis
Model Routing — Every Agent call specifies a model: haiku for mechanical tasks, sonnet for execution, opus for deep analysis. User-overridable via params.model_overrides.
Skill Mapping Table — New `references/skill-mapping.md` provides task type → recommended skill lookup for Phase 1, 2, and 3.
gstack Integration — Phase 0 adds /design-consultation, Phase 1 defaults to /autoplan, Phase 2/3 integrate /investigate, /review, /cso, /qa-only, /benchmark.
Breaking Changes
- Fourth required element: mandatory skill per task
- State file format changed (new fields: `skill`, `phase3_skill`, `risk_scenarios`, `phase3` block)
- Phase 3 has full dispatcher decision tree in main skill file
Upgrade
`cd ~/.claude/skills/automl && git pull`

Existing v3 runs continue in v3 mode. New runs use v4 format.
See CHANGELOG.md for full details.
v3.1.0 — Risk Scenarios & Verification Checklist
What's New
Phase 1: Risk Scenarios — each task now requires 3-5 "how could this break?" scenarios that auto-flow into evaluators and Phase 3 checklists. Works for code, text, config — any domain.
Phase 3: Risk Scenario Review — mandatory: trace each scenario through actual implementation, rating ✅ Safe or 🔴 Bug.
Phase 3: Verification Checklist — mandatory output: prioritized test list (crash > data loss > UX > cosmetic). User runs full list in one pass, minimizing ping-pong.
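The priority ordering (crash > data loss > UX > cosmetic) can be sketched as follows (the severity labels and helper are illustrative, not the checklist's real format):

```python
# Hypothetical sketch of verification checklist prioritization:
# crash > data loss > UX > cosmetic. Labels are invented for illustration.
SEVERITY_ORDER = {"crash": 0, "data_loss": 1, "ux": 2, "cosmetic": 3}


def prioritize(checklist: list[tuple[str, str]]) -> list[str]:
    """Sort (item, severity) pairs so the most dangerous checks come first."""
    return [item for item, sev in sorted(checklist, key=lambda c: SEVERITY_ORDER[c[1]])]
```

Sorting once up front is what lets the user run the whole list in a single pass instead of ping-ponging on low-stakes items first.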
Why
In a real iOS architecture rewrite, automl v3.0.0 caught build errors but missed a race condition, an audio session leak, and two state machine bugs. All four missed bugs would have been caught by upfront risk scenario analysis.
Upgrade
`cd ~/.claude/skills/automl && git pull`

Full changelog: CHANGELOG.md
v3.0.0 — Initial Release
Autonomous Evaluation Loop for Claude Code
Define success. Let the agent iterate until it gets there.
Features
- Dual-loop engine — per-task improvement + cross-task regression check
- Two evaluator modes — shell (exit code / score) and checklist (LLM-as-judge)
- Subagent architecture — main session dispatches only, never touches code directly
- Auto-resume — state persists in `.automl/{run_id}/`; interrupted sessions continue automatically
- Safety — git tag baseline, whitelist scope, STOP file interrupt, non-git fallback
- Optional skill integrations — Phase 0/1/3 can chain with external brainstorming, planning, and review skills
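The shell evaluator mode from the feature list can be sketched in a few lines (treating exit code 0 as pass is an assumption here, not the documented contract):

```python
# Hypothetical sketch of the shell evaluator mode: run the evaluator
# command and treat exit code 0 as "task passes". Score parsing and
# richer result handling are omitted for brevity.
import subprocess


def run_shell_evaluator(command: str) -> bool:
    """Run the evaluator command; a zero exit code means the task passes."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode == 0
```

For example, the Quick Start's `pytest tests/ -q` evaluator would pass exactly when pytest exits 0.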
Install
`git clone https://github.com/fredchu/claude-automl ~/.claude/skills/automl`

Quick Start
/automl make all tests pass
evaluator: pytest tests/ -q
scope: src/
See README for full documentation (English + 繁體中文).