Tell Claude what build. Get PR ready merge.
Hooks enforce patterns real-time, skills guide workflow, orchestration layer ensure nothing ships without tests, accessibility, type safety, code review -- zero babysit.
Surface area: 48 skills, 101 hook scripts, 97 Claude hook commands, 87 Codex hook commands, 9 agents, 5 routines.
Run inside Claude Code session (start with claude in terminal):
/plugin marketplace add redpanda-data/ui-harness/plugin install frontend-skills@ui-harness/reload-pluginsThree commands. Skills, hooks, agents activate immediately. Done.
Recommended: rtk (output-compression proxy, ~60-90% token savings on git/cargo/test/gh):
brew install rtk
rtk trust # approve .rtk/filters.toml per projectHarness fail-open -- skip safe; SessionStart nudge remind if miss.
Update (pull latest):
/plugin install frontend-skills --forceRestart Claude Code session so hooks reload from new cache.
Verify:
bash "$(ls -d ~/.claude/plugins/cache/ui-harness/frontend-skills/*/ | tail -1)scripts/verify-install.sh"Codex (OpenAI) -- install as Codex plugin
Repo ships a Codex-native plugin manifest (.codex-plugin/plugin.json) and marketplace (.agents/plugins/marketplace.json). Use the Codex marketplace CLI when you want Codex to install and track the marketplace source instead of editing config.toml by hand.
1. Upgrade Codex and enable plugins/hooks
brew upgrade --cask codex
codex features enable plugins
codex features enable codex_hooks2. Add this marketplace from the CLI
Track latest main:
codex plugin marketplace add redpanda-data/ui-harness --ref main
codex plugin marketplace upgrade ui-harnessOr pin a release:
codex plugin marketplace add redpanda-data/ui-harness --ref v4.10.2
codex plugin marketplace upgrade ui-harnessGeneral CLI forms
codex plugin marketplace add owner/repo
codex plugin marketplace add owner/repo --ref main
codex plugin marketplace add https://github.com/example/plugins.git --sparse .agents/plugins
codex plugin marketplace add ./local-marketplace-rootMarketplace sources can be GitHub shorthand (owner/repo or owner/repo@ref), HTTP or HTTPS Git URLs, SSH Git URLs, or local marketplace root directories. Use --ref to pin a Git ref, and repeat --sparse PATH to use a sparse checkout for Git-backed marketplace repos. --sparse is valid only for Git marketplace sources.
Do not use --sparse .agents/plugins for this repo as-is: the marketplace entry points frontend-skills at the repo root (./), so Codex needs the root plugin files too.
Refresh or remove configured marketplaces
codex plugin marketplace upgrade
codex plugin marketplace upgrade marketplace-name
codex plugin marketplace remove marketplace-nameFor this repo specifically:
codex plugin marketplace upgrade ui-harness
codex plugin marketplace remove ui-harnessAfter adding or upgrading, restart Codex so the Plugins UI reloads metadata.
Alternative: install individual skills via skills.sh (granular control)
Use if want specific skills instead of full plugin.
# Core workflow + all setup skills
bunx skills@latest add redpanda-data/ui-harness/frontend-starter-kit --agent claude-code -y
# Just the development lifecycle (no setup hooks)
bunx skills@latest add redpanda-data/ui-harness/development-lifecycle --agent claude-code -y
# Project management + planning skills
bunx skills@latest add redpanda-data/ui-harness/work-automation-kit --agent claude-code -y
# Individual skills -- pick what you need:
bunx skills@latest add redpanda-data/ui-harness/brainstorming --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/tdd --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/domain-model --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/grill-me --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/triage --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/diagnose --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/qa --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/zoom-out --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/design-an-interface --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/improve-codebase-architecture --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/request-refactor-plan --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/write-a-skill --agent claude-code -y
bunx skills@latest add redpanda-data/ui-harness/codex-compat --agent claude-code -yOptional extras:
# TanStack official reference skills (28 soft-guidance skills from docs)
npx @tanstack/intent@latest install
# Atlassian/Jira integration (requires acli installed)
bunx skills@latest add redpanda-data/ui-harness/setup-atlassian-workflow --agent claude-code -y
# Redpanda-specific (Chakra/legacy bans, registry workflow)
bunx skills@latest add redpanda-data/ui-harness/redpanda-frontend-kit --agent claude-code -yVerify: bash scripts/verify-install.sh
You: "Build feature X" or "Fix these 5 issues overnight"
│
├── Interactive ──-> Claude Code + /development-lifecycle
│ └── understand -> plan -> TDD -> verify -> review -> compound
│
├── AFK batch ───-> Sandcastle (.sandcastle/main.ts)
│ └── picks issues -> spawns N agents in Docker
│ └── each agent runs development-lifecycle
│
└── Automated ───-> Routines (claude.ai/code/routines)
└── schedule, GitHub webhook, or API trigger
└── cloud session with hooks + CLAUDE.md active
└── PR review, triage, health checks, docs drift
6-phase workflow drive every task, feature to fix. Phases skippable by task type -- bug fix jump straight to TDD, test request go direct Phase 3.
graph TD
Start([User prompt]) --> Understand
Understand["1. Understand\nExplore codebase, clarify requirements"]
Plan["2. Plan\nExact file paths, code, expected output"]
Grill["2b. Grill\nAuto /domain-model, stress-test plan\nUpdate CONTEXT.md + ADRs inline\n--- GATE: user confirms ---"]
Implement["3. Implement -- TDD\nRED: failing test\nGREEN: minimal code\nREFACTOR: clean up"]
Verify["4. Verify\nSelf-verify via browser tools"]
Review["5. Review\nSecurity gate, code-reviewer agent, create PR"]
Iterate["5b. Iterate\n2 automated CI + review rounds\nthen request human review"]
Compound["6. Compound\nWrite .claude/rules/ rule"]
Done([PR ready to merge])
Understand --> Plan --> Grill --> Implement
Implement --> Verify --> Review --> Iterate --> Compound --> Done
Start -. "Fix bug" .-> Understand
Understand -. "skip plan" .-> Implement
Start -. "Write tests" .-> Implement
Start -. "Create PR" .-> Review
style Grill fill:#7c3aed,stroke:#4c1d95,stroke-width:3px,color:#fff
style Implement fill:#16a34a,stroke:#14532d,stroke-width:3px,color:#fff
style Review fill:#2563eb,stroke:#1e3a8a,stroke-width:3px,color:#fff
Sequence diagram -- full lifecycle (who does what, when)
Shows timing of user input, Claude phases, hook enforcement, and PR handoff. Two user gates: approve approach (after Plan) and approve plan (after Grill). Everything else automatic.
sequenceDiagram
actor You
participant Claude
participant Hooks as Hooks<br/>(PostToolUse / Stop)
participant GH as GitHub
You->>Claude: "Build feature X"
Claude->>Claude: Phase 1 -- Understand<br/>(explore, RCA if bug)
Claude-->>You: Clarifying Qs + 2-3 approaches
You-->>Claude: Pick approach (GATE)
Claude->>Claude: Phase 2 -- Plan<br/>(files, code, expected output)
Note over Claude: Auto-invoke /domain-model
Claude->>Claude: Phase 2b -- Grill<br/>challenge plan, sharpen terms<br/>update CONTEXT.md + ADRs inline
Claude-->>You: "Plan solid, proceed?"
You-->>Claude: Confirm (GATE)
loop TDD cycle per file
Claude->>Claude: RED -- failing test
Claude->>Hooks: Edit/Write
Hooks-->>Claude: react-rules, a11y, zustand... (~293ms)
Claude->>Claude: GREEN -- minimal code
Claude->>Claude: REFACTOR
end
Claude->>Claude: Phase 4 -- Verify (tests + types + visual)
Claude->>Hooks: Stop
Hooks->>Hooks: biome-autofix -> typecheck -> doctor<br/>-> orchestration -> violation-summary
Hooks-->>Claude: Quality gate result
Claude->>GH: git commit + push
Claude->>GH: gh pr create
Claude->>Claude: Monitor: gh pr checks --watch
GH-->>Claude: CI status + reviewer assigned
Claude-->>You: PR ready to merge
Sequence diagram -- PR review resolution loop (/resolve-pr-feedback)
How Claude handles human review feedback without you babysitting each thread.
sequenceDiagram
actor Reviewer
actor You
participant Claude
participant GH as GitHub
Reviewer->>GH: Review comments on PR
You->>Claude: /resolve-pr-feedback [PR#]
Claude->>GH: gh api /pulls/{n}/comments
GH-->>Claude: Unresolved threads + line refs
Claude->>Claude: Triage:<br/>in-scope / out-of-scope / invalid
loop Each in-scope comment
Claude->>Claude: Implement fix + test
Claude->>GH: Reply + resolve thread
end
Claude->>GH: git push
GH-->>Claude: CI re-runs
Claude->>Claude: Monitor CI
Claude-->>You: Summary: N resolved, M deferred<br/>(with reasoning per item)
Try it yourself -- copy-paste into a Claude Code session:
Real starter prompts (pick one)
Feature work:
/development-lifecycle -- add dark mode toggle to settings page.
Read src/routes/settings.tsx first. Propose approach, wait for my approval.
Bug fix (skip plan phase):
/development-lifecycle -- users report form submits twice on double-click.
Reproduce, find root cause, fix with test.
Overnight batch (Sandcastle):
Run Sandcastle on top 5 issues in our bug backlog overnight.
One agent per issue, Docker sandboxes, each follows development-lifecycle.
Address PR review:
/resolve-pr-feedback
Auto-detects current branch PR, triages, fixes, replies to threads.
Four layers, one outcome:
| Layer | What | How | Reliability |
|---|---|---|---|
| Skills | What do | development-lifecycle (6 phases) | Loaded on demand |
| Hooks | Enforce quality | PostToolUse + Stop hooks, every edit | 100% automatic |
| Agents | Specialize | code-reviewer + verifier | Dispatched by skills |
| Sandcastle | Delegate | N parallel agents in Docker sandboxes | AFK batch mode |
| Routines | Automate | Cloud-hosted sessions on schedule/webhook/API | Unattended, 24/7 |
| Problem | Without repo | With repo |
|---|---|---|
Claude write as any |
Ships to PR -> human catches -> feedback loop | Hook blocks immediately -> 50 tokens -> fixed |
| Claude skip tests | Ships -> human requests -> another round | Stop gate blocks -> tests added automatically |
| Claude use wrong patterns | 3-5 human review cycles per PR | 0-1 human review cycles per PR |
| Forget ask accessibility | No a11y until manual audit | Every component checked automatically |
| Must babysit every step | Manual: "now write tests", "now check types" | Full lifecycle runs without prompting |
How works: Hooks fire automatically, 100% reliable, zero LLM tokens. Skills add workflow guidance when need. Combo eliminate 80-90% human review cycles.
vs. obra/superpowers: Superpowers give great workflow skills (TDD, debug, plan). We take their best patterns AND add what they lack: mechanical enforcement via hooks. Superpowers teach Claude what do. We teach AND enforce -- if Claude forget, hook catch.
Three big wins -- autoplay teaser, real moments from the last 30 days:
Hero GIF -- hook blocking a banned cast at write time (~293ms, every edit):
2-minute highlight reel -- skill wins extracted from real transcripts (ADP UI + ui-registry + ui-harness repo):
Featured skill moments -- each from an actual session:
/grill-me-- 100+ rapid-fire questions on autoform proto-schema coupling. Surfaced 3 weeks of wasted work before a line of code was written./development-lifecycle-- adp-ui-llm-provider-cards: 4 waves, 13 phases, shipped end-to-end. No scope creep./tdd-- applied tocodex/autoform-v2-foundationrefactor. RED -> GREEN -> REFACTOR across the full PR surface./simplify-- three iterative passes on MCP marketplace PR. Caught 15% redundant code reviewers missed./domain-model-- stress-tested plans, updated CONTEXT.md + ADRs inline. Institutional memory captured mid-design.- Force-push to main blocked -- hook redirected to feature branch + PR flow every time.
- Dogfooding -- 12 skills + 60 hooks + 263 unit tests + 9 agent evals shipped using the harness itself.
- Skill auto-load on file match --
/tddon*.test.ts,/tanstack-routeronroute.tsx,/connect-queryon*_pb.ts. Zero invocation needed.
More videos -- 60s explainer, 50s comparison, 70s announcement
60-second explainer -- pain -> fix -> install -> proof:
50-second comparison -- prompt-packs vs obra/superpowers vs this harness:
70-second announcement -- launch version for Slack / social:
| Raw Claude Code | CLAUDE.md only | obra/superpowers | Prompt-pack / "gstack" | this harness | |
|---|---|---|---|---|---|
| Enforcement model | None | Prompt (LLM reads rules) | Prompt (skills) | Prompt (system prompts) | Hooks (deterministic) + skills |
| Reliability | 0% | ~60% (Claude forgets) | ~70% (skips skill) | ~70% | 100% on hooks, ~90% on paths: skills |
| LLM token overhead | 0 | ~2-8k / turn | ~500 / skill load | ~3-15k / prompt | 0 for hooks, ~500 for loaded skills |
Catches as any at write time |
No | No | No | No | Yes (~293ms) |
| Catches missing tests at stop | No | No | No | No | Yes (Stop gate blocks) |
| Forces plan -> grill -> confirm | No | No | Partial (prompts) | No | Yes (/development-lifecycle gates) |
| Codex (OpenAI) support | N/A | N/A | No | N/A | Yes (first-class, codex-compat skill) |
| Batch / overnight mode | No | No | No | No | Yes (Sandcastle, N Docker agents) |
| Cloud / scheduled mode | No | No | No | No | Yes (Routines) |
| Cross-session learning | No | Manual edit | No | No | Yes (Phase 6 Compound -> .claude/rules/) |
| Opinionated stack | N/A | N/A | Agnostic | Varies | React + TanStack + ConnectRPC + Bun |
| Config surface | 0 | Low | Low | Medium | Medium (14 setup skills, env vars) |
| Setup cost | 0 | ~30 min prompt writing | One /install |
Varies | 3 commands |
TL;DR: If your stack matches (React + Bun/TypeScript + modern patterns), the deterministic enforcement is worth the opinionation. If not, fork the hook scripts and keep the lifecycle skills.
Most alternatives only touch one layer. Deterministic enforcement at every boundary is what closes the reliability gap.
flowchart LR
subgraph Timeline["When does Claude violate a rule?"]
direction LR
Prompt["User prompt"] --> Think["Claude thinks"]
Think --> Edit["Edit/Write"]
Edit --> Stop["Stop / turn end"]
Stop --> Push["git push"]
Push --> CI["CI / PR review"]
end
subgraph Approaches["Where each approach catches"]
direction TB
ClaudeMd["CLAUDE.md rules"] -. "reads at Prompt" .-> Prompt
Superpowers["superpowers"] -. "injects at Prompt" .-> Prompt
Eslint["eslint / biome alone"] -. "save-time only" .-> Stop
CIHooks["CI-only lint"] -. "after push" .-> CI
This["this harness"] -. "Prompt" .-> Prompt
This -. "every Edit ~293ms" .-> Edit
This -. "Stop gate ~5-13s" .-> Stop
This -. "Monitor CI" .-> CI
end
style This fill:#c9f,stroke:#333,color:#000
style Prompt fill:#1f2a44,stroke:#333,color:#fff
style Edit fill:#1f2a44,stroke:#333,color:#fff
style Stop fill:#1f2a44,stroke:#333,color:#fff
style CI fill:#1f2a44,stroke:#333,color:#fff
Hooks and rules derive from a 2026-04 audit of ~2,500 PRs / ~3,500 review comments across 4 repos (2022-2026). Every hook maps to a pattern that actually generated review churn.
| Metric | Source | Value |
|---|---|---|
| Review comments audited | Console, cloudv2, ui-registry, ai-gateway | ~3,500+ |
| PRs analyzed | Same | ~2,500+ |
| PostToolUse hooks shipped | Repo | 60 |
| React/TS/security checks | react-rules-check.sh + tailwind-check.sh |
34 |
| Biome rules added from audit | setup-biome REFERENCE |
5 |
| Tokens saved / session (typical) | Compressed messages + dedup + trimmed REFERENCEs | ~6,700 |
Hook wall-clock on .tsx edit |
Concurrent hooks, slowest wins | ~293ms |
| Human review cycles per PR | Before / after enforcement | 3-5 -> 0-1 |
| Token waste per PR | Before / after | ~15-30k -> ~500-2k |
See memory project_pr_audit_hooks_2026_04 + project_transcript_audit_2026_04 for the methodology.
The fastest way to believe it: reproduce the core claim in your terminal.
Five-minute hook demo -- step by step
Prereq: Claude Code installed, fresh repo.
1. Install the plugin
/plugin marketplace add redpanda-data/ui-harness
/plugin install frontend-skills@ui-harness
/reload-plugins2. Ask Claude to write a banned pattern
Create src/bad.ts with: export function parseUser(data: unknown) { return data as any; }
3. Watch the hook fire -- ~293ms after the Edit, react-rules-check.sh blocks with:
as anybanned. Use type guards or zod.
4. Compare token cost
- Without hooks:
as anyships -> human review catches -> 3-5 comment rounds -> ~3,000 tokens - With hooks: blocked at write time -> Claude fixes in same turn -> ~50 tokens
5. Try to bypass it
Use // biome-ignore noExplicitAny to silence
Hook still fires -- message says "biome-ignore noExplicitAny banned. Fix the underlying type." No escape hatch without an explicit // allow: comment (auditable).
6. Try the Stop gate -- ask Claude to skip tests:
Write src/math.ts with an add() function. Skip tests.
Watch the Stop hook block: "orchestration-stop.sh: missing co-located test for src/math.ts".
Five tests, five deterministic blocks. Everything a human reviewer would catch, caught in <1 second by bash scripts with zero LLM tokens.
Honest scoping -- this harness is opinionated. Here's when to skip it or fork it:
| Don't use if | Why |
|---|---|
| Backend-only repo (Go/Python/Rust) | React/TypeScript hooks add no value; fork the lifecycle skills only |
| Non-React frontend (Vue, Svelte, Angular) | ~80% of checks are React-specific |
| Throwaway prototype / spike | Stop gate blocks commits without tests; kills exploration velocity (set ORCHESTRATION_STRICT=0 if keeping the plugin) |
| Can't use Bun | enforce-toolchain.sh bans npm/npx/yarn by default (configurable, but friction) |
| No Claude Code / Codex access | Hooks are harness-specific; nothing to enforce |
| Stack: not (TanStack Router + ConnectRPC + Protobuf v2) | Many setup skills assume this; fork to strip them |
| Team rejects opinionation | Setup skills pin choices (Biome over ESLint, tsgo over tsc, vitest over jest) |
Partial adoption works. Install development-lifecycle + tdd + grill-me only to get workflow discipline without the stack-specific hooks.
Why not just write more CLAUDE.md rules?
CLAUDE.md is read-every-turn context -- costs tokens AND is probabilistic. Claude reads "don't use as any" and still writes as any sometimes. Hooks are post-write bash scripts -- they fire 100% of the time, deterministic, zero LLM tokens. Rules in CLAUDE.md belong there when they're too nuanced for pattern matching (architecture, design judgment). Everything mechanically checkable belongs in a hook.
Why not just use obra/superpowers?
We do -- many lifecycle patterns (TDD red/green, domain-model grilling, write-a-skill) are inspired by or vendored from superpowers. The difference: superpowers teaches Claude what to do via prompts. We teach AND enforce. If Claude forgets the TDD rule mid-session, superpowers has no safety net. Our Stop hook refuses to end the turn until tests exist. Complementary, not competitive -- Pocock's own skills (to-prd, to-issues, git-guardrails) are listed under "Community Skills" for install.
Why not just use eslint/biome?
Biome (we ship it via setup-biome) handles lint + format -- we don't replace that. What Biome can't do: catch patterns at Edit time (biome runs at save/CI), enforce workflow (plan -> grill -> TDD -> review), inject context (which rules apply to this file type), or block Stop until PR-ready. Hooks + Biome are layers, not alternatives. setup-biome adds 5 custom rules from the PR audit that live there because AST rules beat regex for those patterns.
Why not a prompt-pack / "gstack" style system prompt?
Prompt-packs = bundled system prompts or context-injection. They're probabilistic (LLM may or may not follow) and token-heavy (3-15k per prompt). Our prompt layer (intent-detect + user-prompt-context) is ~500-2k tokens and composable with hooks that catch what the prompt misses. Also: prompt-packs can't run bash, can't fail CI, can't verify output, can't spawn subagents. Different tool for different job.
Is this Redpanda-specific?
No. Redpanda-specific rules live in a separate kit (redpanda-frontend-kit) -- registry workflow, Chakra bans, legacy imports. The core frontend-starter-kit is stack-opinionated (React + TanStack + ConnectRPC) but org-agnostic. The public plugin has zero Redpanda internals.
How do I customize or remove a hook?
Every hook is a bash script in .claude/hooks/ -- inspect, edit, delete. Plugin install places them in ~/.claude/plugins/cache/ui-harness/frontend-skills/<ver>/.claude/hooks/. Override per-project by copying to <project>/.claude/hooks/ (takes precedence). Env vars control most behavior: HOOK_VERBOSITY=terse, REACT_RULES_BAN_USEEFFECT=1, ORCHESTRATION_STRICT=0, etc. See Configuration.
What about Codex / OpenAI models?
First-class. codex-compat skill generates .codex/hooks.json + consolidated Stop-phase batch checker + AGENTS.md (replaces Claude's PostCompact context re-injection). Hook scripts use $(git rev-parse --show-toplevel) path resolution so they work across both harnesses. CLAUDE_SESSION_ID / CODEX_SESSION_ID detected transparently.
What's the overhead per session?
PostToolUse hooks on a .tsx edit: ~293ms wall-clock (hooks run concurrent, bottleneck = slowest). Non-JS/TS file: ~80ms (bash spawn + extension check, all hooks exit immediately). Stop hook: ~5-13s once per turn (biome autofix + tsgo + related tests + health score). PreToolUse on Bash: ~257ms. For context: a typical Claude tool call is 3-8 seconds (network + inference), so PostToolUse is 3-8% overhead -- imperceptible.
Only remember one skill: /development-lifecycle (or alias /work). Covers full flow. Everything else optional -- use when need specific capability.
| Skill | When use |
|---|---|
/development-lifecycle |
Build features, fix bugs, any dev work. Guide understand -> plan -> grill -> TDD -> /go (verify -> ship). |
/work |
Alias for /development-lifecycle. Same full lifecycle, shorter type. |
/go |
Ship what built. Phases 4-6 only: verify -> self-review -> /simplify -> /commit-push-pr -> monitor CI -> /resolve-pr-feedback. Use when implementation + tests done. |
/brainstorming |
Not sure what approach yet. Explore 2-3 design options with trade-offs. |
/tdd |
Write tests or want strict red-green-refactor enforcement. |
/domain-model |
Grill + document. 3 reviewer hats (product/eng/design) in parallel + sharpen terminology + update CONTEXT.md + ADRs inline. Phase 2b default. |
/grill-me |
Grill, no docs. 3 reviewer hats (product/eng/design) in parallel, fast. Skip when decision isn't worth writing down. Standalone alt to /domain-model. |
/triage |
Triage issues via state machine (GitHub via gh or Jira via acli). Grill sessions, agent briefs, out-of-scope KB. Bug-investigation mode produces a TDD fix plan and files the ticket. |
/diagnose |
Hard bug or perf regression. Six-phase loop: build feedback loop -> reproduce -> hypothesise (3-5 ranked) -> instrument -> fix + regression test -> cleanup + post-mortem. |
/qa |
Interactive QA session. Describe bugs conversationally, agent explore codebase in background, auto-file GitHub issues. |
/zoom-out |
Go up layer of abstraction. Map relevant modules + callers when unfamiliar with code area. |
/design-an-interface |
Design API or module. Spawn 3+ agents generate radically different designs. |
/request-refactor-plan |
Plan refactor. Interview you, break into tiny safe commits, file RFC issue. |
/resolve-pr-feedback |
Address PR reviews. Fetch unresolved threads, triage, fix, reply, resolve. Used by Phase 5b. |
/improve-codebase-architecture |
Find architectural improvements. Identify shallow modules, propose deep-module refactors. |
/write-a-skill |
Create new agent skill with proper structure + progressive disclosure. |
Prompt examples for each workflow skill
/development-lifecycle -- default for all dev work:
/development-lifecycle -- I need to add a user settings page with theme, language,
and notification preferences. Read src/routes/ to understand the routing structure first.
/brainstorming -- explore options before commit:
/brainstorming design -- I need to add real-time collaboration to our editor.
Compare WebSocket, SSE, and CRDT approaches. Focus on latency and offline support.
/tdd -- strict test-first:
/tdd -- add input validation to the signup form. Email format, password strength,
and matching confirm password. Start with the failing tests.
/domain-model -- DDD-light grill with inline doc updates:
/domain-model -- stress-test the data fetching strategy for the new dashboard feature.
We're planning to use TanStack Query with a 5-minute stale time.
/triage -- triage incoming issues (GitHub or Jira):
/triage -- show me anything that needs my attention
/diagnose -- debug a hard bug:
/diagnose -- the export button intermittently throws on Firefox but not Chrome.
Build the feedback loop first.
/qa -- interactive bug report session:
/qa -- I've been testing the settings page and found a few issues. Let me walk you through them.
/zoom-out -- understand unfamiliar code:
/zoom-out on the auth middleware. I need to understand how it connects to the rest of the system.
/grill-me -- light stress-test (no DDD docs):
/grill-me on the data fetching strategy for the new dashboard feature.
/design-an-interface -- compare API shapes:
/design-an-interface for a notification system module.
Must support email, Slack, and in-app channels with retry and rate limiting.
/triage -- investigate a bug and file a ticket with a TDD fix plan:
/triage -- users report the sidebar flickers on navigation.
It started after the last release. Check rendering and route transitions.
/resolve-pr-feedback -- address review comments:
/resolve-pr-feedback 123
or just /resolve-pr-feedback to auto-detect PR from current branch.
/request-refactor-plan -- plan safe refactor:
/request-refactor-plan -- extract the auth logic from the monolithic UserService
into a standalone AuthService. Must be backwards-compatible during migration.
/improve-codebase-architecture -- find opportunities:
/improve-codebase-architecture -- focus on module boundaries and testability
in src/features/. Look for tightly coupled modules that should be split.
/write-a-skill -- create new skill:
/write-a-skill for enforcing our internal design system tokens.
It should check that components use --color-* CSS variables instead of raw hex values.
| Skill | What bundles |
|---|---|
/frontend-starter-kit |
All setup skills + workflow skills. Full bootstrap new project. |
/work-automation-kit |
Planning skills -- PRD creation, issue breakdown, project management. |
/redpanda-frontend-kit |
frontend-starter-kit + Redpanda-specific registry workflow. |
/codex-compat |
Generate .codex/hooks.json + AGENTS.md for OpenAI Codex compatibility. |
Installed by /frontend-starter-kit, run automatic. Never invoke directly.
| Skill | What enforces |
|---|---|
setup-toolchain |
Blocks npm/npx, enforce bun + tsgo |
setup-biome |
Biome + Ultracite linting, auto-fix on Stop |
setup-quality-gate |
Type check + lint + related tests gate, bundle guard |
setup-react-rules |
Ban as any, raw HTML, XSS vectors, barrel imports |
setup-react-compiler |
Flag manual memoization, enforce compiler-friendly patterns |
setup-zustand |
Double-parens create, useShallow selectors, persist middleware |
setup-accessibility |
ARIA labels, keyboard handlers, axe-core testing |
setup-tanstack-router |
Route tree generation, ban react-router-dom/window.location |
setup-connect-query |
ConnectRPC + Protobuf v2 patterns, ban raw useQuery |
setup-env-validation |
Type-safe env vars via t3-env + zod, ban raw process.env |
setup-conventional-commits |
Validate commit format type(scope): description |
setup-react-doctor |
Health scoring (0-100), fails on score regression |
setup-e2e-testing |
Playwright + Testcontainers + axe-core setup |
setup-ci-pipeline |
GitHub Actions quality gate, coverage gates, bundle budgets |
setup-agent-config |
Token-efficient env vars, test flag optimization, output truncation |
setup-registry-workflow |
Remind rebuild registry.json when UI components change |
setup-sandcastle |
AFK agent delegation -- parallel agents in Docker sandboxes |
setup-routines |
Cloud-hosted automation -- PR review, health checks, triage, docs drift |
| Agent | Role |
|---|---|
| code-reviewer | Review PRs for correctness, patterns, test coverage |
| verifier | Read-only agent independently verify implementation via tests, types, lint, browser inspection |
New to AI-assisted dev? Start here.
Day 1 (30 min):
- Install (see Install above)
- Run
bash "$(ls -d ~/.claude/plugins/cache/ui-harness/frontend-skills/*/ | tail -1)scripts/verify-install.sh"confirm all wired - Pick real ticket from backlog -- not toy problem
First prompt:
Read [relevant files]. I want to [goal from your ticket].
Before writing any code, produce a plan with what you'll do, files you'll change,
edge cases, and how you'll verify. Wait for my approval before starting.
What happens automatic: Hooks enforce patterns every edit. Intent detection inject workflow guidance. Stop hooks run type checking, linting, related tests before Claude finish. No need ask.
Day 2+: Work real tickets. Let hooks catch mistakes. Focus on clarifying problem + reviewing output -- not writing code yourself. Post wins + failures to team channel.
Tips that matter:
- Plan before execute (
/plan). Engineers who skip spend day untangling misdirected work - Use
/clearbetween unrelated tasks. Long sessions degrade output quality - If Claude start deleting tests to make CI green -- stop immediately. Red flag
- Use
HOOK_VERBOSITY=tersefor long sessions to reduce token overhead - Run
verify-install.sh --remoteweekly to check updates (see Verify section above for path)
Three layers automation run without manual invocation:
Layer 1 -- Intent Detection (every prompt, ~30ms): Detect what doing from prompt keywords, inject workflow directives. "Write test" -> TDD workflow. "Fix bug" -> triage pattern. "Create component" -> accessibility + design system checklist. "Create PR" -> CI verify + review.
Layer 2 -- Pattern Enforcement (every Edit/Write, ~293ms): PostToolUse hooks catch violations real-time. Claude see error, fix, hook re-check -- cycle repeat until clean. Plus file-aware guidance: write test file -> async leak tips, write component -> accessibility checklist.
Layer 3 -- Quality Gate (when Claude finish, <10s): Stop hooks verify work production-ready. Type check, lint autofix, health score, PLUS orchestration gate blocks on missing tests, async leaks, security issues. Claude no stop until PR ready merge.
Auto-loading skills: Skills with paths: frontmatter auto-load when Claude work on matching files. Write test -> TDD patterns load. Edit route -> TanStack Router patterns load. No /skill-name invocation needed.
Never see hook output directly. Claude just produce better code, with tests, accessible, secure, type-safe -- without asking each thing individually.
| Env var | Default | What does |
|---|---|---|
PROMPT_CONTEXT_LEVEL |
standard |
How much state inject per prompt (minimal, standard, full) |
ORCHESTRATION_STRICT |
1 |
Set 0 during prototyping to disable "must have tests" gate |
REACT_COMPILER_MODE |
infer |
Set annotation for brownfield codebases |
HOOK_VERBOSITY |
normal |
terse = blocks only (suppress warns), quiet = all output suppressed |
HOOKS_FAIL_CLOSED |
0 |
Set 1 to block on hook script errors (catch misconfiguration) |
| Approach | Reliability | Token cost | Latency | Human effort |
|---|---|---|---|---|
| Manual prompting | 0% (forgotten) | 0 extra | 0 | High (remember every rule) |
| Skills on-demand | ~70% (Claude may skip) | ~500 tokens/skill | ~0ms | Low |
Skills with paths: |
~90% (auto-load on file match) | ~500 tokens/skill | ~0ms | None |
| PostToolUse hooks | 100% (always fires) | 0 (bash scripts) | ~293ms | None |
| UserPromptSubmit hooks | 100% (every prompt) | 0 (bash) + context | ~120ms | None |
| Stop hooks | 100% (every turn end) | 0 (bash) + test run | ~4-10s | None |
| Without hooks/skills | With hooks+skills |
|---|---|
as any -> ships -> human catches -> feedback loop (3000+ tokens) |
Hook blocks -> 50 tokens -> fixed |
| No tests -> ships -> human requests -> another round (5000+ tokens) | Stop gate blocks -> 100 tokens -> tests added |
| Wrong import -> review -> fix (2000+ tokens) | Rules line prevents -> 0 extra tokens |
| 3-5 human review cycles per PR | 0-1 human review cycles per PR |
| ~15,000-30,000 tokens wasted on violations | ~500-2,000 tokens in hook messages |
New project: Use infer mode (default). All hooks enforce max strictness.
Existing codebase: Set REACT_COMPILER_MODE=annotation in session env. Lets migrate file-by-file -- add "use memo" to files as adopt compiler, hooks only enforce compiler patterns in opted-in files.
# In .claude/hooks/session-env.sh, add:
echo "export REACT_COMPILER_MODE=annotation" >> "$CLAUDE_ENV_FILE"After install, paste prompt into new Claude Code session to migrate existing code comply with new hooks:
Migration prompt (click to expand)
I just installed the frontend-starter-kit skills. Run all setup skills now, then migrate existing code to comply with the new hooks.
## Phase 1: Run setup skills
Execute the frontend-starter-kit skill. This will:
- Install all setup skills (toolchain, biome, quality-gate, etc.)
- Install development-lifecycle skill (the one skill for the full workflow)
- Create all hook scripts in .claude/hooks/
- Set up src/env.ts, biome.jsonc, .github/workflows/quality-gate.yml
- Install community workflow skills
- Set REACT_RULES_BAN_USEEFFECT=1 in session env
## Phase 2: Migrate existing code
After hooks are installed, fix all existing violations. Work through these in order:
### 2a. Lint + format
Run bun run lint:fix to auto-fix everything Biome can handle.
### 2b. Type checking
Run bun run type:check and fix all errors.
### 2c. Filename convention
Rename any non-kebab-case files to kebab-case. Use git mv to preserve history.
### 2d. Environment variables
Find all process.env. usage outside of src/env.ts and move each env var into src/env.ts with a zod schema. Replace process.env.X with import { env } from "@/env".
### 2e. React patterns
Fix these in order (each may affect many files):
1. Class components -> functional components
2. useEffect for data fetching -> TanStack Query / route loaders
3. Raw HTML elements (<button>, <input>, etc.) -> @/components/ui/ components
4. as any, as never, as Record<string, any>, @ts-ignore, @ts-expect-error -> proper types, type guards, or zod validation
5. dangerouslySetInnerHTML -> DOMPurify or safe rendering
6. Inline style={{}} -> Tailwind utility classes
7. Raw hex/rgb in className -> design tokens
8. !important -> fix specificity
9. useMemo/useCallback/React.memo -> remove (React Compiler handles it, or add "use memo" in annotation mode)
10. outline: none -> focus-visible:outline-2
11. Barrel imports (import from index files) -> direct path imports
12. addEventListener('scroll') -> add { passive: true }
13. Static imports of chart.js/d3/three -> dynamic import() or React.lazy()
14. React.FC / React.FunctionComponent -> plain function declarations
15. cloneElement -> Context-based composition
16. biome-ignore comments -> fix the lint issue instead
17. import * as Foo -> import { specific } (tree-shaking)
18. export * from -> export specific items (tree-shaking)
19. handleSubmit(onSubmit) -> handleSubmit(onSubmit, onError)
20. navigate(-1) / history.back() -> explicit route path
21. react-beautiful-dnd -> @dnd-kit/core (archived by Atlassian)
22. framer-motion -> motion (renamed package)
23. plotly.js / recharts -> lazy load (heavy bundles)
24. JSON.parse(JSON.stringify(x)) -> structuredClone(x)
25. form.submit() -> form.requestSubmit() (fires validation + submit event)
26. Bare parseInt(s) -> parseInt(s, 10) or Number(s)
27. Global isNaN() -> Number.isNaN() (no coercion)
28. splice/direct mutation to remove -> .filter() or Array.prototype.with()
29. Unnamed useEffect(() => {...}) -> useEffect(function syncX() {...}, [deps]) (named for devtools)
30. 100vh -> 100dvh (mobile safe area), 100vw -> 100%, user-scalable=no -> remove (WCAG 1.4.4)
31. React.lazy() missing for heavy deps (chart.js, three, d3) -> wrap in Suspense + lazy import
32. Hooks defined inside route files -> extract to /hooks/ (route files stay thin)
### 2e-2. Protobuf v2 patterns (if applicable)
1. new Message() -> create(MessageSchema, { ... })
2. PlainMessage/PartialMessage -> MessageShape/MessageInitShape
3. Manual $typeName object literals -> create()
4. Protobuf spreads without create() wrapper -> wrap with create(Schema, { ...msg })
### 2e-3. Connect Query patterns (if applicable)
1. Raw useQuery/useMutation with ConnectRPC -> use Connect Query hooks
2. invalidateQueries() with no args -> specify query key
3. Duplicate Zod schemas for protobuf messages -> Standard Schema + protovalidate
4. Inline staleTime/gcTime numeric literals -> named constants (STALE_TIME_5_MIN, etc.)
5. useMutation without onError -> add onError with ConnectError.from() + toast
6. Mutation hooks missing *Mutation suffix -> rename (useCreateUser -> useCreateUserMutation)
7. refetchQueries -> await invalidateQueries (reactive, deduped)
### 2e-4. Accessibility patterns
1. All `<img>` must have `alt` attribute
2. Clickable `<div>`/`<span>` must have role + tabIndex + keyboard handler
3. Icon-only buttons -> add `aria-label`
4. Interactive elements -> add `data-track` or semantic identifiers for observability
5. `aria-invalid` field -> add `aria-describedby` pointing to error message id
6. Disabled `<Button>` -> wrap in `<Tooltip>` explaining why (blind users need context)
7. `outline: none` / `outline: 0` -> `focus-visible:ring-2 focus-visible:ring-*` (keep keyboard focus visible)
8. Nested interactives (Button inside Link, etc.) -> flatten to single interactive + aria-label
### 2e-5. Protobuf well-known types (if applicable)
1. Timestamp as { seconds, nanos } -> timestampFromDate() from @bufbuild/protobuf/wkt
2. Any without typeUrl -> anyPack() from @bufbuild/protobuf/wkt
### 2f. Zustand stores
Fix create<T>() -> create<T>()(), inline selectors -> useShallow, direct localStorage -> persist.
### 2g. Routing
Fix window.location navigation -> TanStack Router, react-router-dom -> @tanstack/react-router, URLSearchParams -> nuqs, untyped hooks -> { from } param.
### 2h. Testing (vitest + React Testing Library + Playwright)
1. it() -> test() (consistent naming, Biome rule enforces)
2. jest.fn() / jest.mock() / jest.spyOn() -> vi.fn() / vi.mock() / vi.spyOn()
3. .toBeInTheDocument() -> .toBeVisible() (also catches display:none, opacity:0)
4. fireEvent -> userEvent.setup() + user.click() / user.keyboard() (simulates real user)
5. setTimeout / waitForTimeout -> await waitFor(() => expect(...)) (deterministic)
6. Missing data-testid on interactive elements -> add stable test identifiers
7. ConnectRPC calls in tests: raw mocks -> createRouterTransport for typed mocking
8. test.skip in E2E -> test.fixme() (explicit known bug, not silent skip)
9. Co-located .test.ts(x) next to source; visual tests .browser.test.tsx; E2E under e2e/*.spec.ts
### 2i. Forms (react-hook-form)
1. form.watch() in render body -> useWatch({ control, name }) (fewer re-renders)
2. <input {...register("x")} /> losing ref -> spread ...field from Controller
3. handleSubmit(onSubmit) -> handleSubmit(onSubmit, onError) (surfaces validation errors)
4. Async onChange without abort -> AbortController, cancel stale request on next change
5. Form mode undefined -> mode: "onChange" + per-field validation + <FormMessage /> inline
6. FieldMask paths hardcoded -> Object.keys(dirtyFields) (only send what user touched)
7. Oneof protobuf fields: leftover value after branch switch -> clear prev branch explicitly
8. URL input type="text" -> type="url" (native validation + mobile keyboard)
### 2j. ConnectRPC error handling
1. throw new Error("...") in ConnectRPC handlers -> ConnectError.from(error, Code.Internal)
2. Magic numbers for proto enum cases -> import enum, compare by name
3. Error toasts with raw message -> formatToastErrorMessageGRPC(error) (code + message + trace)
4. Catch block with only console.log -> set error state, show inline UI, or re-throw
5. Silent fallbacks on parse failure -> early return <ErrorState /> with retry action
6. Exhaustive switch on enum: missing default -> add `default: never satisfies never` (compile error on new variant)
## Phase 3: Verify
Run bun run quality:gate -- should pass with zero errors.
Run bun run doctor -- score should be 80+.
Commit everything as: refactor(webui): migrate to frontend-starter-kit patterns
Migration ordered from least disruptive (auto-fixable lint) to most disruptive (React pattern rewrites) -- commit incrementally after each phase.
Meta-skills install everything need in one go. Diagram below show how starter kit, setup skills, workflow skills, agents compose into full ecosystem.
graph TD
FSK["frontend-starter-kit"]
Setup["14 Setup Skills\ntoolchain, biome, quality-gate,\nreact-compiler, accessibility, and so on"]
SC[setup-sandcastle]
RT[setup-routines]
subgraph Workflow["Workflow Skills"]
DL["development-lifecycle"]
TDD[tdd]
DM[domain-model]
GM[grill-me]
DAI[design-an-interface]
TR[triage]
DIAG[diagnose]
QA[qa]
RPF[resolve-pr-feedback]
end
subgraph Agents
CR[code-reviewer]
VER[verifier]
end
FSK --> Setup
FSK --> Workflow
DL -- "Phase 2b" --> DM
DL -- "Phase 3" --> TDD
DL -- "Phase 2 UI" --> DAI
DL -- "Phase 5b" --> RPF
DL -- "Phase 5" --> CR
SC -- "N parallel agents" --> DL
SC --> CR
RT -- "cloud sessions" --> RPF
RT -- "cloud sessions" --> CR
style DL fill:#f96,stroke:#333,color:#000
style FSK fill:#69f,stroke:#333,color:#fff
style SC fill:#9c6,stroke:#333,color:#000
style RT fill:#c9f,stroke:#333,color:#000
-
frontend-starter-kit -- Complete frontend stack in one command: all setup skills (toolchain, Biome, quality gate, LLM optimization, React Compiler, zustand, accessibility, React rules, env validation, conventional commits, react-doctor, TanStack Router, Connect Query, e2e testing, UX copy) + workflow skills (TDD, triage, architecture, refactoring, design, grilling, skill authoring) + optional community workflow skills (PRD, QA, DDD glossary, git guardrails).
console.*fully covered by Biome'snoConsole.bunx skills@latest add redpanda-data/ui-harness/frontend-starter-kit --agent claude-code -y -
redpanda-frontend-kit -- Everything in frontend starter kit, plus Redpanda-specific rules: Chakra/legacy import bans, TanStack Router, Connect Query + Protobuf enforcement, react-doctor, registry workflow.
bunx skills@latest add redpanda-data/ui-harness/redpanda-frontend-kit --agent claude-code -y -
work-automation-kit -- Project planning + management workflow skills: PRD creation, implementation planning, issue breakdown, bug triage. Optional: Atlassian/Jira integration via acli, Codex cross-model review via codex-plugin-cc.
bunx skills@latest add redpanda-data/ui-harness/work-automation-kit --agent claude-code -y
Claude Code hooks enforce tooling standards via PreToolUse + SessionStart hooks.
-
setup-toolchain -- Ban npm/npx/tsc/eslint/prettier, enforce bun as package manager with
--yarnflag, tsgo as TypeScript compiler, block global installs, guard against destructive commands (rm -rf,git push --force,git reset --hard,git checkout .). SetPKG_MANAGER,LINTER,TEST_RUNNERenv vars.bunx skills@latest add redpanda-data/ui-harness/setup-toolchain --agent claude-code -y
Lint, format, quality gate automation.
-
setup-biome -- Install Biome + Ultracite, create
biome.jsoncwith strict overrides (noConsole, cognitive complexity 15, kebab-case filenames, useExhaustiveSwitchCases, restricted imports for moment/lodash/classnames/mobx/yup). Stop hook auto-fix all changed JS/TS files before Claude finish.bunx skills@latest add redpanda-data/ui-harness/setup-biome --agent claude-code -y -
setup-quality-gate -- Add
quality:gatepackage.json script (biome + tsgo + related tests in <5s), GitHub Actions CI workflow with formatting integrity check (git diff --exit-code), Stop hook for tsgo type checking, bundle guard PostToolUse hook,ghCLI CI status checks,@claude reviewtrigger pattern, optional Codex cross-model review.bunx skills@latest add redpanda-data/ui-harness/setup-quality-gate --agent claude-code -y
PostToolUse hooks enforce React patterns every Edit/Write. All checks skip non-JS/TS files (zero overhead for backend devs) + auto-detect component library directories (components/ui/, redpanda-ui/, src/ui/, packages/ui/ -- configurable via UI_LIB_DIRS).
-
setup-react-rules -- 34 React/TS/security/a11y checks in two hook scripts (
react-rules-check.sh+tailwind-check.sh). All messages compressed for token efficiency -- keep fix, drop explanation. Key checks:- Ban raw HTML elements,
as any,@ts-ignore,@ts-expect-error, class components - Ban
dangerouslySetInnerHTML,eval(),.innerHTML,setTimeout("string"),=== NaN - Ban
onClick + navigate(), barrel imports,!important,outline: none - Suggest
structuredClone()over JSON roundtrip,.requestSubmit()over.submit() - Suggest
100dvhover100vh,100%over100vw, blockuser-scalable=no - Use
<Button>over<div role="button">,Number()over unradixedparseInt() - React Compiler: ban manual
useMemo/useCallback/React.memo - Opt-in: ban
useEffectviaREACT_RULES_BAN_USEEFFECT=1
bunx skills@latest add redpanda-data/ui-harness/setup-react-rules --agent claude-code -y - Ban raw HTML elements,
-
setup-react-compiler -- Install
babel-plugin-react-compilerwith rsbuild config. Defaultannotationmode for brownfield codebases (opt-in per file with"use memo"),inferfor greenfield.'use no memo'for escape hatch. Compiler modes reference, derived-state detection, named useEffect heuristic. Component library directories auto-excluded.bunx skills@latest add redpanda-data/ui-harness/setup-react-compiler --agent claude-code -y
Stop hooks + manual diagnostic skills.
-
setup-react-doctor -- Install react-doctor, add
doctorpackage.json script, Stop hook run health check on changed files. Fails on score regression.bunx skills@latest add redpanda-data/ui-harness/setup-react-doctor --agent claude-code -y
Test health diagnostics (async leaks, slow queries, flaky tests) now part of /tdd + orchestration-stop quality gate.
Reduce token usage + context waste.
-
setup-agent-config -- SessionStart set
AI_AGENT=1,CLAUDECODE=1,NODE_OPTIONS=--max-old-space-size=8192. UserPromptSubmit inject project state (3 context levels) + condensed rules line + intent detection (TDD/component/bug/PR/refactor/e2e). PreToolUse strip--verbose, suggest--pool=forks/--bail=1. PostToolUse truncate output >200 lines + file-aware orchestration guidance. Stop: violation summary + comprehensive quality gate.bunx skills@latest add redpanda-data/ui-harness/setup-agent-config --agent claude-code -y
-
setup-accessibility -- PostToolUse hook enforce ARIA accessibility patterns: ban
<img>withoutalt, ban mouse-onlyonClickon<div>/<span>(requirerole+tabIndex+ keyboard handler), enforce required ARIA attributes onrole="combobox"/role="tablist"/role="dialog". Include Playwright AXE test helper for WCAG 2.1 AA scanning + ARIA patterns reference (combobox, tabs, dialog, accordion, alert, listbox, switch, slider, radio group). Escape hatch:// allow: a11y-skip [reason].bunx skills@latest add redpanda-data/ui-harness/setup-accessibility --agent claude-code -y
-
setup-zustand -- PostToolUse hook enforce zustand best practices: ban single-parens
create<T>()(must becreate<T>()()), ban inline object selectors (suggestuseShallow), ban direct localStorage in stores (suggest persist middleware).bunx skills@latest add redpanda-data/ui-harness/setup-zustand --agent claude-code -y
-
setup-tanstack-router -- Auto-regenerate TanStack Router route tree when route files change, plus anti-pattern enforcement: ban react-router-dom, window.location navigation,
strict: false, untyped hooks (useParams()/useSearch()without{ from }), URLSearchParams (suggest nuqs), warn on exported components from route files (code splitting), requirevalidateSearchwhen usinguseSearch. Warn onwindow.location.reload()+window.locationreads. Optional: install TanStack's 28 official reference skills vianpx @tanstack/intent@latest install.bunx skills@latest add redpanda-data/ui-harness/setup-tanstack-router --agent claude-code -y -
setup-registry-workflow -- Stop hook remind about
registry.jsonrebuild + changelog update when redpanda-ui components modified.bunx skills@latest add redpanda-data/ui-harness/setup-registry-workflow --agent claude-code -y
-
setup-connect-query -- PostToolUse hook enforce ConnectRPC + Connect Query + Protobuf best practices: ban raw
useQuery/useMutationwhen ConnectRPC available (allowuseTransport/callUnaryMethod), baninvalidateQueries()with no args, warn on axios/fetch. Protobuf v2: bannew Message(),PlainMessage/PartialMessage, manual$typeNameliterals. Well-known types: warn onAnywithout@type,Timestampas plain object (usetimestampFromDate/anyPackfrom@bufbuild/protobuf/wkt). Promote Standard Schema + protovalidate. Auto-loads on*_pb*/*_connectquery*files viapaths:.bunx skills@latest add redpanda-data/ui-harness/setup-connect-query --agent claude-code -y
-
setup-e2e-testing -- Playwright for end-to-end testing with Testcontainers for backend infra, axe-core for automated WCAG 2.1 AA accessibility audits. Optional agent-browser integration for AI-driven test scaffolding + visual verification. Include test patterns for forms, tables, multi-step workflows, debugging strategies.
bunx skills@latest add redpanda-data/ui-harness/setup-e2e-testing --agent claude-code -y
-
setup-env-validation -- PostToolUse hook ban raw
process.env.Xaccess. Enforce t3-env with zod validation -- all env vars must be declared insrc/env.ts+ imported as validated object. Skip env files + test files.bunx skills@latest add redpanda-data/ui-harness/setup-env-validation --agent claude-code -y
-
setup-conventional-commits -- PreToolUse hook enforce
type(scope): descriptionformat ongit commitcommands. Replace commitlint + husky with zero dependencies. Validate type, scope (required), lowercase description, no trailing period, 5-72 character length.bunx skills@latest add redpanda-data/ui-harness/setup-conventional-commits --agent claude-code -y
-
setup-atlassian-workflow -- Opt-in Jira integration via
acli(Atlassian CLI). Mirror gh-based workflow skills for Jira users -- create work items, transition status, comment, link PRs. Works alongsidegh(ISSUE_TRACKER=both) or standalone (ISSUE_TRACKER=acli). Requireacliinstalled + authenticated.bunx skills@latest add redpanda-data/ui-harness/setup-atlassian-workflow --agent claude-code -y
-
setup-routines -- Configure Claude Code routines for unattended automation. Ships 5 prompt templates: PR review (GitHub trigger), PR feedback resolution (GitHub trigger), issue triage (GitHub trigger), weekly codebase health (schedule), docs drift detection (schedule). Routines run as full Claude Code cloud sessions -- all hooks, CLAUDE.md rules, agents enforce automatically. Stack-agnostic templates with built-in noise controls (silent approval, delta-based reporting, skip-what-hooks-catch).
bunx skills@latest add redpanda-data/ui-harness/setup-routines --agent claude-code -y -
setup-sandcastle -- AFK agent delegation via Sandcastle. N parallel agents in Docker sandboxes, each follow development-lifecycle with hooks enforced. For batch work on multiple issues simultaneously.
bunx skills@latest add redpanda-data/ui-harness/setup-sandcastle --agent claude-code -y
Skills from mattpocock/skills complement vendored ones. Install individually if need:
bunx skills@latest add mattpocock/skills/to-prd --agent claude-code -y # PRD via interview
bunx skills@latest add mattpocock/skills/to-issues --agent claude-code -y # PRD -> GitHub issues with blocking
bunx skills@latest add mattpocock/skills/git-guardrails-claude-code --agent claude-code -y # Branch protectionAlready vendored (no need install from mattpocock/skills): tdd, triage (multi-tracker GH+Jira), diagnose, improve-codebase-architecture, request-refactor-plan, design-an-interface, write-a-skill, grill-me, domain-model, qa, zoom-out, ubiquitous-language. Our versions incorporate Pocock's best patterns plus hook enforcement, lifecycle integration, DDD-light documentation, accessibility-first testing.
Note: setup-pre-commit (husky/lint-staged) intentionally omitted. Claude Code hooks already enforce linting, formatting, type checking deterministically every edit -- pre-commit hooks redundant + add friction for human devs who may prefer different workflows.
TanStack packages ship own reference skills via @tanstack/intent. Soft guidance (patterns, examples, API docs) -- no hooks, no enforcement. Install only if want Claude have deep TanStack knowledge in context:
# Install all TanStack skills from your node_modules (Router, DB, DevTools, etc.)
npx @tanstack/intent@latest installAvailable packages with skills:
- TanStack Router -- 25 skills: search params, data loading, auth guards, error handling, code splitting, type safety, navigation, SSR
- TanStack DB -- 14 skills: collections, live queries, optimistic mutations, persistence, offline transactions
- TanStack DevTools -- 9 skills: plugin panels, production devtools, instrumentation
- TanStack CLI -- 5 skills: scaffolding, addons, ecosystem integrations
Note: TanStack Query, Table, Form, Virtual have no published skills yet.
Two layers testing prevent regressions:
Script-level evals -- verify hook scripts, file structure, content. Run locally in <5 seconds:
./evals/run.sh
Agent-level evals -- 14 behavioral tests using @vercel/agent-eval verify Claude Code actually follow rules when given adversarial prompts. Runs in Docker sandbox:
cd agent-evals && bun install --yarn && npx @vercel/agent-eval
Hooks fire automatic at each stage of Claude Code session. PostToolUse hooks run concurrent for speed; Stop hooks run sequential as final quality gate.
graph TD
SS["SessionStart\nsession-env.sh, llm-env.sh"]
UP["UserPromptSubmit\nuser-prompt-context.sh\nintent-detect.sh"]
PRE["PreToolUse -- Bash\nenforce-toolchain.sh\nllm-test-flags.sh\nconventional-commits-check.sh"]
subgraph POST ["PostToolUse -- Edit|Write (~293ms, concurrent)"]
direction LR
P1["react-rules\ntailwind\naccessibility"]
P2["zustand\ntanstack-router\nconnect-query"]
P3["react-compiler\nenv-validation\nbundle-guard"]
P4["test-perf\nux-copy\norchestration-guidance"]
end
PB["PostToolUse -- Bash\nllm-truncate.sh"]
subgraph STOP ["Stop (~5-13s, sequential)"]
direction TB
ST1["biome-autofix.sh\nlint:fix changed files"]
ST2["typecheck-stop.sh\ntsgo + related tests"]
ST3["react-doctor-stop.sh\nhealth score"]
ST4["orchestration-stop.sh\nasync leaks, missing tests, security"]
ST5["violation-summary-stop.sh\naggregate session violations"]
ST1 --> ST2 --> ST3 --> ST4 --> ST5
end
SS --> UP --> PRE --> POST --> PB --> STOP
style POST fill:#ffe,stroke:#cc0
style STOP fill:#fee,stroke:#c00
Detailed hook reference (text)
SessionStart
├── session-env.sh -- PKG_MANAGER=bun, LINTER=biome, TEST_RUNNER=vitest, NODE_OPTIONS=8GB
└── llm-env.sh -- AI_AGENT=1, CLAUDECODE=1
UserPromptSubmit
├── user-prompt-context.sh -- inject git state, scripts, violations, rules, config (3 levels)
└── intent-detect.sh -- detect TDD/component/bug/PR/refactor/e2e intent, inject directives
PreToolUse (Bash)
├── enforce-toolchain.sh -- block npm/npx/tsc/eslint/prettier, enforce --yarn, guard destructive commands
├── llm-test-flags.sh -- strip --verbose (updatedInput rewrite), suggest --pool=forks/--bail
└── conventional-commits-check.sh -- enforce type(scope): description format
PostToolUse (Edit|Write) All use shared/hook-lib.sh
├── react-rules-check.sh -- React/TS/security checks (skips non-JS/TS)
├── tailwind-check.sh -- !important + raw hex ban (CSS/SCSS/TSX/JSX)
├── accessibility-check.sh -- ARIA/WCAG enforcement (~30ms, skips non-TSX/JSX)
├── zustand-check.sh -- zustand anti-patterns (~20ms, skips non-zustand files)
├── tanstack-router-check.sh -- 9 routing anti-patterns (skips non-router files)
├── connect-query-check.sh -- ConnectRPC/protobuf patterns (skips non-connect files)
├── react-compiler-check.sh -- ban manual memoization (skips 'use no memo' files)
├── env-validation-check.sh -- ban raw process.env (skips env.ts, test files)
├── bundle-guard.sh -- heavy dependency warnings (~10ms, skips non-package.json)
├── test-perf-check.sh -- detect test perf anti-patterns (dynamic imports, missing pool/isolate)
└── orchestration-guidance.sh -- file-aware guidance (test patterns, a11y, security) + category tracking
PostToolUse (Bash)
└── llm-truncate.sh -- truncate output >200 lines
Stop
├── biome-autofix.sh -- lint:fix all changed JS/TS files
├── typecheck-stop.sh -- tsgo type check + related tests (vitest, jest/bun fallback)
├── react-doctor-stop.sh -- health check on changed files (--diff mode)
├── registry-check.sh -- remind about registry.json rebuild (skips if no redpanda-ui dir)
├── orchestration-stop.sh -- quality gate: missing tests, async leaks, security, co-located tests
├── test-perf-stop.sh -- test performance audit (before/after timing comparison)
└── violation-summary-stop.sh -- session violation aggregator
Non-JS/TS file edits (Go, Python, Markdown, etc.) get zero overhead -- all hooks exit immediately on non-matching file extensions.
PostToolUse hooks run concurrently -- wall-clock time is slowest hook, not sum.
| Scenario | Wall-clock | What happens |
|---|---|---|
Edit .go / .md / .css file |
~80ms | All hooks exit immediately on extension check |
Edit .tsx file (clean code) |
~293ms | Slowest hook (react-rules) runs full diff + 19 grep checks |
Edit package.json |
~80ms | Only bundle-guard runs, rest exit |
~80ms floor = bash process spawn + jq parse -- fixed cost regardless of hook count. More hooks don't increase wall-clock since run parallel.
| Hook | Time | Notes |
|---|---|---|
| enforce-toolchain.sh | ~166ms | Grep chain on command string |
| conventional-commits-check.sh | ~91ms | Only does real work on git commit -m |
| Total | ~257ms | Sequential, runs before every Bash call |
| Hook | Time | Notes |
|---|---|---|
| biome-autofix.sh | 1-3s | Only changed files, skips UI library dirs |
| typecheck-stop.sh | 2-5s | tsgo (incremental) + related tests only |
| test-perf-stop.sh | 1-3s | Compare timings against session-start baseline |
| react-doctor-stop.sh | 1-2s | --diff mode, changed files only |
| Total | ~5-13s | Runs once when Claude finishes, not per edit |
Hook messages compressed (inspired by Caveman + arxiv:2604.00025). LLM already know rules from SKILL.md -- hooks reminders, not tutorials.
| Optimization | Savings |
|---|---|
| Compressed systemMessage strings | ~40% fewer tokens per violation |
| Guidance deduplication (once per category per session) | ~50-70% fewer orchestration tokens |
HOOK_VERBOSITY=terse (blocks only) |
Suppress all warns |
| REFERENCE.md trimmed to essentials | -21% input tokens on skill loads |
Combined: ~6,700 fewer tokens per session vs unoptimized hooks. For long sessions or multi-agent (Sandcastle), use HOOK_VERBOSITY=terse + PROMPT_CONTEXT_LEVEL=minimal to minimize overhead.
Typical Claude Code tool call takes 3-8 seconds (network + LLM inference). PostToolUse overhead ~293ms = 3-8% -- imperceptible. Stop hooks at 4-10s replace what'd run manually (lint, type check, tests) + only target changed/related files.
Codex = first-class harness. Supported equivalent hooks are mapped directly: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop, plus Codex-only PermissionRequest through an adapter that reuses hard-deny Bash/MCP guardrails. Claude PostToolUseFailure maps to Codex PostToolUse because Codex post-tool hooks include failed Bash commands. Claude-only events without Codex equivalents (FileChanged, compact hooks, SessionEnd, Subagent hooks, WorktreeCreate) stay Claude-only until Codex adds matching lifecycle events. AGENTS.md at repo root replaces PostCompact context re-injection.
All hook paths use $(git rev-parse --show-toplevel) for resolution -- works from any CWD, silently skip in repos without hooks installed.
-
codex-compat -- Install
.codex/hooks.json, batch checker,AGENTS.md. Session state harness-agnostic (CLAUDE_SESSION_IDorCODEX_SESSION_ID).bunx skills@latest add redpanda-data/ui-harness/codex-compat --agent claude-code -y
In-house, zero-deps, Node + TypeScript. Ship as part of shared/reporters/. Scaffold into consumer repos via starter kit.
Vitest: shared/reporters/vitest-llm-reporter.ts -- silent-pass ok N trailer (~10 bytes); fail-case single-line JSON capped at VITEST_LLM_MAX_FAILURES (default 20). Wire:
import LlmReporter from './shared/reporters/vitest-llm-reporter';
export default defineConfig({ test: { reporters: [new LlmReporter()] } });Playwright: shared/reporters/playwright-llm-reporter.ts -- silent-pass ok N trailer with optional skip/flaky counts; fail-case single-line JSON capped at PW_LLM_MAX_FAILURES (default 15). Wire:
export default defineConfig({
reporter: [['./shared/reporters/playwright-llm-reporter.ts']],
});Both reporters follow same pattern: minimal-signal (not silent) on pass to defeat silent-glob-typo bugs, structured JSON on fail capped to prevent runaway output. Typical savings vs default reporters: 10-100x on stdout, which Claude reads back as tokens.
Figma + Chrome MCP servers = claude.ai account-level connectors, not project config. To disable (recommended for token savings if prefer subagent-browser patterns):
/mcpinside Claude Code to list currently connected servers- https://claude.ai/settings/connectors -- disconnect
Figma+Chrome Browserif want free ~4KB of system-prompt instructions they consume per turn
Prior art, techniques, related work that informed design decisions in this harness:
- vitest-llm-reporter (hansjm10) -- JSON schema + token-budget pattern inspiration
- @zenai/playwright-coding-agent-reporter -- consolidated all-failures pattern (community, MIT)
- Playwright Reporter API -- official interface docs
- Vitest Reporter interface -- official interface docs
- Playwright Test Agents -- official AI ecosystem (Planner/Generator/Healer)
- @playwright/mcp -- official Playwright MCP server
- TOON (Token-Oriented Object Notation) -- schema-aware JSON alternative for LLM prompts
- CTON (Compact Token-Oriented Notation) -- JSON-compatible wire format
- Caveman plugin (JuliusBrussee) -- per-turn style reinforcement, multi-intensity modes
- arxiv:2604.00025 -- prompt compression research


