ClawBio 2.0

A stateful workflow CLI that turns ClawBio bioinformatics skills into DAG-ordered, reproducible runs. Same "the LLM orchestrates but does not improvise" pitch as ClawBio 1.x, but this time a tiny Python CLI enforces it instead of prose asking nicely.

ClawBio 1.x is a library of 64 skills across GWAS, variant calling, scRNA-seq, pharmacogenomics, and more. 2.0 keeps the skills and rewires the spine.

Why I rewrote it

The default pattern for Claude Code Agent Skills today is: write a long SKILL.md, drop it into the agent's context, hope it follows the steps in order. For one-shot skills ("format this code", "draft this email") that's fine. For workflows where step 3 eats step 2's output and skipping step 1 silently corrupts the result, it isn't.

1.x was 64 carefully written SKILL.md files, but prose-only skills are prone to these failure modes:

The agent skips prereqs. Instructions are suggestions; it re-reads, reorders, or drops steps based on what it decides the user really wants.
The agent hallucinates outputs. When a step is awkward, it invents a plausible-looking result and moves on. Downstream steps consume the fiction.
Same skill, same inputs, different commands. "Reproducibility" was a sentence in the prose, not a property of the system.
/compact and /clear destroy progress. The skill's instructions survive (they get re-loaded), but where the agent was in the workflow doesn't.

I tried out several things (better prompting, LangGraph, XState, RAG-on-the-workflow, ...). None of them addressed the actual problem, which is that prose isn't a contract. You can't enforce a DAG by asking nicely in markdown.

So 2.0 moves the workflow out of the prompt and into a small external state machine:

Each skill ships a pipeline.yaml declaring steps, prereqs, inputs, outputs.
The CLI is the only legal way to advance: start → next → open → done → bundle.
open refuses if prereqs aren't met. done refuses if declared outputs are missing. bundle refuses if any step is unfinished. A Claude Code Stop hook refuses to let the session end without a bundle.
SKILL.md collapses from hundreds of lines of prose to ~30 lines of "call clawbio next". The agent stops trying to remember a workflow and starts asking what's next.
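
The refusals above can be sketched as a tiny in-memory state machine. This is an illustration of the gating idea only, not the code in clawbio/: the class and method names (Run, ready, open_step, finish, bundle) are hypothetical, and the real CLI persists state to disk rather than holding it in memory.

```python
from dataclasses import dataclass, field
from pathlib import Path


class Refused(Exception):
    """Raised when a verb is called out of order."""


@dataclass
class Step:
    id: str
    prereqs: list = field(default_factory=list)
    emits: list = field(default_factory=list)
    status: str = "pending"  # pending -> in_progress -> done


class Run:
    """Hypothetical stand-in for the run state; not the real clawbio API."""

    def __init__(self, steps):
        self.steps = {s.id: s for s in steps}
        self.bundled = False

    def ready(self):
        """Ids of pending steps whose prereqs are all done."""
        return [
            s.id for s in self.steps.values()
            if s.status == "pending"
            and all(self.steps[p].status == "done" for p in s.prereqs)
        ]

    def open_step(self, step_id):
        step = self.steps[step_id]
        unmet = [p for p in step.prereqs if self.steps[p].status != "done"]
        if unmet:
            raise Refused(f"{step_id}: unmet prereqs {unmet}")
        step.status = "in_progress"

    def finish(self, step_id, outputs):
        """outputs maps each declared output name to a file path."""
        step = self.steps[step_id]
        if step.status != "in_progress":
            raise Refused(f"{step_id}: not open")
        missing = [
            name for name in step.emits
            if name not in outputs or not Path(outputs[name]).is_file()
        ]
        if missing:
            raise Refused(f"{step_id}: missing outputs {missing}")
        step.status = "done"

    def bundle(self):
        unfinished = [s.id for s in self.steps.values() if s.status != "done"]
        if unfinished:
            raise Refused(f"unfinished steps {unfinished}")
        self.bundled = True
```

The point of the design is that every refusal is a raised error rather than a sentence in a prompt: the agent cannot talk its way past open_step, it can only satisfy the prereqs.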

Given a clean spec, generating the next command is something LLMs are genuinely good at, and they stop bluffing through a workflow half-remembered from a markdown file. Tighter boundaries do more for output quality than any amount of prompt-wrangling I tried before.

Bioinformatics is the testbed because dependent steps hurt most here: a hallucinated VCF normalization quietly poisons every downstream analysis. But the pattern generalizes to anywhere you have prereqs, side effects, and a need for reproducibility: data engineering, security audits, ML pipelines, build/release.

What it does

Each skill ships a pipeline.yaml that declares its steps, their prereqs, and the inputs/outputs each step needs. The CLI:

validates the pipeline statically at start time (unknown placeholders, forward refs, undeclared requires, duplicate ids all fail loudly),
holds a single .clawbio-run/state.json plus an append-only event log,
refuses to open a step whose prereqs or requires are unmet,
hashes every declared output at done time,
emits a reproducibility bundle (commands.sh, checksums.sha256, environment.yml) that mirrors actual execution order.
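
As a rough illustration of the static checks in the first bullet, here is a minimal sketch that walks an already-parsed pipeline (a plain dict, as a YAML loader would return) and collects errors. The function name validate and the error wording are assumptions, and treating steps strictly in file order is one possible reading of "forward refs"; the real validator in clawbio/ may work differently.

```python
def validate(pipeline):
    """Return a list of error strings; an empty list means the checks passed."""
    errors = []
    seen = set()     # step ids declared so far, in file order
    emitted = set()  # output names emitted by earlier steps
    inputs = {i["name"] for i in pipeline.get("inputs", [])}

    for step in pipeline.get("steps", []):
        sid = step["id"]
        if sid in seen:
            errors.append(f"duplicate id: {sid}")
        for prereq in step.get("prereqs", []):
            if prereq not in seen:
                errors.append(f"{sid}: prereq {prereq!r} is unknown or a forward ref")
        for name in step.get("requires_inputs", []):
            if name not in inputs:
                errors.append(f"{sid}: undeclared input {name!r}")
        for name in step.get("requires", {}):
            if name not in emitted:
                errors.append(f"{sid}: requires undeclared output {name!r}")
        seen.add(sid)
        emitted.update(step.get("emits", []))
    return errors


# A deliberately broken pipeline: annotate is listed before the step it needs.
broken = {
    "inputs": [{"name": "vcf_path"}],
    "steps": [
        {"id": "annotate", "prereqs": ["normalize"],
         "requires": {"normalized_vcf": "present"}},
        {"id": "normalize", "requires_inputs": ["vcf_path"],
         "emits": ["normalized_vcf"]},
    ],
}
for problem in validate(broken):
    print(problem)
```

Collecting every error before failing, instead of stopping at the first, lets a skill author fix a pipeline in one pass.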

The same state.json serves three audiences: the agent (what's next?), the human reviewer (was it reproducible?), and the benchmark harness (did the agent follow the DAG?). One file, three jobs.

Install

pip install -e .

That gives you the clawbio script. Requires Python ≥ 3.9; the only runtime dependency is pyyaml.

Quickstart

The repo ships a placeholder skill in example/variant_annotation/ that uses cp and echo (no bcftools or VEP required), so you can exercise the CLI end-to-end without installing the bio stack.

# 1. start a run from a skill directory
clawbio start example/variant_annotation \
  --input vcf_path=/tmp/x.vcf \
  --input reference=/tmp/ref.fa

# 2. ask what's ready
clawbio next                   # -> normalize

# 3. open it (renders the command, marks in_progress)
clawbio open normalize

# 4. run the command yourself, then report outputs
clawbio done normalize --output normalized_vcf=.clawbio-run/work/normalize/normalized.vcf

# 5. repeat until `clawbio next` says nothing is ready, then bundle
clawbio bundle

clawbio status prints a self-sufficient human-readable summary, written so it can fully re-orient an agent after /compact or /clear.

For the full driver guide (every verb, every refusal, the contract an agent needs to follow), see USAGE.md. Each skill's SKILL.md links there rather than re-explaining the CLI.

Subcommands

| Command | What it does |
| --- | --- |
| `start <skill-dir> --input k=v ...` | Load `pipeline.yaml`, initialize `.clawbio-run/state.json`. |
| `status [--check-bundle]` | Print run state. With `--check-bundle`, exits 2 if a finished run is unbundled. |
| `next` | Print the id of the next step whose prereqs are satisfied. |
| `open <step-id>` | Render `{placeholders}` in `command_template`, mark the step `in_progress`. |
| `done <step-id> --output k=path ...` | Hash outputs, advance state, recompute downstream blocks. |
| `bundle` | Emit `commands.sh`, `checksums.sha256`, `environment.yml`. Refuses if any step is unfinished. |
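
To make the "hash outputs" step of done concrete, here is a minimal sketch using only the standard library. The helper names sha256_file and checksum_lines are hypothetical, and the line layout (hex digest, two spaces, path) is the convention used by the sha256sum tool; whether the real checksums.sha256 follows it exactly is an assumption.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large VCFs never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(outputs):
    """outputs maps output name -> path; returns sha256sum-style lines."""
    return [f"{sha256_file(path)}  {path}" for _, path in sorted(outputs.items())]
```

Lines in this layout can be verified later with sha256sum -c, which is what makes a checksum file useful to a human reviewer and not only to the CLI.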

`pipeline.yaml` at a glance

skill: variant_annotation
version: 1

inputs:
  - name: vcf_path
    required: true
  - name: reference
    required: true

steps:
  - id: normalize
    summary: "bcftools-norm placeholder"
    requires_inputs: [vcf_path, reference]
    emits: [normalized_vcf]
    command_template: "cp {vcf_path} {out}/normalized.vcf"

  - id: annotate
    prereqs: [normalize]
    requires: {normalized_vcf: present}
    emits: [annotated_vcf]
    command_template: "cp {normalized_vcf} {out}/annotated.vcf"

The schema is deliberately small: no if/when, no loops, no fan-out, no templating beyond {var} substitution. If a step needs branching, write a Python script and make that script the step body. The machine-readable draft-07 JSON Schema lives at pipeline.schema.json.
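
Plain {var} substitution with a loud failure on unknown names can be sketched with the standard library alone. The function name render is hypothetical, and the value used for {out} below is inferred from the Quickstart output path, so treat it as an assumption about how the CLI fills that placeholder.

```python
import string


def render(command_template, values):
    """Substitute {name} placeholders; refuse any name that has no value."""
    names = [
        field_name
        for _, field_name, _, _ in string.Formatter().parse(command_template)
        if field_name
    ]
    unknown = sorted(set(names) - set(values))
    if unknown:
        raise KeyError(f"unknown placeholders: {unknown}")
    return command_template.format(**values)


print(render(
    "cp {vcf_path} {out}/normalized.vcf",
    {"vcf_path": "/tmp/x.vcf", "out": ".clawbio-run/work/normalize"},
))
# prints: cp /tmp/x.vcf .clawbio-run/work/normalize/normalized.vcf
```

Checking the placeholder names before formatting is what allows the same failure to be reported at start time instead of halfway through a run.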

Claude Code integration

Merge the hook entry from hooks/stop-bundle.json into your ~/.claude/settings.json (or a project-level settings.json) so Claude Code refuses to end a turn while a finished run is still unbundled.

Repo layout

clawbio/ - the v0 CLI (~500 lines: cli.py, commands.py, pipeline.py, state.py, bundle.py).
skills/ - ported ClawBio 1.x skills with pipeline.yaml. Currently: gwas_lookup (11-step federated variant lookup).
example/ - runnable demos that exercise the CLI without external tools. Currently: variant_annotation (cp/echo placeholder, not a real port).
hooks/ - Claude Code hook snippets.
tests/ - pytest harness covering pipeline loading, schema conformance, and an end-to-end run. Real filesystem, real state.json, no mocking: 1.x got burned by mock/prod divergence.
USAGE.md - agent-facing driver guide for the CLI.
pipeline-schema.md - authoritative pipeline.yaml reference (for skill authors).
pipeline.schema.json - JSON Schema draft-07 artifact.

Status & ongoing work

ClawBio 2.0 is in progress. Not a finished product.

The CLI (v0) works. It has been exercised end-to-end against two skills (the variant_annotation demo and the gwas_lookup port), covering the full start → next → open → done → bundle cycle, schema validation, block recomputation, and the reproducibility bundle.
Porting 1.x skills is the current focus. All 64 ClawBio 1.x skills need a pipeline.yaml and a trimmed SKILL.md. scRNA-seq will stress the schema hardest and is the next real port after gwas_lookup.
Benchmarking is required, not optional. Every claim in "Why I rewrote it" above (fewer skipped prereqs, fewer hallucinated outputs, real reproducibility, recovery from /compact) is a hypothesis until the numbers say otherwise. The two-axis benchmark plan (scientific correctness via the existing clawbio_bench harness + new behavioral metrics queried directly from state.json) is sketched in bench-pipeline-brainstorm.md. Running it head-to-head against 1.x (same skills, same inputs, same model) is what will tell me whether 2.0 is actually the win it looks like on paper. BixBench (current SOTA 52.2%, Biomni Lab) is the external comparison target.
Expect breaking changes. pipeline.yaml is at v1 but not frozen; CLI flags and state.json shape may shift as real skills surface edge cases the demos miss.
