ci: switch all runners off Blacksmith to standard GitHub-hosted runners by Dexploarer · Pull Request #1838 · milady-ai/milady

Dexploarer · 2026-04-11T04:38:57Z

Summary

Drop every Blacksmith-hosted runner, every useblacksmith/* custom action, and every workflow that only existed to plug into Blacksmith's interactive-testbox feature. Everything now runs on standard GitHub-hosted runners (ubuntu-24.04 / ubuntu-24.04-arm / windows-2025), with the existing vars.RUNNER_UBUNTU / vars.RUNNER_WINDOWS repo variable hooks preserved so any job can still be redirected to self-hosted or larger-runner pools via Settings → Variables without editing workflows.

No functional change to any workflow logic — just runner re-targeting and removal of Blacksmith-specific helpers.

Runner label substitutions

blacksmith-{2,4,8,16}vcpu-ubuntu-2404 → ubuntu-24.04
blacksmith-4vcpu-ubuntu-2404-arm → ubuntu-24.04-arm
blacksmith-4vcpu-windows-2025 → windows-2025

Note: the 16-core Docker build jobs lose their dedicated big-machine tier and fall back to standard 4-core GitHub-hosted runners. If any of those jobs start timing out, set vars.RUNNER_UBUNTU to a GitHub larger-runner label or a self-hosted pool label in the repo variables.

Conditional runner expression collapses

Every ${{ github.repository_owner == 'milady-ai' && 'blacksmith-…' || 'ubuntu-latest' }} ternary (used so forks fell through to ubuntu-latest while org members got Blacksmith) collapses to ubuntu-24.04 since the fork and org paths are now the same. Expressions that wrapped this in a vars.RUNNER_* override collapse to ${{ vars.RUNNER_UBUNTU || 'ubuntu-24.04' }} (resp. RUNNER_WINDOWS || 'windows-2025'), preserving the operator override.

Custom action substitutions

useblacksmith/setup-node@v5 → actions/setup-node@v4
useblacksmith/build-push-action@v2 → docker/build-push-action@v6
useblacksmith/setup-docker-builder@v1 → docker/setup-buildx-action@v3

All drop-in replacements on the same input shape.

Deleted files

Three "testbox" workflows existed only to couple a release-build matrix to Blacksmith's interactive SSH-debug feature via useblacksmith/begin-testbox. Without Blacksmith the testbox hook is meaningless and these workflows become redundant with the regular release pipelines:

.github/workflows/android-release-build-aab-testbox.yml
.github/workflows/release-electrobun-build-linux-x64-testbox.yml
.github/workflows/release-electrobun-build-windows-x64-testbox.yml

Plus .github/actions/run-testbox-quiet/action.yml, a composite action that phoned home to Blacksmith's testbox-management API. Entirely Blacksmith-specific; the only callers were the three deleted workflows.

actionlint config

Removed the self-hosted-runner.labels: block in .github/actionlint.yaml. It only existed to suppress "unknown runner label" warnings for the blacksmith-*vcpu-ubuntu-2404 labels. Every remaining runner is a GitHub-hosted label that actionlint already knows about. If self-hosted runners are added later, re-introduce the block with the new labels.

Composite action cleanup

.github/actions/setup-bun-workspace/action.yml — comment updated from "Blacksmith runners can intermittently fail reaching Ubuntu mirrors over IPv6" to generic "Some CI runners can intermittently fail…". The actual apt IPv4-force + retry logic is kept verbatim — it's defensive networking that's still useful on any runner.

CI audit scripts

Four workflow-drift / workflow-audit vitest suites in scripts/ had assertions hardcoded against the old Blacksmith runs-on strings and against the three deleted testbox workflows. Updated each to expect the collapsed runs-on strings and dropped the deleted testbox workflows from the expected-files lists:

scripts/electrobun-test-workflow-drift.test.ts
scripts/electrobun-release-workflow-drift.test.ts
scripts/ci-workflow-drift.test.ts
scripts/ci-workflow-audit.test.ts

Docs + agent descriptions

README.md — "setup-node v3/Blacksmith" → "actions/setup-node@v4 + check-latest: false".
docs/build-and-release.md — removed the two "Node.js and Bun in CI" WHY entries that rationalized the Blacksmith-specific setup-node choices.
docs/ROADMAP.md — same entry in the long-running CI timeouts list.
.claude/agents/milady-devops.md — dropped the three deleted workflow file references, removed the "don't swap useblacksmith for actions/setup-node" hard rule (moot), renumbered the rules list.
.claude/agents/milady-test-runner.md — updated the ci.yml runner description.
.claude/agents/electrobun-native-dev.md — dropped the two deleted testbox workflow references from the release-workflow checklist.

Bonus: fix `.claude/hooks/check-actionlint.sh`

The PostToolUse hook had two bugs that surfaced during this migration:

Scope filter included composite actions. .github/actions/*.yml was in the filter, but actionlint parses files as workflows and composite actions use a different top-level schema — every composite action always tripped "unexpected key" errors. Fixed by narrowing the scope filter to workflows only.
Shellcheck style nits blocked every workflow edit. actionlint returns rc=1 on any finding, including SC2086/SC2129/SC2162 style/info findings in shell scripts inside run: blocks. Pairs of unrelated existing nits then blocked edits to the same file that touched completely different lines. Fixed by passing -ignore 'shellcheck reported issue' so shellcheck-sourced findings are suppressed; real workflow-schema errors still block.

Validation

actionlint -config-file .github/actionlint.yaml .github/workflows/*.yml → exit 0, no runner/runs-on errors
bun vitest run scripts/ci-workflow-{audit,drift}.test.ts scripts/electrobun-{release,test}-workflow-drift.test.ts → 4 files, 68 tests, all pass
grep -rln "blacksmith\|useblacksmith" … → zero hits outside node_modules/, .git/, and submodule trees

Test plan

bun vitest run scripts/ci-workflow-audit.test.ts scripts/ci-workflow-drift.test.ts scripts/electrobun-test-workflow-drift.test.ts scripts/electrobun-release-workflow-drift.test.ts — should be 68/68 green
actionlint -config-file .github/actionlint.yaml .github/workflows/*.yml — should report no runner/runs-on/expression errors
CI on this PR: every workflow that was touched should still dispatch and pass on standard GitHub-hosted runners
16-core Docker build jobs (build-cloud-image.yml, build-docker.yml, docker-ci-smoke.yml) — monitor durations; set vars.RUNNER_UBUNTU if they start timing out
Manually dispatch one release-electrobun.yml run to confirm the Linux/macOS/Windows build matrix still completes on standard runners

Drop every Blacksmith-hosted runner, every useblacksmith/* custom action, and every workflow that only existed to plug into Blacksmith's interactive-testbox feature. Everything now runs on standard GitHub-hosted runners (ubuntu-24.04 / ubuntu-24.04-arm / windows-2025), with the existing `vars.RUNNER_UBUNTU` / `vars.RUNNER_WINDOWS` repo variable hooks preserved so any job can still be redirected to self-hosted or larger-runner pools via Settings → Variables without editing workflows. ## Runner label substitutions - `blacksmith-{2,4,8,16}vcpu-ubuntu-2404` → `ubuntu-24.04` - `blacksmith-4vcpu-ubuntu-2404-arm` → `ubuntu-24.04-arm` - `blacksmith-4vcpu-windows-2025` → `windows-2025` Note the 16-core Docker build jobs lose their dedicated big-machine tier; they'll run on standard 4-core GitHub-hosted runners. If any of those jobs start timing out, set `vars.RUNNER_UBUNTU` to a GitHub larger-runner label or a self-hosted pool label in the repo variables. ## Conditional runner expression collapses Every `${{ github.repository_owner == 'milady-ai' && 'blacksmith-…' || 'ubuntu-latest' }}` ternary (used so forks fell through to ubuntu-latest while org members got Blacksmith) collapses to `ubuntu-24.04` since the fork and org paths are now the same. Expressions that wrapped this in a `vars.RUNNER_*` override collapse to `${{ vars.RUNNER_UBUNTU || 'ubuntu-24.04' }}` (resp. `RUNNER_WINDOWS || 'windows-2025'`), preserving the operator override. ## Custom action substitutions - `useblacksmith/setup-node@v5` → `actions/setup-node@v4` - `useblacksmith/build-push-action@v2` → `docker/build-push-action@v6` - `useblacksmith/setup-docker-builder@v1` → `docker/setup-buildx-action@v3` All drop-in replacements on the same input shape. ## Deleted files - `.github/workflows/android-release-build-aab-testbox.yml` - `.github/workflows/release-electrobun-build-linux-x64-testbox.yml` - `.github/workflows/release-electrobun-build-windows-x64-testbox.yml` These three "testbox" workflows only existed to couple a build matrix to Blacksmith's interactive SSH-debug feature via `useblacksmith/begin-testbox`. Without Blacksmith the testbox hook is meaningless and the workflows become redundant with the regular `android-release.yml` / `release-electrobun.yml` release pipelines. Per the user's direction, delete rather than neuter. - `.github/actions/run-testbox-quiet/action.yml` A composite action that phones home to Blacksmith's testbox-management API (`/api/testbox/phone-home`) and SSH-loops while a developer attaches to the runner. Entirely Blacksmith-specific; the only callers were the three deleted testbox workflows. Gone. ## actionlint.yaml Removed the `self-hosted-runner.labels:` block. actionlint only needs that list to suppress "unknown runner label" warnings for labels that aren't in GitHub's built-in set. Since every remaining runner is a GitHub-hosted label that actionlint already knows about, the block is unnecessary. If self-hosted runners are added later, re-introduce the block with the new labels. ## Composite action cleanup `.github/actions/setup-bun-workspace/action.yml` — comment updated from "Blacksmith runners can intermittently fail reaching Ubuntu mirrors over IPv6" to generic "Some CI runners can intermittently fail…". The actual apt IPv4-force + retry logic is kept verbatim — it's defensive networking that's still useful on any runner. ## CI audit script fixes The four workflow-drift / workflow-audit vitest suites in `scripts/` had assertions hardcoded against the old Blacksmith runs-on strings and against the three deleted testbox workflows: - `scripts/electrobun-test-workflow-drift.test.ts` - `scripts/electrobun-release-workflow-drift.test.ts` - `scripts/ci-workflow-drift.test.ts` - `scripts/ci-workflow-audit.test.ts` Updated each to expect the new collapsed runs-on strings and dropped the deleted testbox workflows from the expected-files lists. All 68 tests across the four files pass after the change. ## Docs + agent descriptions - `README.md` — "setup-node v3/Blacksmith" → "`actions/setup-node@v4` + `check-latest: false`". - `docs/build-and-release.md` — removed the two "Node.js and Bun in CI" WHY entries that rationalized the Blacksmith-specific setup-node choices. - `docs/ROADMAP.md` — same entry in the long-running "CI timeouts" list. - `.claude/agents/milady-devops.md` — dropped the three deleted workflow file references, removed the "don't swap useblacksmith for actions/setup-node" hard rule (moot), renumbered the rules list. - `.claude/agents/milady-test-runner.md` — updated the `ci.yml` runner description from the old Blacksmith-vs-fork conditional to plain `ubuntu-24.04`. - `.claude/agents/electrobun-native-dev.md` — dropped the two deleted electrobun-release testbox workflow references from the "check release workflows" checklist. ## Validation - `actionlint -config-file .github/actionlint.yaml .github/workflows/*.yml` returns exit 0. (Pre-existing shellcheck style warnings in shell scripts inside various workflows are unrelated to this change and are left alone.) - `bun vitest run scripts/ci-workflow-{audit,drift}.test.ts scripts/electrobun-{release,test}-workflow-drift.test.ts` → 4 files, 68 tests, all pass. - No remaining `blacksmith` or `useblacksmith` references anywhere in the repo outside of `node_modules/`, `.git/`, and submodule trees. No functional change to any workflow logic — just runner re-targeting and the removal of Blacksmith-specific helpers.

…site actions The check-actionlint PostToolUse hook had two bugs that made every edit to `.github/workflows/` or `.github/actions/` block noisily on issues that were not caused by the edit: 1. **Composite actions are not workflows.** The scope filter included `.github/actions/*.yml`, but actionlint parses every file it is given as a workflow. Composite actions use a different top-level schema (`runs:` / `description:` / `inputs:` instead of `jobs:` / `on:`), so every composite action always tripped a handful of "unexpected key" errors. Fix: drop `.github/actions/*` from the scope filter and only lint workflow files. Comment explains why. 2. **Shellcheck style nits blocked every workflow edit.** actionlint returns rc=1 and emits output for ANY finding, including shellcheck style/info findings (SC2086, SC2129, SC2162, etc.) in shell scripts inside `run:` blocks. Pairs of unrelated existing nits would then block edits to the same file that touched completely different lines. Fix: pass `-ignore 'shellcheck reported issue'` to both actionlint invocations so shellcheck-sourced findings are suppressed from the error stream. Real workflow-schema errors still surface and still block (exit 2). Shellcheck cleanup is now a separate, non-blocking concern. Both fixes came out of the Blacksmith-migration pass immediately preceding this commit, where every comment-only edit to a workflow that happened to live alongside an old shellcheck warning was getting rejected by the hook.

blacksmith-sh · 2026-04-11T04:39:05Z