Merged
86 changes: 85 additions & 1 deletion Docs/Sandbox/vz-linux-prepared-host-evidence.md
@@ -102,6 +102,90 @@ triage issue.

## Latest Evidence

### 2026-07-03: local-operator launchd drill on `codex/vz-launchd-drill-evidence`

- Evidence source: local operator run on the same prepared Apple silicon macOS
host, using the manual `vz-helperctl.py launchd-drill` lifecycle check from
`origin/dev` after PR `#2601` merged the stale-socket evidence packet.
- Operator or workflow run: local shell run; no GitHub Actions workflow URL.
Git state at capture time was branch `codex/vz-launchd-drill-evidence` at
`origin/dev` merge commit `f2d9be986499eb1bfda36f566870a98e8dd90d0d` plus
this evidence/backlog documentation update.
- Host identity: Apple silicon `arm64`, macOS 15.6 build `24G84`, Darwin
`24.6.0`; local developer machine rather than a dedicated CI runner.
- Host prep: helper build used `vz-helperctl.py build` outside the managed
filesystem sandbox because Swift/Clang needed access to
`~/.cache/clang/ModuleCache`. The drill signed the helper with
`tools/macos-vz-helper/macos-vz-helper.entitlements` before bootstrap.
- Runtime paths: runtime root
`/private/tmp/tldw-vz-launchd-drill-launchd-drill-20260703-171446`, unique
LaunchAgent label
`org.tldw.macos-vz-helper.drill.codex.launchd-drill-20260703-171446`, helper
socket
`/private/tmp/tldw-vz-launchd-drill-launchd-drill-20260703-171446/helper.sock`,
log directory
`/private/tmp/tldw-vz-launchd-drill-launchd-drill-20260703-171446/logs`,
plist
`/private/tmp/tldw-vz-launchd-drill-launchd-drill-20260703-171446/org.tldw.macos-vz-helper.drill.codex.launchd-drill-20260703-171446.plist`,
and artifact directory
`/private/tmp/tldw-vz-launchd-drill-launchd-drill-20260703-171446/artifacts`.
Runtime, logs, and artifacts directories were owner-only mode `0700`.
- Commands:

```bash
tools/macos-vz-helper/scripts/vz-helperctl.py build

tools/macos-vz-helper/scripts/vz-helperctl.py launchd-drill \
  --helper /Users/macbook-dev/Documents/GitHub/tldw_server2/.worktrees/vz-launchd-drill-evidence/tools/macos-vz-helper/.build/debug/macos-vz-helper \
  --socket /private/tmp/tldw-vz-launchd-drill-launchd-drill-20260703-171446/helper.sock \
  --log-dir /private/tmp/tldw-vz-launchd-drill-launchd-drill-20260703-171446/logs \
  --plist-output /private/tmp/tldw-vz-launchd-drill-launchd-drill-20260703-171446/org.tldw.macos-vz-helper.drill.codex.launchd-drill-20260703-171446.plist \
  --label org.tldw.macos-vz-helper.drill.codex.launchd-drill-20260703-171446 \
  --entitlements tools/macos-vz-helper/macos-vz-helper.entitlements \
  --write-plist \
  --create-dirs \
  --skip-smoke \
  --json
```

Review comment (on the `--helper` argument above): Remediation recommended

1. Workstation paths committed 🐞 Bug ⚙ Maintainability

The new launchd-drill evidence packet records a developer-specific absolute home directory path
("/Users/..."), which makes the docs less portable and leaks workstation-specific directory
structure. Other backlog artifacts explicitly sanitize workstation-specific paths and sometimes use
a redacted placeholder, so these additions regress that documentation hygiene.

Agent Prompt

## Issue description
The PR commits workstation-specific absolute paths (e.g., `/Users/<name>/...`) into documentation/backlog evidence notes. This reduces portability and regresses the repo’s existing practice of sanitizing/redacting workstation-specific paths.

## Issue Context
The evidence remains useful without the full local path; placeholders like `<local_worktree_path_redacted>`, `$WORKTREE`, `~`, or a relative repo path are typically sufficient.

## Fix Focus Areas
- Docs/Sandbox/vz-linux-prepared-host-evidence.md[138-144]
- backlog/tasks/task-12137 - Record-VZ-launchd-drill-prepared-host-evidence.md[33-45]

- Results: an initial diagnostic attempt passed a relative `--helper` path; the
generated LaunchAgent plist preserved that relative `ProgramArguments` value,
so launchd loaded and kickstarted the service but helper readiness failed with
`helper_ping_failed`. The accepted prepared-host evidence reran the drill with
an absolute helper path and passed with exit code `0`: preflight reported
`launchd_service_absent`; helper signing passed; `launchd_bootstrap`,
`launchd_status`, and `launchd_kickstart` passed; helper readiness passed; the
protocol/version check reported `protocol_version=1` and
`helper_version=0.1.0`; and `launchd_bootout` passed.
- Cleanup: after the drill-owned bootout, an explicit follow-up
`launchd status` returned exit code `1` with `launchd_status_failed=113`, and
an extra bootout returned `No such process`, confirming the LaunchAgent was
no longer loaded. Direct helper status showed no pid file and
`process=helper_not_running` / `ping=helper_ping_failed`; the socket file
remained as an inactive socket under the private runtime directory. That
stale socket is isolated by the `0700` parent and is covered by the separate
stale-socket recovery drill.
- Artifacts: retained under the artifact directory:
`launchd-drill.json`, `launchd-status-after-drill.json`,
`helper-status-after-launchd-bootout.json`, `launchd-bootout-after-drill.txt`,
`runtime-stat.txt`, `paths.txt`, exit code files, and `artifact-list.txt`.
Helper stdout/stderr were retained under the log directory and were empty,
both SHA-256
`e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855`.
- Expected skips: no PR workflow, no nightly schedule, no self-hosted runner
URL, no real `vz_linux` VM smoke because this drill intentionally used
`--skip-smoke`, no host reboot drill, and no manual boot/readiness fault
injection.
- Blocking regressions: none observed for the selected manual launchd lifecycle
drill. The relative-helper diagnostic attempt is recorded as an operator input
issue for launchd plists, not as the accepted evidence result.
- Residual gaps: launchd-managed real VM smoke, host-reboot, and manual
boot/readiness fault-injection evidence remain manual/operator-gated items.
Broader unclassified helper crash recovery and long-term evidence retention
remain separate follow-ups.
- Follow-up owner: `TASK-12137` records this evidence/update slice; repeat the
launchd drill when launchd scaffolding, helper signing, or plist generation
behavior changes.
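
The cleanup state recorded above (socket file still present, nothing listening) can be told apart from a live helper with a plain connect probe. The following is a minimal sketch, assuming the helper socket is an `AF_UNIX` stream socket; `probe_unix_socket` and its `listening`/`stale`/`missing` labels are hypothetical names for illustration and are not the reason codes emitted by `vz-helperctl.py`.

```python
import socket


def probe_unix_socket(path: str, timeout: float = 1.0) -> str:
    """Classify a Unix socket path as 'listening', 'stale', or 'missing'.

    Hypothetical helper for illustration only; the labels are not the
    reason codes used by the real drill tooling.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except FileNotFoundError:
        # No filesystem entry at all: nothing to clean up.
        return "missing"
    except ConnectionRefusedError:
        # The path exists but no process is accepting connections on it.
        return "stale"
    finally:
        sock.close()
    return "listening"
```

A `stale` result corresponds to the inactive socket left under the `0700` runtime directory; any other `OSError` (for example a permission error) is deliberately left to propagate so it is not misread as a cleanup state.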

### 2026-07-03: local-operator stale-socket drill on `codex/vz-stale-socket-evidence`

- Evidence source: local operator run on the same prepared Apple silicon macOS
@@ -628,7 +712,7 @@ triage issue.
| --- | --- | --- |
| Prepared-host default smoke evidence | Recorded locally on 2026-06-16 with helper daemon smoke, real ephemeral execution, same-session reuse, and recovery diagnostics/dry-run repair smoke passing. | Repeat periodically through a trusted local or host-gated run and add newer evidence packets as needed. |
| Failure-drill evidence | Recorded locally on 2026-06-16 with drill-owned stale VM replacement and smoke-owned helper restart drill passing. | Repeat when runtime/helper recovery behavior changes; keep manual opt-in only. |
- | Launchd-drill evidence | Manual opt-in only. | Record results only when a runner is intentionally configured for LaunchAgent validation. |
+ | Launchd-drill evidence | Manual launchd lifecycle evidence was recorded locally on 2026-07-03 with isolated LaunchAgent bootstrap, kickstart, helper readiness, protocol/version check, and drill-owned bootout passing under `--skip-smoke`. | Repeat when launchd scaffolding, helper signing, or plist generation behavior changes. Run launchd-managed real VM smoke only when explicitly requested. |
| Host reboot recovery | Manual `host-reboot-drill pre/post` procedure only and out of scheduled CI. | Record results when a maintainer explicitly runs the reboot drill on a prepared host that can tolerate disruptive reboot testing and preserve logs. |
| Stuck boot/readiness | Host-independent helper and runner coverage verifies boot-driver failure cleanup, guest-readiness failure cleanup, and no reusable session state after create failure. The default prepared-host smoke still does not inject real boot faults. | Record manual prepared-host evidence only after a separate reviewed fault-injection plan; diagnostics/evidence should report stable reason codes and artifact pointers, not raw serial log contents. |
| Guest-agent mismatch | Not covered by the default smoke. | Use `Docs/superpowers/specs/2026-05-18-vz-linux-lifecycle-drill-gaps-design.md` to guide narrow tests or diagnostics checks before considering automated coverage. |
@@ -0,0 +1,61 @@
Review comment (medium priority): The filename contains spaces and mixed casing (e.g., `task-12137 - Record-VZ-launchd-drill-prepared-host-evidence.md`). It is highly recommended to use lowercase kebab-case without spaces for filenames in the repository (e.g., `task-12137-record-vz-launchd-drill-prepared-host-evidence.md`). This prevents potential issues with shell scripts, command-line tools, and link resolution in Markdown parsers.

---
id: TASK-12137
title: Record VZ launchd drill prepared-host evidence
status: Done
assignee: []
created_date: '2026-07-04 00:05'
updated_date: '2026-07-04 00:16'
labels:
- sandbox
- vz_linux
- evidence
- launchd
- lifecycle
dependencies: []
references:
- Docs/Sandbox/vz-linux-prepared-host-evidence.md
- Docs/Sandbox/macos-runtime-operator-notes.md
- tools/macos-vz-helper/README.md
priority: medium
---

## Acceptance Criteria
<!-- AC:BEGIN -->
- [x] #1 Run launchd-drill with a unique LaunchAgent label and private runtime/plist/log paths on the prepared macOS host.
- [x] #2 Record launchd bootstrap/kickstart/status/bootout result, runtime mode, helper stdout/stderr paths, cleanup state, and pass/fail/skip result in the prepared-host evidence tracker.
- [x] #3 Keep the slice evidence/docs-only and do not expand PR/push/scheduled CI triggers.
- [x] #4 Verification and Bandit applicability are recorded in Backlog.
<!-- AC:END -->
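
Criterion #1 asks for private runtime, plist, and log paths, and the evidence packet records the runtime, logs, and artifacts directories as owner-only mode `0700`. A minimal sketch of how such a check could be scripted, assuming a POSIX host; `is_owner_only_dir` is a hypothetical name and not part of `vz-helperctl.py`.

```python
import os
import stat


def is_owner_only_dir(path: str) -> bool:
    """Return True if path is a directory owned by the caller with mode 0700.

    Hypothetical check for illustration; POSIX only (uses os.getuid).
    """
    st = os.stat(path)
    return (
        stat.S_ISDIR(st.st_mode)
        and st.st_uid == os.getuid()
        # S_IMODE strips the file-type bits, leaving only permission bits.
        and stat.S_IMODE(st.st_mode) == 0o700
    )
```

Running it against the runtime root, log directory, and artifact directory would give one boolean per path that can be stored next to `runtime-stat.txt`.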

## Implementation Notes

<!-- SECTION:NOTES:BEGIN -->
Worktree: /Users/macbook-dev/Documents/GitHub/tldw_server2/.worktrees/vz-launchd-drill-evidence
Branch: codex/vz-launchd-drill-evidence
Base: origin/dev f2d9be986499eb1bfda36f566870a98e8dd90d0d
Accepted runtime artifact root: /private/tmp/tldw-vz-launchd-drill-launchd-drill-20260703-171446

Built the helper with vz-helperctl.py build outside the managed filesystem sandbox because Swift/Clang needed access to ~/.cache/clang/ModuleCache. The launchd drill signed the helper with tools/macos-vz-helper/macos-vz-helper.entitlements before bootstrap.

A first diagnostic launchd-drill attempt passed a relative --helper path. The generated LaunchAgent plist preserved that relative ProgramArguments value, so launchd loaded and kickstarted the service but helper readiness failed with helper_ping_failed. The accepted evidence reran with an absolute helper path and passed with exit code 0. Results: launchd_preflight=launchd_service_absent, helper_signing=ok, launchd_bootstrap=ok, launchd_status=ok, launchd_kickstart=ok, helper_status=ok, protocol_version=1, helper_version=0.1.0, launchd_bootout=ok.
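
The relative-path failure described above suggests validating the helper path before a plist is written, since launchd does not start jobs from the operator's shell directory. The following is a minimal sketch and not the `vz-helperctl.py` implementation: `build_launch_agent_plist` and the `--socket` helper argument are hypothetical, and only `plistlib` and `pathlib` from the Python standard library are relied on.

```python
import plistlib
from pathlib import PurePosixPath


def build_launch_agent_plist(label: str, helper: str, socket_path: str) -> bytes:
    """Return LaunchAgent plist bytes, refusing a relative helper path.

    Hypothetical sketch: the real plist keys and helper arguments may differ.
    """
    helper_path = PurePosixPath(helper)
    if not helper_path.is_absolute():
        # A relative ProgramArguments[0] would be written into the plist
        # verbatim, which is the failure mode seen in the diagnostic attempt.
        raise ValueError(f"helper path must be absolute: {helper!r}")
    payload = {
        "Label": label,
        # Assumed argument layout, for illustration only.
        "ProgramArguments": [str(helper_path), "--socket", socket_path],
        "RunAtLoad": False,
    }
    return plistlib.dumps(payload)
```

Failing early with a `ValueError` keeps the operator input problem visible at plist-generation time instead of surfacing later as a readiness failure.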

Cleanup evidence: after drill-owned bootout, explicit launchd status returned launchd_status_failed=113 and an extra bootout returned No such process, confirming the LaunchAgent was unloaded. Direct helper status reported no pid file, helper_not_running, and helper_ping_failed. The socket file remained as an inactive socket under the private 0700 runtime directory; this is documented as cleanup state and is covered by the separate stale-socket recovery drill. Helper stdout/stderr logs were empty with SHA-256 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.
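
The SHA-256 value quoted above is the digest of zero bytes of input, which is why it doubles as evidence that the helper stdout/stderr logs were empty. A short sketch of that check using only `hashlib`; `file_sha256` is a hypothetical helper name.

```python
import hashlib

# SHA-256 of empty input, as recorded for the helper stdout/stderr logs.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def file_sha256(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `file_sha256(log_path)` against `EMPTY_SHA256` confirms an empty log without storing the log contents in the evidence packet.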

Verification: /Users/macbook-dev/Documents/GitHub/tldw_server2/.venv/bin/python -m pytest tldw_Server_API/tests/Infrastructure/test_vz_linux_host_gated_workflow.py -q passed with 23 tests. git diff --check passed. Bandit skipped because the reviewable changes are Markdown/Backlog only; helper build artifacts and launchd artifacts are local evidence setup, not committed source.
<!-- SECTION:NOTES:END -->
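
The worktree and interpreter paths in these notes are workstation-specific, which is the concern raised in review. A minimal sketch of a redaction pass that could be applied before committing evidence text, assuming macOS-style `/Users/<name>` home directories; `redact_home_paths` is a hypothetical helper and `~` is one of the placeholders the review suggests.

```python
import re

# Matches a macOS home directory prefix such as /Users/<name>.
_HOME_PREFIX = re.compile(r"/Users/[^/\s]+")


def redact_home_paths(text: str, placeholder: str = "~") -> str:
    """Replace each /Users/<name> prefix in text with a placeholder."""
    return _HOME_PREFIX.sub(placeholder, text)
```

Paths under `/private/tmp` are left untouched by this pattern, so the runtime root and artifact pointers in the evidence packet stay intact.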

## Final Summary

<!-- SECTION:FINAL_SUMMARY:BEGIN -->
Recorded a 2026-07-03 prepared-host launchd-drill evidence packet in Docs/Sandbox/vz-linux-prepared-host-evidence.md. The packet documents isolated LaunchAgent bootstrap/kickstart/helper readiness/protocol checks, drill-owned bootout, artifact/log pointers, expected skips, the relative-helper diagnostic failure, and cleanup state. Updated the residual-gap table so launchd-drill evidence is now recorded, while launchd-managed VM smoke remains manual and runs only when explicitly requested.
<!-- SECTION:FINAL_SUMMARY:END -->

## Definition of Done
<!-- DOD:BEGIN -->
- [x] #1 Acceptance criteria completed
- [x] #2 Tests or verification recorded
- [x] #3 Documentation updated when relevant
- [x] #4 Bandit run for touched code when applicable or document non-code/environment skip
- [x] #5 Final summary added
- [x] #6 Known skips or blockers documented
<!-- DOD:END -->