fix(issue): auto-mode: artifact verification failures pause with low-actionability diagnostics#6187

Open
jeremymcs wants to merge 7 commits into main from issue/5733-auto-mode-artifact-verification-failures-1778892830
jeremymcs (Collaborator) commented May 16, 2026

Summary

  • Improved artifact-verification diagnostics and added read-only forensics classification coverage; the targeted tests pass.

Bugs Addressed

  • Missing artifact error omits completion-contract guidance
  • Read-only reconnaissance classification misses common tools

Verification

  • Completed in the repository worktree before push.

Related Issue

Repo

  • gsd-build/gsd-2

Branch

  • issue/5733-auto-mode-artifact-verification-failures-1778892830

Summary by CodeRabbit

  • New Features

    • Richer artifact verification errors that include expected-artifact details and contextual hints about task completion when an artifact is missing.
    • Execution-trace classification that detects traces composed solely of read-only reconnaissance actions.
  • Tests

    • Tests validating artifact diagnostic messaging with and without task-completion signals.
    • Tests covering read-only trace classification across multiple tool-call scenarios.

jeremymcs added 2 commits May 15, 2026 20:03
missing-artifact failures now include completion-contract guidance and an explicit execute-task completion-tool hint.
added read-only reconnaissance classification coverage for `skill` and read-only `gsd_exec` command patterns.

coderabbitai Bot (Contributor) commented May 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 582292ab-08be-447b-baf5-791e4e3932a8

📥 Commits

Reviewing files that changed from the base of the PR and between 9aa8fec and 3dfb7a4.

📒 Files selected for processing (1)
  • src/resources/extensions/gsd/commands-prefs-wizard.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/resources/extensions/gsd/commands-prefs-wizard.ts

📝 Walkthrough

This PR enriches artifact-verification failure diagnostics with expected-artifact context and an execute-task completion-tool hint, adds trace classification that detects traces consisting only of read-only reconnaissance tool calls, and fixes a lost `this` binding in the prefs wizard, with corresponding tests.

Changes

Artifact Verification and Trace Classification

| Layer / File(s) | Summary |
|---|---|
| **Artifact verification failure message enrichment** <br> `src/resources/extensions/gsd/auto-post-unit.ts`, `src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts` | `describeArtifactVerificationFailure()` now accepts `agentEndMessages`, computes expected-artifact diagnostics earlier, augments missing-file and contract-failure messages with expected details, adds an execute-task hint when no completion tool call is detected, exports `_describeArtifactVerificationFailureForTest`, and passes agent end messages at retry sites. |
| **Read-only reconnaissance trace classification** <br> `src/resources/extensions/gsd/session-forensics.ts`, `src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts` | Adds constants and regex-based heuristics to identify read-only tool calls (including `gsd_exec` patterns and explicit tool names), exports `classifyTraceProgress()` that reports whether a trace is read-only reconnaissance only, and fixes `findLast` polyfill termination; includes tests covering read-only and mutating cases. |
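
To illustrate the `findLast` polyfill behavior mentioned in the summary above, here is a minimal sketch. The actual helper in `session-forensics.ts` is not shown in this thread, so the name `findLastCompat` and its signature are assumptions made for illustration only. The point is that the backward scan ends on its own and the no-match case returns `undefined` explicitly.

```typescript
// Hypothetical stand-in for the polyfill; the real helper in
// session-forensics.ts may be named and shaped differently.
function findLastCompat<T>(
  items: readonly T[],
  predicate: (item: T, index: number) => boolean,
): T | undefined {
  // Walk backwards so the first hit is the last matching element.
  for (let i = items.length - 1; i >= 0; i--) {
    const item = items[i] as T;
    if (predicate(item, i)) return item;
  }
  // The loop ended without a match: say so explicitly.
  return undefined;
}

const toolNames = ["read", "grep", "write", "read"];
console.log(findLastCompat(toolNames, (t) => t === "write")); // write
console.log(findLastCompat(toolNames, (t) => t === "delete")); // undefined
```

On runtimes that ship ES2023, the built-in `Array.prototype.findLast` has the same no-match behavior.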

Prefs Wizard this-binding fix

| Layer / File(s) | Summary |
|---|---|
| **Preserve registry this-binding in configureModels** <br> `src/resources/extensions/gsd/commands-prefs-wizard.ts`, `src/resources/extensions/gsd/tests/prefs-wizard-coverage.test.ts` | `configureModels` now captures `ctx.modelRegistry` in a `registry` variable and calls `registry.getAllWithDiscovered()` to avoid losing `this`; corresponding test mock updated to provide instance-style `getAllWithDiscovered()` so the call runs on the object. |
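
The binding problem described above can be reproduced in isolation. This is a sketch under stated assumptions: the `ModelRegistry` class and its `discovered` field are hypothetical, and only the method name `getAllWithDiscovered` and the `ctx.modelRegistry` access come from the walkthrough.

```typescript
// Hypothetical registry; only the method name getAllWithDiscovered
// and the ctx.modelRegistry access are taken from the PR walkthrough.
class ModelRegistry {
  private readonly discovered = ["model-a", "model-b"];

  getAllWithDiscovered(): string[] {
    // Relies on `this`, so it must be invoked on the instance.
    return [...this.discovered];
  }
}

const ctx = { modelRegistry: new ModelRegistry() };

// Detached reference: the method is called without a receiver,
// so reading this.discovered fails at runtime.
const detached = ctx.modelRegistry.getAllWithDiscovered;
let detachedFailed = false;
try {
  detached();
} catch {
  detachedFailed = true;
}
console.log(detachedFailed); // true

// Pattern described in the walkthrough: keep the object in a variable
// and call the method on it so `this` is the registry.
const registry = ctx.modelRegistry;
const models = registry.getAllWithDiscovered();
console.log(models.length); // 2
```

The same reasoning would explain the test change: a mock that exposes `getAllWithDiscovered()` as an instance-style method exercises the call on the object rather than through a detached function reference.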

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • gsd-build/gsd-2#6189: Both PRs modify src/resources/extensions/gsd/commands-prefs-wizard.ts to call getAllWithDiscovered with the correct this binding and update tests accordingly.
  • gsd-build/gsd-2#6038: Similar fixes to configureModels usage of getAllWithDiscovered() and test adjustments for instance method mocking.
  • gsd-build/gsd-2#5852: Related changes touching artifact-verification diagnostics and retry/terminal reporting in auto-post-unit.ts.

Poem

🐇 I nibble logs and hop through traces, soft and spry,
I mark the missing files and whisper where they lie.
If tasks forget their end-call song, I leave a tiny clue,
I watch the quiet reconnaissance steps and hum a helpful cue.
A carrot for the tests — success hops into view.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| :---: | :--- | :--- | :--- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| :---: | :--- | :--- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: improving artifact verification failure diagnostics with more actionable messages for auto-mode, which is the core focus of this PR. |
| Linked Issues check | ✅ Passed | The PR fully implements the proposed fixes from issue #5733: always includes expected-artifact diagnostics in missing-artifact messages, adds completion-tool absence hints for execute-task, expands read-only tool classification (adds 'skill'), and includes adversarial regression tests. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the stated objectives: artifact verification messaging, read-only classification, and targeted tests. The minor context-binding fix in commands-prefs-wizard.ts is a necessary supporting fix enabling proper test behavior and falls within scope. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions Bot (Contributor) commented May 16, 2026

🔴 PR Risk Report — CRITICAL

Files changed 6
Systems affected 1
Overall risk 🔴 CRITICAL

Affected Systems

| Risk | System |
|---|---|
| 🔴 critical | Auto Engine |

File Breakdown

| Risk | File | Systems |
|---|---|---|
| 🔴 | `src/resources/extensions/gsd/auto-post-unit.ts` | Auto Engine |
| | `src/resources/extensions/gsd/commands-prefs-wizard.ts` | (unclassified) |
| | `src/resources/extensions/gsd/session-forensics.ts` | (unclassified) |
| | `src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts` | (unclassified) |
| | `src/resources/extensions/gsd/tests/prefs-wizard-coverage.test.ts` | (unclassified) |
| | `src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts` | (unclassified) |

⚠️ 🔴 Critical risk — the following systems require verification before merge:

  • 🔴 Auto Engine: validate auto-mode trigger conditions and loop termination

⛔ This PR should not be merged without executing this follow-up prompt.

Ask your coding agent to verify before submitting:

Review this PR for risks in: Auto Engine. Verify:

1. validate auto-mode trigger conditions and loop termination

Before modifying any code, assess the scope of this fix:

- Identify the root cause, not just the reported symptom.
- Search the codebase for other call sites, similar patterns, or duplicated logic that may share the same bug.
- List affected tests, documentation, and any downstream consumers that depend on the current behavior.
- Flag any changes that extend beyond the immediate file or function.

Report findings first. Then propose a fix scoped to the actual root cause, and wait for confirmation before applying changes outside the originally reported location.

💡 Have a Codex subscription? Get an independent second opinion: codex review --adversarial

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts (1)

17-32: ⚡ Quick win

Add adversarial cases for shell chaining/script execution.

Please add regression tests for commands like `cat file && echo x > y` and script-eval forms (`python -c ...`, `node --print ...`) to lock down read-only classification boundaries.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`
around lines 17 - 32, Add regression tests in
session-forensics-readonly-classification.test.ts that exercise shell-chaining
and script-eval adversarial cases against classifyTraceProgress: create traces
using traceWithToolCalls invoking gsd_exec with a chained command like "cat file
&& echo x > y" and assert result.isReadOnlyReconnaissanceOnly === false, and
another trace using a script-eval form (e.g., "python -c \"...write...\"" or
"node -e \"require('fs').writeFileSync(...)\"") and assert
isReadOnlyReconnaissanceOnly === false; keep the test patterns consistent with
existing tests (use traceWithToolCalls and classifyTraceProgress) so the new
cases lock down read-only classification boundaries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/resources/extensions/gsd/auto-post-unit.ts`:
- Around line 458-461: The message unconditionally appends a "No completion tool
call detected" hint for unitType === "execute-task" even though this function
doesn't inspect the unit's trace history; update the logic that builds
completionToolHint (the completionToolHint variable and the branch checking
unitType === "execute-task") so it only adds that hint after confirming the
unit's trace/tool-call history actually lacks a gsd_task_complete (and its known
aliases) entry—i.e., accept or access the unit's trace/toolCalls context, scan
for any call named "gsd_task_complete" (or alias) and only set
completionToolHint when none are found; keep the rest of the artifact-not-found
message unchanged.
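
The gating this comment asks for could look roughly like the sketch below. The message and tool-call shapes are hypothetical, since the real `agentEndMessages` type is not shown in this thread, and the alias set contains only the one name the comment mentions.

```typescript
// Hypothetical shapes: the real agent-end message type in gsd-2
// is not shown in this thread.
interface ToolCall {
  name: string;
}
interface AgentMessage {
  toolCalls?: ToolCall[];
}

// Only "gsd_task_complete" is named in the review comment; any known
// aliases would be added to this set.
const COMPLETION_TOOL_NAMES = new Set<string>(["gsd_task_complete"]);

function hasCompletionToolCall(messages: readonly AgentMessage[]): boolean {
  return messages.some((message) =>
    (message.toolCalls ?? []).some((call) => COMPLETION_TOOL_NAMES.has(call.name)),
  );
}

// Returns the hint only for execute-task units whose history contains
// no completion tool call; otherwise returns an empty string.
function completionToolHint(
  unitType: string,
  messages: readonly AgentMessage[],
): string {
  if (unitType !== "execute-task") return "";
  if (hasCompletionToolCall(messages)) return "";
  return " No completion tool call detected.";
}

const withCompletion: AgentMessage[] = [
  { toolCalls: [{ name: "read" }, { name: "gsd_task_complete" }] },
];
console.log(completionToolHint("execute-task", withCompletion) === ""); // true
console.log(completionToolHint("execute-task", [{ toolCalls: [{ name: "read" }] }]));
```

Keeping the scan in a separate predicate means the rest of the artifact-not-found message is untouched, which matches the "keep the rest unchanged" request.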

In `@src/resources/extensions/gsd/session-forensics.ts`:
- Around line 79-87: The current READ_ONLY_EXEC_COMMAND_RE and
isReadOnlyReconnaissanceTool allow commands like "python -c '...; rm -rf /'" by
matching only a safe prefix; tighten the classifier by rejecting any command
that contains shell metacharacters or piping/redirection after the matched
read-only prefix. Update either READ_ONLY_EXEC_COMMAND_RE or add a secondary
check in isReadOnlyReconnaissanceTool (using the local variable command) to
ensure the remainder of the command contains only safe token characters/flags
(e.g., letters, numbers, dashes, dots) and explicitly disallow characters like ;
& | > < $ ` ( ) $() || && `>` and shell control operators, while still allowing
simple version/flag forms; keep READ_ONLY_TOOL_NAMES behavior intact and return
false for any command failing this stricter validation.
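
A minimal sketch of the stricter validation described in this comment follows. Every pattern here is an assumption: the real `READ_ONLY_EXEC_COMMAND_RE` is not shown in this thread, so a hypothetical allowlist stands in for it. The sketch is also more conservative than the comment, because it rejects interpreter invocations outright instead of allowing simple version or flag forms.

```typescript
// Hypothetical allowlist of read-only command prefixes; the real
// READ_ONLY_EXEC_COMMAND_RE in session-forensics.ts is not shown here.
const READ_ONLY_PREFIX_RE = /^(cat|ls|grep|head|tail|wc|git (status|log|diff))\b/;

// Shell control, redirection, and substitution characters listed in
// the review comment. Any occurrence disqualifies the command.
const SHELL_METACHAR_RE = /[;&|<>$`()]/;

// Interpreters that can run arbitrary inline code (python -c, node -e,
// node --print). Rejected wholesale in this sketch.
const SCRIPT_EVAL_RE = /^(python3?|node|ruby|perl|sh|bash)\b/;

function isReadOnlyExecCommand(command: string): boolean {
  const trimmed = command.trim();
  if (SCRIPT_EVAL_RE.test(trimmed)) return false;
  if (SHELL_METACHAR_RE.test(trimmed)) return false;
  return READ_ONLY_PREFIX_RE.test(trimmed);
}

console.log(isReadOnlyExecCommand("cat README.md")); // true
console.log(isReadOnlyExecCommand("cat file && echo x > y")); // false
console.log(isReadOnlyExecCommand("python -c 'open(\"y\", \"w\")'")); // false
```

Checking the whole command for metacharacters, rather than only the text after the matched prefix, is a deliberate simplification: it cannot be fooled by a safe-looking prefix, at the cost of rejecting some harmless quoted arguments.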

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 605882d3-ccb5-4067-ad5f-4b435e6ff540

📥 Commits

Reviewing files that changed from the base of the PR and between db89d64 and b36c97c.

📒 Files selected for processing (4)
  • src/resources/extensions/gsd/auto-post-unit.ts
  • src/resources/extensions/gsd/session-forensics.ts
  • src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts
  • src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts

Comment thread src/resources/extensions/gsd/auto-post-unit.ts Outdated
Comment thread src/resources/extensions/gsd/session-forensics.ts Outdated
jeremymcs (Collaborator, Author) commented May 16, 2026

Accepted — this comment requires code changes. Queuing fix...
🧰 Agent running: codex is working on the fix...
Agent completed — verifying changes...
🎉 Resolved — addressed in commit 3ee901c.

jeremymcs (Collaborator, Author)

🤖 Dispatched codex with the following prompt:

Agent prompt:
You are acting as an autonomous PR babysitter for gsd-build/gsd-2 PR #6187.
PR URL: https://github.com/gsd-build/gsd-2/pull/6187
Base repository: gsd-build/gsd-2
Head repository: gsd-build/gsd-2
Head branch: issue/5733-auto-mode-artifact-verification-failures-1778892830
Head remote: origin
You are running inside an isolated app-owned worktree under ~/.patchdeck.
Make only targeted changes that resolve the approved tasks.
Do not wait for user input, confirmation, or approval at any point.
Do not rewrite unrelated files.
Use the available git tooling for inspection and verification only.
If dependencies are missing, install them using the repository's lockfile/package manager as needed inside this isolated worktree.
Leave file edits uncommitted; the babysitter app will handle Git finalization after your run.
GitHub follow-up replies and review-thread resolution will be handled by the babysitter after your run.
If a task is invalid after inspection, explain it in your final response and include the exact audit token.

Approved review-comment tasks:
1. [general_comment] coderabbitai[bot]
   file=n/a line=n/a
   sourceId=4464911691 sourceUrl=https://github.com/gsd-build/gsd-2/pull/6187#issuecomment-4464911691
   replyKind=general_comment threadId=n/a threadResolved=n/a
   auditToken=codefactory-feedback:gh-issue-comment-4464911691
   <!-- This is an auto-generated comment: summarize by coderabbit.ai -->
<!-- walkthrough_start -->

<details>
<summary>📝 Walkthrough</summary>

## Walkthrough

This PR enriches artifact verification failure diagnostics and improves read-only trace classification. When artifact verification fails, the error message now includes expected contract details and execute-task-specific completion-tool hints. A new trace classification function identifies read-only reconnaissance actions to reduce false negatives in diagnostic output.

## Changes

**Artifact Verification and Trace Classification**

| Layer / File(s) | Summary |
|---|---|
| **Artifact verification failure message enrichment** <br> `src/resources/extensions/gsd/auto-post-unit.ts`, `src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts` | `describeArtifactVerificationFailure()` now computes expected-artifact details upfront and augments both missing-artifact and contract-failure messages with expected context. For `execute-task` units, an additional hint is appended when no completion tool call is detected. The function is exported as `_describeArtifactVerificationFailureForTest` for testing. |
| **Read-only reconnaissance trace classification** <br> `src/resources/extensions/gsd/session-forensics.ts`, `src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts` | New constants and logic identify read-only reconnaissance tool calls via tool-name set and regex-based filtering of `gsd_exec` commands. Exported `classifyTraceProgress()` scans execution traces and reports whether all tool calls are read-only reconnaissance. A polyfill fix ensures `findLast` cleanly returns `undefined` without a match. |

## Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

## Possibly related PRs

- [gsd-build/gsd-2#5852](https://github.com/gsd-build/gsd-2/pull/5852): Modifies artifact-verification diagnostics in `auto-post-unit.ts` to handle additional failure modes and enhance terminal failure reporting.

## Poem

> 🐰 Artifacts speak clearer now, with hints to guide the way,  
> When tasks complete, or tools fall mute, diagnostics save the day.  
> Read-only traces shine so bright, reconnaissance refined,  
> False negatives fade to the past—recovery's easier to find! ✨

</details>

<!-- walkthrough_end -->
<!-- pre_merge_checks_walkthrough_start -->

<details>
<summary>🚥 Pre-merge checks | ✅ 4 | ❌ 1</summary>

### ❌ Failed checks (1 warning)

|     Check name     | Status     | Explanation                                                                           | Resolution                                                                         |
| :----------------: | :--------- | :------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

<details>
<summary>✅ Passed checks (4 passed)</summary>

|         Check name         | Status   | Explanation                                                                                                                                                                                                                                                                                                                                                                                              |
| :------------------------: | :------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|      Description Check     | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.                                                                                                                                                                                                                                                                                                                                              |
|         Title check        | ✅ Passed | The title clearly describes the main change: improving artifact verification failure diagnostics in auto-mode with better guidance. It directly reflects the primary intent shown in the PR objectives.                                                                                                                                                                                                  |
|     Linked Issues check    | ✅ Passed | The PR addresses all four main objectives from issue `#5733`: (1) includes completion-contract guidance in missing-artifact messages via enhanced describeArtifactVerificationFailure, (2) adds completion-tool absence hint for execute-task via test coverage, (3) extends read-only detection via classifyTraceProgress to handle 'skill' and gsd_exec patterns, and (4) provides diagnostic framework. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the PR objectives: artifact diagnostic improvements in auto-post-unit.ts, read-only forensics classification in session-forensics.ts, and corresponding unit tests. No unrelated modifications detected.                                                                                                                                                              |

</details>

<sub>✏️ Tip: You can configure your own custom pre-merge checks in the settings.</sub>

</details>

<!-- pre_merge_checks_walkthrough_end -->
<!-- finishing_touch_checkbox_start -->

<details>
<summary>✨ Finishing Touches</summary>

<details>
<summary>📝 Generate docstrings</summary>

- [ ] <!-- {"checkboxId": "7962f53c-55bc-4827-bfbf-6a18da830691"} --> Create stacked PR
- [ ] <!-- {"checkboxId": "3e1879ae-f29b-4d0d-8e06-d12b7ba33d98"} --> Commit on current branch

</details>
<details>
<summary>🧪 Generate unit tests (beta)</summary>

- [ ] <!-- {"checkboxId": "f47ac10b-58cc-4372-a567-0e02b2c3d479", "radioGroupId": "utg-output-choice-group-unknown_comment_id"} -->   Create PR with unit tests
- [ ] <!-- {"checkboxId": "6ba7b810-9dad-11d1-80b4-00c04fd430c8", "radioGroupId": "utg-output-choice-group-unknown_comment_id"} -->   Commit unit tests in branch `issue/5733-auto-mode-artifact-verification-failures-1778892830`

</details>

</details>

<!-- finishing_touch_checkbox_end -->
<!-- This is an auto-generated comment: all tool run failures by coderabbit.ai -->

> [!WARNING]
> There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
> 
> <details>
> <summary>🔧 ESLint</summary>
> 
> > If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.
> 
> ESLint skipped: no ESLint configuration detected in root package.json. To enable, add `eslint` to devDependencies.
> 
> 
> 
> </details>

<!-- end of auto-generated comment: all tool run failures by coderabbit.ai -->
<!-- tips_start -->

---


<sub>Comment `@coderabbitai help` to get the list of available commands and usage tips.</sub>

<!-- tips_end -->
<!-- internal state start -->


<!-- DwQgtGAEAqAWCWBnSTIEMB26CuAXA9mAOYCmGJATmriQCaQDG+Ats2bgFyQAOFk+AIwBWJBrngA3EsgEBPRvlqU0AgfFwA6NPEgQAfACgjoCEYDEZyAAUASpETZWaCrIPR1AGxJcAZvAAeABRIDiQAlFxoeITMit7oFOI+aGKQUhTwfgzU8PhYyfAe2BTSPFGIJJAA7uqwkB74VWAp4nkqherytPBoRBj4iOIMyIG2kGYAbACMABwA7GFGUADK+MUMlQJUGAywXCHYJAD0AKxzAMznzdFgsUrNiZktYOmZ8NmtGGAFRSWIYFM5nMZjMAJwAJhm5wADLo9DBnKROJBmNoMAYoABBPCwfAULgiErMWTMYYYyDLXDUbCILj4bhkcnLRyolz8HyMWCYUjITD0eAYGiCjgGSC6SAASWYvHwUnoziSz1eWRyeUg3V6/UG715GHltCU9BKaFoYDyHnkPjxZEQOsYHjQiFtKs+CnSvUqNVwdSpFCRdEgNEGyG4jttGCIGlF4oAohgHH8USEBUQHorUpQKHjkAKGEUlAppV5PmAmIKqKkiNh4LRMBso2KoDH/KG9chjabzfISmWMNonXXKnmw28PrksAQUDt80OWLEJ/h8B5kMwQgHeCQJLkaRaG+LMQbdZASC2PO91JAEILqrAyMf/KI8CQwFTEABrQvcYvjwOLjyMNAPGXBJKhUCpBT3KAAHluE+QDIAAVQADUvZwlAwFMuAZChbUGFFrXsBkGFHMoaWOQZ6UgY1EDyZArT4ZV3lVLATy5GlXVDb1ECjKAADVKFHZiuAAYRYL8SBofkJ1vKiSG4AZ1DxeQqjxN9cBKTYSHoypuBpWAeMgGwSAdSSRIaCocydQ5xjOS5IECKICFuOJIkeZJUkYsc1R+YpSlDMjqlqepGmaMRx3aM9cC6Ho+gGIZEEWKAjPko4tjrPZICIRBTQEasPFoI4stNcEABpIDSnY6gOY5bKuRyYjiNMnjEF4BJdcdvm0X5pABIEQQhKFoXMSwoOEURxCkOis2YeoBTfANqsQAwADFppQKzKjMWrIEAFAIeCzAQvBmvl+DGsLJpFKArAOo6uGxJy7kqEoGWoOgLVIizAu9QMZNIcgMgYFFpEQD1IAAIkxNyWjSNqmNdH46DB9AfBoPgT0fGgX0dD8SnU+BpAAbnVGKtSGIGBx5eoUg/FpwsOyoqxrQcDJsRdcAAsic3QpJ8doEUxSmDRIAAWWTCMmvc9mKt2fhV1wZANViipmyIyTIfTXBAg0bWwky6tax2EgypopMnRTb5CkqTNs3VfBIH6dncxnT9vzyUs8nU6Gr3lvdwSFoyTTNDB3p7PI+xCQd1Qk8a8VQujAIqe2SCIHJJvK0RykqGwY0xAARAB9KCADkABkAE18+gKCoJL/Oi8xYWY2WWX1GQJhWDVAglxGB2VHesH30KDwwd1k7aHwUoHfQBgmGwa96NkwOu0y7L8/RwHONR+MDOu+kBgDPwH2QI5TpEc7pH5yBBcgTEPCqNBZF5bgGT1e9VYDdvxJLMtPcrfXI8nKuM24sFTNXZmwCm0hfZC0xM/Mg8oWKnnPOzT+rsvhdw8M0AQ4ENiXgFOzBe68nxY3fDeO8/QXYSR/BggCQEo40DEAGQIJANCRjKkA8MRAV60Hzq+N8+dUHRyOIBHoCU9znH9nQbAuDkjLkqOQFOE1SgCkXp2YOXRo5hTVHIdABoUzg0HkBJGJ0uR6jPBGVRQd3pFTXg+De1At4jFjjSfRbAqS1ipEcXSFB5KJyqLeLAaAJBdT7uEPcAAWIWME4JAXkNhXC7M0D4RKIRUQJF/IVCOBRbgi8aJYAFIMEgJp2ToHKs4SoYMMkkCRgvTyzF7xsW1N5BoVRuLkhVuNAMaxcDt3iKAyWMMMjtW8l1XyOYMD0Rmv46gkA/BeHWqbThZVpkoI9hWdm99kAnnfrQZZt5vSUFKYIksNDsh0M2egbB7AyonX8d2CSGQ/KZ1oFGAwF
hICiVYBeCBINKYOCcGyQIy9MiQDIj4bAHhEofLnK3S+19D6BFyqmKYEQRZiy4f06G1s+AsFbpQ7+azoaMwNrgvaJpaBtzEmg925YiX/0NugV+mA35nmIuzIhmM+H4o6jQ72v5Fnm0xakHyfxoGzICIi7AqZwSooDmokOogw79hBgy4cZthlYA4aUdu85fzd12roilkAAAGhiPDGsZUaIp8r5DGpsevC1OqTqb0oFgJg7pSD8o7FY+5vZlWRzVc6OG45Xn6GMOAKA8CSn1WIGQZQklCxsGFPtU+41JClB0UwJQVBVDqC0DoMNJgoBwFQKgZlMa/rxo/nOdgXAqBVHsCyZw8hM1xBzWoTQ2hdBgEMOG0wBhEAUAYEcP4awh3SCOCeIUtpaKFWysIm4vjcBgDnnm+WIowabreZYTEEpY3/VevQf5rJ5D4A5Lsbk0gjASiwMapQiAGAZAECQdWYD+JDODXkZaoySiBDCMasqBzZlzy0VgfoDbP5PmQEB7ZnTTRCvZorUm7xjzODPIck6UQiBJvlj9SogAcAgQ/bfABC1ivzVN0d8gBcAlmT+yoPzQZegyoEFFeG35wYltDJQVJCjIH8XeIJIT6Y3NfoEGVjLDXqHCv+PlC9jUcufHwi1q7cMCm6GOCxFDjnUL/LQ/8FzuNwaFhKRJy47bYG4B40oQG/B9n/Ph7o9Ap4g3EIgHw8ggPabVD/dZNGRWVEGBkCxk4nbYALEBkGbB2OMPoNxrqfGAn7WkNcy1b88S4e9KgW8HhsKzNjkGXDW4knGvzvex98Bn2vslu+wSnxv2FF8stPE0BpC4GNa84aN8PCo2YtBu2QGlDDioJ8ZAZ60uJC6XwXSh0UPsCk1e8kB5DTjYTQVy8xlsJcBK2Vp9L6oZiBqxq+r3UmsUBa3hAAvFHB9u2qstEO5+jAx3fIWpUaaodI7pBjo2IgSd/hp3jl+0VBdTkl0rowpoeWxqjAlwFNq0xpA+aQAANQTCOGAcERgYzalRAmrNT1Nz4wbVpeiyJhZ0HgI4Awm6wZLH7YO4do71gTqnTaQHc6CoWRnV8bS8YdQaHXdTrd7zd37qrUeptbIxsXojAtktK5FAQvkY0Q17ZrU+tkn6iOqqHTqse3ltGdi8DUIrFAyUjtyyK5+6Uio7Mxs0L7BAwMxoE2OinLeEogoFVL3UWVL8NJSklFIP4MAAhHQH0KKjA3Jr7V2MdXOPk0GuTswAI7YBEe59AauffvWYawjQZUdVqgKarH8Tq2yzXmiaj4RxbwmiODxjwRxlxHFswVIgz0R1EEKheQY1JfsNG74gXEVQjjaw0ABsoDA3weiOOkbnkA08CWkCJ+gZ45YBmNf0JQXjZDejyK9jAEh8BeXjGEGBSdictnSwfEDrpjWBsyLIaApvd4d+Bn+i1D7MC8joaHCgy2pygEwEKizKxqzYGM44L+KQJAFqJ0z06WCWEkHubGv8s4fOwY94eYLiUg709ut4ic3qy8ocGA4cA4DKtMtEQskARgmI9gqIdCMunqMyMG5GHINmlsuiQg7Ev2qMq4fYQYbGxqbeJcjobWPAS4sgcy/4z6XIW4scJsF4eYRS6iskuAxQ8YJqc8SgtmdAFq/GYGdsxkJAOGKI1AuwUCRgnWt8PWI2/KA2ogDow2gOJS2yN+9Asc02rKx4go82S0i2BodAW2bhiQwGOwroj+7m0BGwb+fwiAgQaBXAEBDAxueQ0R4QXAAA3utHKlBOokZFruQRsHkRaFwAIH+CoZAAAL6H4moM6fY0TM6/as586zrA5c4dS862jDAC6IDQ4GCw7kBtwI5BEo6XDo5TBY446HoKAFglBbgkDE4+Ck5cDk7dBU40505gBGD1FM7jrNH/Zs5tHzoFa/Yxpg4qacYtRIZxTvAC6ta9EbrC47p7qVrDYBjHrNolJMELZLYBhJJFxxAaBCDIAqaBitb2Rb4uQFbGq6wLxEY3HaiAzdK6QbKJZgEKYkJvhwH7aOwrjor
-->

<!-- internal state end -->
2. [review_comment] coderabbitai[bot]
   file=src/resources/extensions/gsd/auto-post-unit.ts line=461
   sourceId=3251727627 sourceUrl=https://github.com/gsd-build/gsd-2/pull/6187#discussion_r3251727627
   replyKind=review_thread threadId=PRRT_kwDORj5hb86Cgq0b threadResolved=false
   auditToken=codefactory-feedback:gh-review-comment-3251727627
   _⚠️ Potential issue_ | _🟠 Major_ | _⚡ Quick win_

**Completion-tool absence is asserted without evidence.**

This message is emitted for every missing `execute-task` artifact, but this function does not inspect trace/tool-call history. It can report a false diagnosis and mislead retries.

 

<details>
<summary>Suggested fix</summary>

```diff
-    const completionToolHint = unitType === "execute-task"
-      ? " No completion tool call detected (`gsd_task_complete`/alias)."
+    const completionToolHint = unitType === "execute-task"
+      ? " Check whether completion tool (`gsd_task_complete`/alias) was called."
       : "";
```

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/resources/extensions/gsd/auto-post-unit.ts` around lines 458 - 461, The
message unconditionally appends a "No completion tool call detected" hint for
unitType === "execute-task" even though this function doesn't inspect the unit's
trace history; update the logic that builds completionToolHint (the
completionToolHint variable and the branch checking unitType === "execute-task")
so it only adds that hint after confirming the unit's trace/tool-call history
actually lacks a gsd_task_complete (and its known aliases) entry—i.e., accept or
access the unit's trace/toolCalls context, scan for any call named
"gsd_task_complete" (or alias) and only set completionToolHint when none are
found; keep the rest of the artifact-not-found message unchanged.
```

</details>


<!-- This is an auto-generated comment by CodeRabbit -->
3. [review_comment] coderabbitai[bot]
   file=src/resources/extensions/gsd/session-forensics.ts line=87
   sourceId=3251727629 sourceUrl=https://github.com/gsd-build/gsd-2/pull/6187#discussion_r3251727629
   replyKind=review_thread threadId=PRRT_kwDORj5hb86Cgq0d threadResolved=false
   auditToken=codefactory-feedback:gh-review-comment-3251727629
   _⚠️ Potential issue_ | _🟠 Major_ | _⚡ Quick win_

**Read-only classifier can mislabel mutating `gsd_exec` commands.**

The allowlist regex matches command prefixes and currently permits executable script forms (e.g., `python -c`, `node --print`) and shell chaining/redirection after a read-only prefix. That can classify write-capable commands as reconnaissance-only.

 

<details>
<summary>Suggested hardening</summary>

```diff
-const READ_ONLY_EXEC_COMMAND_RE = /^\s*(cat|head|tail|ls|find|grep|rg|git\s+(status|log|show|diff|branch|remote|rev-parse|ls-files)|npm\s+(ls|list|info|view|show|outdated|audit|doctor|ping|--version|-v)|node\s+(--print|--version|-v\b)|python[23]?\s+(-c\s+'[^']*'|--version|-V\b)|jq\s|yq\s|env\b|printenv\b)/;
+const UNSAFE_SHELL_TOKENS_RE = /(?:&&|\|\||;|[<>]|`|\$\(|\n)/;
+const READ_ONLY_EXEC_COMMAND_RE = /^\s*(cat|head|tail|ls|find|grep|rg|git\s+(status|log|show|diff|branch|remote|rev-parse|ls-files)|npm\s+(ls|list|info|view|show|outdated|audit|doctor|ping|--version|-v)|node\s+(--version|-v\b)|python[23]?\s+(--version|-V\b)|jq\b|yq\b|env\b|printenv\b)\s*$/;
...
   const command = String(call.input.command || call.input.cmd || "").trim();
   if (!command) return false;
+  if (UNSAFE_SHELL_TOKENS_RE.test(command)) return false;
   return READ_ONLY_EXEC_COMMAND_RE.test(command);
```

</details>

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
const UNSAFE_SHELL_TOKENS_RE = /(?:&&|\|\||;|[<>]|`|\$\(|\n)/;
const READ_ONLY_EXEC_COMMAND_RE = /^\s*(cat|head|tail|ls|find|grep|rg|git\s+(status|log|show|diff|branch|remote|rev-parse|ls-files)|npm\s+(ls|list|info|view|show|outdated|audit|doctor|ping|--version|-v)|node\s+(--version|-v\b)|python[23]?\s+(--version|-V\b)|jq\b|yq\b|env\b|printenv\b)\s*$/;

function isReadOnlyReconnaissanceTool(call: ToolCall): boolean {
  const name = call.name.toLowerCase();
  if (READ_ONLY_TOOL_NAMES.has(name)) return true;
  if (name !== "gsd_exec") return false;
  const command = String(call.input.command || call.input.cmd || "").trim();
  if (!command) return false;
  if (UNSAFE_SHELL_TOKENS_RE.test(command)) return false;
  return READ_ONLY_EXEC_COMMAND_RE.test(command);
}
```

</details>

<!-- suggestion_end -->

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/resources/extensions/gsd/session-forensics.ts` around lines 79 - 87, The
current READ_ONLY_EXEC_COMMAND_RE and isReadOnlyReconnaissanceTool allow
commands like "python -c '...; rm -rf /'" by matching only a safe prefix;
tighten the classifier by rejecting any command that contains shell
metacharacters or piping/redirection after the matched read-only prefix. Update
either READ_ONLY_EXEC_COMMAND_RE or add a secondary check in
isReadOnlyReconnaissanceTool (using the local variable command) to ensure the
remainder of the command contains only safe token characters/flags (e.g.,
letters, numbers, dashes, dots) and explicitly disallow characters like ; & | >
< $ ` ( ) $() || && `>` and shell control operators, while still allowing simple
version/flag forms; keep READ_ONLY_TOOL_NAMES behavior intact and return false
for any command failing this stricter validation.
```

</details>
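
The two-stage check this comment describes (reject unsafe shell tokens first, then test an allowlisted prefix) can be sketched in isolation. This is a minimal illustration, not the repository's code: `ToolCallSketch`, `UNSAFE_SHELL_TOKENS`, `READ_ONLY_PREFIX`, and `isReadOnlyExecSketch` are hypothetical names, and the allowlist is deliberately abbreviated. It also assumes that a single `|` or `&` should be rejected, which the suggested `UNSAFE_SHELL_TOKENS_RE` does not do.

```typescript
// Hypothetical shape; the real ToolCall type is defined in the repository.
interface ToolCallSketch {
  name: string;
  input: { command?: string; cmd?: string };
}

// Reject chaining, pipes, backgrounding, redirection, backticks, newlines, and $( subshells.
const UNSAFE_SHELL_TOKENS = /[;&|<>`\n]|\$\(/;

// Abbreviated allowlist of read-only command prefixes. The lookahead requires
// whitespace or end of input after the command, so "catalog" does not match "cat"
// while "cat README.md" still does.
const READ_ONLY_PREFIX =
  /^(?:cat|head|tail|ls|grep|rg|git\s+(?:status|log|show|diff|rev-parse|ls-files))(?=\s|$)/;

function isReadOnlyExecSketch(call: ToolCallSketch): boolean {
  if (call.name.toLowerCase() !== "gsd_exec") return false;
  const command = String(call.input.command ?? call.input.cmd ?? "").trim();
  if (command === "") return false;
  // Stage 1: any unsafe token disqualifies the command outright.
  if (UNSAFE_SHELL_TOKENS.test(command)) return false;
  // Stage 2: the command must start with an allowlisted read-only form.
  return READ_ONLY_PREFIX.test(command);
}
```

A prefix allowlist remains a heuristic: options such as `find -delete` can mutate state behind a read-only-looking command, which is one reason `find` is left out of this abbreviated list.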


<!-- This is an auto-generated comment by CodeRabbit -->
4. [review] coderabbitai[bot]
   file=n/a line=n/a
   sourceId=4302326702 sourceUrl=https://github.com/gsd-build/gsd-2/pull/6187#pullrequestreview-4302326702
   replyKind=review threadId=n/a threadResolved=n/a
   auditToken=codefactory-feedback:gh-review-4302326702
   **Actionable comments posted: 2**

<details>
<summary>🧹 Nitpick comments (1)</summary><blockquote>

<details>
<summary>src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts (1)</summary><blockquote>

`17-32`: _⚡ Quick win_

**Add adversarial cases for shell chaining/script execution.**

Please add regression tests for commands like `cat file && echo x > y` and script-eval forms (`python -c ...`, `node --print ...`) to lock down read-only classification boundaries.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`
around lines 17 - 32, Add regression tests in
session-forensics-readonly-classification.test.ts that exercise shell-chaining
and script-eval adversarial cases against classifyTraceProgress: create traces
using traceWithToolCalls invoking gsd_exec with a chained command like "cat file
&& echo x > y" and assert result.isReadOnlyReconnaissanceOnly === false, and
another trace using a script-eval form (e.g., "python -c \"...write...\"" or
"node -e \"require('fs').writeFileSync(...)\"") and assert
isReadOnlyReconnaissanceOnly === false; keep the test patterns consistent with
existing tests (use traceWithToolCalls and classifyTraceProgress) so the new
cases lock down read-only classification boundaries.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/resources/extensions/gsd/auto-post-unit.ts`:
- Around line 458-461: The message unconditionally appends a "No completion tool
call detected" hint for unitType === "execute-task" even though this function
doesn't inspect the unit's trace history; update the logic that builds
completionToolHint (the completionToolHint variable and the branch checking
unitType === "execute-task") so it only adds that hint after confirming the
unit's trace/tool-call history actually lacks a gsd_task_complete (and its known
aliases) entry—i.e., accept or access the unit's trace/toolCalls context, scan
for any call named "gsd_task_complete" (or alias) and only set
completionToolHint when none are found; keep the rest of the artifact-not-found
message unchanged.

In `@src/resources/extensions/gsd/session-forensics.ts`:
- Around line 79-87: The current READ_ONLY_EXEC_COMMAND_RE and
isReadOnlyReconnaissanceTool allow commands like "python -c '...; rm -rf /'" by
matching only a safe prefix; tighten the classifier by rejecting any command
that contains shell metacharacters or piping/redirection after the matched
read-only prefix. Update either READ_ONLY_EXEC_COMMAND_RE or add a secondary
check in isReadOnlyReconnaissanceTool (using the local variable command) to
ensure the remainder of the command contains only safe token characters/flags
(e.g., letters, numbers, dashes, dots) and explicitly disallow characters like ;
& | > < $ ` ( ) $() || && `>` and shell control operators, while still allowing
simple version/flag forms; keep READ_ONLY_TOOL_NAMES behavior intact and return
false for any command failing this stricter validation.

---

Nitpick comments:
In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`:
- Around line 17-32: Add regression tests in
session-forensics-readonly-classification.test.ts that exercise shell-chaining
and script-eval adversarial cases against classifyTraceProgress: create traces
using traceWithToolCalls invoking gsd_exec with a chained command like "cat file
&& echo x > y" and assert result.isReadOnlyReconnaissanceOnly === false, and
another trace using a script-eval form (e.g., "python -c \"...write...\"" or
"node -e \"require('fs').writeFileSync(...)\"") and assert
isReadOnlyReconnaissanceOnly === false; keep the test patterns consistent with
existing tests (use traceWithToolCalls and classifyTraceProgress) so the new
cases lock down read-only classification boundaries.
```

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro Plus

**Run ID**: `605882d3-ccb5-4067-ad5f-4b435e6ff540`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between db89d645b0ff6d7ae4d57cb1f418353638a0beba and b36c97c54feaf1395a6d27a33d137f68a84ae979.

</details>

<details>
<summary>📒 Files selected for processing (4)</summary>

* `src/resources/extensions/gsd/auto-post-unit.ts`
* `src/resources/extensions/gsd/session-forensics.ts`
* `src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts`
* `src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Approved status-check tasks:
None

Approved documentation tasks:
None

When done:
1) Run the relevant verification for your changes.
2) Leave any changed files in the worktree for the babysitter app to finalize.
3) For each feedback item you addressed or were blocked on, emit a summary block in the following format:
   FEEDBACK_SUMMARY_START <auditToken>
   <A concise 1-2 sentence summary of what you did or why you were blocked>
   FEEDBACK_SUMMARY_END
   Include one block per audit token. These summaries will be posted as follow-up comments on the PR.
4) If documentation tasks were assigned, emit exactly one docs summary block in the following format:
   DOCS_SUMMARY_START <changed|no_change>
   <A concise 1-2 sentence summary of the docs you updated, or why no docs changes were necessary after inspection>
   DOCS_SUMMARY_END

@jeremymcs
Collaborator Author

Addressed in commit 3ee901c by the latest babysitter run.

Responding to comment by @coderabbitai[bot]:

Reviewed the general CodeRabbit walkthrough/check output and confirmed the actionable items were the two inline review concerns plus added regression tests; no separate code change was required from the summary-only comment itself.

@jeremymcs
Collaborator Author

Addressed in commit 3ee901c by the latest babysitter run.

Responding to comment by @coderabbitai[bot]:
Actionable comments posted: 2

Added adversarial regression coverage for shell chaining (`cat file && echo x > y`) and script-eval (`python -c ...write...`) and asserted that neither is classified as read-only reconnaissance. Existing positive/negative classification tests continue to pass.

@coderabbitai
Contributor

coderabbitai Bot commented May 16, 2026

@jeremymcs: Thanks for the follow-up! The adversarial regression coverage for shell chaining (`cat file && echo x > y`) and script-eval (`python -c ...write...`) looks like exactly the right hardening; both are classic bypass vectors that a naive prefix/regex approach would miss. Good to hear the existing classification tests continue to pass as well.

🐇 ✨

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts (1)

26-48: ⚡ Quick win

Consider adding an empty trace edge case test.

The current tests cover various command patterns but don't verify behavior when `toolCalls` is empty. An empty trace is a common edge case that should have explicit, tested behavior.

📋 Proposed empty trace test
test("classifyTraceProgress treats empty trace as reconnaissance-only", () => {
  const trace = traceWithToolCalls([]);
  const result = classifyTraceProgress(trace);
  // Adjust expected value based on desired semantics:
  // true if "no mutations" implies read-only, false if "no actions" is distinct
  assert.equal(result.isReadOnlyReconnaissanceOnly, true);
});
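
The comment in the proposed test reflects a real ambiguity. In JavaScript, `Array.prototype.every` returns `true` for an empty array, so a classifier built on `every` reports an empty trace as reconnaissance-only unless it also checks the length. The sketch below shows both semantics side by side. The actual `classifyTraceProgress` implementation is not shown in this thread, so `CallSketch`, `READ_ONLY_NAMES_SKETCH`, and both helper functions are hypothetical, as are the tool names other than `skill`.

```typescript
// Hypothetical stand-ins; not the repository's types or allowlist.
type CallSketch = { name: string };
const READ_ONLY_NAMES_SKETCH = new Set(["read", "grep", "skill"]);

// Vacuous truth: every() over an empty array is true, so an empty trace
// is classified as "all calls read-only".
function allCallsReadOnly(calls: CallSketch[]): boolean {
  return calls.every((c) => READ_ONLY_NAMES_SKETCH.has(c.name));
}

// Stricter variant: an empty trace means "no activity", which is treated
// as distinct from reconnaissance.
function nonEmptyAndReadOnly(calls: CallSketch[]): boolean {
  return calls.length > 0 && allCallsReadOnly(calls);
}
```

Whichever semantics is chosen, the test should state it explicitly, so that a later refactor from one form to the other fails loudly.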
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`
around lines 26 - 48, Add an edge-case unit test that passes an empty toolCalls
array to traceWithToolCalls and asserts classifyTraceProgress returns a
reconnaissance-only result; specifically add a test like "classifyTraceProgress
treats empty trace as reconnaissance-only" that constructs const trace =
traceWithToolCalls([]), calls const result = classifyTraceProgress(trace), and
asserts result.isReadOnlyReconnaissanceOnly === true so the classifier’s
behavior for empty traces is explicitly verified.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`:
- Around line 17-24: Add two unit tests to
src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts
that call classifyTraceProgress with traces created by traceWithToolCalls
containing only the "skill" tool and only the read-only "gsd_exec" command
respectively; for each test assert that result.isReadOnlyReconnaissanceOnly ===
true so we verify that classifyTraceProgress detects "skill" alone and read-only
"gsd_exec" alone as reconnaissance-only rather than requiring both together.

---

Nitpick comments:
In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`:
- Around line 26-48: Add an edge-case unit test that passes an empty toolCalls
array to traceWithToolCalls and asserts classifyTraceProgress returns a
reconnaissance-only result; specifically add a test like "classifyTraceProgress
treats empty trace as reconnaissance-only" that constructs const trace =
traceWithToolCalls([]), calls const result = classifyTraceProgress(trace), and
asserts result.isReadOnlyReconnaissanceOnly === true so the classifier’s
behavior for empty traces is explicitly verified.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 53ea4df4-a8f1-4861-bc69-6abbcb6e0cd4

📥 Commits

Reviewing files that changed from the base of the PR and between b36c97c and 3ee901c.

📒 Files selected for processing (4)
  • src/resources/extensions/gsd/auto-post-unit.ts
  • src/resources/extensions/gsd/session-forensics.ts
  • src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts
  • src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/resources/extensions/gsd/session-forensics.ts
  • src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts
  • src/resources/extensions/gsd/auto-post-unit.ts

@jeremymcs
Collaborator Author

jeremymcs commented May 16, 2026

Accepted — this comment requires code changes. Queuing fix...
🧰 Agent running: codex is working on the fix...
Agent completed — verifying changes...
🎉 Resolved — addressed in commit 37c92c7.

@jeremymcs
Collaborator Author

🤖 Dispatched codex with the following prompt:

Agent prompt (click to expand)
You are acting as an autonomous PR babysitter for gsd-build/gsd-2 PR #6187.
PR URL: https://github.com/gsd-build/gsd-2/pull/6187
Base repository: gsd-build/gsd-2
Head repository: gsd-build/gsd-2
Head branch: issue/5733-auto-mode-artifact-verification-failures-1778892830
Head remote: origin
You are running inside an isolated app-owned worktree under ~/.patchdeck.
Make only targeted changes that resolve the approved tasks.
Do not wait for user input, confirmation, or approval at any point.
Do not rewrite unrelated files.
Use the available git tooling for inspection and verification only.
If dependencies are missing, install them using the repository's lockfile/package manager as needed inside this isolated worktree.
Leave file edits uncommitted; the babysitter app will handle Git finalization after your run.
GitHub follow-up replies and review-thread resolution will be handled by the babysitter after your run.
If a task is invalid after inspection, explain it in your final response and include the exact audit token.

Approved review-comment tasks:
1. [review_comment] coderabbitai[bot]
   file=src/resources/extensions/gsd/auto-post-unit.ts line=461
   sourceId=3251764934 sourceUrl=https://github.com/gsd-build/gsd-2/pull/6187#discussion_r3251764934
   replyKind=review_thread threadId=PRRT_kwDORj5hb86Cgq0b threadResolved=true
   auditToken=codefactory-feedback:gh-review-comment-3251764934
   <details>
<summary>🧩 Analysis chain</summary>

🏁 Script executed:
```
#!/bin/bash
# Check ExecutionTrace type and session-forensics.ts structure
rg -n "ExecutionTrace" --type=ts -C2

echo "---"

# Look for gsd_task_complete references in the codebase
rg -n "gsd_task_complete" --type=ts -C2

echo "---"

# Check what tool names are used for completion in execute-task context
rg -n "task_complete\|gsd_task_complete\|completion.*tool\|tool.*completion" --type=ts -l
```

Length of output: 46847

---



</details>

`@jeremymcs`, acknowledged! Here are two concrete options for the fix — one minimal, one accurate:

---

### Option A — Minimal (advisory language, no signature change)

This directly addresses the false assertion with a two-file change:

**`src/resources/extensions/gsd/auto-post-unit.ts`** (Lines 458–460):
```diff
-    const completionToolHint = unitType === "execute-task"
-      ? " No completion tool call detected (`gsd_task_complete`/alias)."
-      : "";
+    const completionToolHint = unitType === "execute-task"
+      ? " Check whether a completion tool call (`gsd_task_complete`/alias) was made."
+      : "";
```

**`src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts`** (Line 16):
```diff
-  assert.match(msg, /No completion tool call detected \(`gsd_task_complete`\/alias\)/);
+  assert.match(msg, /Check whether a completion tool call \(`gsd_task_complete`\/alias\) was made/);
```

---

### Option B — Accurate (scan `toolCalls` before emitting hint)

Pass an optional trace to the helper and only emit the hint when `gsd_task_complete` / `gsd_complete_task` is genuinely absent:

**`src/resources/extensions/gsd/auto-post-unit.ts`**:
```diff
-function describeArtifactVerificationFailure(unitType: string, unitId: string, basePath: string): string {
+import type { ExecutionTrace } from "./session-forensics.js";
+
+const TASK_COMPLETE_TOOL_NAMES = new Set(["gsd_task_complete", "gsd_complete_task"]);
+
+function describeArtifactVerificationFailure(
+  unitType: string,
+  unitId: string,
+  basePath: string,
+  trace?: ExecutionTrace | null,
+): string {
   // ...
   if (!existsSync(artifactPath)) {
-    const completionToolHint = unitType === "execute-task"
-      ? " No completion tool call detected (`gsd_task_complete`/alias)."
-      : "";
+    let completionToolHint = "";
+    if (unitType === "execute-task") {
+      const hasCompletionCall = trace?.toolCalls.some(c => TASK_COMPLETE_TOOL_NAMES.has(c.name));
+      completionToolHint = hasCompletionCall === false
+        ? " No completion tool call detected (`gsd_task_complete`/alias)."
+        : " Check whether a completion tool call (`gsd_task_complete`/alias) was made.";
+    }
     return `Artifact verification failed: ...${completionToolHint}`;
   }
```

Then at the call site, pass the session trace (if available):
```diff
  const failureDetails = describeArtifactVerificationFailure(
    s.currentUnit.type,
    s.currentUnit.id,
    s.basePath,
+   s.currentTrace ?? null,   // thread the live ExecutionTrace if AutoSession carries it
  );
```

And update `_describeArtifactVerificationFailureForTest` and the test to pass a mock trace with no matching tool calls.

---

**Recommendation:** Option A is the safer choice for this PR — Option B requires `AutoSession` to carry a live `ExecutionTrace` reference at the call site, which may not be available yet. Option B could be a clean follow-up once the trace-threading infrastructure is in place.
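
One detail in Option B is easy to miss. When a trace is present and does contain a completion call, `hasCompletionCall === false` does not hold, so the advisory "Check whether..." text is still emitted even though the call was recorded. A minimal sketch that separates the three cases might look like the following. `TraceSketch` and `completionToolHintSketch` are hypothetical names, and the wording of the third message is an assumption for illustration, not text from the PR.

```typescript
// Hypothetical shape; the real ExecutionTrace type lives in session-forensics.ts.
interface TraceSketch {
  toolCalls: { name: string }[];
}

// Tool names taken from Option B above.
const TASK_COMPLETE_TOOL_NAMES = new Set(["gsd_task_complete", "gsd_complete_task"]);

function completionToolHintSketch(unitType: string, trace?: TraceSketch | null): string {
  if (unitType !== "execute-task") return "";
  // Case 1: no trace available, so nothing can be asserted; stay advisory.
  if (!trace) {
    return " Check whether a completion tool call (`gsd_task_complete`/alias) was made.";
  }
  const called = trace.toolCalls.some((c) => TASK_COMPLETE_TOOL_NAMES.has(c.name));
  // Case 2: the trace shows a completion call, yet the artifact is missing.
  // Case 3: the trace shows no completion call, so the absence is backed by evidence.
  return called
    ? " A completion tool call was recorded, but the expected artifact is still missing."
    : " No completion tool call detected (`gsd_task_complete`/alias).";
}
```

Keeping the hint in a small pure function like this would also make it testable without constructing a full session.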

---

<details>
<summary>🧬 Code Graph Analysis Results</summary>

- File: src/resources/extensions/gsd/auto-post-unit.ts
  - Enqueue sidecar function (complete snippet)
  ```
  /** Enqueue a sidecar item (hook, triage, or quick-task) for the main loop to
   *  drain via runUnit. Logs the enqueue event and notifies the UI. */
  function enqueueSidecar(
    s: AutoSession,
    ctx: ExtensionContext,
    entry: SidecarItem,
    debugExtra: Record<string, unknown>,
    notification?: string,
  ): "continue" {
    s.sidecarQueue.push(entry);
    debugLog("postUnitPostVerification", {
      phase: "sidecar-enqueue",
      kind: entry.kind,
      unitId: entry.unitId,
      ...debugExtra,
    });
    if (notification) ctx.ui.notify(notification, "info");
    return "continue";
  }
  ```
  - Post-unit hooks queuing (summary description)
    - What it does: If a post-unit hook is applicable, closes out the current unit, persists hook state, and enqueues a sidecar item for a hook to run later via enqueueSidecar. If a hook requests a retry of the trigger unit, it resets the trigger state (DB pending task, deletes the trigger's SUMMARY if needed, deletes any related retry artifact) and invalidates caches. This flow is used to shift hook-driven actions into the main post-unit queue rather than executing inline.
  - Triages and Quick-tasks dispatch (summary description)
    - Triages: When conditions indicate pending captures exist, derives milestone/slice context, loads current plan/roadmap, builds a triage prompt, closes out the current unit if needed, and enqueues a triage sidecar with unitId of the form "<mid>/<sid>/triage".
    - Quick-tasks: When there are pending quick tasks, builds a quick-task prompt, closes out the current unit if needed, marks the capture executed, and enqueues a quick-task sidecar with unitId of the form "<mid>/<capture.id>".

- File: src/resources/extensions/gsd/auto-post-unit.ts
  - Triage dispatch (docstring-style description)
    - Parameters: s (AutoSession), ctx (ExtensionContext)
    - Returns: "continue" when enqueued, else "step-wizard"/"stopped" in step-mode paths
    - Notes: Only dispatched when _shouldDispatchTriageForTest(s) is true and there are pending captures; constructs a triage prompt via triage-captures flow and enqueues a sidecar with kind: "triage", unitType: "triage-captures", unitId: "<mid>/<sid>/triage".
  - Quick-task dispatch (docstring-style description)
    - Parameters: s (AutoSession), ctx (ExtensionContext)
    - Returns: "continue" or "stopped"/"step-wizard" in step-mode
    - Notes: When there are pending quick tasks, creates a quick-task prompt, marks the capture executed, and enqueues a sidecar with kind: "quick-task", unitType: "quick-task", unitId: "<mid>/<capture.id>".

- File: src/resources/extensions/gsd/auto.ts
  - SidecarItem interface (from another module, for reference)
  - Content:
  ```
  export interface SidecarItem {
    kind: "hook" | "triage" | "quick-task";
    unitType: string;
    unitId: string;
    prompt: string;
    /** Model override for hook units (e.g. "anthropic/claude-3-5-sonnet"). */
    model?: string;
    /** Capture ID for quick-task items (already marked executed at enqueue time). */
    captureId?: string;
  }
  ```

- File: src/resources/extensions/gsd/auto/verification-retry-policy.ts
  - verificationRetryKey function
  ```typescript
  export function verificationRetryKey(unitType: string, unitId: string): string {
    return `${unitType}:${unitId}`;
  }
  ```

- File: src/resources/extensions/gsd/auto-post-unit.ts
  - (Optional contextual note)
    - The code paths above are invoked within postUnitPostVerification and are designed to move actions (hooks, triage, quick-tasks) into s.sidecarQueue for the main loop to drain, enabling re-dispatch and avoiding inline side effects during verification.
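
The enqueue-then-drain flow described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `makeTriageItem`, `makeQuickTaskItem`, and `drain` are hypothetical helpers, the plain array stands in for `s.sidecarQueue`, and first-in-first-out draining is an assumption. Only the `SidecarItem` shape and the `unitId` formats come from the notes above.

```typescript
// Local copy of the SidecarItem shape quoted above.
interface SidecarItem {
  kind: "hook" | "triage" | "quick-task";
  unitType: string;
  unitId: string;
  prompt: string;
  model?: string;
  captureId?: string;
}

// Hypothetical helper: builds a triage item with unitId "<mid>/<sid>/triage".
function makeTriageItem(mid: string, sid: string, prompt: string): SidecarItem {
  return { kind: "triage", unitType: "triage-captures", unitId: `${mid}/${sid}/triage`, prompt };
}

// Hypothetical helper: builds a quick-task item with unitId "<mid>/<capture.id>".
function makeQuickTaskItem(mid: string, captureId: string, prompt: string): SidecarItem {
  return { kind: "quick-task", unitType: "quick-task", unitId: `${mid}/${captureId}`, prompt, captureId };
}

// Stand-in for s.sidecarQueue: post-unit code pushes, the main loop drains.
const sidecarQueue: SidecarItem[] = [];
sidecarQueue.push(makeTriageItem("M001", "S02", "triage prompt"));
sidecarQueue.push(makeQuickTaskItem("M001", "CAP-7", "quick-task prompt"));

// Assumed FIFO drain: removes items in enqueue order and records what ran.
function drain(queue: SidecarItem[]): string[] {
  const dispatched: string[] = [];
  let item: SidecarItem | undefined;
  while ((item = queue.shift()) !== undefined) {
    dispatched.push(`${item.kind}:${item.unitId}`);
  }
  return dispatched;
}

const dispatchedOrder = drain(sidecarQueue);
```

Because the items are only queued during post-unit handling, nothing runs inline during verification; the work happens when the loop next drains the queue.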

</details>
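
As a rough sketch of how `verificationRetryKey` from the context above might be used, the snippet below keys a map of failure hashes by unit so that a repeated, identical failure can be detected. The map, the `isDuplicateFailure` helper, and the equality comparison are assumptions for illustration; the real retry decision lives in the retry-policy module and may differ.

```typescript
// Same body as the function quoted in the context above.
function verificationRetryKey(unitType: string, unitId: string): string {
  return `${unitType}:${unitId}`;
}

// Hypothetical stand-in for the session's per-unit failure-hash map.
const failureHashes = new Map<string, string>();

// Hypothetical helper: returns true when this unit fails with the same
// hash as its previous failure, then records the latest hash.
function isDuplicateFailure(unitType: string, unitId: string, failureHash: string): boolean {
  const key = verificationRetryKey(unitType, unitId);
  const previous = failureHashes.get(key);
  failureHashes.set(key, failureHash);
  return previous === failureHash;
}

const first = isDuplicateFailure("execute-task", "M001/S02/T03", "abc123");
const second = isDuplicateFailure("execute-task", "M001/S02/T03", "abc123");
const third = isDuplicateFailure("execute-task", "M001/S02/T03", "def456");
const otherUnit = isDuplicateFailure("execute-task", "M001/S02/T04", "def456");
```

Keying on both unit type and unit ID keeps failures from different units from being mistaken for repeats of each other.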



<!-- This is an auto-generated reply by CodeRabbit -->
2. [review_comment] coderabbitai[bot]
   file=src/resources/extensions/gsd/session-forensics.ts line=87
   sourceId=3251767287 sourceUrl=https://github.com/gsd-build/gsd-2/pull/6187#discussion_r3251767287
   replyKind=review_thread threadId=PRRT_kwDORj5hb86Cgq0d threadResolved=true
   auditToken=codefactory-feedback:gh-review-comment-3251767287
   `@jeremymcs`, thanks for accepting! Here's the planned fix:



## Fix: Tighten `gsd_exec` read-only classifier to reject mutating commands

- Add `UNSAFE_SHELL_TOKENS_RE` to reject any command containing shell chaining, redirection, or subshell metacharacters (`&&`, `||`, `;`, `<`, `>`, backtick, `$(`, newline)
- Tighten `READ_ONLY_EXEC_COMMAND_RE`: anchor with `\s*$`, remove the `python -c '...'` and `node --print` executable forms, switch `jq\s`/`yq\s` to `jq\b`/`yq\b`
- Add an early-return guard in `isReadOnlyReconnaissanceTool` using the new unsafe-tokens check
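
The three changes listed above can be sketched as a standalone check. The two patterns are the ones proposed for this fix; the helper name `isReadOnlyExecCommand` is hypothetical and takes the command string directly rather than a tool-call object, so treat it as an illustration of the guard order rather than the final code.

```typescript
// Rejects chaining, redirection, and subshell metacharacters outright.
const UNSAFE_SHELL_TOKENS_RE = /(?:&&|\|\||;|[<>]|`|\$\(|\n)/;

// Allowlist of read-only commands, anchored at both ends.
const READ_ONLY_EXEC_COMMAND_RE =
  /^\s*(cat|head|tail|ls|find|grep|rg|git\s+(status|log|show|diff|branch|remote|rev-parse|ls-files)|npm\s+(ls|list|info|view|show|outdated|audit|doctor|ping|--version|-v)|node\s+(--version|-v\b)|python[23]?\s+(--version|-V\b)|jq\b|yq\b|env\b|printenv\b)\s*$/;

// Hypothetical helper: mirrors the guard order of the planned change.
function isReadOnlyExecCommand(rawCommand: string): boolean {
  const command = rawCommand.trim();
  if (!command) return false;
  // Early return: any unsafe shell token disqualifies the command.
  if (UNSAFE_SHELL_TOKENS_RE.test(command)) return false;
  return READ_ONLY_EXEC_COMMAND_RE.test(command);
}

const results = {
  gitStatus: isReadOnlyExecCommand("git status"),
  npmLs: isReadOnlyExecCommand("npm ls"),
  chained: isReadOnlyExecCommand("git status && rm -rf /"),
  redirected: isReadOnlyExecCommand("cat file > /tmp/out"),
  pythonInline: isReadOnlyExecCommand("python -c 'import os; os.remove(\"x\")'"),
  nodePrint: isReadOnlyExecCommand("node --print \"require('fs').readFileSync('x')\""),
  jqWithArgs: isReadOnlyExecCommand("jq .key file.json"),
};
```

One consequence of the trailing `\s*$` anchor is that only commands with nothing after the allowlisted prefix match, so `jq .key file.json` (and likewise `cat file`) is classified as not read-only. If commands with arguments should still count, the pattern would need to permit a restricted argument tail.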

<!-- <requirement>
[requirement tighten_readonly_classifier_1/1]
Tighten the `gsd_exec` read-only classifier in `src/resources/extensions/gsd/session-forensics.ts` to:
1. Add a new `UNSAFE_SHELL_TOKENS_RE` constant that matches shell chaining/redirection/subshell metacharacters.
2. Update `READ_ONLY_EXEC_COMMAND_RE` to anchor at end (`\s*$`), remove `python[23]? -c '...'` and `node --print` (both can execute arbitrary code), and change `jq\s`/`yq\s` to `jq\b`/`yq\b`.
3. Add an early-return in `isReadOnlyReconnaissanceTool` to reject commands containing unsafe shell tokens.

File: `src/resources/extensions/gsd/session-forensics.ts`

```diff
-const READ_ONLY_EXEC_COMMAND_RE = /^\s*(cat|head|tail|ls|find|grep|rg|git\s+(status|log|show|diff|branch|remote|rev-parse|ls-files)|npm\s+(ls|list|info|view|show|outdated|audit|doctor|ping|--version|-v)|node\s+(--print|--version|-v\b)|python[23]?\s+(-c\s+'[^']*'|--version|-V\b)|jq\s|yq\s|env\b|printenv\b)/;
+const UNSAFE_SHELL_TOKENS_RE = /(?:&&|\|\||;|[<>]|`|\$\(|\n)/;
+const READ_ONLY_EXEC_COMMAND_RE = /^\s*(cat|head|tail|ls|find|grep|rg|git\s+(status|log|show|diff|branch|remote|rev-parse|ls-files)|npm\s+(ls|list|info|view|show|outdated|audit|doctor|ping|--version|-v)|node\s+(--version|-v\b)|python[23]?\s+(--version|-V\b)|jq\b|yq\b|env\b|printenv\b)\s*$/;
 
 function isReadOnlyReconnaissanceTool(call: ToolCall): boolean {
   const name = call.name.toLowerCase();
   if (READ_ONLY_TOOL_NAMES.has(name)) return true;
   if (name !== "gsd_exec") return false;
   const command = String(call.input.command || call.input.cmd || "").trim();
   if (!command) return false;
+  if (UNSAFE_SHELL_TOKENS_RE.test(command)) return false;
   return READ_ONLY_EXEC_COMMAND_RE.test(command);
 }
```

Key behavioural changes:
- `python -c 'import os; os.remove("x")'` → now **rejected** (unsafe token check catches the semicolon; also no longer matched by the tightened regex)
- `node --print "require('fs').readFileSync('x')"` → now **rejected** (removed from regex)
- `git status && rm -rf /` → now **rejected** (unsafe token `&&`)
- `cat file > /tmp/out` → now **rejected** (unsafe token `>`)
- `git status` → still **accepted** ✓
- `npm ls` → still **accepted** ✓
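- `jq .key file.json` → now **rejected** as written: the trailing `\s*$` anchor leaves no room for arguments, so only a bare `jq` matches (the word boundary still prevents `jqsomething` from matching)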
</requirement> -->

---

<!-- agent-chat-trees: [{"path":"src/resources/extensions/gsd/auto/phases.ts","content":"// Project/App: GSD-2\n// File Purpose: Auto-loop pipeline phases, merge closeout, and finalize handling.\n/**\n * auto/phases.ts — Pipeline phases for the auto-loop.\n *\n * Contains: runPreDispatch, runDispatch, runGuards, runUnitPhase, runFinalize,\n * plus internal helpers generateMilestoneReport and closeoutAndStop.\n *\n * Imports from: auto/types, auto/detect-stuck, auto/run-unit, auto/loop-deps\n */\n\nimport { importExtensionModule, type ExtensionAPI, type ExtensionContext } from \"@gsd/pi-coding-agent\";\n\nimport type { AutoSession, SidecarItem } from \"./session.js\";\nimport type { LoopDeps } from \"./loop-deps.js\";\nimport {\n  USER_DRIVEN_DEEP_UNITS,\n  isAwaitingUserInput,\n  type PostUnitContext,\n  type PreVerificationOpts,\n} from \"../auto-post-unit.js\";\nimport type { Phase } from \"../types.js\";\nimport {\n  MAX_RECOVERY_CHARS,\n  BUDGET_THRESHOLDS,\n  MAX_FINALIZE_TIMEOUTS,\n  type PhaseResult,\n  type IterationContext,\n  type LoopState,\n  type PreDispatchData,\n  type IterationData,\n} from \"./types.js\";\nimport { detectStuck } from \"./detect-stuck.js\";\nimport { runUnit } from \"./run-unit.js\";\nimport { debugLog } from \"../debug-logger.js\";\nimport { resolveWorktreeProjectRoot, normalizeWorktreePathForCompare } from \"../worktree-root.js\";\nimport { classifyProject } from \"../detection.js\";\nimport { MergeConflictError } from \"../git-service.js\";\nimport { setCurrentPhase, clearCurrentPhase } from \"../../shared/gsd-phase-state.js\";\nimport { pauseAutoForProviderError } from \"../provider-error-pause.js\";\nimport { resumeAutoAfterProviderDelay } from \"../bootstrap/provider-error-resume.js\";\nimport { join, basename } from \"node:path\";\nimport { existsSync, cpSync } from \"node:fs\";\nimport {\n  logWarning,\n  logError,\n  _resetLogs,\n  drainLogs,\n  drainAndSummarize,\n  formatForNotification,\n  hasAnyIssues,\n} from 
\"../workflow-logger.js\";\nimport { gsdRoot } from \"../paths.js\";\nimport { atomicWriteSync } from \"../atomic-write.js\";\nimport { verifyExpectedArtifact, diagnoseExpectedArtifact, buildLoopRemediationSteps, refreshRecoveryDbForArtifact } from \"../auto-recovery.js\";\nimport { writeUnitRuntimeRecord } from \"../unit-runtime.js\";\nimport { withTimeout, FINALIZE_PRE_TIMEOUT_MS, FINALIZE_POST_TIMEOUT_MS } from \"./finalize-timeout.js\";\nimport { getEligibleSlices } from \"../slice-parallel-eligibility.js\";\nimport { startSliceParallel } from \"../slice-parallel-orchestrator.js\";\nimport { isDbAvailable, getMilestoneSlices } from \"../gsd-db.js\";\nimport { reconcileBeforeSpawn } from \"../state-reconciliation.js\";\nimport type { MinimalModelRegistry } from \"../context-budget.js\";\nimport type { PostflightResult, PreflightResult } from \"../clean-root-preflight.js\";\nimport { ensurePlanV2Graph, isEmptyPlanV2GraphResult, isMissingFinalizedContextResult } from \"../uok/plan-v2.js\";\nimport { resolveUokFlags } from \"../uok/flags.js\";\nimport { UokGateRunner } from \"../uok/gate-runner.js\";\nimport { resetEvidence, loadEvidenceFromDisk } from \"../safety/evidence-collector.js\";\nimport { parseUnitId } from \"../unit-id.js\";\nimport { createCheckpoint, cleanupCheckpoint, rollbackToCheckpoint } from \"../safety/git-checkpoint.js\";\nimport { resolveSafetyHarnessConfig } from \"../safety/safety-harness.js\";\nimport {\n  getWorkflowTransportSupportError,\n  getRequiredWorkflowToolsForAutoUnit,\n  supportsStructuredQuestions,\n} from \"../workflow-mcp.js\";\nimport { resolveManifest } from \"../unit-context-manifest.js\";\nimport { createWorktreeSafetyModule, type WorktreeSafetyResult } from \"../worktree-safety.js\";\nimport { isSuspiciousGhostCompletion } from \"../auto-unit-closeout.js\";\nimport { decideVerificationRetry, verificationRetryKey } from \"./verification-retry-policy.js\";\nimport { buildPhaseHandoffOutcome, setAutoOutcomeWidget } from 
\"../auto-dashboard.js\";\nimport { getConsecutiveDispatchBlocker } from \"../dispatch-guard.js\";\n\n// ─── Path Comparison Helper ───────────────────────────────────────────────\n/** Compare two paths for physical identity, tolerating trailing slashes and symlinks. */\nfunction isSamePathLocal(a: string, b: string): boolean {\n  return normalizeWorktreePathForCompare(a) === normalizeWorktreePathForCompare(b);\n}\n\nasync function applyVerificationRetryPolicy(\n  ic: IterationContext,\n  unitType: string | undefined,\n  phase: \"artifact-verification-retry\" | \"verification-retry\",\n): Promise<PhaseResult | null> {\n  const { ctx, pi, s, deps } = ic;\n  const retryInfo = s.pendingVerificationRetry;\n  const key = unitType && retryInfo\n    ? verificationRetryKey(unitType, retryInfo.unitId)\n    : undefined;\n  const decision = decideVerificationRetry({\n    unitType,\n    retryInfo,\n    previousFailureHash: key ? s.verificationRetryFailureHashes.get(key) : undefined,\n  });\n\n  if (decision.action === \"pause\") {\n    s.pendingVerificationRetry = null;\n    debugLog(\"autoLoop\", {\n      phase: `${phase}-paused`,\n      reason: decision.reason,\n      unitType,\n      unitId: retryInfo?.unitId,\n      failureHash: decision.failureHash,\n    });\n    const pauseMessage = decision.reason === \"duplicate-failure-context\"\n      ? `Verification retry for ${unitType ?? \"unit\"} ${retryInfo?.unitId ?? \"unknown\"} produced the same failure context. Pausing auto-mode instead of re-dispatching.`\n      : \"Verification retry requested without retry context. 
Pausing auto-mode instead of re-dispatching.\";\n    ctx.ui.notify(pauseMessage, \"warning\");\n    await deps.pauseAuto(ctx, pi, { message: pauseMessage, category: \"unknown\" });\n    return { action: \"break\", reason: decision.reason };\n  }\n\n\n  debugLog(\"autoLoop\", {\n    phase: `${phase}-backoff`,\n    iteration: ic.iteration,\n    unitType,\n    unitId: retryInfo?.unitId,\n    attempt: retryInfo?.attempt,\n    delayMs: decision.delayMs,\n    baseDelayMs: decision.baseDelayMs,\n    failureHash: decision.failureHash,\n  });\n  await new Promise<void>((resolve) => setTimeout(resolve, decision.delayMs));\n  return null;\n}\n\nexport function shouldDegradeEmptyWorktreeToProjectRoot(\n  worktreeClassification: ReturnType<typeof classifyProject>,\n  projectRootClassification: ReturnType<typeof classifyProject>,\n): boolean {\n  return (\n    worktreeClassification.kind === \"greenfield\" &&\n    projectRootClassification.kind !== \"greenfield\" &&\n    projectRootClassification.kind !== \"invalid-repo\"\n  );\n}\n\nfunction unitWritesSource(unitType: string): boolean | null {\n  const manifest = resolveManifest(unitType);\n  if (!manifest) return null;\n  return manifest.tools.mode === \"all\" || manifest.tools.mode === \"docs\";\n}\n\nfunction formatWorktreeSafetyFailure(result: Extract<WorktreeSafetyResult, { ok: false }>): string {\n  return `Worktree Safety failed (${result.kind}): ${result.reason} ${result.remediation}`;\n}\n\nfunction formatWorktreeSafetyStopReason(result: Extract<WorktreeSafetyResult, { ok: false }>): string {\n  if (result.kind === \"empty-worktree-with-project-content\") {\n    return `Worktree Safety failed (${result.kind}). 
Run /gsd doctor fix, then /gsd auto.`;\n  }\n  return `Worktree Safety failed (${result.kind}).`;\n}\n\nfunction resolveEmptyWorktreeWithProjectContent(\n  unitRoot: string,\n  projectRoot: string,\n): boolean {\n  if (isSamePathLocal(unitRoot, projectRoot)) return false;\n  const worktreeClassification = classifyProject(unitRoot);\n  if (worktreeClassification.kind !== \"greenfield\") return false;\n  const projectRootClassification = classifyProject(projectRoot);\n  return shouldDegradeEmptyWorktreeToProjectRoot(worktreeClassification, projectRootClassification);\n}\n\nasync function validateSourceWriteWorktreeSafety(\n  ic: IterationContext,\n  unitType: string,\n  unitId: string,\n  milestoneId: string | undefined,\n  phase: string,\n): Promise<{ action: \"break\"; reason: string } | null> {\n  const { ctx, pi, s, deps } = ic;\n  if (!s.basePath) return null;\n\n  // Custom engine workflows (graph-driven, registered via run dirs) define\n  // their own step ids that are not in the GSD UnitContextManifest. Don't\n  // fail closed for those — the custom engine owns its own dispatch\n  // contract. The fail-closed safety check applies only to built-in GSD\n  // units whose Tool Contract is registered in the manifest. Use a truthy\n  // check so undefined (test sessions that never set the field) routes\n  // through the safety check, matching the regression test contract.\n  if (s.activeEngineId) return null;\n\n  const writesSource = unitWritesSource(unitType);\n  if (writesSource === null) {\n    const msg = `Worktree Safety failed (missing-tool-contract): missing Tool Contract for ${unitType}. 
Add a UnitContextManifest entry before dispatching this Unit.`;\n    debugLog(\"worktreeSafety\", {\n      phase,\n      unitType,\n      unitId,\n      milestoneId,\n      result: { ok: false, kind: \"missing-tool-contract\", reason: msg },\n      basePath: s.basePath,\n    });\n    ctx.ui.notify(msg, \"error\");\n    await deps.stopAuto(ctx, pi, msg);\n    return { action: \"break\", reason: \"missing-tool-contract\" };\n  }\n  if (!writesSource) return null;\n\n  const projectRoot = s.canonicalProjectRoot ?? resolveWorktreeProjectRoot(s.basePath, s.originalBasePath);\n  if (deps.getIsolationMode(projectRoot) !== \"worktree\") return null;\n\n  const safety = createWorktreeSafetyModule();\n  const result = safety.validateUnitRoot({\n    unitType,\n    unitId,\n    writeScope: \"source-writing\",\n    projectRoot,\n    unitRoot: s.basePath,\n    milestoneId,\n    expectedBranch: milestoneId ? deps.autoWorktreeBranch(milestoneId) : null,\n    emptyWorktreeWithProjectContent: resolveEmptyWorktreeWithProjectContent(s.basePath, projectRoot),\n    lease: s.workerId\n      ? 
{\n          required: true,\n          held: s.currentMilestoneId === milestoneId && s.milestoneLeaseToken !== null,\n          owner: s.workerId,\n        }\n      : undefined,\n  });\n\n  if (result.ok) return null;\n\n  const msg = formatWorktreeSafetyFailure(result);\n  debugLog(\"worktreeSafety\", {\n    phase,\n    unitType,\n    unitId,\n    milestoneId,\n    result,\n    basePath: s.basePath,\n    projectRoot,\n  });\n  ctx.ui.notify(msg, \"error\");\n  await deps.stopAuto(ctx, pi, formatWorktreeSafetyStopReason(result));\n  return { action: \"break\", reason: result.kind };\n}\n\n// ─── Session timeout auto-resume state ────────────────────────────────────────\n\nlet consecutiveSessionTimeouts = 0;\nconst MAX_SESSION_TIMEOUT_AUTO_RESUMES = 3;\n\nexport function resetSessionTimeoutState(): void {\n  consecutiveSessionTimeouts = 0;\n}\n\n// ─── generateMilestoneReport ──────────────────────────────────────────────────\n\n/**\n * Resolve the base path for milestone reports.\n * Prefers originalBasePath (project root) over basePath (which may be a worktree).\n * Exported for testing as _resolveReportBasePath.\n */\nexport function _resolveReportBasePath(s: Pick<AutoSession, \"originalBasePath\" | \"basePath\">): string {\n  return resolveWorktreeProjectRoot(s.basePath, s.originalBasePath);\n}\n\n/**\n * Resolve the authoritative project base for dispatch guards.\n * Prior-milestone completion lives at the project root, even when the active\n * unit is running inside an auto worktree.\n */\nexport function _resolveDispatchGuardBasePath(\n  s: Pick<AutoSession, \"originalBasePath\" | \"basePath\">,\n): string {\n  return resolveWorktreeProjectRoot(s.basePath, s.originalBasePath);\n}\n\nconst PLAN_V2_GATE_PHASES: ReadonlySet<Phase> = new Set([\n  \"executing\",\n  \"summarizing\",\n  \"validating-milestone\",\n  \"completing-milestone\",\n]);\n\nexport function shouldRunPlanV2Gate(phase: Phase): boolean {\n  return PLAN_V2_GATE_PHASES.has(phase);\n}\n\nexport 
function _shouldProceedWithInvalidRepoClassificationForTest(\n  reason: string | undefined,\n  hasGit: boolean,\n): boolean {\n  return reason === \"missing .git\" && hasGit;\n}\n\nexport function _resolveCurrentUnitStartedAtForTest(\n  currentUnit: { startedAt: number } | null | undefined,\n): number | undefined {\n  return currentUnit?.startedAt;\n}\n\n/**\n * Generate and write an HTML milestone report snapshot.\n * Extracted from the milestone-transition block in autoLoop.\n */\nasync function generateMilestoneReport(\n  s: AutoSession,\n  ctx: ExtensionContext,\n  milestoneId: string,\n): Promise<void> {\n  const { loadVisualizerData } = await importExtensionModule<typeof import(\"../visualizer-data.js\")>(import.meta.url, \"../visualizer-data.js\");\n  const { generateHtmlReport } = await importExtensionModule<typeof import(\"../export-html.js\")>(import.meta.url, \"../export-html.js\");\n  const { writeReportSnapshot } = await importExtensionModule<typeof import(\"../reports.js\")>(import.meta.url, \"../reports.js\");\n  const { basename } = await import(\"node:path\");\n\n  const reportBasePath = _resolveReportBasePath(s);\n\n  const snapData = await loadVisualizerData(reportBasePath);\n  const completedMs = snapData.milestones.find(\n    (m: { id: string }) => m.id === milestoneId,\n  );\n  const msTitle = completedMs?.title ?? milestoneId;\n  const gsdVersion = process.env.GSD_VERSION ?? 
\"0.0.0\";\n  const projName = basename(reportBasePath);\n  const doneSlices = snapData.milestones.reduce(\n    (acc: number, m: { slices: { done: boolean }[] }) =>\n      acc + m.slices.filter((sl: { done: boolean }) => sl.done).length,\n    0,\n  );\n  const totalSlices = snapData.milestones.reduce(\n    (acc: number, m: { slices: unknown[] }) => acc + m.slices.length,\n    0,\n  );\n  const outPath = writeReportSnapshot({\n    basePath: reportBasePath,\n    html: generateHtmlReport(snapData, {\n      projectName: projName,\n      projectPath: reportBasePath,\n      gsdVersion,\n      milestoneId,\n      indexRelPath: \"index.html\",\n    }),\n    milestoneId,\n    milestoneTitle: msTitle,\n    kind: \"milestone\",\n    projectName: projName,\n    projectPath: reportBasePath,\n    gsdVersion,\n    totalCost: snapData.totals?.cost ?? 0,\n    totalTokens: snapData.totals?.tokens.total ?? 0,\n    totalDuration: snapData.totals?.duration ?? 0,\n    doneSlices,\n    totalSlices,\n    doneMilestones: snapData.milestones.filter(\n      (m: { status: string }) => m.status === \"complete\",\n    ).length,\n    totalMilestones: snapData.milestones.length,\n    phase: snapData.phase,\n  });\n  ctx.ui.notify(\n    `Report saved: .gsd/reports/${basename(outPath)} — open index.html to browse progression.`,\n    \"info\",\n  );\n}\n\n// ─── closeoutAndStop ──────────────────────────────────────────────────────────\n\n/**\n * If a unit is in-flight, close it out, then stop auto-mode.\n * Extracted from ~4 identical if-closeout-then-stop sequences in autoLoop.\n */\nasync function closeoutAndStop(\n  ctx: ExtensionContext,\n  pi: ExtensionAPI,\n  s: AutoSession,\n  deps: LoopDeps,\n  reason: string,\n): Promise<void> {\n  if (s.currentUnit) {\n    await deps.closeoutUnit(\n      ctx,\n      s.basePath,\n      s.currentUnit.type,\n      s.currentUnit.id,\n      s.currentUnit.startedAt,\n      deps.buildSnapshotOpts(s.currentUnit.type, s.currentUnit.id),\n    );\n    s.currentUnit 
= null;\n  }\n  await deps.stopAuto(ctx, pi, reason);\n}\n\nasync function stopOnPostflightRecoveryNeeded(\n  ic: IterationContext,\n  result: PostflightResult,\n  milestoneId: string,\n): Promise<{ action: \"break\"; reason: string } | null> {\n  if (!result.needsManualRecovery) return null;\n  const { ctx, pi, deps } = ic;\n  const reason = `Post-merge stash restore failed for milestone ${milestoneId}`;\n  ctx.ui.notify(\n    `${reason}. Resolve the working tree before resuming auto-mode. ${result.message}`,\n    \"error\",\n  );\n  await deps.stopAuto(ctx, pi, reason);\n  return { action: \"break\", reason: \"postflight-stash-restore-failed\" };\n}\n\nasync function restorePreflightStashOrStop(\n  ic: IterationContext,\n  preflight: PreflightResult,\n  milestoneId: string,\n): Promise<{ action: \"break\"; reason: string } | null> {\n  if (!preflight.stashPushed) return null;\n  const { ctx, s, deps } = ic;\n  const result = deps.postflightPopStash(\n    s.originalBasePath || s.basePath,\n    milestoneId,\n    preflight.stashMarker,\n    ctx.ui.notify.bind(ctx.ui),\n  );\n  return stopOnPostflightRecoveryNeeded(ic, result, milestoneId);\n}\n\n/**\n * Run a milestone merge surrounded by preflight stash + always-on postflight\n * pop. The previous code popped the stash only after a successful merge, which\n * leaked `gsd-preflight-stash:M00x:*` entries whenever `mergeAndExit` threw —\n * leaving the user's pre-merge working tree silently stashed away after a\n * merge-conflict or other merge error. 
This helper restores the stash on\n * every exit path, then surfaces the merge or stash failure (in priority\n * order) as the loop's stop reason.\n *\n * Returns a `break` action when auto-mode must stop, or `null` when the merge\n * succeeded and the stash (if any) was restored cleanly.\n */\nexport async function _runMilestoneMergeWithStashRestore(\n  ic: IterationContext,\n  milestoneId: string,\n): Promise<{ action: \"break\"; reason: string } | null> {\n  const { ctx, pi, s, deps } = ic;\n\n  const preflight = deps.preflightCleanRoot(\n    s.originalBasePath || s.basePath,\n    milestoneId,\n    ctx.ui.notify.bind(ctx.ui),\n  );\n\n  let mergeError: unknown = null;\n  const exitResult = deps.lifecycle.exitMilestone(\n    milestoneId,\n    { merge: true },\n    ctx.ui,\n  );\n  if (exitResult.ok) {\n    s.milestoneMergedInPhases = true;\n  } else {\n    mergeError = exitResult.cause ?? new Error(`exit ${exitResult.reason}`);\n  }\n\n  // Always attempt to restore the stashed working tree, even on merge error.\n  // postflightPopStash itself does not throw; failures surface via the\n  // PostflightResult.needsManualRecovery flag.\n  let stashResult: PostflightResult | null = null;\n  if (preflight.stashPushed) {\n    stashResult = deps.postflightPopStash(\n      s.originalBasePath || s.basePath,\n      milestoneId,\n      preflight.stashMarker,\n      ctx.ui.notify.bind(ctx.ui),\n    );\n  }\n\n  // Merge failure takes priority over stash recovery — the merge is the\n  // authoritative gate. If the stash also needed manual recovery, the user\n  // already saw the postflightPopStash notify above.\n  if (mergeError) {\n    if (mergeError instanceof MergeConflictError) {\n      ctx.ui.notify(\n        `Merge conflict: ${mergeError.conflictedFiles.join(\", \")}. 
Resolve conflicts manually and run /gsd auto to resume.`,\n        \"error\",\n      );\n      await deps.stopAuto(ctx, pi, `Merge conflict on milestone ${milestoneId}`);\n      return { action: \"break\", reason: \"merge-conflict\" };\n    }\n    logError(\"engine\", \"Milestone merge failed with non-conflict error\", {\n      milestone: milestoneId,\n      error: String(mergeError),\n    });\n    ctx.ui.notify(\n      `Merge failed: ${mergeError instanceof Error ? mergeError.message : String(mergeError)}. Resolve and run /gsd auto to resume.`,\n      \"error\",\n    );\n    await deps.stopAuto(\n      ctx,\n      pi,\n      `Merge error on milestone ${milestoneId}: ${String(mergeError)}`,\n    );\n    return { action: \"break\", reason: \"merge-failed\" };\n  }\n\n  if (stashResult) {\n    return stopOnPostflightRecoveryNeeded(ic, stashResult, milestoneId);\n  }\n  return null;\n}\n\nexport async function _runMilestoneMergeOnceWithStashRestore(\n  ic: IterationContext,\n  milestoneId: string,\n): Promise<{ action: \"break\"; reason: string } | null> {\n  if (ic.s.milestoneMergedInPhases) {\n    debugLog(\"autoLoop\", {\n      phase: \"milestone-merge-skip\",\n      reason: \"already-merged-in-phases\",\n      milestoneId,\n    });\n    return null;\n  }\n  return _runMilestoneMergeWithStashRestore(ic, milestoneId);\n}\n\nasync function emitCancelledUnitEnd(\n  ic: IterationContext,\n  unitType: string,\n  unitId: string,\n  unitStartSeq: number,\n  errorContext?: { message: string; category: string; stopReason?: string; isTransient?: boolean; retryAfterMs?: number },\n): Promise<void> {\n  ic.deps.emitJournalEvent({\n    ts: new Date().toISOString(),\n    flowId: ic.flowId,\n    seq: ic.nextSeq(),\n    eventType: \"unit-end\",\n    data: {\n      unitType,\n      unitId,\n      status: \"cancelled\",\n      artifactVerified: false,\n      ...(errorContext ? 
{ errorContext } : {}),\n    },\n    causedBy: { flowId: ic.flowId, seq: unitStartSeq },\n  });\n}\n\nexport function _buildCancelledUnitStopReason(\n  unitType: string,\n  unitId: string,\n  errorContext?: { message: string; category: string },\n): {\n  notifyMessage: string;\n  stopReason: string;\n  loopReason: \"session-failed\" | \"unit-aborted\";\n} {\n  const cancellationMessage = errorContext?.message ?? \"unknown\";\n  const isSessionCreationFailure = errorContext?.category === \"session-failed\";\n\n  if (isSessionCreationFailure) {\n    return {\n      notifyMessage: `Session creation failed for ${unitType} ${unitId}: ${cancellationMessage}. Stopping auto-mode.`,\n      stopReason: `Session creation failed: ${cancellationMessage}`,\n      loopReason: \"session-failed\",\n    };\n  }\n\n  return {\n    notifyMessage: `Unit ${unitType} ${unitId} aborted after dispatch: ${cancellationMessage}. Stopping auto-mode.`,\n    stopReason: `Unit aborted: ${cancellationMessage}`,\n    loopReason: \"unit-aborted\",\n  };\n}\n\nasync function failClosedOnFinalizeTimeout(\n  ic: IterationContext,\n  iterData: IterationData,\n  loopState: LoopState,\n  stage: \"pre\" | \"post\",\n  startedAt: number,\n): Promise<PhaseResult> {\n  const { ctx, pi, s, deps } = ic;\n  const now = Date.now();\n  const unitType = iterData.unitType;\n  const unitId = iterData.unitId;\n  const timeoutMs = stage === \"pre\" ? FINALIZE_PRE_TIMEOUT_MS : FINALIZE_POST_TIMEOUT_MS;\n  const progressKind = stage === \"pre\" ? 
\"finalize-pre-timeout\" : \"finalize-post-timeout\";\n\n  writeUnitRuntimeRecord(s.basePath, unitType, unitId, startedAt, {\n    phase: \"finalize-timeout\",\n    timeoutAt: now,\n    lastProgressAt: now,\n    lastProgressKind: progressKind,\n  });\n\n  deps.emitJournalEvent({\n    ts: new Date(now).toISOString(),\n    flowId: ic.flowId,\n    seq: ic.nextSeq(),\n    eventType: \"unit-end\",\n    data: {\n      unitType,\n      unitId,\n      status: \"timed-out-finalize\",\n      artifactVerified: false,\n      finalizeStage: stage,\n    },\n  });\n\n  loopState.consecutiveFinalizeTimeouts++;\n  debugLog(\"autoLoop\", {\n    phase: progressKind,\n    iteration: ic.iteration,\n    unitType,\n    unitId,\n    consecutiveTimeouts: loopState.consecutiveFinalizeTimeouts,\n  });\n\n  ctx.ui.notify(\n    `${stage === \"pre\" ? \"postUnitPreVerification\" : \"postUnitPostVerification\"} timed out after ${timeoutMs / 1000}s for ${unitType} ${unitId} (${loopState.consecutiveFinalizeTimeouts}/${MAX_FINALIZE_TIMEOUTS}) — pausing auto-mode for recovery.`,\n    \"warning\",\n  );\n\n  await deps.pauseAuto(ctx, pi);\n  s.currentUnit = null;\n  clearCurrentPhase();\n  drainLogs();\n  return { action: \"break\", reason: progressKind };\n}\n\n// ─── runPreDispatch ───────────────────────────────────────────────────────────\n\n/**\n * Phase 1: Pre-dispatch — resource guard, health gate, state derivation,\n * milestone transition, terminal conditions.\n * Returns break to exit the loop, or next with PreDispatchData on success.\n */\nexport async function runPreDispatch(\n  ic: IterationContext,\n  loopState: LoopState,\n): Promise<PhaseResult<PreDispatchData>> {\n  const { ctx, pi, s, deps, prefs } = ic;\n  const uokFlags = resolveUokFlags(prefs);\n  const runPreDispatchGate = async (input: {\n    gateId: string;\n    gateType: string;\n    outcome: \"pass\" | \"fail\" | \"retry\" | \"manual-attention\";\n    failureClass: \"none\" | \"policy\" | \"input\" | \"execution\" | 
\"artifact\" | \"verification\" | \"closeout\" | \"git\" | \"timeout\" | \"manual-attention\" | \"unknown\";\n    rationale: string;\n    findings?: string;\n    milestoneId?: string;\n  }): Promise<void> => {\n    if (!uokFlags.gates) return;\n    const gateRunner = new UokGateRunner();\n    gateRunner.register({\n      id: input.gateId,\n      type: input.gateType,\n      execute: async () => ({\n        outcome: input.outcome,\n        failureClass: input.failureClass,\n        rationale: input.rationale,\n        findings: input.findings ?? \"\",\n      }),\n    });\n    await gateRunner.run(input.gateId, {\n      basePath: s.basePath,\n      traceId: `pre-dispatch:${ic.flowId}`,\n      turnId: `iter-${ic.iteration}`,\n      milestoneId: input.milestoneId ?? s.currentMilestoneId ?? undefined,\n      unitType: \"pre-dispatch\",\n      unitId: `iter-${ic.iteration}`,\n    });\n  };\n\n  // Resource version guard\n  const staleMsg = deps.checkResourcesStale(s.resourceVersionOnStart);\n  if (staleMsg) {\n    await runPreDispatchGate({\n      gateId: \"resource-version-guard\",\n      gateType: \"policy\",\n      outcome: \"fail\",\n      failureClass: \"policy\",\n      rationale: \"resource version guard blocked dispatch\",\n      findings: staleMsg,\n    });\n    await deps.stopAuto(ctx, pi, staleMsg);\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"resources-stale\" });\n    return { action: \"break\", reason: \"resources-stale\" };\n  }\n  await runPreDispatchGate({\n    gateId: \"resource-version-guard\",\n    gateType: \"policy\",\n    outcome: \"pass\",\n    failureClass: \"none\",\n    rationale: \"resource version guard passed\",\n  });\n\n  deps.invalidateAllCaches();\n  s.lastPromptCharCount = undefined;\n  s.lastBaselineCharCount = undefined;\n\n  // Pre-dispatch health gate\n  try {\n    const healthGate = await deps.preDispatchHealthGate(s.basePath);\n    if (healthGate.fixesApplied.length > 0) {\n      ctx.ui.notify(\n        `Pre-dispatch: 
${healthGate.fixesApplied.join(\", \")}`,\n        \"info\",\n      );\n    }\n    if (!healthGate.proceed) {\n      await runPreDispatchGate({\n        gateId: \"pre-dispatch-health-gate\",\n        gateType: \"execution\",\n        outcome: \"manual-attention\",\n        failureClass: \"manual-attention\",\n        rationale: \"pre-dispatch health gate blocked dispatch\",\n        findings: healthGate.reason,\n      });\n      ctx.ui.notify(\n        healthGate.reason || \"Pre-dispatch health check failed — run /gsd doctor for details.\",\n        \"error\",\n      );\n      await deps.pauseAuto(ctx, pi);\n      debugLog(\"autoLoop\", { phase: \"exit\", reason: \"health-gate-failed\" });\n      return { action: \"break\", reason: \"health-gate-failed\" };\n    }\n    await runPreDispatchGate({\n      gateId: \"pre-dispatch-health-gate\",\n      gateType: \"execution\",\n      outcome: \"pass\",\n      failureClass: \"none\",\n      rationale: \"pre-dispatch health gate passed\",\n      findings: healthGate.fixesApplied.length > 0 ? healthGate.fixesApplied.join(\", \") : \"\",\n    });\n  } catch (e) {\n    await runPreDispatchGate({\n      gateId: \"pre-dispatch-health-gate\",\n      gateType: \"execution\",\n      outcome: \"manual-attention\",\n      failureClass: \"manual-attention\",\n      rationale: \"pre-dispatch health gate threw unexpectedly\",\n      findings: String(e),\n    });\n    logWarning(\"engine\", \"Pre-dispatch health gate threw unexpectedly\", { error: String(e) });\n  }\n\n  // Sync project root artifacts into worktree\n  if (\n    s.originalBasePath &&\n    !isSamePathLocal(s.basePath, s.originalBasePath) &&\n    s.currentMilestoneId &&\n    s.scope\n  ) {\n    deps.worktreeProjection.projectRootToWorktree(s.scope);\n  }\n\n  // Derive state — use canonical project root so the cache key is stable\n  // across worktree↔project-root path-form alternation. 
See PR #5236\n  // (workspace handle infrastructure) and the Phase A pt 2 plan.\n  let state = await deps.deriveState(s.canonicalProjectRoot);\n  const { getDeepStageGate } = await import(\"../auto-dispatch.js\");\n  const deepStageGate = getDeepStageGate(prefs, s.basePath);\n  const canRunDeepSetupGate =\n    state.phase === \"pre-planning\" ||\n    state.phase === \"needs-discussion\" ||\n    state.phase === \"planning\";\n  if (\n    canRunDeepSetupGate &&\n    (deepStageGate.status === \"pending\" || deepStageGate.status === \"blocked\")\n  ) {\n    debugLog(\"autoLoop\", {\n      phase: \"deep-project-stage-gate\",\n      stage: deepStageGate.stage,\n      status: deepStageGate.status,\n      reason: deepStageGate.reason,\n    });\n    return {\n      action: \"next\",\n      data: {\n        state: {\n          ...state,\n          phase: \"pre-planning\",\n          activeMilestone: null,\n          activeSlice: null,\n          activeTask: null,\n          nextAction: deepStageGate.reason,\n        },\n        mid: \"PROJECT\",\n        midTitle: \"Project setup\",\n      },\n    };\n  }\n\n  if (uokFlags.planV2 && shouldRunPlanV2Gate(state.phase)) {\n    let compiled = ensurePlanV2Graph(s.basePath, state);\n    if (isEmptyPlanV2GraphResult(compiled)) {\n      deps.invalidateAllCaches();\n      state = await deps.deriveState(s.canonicalProjectRoot);\n      compiled = shouldRunPlanV2Gate(state.phase)\n        ? ensurePlanV2Graph(s.basePath, state)\n        : {\n            ok: true,\n            reason: \"empty plan-v2 graph recovered by state rederive\",\n            nodeCount: 0,\n          };\n    }\n    if (!compiled.ok) {\n      const reason = compiled.reason ?? 
\"Plan v2 compilation failed\";\n      if (isMissingFinalizedContextResult(compiled)) {\n        await runPreDispatchGate({\n          gateId: \"plan-v2-gate\",\n          gateType: \"policy\",\n          outcome: \"pass\",\n          failureClass: \"none\",\n          rationale: \"plan v2 missing context recovery deferred to dispatch\",\n          findings: reason,\n          milestoneId: state.activeMilestone?.id ?? undefined,\n        });\n      } else {\n        await runPreDispatchGate({\n          gateId: \"plan-v2-gate\",\n          gateType: \"policy\",\n          outcome: \"manual-attention\",\n          failureClass: \"manual-attention\",\n          rationale: \"plan v2 compile gate failed\",\n          findings: reason,\n          milestoneId: state.activeMilestone?.id ?? undefined,\n        });\n        ctx.ui.notify(`Plan gate failed-closed: ${reason}\\n\\nIf this keeps happening, try: /gsd doctor heal`, \"error\");\n        await deps.pauseAuto(ctx, pi);\n        return { action: \"break\", reason: \"plan-v2-gate-failed\" };\n      }\n    }\n    if (compiled.ok) {\n      await runPreDispatchGate({\n        gateId: \"plan-v2-gate\",\n        gateType: \"policy\",\n        outcome: \"pass\",\n        failureClass: \"none\",\n        rationale: \"plan v2 compile gate passed\",\n        milestoneId: state.activeMilestone?.id ?? undefined,\n      });\n    }\n  }\n  deps.syncCmuxSidebar(prefs, state);\n  let mid = state.activeMilestone?.id;\n  let midTitle = state.activeMilestone?.title;\n  debugLog(\"autoLoop\", {\n    phase: \"state-derived\",\n    iteration: ic.iteration,\n    mid,\n    statePhase: state.phase,\n  });\n\n  // ── Slice-level parallelism gate (#2340) ─────────────────────────────\n  // When slice_parallel is enabled, check if multiple slices are eligible\n  // for parallel execution. If so, dispatch them in parallel and stop the\n  // sequential loop. 
Workers are spawned via slice-parallel-orchestrator.ts.\n  if (\n    prefs?.slice_parallel?.enabled &&\n    mid &&\n    !process.env.GSD_PARALLEL_WORKER &&\n    isDbAvailable()\n  ) {\n    try {\n      const dbSlices = getMilestoneSlices(mid);\n      if (dbSlices.length > 0) {\n        const doneIds = new Set(dbSlices.filter(sl => sl.status === \"complete\" || sl.status === \"done\").map(sl => sl.id));\n        const sliceInputs = dbSlices.map(sl => ({\n          id: sl.id,\n          done: doneIds.has(sl.id),\n          depends: sl.depends ?? [],\n        }));\n        const eligible = getEligibleSlices(sliceInputs, doneIds);\n        if (eligible.length > 1) {\n          debugLog(\"autoLoop\", {\n            phase: \"slice-parallel-dispatch\",\n            iteration: ic.iteration,\n            mid,\n            eligibleSlices: eligible.map(e => e.id),\n          });\n          ctx.ui.notify(\n            `Slice-parallel: dispatching ${eligible.length} eligible slices for ${mid}.`,\n            \"info\",\n          );\n          // ADR-017 #5707: reconcile before spawning so each worker doesn't\n          // independently race on the same drift. Failure aborts the spawn.\n          const spawnGate = await reconcileBeforeSpawn(s.basePath);\n          if (!spawnGate.ok) {\n            ctx.ui.notify(\n              `Slice-parallel: aborting spawn — ${spawnGate.reason}`,\n              \"error\",\n            );\n            return { action: \"break\", reason: `slice-parallel-reconciliation-failed: ${spawnGate.reason}` };\n          }\n          const result = await startSliceParallel(\n            s.basePath,\n            mid,\n            eligible,\n            {\n              maxWorkers: prefs.slice_parallel.max_workers ?? 
2,\n              useExecutionGraph: uokFlags.executionGraph,\n            },\n          );\n          if (result.started.length > 0) {\n            ctx.ui.notify(\n              `Slice-parallel: started ${result.started.length} worker(s): ${result.started.join(\", \")}.`,\n              \"info\",\n            );\n            await deps.stopAuto(ctx, pi, `Slice-parallel dispatched for ${mid}`);\n            return { action: \"break\", reason: \"slice-parallel-dispatched\" };\n          }\n          // Fall through to sequential if no workers started\n        }\n      }\n    } catch (err) {\n      debugLog(\"autoLoop\", {\n        phase: \"slice-parallel-check-error\",\n        error: err instanceof Error ? err.message : String(err),\n      });\n      // Non-fatal — fall through to sequential dispatch\n    }\n  }\n\n  // ── Milestone transition ────────────────────────────────────────────\n  if (mid && s.currentMilestoneId && mid !== s.currentMilestoneId) {\n    deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: ic.nextSeq(), eventType: \"milestone-transition\", data: { from: s.currentMilestoneId, to: mid } });\n    ctx.ui.notify(\n      `Milestone ${s.currentMilestoneId} complete. Advancing to ${mid}: ${midTitle}.`,\n      \"info\",\n    );\n    deps.sendDesktopNotification(\n      \"GSD\",\n      `Milestone ${s.currentMilestoneId} complete!`,\n      \"success\",\n      \"milestone\",\n      basename(s.originalBasePath || s.basePath),\n    );\n    deps.logCmuxEvent(\n      prefs,\n      `Milestone ${s.currentMilestoneId} complete. 
Advancing to ${mid}.`,\n      \"success\",\n    );\n\n    const vizPrefs = prefs;\n    if (vizPrefs?.auto_visualize) {\n      ctx.ui.notify(\"Run /gsd visualize to see progress overview.\", \"info\");\n    }\n    if (vizPrefs?.auto_report !== false) {\n      try {\n        await generateMilestoneReport(s, ctx, s.currentMilestoneId!);\n      } catch (err) {\n        ctx.ui.notify(\n          `Report generation failed: ${err instanceof Error ? err.message : String(err)}`,\n          \"warning\",\n        );\n      }\n    }\n\n    // Reset dispatch counters for new milestone\n    s.unitDispatchCount.clear();\n    s.unitRecoveryCount.clear();\n    s.unitLifetimeDispatches.clear();\n    loopState.recentUnits.length = 0;\n    loopState.stuckRecoveryAttempts = 0;\n\n    // Worktree lifecycle on milestone transition — merge current, enter next.\n    // #2909 / #5538-followup: preflight stash + always-on postflight pop.\n    {\n      const stop = await _runMilestoneMergeOnceWithStashRestore(ic, s.currentMilestoneId!);\n      if (stop) return stop;\n    }\n\n    // PR creation (auto_pr) is handled inside mergeMilestoneToMain (#2302)\n\n    deps.invalidateAllCaches();\n\n    state = await deps.deriveState(s.canonicalProjectRoot);\n    mid = state.activeMilestone?.id;\n    midTitle = state.activeMilestone?.title;\n\n    if (mid) {\n      if (deps.getIsolationMode(s.basePath) !== \"none\") {\n        deps.captureIntegrationBranch(s.basePath, mid);\n      }\n      const enterResult = deps.lifecycle.enterMilestone(mid, ctx.ui);\n      if (!enterResult.ok) {\n        ctx.ui.notify(\n          `Milestone transition stopped: failed to enter ${mid} (${enterResult.reason}).`,\n          \"error\",\n        );\n        if (enterResult.reason === \"lease-conflict\") {\n          await deps.pauseAuto(ctx, pi);\n        }\n        return { action: \"break\", reason: \"milestone-enter-failed\" };\n      }\n    } else {\n      // mid is undefined — no milestone to capture integration branch 
for\n    }\n\n    const pendingIds = state.registry\n      .filter(\n        (m: { status: string }) =>\n          m.status !== \"complete\" && m.status !== \"parked\",\n      )\n      .map((m: { id: string }) => m.id);\n    deps.pruneQueueOrder(s.basePath, pendingIds);\n\n    // Archive the old completed-units.json instead of wiping it (#2313).\n    try {\n      const completedKeysPath = join(gsdRoot(s.basePath), \"completed-units.json\");\n      if (existsSync(completedKeysPath) && s.currentMilestoneId) {\n        const archivePath = join(\n          gsdRoot(s.basePath),\n          `completed-units-${s.currentMilestoneId}.json`,\n        );\n        cpSync(completedKeysPath, archivePath);\n      }\n      atomicWriteSync(completedKeysPath, JSON.stringify([], null, 2));\n    } catch (e) {\n      logWarning(\"engine\", \"Failed to archive completed-units on milestone transition\", { error: String(e) });\n    }\n\n    // Rebuild STATE.md immediately so it reflects the new active milestone.\n    // This bypasses the 30-second throttle in the normal rebuild path —\n    // milestone transitions are rare and important enough to warrant an\n    // immediate write.\n    try {\n      await deps.rebuildState(s.basePath);\n    } catch (e) {\n      logWarning(\"engine\", \"STATE.md rebuild failed after milestone transition\", { error: String(e) });\n    }\n  }\n\n  if (mid) {\n    s.currentMilestoneId = mid;\n    deps.setActiveMilestoneId(s.basePath, mid);\n  }\n\n  // ── Terminal conditions ──────────────────────────────────────────────\n\n  if (!mid) {\n    if (s.currentUnit) {\n      await deps.closeoutUnit(\n        ctx,\n        s.basePath,\n        s.currentUnit.type,\n        s.currentUnit.id,\n        s.currentUnit.startedAt,\n        deps.buildSnapshotOpts(s.currentUnit.type, s.currentUnit.id),\n      );\n    }\n\n    const incomplete = state.registry.filter(\n      (m: { status: string }) =>\n        m.status !== \"complete\" && m.status !== \"parked\",\n    );\n    
if (incomplete.length === 0 && state.registry.length > 0) {\n      // All milestones complete — merge milestone branch before stopping.\n      if (s.currentMilestoneId) {\n        // #2909 / #5538-followup: preflight stash + always-on postflight pop.\n        const stop = await _runMilestoneMergeOnceWithStashRestore(ic, s.currentMilestoneId);\n        if (stop) return stop;\n        // PR creation (auto_pr) is handled inside mergeMilestoneToMain (#2302)\n      }\n      deps.sendDesktopNotification(\n        \"GSD\",\n        \"All milestones complete!\",\n        \"success\",\n        \"milestone\",\n        basename(s.originalBasePath || s.basePath),\n      );\n      deps.logCmuxEvent(\n        prefs,\n        \"All milestones complete.\",\n        \"success\",\n      );\n      await deps.stopAuto(ctx, pi, \"All milestones complete\", {\n        completionWidget: {\n          milestoneId: s.currentMilestoneId,\n          milestoneTitle: midTitle,\n          allMilestonesComplete: true,\n        },\n      });\n    } else if (incomplete.length === 0 && state.registry.length === 0) {\n      // Empty registry — no milestones visible, likely a path resolution bug\n      const diag = `basePath=${s.basePath}, phase=${state.phase}`;\n      ctx.ui.notify(\n        `No milestones visible in current scope. Possible path resolution issue.\\n   Diagnostic: ${diag}`,\n        \"error\",\n      );\n      await deps.stopAuto(\n        ctx,\n        pi,\n        `No milestones found — check basePath resolution`,\n      );\n    } else if (state.phase === \"blocked\") {\n      const blockerMsg = `Blocked: ${state.blockers.join(\", \")}`;\n      // Pause instead of hard-stop so the session is resumable with `/gsd auto`.\n      // Hard-stop here was causing premature termination when slice dependencies\n      // were temporarily unresolvable (e.g. after reassessment added new slices).\n      await deps.pauseAuto(ctx, pi);\n      ctx.ui.notify(`${blockerMsg}. 
Fix and run /gsd auto to resume.`, \"warning\");\n      deps.sendDesktopNotification(\"GSD\", blockerMsg, \"warning\", \"attention\", basename(s.originalBasePath || s.basePath));\n      deps.logCmuxEvent(prefs, blockerMsg, \"warning\");\n    } else {\n      const ids = incomplete.map((m: { id: string }) => m.id).join(\", \");\n      const diag = `basePath=${s.basePath}, milestones=[${state.registry.map((m: { id: string; status: string }) => `${m.id}:${m.status}`).join(\", \")}], phase=${state.phase}`;\n      ctx.ui.notify(\n        `Unexpected: ${incomplete.length} incomplete milestone(s) (${ids}) but no active milestone.\\n   Diagnostic: ${diag}`,\n        \"error\",\n      );\n      await deps.stopAuto(\n        ctx,\n        pi,\n        `No active milestone — ${incomplete.length} incomplete (${ids}), see diagnostic above`,\n      );\n    }\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"no-active-milestone\" });\n    deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: ic.nextSeq(), eventType: \"terminal\", data: { reason: \"no-active-milestone\" } });\n    return { action: \"break\", reason: \"no-active-milestone\" };\n  }\n\n  if (!midTitle) {\n    midTitle = mid;\n    ctx.ui.notify(\n      `Milestone ${mid} has no title in roadmap — using ID as fallback.`,\n      \"warning\",\n    );\n  }\n\n  // Mid-merge safety check\n  const mergeReconcileResult = deps.reconcileMergeState(s.basePath, ctx);\n  if (mergeReconcileResult === \"blocked\") {\n    await deps.pauseAuto(ctx, pi);\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"merge-reconciliation-blocked\" });\n    return { action: \"break\", reason: \"merge-reconciliation-blocked\" };\n  }\n  if (mergeReconcileResult === \"reconciled\") {\n    deps.invalidateAllCaches();\n    state = await deps.deriveState(s.canonicalProjectRoot);\n    mid = state.activeMilestone?.id;\n    midTitle = state.activeMilestone?.title;\n  }\n\n  if (!mid || !midTitle) {\n    const 
noMilestoneReason = !mid\n      ? \"No active milestone after merge reconciliation\"\n      : `Milestone ${mid} has no title after reconciliation`;\n    await closeoutAndStop(ctx, pi, s, deps, noMilestoneReason);\n    debugLog(\"autoLoop\", {\n      phase: \"exit\",\n      reason: \"no-milestone-after-reconciliation\",\n    });\n    return { action: \"break\", reason: \"no-milestone-after-reconciliation\" };\n  }\n\n  // Terminal: complete\n  if (state.phase === \"complete\") {\n    // Milestone merge on complete (before closeout so branch state is clean).\n    if (s.currentMilestoneId) {\n      // #2909 / #5538-followup: preflight stash + always-on postflight pop.\n      const stop = await _runMilestoneMergeOnceWithStashRestore(ic, s.currentMilestoneId);\n      if (stop) return stop;\n      // PR creation (auto_pr) is handled inside mergeMilestoneToMain (#2302)\n    }\n    deps.sendDesktopNotification(\n      \"GSD\",\n      `Milestone ${mid} complete!`,\n      \"success\",\n      \"milestone\",\n      basename(s.originalBasePath || s.basePath),\n    );\n    deps.logCmuxEvent(\n      prefs,\n      `Milestone ${mid} complete.`,\n      \"success\",\n    );\n    if (s.currentUnit) {\n      await deps.closeoutUnit(\n        ctx,\n        s.basePath,\n        s.currentUnit.type,\n        s.currentUnit.id,\n        s.currentUnit.startedAt,\n        deps.buildSnapshotOpts(s.currentUnit.type, s.currentUnit.id),\n      );\n      s.currentUnit = null;\n    }\n    await deps.stopAuto(ctx, pi, `Milestone ${mid} complete`, {\n      completionWidget: {\n        milestoneId: mid,\n        milestoneTitle: midTitle,\n      },\n    });\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"milestone-complete\" });\n    deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: ic.nextSeq(), eventType: \"terminal\", data: { reason: \"milestone-complete\", milestoneId: mid } });\n    return { action: \"break\", reason: \"milestone-complete\" };\n  }\n\n  // 
Terminal: blocked — pause instead of hard-stop so the session is resumable.\n  if (state.phase === \"blocked\") {\n    const blockerMsg = `Blocked: ${state.blockers.join(\", \")}`;\n    if (s.currentUnit) {\n      await deps.closeoutUnit(\n        ctx,\n        s.basePath,\n        s.currentUnit.type,\n        s.currentUnit.id,\n        s.currentUnit.startedAt,\n        deps.buildSnapshotOpts(s.currentUnit.type, s.currentUnit.id),\n      );\n    }\n    await deps.pauseAuto(ctx, pi);\n    ctx.ui.notify(`${blockerMsg}. Fix and run /gsd auto to resume.`, \"warning\");\n    deps.sendDesktopNotification(\"GSD\", blockerMsg, \"warning\", \"attention\", basename(s.originalBasePath || s.basePath));\n    deps.logCmuxEvent(prefs, blockerMsg, \"warning\");\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"blocked\" });\n    deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: ic.nextSeq(), eventType: \"terminal\", data: { reason: \"blocked\", blockers: state.blockers } });\n    return { action: \"break\", reason: \"blocked\" };\n  }\n\n  return { action: \"next\", data: { state, mid, midTitle } };\n}\n\n// ─── runDispatch ──────────────────────────────────────────────────────────────\n\n/**\n * Phase 3: Dispatch resolution — resolve next unit, stuck detection, pre-dispatch hooks.\n * Returns break/continue to control the loop, or next with IterationData on success.\n */\nexport async function runDispatch(\n  ic: IterationContext,\n  preData: PreDispatchData,\n  loopState: LoopState,\n): Promise<PhaseResult<IterationData>> {\n  const { ctx, pi, s, deps, prefs } = ic;\n  const { state, mid, midTitle } = preData;\n  const STUCK_WINDOW_SIZE = 6;\n  const provider = ctx.model?.provider;\n  const authMode = provider && typeof ctx.modelRegistry?.getProviderAuthMode === \"function\"\n    ? ctx.modelRegistry.getProviderAuthMode(provider)\n    : undefined;\n  const activeTools = typeof pi.getActiveTools === \"function\" ? 
pi.getActiveTools() : [];\n  // Deep planning intentionally keeps human checkpoints in plain chat. In\n  // Claude Code/local MCP transports, structured question requests can be\n  // cancelled outside the normal chat flow, which made approval gates easy to\n  // skip or bury under tool output.\n  const structuredQuestionsAvailable = prefs?.planning_depth === \"deep\"\n    ? \"false\"\n    : supportsStructuredQuestions(activeTools, {\n        authMode,\n        baseUrl: ctx.model?.baseUrl,\n      }) ? \"true\" : \"false\";\n\n  debugLog(\"autoLoop\", { phase: \"dispatch-resolve\", iteration: ic.iteration });\n  const dispatchResult = await deps.resolveDispatch({\n    basePath: s.basePath,\n    mid,\n    midTitle,\n    state,\n    prefs,\n    session: s,\n    structuredQuestionsAvailable,\n    sessionContextWindow: ctx.model?.contextWindow,\n    sessionProvider: ctx.model?.provider,\n    modelRegistry: ctx.modelRegistry as MinimalModelRegistry | undefined,\n  });\n\n  if (dispatchResult.action === \"stop\") {\n    deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: ic.nextSeq(), eventType: \"dispatch-stop\", rule: dispatchResult.matchedRule, data: { reason: dispatchResult.reason } });\n    // Warning-level stops are recoverable human checkpoints (e.g. UAT verdict\n    // gate) — pause instead of hard-stopping so the session is resumable with\n    // `/gsd auto`. 
Error/info-level stops remain hard stops for infrastructure\n    // failures and terminal conditions respectively.\n    // See: https://github.com/gsd-build/gsd-2/issues/2474\n    if (dispatchResult.level === \"warning\") {\n      ctx.ui.notify(dispatchResult.reason, \"warning\");\n      await deps.pauseAuto(ctx, pi, {\n        message: dispatchResult.reason,\n        category: \"unknown\",\n      });\n    } else {\n      await closeoutAndStop(ctx, pi, s, deps, dispatchResult.reason);\n    }\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"dispatch-stop\" });\n    return { action: \"break\", reason: \"dispatch-stop\" };\n  }\n\n  if (dispatchResult.action !== \"dispatch\") {\n    // Non-dispatch action (e.g. \"skip\") — re-derive state\n    await new Promise((r) => setImmediate(r));\n    return { action: \"continue\" };\n  }\n\n  deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: ic.nextSeq(), eventType: \"dispatch-match\", rule: dispatchResult.matchedRule, data: { unitType: dispatchResult.unitType, unitId: dispatchResult.unitId } });\n\n  let unitType = dispatchResult.unitType;\n  let unitId = dispatchResult.unitId;\n  let prompt = dispatchResult.prompt;\n  const pauseAfterUatDispatch = dispatchResult.pauseAfterDispatch ?? false;\n\n  // Resolve hooks and prior-slice gating before health/stuck accounting so\n  // those checks run against the final dispatch unit.\n  const preDispatchResult = deps.runPreDispatchHooks(\n    unitType,\n    unitId,\n    prompt,\n    s.basePath,\n  );\n  if (preDispatchResult.firedHooks.length > 0) {\n    ctx.ui.notify(\n      `Pre-dispatch hook${preDispatchResult.firedHooks.length > 1 ? 
\"s\" : \"\"}: ${preDispatchResult.firedHooks.join(\", \")}`,\n      \"info\",\n    );\n    deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: ic.nextSeq(), eventType: \"pre-dispatch-hook\", data: { firedHooks: preDispatchResult.firedHooks, action: preDispatchResult.action } });\n  }\n  if (preDispatchResult.action === \"skip\") {\n    ctx.ui.notify(\n      `Skipping ${unitType} ${unitId} (pre-dispatch hook).`,\n      \"info\",\n    );\n    await new Promise((r) => setImmediate(r));\n    return { action: \"continue\" };\n  }\n  if (preDispatchResult.action === \"replace\") {\n    prompt = preDispatchResult.prompt ?? prompt;\n    if (preDispatchResult.unitType) unitType = preDispatchResult.unitType;\n  } else if (preDispatchResult.prompt) {\n    prompt = preDispatchResult.prompt;\n  }\n\n  const guardBasePath = _resolveDispatchGuardBasePath(s);\n  let mainBranch = \"main\";\n  try {\n    mainBranch = deps.getMainBranch(guardBasePath);\n  } catch (err) {\n    debugLog(\"autoLoop\", { phase: \"getMainBranch-failed\", error: String(err) });\n  }\n  const priorSliceBlocker = deps.getPriorSliceCompletionBlocker(\n    guardBasePath,\n    mainBranch,\n    unitType,\n    unitId,\n  );\n  if (priorSliceBlocker) {\n    await deps.stopAuto(ctx, pi, priorSliceBlocker);\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"prior-slice-blocker\" });\n    return { action: \"break\", reason: \"prior-slice-blocker\" };\n  }\n\n  const consecutiveDispatchBlocker = getConsecutiveDispatchBlocker(\n    loopState,\n    state.phase,\n    unitType,\n    unitId,\n  );\n  if (consecutiveDispatchBlocker) {\n    await deps.stopAuto(ctx, pi, consecutiveDispatchBlocker);\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"consecutive-dispatch-blocker\" });\n    return { action: \"break\", reason: \"consecutive-dispatch-blocker\" };\n  }\n\n  const worktreeSafetyBlock = await validateSourceWriteWorktreeSafety(\n    ic,\n    unitType,\n    unitId,\n    mid,\n    
\"pre-dispatch\",\n  );\n  if (worktreeSafetyBlock) return worktreeSafetyBlock;\n\n  // ── Sliding-window stuck detection with graduated recovery ──\n  const derivedKey = `${unitType}/${unitId}`;\n\n  // Always record this dispatch in the sliding window and run detection so\n  // Rules 1/3/4 can catch retry loops with repeated failure content (#5719).\n  // Rules 2/2b suppress legitimate retry backoff through the dispatch ledger.\n  loopState.recentUnits.push({ key: derivedKey });\n  if (loopState.recentUnits.length > STUCK_WINDOW_SIZE) loopState.recentUnits.shift();\n\n  const stuckSignal = detectStuck(loopState.recentUnits);\n  if (stuckSignal) {\n      debugLog(\"autoLoop\", {\n        phase: \"stuck-check\",\n        unitType,\n        unitId,\n        reason: stuckSignal.reason,\n        recoveryAttempts: loopState.stuckRecoveryAttempts,\n      });\n\n      if (loopState.stuckRecoveryAttempts === 0) {\n        // Level 1: try verifying the artifact, then cache invalidation + retry\n        loopState.stuckRecoveryAttempts++;\n        const artifactExists = verifyExpectedArtifact(\n          unitType,\n          unitId,\n          s.basePath,\n        );\n        if (artifactExists) {\n          if (unitType === \"complete-milestone\") {\n            const stuckDiag = diagnoseExpectedArtifact(unitType, unitId, s.basePath);\n            const stuckParts = [\n              `Detected ${unitType} ${unitId} output on disk, but the same unit is still being derived.`,\n              \"This usually means the milestone summary exists while the DB row still does not mark the milestone complete.\",\n            ];\n            if (stuckDiag) stuckParts.push(`Expected: ${stuckDiag}`);\n            ctx.ui.notify(stuckParts.join(\" \"), \"warning\");\n            await deps.pauseAuto(ctx, pi);\n            return { action: \"break\", reason: \"complete-milestone-artifact-db-mismatch\" };\n          }\n          debugLog(\"autoLoop\", {\n            phase: 
\"stuck-recovery\",\n            level: 1,\n            action: \"artifact-found\",\n          });\n          const recoveryDb = refreshRecoveryDbForArtifact(unitType, unitId);\n          if (!recoveryDb.ok) {\n            ctx.ui.notify(\n              recoveryDb.fatal\n                ? `${recoveryDb.message} Pausing auto-mode for manual recovery.`\n                : `${recoveryDb.message} Keeping stuck state for retry.`,\n              \"warning\",\n            );\n            if (recoveryDb.fatal) {\n              await deps.pauseAuto(ctx, pi);\n              return { action: \"break\", reason: recoveryDb.reason };\n            }\n            return { action: \"continue\" };\n          }\n          ctx.ui.notify(\n            `Stuck recovery: artifact for ${unitType} ${unitId} found on disk. Invalidating caches.`,\n            \"info\",\n          );\n          deps.invalidateAllCaches();\n          loopState.recentUnits.length = 0;\n          return { action: \"continue\" };\n        }\n        ctx.ui.notify(\n          `Stuck on ${unitType} ${unitId} (${stuckSignal.reason}). Invalidating caches and retrying.`,\n          \"warning\",\n        );\n        deps.invalidateAllCaches();\n      } else {\n        // Level 2: hard stop — genuinely stuck\n        deps.invalidateAllCaches();\n        const artifactExists = verifyExpectedArtifact(\n          unitType,\n          unitId,\n          s.basePath,\n        );\n        if (artifactExists && unitType !== \"complete-milestone\") {\n          debugLog(\"autoLoop\", {\n            phase: \"stuck-recovery\",\n            level: 2,\n            action: \"artifact-found\",\n          });\n          const recoveryDb = refreshRecoveryDbForArtifact(unitType, unitId);\n          if (recoveryDb.ok) {\n            ctx.ui.notify(\n              `Stuck recovery: artifact for ${unitType} ${unitId} found on disk after cache invalidation. 
Continuing.`,\n              \"info\",\n            );\n            loopState.recentUnits.length = 0;\n            loopState.stuckRecoveryAttempts = 0;\n            return { action: \"continue\" };\n          }\n          ctx.ui.notify(\n            recoveryDb.fatal\n              ? `${recoveryDb.message} Pausing auto-mode for manual recovery.`\n              : `${recoveryDb.message} Stopping for manual recovery.`,\n            \"warning\",\n          );\n          if (recoveryDb.fatal) {\n            await deps.pauseAuto(ctx, pi);\n            return { action: \"break\", reason: recoveryDb.reason };\n          }\n        }\n        debugLog(\"autoLoop\", {\n          phase: \"stuck-detected\",\n          unitType,\n          unitId,\n          reason: stuckSignal.reason,\n        });\n        const stuckDiag = diagnoseExpectedArtifact(unitType, unitId, s.basePath);\n        const stuckRemediation = buildLoopRemediationSteps(unitType, unitId, s.basePath);\n        const stuckParts = [`Stuck on ${unitType} ${unitId} — ${stuckSignal.reason}.`];\n        if (stuckDiag) stuckParts.push(`Expected: ${stuckDiag}`);\n        if (stuckRemediation) stuckParts.push(`To recover:\\n${stuckRemediation}`);\n        ctx.ui.notify(stuckParts.join(\" \"), \"error\");\n        await deps.stopAuto(\n          ctx,\n          pi,\n          `Stuck: ${stuckSignal.reason}`,\n        );\n        return { action: \"break\", reason: \"stuck-detected\" };\n      }\n  } else {\n    // Progress detected — reset recovery counter\n    if (loopState.stuckRecoveryAttempts > 0) {\n      debugLog(\"autoLoop\", {\n        phase: \"stuck-counter-reset\",\n        from: loopState.recentUnits[loopState.recentUnits.length - 2]?.key ?? 
\"\",\n        to: derivedKey,\n      });\n      loopState.stuckRecoveryAttempts = 0;\n    }\n  }\n\n  return {\n    action: \"next\",\n    data: {\n      unitType, unitId, prompt, finalPrompt: prompt,\n      pauseAfterUatDispatch,\n      state, mid, midTitle,\n      isRetry: false, previousTier: undefined,\n      hookModelOverride: preDispatchResult.model,\n    },\n  };\n}\n\n// ─── runGuards ────────────────────────────────────────────────────────────────\n\n/**\n * Phase 2: Guards — stop directives, budget ceiling, context window, secrets re-check.\n * Returns break to exit the loop, or next to proceed to dispatch.\n */\nexport async function runGuards(\n  ic: IterationContext,\n  mid: string,\n): Promise<PhaseResult> {\n  const { ctx, pi, s, deps, prefs } = ic;\n\n  // ── Stop/Backtrack directive guard (#3487) ──\n  // Check for unexecuted stop or backtrack captures BEFORE dispatching any unit.\n  // This ensures user \"halt\" directives are honored immediately.\n  // IMPORTANT: Fail-closed — any exception during stop handling still breaks the loop\n  // to ensure user halt intent is never silently dropped.\n  try {\n    const { loadStopCaptures, markCaptureExecuted } = await import(\"../captures.js\");\n    const stopCaptures = loadStopCaptures(s.basePath);\n    if (stopCaptures.length > 0) {\n      const first = stopCaptures[0];\n      const isBacktrack = first.classification === \"backtrack\";\n      const label = isBacktrack\n        ? 
`Backtrack directive: ${first.text}`\n        : `Stop directive: ${first.text}`;\n\n      ctx.ui.notify(label, \"warning\");\n      deps.sendDesktopNotification(\n        \"GSD\", label, \"warning\", \"stop-directive\",\n        basename(s.originalBasePath || s.basePath),\n      );\n\n      // Pause first — ensures auto-mode stops even if later steps fail\n      await deps.pauseAuto(ctx, pi);\n\n      // For backtrack captures, write the backtrack trigger after pausing\n      if (isBacktrack) {\n        try {\n          const { executeBacktrack } = await import(\"../triage-resolution.js\");\n          executeBacktrack(s.basePath, mid, first);\n        } catch (e) {\n          debugLog(\"guards\", { phase: \"backtrack-execution-error\", error: String(e) });\n        }\n      }\n\n      // Mark captures as executed only after successful pause/transition\n      for (const cap of stopCaptures) {\n        markCaptureExecuted(s.basePath, cap.id);\n      }\n\n      debugLog(\"autoLoop\", { phase: \"exit\", reason: isBacktrack ? \"user-backtrack\" : \"user-stop\" });\n      return { action: \"break\", reason: isBacktrack ? 
\"user-backtrack\" : \"user-stop\" };\n    }\n  } catch (e) {\n    // Fail-closed: if anything in the stop guard throws, break the loop\n    // rather than silently continuing and dropping user halt intent\n    debugLog(\"guards\", { phase: \"stop-guard-error\", error: String(e) });\n    return { action: \"break\", reason: \"stop-guard-error\" };\n  }\n\n  // Budget ceiling guard\n  const budgetCeiling = prefs?.budget_ceiling;\n  if (budgetCeiling !== undefined && budgetCeiling > 0) {\n    const currentLedger = deps.getLedger() as { units: unknown } | null;\n    // In parallel worker mode, only count cost from the current auto-mode session\n    // to avoid hitting the ceiling due to historical project-wide spend (#2184).\n    let costUnits = currentLedger?.units;\n    if (process.env.GSD_PARALLEL_WORKER && s.autoStartTime && Array.isArray(costUnits)) {\n      const sessionStartISO = new Date(s.autoStartTime).toISOString();\n      costUnits = costUnits.filter(\n        (u: { startedAt?: string }) => u.startedAt != null && u.startedAt >= sessionStartISO,\n      );\n    }\n    const totalCost = costUnits\n      ? deps.getProjectTotals(costUnits).cost\n      : 0;\n    const budgetPct = totalCost / budgetCeiling;\n    const budgetAlertLevel = deps.getBudgetAlertLevel(budgetPct);\n    const newBudgetAlertLevel = deps.getNewBudgetAlertLevel(\n      s.lastBudgetAlertLevel,\n      budgetPct,\n    );\n    const enforcement = prefs?.budget_enforcement ?? 
\"pause\";\n    const budgetEnforcementAction = deps.getBudgetEnforcementAction(\n      enforcement,\n      budgetPct,\n    );\n\n    // Data-driven threshold check — loop descending, fire first match\n    const threshold = BUDGET_THRESHOLDS.find(\n      (t) => newBudgetAlertLevel >= t.pct,\n    );\n    if (threshold) {\n      s.lastBudgetAlertLevel =\n        newBudgetAlertLevel as AutoSession[\"lastBudgetAlertLevel\"];\n\n      // Emit Layer 2 budget_threshold event (post-plan hook recommendation).\n      // Extensions / Layer 0 shell hooks may return an action override.\n      let hookAction: \"pause\" | \"downgrade\" | \"continue\" | undefined;\n      try {\n        const { emitBudgetThreshold } = await import(\"../hook-emitter.js\");\n        const hookResult = await emitBudgetThreshold({\n          fraction: budgetPct,\n          spent: totalCost,\n          limit: budgetCeiling,\n        });\n        if (hookResult?.action) hookAction = hookResult.action;\n      } catch (hookErr) {\n        logWarning(\"engine\", `budget_threshold hook emission failed: ${(hookErr as Error).message}`);\n      }\n\n      // Apply hook override to enforcement action. 
\"continue\" → \"none\" (no enforcement),\n      // \"pause\" and \"downgrade\" map to the matching enforcement path below.\n      let effectiveAction = budgetEnforcementAction;\n      if (hookAction === \"continue\") {\n        effectiveAction = \"none\";\n      } else if (hookAction === \"pause\") {\n        effectiveAction = \"pause\";\n      } else if (hookAction === \"downgrade\") {\n        effectiveAction = \"warn\";\n      }\n\n      if (threshold.pct === 100 && effectiveAction !== \"none\") {\n        // 100% — special enforcement logic (halt/pause/warn)\n        const msg = `Budget ceiling ${deps.formatCost(budgetCeiling)} reached (spent ${deps.formatCost(totalCost)}).`;\n        if (effectiveAction === \"halt\") {\n          deps.sendDesktopNotification(\"GSD\", msg, \"error\", \"budget\", basename(s.originalBasePath || s.basePath));\n          await deps.stopAuto(ctx, pi, \"Budget ceiling reached\");\n          debugLog(\"autoLoop\", { phase: \"exit\", reason: \"budget-halt\" });\n          return { action: \"break\", reason: \"budget-halt\" };\n        }\n        if (effectiveAction === \"pause\") {\n          ctx.ui.notify(\n            `${msg} Pausing auto-mode — /gsd auto to override and continue.`,\n            \"warning\",\n          );\n          deps.sendDesktopNotification(\"GSD\", msg, \"warning\", \"budget\", basename(s.originalBasePath || s.basePath));\n          deps.logCmuxEvent(prefs, msg, \"warning\");\n          await deps.pauseAuto(ctx, pi);\n          debugLog(\"autoLoop\", { phase: \"exit\", reason: \"budget-pause\" });\n          return { action: \"break\", reason: \"budget-pause\" };\n        }\n        ctx.ui.notify(`${msg} Continuing (enforcement: warn).`, \"warning\");\n        deps.sendDesktopNotification(\"GSD\", msg, \"warning\", \"budget\", basename(s.originalBasePath || s.basePath));\n        deps.logCmuxEvent(prefs, msg, \"warning\");\n      } else if (threshold.pct < 100) {\n        // Sub-100% — simple notification\n     
   const msg = `${threshold.label}: ${deps.formatCost(totalCost)} / ${deps.formatCost(budgetCeiling)}`;\n        ctx.ui.notify(msg, threshold.notifyLevel);\n        deps.sendDesktopNotification(\n          \"GSD\",\n          msg,\n          threshold.notifyLevel,\n          \"budget\",\n          basename(s.originalBasePath || s.basePath),\n        );\n        deps.logCmuxEvent(prefs, msg, threshold.cmuxLevel);\n      }\n    } else if (budgetAlertLevel === 0) {\n      s.lastBudgetAlertLevel = 0;\n    }\n  } else {\n    s.lastBudgetAlertLevel = 0;\n  }\n\n  // Context window guard\n  const contextThreshold = prefs?.context_pause_threshold ?? 0;\n  if (contextThreshold > 0 && s.cmdCtx) {\n    const contextUsage = s.cmdCtx.getContextUsage();\n    if (\n      contextUsage &&\n      contextUsage.percent !== null &&\n      contextUsage.percent >= contextThreshold\n    ) {\n      const msg = `Context window at ${contextUsage.percent}% (threshold: ${contextThreshold}%). Pausing to prevent truncated output.`;\n      ctx.ui.notify(\n        `${msg} Run /gsd auto to continue (will start fresh session).`,\n        \"warning\",\n      );\n      deps.sendDesktopNotification(\n        \"GSD\",\n        `Context ${contextUsage.percent}% — paused`,\n        \"warning\",\n        \"attention\",\n        basename(s.originalBasePath || s.basePath),\n      );\n      await deps.pauseAuto(ctx, pi);\n      debugLog(\"autoLoop\", { phase: \"exit\", reason: \"context-window\" });\n      return { action: \"break\", reason: \"context-window\" };\n    }\n  }\n\n  // Secrets re-check gate\n  try {\n    const manifestStatus = await deps.getManifestStatus(s.basePath, mid, s.originalBasePath);\n    if (manifestStatus && manifestStatus.pending.length > 0) {\n      const result = await deps.collectSecretsFromManifest(\n        s.basePath,\n        mid,\n        ctx,\n      );\n      if (\n        result &&\n        result.applied &&\n        result.skipped &&\n        result.existingSkipped\n      
) {\n        ctx.ui.notify(\n          `Secrets collected: ${result.applied.length} applied, ${result.skipped.length} skipped, ${result.existingSkipped.length} already set.`,\n          \"info\",\n        );\n      } else {\n        ctx.ui.notify(\"Secrets collection skipped.\", \"info\");\n      }\n    }\n  } catch (err) {\n    ctx.ui.notify(\n      `Secrets collection error: ${err instanceof Error ? err.message : String(err)}. Continuing with next task.`,\n      \"warning\",\n    );\n  }\n\n  return { action: \"next\", data: undefined as void };\n}\n\n// ─── runUnitPhase ─────────────────────────────────────────────────────────────\n\n/**\n * Phase 4: Unit execution — dispatch prompt, await agent_end, closeout, artifact verify.\n * Returns break or next with unitStartedAt for downstream phases.\n */\nexport async function runUnitPhase(\n  ic: IterationContext,\n  iterData: IterationData,\n  loopState: LoopState,\n  sidecarItem?: SidecarItem,\n): Promise<PhaseResult<{ unitStartedAt?: number; requestDispatchedAt?: number }>> {\n  const { ctx, pi, s, deps, prefs } = ic;\n  const { unitType, unitId, prompt, state, mid } = iterData;\n\n  debugLog(\"autoLoop\", {\n    phase: \"unit-execution\",\n    iteration: ic.iteration,\n    unitType,\n    unitId,\n  });\n\n  const worktreeSafetyBlock = await validateSourceWriteWorktreeSafety(\n    ic,\n    unitType,\n    unitId,\n    mid,\n    \"unit-execution\",\n  );\n  if (worktreeSafetyBlock) return worktreeSafetyBlock;\n\n  // ── Project classification notice (#1833, #1843) ─────────────────────\n  // Worktree Safety owns source-write root validity. 
Classification now only\n  // shapes user/model guidance for valid roots.\n  let projectClassification: ReturnType<typeof classifyProject> | null = null;\n  if (s.basePath && unitType === \"execute-task\") {\n    projectClassification = classifyProject(s.basePath);\n    if (projectClassification.kind === \"invalid-repo\") {\n      const msg = `Worktree health check failed: ${s.basePath} classified as invalid-repo (${projectClassification.reason}) — refusing to dispatch ${unitType} ${unitId}`;\n      debugLog(\"runUnitPhase\", { phase: \"worktree-health-invalid-repo\", basePath: s.basePath, classification: projectClassification });\n      const hasGit = deps.existsSync(join(s.basePath, \".git\"));\n      if (_shouldProceedWithInvalidRepoClassificationForTest(projectClassification.reason, hasGit)) {\n        ctx.ui.notify(\n          `Warning: ${s.basePath} project classification could not confirm .git; assuming it has no project content yet — proceeding as greenfield project because worktree health reported .git present`,\n          \"warning\",\n        );\n      } else {\n        ctx.ui.notify(msg, \"error\");\n        await deps.stopAuto(ctx, pi, msg);\n        return { action: \"break\", reason: \"worktree-invalid\" };\n      }\n    }\n\n    if (projectClassification.kind === \"greenfield\") {\n      debugLog(\"runUnitPhase\", { phase: \"worktree-health-greenfield\", basePath: s.basePath, classification: projectClassification });\n      ctx.ui.notify(`Warning: ${s.basePath} has no project content yet — proceeding as greenfield project`, \"warning\");\n    } else if (projectClassification.kind === \"untyped-existing\") {\n      debugLog(\"runUnitPhase\", { phase: \"worktree-health-untyped-existing\", basePath: s.basePath, classification: projectClassification });\n      ctx.ui.notify(\n        `Notice: ${s.basePath} has existing project content but no recognized tooling markers — using generic file-level workflow guidance`,\n        \"info\",\n      );\n    }\n  
}\n\n  // Detect retry and capture previous tier for escalation\n  const isRetry = !!(\n    s.currentUnit &&\n    s.currentUnit.type === unitType &&\n    s.currentUnit.id === unitId\n  );\n  const previousTier = s.currentUnitRouting?.tier;\n  const dispatchKey = `${unitType}/${unitId}`;\n  const nextDispatchCount = (s.unitDispatchCount.get(dispatchKey) ?? 0) + 1;\n\n  // Status bar (widget + preconditions deferred until after model selection — see #2899)\n  ctx.ui.setStatus(\"gsd-auto\", \"auto\");\n  if (mid)\n    deps.updateSliceProgressCache(s.basePath, mid, state.activeSlice?.id);\n\n  // ── Safety harness: reset evidence + create checkpoint ──\n  const safetyConfig = resolveSafetyHarnessConfig(\n    prefs?.safety_harness as Record<string, unknown> | undefined,\n  );\n  if (safetyConfig.enabled && safetyConfig.evidence_collection) {\n    resetEvidence();\n    // Restore persisted evidence so session-restart resumes don't produce\n    // false-positive \"no bash calls\" warnings (Bug #4385).\n    if (s.basePath && unitType === \"execute-task\") {\n      const { milestone: eMid, slice: eSid, task: eTid } = parseUnitId(unitId);\n      if (eMid && eSid && eTid) {\n        loadEvidenceFromDisk(s.basePath, eMid, eSid, eTid);\n      }\n    }\n  }\n  // Only checkpoint code-executing units (not lifecycle/planning units)\n  if (safetyConfig.enabled && safetyConfig.checkpoints && unitType === \"execute-task\") {\n    s.checkpointSha = createCheckpoint(s.basePath, unitId);\n    if (s.checkpointSha) {\n      debugLog(\"runUnitPhase\", { phase: \"checkpoint-created\", unitId, sha: s.checkpointSha.slice(0, 8) });\n    }\n  }\n\n  // Prompt injection\n  let finalPrompt = prompt;\n\n  if (unitType === \"execute-task\") {\n    projectClassification ??= classifyProject(s.basePath);\n    if (projectClassification.kind === \"untyped-existing\") {\n      const samples = projectClassification.contentFiles.slice(0, 8).join(\", \") || \"project files\";\n      finalPrompt +=\n        
\"\\n\\n**Project classification:** Existing untyped project. No recognized build/tooling markers were detected, \" +\n        \"so use generic file-level workflow guidance. Task plans and completion summaries must list every concrete \" +\n        `project file changed in \\`files\\` or \\`expected_output\\`. Detected content sample: ${samples}.`;\n    }\n  }\n\n  if (s.pendingVerificationRetry) {\n    const retryCtx = s.pendingVerificationRetry;\n    s.pendingVerificationRetry = null;\n    const capped =\n      retryCtx.failureContext.length > MAX_RECOVERY_CHARS\n        ? retryCtx.failureContext.slice(0, MAX_RECOVERY_CHARS) +\n          \"\\n\\n[...failure context truncated]\"\n        : retryCtx.failureContext;\n    finalPrompt = `**VERIFICATION FAILED — AUTO-FIX ATTEMPT ${retryCtx.attempt}**\\n\\nThe verification gate ran after your previous attempt and found failures. Fix these issues before completing the task.\\n\\n${capped}\\n\\n---\\n\\n${finalPrompt}`;\n  }\n\n  if (s.pendingCrashRecovery) {\n    const capped =\n      s.pendingCrashRecovery.length > MAX_RECOVERY_CHARS\n        ? s.pendingCrashRecovery.slice(0, MAX_RECOVERY_CHARS) +\n          \"\\n\\n[...recovery briefing truncated to prevent memory exhaustion]\"\n        : s.pendingCrashRecovery;\n    finalPrompt = `${capped}\\n\\n---\\n\\n${finalPrompt}`;\n    s.pendingCrashRecovery = null;\n  } else if (nextDispatchCount > 1) {\n    const diagnostic = deps.getDeepDiagnostic(s.basePath);\n    if (diagnostic) {\n      const cappedDiag =\n        diagnostic.length > MAX_RECOVERY_CHARS\n          ? 
diagnostic.slice(0, MAX_RECOVERY_CHARS) +\n            \"\\n\\n[...diagnostic truncated to prevent memory exhaustion]\"\n          : diagnostic;\n      finalPrompt = `**RETRY — your previous attempt did not produce the required artifact.**\\n\\nDiagnostic from previous attempt:\\n${cappedDiag}\\n\\nFix whatever went wrong and make sure you write the required file this time.\\n\\n---\\n\\n${finalPrompt}`;\n    }\n  }\n\n  // Prompt char measurement\n  s.lastPromptCharCount = finalPrompt.length;\n  s.lastBaselineCharCount = undefined;\n  if (deps.isDbAvailable()) {\n    try {\n      const { inlineGsdRootFile } = await importExtensionModule<typeof import(\"../auto-prompts.js\")>(import.meta.url, \"../auto-prompts.js\");\n      const [decisionsContent, requirementsContent, projectContent] =\n        await Promise.all([\n          inlineGsdRootFile(s.basePath, \"decisions.md\", \"Decisions\"),\n          inlineGsdRootFile(s.basePath, \"requirements.md\", \"Requirements\"),\n          inlineGsdRootFile(s.basePath, \"project.md\", \"Project\"),\n        ]);\n      s.lastBaselineCharCount =\n        (decisionsContent?.length ?? 0) +\n        (requirementsContent?.length ?? 0) +\n        (projectContent?.length ?? 0);\n    } catch (e) {\n      logWarning(\"engine\", \"Baseline char count measurement failed\", { error: String(e) });\n    }\n  }\n\n  // Cache-optimize prompt section ordering\n  try {\n    finalPrompt = deps.reorderForCaching(finalPrompt);\n  } catch (reorderErr) {\n    const msg =\n      reorderErr instanceof Error ? 
reorderErr.message : String(reorderErr);\n    logWarning(\"engine\", \"Prompt reorder failed\", { error: msg });\n  }\n\n  // Select and apply model (with tier escalation on retry — normal units only)\n  const prevUnitRouting = s.currentUnitRouting;\n  const prevUnitModel = s.currentUnitModel;\n  const prevDispatchedModelId = s.currentDispatchedModelId;\n  const prevSessionModel = ctx.model;\n  const prevSessionThinkingLevel = pi.getThinkingLevel();\n  const modelResult = await deps.selectAndApplyModel(\n    ctx,\n    pi,\n    unitType,\n    unitId,\n    s.basePath,\n    prefs,\n    s.verbose,\n    s.autoModeStartModel,\n    sidecarItem ? undefined : { isRetry, previousTier },\n    undefined,\n    s.manualSessionModelOverride,\n    s.autoModeStartThinkingLevel,\n  );\n  s.currentUnitRouting =\n    modelResult.routing as AutoSession[\"currentUnitRouting\"];\n  s.currentUnitModel =\n    modelResult.appliedModel as AutoSession[\"currentUnitModel\"];\n\n  // Apply sidecar/pre-dispatch hook model override (takes priority over standard model selection)\n  const hookModelOverride = sidecarItem?.model ?? iterData.hookModelOverride;\n  if (hookModelOverride) {\n    const availableModels = ctx.modelRegistry.getAvailable();\n    const match = deps.resolveModelId(hookModelOverride, availableModels, ctx.model?.provider);\n    if (match) {\n      const ok = await pi.setModel(match, { persist: false });\n      if (ok) {\n        if (s.autoModeStartThinkingLevel) {\n          pi.setThinkingLevel(s.autoModeStartThinkingLevel);\n        }\n        s.currentUnitModel = match as AutoSession[\"currentUnitModel\"];\n        ctx.ui.notify(`Hook model override: ${match.provider}/${match.id}`, \"info\");\n      } else {\n        ctx.ui.notify(\n          `Hook model \"${hookModelOverride}\" found but setModel failed. Using default.`,\n          \"warning\",\n        );\n      }\n    } else {\n      ctx.ui.notify(\n        `Hook model \"${hookModelOverride}\" not found in available models. 
Falling back to current session model. ` +\n        `Ensure the model is defined in models.json and has auth configured.`,\n        \"warning\",\n      );\n    }\n  }\n\n  // Store the final dispatched model ID so the dashboard can read it (#2899).\n  // This accounts for hook model overrides applied after selectAndApplyModel.\n  s.currentDispatchedModelId = s.currentUnitModel\n    ? `${(s.currentUnitModel as any).provider ?? \"\"}/${(s.currentUnitModel as any).id ?? \"\"}`\n    : null;\n\n  const compatibilityError = getWorkflowTransportSupportError(\n    s.currentUnitModel?.provider ?? ctx.model?.provider,\n    getRequiredWorkflowToolsForAutoUnit(unitType),\n    {\n      projectRoot: s.basePath,\n      surface: \"auto-mode\",\n      unitType,\n      authMode: s.currentUnitModel?.provider\n        ? ctx.modelRegistry.getProviderAuthMode(s.currentUnitModel.provider)\n        : ctx.model?.provider\n          ? ctx.modelRegistry.getProviderAuthMode(ctx.model.provider)\n          : undefined,\n      baseUrl: (s.currentUnitModel as any)?.baseUrl ?? ctx.model?.baseUrl,\n      activeTools: typeof pi.getActiveTools === \"function\" ? 
pi.getActiveTools() : [],\n    },\n  );\n  if (compatibilityError) {\n    s.currentUnitRouting = prevUnitRouting;\n    s.currentUnitModel = prevUnitModel;\n    s.currentDispatchedModelId = prevDispatchedModelId;\n    if (s.checkpointSha) {\n      cleanupCheckpoint(s.basePath, unitId);\n      s.checkpointSha = null;\n    }\n    if (prevSessionModel) {\n      const ok = await pi.setModel(prevSessionModel, { persist: false });\n      if (!ok) {\n        ctx.ui.notify(\"Failed to restore previous session model after compatibility check failure.\", \"warning\");\n      }\n      if (prevSessionThinkingLevel) {\n        pi.setThinkingLevel(prevSessionThinkingLevel);\n      }\n    }\n    ctx.ui.notify(compatibilityError, \"error\");\n    await deps.stopAuto(ctx, pi, compatibilityError);\n    return { action: \"break\", reason: \"workflow-capability\" };\n  }\n\n  // Scope workflow-logger buffer to this unit so post-finalize drains are\n  // per-unit. Without this, the module-level _buffer accumulates across every\n  // unit in the same Node process (see workflow-logger.ts module header).\n  _resetLogs();\n  const unitStartedAt = Date.now();\n  s.unitDispatchCount.set(dispatchKey, nextDispatchCount);\n  s.currentUnit = { type: unitType, id: unitId, startedAt: unitStartedAt };\n  s.lastGitActionFailure = null;\n  s.lastGitActionStatus = null;\n  s.lastUnitAgentEndMessages = null;\n  setCurrentPhase(unitType, {\n    basePath: s.basePath,\n    traceId: ic.flowId,\n    turnId: `iter-${ic.iteration}`,\n    causedBy: \"unit-start\",\n  });\n  s.lastToolInvocationError = null; // #2883: clear stale error from previous unit\n  const unitStartSeq = ic.nextSeq();\n  deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: unitStartSeq, eventType: \"unit-start\", data: { unitType, unitId } });\n  deps.captureAvailableSkills();\n  writeUnitRuntimeRecord(\n    s.basePath,\n    unitType,\n    unitId,\n    unitStartedAt,\n    {\n      phase: \"dispatched\",\n      
wrapupWarningSent: false,\n      timeoutAt: null,\n      lastProgressAt: unitStartedAt,\n      progressCount: 0,\n      lastProgressKind: \"dispatch\",\n      recoveryAttempts: 0, // Reset so re-dispatched units get full recovery budget (#2322)\n    },\n  );\n\n  // Progress widget + preconditions — deferred to after model selection so the\n  // widget's first render tick shows the correct model (#2899).\n  deps.updateProgressWidget(ctx, unitType, unitId, state);\n  deps.ensurePreconditions(unitType, unitId, s.basePath, state);\n\n  // Start unit supervision\n  deps.clearUnitTimeout();\n  deps.startUnitSupervision({\n    s,\n    ctx,\n    pi,\n    unitType,\n    unitId,\n    prefs,\n    buildSnapshotOpts: () => deps.buildSnapshotOpts(unitType, unitId),\n    buildRecoveryContext: () => ({\n      basePath: s.basePath,\n      verbose: s.verbose,\n      currentUnitStartedAt: s.currentUnit?.startedAt ?? Date.now(),\n      unitRecoveryCount: s.unitRecoveryCount,\n    }),\n    pauseAuto: deps.pauseAuto,\n  });\n\n  // Write preliminary lock (no session path yet — runUnit creates a new session).\n  // Crash recovery can still identify the in-flight unit from this lock.\n  deps.writeLock(\n    deps.lockBase(),\n    unitType,\n    unitId,\n  );\n\n  debugLog(\"autoLoop\", {\n    phase: \"runUnit-start\",\n    iteration: ic.iteration,\n    unitType,\n    unitId,\n  });\n  const unitResult = await runUnit(\n    ctx,\n    pi,\n    s,\n    unitType,\n    unitId,\n    finalPrompt,\n  );\n  s.lastUnitAgentEndMessages = unitResult.event?.messages ?? null;\n  debugLog(\"autoLoop\", {\n    phase: \"runUnit-end\",\n    iteration: ic.iteration,\n    unitType,\n    unitId,\n    status: unitResult.status,\n  });\n\n  if (\n    unitResult.status === \"completed\" &&\n    s.currentUnit &&\n    (unitResult.event?.messages?.length ?? 0) === 0 &&\n    isSuspiciousGhostCompletion(ctx, unitResult.requestDispatchedAt ?? 
s.currentUnit.startedAt)\n  ) {\n    const message =\n      `${unitType} ${unitId} completed without assistant output or tool calls; treating as a stale ghost completion.`;\n    debugLog(\"autoLoop\", {\n      phase: \"ghost-completion\",\n      iteration: ic.iteration,\n      unitType,\n      unitId,\n      elapsedMs: Date.now() - (unitResult.requestDispatchedAt ?? s.currentUnit.startedAt),\n    });\n    logWarning(\"engine\", message);\n    ctx.ui.notify(`${message} Pausing auto-mode before closeout side effects.`, \"warning\");\n    await emitCancelledUnitEnd(ic, unitType, unitId, unitStartSeq, {\n      message,\n      category: \"unknown\",\n      isTransient: true,\n    });\n    s.currentUnit = null;\n    await deps.pauseAuto(ctx, pi);\n    return { action: \"break\", reason: \"ghost-completion\" };\n  }\n\n  // Now that runUnit has called newSession(), the session file path is correct.\n  const sessionFile = deps.getSessionFile(ctx);\n  deps.updateSessionLock(\n    deps.lockBase(),\n    unitType,\n    unitId,\n    sessionFile,\n  );\n  deps.writeLock(\n    deps.lockBase(),\n    unitType,\n    unitId,\n    sessionFile,\n  );\n\n  // Tag the most recent window entry with error info for stuck detection\n  const lastEntry = loopState.recentUnits[loopState.recentUnits.length - 1];\n  if (lastEntry) {\n    if (unitResult.errorContext) {\n      lastEntry.error = `${unitResult.errorContext.category}:${unitResult.errorContext.message}`.slice(0, 200);\n    } else if (unitResult.status === \"error\" || unitResult.status === \"cancelled\") {\n      lastEntry.error = `${unitResult.status}:${unitType}/${unitId}`;\n    } else if (unitResult.event?.messages?.length) {\n      const lastMsg = unitResult.event.messages[unitResult.event.messages.length - 1];\n      const msgStr = typeof lastMsg === \"string\" ? 
lastMsg : JSON.stringify(lastMsg);\n      if (/error|fail|exception/i.test(msgStr)) {\n        lastEntry.error = msgStr.slice(0, 200);\n      }\n    }\n  }\n\n  if (unitResult.status === \"cancelled\") {\n    const errorCategory = unitResult.errorContext?.category;\n    // Provider-error pause: agent_end recovery normally pauses before this\n    // branch. Provider readiness failures happen before dispatch, so pause here\n    // if nothing upstream already did.\n    if (errorCategory === \"provider\") {\n      if (!s.paused) {\n        const detail = unitResult.errorContext?.message ?? `Provider unavailable for ${unitType} ${unitId}`;\n        await pauseAutoForProviderError(\n          ctx.ui,\n          detail,\n          () => deps.pauseAuto(ctx, pi),\n          {\n            isRateLimit: false,\n            isTransient: Boolean(unitResult.errorContext?.isTransient),\n            retryAfterMs: unitResult.errorContext?.retryAfterMs,\n          },\n        );\n      }\n      await emitCancelledUnitEnd(ic, unitType, unitId, unitStartSeq, unitResult.errorContext);\n      debugLog(\"autoLoop\", { phase: \"exit\", reason: \"provider-pause\", isTransient: unitResult.errorContext?.isTransient });\n      return { action: \"break\", reason: \"provider-pause\" };\n    }\n    // Timeout category covers two distinct scenarios:\n    //   1. Session creation timeout (120s) — transient, auto-resume with backoff\n    //   2. 
Unit hard timeout (30min+) — stuck agent, pause for manual review\n    // Transient session-failed covers recoverable newSession failures and should\n    // pause instead of hard-stopping.\n    // Structural errors (TypeError, is not a function) are NOT transient\n    // and must hard-stop to avoid infinite retry loops.\n    if (\n      unitResult.errorContext?.isTransient &&\n      errorCategory === \"timeout\"\n    ) {\n      const isSessionCreationTimeout = unitResult.errorContext.message?.includes(\"Session creation timed out\");\n\n      if (isSessionCreationTimeout) {\n        consecutiveSessionTimeouts += 1;\n        const baseRetryAfterMs = 30_000;\n        const retryAfterMs = baseRetryAfterMs * 2 ** Math.max(0, consecutiveSessionTimeouts - 1);\n        const allowAutoResume = consecutiveSessionTimeouts <= MAX_SESSION_TIMEOUT_AUTO_RESUMES;\n\n        if (!allowAutoResume) {\n          ctx.ui.notify(\n            `Session creation timed out ${consecutiveSessionTimeouts} consecutive times for ${unitType} ${unitId}. Pausing for manual review.`,\n            \"warning\",\n          );\n        }\n\n        debugLog(\"autoLoop\", {\n          phase: \"session-timeout-pause\",\n          unitType, unitId,\n          consecutiveSessionTimeouts,\n          retryAfterMs,\n          allowAutoResume,\n        });\n\n        const errorDetail = ` for ${unitType} ${unitId}`;\n        await pauseAutoForProviderError(\n          ctx.ui,\n          errorDetail,\n          () => deps.pauseAuto(ctx, pi),\n          {\n            isRateLimit: false,\n            isTransient: allowAutoResume,\n            retryAfterMs,\n            resume: allowAutoResume\n              ? () => {\n                  void resumeAutoAfterProviderDelay(pi, ctx).catch((err) => {\n                    const message = err instanceof Error ? 
err.message : String(err);\n                    ctx.ui.notify(\n                      `Session timeout recovery failed: ${message}`,\n                      \"error\",\n                    );\n                  });\n                }\n              : undefined,\n          },\n        );\n        await deps.autoCommitUnit?.(s.basePath, unitType, unitId, ctx);\n        await emitCancelledUnitEnd(ic, unitType, unitId, unitStartSeq, unitResult.errorContext);\n        return { action: \"break\", reason: \"session-timeout\" };\n      }\n\n      // Unit hard timeout (30min+): pause without auto-resume — stuck agent\n      ctx.ui.notify(\n        `Unit timed out for ${unitType} ${unitId} (supervision may have failed). Pausing auto-mode.`,\n        \"warning\",\n      );\n      debugLog(\"autoLoop\", { phase: \"unit-hard-timeout-pause\", unitType, unitId });\n      await deps.pauseAuto(ctx, pi);\n      await deps.autoCommitUnit?.(s.basePath, unitType, unitId, ctx);\n      await emitCancelledUnitEnd(ic, unitType, unitId, unitStartSeq, unitResult.errorContext);\n      return { action: \"break\", reason: \"unit-hard-timeout\" };\n    }\n    if (\n      unitResult.errorContext?.isTransient &&\n      errorCategory === \"session-failed\"\n    ) {\n      ctx.ui.notify(\n        `Session creation failed transiently for ${unitType} ${unitId}: ${unitResult.errorContext?.message ?? \"unknown\"}. 
Pausing auto-mode (recoverable).`,\n        \"warning\",\n      );\n      debugLog(\"autoLoop\", { phase: \"session-start-transient-pause\", unitType, unitId, category: errorCategory });\n      await deps.pauseAuto(ctx, pi);\n      await deps.autoCommitUnit?.(s.basePath, unitType, unitId, ctx);\n      await emitCancelledUnitEnd(ic, unitType, unitId, unitStartSeq, unitResult.errorContext);\n      return { action: \"break\", reason: \"session-timeout\" };\n    }\n    if (\n      unitResult.errorContext?.isTransient &&\n      errorCategory === \"aborted\"\n    ) {\n      ctx.ui.notify(\n        `Unit ${unitType} ${unitId} was aborted by the user. Pausing auto-mode (recoverable).`,\n        \"warning\",\n      );\n      debugLog(\"autoLoop\", { phase: \"unit-aborted-transient-pause\", unitType, unitId, category: errorCategory });\n      await deps.pauseAuto(ctx, pi);\n      await deps.autoCommitUnit?.(s.basePath, unitType, unitId, ctx);\n      await emitCancelledUnitEnd(ic, unitType, unitId, unitStartSeq, unitResult.errorContext);\n      return { action: \"break\", reason: \"unit-aborted-pause\" };\n    }\n    // All other cancelled states (structural errors, non-transient failures): hard stop\n    if (s.currentUnit) {\n      await deps.closeoutUnit(\n        ctx,\n        s.basePath,\n        unitType,\n        unitId,\n        s.currentUnit.startedAt,\n        deps.buildSnapshotOpts(unitType, unitId),\n      );\n    }\n    await deps.autoCommitUnit?.(s.basePath, unitType, unitId, ctx);\n    await emitCancelledUnitEnd(ic, unitType, unitId, unitStartSeq, unitResult.errorContext);\n\n    const cancelledStop = _buildCancelledUnitStopReason(\n      unitType,\n      unitId,\n      unitResult.errorContext,\n    );\n    ctx.ui.notify(cancelledStop.notifyMessage, \"warning\");\n    await deps.stopAuto(ctx, pi, cancelledStop.stopReason);\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: cancelledStop.loopReason });\n    return { action: \"break\", reason: 
cancelledStop.loopReason };\n  }\n\n  // ── Immediate unit closeout (metrics, activity log, memory) ────────\n  // Run right after runUnit() returns so telemetry is never lost to a\n  // crash between iterations.\n  // Guard: stopAuto() may have nulled s.currentUnit via s.reset() while\n  // this coroutine was suspended at `await runUnit(...)` (#2939).\n  if (s.currentUnit) {\n    // Reset session timeout counter — any successful unit clears the slate\n    consecutiveSessionTimeouts = 0;\n    await deps.closeoutUnit(\n      ctx,\n      s.basePath,\n      unitType,\n      unitId,\n      s.currentUnit.startedAt,\n      deps.buildSnapshotOpts(unitType, unitId),\n    );\n  }\n\n  // ── Zero tool-call guard (#1833, #2653) ──────────────────────────\n  // Any unit that completes with 0 tool calls made no real progress —\n  // likely context exhaustion where all tool calls errored out. Treat\n  // as failed so the unit is retried in a fresh context instead of\n  // silently passing through to artifact verification (which loops\n  // forever when the unit never produced its artifact).\n  {\n    const currentLedger = deps.getLedger() as { units: Array<{ type: string; id: string; startedAt: number; toolCalls: number }> } | null;\n    if (currentLedger?.units) {\n      const lastUnit = [...currentLedger.units].reverse().find(\n        (u: { type: string; id: string; startedAt: number; toolCalls: number }) => u.type === unitType && u.id === unitId && u.startedAt === _resolveCurrentUnitStartedAtForTest(s.currentUnit),\n      );\n      if (lastUnit && lastUnit.toolCalls === 0) {\n        if (USER_DRIVEN_DEEP_UNITS.has(unitType) && isAwaitingUserInput(s.lastUnitAgentEndMessages ?? 
undefined)) {\n          debugLog(\"runUnitPhase\", {\n            phase: \"zero-tool-calls-awaiting-user-input\",\n            unitType,\n            unitId,\n          });\n        } else {\n          debugLog(\"runUnitPhase\", {\n            phase: \"zero-tool-calls\",\n            unitType,\n            unitId,\n            warning: \"Unit completed with 0 tool calls — likely context exhaustion, marking as failed\",\n          });\n          ctx.ui.notify(\n            `${unitType} ${unitId} completed with 0 tool calls — context exhaustion, will retry`,\n            \"warning\",\n          );\n          // Fall through to next iteration where dispatch will re-derive\n          // and re-dispatch this unit.\n          return { action: \"next\", data: { unitStartedAt: _resolveCurrentUnitStartedAtForTest(s.currentUnit), requestDispatchedAt: unitResult.requestDispatchedAt } };\n        }\n      }\n    }\n  }\n\n  const skipArtifactVerification = unitType.startsWith(\"hook/\") || unitType === \"custom-step\";\n  const artifactVerified =\n    skipArtifactVerification ||\n    verifyExpectedArtifact(unitType, unitId, s.basePath);\n  if (s.currentUnitRouting) {\n    deps.recordOutcome(\n      unitType,\n      s.currentUnitRouting.tier as \"light\" | \"standard\" | \"heavy\",\n      artifactVerified,\n    );\n  }\n  if (artifactVerified) {\n    s.unitDispatchCount.delete(dispatchKey);\n    s.unitRecoveryCount.delete(`${unitType}/${unitId}`);\n  }\n\n  // Write phase handoff anchor after successful research/planning completion\n  const anchorPhases = new Set([\"research-milestone\", \"research-slice\", \"plan-milestone\", \"plan-slice\"]);\n  if (artifactVerified && mid && anchorPhases.has(unitType)) {\n    try {\n      const { writePhaseAnchor } = await import(\"../phase-anchor.js\");\n      writePhaseAnchor(s.basePath, mid, {\n        phase: unitType,\n        milestoneId: mid,\n        generatedAt: new Date().toISOString(),\n        intent: `Completed ${unitType} for 
${unitId}`,\n        decisions: [],\n        blockers: [],\n        nextSteps: [],\n      });\n    } catch (err) { /* non-fatal — anchor is advisory */\n      logWarning(\"engine\", `phase anchor failed: ${err instanceof Error ? err.message : String(err)}`);\n    }\n  }\n\n  const unitEndStatus =\n    !artifactVerified && unitResult.status === \"completed\"\n      ? \"no-artifact\"\n      : unitResult.status;\n  deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: ic.nextSeq(), eventType: \"unit-end\", data: { unitType, unitId, status: unitEndStatus, artifactVerified, ...(unitResult.errorContext ? { errorContext: unitResult.errorContext } : {}) }, causedBy: { flowId: ic.flowId, seq: unitStartSeq } });\n\n  // ── Safety harness: checkpoint cleanup or rollback ──\n  if (s.checkpointSha) {\n    if (unitResult.status === \"error\" && safetyConfig.auto_rollback) {\n      const rolled = rollbackToCheckpoint(s.basePath, unitId, s.checkpointSha);\n      if (rolled) {\n        ctx.ui.notify(`Rolled back to pre-unit checkpoint for ${unitId}`, \"info\");\n        debugLog(\"runUnitPhase\", { phase: \"checkpoint-rollback\", unitId });\n      }\n    } else if (unitResult.status === \"error\") {\n      ctx.ui.notify(\n        `Unit ${unitId} failed. 
Pre-unit checkpoint available at ${s.checkpointSha.slice(0, 8)}`,\n        \"warning\",\n      );\n    } else {\n      // Success — clean up checkpoint ref\n      cleanupCheckpoint(s.basePath, unitId);\n      debugLog(\"runUnitPhase\", { phase: \"checkpoint-cleaned\", unitId });\n    }\n    s.checkpointSha = null;\n  }\n\n  return { action: \"next\", data: { unitStartedAt: _resolveCurrentUnitStartedAtForTest(s.currentUnit), requestDispatchedAt: unitResult.requestDispatchedAt } };\n}\n\n// ─── runFinalize ──────────────────────────────────────────────────────────────\n\n/**\n * Phase 5: Post-unit finalize — pre/post verification, UAT pause, step-wizard.\n * Returns break/continue/next to control the outer loop.\n */\nexport async function runFinalize(\n  ic: IterationContext,\n  iterData: IterationData,\n  loopState: LoopState,\n  sidecarItem?: SidecarItem,\n): Promise<PhaseResult> {\n  const { ctx, pi, s, deps } = ic;\n  const { pauseAfterUatDispatch } = iterData;\n\n  debugLog(\"autoLoop\", { phase: \"finalize\", iteration: ic.iteration });\n\n  // Clear unit timeout (unit completed)\n  deps.clearUnitTimeout();\n\n  // Post-unit context for pre/post verification\n  const postUnitCtx: PostUnitContext = {\n    s,\n    ctx,\n    pi,\n    buildSnapshotOpts: deps.buildSnapshotOpts,\n    lockBase: deps.lockBase,\n    stopAuto: deps.stopAuto,\n    pauseAuto: deps.pauseAuto,\n    updateProgressWidget: deps.updateProgressWidget,\n  };\n\n  // Pre-verification processing (commit, doctor, state rebuild, etc.)\n  // Timeout guard: if postUnitPreVerification hangs (e.g., safety harness\n  // deadlock, browser teardown hang, worktree sync stall), force-continue\n  // after timeout so the auto-loop is not permanently frozen (#3757).\n  //\n  // On timeout, null out s.currentUnit so the timed-out task's late async\n  // mutations are harmless — postUnitPreVerification guards all side effects\n  // behind `if (s.currentUnit)`. 
The next iteration sets a fresh currentUnit.\n  // Sidecar items use lightweight pre-verification opts\n  const preVerificationOpts: PreVerificationOpts = sidecarItem\n    ? sidecarItem.kind === \"hook\"\n      ? { skipSettleDelay: true, skipWorktreeSync: true, agentEndMessages: s.lastUnitAgentEndMessages ?? undefined }\n      : { skipSettleDelay: true, agentEndMessages: s.lastUnitAgentEndMessages ?? undefined }\n    : { agentEndMessages: s.lastUnitAgentEndMessages ?? undefined };\n  const preUnitSnapshot = s.currentUnit\n    ? { type: s.currentUnit.type, id: s.currentUnit.id, startedAt: s.currentUnit.startedAt }\n    : null;\n  s.currentUnit = null;\n  clearCurrentPhase();\n  const preResultGuard = await withTimeout(\n    deps.postUnitPreVerification(postUnitCtx, preVerificationOpts),\n    FINALIZE_PRE_TIMEOUT_MS,\n    \"postUnitPreVerification\",\n  );\n\n  if (preResultGuard.timedOut) {\n    return failClosedOnFinalizeTimeout(\n      ic,\n      iterData,\n      loopState,\n      \"pre\",\n      preUnitSnapshot?.startedAt ?? Date.now(),\n    );\n  }\n\n  const preResult = preResultGuard.value;\n  if (preResult === \"dispatched\") {\n    const dispatchedReason = s.lastGitActionFailure\n      ? \"git-closeout-failure\"\n      : \"pre-verification-dispatched\";\n    debugLog(\"autoLoop\", {\n      phase: \"exit\",\n      reason: dispatchedReason,\n      gitError: s.lastGitActionFailure ?? 
undefined,\n    });\n    return { action: \"break\", reason: dispatchedReason };\n  }\n  if (preResult === \"retry\") {\n    if (sidecarItem) {\n      // Sidecar artifact retries are skipped — just continue\n      debugLog(\"autoLoop\", { phase: \"sidecar-artifact-retry-skipped\", iteration: ic.iteration });\n    } else {\n      // s.pendingVerificationRetry was set by postUnitPreVerification.\n      // Emit a dedicated journal event so forensics can distinguish bounded\n      // verification retries from genuine stuck-loop dispatch repetitions (#4540).\n      const retryInfo = s.pendingVerificationRetry;\n      deps.emitJournalEvent({\n        ts: new Date().toISOString(),\n        flowId: ic.flowId,\n        seq: ic.nextSeq(),\n        eventType: \"artifact-verification-retry\",\n        data: {\n          unitType: preUnitSnapshot?.type,\n          unitId: retryInfo?.unitId,\n          attempt: retryInfo?.attempt,\n        },\n      });\n      const retryPolicyResult = await applyVerificationRetryPolicy(\n        ic,\n        preUnitSnapshot?.type,\n        \"artifact-verification-retry\",\n      );\n      if (retryPolicyResult) {\n        return retryPolicyResult;\n      }\n      // Continue the loop — next iteration will inject the retry context into the prompt.\n      debugLog(\"autoLoop\", { phase: \"artifact-verification-retry\", iteration: ic.iteration });\n      return { action: \"continue\" };\n    }\n  }\n\n  if (pauseAfterUatDispatch) {\n    ctx.ui.notify(\n      \"UAT requires human execution. 
Auto-mode will pause after this unit writes the result file.\",\n      \"info\",\n    );\n    await deps.pauseAuto(ctx, pi);\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"uat-pause\" });\n    return { action: \"break\", reason: \"uat-pause\" };\n  }\n\n  // Verification gate\n  // Hook sidecar items skip verification entirely.\n  // Non-hook sidecar items run verification but skip retries (just continue).\n  const skipVerification = sidecarItem?.kind === \"hook\";\n  if (!skipVerification) {\n    const verificationResult = await deps.runPostUnitVerification(\n      { s, ctx, pi },\n      deps.pauseAuto,\n    );\n\n    if (verificationResult === \"pause\") {\n      debugLog(\"autoLoop\", { phase: \"exit\", reason: \"verification-pause\" });\n      return { action: \"break\", reason: \"verification-pause\" };\n    }\n\n    if (verificationResult === \"retry\") {\n      if (sidecarItem) {\n        // Sidecar verification retries are skipped — just continue\n        debugLog(\"autoLoop\", { phase: \"sidecar-verification-retry-skipped\", iteration: ic.iteration });\n      } else {\n        // s.pendingVerificationRetry was set by runPostUnitVerification.\n        const retryPolicyResult = await applyVerificationRetryPolicy(\n          ic,\n          iterData.unitType,\n          \"verification-retry\",\n        );\n        if (retryPolicyResult) {\n          return retryPolicyResult;\n        }\n        // Continue the loop — next iteration will inject the retry context into the prompt.\n        debugLog(\"autoLoop\", { phase: \"verification-retry\", iteration: ic.iteration });\n        return { action: \"continue\" };\n      }\n    }\n  }\n\n  // Post-verification processing (DB dual-write, hooks, triage, quick-tasks)\n  // Timeout guard: if postUnitPostVerification hangs (e.g., module import\n  // deadlock, SQLite transaction hang), force-continue after timeout so the\n  // auto-loop is not permanently frozen (#2344).\n  const postResultGuard = await 
withTimeout(\n    deps.postUnitPostVerification(postUnitCtx),\n    FINALIZE_POST_TIMEOUT_MS,\n    \"postUnitPostVerification\",\n  );\n\n  if (postResultGuard.timedOut) {\n    return failClosedOnFinalizeTimeout(\n      ic,\n      iterData,\n      loopState,\n      \"post\",\n      preUnitSnapshot?.startedAt ?? Date.now(),\n    );\n  }\n\n  const postResult = postResultGuard.value;\n\n  if (postResult === \"stopped\") {\n    debugLog(\"autoLoop\", {\n      phase: \"exit\",\n      reason: \"post-verification-stopped\",\n    });\n    return { action: \"break\", reason: \"post-verification-stopped\" };\n  }\n\n  if (postResult === \"step-wizard\") {\n    // Step mode — exit the loop (caller handles wizard)\n    debugLog(\"autoLoop\", { phase: \"exit\", reason: \"step-wizard\" });\n    return { action: \"break\", reason: \"step-wizard\" };\n  }\n\n  if (preUnitSnapshot?.type === \"complete-milestone\" && s.currentMilestoneId) {\n    const stop = await _runMilestoneMergeOnceWithStashRestore(ic, s.currentMilestoneId);\n    if (stop) return stop;\n  }\n\n  // Both pre and post verification completed without timeout — reset counter\n  loopState.consecutiveFinalizeTimeouts = 0;\n  if (preUnitSnapshot) {\n    writeUnitRuntimeRecord(s.basePath, preUnitSnapshot.type, preUnitSnapshot.id, preUnitSnapshot.startedAt, {\n      phase: \"finalized\",\n      lastProgressAt: Date.now(),\n      lastProgressKind: \"finalize-success\",\n    });\n    if (\n      !preUnitSnapshot.type.startsWith(\"hook/\") &&\n      preUnitSnapshot.type !== \"custom-step\" &&\n      preUnitSnapshot.type !== \"complete-milestone\"\n    ) {\n      setAutoOutcomeWidget(ctx, {\n        ...buildPhaseHandoffOutcome({\n          unitType: preUnitSnapshot.type,\n          unitId: preUnitSnapshot.id,\n          agentEndMessages: s.lastUnitAgentEndMessages,\n        }),\n        startedAt: s.autoStartTime,\n      });\n    }\n  }\n  // Surface accumulated workflow-logger issues for this unit to the user.\n  // 
Warnings/errors logged during the unit are buffered in the logger and\n  // drained here so the user sees a single consolidated post-unit alert.\n  if (hasAnyIssues()) {\n    const { logs } = drainAndSummarize();\n    if (logs.length > 0) {\n      const severity = logs.some((e) => e.severity === \"error\") ? \"error\" : \"warning\";\n      ctx.ui.notify(formatForNotification(logs), severity);\n    }\n  }\n\n  return { action: \"next\", data: undefined as void };\n}\n"}] -->
- [ ] <!-- {"checkboxId": "3cf7720d-6018-47fe-804e-ec1c51a05790"} --> 📌 Commit the changes to this branch



<!-- This is an auto-generated reply by CodeRabbit -->
3. [review_comment] coderabbitai[bot]
   file=src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts line=24
   sourceId=3251769098 sourceUrl=https://github.com/gsd-build/gsd-2/pull/6187#discussion_r3251769098
   replyKind=review_thread threadId=PRRT_kwDORj5hb86Cgyj6 threadResolved=false
   auditToken=codefactory-feedback:gh-review-comment-3251769098
   _🛠️ Refactor suggestion_ | _🟠 Major_ | _⚡ Quick win_

**Add tests for `skill` alone and read-only `gsd_exec` alone.**

The current test verifies that `skill` + read-only `gsd_exec` together are classified as reconnaissance-only, but doesn't isolate whether each works independently. Per the PR objectives, both "skill" (added to READ_ONLY_TOOL_NAMES) and read-only gsd_exec patterns should be detected on their own.

Consider adding:
1. A test with only `{ name: "skill", ... }` → expects `isReadOnlyReconnaissanceOnly = true`
2. A test with only `{ name: "gsd_exec", input: { command: "rg -n TODO src" }, ... }` → expects `isReadOnlyReconnaissanceOnly = true`

This ensures the implementation correctly classifies each independently and would catch bugs where both are incorrectly required together.




<details>
<summary>✅ Proposed additional test cases</summary>

```typescript
test("classifyTraceProgress treats skill alone as reconnaissance-only", () => {
  const trace = traceWithToolCalls([
    { name: "skill", input: { name: "diagnose" }, isError: false },
  ]);
  const result = classifyTraceProgress(trace);
  assert.equal(result.isReadOnlyReconnaissanceOnly, true);
});

test("classifyTraceProgress treats read-only gsd_exec alone as reconnaissance-only", () => {
  const trace = traceWithToolCalls([
    { name: "gsd_exec", input: { command: "rg -n TODO src" }, isError: false },
  ]);
  const result = classifyTraceProgress(trace);
  assert.equal(result.isReadOnlyReconnaissanceOnly, true);
});
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`
around lines 17 - 24, Add two unit tests to
src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts
that call classifyTraceProgress with traces created by traceWithToolCalls
containing only the "skill" tool and only the read-only "gsd_exec" command
respectively; for each test assert that result.isReadOnlyReconnaissanceOnly ===
true so we verify that classifyTraceProgress detects "skill" alone and read-only
"gsd_exec" alone as reconnaissance-only rather than requiring both together.
```

</details>

<!-- fingerprinting:phantom:poseidon:puma -->

<!-- This is an auto-generated comment by CodeRabbit -->
4. [review] coderabbitai[bot]
   file=n/a line=n/a
   sourceId=4302381229 sourceUrl=https://github.com/gsd-build/gsd-2/pull/6187#pullrequestreview-4302381229
   replyKind=review threadId=n/a threadResolved=n/a
   auditToken=codefactory-feedback:gh-review-4302381229
   **Actionable comments posted: 1**

<details>
<summary>🧹 Nitpick comments (1)</summary><blockquote>

<details>
<summary>src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts (1)</summary><blockquote>

`26-48`: _⚡ Quick win_

**Consider adding an empty trace edge case test.**

The current tests cover various command patterns, but don't verify behavior when `toolCalls` is empty. An empty trace is a common edge case that should have explicit, tested behavior.




<details>
<summary>📋 Proposed empty trace test</summary>

```typescript
test("classifyTraceProgress treats empty trace as reconnaissance-only", () => {
  const trace = traceWithToolCalls([]);
  const result = classifyTraceProgress(trace);
  // Adjust expected value based on desired semantics:
  // true if "no mutations" implies read-only, false if "no actions" is distinct
  assert.equal(result.isReadOnlyReconnaissanceOnly, true);
});
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`
around lines 26 - 48, Add an edge-case unit test that passes an empty toolCalls
array to traceWithToolCalls and asserts classifyTraceProgress returns a
reconnaissance-only result; specifically add a test like "classifyTraceProgress
treats empty trace as reconnaissance-only" that constructs const trace =
traceWithToolCalls([]), calls const result = classifyTraceProgress(trace), and
asserts result.isReadOnlyReconnaissanceOnly === true so the classifier’s
behavior for empty traces is explicitly verified.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`:
- Around line 17-24: Add two unit tests to
src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts
that call classifyTraceProgress with traces created by traceWithToolCalls
containing only the "skill" tool and only the read-only "gsd_exec" command
respectively; for each test assert that result.isReadOnlyReconnaissanceOnly ===
true so we verify that classifyTraceProgress detects "skill" alone and read-only
"gsd_exec" alone as reconnaissance-only rather than requiring both together.

---

Nitpick comments:
In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`:
- Around line 26-48: Add an edge-case unit test that passes an empty toolCalls
array to traceWithToolCalls and asserts classifyTraceProgress returns a
reconnaissance-only result; specifically add a test like "classifyTraceProgress
treats empty trace as reconnaissance-only" that constructs const trace =
traceWithToolCalls([]), calls const result = classifyTraceProgress(trace), and
asserts result.isReadOnlyReconnaissanceOnly === true so the classifier’s
behavior for empty traces is explicitly verified.
```

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro Plus

**Run ID**: `53ea4df4-a8f1-4861-bc69-6abbcb6e0cd4`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between b36c97c54feaf1395a6d27a33d137f68a84ae979 and 3ee901cc2f1ee3f9d0447ee7a76b3f3bdafc1583.

</details>

<details>
<summary>📒 Files selected for processing (4)</summary>

* `src/resources/extensions/gsd/auto-post-unit.ts`
* `src/resources/extensions/gsd/session-forensics.ts`
* `src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts`
* `src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (3)</summary>

* src/resources/extensions/gsd/session-forensics.ts
* src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts
* src/resources/extensions/gsd/auto-post-unit.ts

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Approved status-check tasks:
None

Approved documentation tasks:
None

When done:
1) Run the relevant verification for your changes.
2) Leave any changed files in the worktree for the babysitter app to finalize.
3) For each feedback item you addressed or were blocked on, emit a summary block in the following format:
   FEEDBACK_SUMMARY_START <auditToken>
   <A concise 1-2 sentence summary of what you did or why you were blocked>
   FEEDBACK_SUMMARY_END
   Include one block per audit token. These summaries will be posted as follow-up comments on the PR.
4) If documentation tasks were assigned, emit exactly one docs summary block in the following format:
   DOCS_SUMMARY_START <changed|no_change>
   <A concise 1-2 sentence summary of the docs you updated, or why no docs changes were necessary after inspection>
   DOCS_SUMMARY_END

@jeremymcs
Collaborator Author

Addressed in commit 37c92c7 by the latest babysitter run.

Responding to comment by @coderabbitai[bot]:
Actionable comments posted: 1

Added an explicit empty-trace edge-case test and aligned it with the current classifier semantics (toolCalls.length === 0 is not reconnaissance-only), so the behavior is now documented and guarded by a test.
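
For illustration, a minimal sketch of the empty-trace semantics described above. The type shapes, the function name `classifyTraceProgressSketch`, and every entry in the tool-name set other than "skill" are assumptions for this example, not the actual code in session-forensics.ts.

```typescript
// Hypothetical shapes; the real types in session-forensics.ts may differ.
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
  isError: boolean;
}

interface ExecutionTrace {
  toolCalls: ToolCall[];
}

// Assumed contents: only "skill" is confirmed by this thread.
const READ_ONLY_TOOL_NAMES = new Set<string>(["skill", "read", "grep"]);

function classifyTraceProgressSketch(
  trace: ExecutionTrace,
): { isReadOnlyReconnaissanceOnly: boolean } {
  const { toolCalls } = trace;
  // An empty trace carries no evidence of read-only activity, so it is not
  // classified as reconnaissance-only. [].every(...) would return true,
  // which is why the length check comes first.
  if (toolCalls.length === 0) {
    return { isReadOnlyReconnaissanceOnly: false };
  }
  return {
    isReadOnlyReconnaissanceOnly: toolCalls.every((call) =>
      READ_ONLY_TOOL_NAMES.has(call.name),
    ),
  };
}
```

The explicit length guard is the design point: without it, `Array.prototype.every` on an empty array is vacuously true and an empty trace would be reported as reconnaissance-only.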

@coderabbitai
Contributor

coderabbitai Bot commented May 16, 2026


@jeremymcs: Good call on the toolCalls.length === 0 semantics — treating an empty trace as not reconnaissance-only is the safer default (no evidence of read-only activity ≠ confirmed read-only activity), and pinning that invariant in a test prevents silent regressions if the classifier logic shifts later. Thanks for closing the loop!

🐇 ✨

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
src/resources/extensions/gsd/commands-prefs-wizard.ts (1)

590-593: ⚡ Quick win

Comment inconsistency: internal field name.

The inline comment mentions this.models but the test fixture in prefs-wizard-coverage.test.ts (line 185) shows the real implementation uses this._all. Update the comment for accuracy.

📝 Suggested fix
-  // Call getAllWithDiscovered as a method so `this` stays bound to the
-  // registry — invoking a detached reference loses `this` and the method's
-  // internal `this.models` access throws.
+  // Call getAllWithDiscovered as a method so `this` stays bound to the
+  // registry — invoking a detached reference loses `this` and the method's
+  // internal `this._all` access throws.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/resources/extensions/gsd/commands-prefs-wizard.ts` around lines 590 -
593, Comment mentions internal field `this.models` but real implementation uses
`this._all`; update the inline comment near the registry declaration and any
related explanation to reference `this._all` (and `getAllWithDiscovered` /
`registry`) so the note accurately describes why the method must be called with
the registry bound (i.e., to preserve access to this._all).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/resources/extensions/gsd/commands-prefs-wizard.ts`:
- Around line 590-593: Comment mentions internal field `this.models` but real
implementation uses `this._all`; update the inline comment near the registry
declaration and any related explanation to reference `this._all` (and
`getAllWithDiscovered` / `registry`) so the note accurately describes why the
method must be called with the registry bound (i.e., to preserve access to
this._all).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1ce65604-24c1-4546-9560-34bd84b339d2

📥 Commits

Reviewing files that changed from the base of the PR and between 37c92c7 and 9aa8fec.

📒 Files selected for processing (2)
  • src/resources/extensions/gsd/commands-prefs-wizard.ts
  • src/resources/extensions/gsd/tests/prefs-wizard-coverage.test.ts

@jeremymcs
Collaborator Author

jeremymcs commented May 16, 2026

Accepted — this comment requires code changes. Queuing fix...
🧰 Agent running: codex is working on the fix...
Agent completed — verifying changes...
🎉 Resolved — addressed in commit 3dfb7a4.

@jeremymcs
Collaborator Author

🤖 Dispatched codex with the following prompt:

Agent prompt (click to expand)
You are acting as an autonomous PR babysitter for gsd-build/gsd-2 PR #6187.
PR URL: https://github.com/gsd-build/gsd-2/pull/6187
Base repository: gsd-build/gsd-2
Head repository: gsd-build/gsd-2
Head branch: issue/5733-auto-mode-artifact-verification-failures-1778892830
Head remote: origin
You are running inside an isolated app-owned worktree under ~/.patchdeck.
Make only targeted changes that resolve the approved tasks.
Do not wait for user input, confirmation, or approval at any point.
Do not rewrite unrelated files.
Use the available git tooling for inspection and verification only.
If dependencies are missing, install them using the repository's lockfile/package manager as needed inside this isolated worktree.
Leave file edits uncommitted; the babysitter app will handle Git finalization after your run.
GitHub follow-up replies and review-thread resolution will be handled by the babysitter after your run.
If a task is invalid after inspection, explain it in your final response and include the exact audit token.

Approved review-comment tasks:
1. [review] coderabbitai[bot]
   file=n/a line=n/a
   sourceId=4302585229 sourceUrl=https://github.com/gsd-build/gsd-2/pull/6187#pullrequestreview-4302585229
   replyKind=review threadId=n/a threadResolved=n/a
   auditToken=codefactory-feedback:gh-review-4302585229
   

<details>
<summary>🧹 Nitpick comments (1)</summary><blockquote>

<details>
<summary>src/resources/extensions/gsd/commands-prefs-wizard.ts (1)</summary><blockquote>

`590-593`: _⚡ Quick win_

**Comment inconsistency: internal field name.**

The inline comment mentions `this.models` but the test fixture in `prefs-wizard-coverage.test.ts` (line 185) shows the real implementation uses `this._all`. Update the comment for accuracy.


<details>
<summary>📝 Suggested fix</summary>

```diff
-  // Call getAllWithDiscovered as a method so `this` stays bound to the
-  // registry — invoking a detached reference loses `this` and the method's
-  // internal `this.models` access throws.
+  // Call getAllWithDiscovered as a method so `this` stays bound to the
+  // registry — invoking a detached reference loses `this` and the method's
+  // internal `this._all` access throws.
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/resources/extensions/gsd/commands-prefs-wizard.ts` around lines 590 -
593, Comment mentions internal field `this.models` but real implementation uses
`this._all`; update the inline comment near the registry declaration and any
related explanation to reference `this._all` (and `getAllWithDiscovered` /
`registry`) so the note accurately describes why the method must be called with
the registry bound (i.e., to preserve access to this._all).
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

```
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/resources/extensions/gsd/commands-prefs-wizard.ts`:
- Around line 590-593: Comment mentions internal field `this.models` but real
implementation uses `this._all`; update the inline comment near the registry
declaration and any related explanation to reference `this._all` (and
`getAllWithDiscovered` / `registry`) so the note accurately describes why the
method must be called with the registry bound (i.e., to preserve access to
this._all).
```

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro Plus

**Run ID**: `1ce65604-24c1-4546-9560-34bd84b339d2`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 37c92c78e5d15ad9ee1a901a1228aed16bf2fbf5 and 9aa8fec7524d135f82c82f3a315b6f1954bd369f.

</details>

<details>
<summary>📒 Files selected for processing (2)</summary>

* `src/resources/extensions/gsd/commands-prefs-wizard.ts`
* `src/resources/extensions/gsd/tests/prefs-wizard-coverage.test.ts`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Approved status-check tasks:
None

Approved documentation tasks:
None

When done:
1) Run the relevant verification for your changes.
2) Leave any changed files in the worktree for the babysitter app to finalize.
3) For each feedback item you addressed or were blocked on, emit a summary block in the following format:
   FEEDBACK_SUMMARY_START <auditToken>
   <A concise 1-2 sentence summary of what you did or why you were blocked>
   FEEDBACK_SUMMARY_END
   Include one block per audit token. These summaries will be posted as follow-up comments on the PR.
4) If documentation tasks were assigned, emit exactly one docs summary block in the following format:
   DOCS_SUMMARY_START <changed|no_change>
   <A concise 1-2 sentence summary of the docs you updated, or why no docs changes were necessary after inspection>
   DOCS_SUMMARY_END

@jeremymcs
Collaborator Author

Addressed in commit 3dfb7a4 by the latest babysitter run.

Responding to comment by @coderabbitai[bot]:

Verified the finding was still valid and made a minimal one-line comment correction in commands-prefs-wizard.ts, updating this.models to this._all to match the real getAllWithDiscovered internals. Validation was attempted, but the repo currently has pre-existing TypeScript syntax errors in prefs-wizard-coverage.test.ts that block test/typecheck execution.

@coderabbitai
Contributor

coderabbitai Bot commented May 16, 2026


@jeremymcs: Good: the this.models → this._all correction is the right call; that's the actual backing field iterated by getAllWithDiscovered, so the this-binding fix is now consistent end-to-end.
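
As a rough sketch of why the method has to be called on the registry rather than through a detached reference: the class below is a hypothetical stand-in, and only the `_all` field name and the `getAllWithDiscovered` method name are taken from this thread.

```typescript
// Hypothetical stand-in for the model registry; the real class may differ.
class RegistrySketch {
  private _all: string[] = ["model-a", "model-b"];

  getAllWithDiscovered(): string[] {
    // Reads instance state through `this`, so it needs a bound receiver.
    return [...this._all];
  }
}

const registry = new RegistrySketch();

// Method call: `this` is the registry instance.
const viaMethod = registry.getAllWithDiscovered();

// Detached reference: class bodies run in strict mode, so `this` is
// undefined and the `this._all` access throws a TypeError.
const detached = registry.getAllWithDiscovered;
let detachedThrew = false;
try {
  detached();
} catch (err) {
  detachedThrew = err instanceof TypeError;
}

// Binding restores the receiver when a standalone function is needed.
const bound = registry.getAllWithDiscovered.bind(registry);
const viaBound = bound();
```

Calling `registry.getAllWithDiscovered()` directly, or passing a bound function, keeps the receiver intact; which of the two fits best depends on how the caller consumes the result.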

On the pre-existing TypeScript syntax errors in prefs-wizard-coverage.test.ts blocking typecheck/test execution — that's worth tracking separately so it doesn't quietly persist. Would you like me to open a follow-up issue for it?

@jeremymcs
Collaborator Author

🤖 Dispatched codex with the following prompt:

Agent prompt:
You are acting as an autonomous PR babysitter for gsd-build/gsd-2 PR #6187.
PR URL: https://github.com/gsd-build/gsd-2/pull/6187
Base repository: gsd-build/gsd-2
Head repository: gsd-build/gsd-2
Head branch: issue/5733-auto-mode-artifact-verification-failures-1778892830
Base branch: main
Head remote: origin
You are running inside an isolated app-owned worktree under ~/.patchdeck.

A merge from the base branch into the head branch has been started but has conflicts.
The following files have merge conflicts:
  - src/resources/extensions/gsd/commands-prefs-wizard.ts

Your task:
1) Resolve ALL merge conflicts in the listed files.
2) Preserve the intent of both the base branch and head branch changes.
3) When in doubt, prefer the head branch (PR) changes, since that is the author's work.
4) Leave the resolved files in the worktree. The babysitter app will handle Git finalization.
5) Do not stage files, create commits, or push branches.
6) Summarize what you resolved in your final response.

Do not wait for user input, confirmation, or approval at any point.
Do not rewrite unrelated files.
Use the available git tooling for inspection and verification only.

@jeremymcs
Collaborator Author

Code-owner check: validated latest actionable feedback on current HEAD; no additional code changes required.

@jeremymcs
Collaborator Author

Code-owner follow-up on latest review feedback at commit 3dfb7a4f5bd97761c0485ba01c5bfa0b82835d98:

  1. auto-post-unit.ts completion-tool hint gating: accepted, already fixed at HEAD.
  • Verified completionToolHint is now conditional on unitType === "execute-task" and !hasTaskCompletionToolCall(agentEndMessages).
  • Verified by test: missing execute-task artifact skips completion-tool hint when completion tool call is present.
  2. session-forensics.ts read-only classifier too permissive: accepted, already fixed at HEAD.
  • Verified regex now rejects shell control/meta characters in gsd_exec commands.
  • Verified by tests for mutating, shell-chained, and script-eval command forms.
  3. session-forensics-readonly-classification.test.ts add adversarial tests: accepted, already fixed at HEAD.
  • Verified tests exist for cat file && echo x > y and script-eval command patterns; both assert non-reconnaissance.
  4. session-forensics-readonly-classification.test.ts add skill-alone and read-only-exec-alone tests: accepted, already fixed at HEAD.
  • Verified both tests exist and pass.
  5. session-forensics-readonly-classification.test.ts empty trace should be reconnaissance-only: rejected.
  • Current implementation explicitly treats empty traces as not reconnaissance-only (toolCalls.length > 0 guard), and test coverage now locks this behavior intentionally.
  6. commands-prefs-wizard.ts comment should reference this._all: accepted, already fixed at HEAD.
  • Verified inline comment now references this._all.
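
For readers following along without the diff open, the behaviors verified in the list above can be approximated with the sketch below. It is illustrative only and not the code in session-forensics.ts or auto-post-unit.ts: the ToolCall shape, the allowlists, the SHELL_META pattern, and the helper names isReadOnlyExec, isReconnaissanceOnly, and shouldAddCompletionToolHint are all assumptions. Only the toolCalls.length > 0 guard, the rejection of shell control/meta characters, and the unitType === "execute-task" condition come from this thread.

```typescript
// Hypothetical shape; the real trace types in gsd-2 are not shown in this thread.
interface ToolCall {
  name: string;
  command?: string; // only meaningful for gsd_exec calls
}

// Assumed allowlists. Only `skill` and `gsd_exec` are named in the PR.
const READ_ONLY_TOOLS = new Set(["read", "grep", "ls", "skill"]);
const READ_ONLY_COMMANDS = new Set(["cat", "ls", "grep", "head", "tail", "wc"]);

// Assumed pattern: characters that can chain, redirect, or substitute commands.
const SHELL_META = /[;&|<>`$(){}\\\n]/;

function isReadOnlyExec(command: string): boolean {
  const trimmed = command.trim();
  // Reject empty commands and anything containing shell control characters.
  if (trimmed === "" || SHELL_META.test(trimmed)) return false;
  // Script-eval forms such as `node -e ...` fail here: the binary is not allowlisted.
  const binary = trimmed.split(/\s+/)[0] ?? "";
  return READ_ONLY_COMMANDS.has(binary);
}

function isReconnaissanceOnly(toolCalls: ToolCall[]): boolean {
  // Empty traces are deliberately NOT reconnaissance-only (length > 0 guard).
  return (
    toolCalls.length > 0 &&
    toolCalls.every((call) =>
      call.name === "gsd_exec"
        ? isReadOnlyExec(call.command ?? "")
        : READ_ONLY_TOOLS.has(call.name),
    )
  );
}

// Gate for the completion-tool hint: only execute-task units that never
// called the completion tool should see it.
function shouldAddCompletionToolHint(
  unitType: string,
  hasCompletionToolCall: boolean,
): boolean {
  return unitType === "execute-task" && !hasCompletionToolCall;
}
```

Under these assumptions, a trace containing only cat README.md classifies as reconnaissance-only, while cat file && echo x > y does not, because the && and > characters trip the meta-character check before the allowlist is consulted.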

Verification run:

  • node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts
  • Result: 9 passed, 0 failed.

Note: I could post this comment, but I could not query thread IDs/resolution state via GraphQL in this environment (error connecting to api.github.com), so I could not resolve individual conversations programmatically from here.

Development

Successfully merging this pull request may close these issues.

auto-mode: artifact verification failures pause with low-actionability diagnostics