fix(issue): auto-mode: artifact verification failures pause with low-actionability diagnostics#6187
Conversation
missing-artifact failures now include completion-contract guidance and an explicit execute-task completion-tool hint.
added read-only reconnaissance classification coverage for `skill` and read-only `gsd_exec` command patterns.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR enriches artifact verification failure diagnostics with expected-artifact context and execute-task completion-tool hints, adds trace classification to detect read-only reconnaissance-only tool calls, and fixes a this-binding in the prefs wizard with corresponding tests. ChangesArtifact Verification and Trace Classification
Prefs Wizard this-binding fix
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔴 PR Risk Report — CRITICAL
Affected Systems
File Breakdown
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts (1)
17-32: ⚡ Quick winAdd adversarial cases for shell chaining/script execution.
Please add regression tests for commands like
cat file && echo x > yand script-eval forms (python -c ...,node --print ...) to lock down read-only classification boundaries.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts` around lines 17 - 32, Add regression tests in session-forensics-readonly-classification.test.ts that exercise shell-chaining and script-eval adversarial cases against classifyTraceProgress: create traces using traceWithToolCalls invoking gsd_exec with a chained command like "cat file && echo x > y" and assert result.isReadOnlyReconnaissanceOnly === false, and another trace using a script-eval form (e.g., "python -c \"...write...\"" or "node -e \"require('fs').writeFileSync(...)\"") and assert isReadOnlyReconnaissanceOnly === false; keep the test patterns consistent with existing tests (use traceWithToolCalls and classifyTraceProgress) so the new cases lock down read-only classification boundaries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/resources/extensions/gsd/auto-post-unit.ts`:
- Around line 458-461: The message unconditionally appends a "No completion tool
call detected" hint for unitType === "execute-task" even though this function
doesn't inspect the unit's trace history; update the logic that builds
completionToolHint (the completionToolHint variable and the branch checking
unitType === "execute-task") so it only adds that hint after confirming the
unit's trace/tool-call history actually lacks a gsd_task_complete (and its known
aliases) entry—i.e., accept or access the unit's trace/toolCalls context, scan
for any call named "gsd_task_complete" (or alias) and only set
completionToolHint when none are found; keep the rest of the artifact-not-found
message unchanged.
In `@src/resources/extensions/gsd/session-forensics.ts`:
- Around line 79-87: The current READ_ONLY_EXEC_COMMAND_RE and
isReadOnlyReconnaissanceTool allow commands like "python -c '...; rm -rf /'" by
matching only a safe prefix; tighten the classifier by rejecting any command
that contains shell metacharacters or piping/redirection after the matched
read-only prefix. Update either READ_ONLY_EXEC_COMMAND_RE or add a secondary
check in isReadOnlyReconnaissanceTool (using the local variable command) to
ensure the remainder of the command contains only safe token characters/flags
(e.g., letters, numbers, dashes, dots) and explicitly disallow characters like ;
& | > < $ ` ( ) $() || && `>` and shell control operators, while still allowing
simple version/flag forms; keep READ_ONLY_TOOL_NAMES behavior intact and return
false for any command failing this stricter validation.
---
Nitpick comments:
In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`:
- Around line 17-32: Add regression tests in
session-forensics-readonly-classification.test.ts that exercise shell-chaining
and script-eval adversarial cases against classifyTraceProgress: create traces
using traceWithToolCalls invoking gsd_exec with a chained command like "cat file
&& echo x > y" and assert result.isReadOnlyReconnaissanceOnly === false, and
another trace using a script-eval form (e.g., "python -c \"...write...\"" or
"node -e \"require('fs').writeFileSync(...)\"") and assert
isReadOnlyReconnaissanceOnly === false; keep the test patterns consistent with
existing tests (use traceWithToolCalls and classifyTraceProgress) so the new
cases lock down read-only classification boundaries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 605882d3-ccb5-4067-ad5f-4b435e6ff540
📒 Files selected for processing (4)
src/resources/extensions/gsd/auto-post-unit.tssrc/resources/extensions/gsd/session-forensics.tssrc/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.tssrc/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts
|
⏳ Accepted — this comment requires code changes. Queuing fix... |
1 similar comment
|
⏳ Accepted — this comment requires code changes. Queuing fix... |
|
🤖 Dispatched Agent prompt (click to expand) |
|
Addressed in commit
Reviewed the general CodeRabbit walkthrough/check output and confirmed the actionable items were the two inline review concerns plus added regression tests; no separate code change was required from the summary-only comment itself. |
|
Addressed in commit
Added adversarial regression coverage for shell chaining ( |
|
Tip For best results, initiate chat on the files or code changes.
🐇 ✨ |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts (1)
26-48: ⚡ Quick winConsider adding an empty trace edge case test.
The current tests cover various command patterns, but don't verify behavior when
toolCallsis empty. An empty trace is a common edge case that should have explicit, tested behavior.📋 Proposed empty trace test
test("classifyTraceProgress treats empty trace as reconnaissance-only", () => { const trace = traceWithToolCalls([]); const result = classifyTraceProgress(trace); // Adjust expected value based on desired semantics: // true if "no mutations" implies read-only, false if "no actions" is distinct assert.equal(result.isReadOnlyReconnaissanceOnly, true); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts` around lines 26 - 48, Add an edge-case unit test that passes an empty toolCalls array to traceWithToolCalls and asserts classifyTraceProgress returns a reconnaissance-only result; specifically add a test like "classifyTraceProgress treats empty trace as reconnaissance-only" that constructs const trace = traceWithToolCalls([]), calls const result = classifyTraceProgress(trace), and asserts result.isReadOnlyReconnaissanceOnly === true so the classifier’s behavior for empty traces is explicitly verified.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`:
- Around line 17-24: Add two unit tests to
src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts
that call classifyTraceProgress with traces created by traceWithToolCalls
containing only the "skill" tool and only the read-only "gsd_exec" command
respectively; for each test assert that result.isReadOnlyReconnaissanceOnly ===
true so we verify that classifyTraceProgress detects "skill" alone and read-only
"gsd_exec" alone as reconnaissance-only rather than requiring both together.
---
Nitpick comments:
In
`@src/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts`:
- Around line 26-48: Add an edge-case unit test that passes an empty toolCalls
array to traceWithToolCalls and asserts classifyTraceProgress returns a
reconnaissance-only result; specifically add a test like "classifyTraceProgress
treats empty trace as reconnaissance-only" that constructs const trace =
traceWithToolCalls([]), calls const result = classifyTraceProgress(trace), and
asserts result.isReadOnlyReconnaissanceOnly === true so the classifier’s
behavior for empty traces is explicitly verified.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 53ea4df4-a8f1-4861-bc69-6abbcb6e0cd4
📒 Files selected for processing (4)
src/resources/extensions/gsd/auto-post-unit.tssrc/resources/extensions/gsd/session-forensics.tssrc/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.tssrc/resources/extensions/gsd/tests/session-forensics-readonly-classification.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- src/resources/extensions/gsd/session-forensics.ts
- src/resources/extensions/gsd/tests/auto-post-unit-artifact-diagnostic.test.ts
- src/resources/extensions/gsd/auto-post-unit.ts
|
⏳ Accepted — this comment requires code changes. Queuing fix... |
|
🤖 Dispatched Agent prompt (click to expand) |
|
Addressed in commit
Added an explicit empty-trace edge-case test and aligned it to current classifier semantics ( |
|
Tip For best results, initiate chat on the files or code changes.
🐇 ✨ |
📝 CodeRabbit Chat: Implement requested code changes
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/resources/extensions/gsd/commands-prefs-wizard.ts (1)
590-593: ⚡ Quick winComment inconsistency: internal field name.
The inline comment mentions
this.modelsbut the test fixture inprefs-wizard-coverage.test.ts(line 185) shows the real implementation usesthis._all. Update the comment for accuracy.📝 Suggested fix
- // Call getAllWithDiscovered as a method so `this` stays bound to the - // registry — invoking a detached reference loses `this` and the method's - // internal `this.models` access throws. + // Call getAllWithDiscovered as a method so `this` stays bound to the + // registry — invoking a detached reference loses `this` and the method's + // internal `this._all` access throws.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/resources/extensions/gsd/commands-prefs-wizard.ts` around lines 590 - 593, Comment mentions internal field `this.models` but real implementation uses `this._all`; update the inline comment near the registry declaration and any related explanation to reference `this._all` (and `getAllWithDiscovered` / `registry`) so the note accurately describes why the method must be called with the registry bound (i.e., to preserve access to this._all).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/resources/extensions/gsd/commands-prefs-wizard.ts`:
- Around line 590-593: Comment mentions internal field `this.models` but real
implementation uses `this._all`; update the inline comment near the registry
declaration and any related explanation to reference `this._all` (and
`getAllWithDiscovered` / `registry`) so the note accurately describes why the
method must be called with the registry bound (i.e., to preserve access to
this._all).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 1ce65604-24c1-4546-9560-34bd84b339d2
📒 Files selected for processing (2)
src/resources/extensions/gsd/commands-prefs-wizard.tssrc/resources/extensions/gsd/tests/prefs-wizard-coverage.test.ts
|
⏳ Accepted — this comment requires code changes. Queuing fix... |
|
🤖 Dispatched Agent prompt (click to expand) |
|
Addressed in commit
Verified the finding was still valid and made a minimal one-line comment correction in |
|
Tip For best results, initiate chat on the files or code changes.
On the pre-existing TypeScript syntax errors in |
|
🤖 Dispatched Agent prompt (click to expand) |
|
Code-owner check: validated latest actionable feedback on current HEAD; no additional code changes required. |
|
Code-owner follow-up on latest review feedback at commit
Verification run:
Note: I could post this comment, but I could not query thread IDs/resolution state via GraphQL in this environment ( |
Summary
Bugs Addressed
Verification
Related Issue
Repo
gsd-build/gsd-2Branch
issue/5733-auto-mode-artifact-verification-failures-1778892830Summary by CodeRabbit
New Features
Tests