Skip to content

Port consumer-repo preflight and coord hardening#74

Merged
justin808 merged 2 commits into
mainfrom
jg/port-ror-preflight-coord-hardening
Jul 5, 2026
Merged

Port consumer-repo preflight and coord hardening#74
justin808 merged 2 commits into
mainfrom
jg/port-ror-preflight-coord-hardening

Conversation

@justin808

@justin808 justin808 commented Jul 4, 2026

Copy link
Copy Markdown
Member

Summary

react_on_rails' repo-pinned copies of pr-security-preflight and agent-coord-bounded diverged ahead of this shared pack with deliberate hardening that was never upstreamed. The skill-picker dedupe re-sync (react_on_rails #4356) would silently revert that hardening downstream if the shared pack lacks it.

Closeout note: this branch has now been merged with current main. The pr-security-preflight implementation/test hardening from the original PR is already present in current mainline, so the remaining current-head diff preserves that implementation, adds the missing operator documentation/changelog coverage for those preflight semantics, and carries the agent-coord-bounded implementation/test hardening.

pr-security-preflight

  • Documents fail-closed handling when AGENT_WORKFLOWS_TRUST_CONFIG points to a missing file.
  • Documents explicit --trust-config repo-local vs user-global classification and owner-qualified global team slugs.
  • Documents suspicious-text warning scans for trusted metadata-bot content.
  • Documents keeping blocking-pattern warnings visible for resolved trusted-bot/metadata-bot review threads.
  • Documents that trusted-source diff-warning downgrades require full source-actor coverage, including timelineItems.

agent-coord-bounded

  • Preserve captured child stdout/stderr when exiting on interrupt or timeout.
  • Wait for the whole process group to exit during termination, not just the direct child.
  • Treat defunct helper processes as no longer live in the test liveness helper, so cleanup assertions do not fail on unreaped zombies.

Provenance: react_on_rails #4148/#4151/#4170/#4288 (preflight) and #4161/#4381 (agent-coord-bounded).

Review and closeout

  • Addressed Codex review: the process-cleanup test no longer treats zombie helpers as live processes.
  • Addressed Claude review: docs/trust-and-preflight.md now covers metadata-bot suspicious-text warnings, resolved-thread blocking-pattern visibility, and timeline coverage requirements.
  • Addressed Claude review: the duplicate actor_trusted_for_suspicious_warning? predicate is gone after merging current mainline, which already uses allowed_metadata_actor? / trusted_metadata_text?.

Validation

  • PR_BATCH_SKILL_DIR=skills/pr-batch skills/pr-batch/bin/pr-security-preflight --repo shakacode/agent-workflows 74SECURITY_PREFLIGHT_OK.
  • ruby skills/pr-batch/bin/agent-coord-bounded-test.rb — 14 runs, 81 assertions, 0 failures.
  • ruby skills/pr-batch/bin/pr-security-preflight-test.rb — 94 runs, 760 assertions, 0 failures.
  • bin/validatePASS agent-workflows validation.
  • codex review --base origin/main — no actionable correctness issues.
  • Claude CLI review pass attempted with a verified diff and $5 budget cap, but local Claude reported weekly quota exhausted until July 6 at 9pm Pacific/Honolulu; no Claude local result was used as approval evidence.

Follow-up

Once this merges, react_on_rails #4356 should re-sync its repo-pinned helper copies from the updated shared pack (for example with bin/push-downstream), restoring full parity with zero behavior loss.

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of interrupted and timed-out runs so captured output is now preserved and shown before exit.
    • Made process termination more reliable by waiting for the full process group to exit and escalating when needed.
    • Tightened trust and preflight behavior so missing or unclear trust settings are handled more safely, with clearer warning visibility in reviews and comments.
  • Documentation

    • Clarified trust-config path rules and how warning-related metadata is interpreted in security preflight checks.

Port react_on_rails repo-local hardening into the shared pack so the
skill-picker dedupe re-sync (react_on_rails #4356) does not revert it:

- pr-security-preflight: abort when AGENT_WORKFLOWS_TRUST_CONFIG points
  to a missing file; treat explicit --trust-config paths outside the
  consuming repo's git root as user-global and warn on ignored
  unqualified team slugs; scan trusted metadata-bot comments for
  suspicious-text warnings; keep blocking-pattern warnings visible on
  resolved trusted-bot review threads; require full source-actor
  timeline coverage before trusting a PR source for diff-warning
  downgrades.
- agent-coord-bounded: preserve captured stdout/stderr on interrupt and
  timeout exits; wait for the whole process group to exit during
  termination.

Ported from react_on_rails #4148/#4151/#4170/#4288 (preflight) and
#4161/#4381 (agent-coord-bounded), re-expressed on top of the
--strict-trust semantics from #48.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jul 4, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This PR ports hardening documentation and behavior changes. Documentation updates clarify trust-config resolution and suspicious-text handling for pr-security-preflight. The agent-coord-bounded script gains process-group liveness checks and output flushing on interrupt/timeout/completion, with corresponding test updates including zombie-process detection.

Changes

Trust and Preflight Documentation

Layer / File(s) Summary
Trust-config and suspicious-text docs
docs/trust-and-preflight.md, CHANGELOG.md
Documents repo-local vs user-global --trust-config PATH classification rules and trusted metadata-bot suspicious-text warning/downgrade behavior including timeline coverage requirements; changelog records both hardening ports.

agent-coord-bounded Termination Hardening

Layer / File(s) Summary
Process-group liveness and termination refactor
skills/pr-batch/bin/agent-coord-bounded
Adds process_group_alive? and wait_for_process_group_exit helpers and refactors terminate_process_group to wait for the whole process group across signal and KILL escalation phases.
Captured output flushing
skills/pr-batch/bin/agent-coord-bounded
Adds flush_captured_output and wires it into interrupt, timeout, and normal-completion exit paths, replacing inline dump logic.
Test coverage updates
skills/pr-batch/bin/agent-coord-bounded-test.rb
Updates timeout/interruption tests to assert flushed output and exact timeout messages, adds a TERM-output regression test, and updates process_alive? with a new zombie_process? helper.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Script as agent-coord-bounded
  participant ProcessGroup
  participant CapturedOutput as stdout/stderr tempfiles

  Script->>ProcessGroup: send configured signal (terminate)
  Script->>ProcessGroup: wait_for_process_group_exit(deadline)
  alt still alive after deadline
    Script->>ProcessGroup: send KILL signal
    Script->>ProcessGroup: wait_for_process_group_exit(deadline)
  end
  ProcessGroup-->>Script: process exited (or interrupted/timed out)
  Script->>CapturedOutput: flush_captured_output(stdout_file, stderr_file)
  CapturedOutput-->>Script: rewind and print contents
  Script-->>Script: exit with computed status code
Loading

Related PRs: None identified.

Suggested labels: documentation, skills, testing

Suggested reviewers: None identified.

🐰 A process group waits, then bows to KILL,
Stdout and stderr flushed at will,
Trust configs mapped by git remote's name,
Zombies detected, tests do the same,
Hardening ported, quiet and still.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: porting preflight and coordinator hardening into the shared pack.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch jg/port-ror-preflight-coord-hardening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b4560fd63b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/pr-batch/bin/agent-coord-bounded-test.rb
Comment thread docs/trust-and-preflight.md Outdated
Comment thread skills/pr-batch/bin/pr-security-preflight Outdated
@claude

claude Bot commented Jul 4, 2026

Copy link
Copy Markdown

Review summary

Went through the ported hardening for pr-security-preflight and agent-coord-bounded line by line against the PR description's claims, traced the new logic paths (resolved_trust_config, path_inside_git_root?/canonical_path, source_actor_coverage_findings, the resolved-trusted-bot blocking-pattern change in suspicious_pr_review_comments, and the agent-coord-bounded process-group/output-flush changes) against their tests, and checked for portability issues per AGENTS.md.

Findings posted inline:

  • docs/trust-and-preflight.md — the doc only documents 2 of the 5 behavior changes in this PR (fail-closed env var, repo-local vs. global --trust-config). The metadata-bot suspicious-text scanning, resolved-thread blocking-pattern visibility, and timeline-coverage requirement changes aren't mentioned, even though this file is the canonical description of the trust model operators rely on.
  • skills/pr-batch/bin/pr-security-preflightactor_trusted_for_suspicious_warning? duplicates the existing allowed_metadata_actor? predicate (identical logic). Minor simplification opportunity, not a bug.

What checked out fine:

  • resolved_trust_config/path_inside_git_root?: correctly fails closed on a missing AGENT_WORKFLOWS_TRUST_CONFIG target, and correctly classifies an explicit --trust-config outside the consuming repo's git root as global (requiring OWNER/team-slug). canonical_path uses File.realpath with an expand_path fallback, guarding against /tmp symlink mismatches in tests.
  • source_actor_coverage_findings correctly folds the timelineItems pagination-coverage finding into trusted_pr_source?, verified against the new truncated-timeline-warning-diff test fixture.
  • The suspicious_pr_review_comments change correctly keeps blocking-pattern findings on resolved trusted-bot comments visible (via pattern.equal?(BLOCKING_SUSPICIOUS_PATTERN) identity check against the two pattern constants) while still suppressing warning-pattern noise on resolved threads — confirmed against the updated test_resolved_trusted_bot_review_comment_with_suspicious_text_does_not_block and friends.
  • agent-coord-bounded's wait_for_process_group_exit/flush_captured_output additions are wired into all three exit paths (interrupted mid-wait, interrupted post-wait, timed out) plus the normal-exit path — no path was missed, and output is flushed only after terminate_process_group returns, so the captured stdout/stderr tempfiles are complete by the time they're read.
  • No shell-injection or unsafe interpolation: Process.spawn calls use array-form commands throughout, git_toplevel uses Open3.capture3 with array args, and the new RbConfig.ruby.dump interpolation in the added test is properly escaped.
  • No hardcoded consumer-repo names/paths/branches introduced in skills/; the justin808 test login is a pre-existing fixture convention, not new.

I wasn't able to execute ruby -c / the Ruby test suites in this sandbox (tool execution for ruby was blocked), so I relied on close manual tracing of the new logic against the accompanying tests rather than running them — worth a maintainer running bin/validate and the two test files locally before merge, per the PR's own validation notes.

@justin808

Copy link
Copy Markdown
Member Author

Address-review summary

Scan scope: full PR #74 review-thread closeout through head ff0088e2c5709d1c8529814417620ce28833d9e7.

Mattered

  • Codex P2 zombie-helper test finding: fixed in ff0088e; process_alive? now treats Z/defunct helper processes as no longer live, and the thread was replied to and resolved.
  • Claude docs finding: fixed in ff0088e; docs/trust-and-preflight.md now documents metadata-bot suspicious-text warnings, resolved-thread blocking-pattern visibility, and timelineItems source-actor coverage requirements, and the thread was replied to and resolved.
  • Claude duplicate-predicate finding: resolved in ff0088e by merging current mainline; the duplicate helper is gone and the current helper uses allowed_metadata_actor? / trusted_metadata_text?, and the thread was replied to and resolved.

Optional

  • None.

Skipped

  • None.

Validation recorded in the PR body: ruby skills/pr-batch/bin/agent-coord-bounded-test.rb, ruby skills/pr-batch/bin/pr-security-preflight-test.rb, bin/validate, PR_SECURITY_PREFLIGHT_OK, and codex review --base origin/main clean.

Remaining non-code gate: hosted claude-review failed before model work with is_error=true, total_cost_usd=0, num_turns=1, pointing at Claude auth/quota/secret state. This summary does not waive that external gate.

Next default scan starts after this comment. Say check all reviews to rescan the full PR.

@claude

claude Bot commented Jul 5, 2026

Copy link
Copy Markdown

Review

Scope: this diff only touches CHANGELOG.md, docs/trust-and-preflight.md, skills/pr-batch/bin/agent-coord-bounded, and its test script — pr-security-preflight itself is unchanged here (already merged to main per the PR description), so the review focused on the agent-coord-bounded process-group hardening and doc accuracy.

Correctness — agent-coord-bounded

  • Traced terminate_process_group -> wait_for_process_group_exit -> process_group_alive? end-to-end, including the shared term_deadline/kill_deadline budget reuse between wait_for_child and wait_for_process_group_exit. No double-counted or unbounded waits; the TERM -> (wait for group) -> KILL escalation is correctly bounded by grace_seconds/kill_grace_seconds in all branches (interrupt, timeout, and normal exit).
  • flush_captured_output is called exactly once per exit path (each branch calls exit immediately after), so no duplicate-output or ordering bugs.
  • Verified against the new test_timeout_replays_helper_output_before_killing_process_group test: since a helper process spawned by the fake agent-coord inherits the wrapper's stdout/stderr file descriptors (no pgroup override), and terminate_process_group now waits for the whole process group before returning, the helper's delayed post-TERM output is reliably captured before the tempfiles are read. This is a real fix — previously such output could be lost on interrupt/timeout.
  • The only theoretical edge case (a lingering un-reaped grandchild zombie causing process_group_alive? to report alive until the deadline) just results in a redundant, harmless KILL — not a correctness issue.

Docs (docs/trust-and-preflight.md)

  • Cross-checked each new claim against the actual (unchanged) pr-security-preflight implementation: --trust-config repo-local/global classification, unqualified team-slug warning, metadata-bot suspicious-text scanning, resolved trusted-bot/metadata-bot thread handling that still surfaces blocking-pattern findings, and the timelineItems-inclusive source-actor coverage requirement. All statements match current code — no doc drift found.

Portability

  • No consumer-repo-specific commands, labels, branches, paths, or trackers were introduced in skills/ or docs/; the new doc language and CHANGELOG entries stay generic and reuse existing AGENTS.md-seam-style env vars (AGENT_WORKFLOWS_TRUST_CONFIG, PR_BATCH_GIT_PROBE_TIMEOUT_SECONDS, etc).

Shell/Ruby helper safety

  • agent-coord-bounded: no shell interpolation of untrusted input; Process.spawn/Process.kill calls use array args and numeric pids throughout.
  • Test helper zombie_process? shells out via Open3.capture2("ps", "-o", "stat=", "-p", pid.to_s) with array args (no injection risk), and is test-only code.

Minor/non-blocking nit

  • The new CHANGELOG Changed entry describing ported pr-security-preflight hardening overlaps in content with the existing Fixed entry a few lines below (Harden pr-security-preflight trust resolution...). Worth a quick pass so a reader doesn't read these as two separate changes — editorial only, not functional.

No bugs or security issues found. LGTM.

@justin808 justin808 merged commit 2963595 into main Jul 5, 2026
10 of 12 checks passed
@justin808 justin808 deleted the jg/port-ror-preflight-coord-hardening branch July 5, 2026 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant