
Add durable knowledge contracts and task observer #69

Merged
justin808 merged 22 commits into main from codex/awf-b5-knowledge-contracts
Jul 5, 2026

Conversation

@justin808

@justin808 justin808 commented Jul 3, 2026

Member

Summary

  • Adds a portable docs/solutions convention with schema validation and two durable lessons for coordination degradation and untrusted GitHub content.
  • Defines docs/review-finding-schema.md and adopts it in adversarial-pr-review while preserving human-readable output.
  • Adds canonical readiness vocabulary and a machine-readable handoff contract guard across spec, plan-pr-batch, pr-batch, and workflows/pr-processing.md.
  • Adds autoreview-target-state state-machine fixtures and an optional task-observer skill with attribution, privacy rules, staged observation storage, metadata, and docs inventory links.

Batch Evidence

Validation

  • ruby bin/validate-solutions-test.rb
  • ruby skills/pr-batch/bin/goal-completion-contract-test.rb
  • ruby skills/autoreview/bin/autoreview-target-state-test.rb
  • ruby skills/task-observer/bin/task-observer-test.rb
  • git diff --check origin/main...HEAD
  • bin/validate

QA Evidence

  • qa/knowledge-contracts coverage verified locally for schema/fixture validation, optional task-observer metadata, docs inventory links, and repo-wide validation.
  • Upstream attribution/license live check: rebelytics/one-skill-to-rule-them-all and AllstarGER/one-skill-to-rule-them-all both report CC BY 4.0 via GitHub metadata.
  • Dedicated QA subagent was launched read-only after final branch push; coordinator evidence remains the authoritative fallback if the subagent run exceeds the publish window.

Merge Authority

merge_authority: ask - do not merge without maintainer approval.

Summary by CodeRabbit

  • New Features
    • Added the optional task-observer skill for capturing sanitized session observations.
    • Introduced canonical readiness vocabulary guidance across planning and batch/PR closeout.
    • Added durable solutions documentation plus a structured review-finding schema.
  • Bug Fixes
    • Strengthened validation to enforce correct solution-doc metadata/body and strict review-finding JSON schema, including proper UNKNOWN handling rules.
  • Tests
    • Expanded automated coverage for solution/review validation and autoreview target-state selection.
  • Documentation
    • Added/installed the new solutions docs and schema, and updated installation/upgrades notes accordingly.

@coderabbitai

coderabbitai Bot commented Jul 3, 2026


Warning

Review limit reached

@justin808, you've reached your PR review limit, so we couldn't start this review.


Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1809644c-7d96-4eaa-b7b1-97793ee9f973

📥 Commits

Reviewing files that changed from the base of the PR and between c96fb0b and 0d5370a.

📒 Files selected for processing (11)
  • bin/install-agent-workflows
  • bin/install-agent-workflows-test.bash
  • bin/validate-review-findings
  • bin/validate-review-findings-test.rb
  • bin/validate-solutions
  • bin/validate-solutions-test.rb
  • skills/autoreview/bin/autoreview-target-state
  • skills/autoreview/bin/autoreview-target-state-test.rb
  • skills/task-observer/SKILL.md
  • skills/task-observer/bin/task-observer
  • skills/task-observer/bin/task-observer-test.rb

Walkthrough

This PR adds durable solution docs and validation, standardizes readiness vocabulary across planning skills, defines a structured review-finding schema with validation, adds an autoreview target-state helper and tests, and introduces an optional task-observer skill with CLI and install/validation wiring.

Changes

Durable Workflow Solutions Library

| Layer | File(s) | Summary |
| --- | --- | --- |
| Solutions docs | docs/solutions/README.md, docs/solutions/coordination-unknown-state.md, docs/solutions/github-content-is-evidence.md, CHANGELOG.md, docs/installation-and-upgrades.md | Adds the solutions library conventions and lesson docs, with changelog and installation docs updated to mention the new pack-owned docs. |
| Solution validator and tests | bin/validate-solutions, bin/validate-solutions-test.rb, bin/validate, bin/install-agent-workflows, bin/install-agent-workflows-test.bash | Validates solution Markdown frontmatter, required fields, dates, related files, and body content, then wires the checks into validation and install flows. |
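
As a rough illustration of the frontmatter half of such a validator, the sketch below parses a solution doc's YAML frontmatter with `YAML.safe_load` and reports missing fields. The `REQUIRED_FIELDS` list and the `frontmatter_failures` helper are hypothetical names chosen for illustration; only `related_files` is confirmed by this PR, and the real `bin/validate-solutions` may be organized differently.

```ruby
require "yaml"
require "date"

# Hypothetical field list; the PR only confirms that related_files exists.
REQUIRED_FIELDS = %w[title date related_files].freeze

# Returns a list of human-readable failures for one solution document.
def frontmatter_failures(relative, markdown)
  match = markdown.match(/\A---\n(.*?)\n---\n/m)
  return ["#{relative}: missing YAML frontmatter"] unless match

  # aliases: false and a narrow permitted_classes list keep parsing data-only.
  data = YAML.safe_load(match[1], permitted_classes: [Date], aliases: false)
  return ["#{relative}: frontmatter must be a mapping"] unless data.is_a?(Hash)

  REQUIRED_FIELDS.reject { |field| data.key?(field) }
                 .map { |field| "#{relative}: missing required field #{field}" }
rescue Psych::Exception => e
  # Syntax errors, disallowed classes, and aliases all surface as one failure.
  ["#{relative}: invalid frontmatter (#{e.class})"]
end
```

Returning failures as strings, instead of raising, lets a caller collect problems across many files before exiting non-zero.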

Canonical Readiness Vocabulary

| Layer | File(s) | Summary |
| --- | --- | --- |
| Shared vocabulary docs | skills/spec/SKILL.md, skills/plan-pr-batch/SKILL.md, skills/pr-batch/SKILL.md | Adds canonical readiness wording and UNKNOWN handling to the planning and batch skills. |
| Readiness contract test | skills/pr-batch/bin/goal-completion-contract-test.rb | Checks that the shared readiness vocabulary and structured readiness markers stay canonical. |

Structured Review Finding Schema

| Layer | File(s) | Summary |
| --- | --- | --- |
| Schema specification | docs/review-finding-schema.md | Defines the shared structured review-finding JSON schema, including required fields, enums, and verification semantics. |
| Schema validator and tests | bin/validate-review-findings, bin/validate-review-findings-test.rb, bin/validate, skills/adversarial-pr-review/SKILL.md | Adds a validator/test pair for the schema and an optional structured review-finding step in the adversarial review skill. |

Autoreview Target-State Machine

| Layer | File(s) | Summary |
| --- | --- | --- |
| State classification | skills/autoreview/bin/autoreview-target-state | Classifies repo state into named autoreview outcomes, resolves git/gh state, and formats CLI output. |
| Classifier tests | skills/autoreview/bin/autoreview-target-state-test.rb | Covers branch, dirty-work, unknown-state, escaping, configured-base, and timeout cases. |
| Skill guidance and validate wiring | skills/autoreview/SKILL.md, bin/validate | Rewrites target-picking guidance to use the helper’s state table and adds the helper test to validation. |

Task-Observer Meta-Skill

| Layer | File(s) | Summary |
| --- | --- | --- |
| Skill docs and UI metadata | skills/task-observer/SKILL.md, skills/task-observer/agents/openai.yaml, README.md | Adds task-observer frontmatter, activation/privacy rules, memory-helper guidance, staged-update behavior, closeout checklist, picker metadata, and an inventory entry. |
| Memory-backed CLI | skills/task-observer/bin/task-observer | Implements memory root resolution, init/status/list/append commands, privacy checks, and CLI dispatch. |
| CLI tests and validate wiring | skills/task-observer/bin/task-observer-test.rb, bin/validate | Adds CLI integration tests and includes the new helper test in the validation gate. |

Estimated code review effort: 4 (Complex) | ~75 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Skill as skills/autoreview/SKILL.md
  participant Runner as AutoreviewTargetState::Runner
  participant Git as git
  participant GH as gh
  Skill->>Runner: run --text
  Runner->>Git: current_branch, status_facts, branch_diff
  Runner->>GH: pr_base
  GH-->>Runner: baseRefName or no-PR state
  Runner->>Runner: classify facts
  Runner-->>Skill: state, disposition, review_targets, message
sequenceDiagram
  participant Skill as skills/task-observer/SKILL.md
  participant CLI as TaskObserver CLI
  participant FS as task-observer memory files
  Skill->>CLI: append --kind --summary --source
  CLI->>CLI: check_privacy!
  CLI->>FS: append JSONL observation
  FS-->>CLI: write confirmation
  CLI-->>Skill: success or error


🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title is concise and captures the two main additions: durable knowledge contracts and the task-observer skill. |
| Linked Issues check | ✅ Passed | The PR addresses the linked issues with new solution docs, readiness vocabulary, review-finding schema, autoreview fixtures, and task-observer support. |
| Out of Scope Changes check | ✅ Passed | The changes stay focused on the documented objectives and related validation/docs, with no obvious unrelated code paths. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c0e96f210d


Comment thread bin/validate-solutions

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (7)
bin/validate-solutions (1)

43-47: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Consider validating related_files entries reference real paths.

related_files is currently only checked for being a non-empty list of non-empty strings, not that the referenced paths exist. Since these docs are meant to be durable/portable references, a stale/renamed path would silently go undetected.

♻️ Proposed enhancement
     REQUIRED_LIST_FIELDS.each do |field|
       next unless frontmatter.key?(field)

       value = frontmatter[field]
       if !value.is_a?(Array)
         failures << "#{relative}: #{field} must be a list"
       elsif value.any? { |entry| !entry.is_a?(String) || entry.strip.empty? }
         failures << "#{relative}: #{field} entries must be non-empty strings"
+      elsif field == "related_files"
+        missing = value.reject { |entry| File.exist?(File.join(root, entry)) }
+        failures << "#{relative}: related_files not found: #{missing.join(', ')}" unless missing.empty?
       end
     end

Note: root would need to be threaded into validate_required_fields (currently only validate_file has it).

Also applies to: 86-102
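
A self-contained sketch of the existence check proposed above, assuming entries are paths relative to the repository root. `missing_related_files` is a hypothetical helper name, not a method from `bin/validate-solutions`.

```ruby
require "tmpdir"
require "fileutils"

# Returns the related_files entries that do not resolve to a path under root.
def missing_related_files(root, entries)
  entries.reject { |entry| File.exist?(File.join(root, entry)) }
end

# Demonstration against a throwaway directory tree.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "docs"))
  File.write(File.join(root, "docs/kept.md"), "# kept\n")
  p missing_related_files(root, ["docs/kept.md", "docs/renamed.md"])
  # prints ["docs/renamed.md"]
end
```

A stricter variant might also reject entries that escape `root` through `..` segments; that is left out of this sketch.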

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bin/validate-solutions` around lines 43 - 47, The related_files validation
only checks for non-empty strings and can miss stale or renamed references.
Update validate_required_fields (and any callers such as validate_file) to
accept root, then add existence checks for each related_files entry using the
same root-relative path resolution used elsewhere so every referenced file must
actually exist. Keep the current non-empty list/string validation in place, and
wire the new path check into the existing validation flow that handles
related_files.
skills/adversarial-pr-review/SKILL.md (1)

56-67: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Handle contradicted findings explicitly.

The schema allows verification.status: contradicted, but this step only distinguishes verified/current from advisory. Add one sentence for false positives so the structured block always maps contradicted findings to a declared disposition instead of leaving the outcome implicit.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/adversarial-pr-review/SKILL.md` around lines 56 - 67, The structured
review-findings guidance is missing an explicit disposition for contradicted
false positives. Update the instructions around the Review Finding schema
mapping in SKILL.md so false positives are called out directly and mapped to a
declared outcome instead of being left implicit, using the existing labels like
NOISE and the `verification.status: contradicted` case. Keep the human-readable
report first, and ensure the guidance clearly tells authors how to represent
contradicted findings in the `review-findings` block.
skills/task-observer/bin/task-observer (2)

123-128: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value

Time.parse is more lenient than strict ISO 8601.

The error message promises TASK_OBSERVER_TIME must be an ISO 8601 timestamp, but Time.parse accepts many non-ISO formats too, so invalid-but-parseable input silently passes. Time.iso8601 would enforce the stated contract.
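
The difference can be sketched as follows. `ObserverTimeError` and the injectable `env` parameter are illustrative assumptions, not the actual names or signature used in `bin/task-observer`.

```ruby
require "time"

# Hypothetical error class standing in for the CLI's own Error type.
class ObserverTimeError < StandardError; end

# Resolves the observation timestamp, honoring an optional override.
def current_time(env = ENV)
  raw = env["TASK_OBSERVER_TIME"].to_s.strip
  return Time.now.utc if raw.empty?

  # Time.iso8601 raises ArgumentError for input such as "July 3, 2026",
  # which Time.parse would accept.
  Time.iso8601(raw).utc
rescue ArgumentError
  raise ObserverTimeError, "TASK_OBSERVER_TIME must be an ISO 8601 timestamp"
end
```

Passing the environment as an argument is only a convenience for exercising the method without mutating `ENV`.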

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/task-observer/bin/task-observer` around lines 123 - 128, The
current_time method accepts non-ISO timestamp formats because it uses
Time.parse, which conflicts with the TASK_OBSERVER_TIME validation contract.
Update current_time in the task-observer script to use Time.iso8601 for parsing
the ENV["TASK_OBSERVER_TIME"] value, keep the empty-value fallback to
Time.now.utc, and preserve the existing rescue Error path with the ISO 8601
message.

133-133: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Replace URI.extract with URI::DEFAULT_PARSER.extract.

URI.extract is obsolete and emits warnings under -w; calling the parser directly keeps the same behavior without the warning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/task-observer/bin/task-observer` at line 133, `URI.extract` in the
task observer URL parsing path is obsolete and triggers warnings; update the
extraction call in the observer script to use `URI::DEFAULT_PARSER.extract`
instead while preserving the existing `text` and scheme list behavior. Use the
`URI`/`extract` call site in the task-observer entrypoint to locate the change.
skills/task-observer/bin/task-observer-test.rb (1)

124-139: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Consider adding a malformed-query-string case.

Given the ArgumentError gap flagged in bin/task-observer's check_privacy!, a test with a malformed percent-encoded query (e.g., ?foo=%GG) would catch the crash if unfixed.
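
A minimal sketch of the failure mode and its handling, assuming the privacy check decodes query strings with `URI.decode_www_form`. `PrivacyError` and `query_pairs` are hypothetical names. Depending on the Ruby version, a malformed escape such as `%GG` may be rejected by `URI.parse` or by `URI.decode_www_form`, so the sketch rescues both error types.

```ruby
require "uri"

# Hypothetical stand-in for TaskObserver::Error.
class PrivacyError < StandardError; end

# Returns decoded [key, value] pairs from a URL's query string, converting
# any parse or decode failure into a single user-facing error.
def query_pairs(url)
  query = URI.parse(url).query.to_s
  URI.decode_www_form(query)
rescue URI::InvalidURIError, ArgumentError
  raise PrivacyError, "rejecting malformed URL in observation text"
end
```

A test in the style suggested above would then assert on the clean error message and a non-zero exit, not on a Ruby backtrace.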

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/task-observer/bin/task-observer-test.rb` around lines 124 - 139, Add a
test for malformed percent-encoded query strings to cover the ArgumentError path
in check_privacy! used by task-observer append. Extend task-observer-test.rb
with a case similar to test_append_rejects_private_urls_with_query_strings that
passes a summary containing a URL like ?foo=%GG, then assert the command fails
cleanly and reports a private/invalid URL rejection instead of crashing. Use the
existing capture_task_observer and append flow so the new case exercises the
same validation path in check_privacy!.
skills/task-observer/SKILL.md (1)

41-52: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Doc lists 5 observation categories but CLI accepts a 6th (self-improvement).

ALLOWED_KINDS in bin/task-observer includes self-improvement, which isn't described here. Worth syncing the doc and the CLI's allowed values so users aren't surprised by an undocumented valid --kind.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/task-observer/SKILL.md` around lines 41 - 52, The observation-kind
documentation is out of sync with the CLI because `ALLOWED_KINDS` in
`bin/task-observer` includes `self-improvement` but `SKILL.md` only lists five
categories. Update the doc’s “What To Capture” list to include the missing kind,
and verify the CLI’s allowed values and the shared guidance stay aligned so
`--kind` matches the documented set.
skills/autoreview/bin/autoreview-target-state (1)

147-161: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Unescaped branch/base name interpolated into displayed "command" string.

branch_target/local_target build the "command" field by interpolating base (from configured_base YAML or gh pr view's baseRefName) directly into a double-quoted shell string. Git ref names are permitted to contain shell-special characters ($, ;, backticks, etc. are not disallowed by git-check-ref-format), so if this "command" string is ever copy/pasted or auto-executed by a caller (an agent, a script) rather than only displayed, a crafted branch/base name could alter or inject commands.

Since this field is explicitly meant to be run (codex review --base "origin/$base"), consider using Shellwords.escape instead of manual quoting so the value is safe to execute even with unusual ref names.

🔒 Proposed fix
+require "shellwords"
+
 def branch_target(base)
   {
     "kind" => "branch",
-    "command" => %(codex review --base "origin/#{base}"),
+    "command" => "codex review --base #{Shellwords.escape("origin/#{base}")}",
     "reason" => "Review committed branch changes against origin/#{base}."
   }
 end
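
To see what the escaping buys, here is a small standalone sketch. `branch_review_command` is a hypothetical helper that mirrors only the `"command"` value from the proposed fix, and the sample ref name is invented for illustration.

```ruby
require "shellwords"

# Builds the displayed review command with the base ref shell-escaped.
def branch_review_command(base)
  "codex review --base #{Shellwords.escape("origin/#{base}")}"
end

puts branch_review_command("main")
# prints: codex review --base origin/main

# A ref name containing shell metacharacters stays a single argument:
# splitting the command the way a POSIX shell would yields the original ref.
tricky = "release/$(id);x"
p Shellwords.split(branch_review_command(tricky)).last == "origin/#{tricky}"
# prints: true
```

Ordinary ref names pass through unchanged, so the human-readable output only differs for names that would otherwise be unsafe to paste.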
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/autoreview/bin/autoreview-target-state` around lines 147 - 161, The
command string built in branch_target is interpolating base directly into a
shell command, which can break or be abused with unusual ref names. Update
branch_target (and any similar command builders like local_target if they ever
interpolate user-controlled values) to construct the displayed/executable
command using Shellwords.escape or equivalent safe escaping instead of manual
double-quoted interpolation, so baseRefName/configured_base values remain safe
even when copied or executed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/autoreview/bin/autoreview-target-state`:
- Around line 190-195: The helper methods that shell out via Open3.capture3 in
git and pr_base have no execution bound, so they can hang indefinitely on
stalled git/gh calls. Update those subprocess invocations to run through the
timeout(1) utility instead of relying on Ruby Timeout, and ensure both methods
still return stdout, stderr, and exit status (or raise the existing Error path)
when the command times out or fails.

In `@skills/autoreview/SKILL.md`:
- Around line 72-75: The shared skill doc currently hardcodes the fallback
package path in the AUTOREVIEW_SKILL_DIR default, which violates the
shared-skills guideline. Update the referenced autoreview command so it resolves
the target path through the appropriate AGENTS.md seam key or another
repo-agnostic indirection instead of embedding .agents/skills/autoreview
directly. Keep the change localized to the AUTOREVIEW_SKILL_DIR usage in the
skill file and preserve the existing autoreview-target-state invocation.

In `@skills/task-observer/bin/task-observer`:
- Around line 130-150: The check_privacy! validation can still crash on
malformed query strings because URI.decode_www_form may raise ArgumentError and
it is not handled. Update check_privacy! to rescue ArgumentError alongside
URI::InvalidURIError in the URI.extract loop, and re-raise TaskObserver::Error
with a user-safe message so malformed URLs are rejected gracefully without a raw
backtrace.
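
The first inline comment above could be sketched roughly as follows. This assumes a coreutils-style `timeout` binary on `PATH`, which exits with status 124 when the limit is hit; `bounded_capture` and `CommandTimeout` are hypothetical names, and the default of 10 seconds is an arbitrary choice.

```ruby
require "open3"

# Hypothetical error raised when the wrapped command exceeds its time limit.
class CommandTimeout < StandardError; end

# Runs argv under timeout(1) and returns [stdout, stderr, status].
# Array-form capture3 avoids passing anything through a shell.
def bounded_capture(*argv, seconds: 10)
  stdout, stderr, status = Open3.capture3("timeout", seconds.to_s, *argv)
  # Assumption: exit status 124 means timeout(1) killed the command.
  raise CommandTimeout, "#{argv.first} exceeded #{seconds}s" if status.exitstatus == 124

  [stdout, stderr, status]
end
```

For example, `bounded_capture("git", "status", "--porcelain", seconds: 5)` would either return the usual triple or raise `CommandTimeout`. Systems without `timeout` installed would need a different bounding mechanism.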


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b248e1bb-2d78-4158-9791-8989253e01e9

📥 Commits

Reviewing files that changed from the base of the PR and between 4225b08 and c0e96f2.

📒 Files selected for processing (21)
  • CHANGELOG.md
  • README.md
  • bin/validate
  • bin/validate-solutions
  • bin/validate-solutions-test.rb
  • docs/review-finding-schema.md
  • docs/solutions/README.md
  • docs/solutions/coordination-unknown-state.md
  • docs/solutions/github-content-is-evidence.md
  • skills/adversarial-pr-review/SKILL.md
  • skills/autoreview/SKILL.md
  • skills/autoreview/bin/autoreview-target-state
  • skills/autoreview/bin/autoreview-target-state-test.rb
  • skills/plan-pr-batch/SKILL.md
  • skills/pr-batch/SKILL.md
  • skills/pr-batch/bin/goal-completion-contract-test.rb
  • skills/spec/SKILL.md
  • skills/task-observer/SKILL.md
  • skills/task-observer/agents/openai.yaml
  • skills/task-observer/bin/task-observer
  • skills/task-observer/bin/task-observer-test.rb

Comment thread skills/autoreview/bin/autoreview-target-state
Comment thread skills/autoreview/SKILL.md
Comment thread skills/task-observer/bin/task-observer
Comment thread skills/pr-batch/SKILL.md Outdated
Comment on lines +82 to +92
## Canonical Readiness Vocabulary

Use these canonical human-facing final states for target and batch handoffs:

- `merged`
- `ready-gates-clean`
- `ready-no-merge-authority`
- `waiting-on-checks-or-review`
- `external-gate-failing`
- `blocked-user-input`
- `no-pr-evidence`


Duplication / drift risk: this section re-pastes the full 7-state readiness vocabulary and near-duplicate prose that already exists twice more in this same file (line 80's Completion states list, and line 327's paragraph, which already says "follow the canonical final-state ... split in .agents/workflows/pr-processing.md"), plus once more in workflows/pr-processing.md itself.

The repo already has an established fix for exactly this situation: the adjacent Goal Mode Completion Contract further down links to ../../workflows/pr-processing.md#goal-mode-completion-contract instead of repeating the text, and goal-completion-contract-test.rb's test_skill_prose_points_to_canonical_contract_instead_of_pasting_it enforces that it stays a link, not a paste.

The new goal-completion-contract-test.rb test added in this PR (test_canonical_readiness_vocabulary_is_shared_by_planning_skills) only checks that each state string is present as a substring in each of 4 files — it doesn't stop the surrounding prose from drifting between copies (and indeed the trailing sentence already differs slightly between spec/SKILL.md, plan-pr-batch/SKILL.md, and this file). Consider linking to the workflows/pr-processing.md section here (and in spec/SKILL.md / plan-pr-batch/SKILL.md) instead of re-pasting the list, consistent with how the Goal Mode Completion Contract is handled.
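
One way a stricter guard might look, as a sketch only: instead of checking that each state appears somewhere in the file, it checks that the skill links to the canonical section and does not paste the list as bullets. `vocabulary_drift_failures` and the link target passed to it are hypothetical; the state names are the seven from this PR.

```ruby
CANONICAL_STATES = %w[
  merged ready-gates-clean ready-no-merge-authority
  waiting-on-checks-or-review external-gate-failing
  blocked-user-input no-pr-evidence
].freeze

# Returns failures when a skill file pastes the vocabulary instead of linking.
def vocabulary_drift_failures(skill_text, link_target:)
  failures = []
  failures << "missing link to #{link_target}" unless skill_text.include?(link_target)

  # A pasted copy shows up as a bullet whose only content is a state name.
  pasted = CANONICAL_STATES.select do |state|
    skill_text.match?(/^\s*[-*]\s+`#{Regexp.escape(state)}`\s*$/)
  end
  failures << "pastes canonical states: #{pasted.join(', ')}" unless pasted.empty?
  failures
end
```

Inline mentions of a state inside prose would still pass; only a re-pasted bullet list is flagged.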

Comment thread skills/spec/SKILL.md Outdated
Comment on lines +27 to +45
## Canonical Readiness Vocabulary

When a spec describes downstream batch or PR readiness, use the canonical
human-facing final states from `workflows/pr-processing.md`:

- `merged`
- `ready-gates-clean`
- `ready-no-merge-authority`
- `waiting-on-checks-or-review`
- `external-gate-failing`
- `blocked-user-input`
- `no-pr-evidence`

Normal interactive output stays human-readable. Do not collapse these states
into vague labels like `ready`, `complete`, or `done`. If a fact needed to
choose a state cannot be verified, write `UNKNOWN` for that fact and keep the
state unresolved instead of guessing. Optional structured handoff blocks are
allowed only when they help a planner or validator; they supplement the normal
markdown summary and do not make JSON mandatory.


Same duplication concern as the equivalent new section in skills/pr-batch/SKILL.md and skills/plan-pr-batch/SKILL.md: this fully re-pastes the 7-state list and near-duplicate prose from workflows/pr-processing.md rather than linking to it, unlike this repo's existing convention for the Goal Mode Completion Contract (link + test_skill_prose_points_to_canonical_contract_instead_of_pasting_it). Worth converging on a single canonical copy with a link from each skill.

Comment thread skills/plan-pr-batch/SKILL.md Outdated
Comment on lines +159 to +177
## Canonical Readiness Vocabulary

Use the same human-facing readiness states as `$pr-batch` and
`workflows/pr-processing.md`:

- `merged`
- `ready-gates-clean`
- `ready-no-merge-authority`
- `waiting-on-checks-or-review`
- `external-gate-failing`
- `blocked-user-input`
- `no-pr-evidence`

Normal interactive output stays human-readable. Use these states in planning
notes, done conditions, and final-bucket handoffs instead of vague labels such
as `ready`, `complete`, or `done`. Preserve explicit `UNKNOWN` for facts that
cannot be verified, including coordination, file-touch, review, CI, QA, or
merge-ledger evidence; do not turn unknown evidence into an optimistic state.
Optional structured handoff blocks may be added when they reduce ambiguity for a


Same duplication concern noted on skills/pr-batch/SKILL.md and skills/spec/SKILL.md: this re-pastes the 7-state list and near-duplicate prose from workflows/pr-processing.md instead of linking to it. Recommend one canonical copy (in the workflow) with skills linking to it, matching the existing Goal Mode Completion Contract pattern.

@claude

claude Bot commented Jul 3, 2026


Review summary

Reviewed for code quality, bugs, security, performance, and portability (per AGENTS.md editing rules). Overall this is a solid, well-tested batch: bin/validate-solutions, autoreview-target-state, and task-observer all avoid shell injection (array-form Open3.capture3, no system/backticks with interpolated data), use YAML.safe_load with aliases: false and a restricted permitted_classes, and ship matching unit tests. Path conventions (*_SKILL_DIR defaults, agents/openai.yaml) match existing skills, and every file referenced from the new docs/solutions/*.md lessons (workflows/pr-processing.md, docs/trust-and-preflight.md, skills/pr-batch/bin/pr-security-preflight, etc.) actually exists. No consumer-repo-specific commands, labels, branches, or paths were introduced into skills/ or workflows/.

One consistency/maintainability finding worth addressing (left as inline comments on the three affected files): the new "Canonical Readiness Vocabulary" sections in skills/spec/SKILL.md, skills/plan-pr-batch/SKILL.md, and skills/pr-batch/SKILL.md fully re-paste the 7-state list plus near-duplicate prose that already lives in workflows/pr-processing.md (### Batch Handoff Format) — and, in pr-batch/SKILL.md's case, duplicates content that already existed twice more in the same file (lines 80 and 327). This repo has an established pattern for avoiding exactly this drift risk: the adjacent Goal Mode Completion Contract links to workflows/pr-processing.md#goal-mode-completion-contract instead of repeating it, enforced by a test that asserts the skill points to the canonical copy rather than pasting it. The new test_canonical_readiness_vocabulary_is_shared_by_planning_skills test only checks substring presence, so the four copies (workflow + 3 skills) can drift in wording without failing CI — and the trailing sentence already differs slightly between the three skill copies. Suggest converging on one canonical copy with links, matching the Goal Mode Completion Contract precedent. (should_fix / P2 — not a merge blocker, but worth cleaning up before the vocabulary drifts further.)

No blocking correctness, security, or portability issues found.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 003ffa58fd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread bin/validate-solutions
blocked-user-input
no-pr-evidence
].freeze
READINESS_STATE_KEYS = /\b(?:final_state|readiness_state|target_state):\s*`?([a-z0-9_-]+|UNKNOWN)`?/

Nit (low severity): READINESS_STATE_KEYS only matches a token composed of [a-z0-9_-]+ or the exact literal UNKNOWN. A mixed-case value such as final_state: Ready or readiness_state: Unknown won't match either alternative, so invalid_readiness_marker_values silently skips it instead of flagging it as invalid. Since this regex is the guard that's supposed to keep structured readiness markers on the canonical vocabulary, consider adding a case-insensitive fallback (or an explicit check that a matched key's value is exactly one of the canonical tokens/UNKNOWN) so a future stray Ready/Unknown value doesn't slip through this test undetected.

Not a runtime bug today (current skill/workflow text doesn't use mixed case), just a gap in the guard's coverage.
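One way to close this gap, sketched under assumptions: capture whatever token follows the key regardless of case, then check membership against the canonical list separately. The constant and method names below are hypothetical stand-ins and are not copied from the actual test file; only the seven state names and the three key names come from this PR.

```ruby
# Hypothetical sketch; names are illustrative, not the real test helper.
CANONICAL_READINESS_STATES = %w[
  merged
  ready-gates-clean
  ready-no-merge-authority
  waiting-on-checks-or-review
  external-gate-failing
  blocked-user-input
  no-pr-evidence
].freeze

# Capture any token after the key, in any case, optionally wrapped in
# backticks or double quotes, so "Ready" and "Unknown" are captured too.
READINESS_MARKER = /\b(?:final_state|readiness_state|target_state):\s*[`"]?([A-Za-z0-9_-]+)[`"]?/

# Returns every captured value that is neither canonical nor exactly UNKNOWN.
def invalid_readiness_marker_values(text)
  text.scan(READINESS_MARKER).flatten.reject do |value|
    value == "UNKNOWN" || CANONICAL_READINESS_STATES.include?(value)
  end
end
```

Separating "what counts as a marker" from "what counts as a valid value" means a stray mixed-case value is reported rather than silently skipped.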

Comment thread skills/task-observer/bin/task-observer
@claude

claude Bot commented Jul 3, 2026

Review Summary

This PR adds a docs/solutions/ durable-lessons library (with schema-validated frontmatter), a shared review-finding JSON schema adopted by adversarial-pr-review, a canonical readiness-state vocabulary cross-linked from spec/plan-pr-batch/pr-batch back to workflows/pr-processing.md (with a new consistency test guarding it), an autoreview-target-state helper that turns diff-target selection into an explicit, well-tested state machine, and an optional task-observer skill for capturing sanitized session observations (with attribution to the upstream CC BY 4.0 methodology it's adapted from).

Overall assessment: solid. No confirmed correctness, security, or portability bugs found after a line-by-line pass over the diff, the new Ruby helpers (bin/validate-solutions, bin/validate-review-findings, skills/autoreview/bin/autoreview-target-state, skills/task-observer/bin/task-observer), and their test suites.

Specifically checked and looked clean:

  • Shell/Ruby helper safety: autoreview-target-state and task-observer use Process.spawn/argument arrays (no shell interpolation), Shellwords.escape for any display-only command strings, bounded command timeouts with proper process-group cleanup, and YAML.safe_load for config parsing.
  • Portability: no consumer-repo-specific commands, labels, branches, or release trackers found hardcoded in skills/ or workflows/. The docs/solutions/*.md issue links point at this source pack's own tracker (provenance evidence for the lesson, not consumer policy), which is consistent with the library's own stated scope in docs/solutions/README.md. The agents/openai.yaml metadata file for task-observer follows the existing convention used by other skills.
  • Cross-file consistency: the new goal-completion-contract-test.rb assertions against workflows/pr-processing.md, skills/spec/SKILL.md, skills/plan-pr-batch/SKILL.md, and skills/pr-batch/SKILL.md all match the current file contents.

Left two low-severity, non-blocking nitpicks as inline comments (a test-coverage gap in the new readiness-vocabulary regex guard, and a heuristic-filter gap in task-observer's privacy check) — neither affects current behavior, both are optional hardening.

@claude

claude Bot commented Jul 3, 2026

Review

Reviewed the diff (docs/solutions library + validators, review-finding schema, canonical readiness vocabulary, autoreview-target-state, and the new task-observer skill).

Overall: the new Ruby helpers (bin/validate-solutions, bin/validate-review-findings, skills/autoreview/bin/autoreview-target-state, skills/task-observer/bin/task-observer) are well-tested, avoid shell injection (all subprocess calls use argv arrays, not shell strings, and shell-facing suggestions go through Shellwords.escape), and the new docs stay generic/portable. Left 3 inline comments covering two real issues:

  1. skills/task-observer/bin/task-observer memory_root and skills/task-observer/SKILL.md's helper-dir fallback only ever resolve CODEX_HOME/~/.codex, never CLAUDE_HOME/~/.claude. This repo's own installer (bin/install-agent-workflows) and docs/installation-and-upgrades.md treat Codex and Claude as parallel, equally-supported hosts. On a Claude-only install with no repo-local .agents/skills/task-observer copy, the documented invocation resolves to a nonexistent Codex path and the skill's helper calls fail outright.
  2. skills/autoreview/SKILL.md introduces AUTOREVIEW_SKILL_DIR="${AUTOREVIEW_SKILL_DIR:?...}" (hard-fail if unset), which is inconsistent with every other skill in the pack (address-review, update-changelog, post-merge-audit, pr-batch, etc.), all of which use ${X_SKILL_DIR:-.agents/skills/x}. Copy-pasting this new block verbatim errors immediately instead of resolving a default.

Everything else — schema validators, date/YAML edge cases, the readiness-vocabulary tests, and the autoreview-target-state state machine — checked out logically consistent with their test coverage.

Comment thread skills/task-observer/bin/task-observer
Comment thread skills/task-observer/SKILL.md
Comment thread skills/autoreview/SKILL.md Outdated
`SKILL.md`, then run the read-only helper:

```bash
AUTOREVIEW_SKILL_DIR="${AUTOREVIEW_SKILL_DIR:?set AUTOREVIEW_SKILL_DIR to the installed or repo-local autoreview skill directory}"
```

Inconsistent with every other skill's _SKILL_DIR default.

AUTOREVIEW_SKILL_DIR="${AUTOREVIEW_SKILL_DIR:?set AUTOREVIEW_SKILL_DIR to the installed or repo-local autoreview skill directory}"

This is the only _SKILL_DIR in the whole pack that hard-fails (:?) instead of defaulting (:-). Every sibling skill uses ${X_SKILL_DIR:-.agents/skills/x}, e.g.:

  • skills/address-review/SKILL.md:159: ADDRESS_REVIEW_SKILL_DIR="${ADDRESS_REVIEW_SKILL_DIR:-.agents/skills/address-review}"
  • skills/update-changelog/SKILL.md:112: UPDATE_CHANGELOG_SKILL_DIR="${UPDATE_CHANGELOG_SKILL_DIR:-.agents/skills/update-changelog}"
  • skills/post-merge-audit/SKILL.md:33: POST_MERGE_AUDIT_SKILL_DIR="${POST_MERGE_AUDIT_SKILL_DIR:-.agents/skills/post-merge-audit}"

An agent that copies this new block verbatim without having already exported AUTOREVIEW_SKILL_DIR gets an immediate shell error instead of the read-only helper running, unlike every other copy-pasteable example in the pack. Suggest ${AUTOREVIEW_SKILL_DIR:-.agents/skills/autoreview} for consistency (the surrounding prose already says "resolve AUTOREVIEW_SKILL_DIR to the installed or repo-local directory," which the required-var form doesn't actually do).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fb193bbc1

Comment on lines +41 to +42
elsif pr["state"] == "unknown" && branch_diff == true
unknown_pr_result(pr)

P2: Preserve unknown PR-base probes before declaring no target

When gh pr view fails and origin/<configured_base>...HEAD is empty, this condition is false and the later else returns NO_REVIEW_TARGET. In a repo with base_branch: main, a branch/PR targeting a release branch can have an empty main diff but a non-empty release diff, so an auth/network failure makes the helper tell the user to stop instead of resolving PR_BASE_UNKNOWN. That contradicts the new autoreview state table, which says PR-base probe failures should stay UNKNOWN.
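A minimal sketch of the ordering this comment argues for, assuming a simplified decision function: the method name, its keyword arguments, and the BRANCH_DIFF and LOCAL_DIRTY labels are hypothetical, and only PR_BASE_UNKNOWN and NO_REVIEW_TARGET appear in this thread. The real helper has more states and inputs.

```ruby
# Hypothetical decision function; the real helper's names and inputs may differ.
def review_target_state(pr_state:, branch_diff:, dirty:)
  # A failed PR probe means the true base is unknown, so an empty diff against
  # the configured base proves nothing. Keep the state UNKNOWN in that case.
  return "PR_BASE_UNKNOWN" if pr_state == "unknown"
  return "BRANCH_DIFF" if branch_diff
  return "LOCAL_DIRTY" if dirty

  "NO_REVIEW_TARGET"
end
```

The point of the sketch is only the ordering: the unknown-probe check runs before any branch that can conclude there is nothing to review.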

Comment thread skills/adversarial-pr-review/SKILL.md

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7de8e5154

Comment thread skills/task-observer/bin/task-observer-test.rb
@justin808 justin808 force-pushed the codex/awf-b5-knowledge-contracts branch from d7de8e5 to 03aed7f Compare July 5, 2026 01:18
Comment thread skills/task-observer/bin/task-observer Outdated
"update_mode" => "staged-review-only"
}
path = File.join(memory_root, "observations", "#{observed_at.utc.strftime('%Y-%m-%d')}.jsonl")
File.open(path, "a", encoding: "UTF-8") { |file| file.puts(JSON.generate(record)) }

Observation files/dirs get default (umask-derived) permissions, not locked down.

FileUtils.mkdir_p (line 70), File.write (line 73), and this File.open(path, "a", ...) never set an explicit mode, so on a typical shared/multi-user host (umask 022) the observation directory and JSONL files end up world-readable. Given the sanitizer can miss things (see the other two comments on this file), anything that slips through ends up readable by any local user via ~/.codex/memories/task-observer/observations/*.jsonl. Consider File.chmod(0o700, dir) in init and 0o600 on the JSONL file after each write.
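A rough sketch of the suggested hardening, assuming a POSIX host. The helper name and its arguments are hypothetical; the real script derives its paths from memory_root, which is not reproduced here.

```ruby
require "fileutils"
require "json"

# Hypothetical helper; the directory and filename are passed in by the caller.
def append_private_record(dir, filename, record)
  FileUtils.mkdir_p(dir, mode: 0o700)
  # mkdir_p does not change the mode of a directory that already exists.
  File.chmod(0o700, dir)
  path = File.join(dir, filename)
  # The permission argument applies only when the file is created,
  # and is still subject to the process umask.
  File.open(path, "a", 0o600, encoding: "UTF-8") { |file| file.puts(JSON.generate(record)) }
  # Also tightens files that an earlier, looser version created.
  File.chmod(0o600, path)
  path
end
```

The explicit chmod calls matter because an existing directory or file keeps whatever mode it was first created with.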

Comment thread skills/task-observer/SKILL.md Outdated
4. Stage any skill or workflow edits as normal repo changes and wait for the
user's explicit request before overwriting live installed skills or personal
memory.
5. Run the relevant helper tests and `bin/validate` before publishing changes.

Portability: hardcodes this repo's own validate command instead of the installed-skill seam path.

Every other shared skill that references the pre-push validation command uses the seam path .agents/bin/validate (see skills/verify/SKILL.md, skills/run-ci/SKILL.md, skills/tdd/SKILL.md, skills/pr-batch/SKILL.md:262, skills/replicate-ci/SKILL.md), per AGENTS.md's "Editing Rules" ("Do not hardcode consumer repo commands ... paths in shared skills"). This line says bare bin/validate, which is this framework repo's own root-level dev command and won't exist at that path once task-observer is installed into a consumer repo.

Suggested change
- 5. Run the relevant helper tests and `bin/validate` before publishing changes.
+ 5. Run the relevant helper tests and `.agents/bin/validate` before publishing changes.

Comment thread bin/validate-review-findings
Comment thread bin/validate-solutions
"pr_state" => pull_request["state"],
"review_targets" => [
branch_target(base),
local_target("Review staged, unstaged, and untracked local work.")

This reason string is hardcoded to "Review staged, unstaged, and untracked local work." even when facts[:untracked_only] is true and only untracked files are dirty. The sibling local_dirty_result (line 98) correctly branches on untracked_only for the analogous single-target case, but this combined branch-plus-dirty path never checks it — minor, but the advisory text shown to the operator/agent is misleading about what actually changed locally.
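For illustration, the branch could look something like the following. The method name and the untracked-only wording are assumptions; only the "staged, unstaged, and untracked" string is taken from the diff.

```ruby
# Hypothetical helper; the untracked-only wording is illustrative.
def local_work_reason(untracked_only:)
  if untracked_only
    "Review untracked local work."
  else
    "Review staged, unstaged, and untracked local work."
  end
end
```

The combined branch-plus-dirty path could then pass the result to local_target instead of the fixed string.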

@claude

claude Bot commented Jul 5, 2026

Review Summary

Reviewed this PR's ~2.5k line diff across the new docs/solutions library, the review-finding schema/validator, the readiness-vocabulary changes, the autoreview-target-state helper, and the new task-observer skill. Posted 7 inline comments; overview below.

Security (task-observer)

The new bin/task-observer skill documents strong privacy guarantees ("never store secrets, tokens ... or credentials", "never store private URLs"), but the enforcement has real gaps:

  • The sensitive-token regex doesn't match GitHub's newer github_pat_... fine-grained PATs, so those tokens pass check_privacy! unmodified and get written to disk (see inline comment on line 21).
  • The private-host blocklist only matches whole DNS labels (internal/corp bounded by .), so hyphenated internal hostnames like wiki-internal.acmecorp.io bypass detection entirely (line 31).
  • Observation files/directories are created with default umask permissions rather than being locked down (e.g. 0600/0700), which matters given the above gaps (line 136).

None of these three cases are covered by task-observer-test.rb, which only tests the exact-label/known-prefix happy paths.

Portability (AGENTS.md "Editing Rules")

skills/task-observer/SKILL.md:126 tells the reader to run bare bin/validate before publishing, while every other shared skill in this repo (verify, run-ci, tdd, pr-batch, replicate-ci) consistently uses the installed-skill seam path .agents/bin/validate. This is this framework repo's own dev command leaking into a portable skill doc.

Robustness

  • bin/validate-review-findings's File.read(path, ...) has no rescue, so a missing/unreadable path crashes with a raw Ruby backtrace instead of the clean PASS/failure-list output the rest of the tool produces.
  • bin/validate-solutions silently reports "PASS 0 solution docs" (exit 0) if the docs/solutions/*.md glob ever comes back empty (wrong root, renamed/deleted directory), which would mask a real misconfiguration in the gate that bin/validate relies on.
  • autoreview-target-state's branch_plus_dirty_result always emits the "staged, unstaged, and untracked" reason text even when only untracked files are dirty — cosmetic, but misleading advisory output (minor).
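The first two robustness items can be sketched as follows, assuming the validators collect failure strings rather than raising. Both method names and the message wording are hypothetical.

```ruby
# Hypothetical helpers; names and messages are illustrative.

# Returns [contents, nil] on success or [nil, failure_message] on a read error.
def read_findings_file(path)
  [File.read(path, encoding: "UTF-8"), nil]
rescue SystemCallError => e
  # Errno::ENOENT, Errno::EACCES, and similar all descend from SystemCallError.
  [nil, "#{path}: cannot read file (#{e.class})"]
end

# Treats an empty glob as a failure instead of a silent "PASS 0 solution docs".
def solution_doc_failures(root)
  dir = File.join(root, "docs", "solutions")
  docs = Dir.glob(File.join(dir, "*.md"))
  return ["no solution docs found under #{dir}"] if docs.empty?

  []
end
```

Either failure can then flow into the same failure list and non-zero exit that the tools already use for schema errors.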

Lower-priority / not inline

  • docs/review-finding-schema.md documents owner: UNKNOWN (uppercase) while every other enum in the same doc uses lowercase unknown, and it isn't validated either way; the example also includes an undocumented verification.checked_at field.
  • skills/pr-batch/bin/goal-completion-contract-test.rb's new READINESS_STATE_KEYS regex-based guard currently matches zero occurrences in tracked docs (final_state:/readiness_state:/target_state: aren't used anywhere yet), so it's forward-looking scaffolding rather than an active guard today — fine as written, just worth knowing it isn't exercising real content yet.

Verified clean

  • docs/solutions/coordination-unknown-state.md and docs/solutions/github-content-is-evidence.md both satisfy the schema in docs/solutions/README.md (frontmatter, date format, related_files all resolve).
  • bin/install-agent-workflows changes use array-form Process.spawn/quoted paths, guard against symlink/non-directory collisions, and don't introduce eval/curl-pipe patterns.
  • Readiness vocabulary is consistent across skills/spec, skills/plan-pr-batch, skills/pr-batch, and workflows/pr-processing.md.
  • No shell-injection surface found in autoreview-target-state (array-form Process.spawn, Shellwords.escape on interpolated branch names) or in task-observer (no system/backticks anywhere).
  • task-observer-test.rb's environment isolation (via CODEX_HOME/CLAUDE_HOME + mktmpdir) is solid — no test path falls through to a real ~/.codex/~/.claude.

Comment thread skills/task-observer/bin/task-observer
Comment thread skills/task-observer/bin/task-observer
Comment thread skills/task-observer/bin/task-observer
@claude

claude Bot commented Jul 5, 2026

Review summary

Reviewed this batch (docs/solutions library + validators, canonical readiness vocabulary, autoreview-target-state, and the new task-observer skill/helper) with a focus on correctness, security, and portability.

Portability / AGENTS.md compliance: clean. No consumer-repo commands, labels, branches, or paths hardcoded in the new/changed skills/*/SKILL.md or docs/. The docs/solutions/*.md self-references to shakacode/agent-workflows issues are this pack's own issue tracker (documenting the lesson's origin), not a consumer-repo hardcode, so that's fine. The "Batch Handoff Format" section referenced by the new "Canonical Readiness Vocabulary" additions already exists in workflows/pr-processing.md from a prior PR — this PR only links to it, doesn't duplicate it.

Shell/Ruby helper safety:

  • bin/install-agent-workflows: the new ensure_real_directory / symlink-vs-real-file guards for docs/ correctly handle the copy-after-symlink and symlink-after-copy transitions (verified against the new tests, including that rm -rf on a stale symlink never touches the symlink's target). No injection risk — no user input reaches a shell string unsafely.
  • skills/autoreview/bin/autoreview-target-state: uses Process.spawn with argv arrays (no shell interpolation), so no command injection despite handling arbitrary branch names; the Shellwords.escape use is only for a human-readable display string, not for execution. Timeout/process-group handling looks correct.
  • bin/validate-solutions / bin/validate-review-findings: YAML/JSON parsing is properly guarded (Psych::Exception, JSON::ParserError rescued), and the date validator correctly cross-checks the raw frontmatter text against the parsed value to catch YAML's implicit date coercion. No issues found.

skills/task-observer/bin/task-observer: this is the highest-value target since it's explicitly meant to sanitize text that may originate from untrusted GitHub content before persisting it. Left 3 inline comments on concrete gaps in the privacy filter:

  1. URL fragments (#access_token=..., #code=...) are never inspected — only query string and path are checked, so OAuth-style implicit-grant tokens in a fragment can bypass the filter.
  2. The secret-pattern list only recognizes AWS access-key IDs and legacy gh[pousr]_ GitHub tokens (plus generic keyword:/keyword= shapes) — modern formats like github_pat_..., Stripe sk_live_..., Slack xoxb-..., and JWTs have no dedicated pattern and no trigger keyword, so they can pass through silently.
  3. (Lower confidence, not execution-verified) private_host?'s IPAddr-based check may not catch IPv4-mapped IPv6 (::ffff:127.0.0.1) or decimal/octal IP literals (http://2130706433/), both of which are valid host syntax but likely fail to match the plain-form PRIVATE_IP_RANGES list.

None of these are exploitable for RCE/SSRF (the tool only writes text to a local file, it doesn't fetch URLs), but they weaken the "never store secrets" guarantee the skill explicitly promises, so worth a look before broader adoption. Given this is documented as a best-effort filter, another option is just tightening the SKILL.md language so operators don't over-trust it as a full secret scanner.
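As an illustration of the first gap, a fragment check could parse the fragment like a query string. This is a sketch under assumptions: the parameter-name list and the method name are hypothetical, and the real filter may want different names or value checks.

```ruby
require "uri"

# Assumed list of parameter names that suggest a credential in a URL fragment.
TOKEN_PARAM_NAMES = %w[access_token id_token code token].freeze

# Returns the suspicious parameter names found in the URL fragment, if any.
def fragment_token_params(url)
  fragment = URI.parse(url).fragment
  return [] if fragment.nil? || fragment.empty?

  URI.decode_www_form(fragment).map(&:first) & TOKEN_PARAM_NAMES
rescue URI::InvalidURIError, ArgumentError
  # Unparseable input: report nothing here and leave it to the other checks.
  []
end
```

A plain anchor such as #install yields an empty list, so ordinary documentation links are not flagged by this check.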

Everything else — the review-finding schema, solution-doc validators, and readiness-vocabulary wiring across spec/plan-pr-batch/pr-batch — looked solid and well-covered by tests.

Comment thread skills/task-observer/bin/task-observer Outdated
/\b(?:sk|rk)_(?:live|test)_[A-Za-z0-9]{16,}\b/,
/\bxox[abprs]-[A-Za-z0-9-]{10,}\b/,
/\beyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b/,
/\b(?:customer|patient|payment|cardholder|diagnosis|prescription|medical record|health data)\b/i,

Bug: bare-keyword regex causes false-positive rejections of ordinary engineering observations.

This pattern rejects any summary/source/skill value that merely contains the standalone word "customer", "patient", "payment", "cardholder", "diagnosis", "prescription", "medical record", or "health data" — regardless of whether actual regulated data is present. Since check_privacy! raises on any match (task-observer lines 196-200), this means a perfectly sanitized, useful observation like:

--summary "Payment webhook retries should use an idempotency key to avoid duplicate charges."

or

--summary "Customer-facing error messages should not leak stack traces."

is unconditionally rejected with "observation appears to contain sensitive material," even though no PII/regulated data is present — these are just common engineering topic words. This directly undercuts the skill's stated purpose (skills/task-observer/SKILL.md: capture reusable skill-improvement observations), since any lesson about payment/customer/health-adjacent features can never be recorded.

Consider requiring these keywords to co-occur with an actual data shape (e.g., a number, name, or key: value pattern) rather than triggering on the bare topic word, or moving this class of check to reviewer guidance (SKILL.md "Privacy Rules") rather than a hard raise.
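A minimal sketch of the co-occurrence idea, assuming three illustrative "data shapes" (a key/value pair, an email address, or a run of six or more digits). The shape list and the method name are assumptions, not part of the helper; only the keyword list is copied from the diff.

```ruby
# Keyword list copied from the pattern under review.
REGULATED_KEYWORD = /\b(?:customer|patient|payment|cardholder|diagnosis|prescription|medical record|health data)\b/i

# Assumed data shapes: key/value pair, email address, or 6+ consecutive digits.
DATA_SHAPE = /\b\w+\s*[:=]\s*\S+|\b[\w.+-]+@[\w-]+\.[\w.-]+\b|\b\d{6,}\b/

# Flags text only when a topic keyword appears together with a data shape.
def regulated_data_suspected?(text)
  text.match?(REGULATED_KEYWORD) && text.match?(DATA_SHAPE)
end
```

This remains a heuristic: a summary such as "Payment note: retry webhooks" would still be flagged because of the colon, so the shape list would need tuning against real observations.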

Comment thread skills/task-observer/bin/task-observer
Comment thread skills/task-observer/bin/task-observer
Comment thread skills/task-observer/bin/task-observer
Comment thread skills/task-observer/bin/task-observer Outdated
Comment on lines +36 to +37
/(?:^|[.-])internal(?:[.-]|$)/i,
/(?:^|[.-])corp(?:[.-]|$)/i,

internal/corp host patterns require an adjacent ./- separator, so they miss plausible real hostnames.

/(?:^|[.-])internal(?:[.-]|$)/i and /(?:^|[.-])corp(?:[.-]|$)/i don't match hosts like internalapi.example.com or corpwiki.example.com — there's no separator between the keyword and the following letters, so these are treated as public hosts and pass private_host? unflagged.

Comment thread skills/task-observer/SKILL.md
Comment thread skills/task-observer/bin/task-observer
Comment thread bin/validate-solutions Outdated
line = raw_frontmatter.lines.find { |candidate| candidate.match?(/\Adate:/) }
return nil unless line

raw = line.sub(/\Adate:\s*/, "").sub(/\s+#.*\z/, "").strip

Trailing-YAML-comment stripper is a silent no-op, causing valid dates to be wrongly rejected.

line.sub(/\Adate:\s*/, "").sub(/\s+#.*\z/, "").strip — lines produced by String#lines retain their trailing "\n" (true for every frontmatter field that isn't the last line). Since . doesn't match \n (no /m) and \z requires the absolute end of the string, \s+#.*\z can never match when the line still ends in \n, so the comment-stripping sub never fires.

Concrete trigger: a solution doc with date: 2026-07-02 # note (a perfectly valid unquoted ISO date with an inline comment) leaves raw = "2026-07-02 # note", because .strip only trims the trailing newline. The comment survives, DATE_FORMAT fails to match, and bin/validate-solutions reports "date must be ISO 8601 YYYY-MM-DD" for a valid date.

None of the current docs/solutions/*.md files use inline date comments, so this is latent rather than actively broken today, but the comment-stripping logic itself doesn't work as written.
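The behavior can be reproduced in isolation. In the sketch below, the first method copies the chain from the diff and the second adds a chomp before the substitutions; both method names are hypothetical, and chomp is just one possible fix.

```ruby
# Same substitution chain as the diff: the trailing "\n" kept by String#lines
# prevents \z from matching right after the comment, so nothing is stripped.
def raw_date_value_as_written(line)
  line.sub(/\Adate:\s*/, "").sub(/\s+#.*\z/, "").strip
end

# One possible fix: drop the line terminator first so \z follows the comment.
def raw_date_value_with_chomp(line)
  line.chomp.sub(/\Adate:\s*/, "").sub(/\s+#.*\z/, "").strip
end
```

Feeding both methods a line taken from String#lines, such as "date: 2026-07-02 # note\n", shows the difference; a line without a trailing newline behaves the same in both.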

Comment thread bin/install-agent-workflows Outdated
link_pack_docs() {
local docs_target="$1"
local destination source_path
mkdir -p "$docs_target" "$docs_target/solutions"

link_pack_docs is the only doc-install path that doesn't use the new ensure_real_directory guard.

Every other path this PR added (copy_pack_docs's calls to ensure_real_directory "$docs_target" / "$docs_target/solutions", and the top-level copy-mode block) converts a pre-existing symlink at the target path into a real directory before writing, or refuses with an explicit error for a non-directory. Here, plain mkdir -p "$docs_target" "$docs_target/solutions" is used instead.

If $docs_target (or its solutions subdir) already exists as a symlink to a directory elsewhere — plausible in a dotfiles-managed home, e.g. ln -s ~/dotfiles/codex-docs ~/.codex/docs before ever running the installer — mkdir -p silently succeeds (it resolves through the symlink) and this function then writes review-finding-schema.md/solutions/* symlinks into that other directory instead of ~/.codex/docs, with no warning. Every other guarded path in this script fails loudly ("Refusing to replace...") in the equivalent situation; this one silently writes to the wrong place.

Suggest using ensure_real_directory here too for consistency (note it would need -L-then-mkdir semantics compatible with symlink mode, or an equivalent explicit-refusal check for the symlink-to-directory case specifically, since ensure_real_directory as written would convert it to a real directory, which is not what symlink mode wants either).

@claude

claude Bot commented Jul 5, 2026

Review Summary

Reviewed this PR against code quality, correctness, security, performance, portability (per AGENTS.md "Editing Rules"), and shell/Ruby helper safety. Overall this is a well-engineered, heavily-tested change. The new Ruby helpers (bin/validate-solutions, bin/validate-review-findings, skills/autoreview/bin/autoreview-target-state, skills/task-observer/bin/task-observer) each ship with thorough unit tests and use safe subprocess invocation (Process.spawn with argv arrays, not shell strings), and the install-agent-workflows symlink/copy-mode changes correctly handle the ensure_real_directory conversion case (switching a prior symlink-mode install back to copy mode without corrupting the source repo via a stale symlinked workflows/docs directory).

Left 3 inline comments:

  1. Bug (skills/task-observer/bin/task-observer:27): the regulated-data keyword regex (customer|patient|payment|cardholder|diagnosis|prescription|medical record|health data) triggers on the bare topic word, not actual PII — so a sanitized, useful observation like "Payment webhook retries should use an idempotency key" gets hard-rejected. This meaningfully narrows what the skill can actually record.
  2. Test-coverage gap (skills/task-observer/bin/task-observer:28-29): the SSN-shaped and generic 13–19 digit sequence patterns have no test coverage at all, unlike every other entry in SENSITIVE_PATTERNS.
  3. Doc nit (skills/task-observer/SKILL.md:78-82): the documented default memory path only mentions the Codex fallback; the actual agent_home resolution also falls back to CLAUDE_HOME/~/.claude.

Portability check: no hardcoded consumer-repo commands/labels/branches were found in the changed skills/*/SKILL.md or workflows/ content. The two new docs/solutions/*.md files do link to https://github.com/shakacode/agent-workflows/issues/37 in related_issues, but that's provenance for the lesson's own origin (this source pack's issue tracker), not an instruction pointing at a consumer repo's tracker, so it doesn't appear to violate the "no consumer repo hardcoding" rule — flagging for awareness in case maintainers want provenance links scrubbed before install-time distribution regardless.

No security-critical (secret leakage, injection, path traversal) issues found in the installer script or the Ruby helpers beyond the false-positive/test-gap items above.

@claude

claude Bot commented Jul 5, 2026

Review summary

Reviewed the new knowledge-contracts/task-observer additions (docs/solutions, review-finding schema, canonical readiness vocabulary, autoreview-target-state, and the task-observer skill) for correctness, security, portability, and shell/Ruby safety. Posted 6 inline comments on concrete, verified issues:

task-observer (skills/task-observer/bin/task-observer) — this is the highest-value area to double-check given it's a new skill whose whole job is privacy filtering before persisting data to disk:

  • Sensitive-pattern list has no rule for OpenAI/Anthropic-style API keys (sk-proj-..., sk-ant-...) — only Stripe-style sk_live_/sk_test_ is covered (line 24).
  • Private-host/IP filtering only runs against http(s):// URLs; a private hostname in another scheme or with no scheme at all bypasses it entirely (line 176).
  • internal/corp host regexes require an adjacent ./- separator, so internalapi.example.com/corpwiki.example.com aren't flagged (lines 36-37).
  • observation_records only rescues JSON::ParserError; a structurally-odd-but-valid JSON line (non-Hash, or Hash missing expected keys) crashes list/status with a raw Ruby exception instead of degrading gracefully (lines 94-109).
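For the last item, a tolerant reader might look like the sketch below. The required-key list and the method name are assumptions; the real record has more fields than shown here.

```ruby
require "json"

# Assumed minimal key set; the real observation record has more fields.
REQUIRED_OBSERVATION_KEYS = %w[observed_at summary].freeze

# Parses JSONL lines, skipping anything that is not a well-formed record.
def parse_observation_lines(lines)
  lines.each_with_object([]) do |line, records|
    next if line.strip.empty?

    begin
      parsed = JSON.parse(line)
    rescue JSON::ParserError
      next # malformed line: skip instead of crashing list/status
    end
    # Valid JSON that is not an object with the expected keys is skipped too.
    next unless parsed.is_a?(Hash) && REQUIRED_OBSERVATION_KEYS.all? { |key| parsed.key?(key) }

    records << parsed
  end
end
```

Skipping silently is the simplest degradation; counting skipped lines and reporting the total would be a reasonable alternative if operators need to notice corruption.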

bin/validate-solutions — the trailing-YAML-comment stripper in raw_date_value is a no-op due to how String#lines retains \n (line 131); a valid date like 2026-07-02 # note gets wrongly rejected. Currently latent since no existing doc uses inline date comments.

bin/install-agent-workflows: link_pack_docs (symlink-mode docs install) is the only doc-install path that doesn't reuse the new ensure_real_directory guard (line 191). If $docs_target is already a symlink to a directory elsewhere (dotfiles-style setup), mkdir -p silently resolves through it and pack-doc symlinks get written to the wrong location with no warning, unlike every other guarded path in this script.

What looked solid:

  • skills/autoreview/bin/autoreview-target-state's state machine, shell-escaping (Shellwords.escape), and process-group timeout/kill handling all traced correctly against the test suite; the SKILL.md table matches the helper's actual states field-for-field.
  • The new "Canonical Readiness Vocabulary" prose across spec/plan-pr-batch/pr-batch/adversarial-pr-review stays generic — no consumer-repo-specific commands, labels, or branches found, consistent with AGENTS.md's editing rules. workflows/pr-processing.md's Batch Handoff Format section defines all 7 canonical states the new tests check for.
  • docs/solutions/*.md content only references this pack's own files/issue tracker (appropriate for a pack-internal lessons library) and passes bin/validate-solutions's frontmatter rules.
  • The disposition mapping in skills/adversarial-pr-review/SKILL.md (BLOCKING/DISCUSS/FOLLOWUP/etc. → must_fix/needs_decision/etc.) is internally consistent with docs/review-finding-schema.md's enum.

Minor/lower-priority (not filed inline): goal-completion-contract-test.rb's READINESS_STATE_KEYS regex is case-sensitive and doesn't handle double-quoted values, so it could miss some malformed readiness markers in future skill edits — a test-tooling gap, not a shipped runtime bug. bin/validate-solutions's related_files existence check has no ../ containment guard and has an unstripped-vs-stripped whitespace mismatch between its emptiness and existence checks — both low-impact since this only validates the maintainer's own docs locally.

Note: sandboxed review agents couldn't execute ruby/bash test suites in this environment (all script execution required approval that wasn't grantable), so the findings above come from careful static tracing rather than live test runs — worth confirming bin/validate still passes clean before merge.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a4a2e62d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread skills/task-observer/bin/task-observer Outdated
raise Error, "observation contains an invalid URL; summarize without the URL" if malformed_url_encoding?(text)
raise Error, "observation appears to contain a private URL (URL credentials)" if url_userinfo?(text)

URI::DEFAULT_PARSER.extract(text, %w[http https]).each do |url|

P2 Badge Scan non-HTTP private URL hosts

The private-host/path/query checks only run for http and https URLs, so a sanitized observation containing a non-HTTP internal endpoint such as ssh://internal.example.test/repo is appended without hitting private_host?. The helper already rejects private HTTP URLs without query strings, so the same privacy guarantee is bypassed for private repository/database-style URLs unless host scanning is made scheme-agnostic before writing the JSONL record.
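A scheme-agnostic host scan might look roughly like the sketch below. Everything here is an assumption for illustration: `ANY_SCHEME_URL`, `PRIVATE_HOST_PATTERNS`, `illustrative_private_host?`, and `private_url_hosts` are hypothetical names, and the host rules are not the helper's real private_host? logic.

```ruby
require "ipaddr"

# Matches scheme://authority for any scheme, not just http/https.
ANY_SCHEME_URL = %r{\b[a-z][a-z0-9+.-]*://([^\s/?#]+)}i

# Illustrative rules only; they over-match (e.g. "corporate.example.com").
PRIVATE_HOST_PATTERNS = [
  /\Alocalhost\z/i,
  /(?:\A|\.)(?:internal|corp)[^.]*\./i,
  /\.(?:internal|local|test|lan)\z/i
].freeze

# Assumed stand-in for the helper's private_host? check.
def illustrative_private_host?(host)
  return true if PRIVATE_HOST_PATTERNS.any? { |pattern| pattern.match?(host) }

  address = IPAddr.new(host)
  address.private? || address.loopback?
rescue IPAddr::Error
  false # not an IP literal and no pattern matched
end

# Extracts the host part of every scheme://authority occurrence in text.
def hosts_in(text)
  text.scan(ANY_SCHEME_URL).flatten.map do |authority|
    # Drop userinfo and port so only the host is checked.
    authority.sub(/\A.*@/, "").sub(/:\d+\z/, "")
  end
end

def private_url_hosts(text)
  hosts_in(text).select { |host| illustrative_private_host?(host) }
end

flagged = private_url_hosts("clone ssh://internal.example.test/repo first")
```

The design choice is to match on the authority component directly, so the same host rules apply whether the scheme is ssh, postgres, redis, or http.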

Comment thread skills/task-observer/bin/task-observer
@justin808 justin808 merged commit 2af5091 into main Jul 5, 2026
15 checks passed
@justin808 justin808 deleted the codex/awf-b5-knowledge-contracts branch July 5, 2026 04:07

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d5370a4ff

ALLOWED_KINDS = %w[correction gap skill-improvement simplification cross-cutting self-improvement].freeze
MAX_FIELD_LENGTH = 500
SENSITIVE_PATTERNS = [
/(?:password|passwd|secret|token|api[\s_-]?key|access[\s_-]?key|private[\s_-]?key|session[\s_-]?cookie)\s*[:=]/i,

P2 Badge Reject PEM private-key blocks

When a summary/source contains a pasted private-key block or prefix such as -----BEGIN OPENSSH PRIVATE KEY-----, the assignment regex on this line does not match because it requires private key to be followed by : or =. Fresh evidence beyond the earlier assignment-form comment is that PEM/OpenSSH key headers are still accepted and appended to observation memory, which violates the skill privacy rule to never store private keys.
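The gap can be reproduced with the assignment regex quoted in the diff context above. `PEM_PRIVATE_KEY_PATTERN` is a hypothetical extra rule sketched here as one possible fix, not something present in the helper.

```ruby
# Copied from the SENSITIVE_PATTERNS entry shown in the diff context.
ASSIGNMENT_PATTERN = /(?:password|passwd|secret|token|api[\s_-]?key|access[\s_-]?key|private[\s_-]?key|session[\s_-]?cookie)\s*[:=]/i

# Assumed additional rule: any PEM/OpenSSH private-key header line.
PEM_PRIVATE_KEY_PATTERN = /-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----/

header = "-----BEGIN OPENSSH PRIVATE KEY-----"

# "PRIVATE KEY" is followed by "-----", not ":" or "=", so this is false.
assignment_hit = ASSIGNMENT_PATTERN.match?(header)

# The header-shaped rule matches regardless of what follows the key label.
pem_hit = PEM_PRIVATE_KEY_PATTERN.match?(header)
```

Matching the header alone also covers the pasted-prefix case, since the check does not depend on the key body or the END line being present.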

Comment thread bin/validate-solutions
elsif value.any? { |entry| !entry.is_a?(String) || entry.strip.empty? }
failures << "#{relative}: #{field} entries must be non-empty strings"
elsif field == "related_files"
missing = value.reject { |entry| File.exist?(File.join(root, entry)) }

P2 Badge Keep related_files inside the repo

When a solution doc lists related_files: ['../../etc/passwd'], this check follows the traversal via File.join(root, entry) and passes on normal systems because the expanded path exists outside the checkout. That lets bin/validate accept non-portable lesson metadata even though docs/solutions/README.md defines related_files as pack files; normalize and require the path to remain under root before accepting it.
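A containment check along these lines is sketched below. `inside_root?` is a hypothetical helper name, and the sketch normalizes paths lexically with `File.expand_path`; it does not resolve symlinks, which a stricter version might do with `File.realpath`.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: true only when entry normalizes to an existing path
# that stays under root.
def inside_root?(root, entry)
  root_path = File.expand_path(root)
  candidate = File.expand_path(entry, root_path)
  candidate.start_with?(root_path + File::SEPARATOR) && File.exist?(candidate)
end

# Exercise the helper against a throwaway checkout-like directory.
results = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "docs"))
  File.write(File.join(root, "docs", "README.md"), "# docs\n")
  {
    inside: inside_root?(root, "docs/README.md"),
    traversal: inside_root?(root, "../../etc/passwd"),
    absolute: inside_root?(root, "/etc/passwd"),
    missing: inside_root?(root, "docs/absent.md")
  }
end
```

Only the `inside` entry is accepted; the traversal and absolute entries fail the prefix test before the existence check is consulted, so the result does not depend on what exists outside the checkout.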
