Add durable knowledge contracts and task observer#69
Conversation
|
Warning Review limit reached
Next review available in: 22 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (11)
WalkthroughThis PR adds durable solution docs and validation, standardizes readiness vocabulary across planning skills, defines a structured review-finding schema with validation, adds an autoreview target-state helper and tests, and introduces an optional task-observer skill with CLI and install/validation wiring. ChangesDurable Workflow Solutions Library
Canonical Readiness Vocabulary
Structured Review Finding Schema
Autoreview Target-State Machine
Task-Observer Meta-Skill
Estimated code review effort: 4 (Complex) | ~75 minutes Sequence Diagram(s)sequenceDiagram
participant Skill as skills/autoreview/SKILL.md
participant Runner as AutoreviewTargetState::Runner
participant Git as git
participant GH as gh
Skill->>Runner: run --text
Runner->>Git: current_branch, status_facts, branch_diff
Runner->>GH: pr_base
GH-->>Runner: baseRefName or no-PR state
Runner->>Runner: classify facts
Runner-->>Skill: state, disposition, review_targets, message
sequenceDiagram
participant Skill as skills/task-observer/SKILL.md
participant CLI as TaskObserver CLI
participant FS as task-observer memory files
Skill->>CLI: append --kind --summary --source
CLI->>CLI: check_privacy!
CLI->>FS: append JSONL observation
FS-->>CLI: write confirmation
CLI-->>Skill: success or error
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c0e96f210d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (7)
bin/validate-solutions (1)
43-47: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick winConsider validating
related_filesentries reference real paths.
related_filesis currently only checked for being a non-empty list of non-empty strings, not that the referenced paths exist. Since these docs are meant to be durable/portable references, a stale/renamed path would silently go undetected.♻️ Proposed enhancement
REQUIRED_LIST_FIELDS.each do |field| next unless frontmatter.key?(field) value = frontmatter[field] if !value.is_a?(Array) failures << "#{relative}: #{field} must be a list" elsif value.any? { |entry| !entry.is_a?(String) || entry.strip.empty? } failures << "#{relative}: #{field} entries must be non-empty strings" + elsif field == "related_files" + missing = value.reject { |entry| File.exist?(File.join(root, entry)) } + failures << "#{relative}: related_files not found: #{missing.join(', ')}" unless missing.empty? end endNote:
rootwould need to be threaded intovalidate_required_fields(currently onlyvalidate_filehas it).Also applies to: 86-102
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bin/validate-solutions` around lines 43 - 47, The related_files validation only checks for non-empty strings and can miss stale or renamed references. Update validate_required_fields (and any callers such as validate_file) to accept root, then add existence checks for each related_files entry using the same root-relative path resolution used elsewhere so every referenced file must actually exist. Keep the current non-empty list/string validation in place, and wire the new path check into the existing validation flow that handles related_files.skills/adversarial-pr-review/SKILL.md (1)
56-67: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winHandle contradicted findings explicitly.
The schema allows
verification.status: contradicted, but this step only distinguishes verified/current from advisory. Add one sentence for false positives so the structured block always maps contradicted findings to a declared disposition instead of leaving the outcome implicit.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/adversarial-pr-review/SKILL.md` around lines 56 - 67, The structured review-findings guidance is missing an explicit disposition for contradicted false positives. Update the instructions around the Review Finding schema mapping in SKILL.md so false positives are called out directly and mapped to a declared outcome instead of being left implicit, using the existing labels like NOISE and the `verification.status: contradicted` case. Keep the human-readable report first, and ensure the guidance clearly tells authors how to represent contradicted findings in the `review-findings` block.skills/task-observer/bin/task-observer (2)
123-128: 🎯 Functional Correctness | 🔵 Trivial | 💤 Low value
Time.parseis more lenient than strict ISO 8601.The error message promises
TASK_OBSERVER_TIME must be an ISO 8601 timestamp, butTime.parseaccepts many non-ISO formats too, so invalid-but-parseable input silently passes.Time.iso8601would enforce the stated contract.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/task-observer/bin/task-observer` around lines 123 - 128, The current_time method accepts non-ISO timestamp formats because it uses Time.parse, which conflicts with the TASK_OBSERVER_TIME validation contract. Update current_time in the task-observer script to use Time.iso8601 for parsing the ENV["TASK_OBSERVER_TIME"] value, keep the empty-value fallback to Time.now.utc, and preserve the existing rescue Error path with the ISO 8601 message.
133-133: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winReplace
URI.extractwithURI::DEFAULT_PARSER.extractURI.extractis obsolete and emits warnings under-w; calling the parser directly keeps the same behavior without the warning.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/task-observer/bin/task-observer` at line 133, `URI.extract` in the task observer URL parsing path is obsolete and triggers warnings; update the extraction call in the observer script to use `URI::DEFAULT_PARSER.extract` instead while preserving the existing `text` and scheme list behavior. Use the `URI`/`extract` call site in the task-observer entrypoint to locate the change.skills/task-observer/bin/task-observer-test.rb (1)
124-139: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winConsider adding a malformed-query-string case.
Given the
ArgumentErrorgap flagged inbin/task-observer'scheck_privacy!, a test with a malformed percent-encoded query (e.g.,?foo=%GG) would catch the crash if unfixed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/task-observer/bin/task-observer-test.rb` around lines 124 - 139, Add a test for malformed percent-encoded query strings to cover the ArgumentError path in check_privacy! used by task-observer append. Extend task-observer-test.rb with a case similar to test_append_rejects_private_urls_with_query_strings that passes a summary containing a URL like ?foo=%GG, then assert the command fails cleanly and reports a private/invalid URL rejection instead of crashing. Use the existing capture_task_observer and append flow so the new case exercises the same validation path in check_privacy!.skills/task-observer/SKILL.md (1)
41-52: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDoc lists 5 observation categories but CLI accepts a 6th (
self-improvement).
ALLOWED_KINDSinbin/task-observerincludesself-improvement, which isn't described here. Worth syncing the doc and the CLI's allowed values so users aren't surprised by an undocumented valid--kind.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/task-observer/SKILL.md` around lines 41 - 52, The observation-kind documentation is out of sync with the CLI because `ALLOWED_KINDS` in `bin/task-observer` includes `self-improvement` but `SKILL.md` only lists five categories. Update the doc’s “What To Capture” list to include the missing kind, and verify the CLI’s allowed values and the shared guidance stay aligned so `--kind` matches the documented set.skills/autoreview/bin/autoreview-target-state (1)
147-161: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick winUnescaped branch/base name interpolated into displayed "command" string.
branch_target/local_targetbuild the"command"field by interpolatingbase(fromconfigured_baseYAML orgh pr view'sbaseRefName) directly into a double-quoted shell string. Git ref names are permitted to contain shell-special characters ($,;, backticks, etc. are not disallowed bygit-check-ref-format), so if this "command" string is ever copy/pasted or auto-executed by a caller (an agent, a script) rather than only displayed, a crafted branch/base name could alter or inject commands.Since this field is explicitly meant to be run (
codex review --base "origin/$base"), consider usingShellwords.escapeinstead of manual quoting so the value is safe to execute even with unusual ref names.🔒 Proposed fix
+require "shellwords" + def branch_target(base) { "kind" => "branch", - "command" => %(codex review --base "origin/#{base}"), + "command" => "codex review --base #{Shellwords.escape("origin/#{base}")}", "reason" => "Review committed branch changes against origin/#{base}." } end🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/autoreview/bin/autoreview-target-state` around lines 147 - 161, The command string built in branch_target is interpolating base directly into a shell command, which can break or be abused with unusual ref names. Update branch_target (and any similar command builders like local_target if they ever interpolate user-controlled values) to construct the displayed/executable command using Shellwords.escape or equivalent safe escaping instead of manual double-quoted interpolation, so baseRefName/configured_base values remain safe even when copied or executed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@skills/autoreview/bin/autoreview-target-state`:
- Around line 190-195: The helper methods that shell out via Open3.capture3 in
git and pr_base have no execution bound, so they can hang indefinitely on
stalled git/gh calls. Update those subprocess invocations to run through the
timeout(1) utility instead of relying on Ruby Timeout, and ensure both methods
still return stdout, stderr, and exit status (or raise the existing Error path)
when the command times out or fails.
In `@skills/autoreview/SKILL.md`:
- Around line 72-75: The shared skill doc currently hardcodes the fallback
package path in the AUTOREVIEW_SKILL_DIR default, which violates the
shared-skills guideline. Update the referenced autoreview command so it resolves
the target path through the appropriate AGENTS.md seam key or another
repo-agnostic indirection instead of embedding .agents/skills/autoreview
directly. Keep the change localized to the AUTOREVIEW_SKILL_DIR usage in the
skill file and preserve the existing autoreview-target-state invocation.
In `@skills/task-observer/bin/task-observer`:
- Around line 130-150: The check_privacy! validation can still crash on
malformed query strings because URI.decode_www_form may raise ArgumentError and
it is not handled. Update check_privacy! to rescue ArgumentError alongside
URI::InvalidURIError in the URI.extract loop, and re-raise TaskObserver::Error
with a user-safe message so malformed URLs are rejected gracefully without a raw
backtrace.
---
Nitpick comments:
In `@bin/validate-solutions`:
- Around line 43-47: The related_files validation only checks for non-empty
strings and can miss stale or renamed references. Update
validate_required_fields (and any callers such as validate_file) to accept root,
then add existence checks for each related_files entry using the same
root-relative path resolution used elsewhere so every referenced file must
actually exist. Keep the current non-empty list/string validation in place, and
wire the new path check into the existing validation flow that handles
related_files.
In `@skills/adversarial-pr-review/SKILL.md`:
- Around line 56-67: The structured review-findings guidance is missing an
explicit disposition for contradicted false positives. Update the instructions
around the Review Finding schema mapping in SKILL.md so false positives are
called out directly and mapped to a declared outcome instead of being left
implicit, using the existing labels like NOISE and the `verification.status:
contradicted` case. Keep the human-readable report first, and ensure the
guidance clearly tells authors how to represent contradicted findings in the
`review-findings` block.
In `@skills/autoreview/bin/autoreview-target-state`:
- Around line 147-161: The command string built in branch_target is
interpolating base directly into a shell command, which can break or be abused
with unusual ref names. Update branch_target (and any similar command builders
like local_target if they ever interpolate user-controlled values) to construct
the displayed/executable command using Shellwords.escape or equivalent safe
escaping instead of manual double-quoted interpolation, so
baseRefName/configured_base values remain safe even when copied or executed.
In `@skills/task-observer/bin/task-observer`:
- Around line 123-128: The current_time method accepts non-ISO timestamp formats
because it uses Time.parse, which conflicts with the TASK_OBSERVER_TIME
validation contract. Update current_time in the task-observer script to use
Time.iso8601 for parsing the ENV["TASK_OBSERVER_TIME"] value, keep the
empty-value fallback to Time.now.utc, and preserve the existing rescue Error
path with the ISO 8601 message.
- Line 133: `URI.extract` in the task observer URL parsing path is obsolete and
triggers warnings; update the extraction call in the observer script to use
`URI::DEFAULT_PARSER.extract` instead while preserving the existing `text` and
scheme list behavior. Use the `URI`/`extract` call site in the task-observer
entrypoint to locate the change.
In `@skills/task-observer/bin/task-observer-test.rb`:
- Around line 124-139: Add a test for malformed percent-encoded query strings to
cover the ArgumentError path in check_privacy! used by task-observer append.
Extend task-observer-test.rb with a case similar to
test_append_rejects_private_urls_with_query_strings that passes a summary
containing a URL like ?foo=%GG, then assert the command fails cleanly and
reports a private/invalid URL rejection instead of crashing. Use the existing
capture_task_observer and append flow so the new case exercises the same
validation path in check_privacy!.
In `@skills/task-observer/SKILL.md`:
- Around line 41-52: The observation-kind documentation is out of sync with the
CLI because `ALLOWED_KINDS` in `bin/task-observer` includes `self-improvement`
but `SKILL.md` only lists five categories. Update the doc’s “What To Capture”
list to include the missing kind, and verify the CLI’s allowed values and the
shared guidance stay aligned so `--kind` matches the documented set.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b248e1bb-2d78-4158-9791-8989253e01e9
📒 Files selected for processing (21)
CHANGELOG.mdREADME.mdbin/validatebin/validate-solutionsbin/validate-solutions-test.rbdocs/review-finding-schema.mddocs/solutions/README.mddocs/solutions/coordination-unknown-state.mddocs/solutions/github-content-is-evidence.mdskills/adversarial-pr-review/SKILL.mdskills/autoreview/SKILL.mdskills/autoreview/bin/autoreview-target-stateskills/autoreview/bin/autoreview-target-state-test.rbskills/plan-pr-batch/SKILL.mdskills/pr-batch/SKILL.mdskills/pr-batch/bin/goal-completion-contract-test.rbskills/spec/SKILL.mdskills/task-observer/SKILL.mdskills/task-observer/agents/openai.yamlskills/task-observer/bin/task-observerskills/task-observer/bin/task-observer-test.rb
| ## Canonical Readiness Vocabulary | ||
|
|
||
| Use these canonical human-facing final states for target and batch handoffs: | ||
|
|
||
| - `merged` | ||
| - `ready-gates-clean` | ||
| - `ready-no-merge-authority` | ||
| - `waiting-on-checks-or-review` | ||
| - `external-gate-failing` | ||
| - `blocked-user-input` | ||
| - `no-pr-evidence` |
There was a problem hiding this comment.
Duplication / drift risk: this section re-pastes the full 7-state readiness vocabulary and near-duplicate prose that already exists twice more in this same file (line 80's Completion states list, and line 327's paragraph, which already says "follow the canonical final-state ... split in .agents/workflows/pr-processing.md"), plus once more in workflows/pr-processing.md itself.
The repo already has an established fix for exactly this situation: the adjacent Goal Mode Completion Contract further down links to ../../workflows/pr-processing.md#goal-mode-completion-contract instead of repeating the text, and goal-completion-contract-test.rb's test_skill_prose_points_to_canonical_contract_instead_of_pasting_it enforces that it stays a link, not a paste.
The new goal-completion-contract-test.rb test added in this PR (test_canonical_readiness_vocabulary_is_shared_by_planning_skills) only checks that each state string is present as a substring in each of 4 files — it doesn't stop the surrounding prose from drifting between copies (and indeed the trailing sentence already differs slightly between spec/SKILL.md, plan-pr-batch/SKILL.md, and this file). Consider linking to the workflows/pr-processing.md section here (and in spec/SKILL.md / plan-pr-batch/SKILL.md) instead of re-pasting the list, consistent with how the Goal Mode Completion Contract is handled.
| ## Canonical Readiness Vocabulary | ||
|
|
||
| When a spec describes downstream batch or PR readiness, use the canonical | ||
| human-facing final states from `workflows/pr-processing.md`: | ||
|
|
||
| - `merged` | ||
| - `ready-gates-clean` | ||
| - `ready-no-merge-authority` | ||
| - `waiting-on-checks-or-review` | ||
| - `external-gate-failing` | ||
| - `blocked-user-input` | ||
| - `no-pr-evidence` | ||
|
|
||
| Normal interactive output stays human-readable. Do not collapse these states | ||
| into vague labels like `ready`, `complete`, or `done`. If a fact needed to | ||
| choose a state cannot be verified, write `UNKNOWN` for that fact and keep the | ||
| state unresolved instead of guessing. Optional structured handoff blocks are | ||
| allowed only when they help a planner or validator; they supplement the normal | ||
| markdown summary and do not make JSON mandatory. |
There was a problem hiding this comment.
Same duplication concern as the equivalent new section in skills/pr-batch/SKILL.md and skills/plan-pr-batch/SKILL.md: this fully re-pastes the 7-state list and near-duplicate prose from workflows/pr-processing.md rather than linking to it, unlike this repo's existing convention for the Goal Mode Completion Contract (link + test_skill_prose_points_to_canonical_contract_instead_of_pasting_it). Worth converging on a single canonical copy with a link from each skill.
| ## Canonical Readiness Vocabulary | ||
|
|
||
| Use the same human-facing readiness states as `$pr-batch` and | ||
| `workflows/pr-processing.md`: | ||
|
|
||
| - `merged` | ||
| - `ready-gates-clean` | ||
| - `ready-no-merge-authority` | ||
| - `waiting-on-checks-or-review` | ||
| - `external-gate-failing` | ||
| - `blocked-user-input` | ||
| - `no-pr-evidence` | ||
|
|
||
| Normal interactive output stays human-readable. Use these states in planning | ||
| notes, done conditions, and final-bucket handoffs instead of vague labels such | ||
| as `ready`, `complete`, or `done`. Preserve explicit `UNKNOWN` for facts that | ||
| cannot be verified, including coordination, file-touch, review, CI, QA, or | ||
| merge-ledger evidence; do not turn unknown evidence into an optimistic state. | ||
| Optional structured handoff blocks may be added when they reduce ambiguity for a |
There was a problem hiding this comment.
Same duplication concern noted on skills/pr-batch/SKILL.md and skills/spec/SKILL.md: this re-pastes the 7-state list and near-duplicate prose from workflows/pr-processing.md instead of linking to it. Recommend one canonical copy (in the workflow) with skills linking to it, matching the existing Goal Mode Completion Contract pattern.
Review summaryReviewed for code quality, bugs, security, performance, and portability (per One consistency/maintainability finding worth addressing (left as inline comments on the three affected files): the new "Canonical Readiness Vocabulary" sections in No blocking correctness, security, or portability issues found. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 003ffa58fd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| blocked-user-input | ||
| no-pr-evidence | ||
| ].freeze | ||
| READINESS_STATE_KEYS = /\b(?:final_state|readiness_state|target_state):\s*`?([a-z0-9_-]+|UNKNOWN)`?/ |
There was a problem hiding this comment.
Nit (low severity): READINESS_STATE_KEYS only matches a token composed of [a-z0-9_-]+ or the exact literal UNKNOWN. A mixed-case value such as final_state: Ready or readiness_state: Unknown won't match either alternative, so invalid_readiness_marker_values silently skips it instead of flagging it as invalid. Since this regex is the guard that's supposed to keep structured readiness markers on the canonical vocabulary, consider adding a case-insensitive fallback (or an explicit check that a matched key's value is exactly one of the canonical tokens/UNKNOWN) so a future stray Ready/Unknown value doesn't slip through this test undetected.
Not a runtime bug today (current skill/workflow text doesn't use mixed case), just a gap in the guard's coverage.
Review SummaryThis PR adds a Overall assessment: solid. No confirmed correctness, security, or portability bugs found after a line-by-line pass over the diff, the new Ruby helpers ( Specifically checked and looked clean:
Left two low-severity, non-blocking nitpicks as inline comments (a test-coverage gap in the new readiness-vocabulary regex guard, and a heuristic-filter gap in |
ReviewReviewed the diff (docs/solutions library + validators, review-finding schema, canonical readiness vocabulary, Overall: the new Ruby helpers (
Everything else — schema validators, date/YAML edge cases, the readiness-vocabulary tests, and the |
| `SKILL.md`, then run the read-only helper: | ||
|
|
||
| ```bash | ||
| AUTOREVIEW_SKILL_DIR="${AUTOREVIEW_SKILL_DIR:?set AUTOREVIEW_SKILL_DIR to the installed or repo-local autoreview skill directory}" |
There was a problem hiding this comment.
Inconsistent with every other skill's _SKILL_DIR default.
AUTOREVIEW_SKILL_DIR="${AUTOREVIEW_SKILL_DIR:?set AUTOREVIEW_SKILL_DIR to the installed or repo-local autoreview skill directory}"This is the only _SKILL_DIR in the whole pack that hard-fails (:?) instead of defaulting (:-). Every sibling skill uses ${X_SKILL_DIR:-.agents/skills/x}, e.g.:
skills/address-review/SKILL.md:159:ADDRESS_REVIEW_SKILL_DIR="${ADDRESS_REVIEW_SKILL_DIR:-.agents/skills/address-review}"skills/update-changelog/SKILL.md:112:UPDATE_CHANGELOG_SKILL_DIR="${UPDATE_CHANGELOG_SKILL_DIR:-.agents/skills/update-changelog}"skills/post-merge-audit/SKILL.md:33:POST_MERGE_AUDIT_SKILL_DIR="${POST_MERGE_AUDIT_SKILL_DIR:-.agents/skills/post-merge-audit}"
An agent that copies this new block verbatim without having already exported AUTOREVIEW_SKILL_DIR gets an immediate shell error instead of the read-only helper running, unlike every other copy-pasteable example in the pack. Suggest ${AUTOREVIEW_SKILL_DIR:-.agents/skills/autoreview} for consistency (the surrounding prose already says "resolve AUTOREVIEW_SKILL_DIR to the installed or repo-local directory," which the required-var form doesn't actually do).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4fb193bbc1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| elsif pr["state"] == "unknown" && branch_diff == true | ||
| unknown_pr_result(pr) |
There was a problem hiding this comment.
Preserve unknown PR-base probes before declaring no target
When gh pr view fails and origin/<configured_base>...HEAD is empty, this condition is false and the later else returns NO_REVIEW_TARGET. In a repo with base_branch: main, a branch/PR targeting a release branch can have an empty main diff but a non-empty release diff, so an auth/network failure makes the helper tell the user to stop instead of resolving PR_BASE_UNKNOWN; checked the new autoreview state table, which says PR-base probe failures should stay UNKNOWN.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d7de8e5154
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
d7de8e5 to
03aed7f
Compare
| "update_mode" => "staged-review-only" | ||
| } | ||
| path = File.join(memory_root, "observations", "#{observed_at.utc.strftime('%Y-%m-%d')}.jsonl") | ||
| File.open(path, "a", encoding: "UTF-8") { |file| file.puts(JSON.generate(record)) } |
There was a problem hiding this comment.
Observation files/dirs get default (umask-derived) permissions, not locked down.
FileUtils.mkdir_p (line 70), File.write (line 73), and this File.open(path, "a", ...) never set an explicit mode, so on a typical shared/multi-user host (umask 022) the observation directory and JSONL files end up world-readable. Given the sanitizer can miss things (see the other two comments on this file), anything that slips through ends up readable by any local user via ~/.codex/memories/task-observer/observations/*.jsonl. Consider File.chmod(0o700, dir) in init and 0o600 on the JSONL file after each write.
| 4. Stage any skill or workflow edits as normal repo changes and wait for the | ||
| user's explicit request before overwriting live installed skills or personal | ||
| memory. | ||
| 5. Run the relevant helper tests and `bin/validate` before publishing changes. |
There was a problem hiding this comment.
Portability: hardcodes this repo's own validate command instead of the installed-skill seam path.
Every other shared skill that references the pre-push validation command uses the seam path .agents/bin/validate (see skills/verify/SKILL.md, skills/run-ci/SKILL.md, skills/tdd/SKILL.md, skills/pr-batch/SKILL.md:262, skills/replicate-ci/SKILL.md), per AGENTS.md's "Editing Rules" ("Do not hardcode consumer repo commands ... paths in shared skills"). This line says bare bin/validate, which is this framework repo's own root-level dev command and won't exist at that path once task-observer is installed into a consumer repo.
| 5. Run the relevant helper tests and `bin/validate` before publishing changes. | |
| 5. Run the relevant helper tests and `.agents/bin/validate` before publishing changes. |
| "pr_state" => pull_request["state"], | ||
| "review_targets" => [ | ||
| branch_target(base), | ||
| local_target("Review staged, unstaged, and untracked local work.") |
There was a problem hiding this comment.
This reason string is hardcoded to "Review staged, unstaged, and untracked local work." even when facts[:untracked_only] is true and only untracked files are dirty. The sibling local_dirty_result (line 98) correctly branches on untracked_only for the analogous single-target case, but this combined branch-plus-dirty path never checks it — minor, but the advisory text shown to the operator/agent is misleading about what actually changed locally.
Review SummaryReviewed this PR's ~2.5k line diff across the new Security (task-observer)The new
None of these three cases are covered by Portability (AGENTS.md "Editing Rules")
Robustness
Lower-priority / not inline
Verified clean
|
Review summaryReviewed this batch (docs/solutions library + validators, canonical readiness vocabulary, Portability / AGENTS.md compliance: clean. No consumer-repo commands, labels, branches, or paths hardcoded in the new/changed Shell/Ruby helper safety:
None of these are exploitable for RCE/SSRF (the tool only writes text to a local file, it doesn't fetch URLs), but they weaken the "never store secrets" guarantee the skill explicitly promises, so worth a look before broader adoption. Given this is documented as a best-effort filter, another option is just tightening the SKILL.md language so operators don't over-trust it as a full secret scanner. Everything else — the review-finding schema, solution-doc validators, and readiness-vocabulary wiring across |
| /\b(?:sk|rk)_(?:live|test)_[A-Za-z0-9]{16,}\b/, | ||
| /\bxox[abprs]-[A-Za-z0-9-]{10,}\b/, | ||
| /\beyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b/, | ||
| /\b(?:customer|patient|payment|cardholder|diagnosis|prescription|medical record|health data)\b/i, |
There was a problem hiding this comment.
Bug: bare-keyword regex causes false-positive rejections of ordinary engineering observations.
This pattern rejects any summary/source/skill value that merely contains the standalone word "customer", "patient", "payment", "cardholder", "diagnosis", "prescription", "medical record", or "health data" — regardless of whether actual regulated data is present. Since check_privacy! raises on any match (task-observer lines 196-200), this means a perfectly sanitized, useful observation like:
--summary "Payment webhook retries should use an idempotency key to avoid duplicate charges."
or
--summary "Customer-facing error messages should not leak stack traces."
is unconditionally rejected with "observation appears to contain sensitive material," even though no PII/regulated data is present — these are just common engineering topic words. This directly undercuts the skill's stated purpose (skills/task-observer/SKILL.md: capture reusable skill-improvement observations), since any lesson about payment/customer/health-adjacent features can never be recorded.
Consider requiring these keywords to co-occur with an actual data shape (e.g., a number, name, or key: value pattern) rather than triggering on the bare topic word, or moving this class of check to reviewer guidance (SKILL.md "Privacy Rules") rather than a hard raise.
| /(?:^|[.-])internal(?:[.-]|$)/i, | ||
| /(?:^|[.-])corp(?:[.-]|$)/i, |
There was a problem hiding this comment.
internal/corp host patterns require an adjacent ./- separator, so they miss plausible real hostnames.
/(?:^|[.-])internal(?:[.-]|$)/i and /(?:^|[.-])corp(?:[.-]|$)/i don't match hosts like internalapi.example.com or corpwiki.example.com — there's no separator between the keyword and the following letters, so these are treated as public hosts and pass private_host? unflagged.
| line = raw_frontmatter.lines.find { |candidate| candidate.match?(/\Adate:/) } | ||
| return nil unless line | ||
|
|
||
| raw = line.sub(/\Adate:\s*/, "").sub(/\s+#.*\z/, "").strip |
There was a problem hiding this comment.
Trailing-YAML-comment stripper is a silent no-op, causing valid dates to be wrongly rejected.
line.sub(/\Adate:\s*/, "").sub(/\s+#.*\z/, "").strip — lines produced by String#lines retain their trailing "\n" (true for every frontmatter field that isn't the last line). Since . doesn't match \n (no /m) and \z requires the absolute end of the string, \s+#.*\z can never match when the line still ends in \n, so the comment-stripping sub never fires.
Concrete trigger: a solution doc with date: 2026-07-02 # note (a perfectly valid unquoted ISO date with an inline comment) leaves raw = "2026-07-02 # note" after .strip only trims the trailing newline — the comment survives, DATE_FORMAT fails to match, and bin/validate-solutions reports "date must be ISO 8601 YYYY-MM-DD" for a valid date.
None of the current docs/solutions/*.md files use inline date comments, so this is latent rather than actively broken today, but the comment-stripping logic itself doesn't work as written.
| link_pack_docs() { | ||
| local docs_target="$1" | ||
| local destination source_path | ||
| mkdir -p "$docs_target" "$docs_target/solutions" |
There was a problem hiding this comment.
link_pack_docs is the only doc-install path that doesn't use the new ensure_real_directory guard.
Every other path this PR added (copy_pack_docs's calls to ensure_real_directory "$docs_target" / "$docs_target/solutions", and the top-level copy-mode block) converts a pre-existing symlink at the target path into a real directory before writing, or refuses with an explicit error for a non-directory. Here, plain mkdir -p "$docs_target" "$docs_target/solutions" is used instead.
If $docs_target (or its solutions subdir) already exists as a symlink to a directory elsewhere — plausible in a dotfiles-managed home, e.g. ln -s ~/dotfiles/codex-docs ~/.codex/docs before ever running the installer — mkdir -p silently succeeds (it resolves through the symlink) and this function then writes review-finding-schema.md/solutions/* symlinks into that other directory instead of ~/.codex/docs, with no warning. Every other guarded path in this script fails loudly ("Refusing to replace...") in the equivalent situation; this one silently writes to the wrong place.
Suggest using ensure_real_directory here too for consistency (note it would need -L-then-mkdir semantics compatible with symlink mode, or an equivalent explicit-refusal check for the symlink-to-directory case specifically, since ensure_real_directory as written would convert it to a real directory, which is not what symlink mode wants either).
Review SummaryReviewed this PR against code quality, correctness, security, performance, portability (per Left 3 inline comments:
Portability check: no hardcoded consumer-repo commands/labels/branches were found in the changed No security-critical (secret leakage, injection, path traversal) issues found in the installer script or the Ruby helpers beyond the false-positive/test-gap items above. |
Review summaryReviewed the new knowledge-contracts/task-observer additions (docs/solutions, review-finding schema, canonical readiness vocabulary, autoreview-target-state, and the task-observer skill) for correctness, security, portability, and shell/Ruby safety. Posted 6 inline comments on concrete, verified issues: task-observer (skills/task-observer/bin/task-observer) — this is the highest-value area to double-check given it's a new skill whose whole job is privacy filtering before persisting data to disk:
bin/validate-solutions — the trailing-YAML-comment stripper in bin/install-agent-workflows — What looked solid:
Minor/lower-priority (not filed inline): Note: sandboxed review agents couldn't execute |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3a4a2e62d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| raise Error, "observation contains an invalid URL; summarize without the URL" if malformed_url_encoding?(text) | ||
| raise Error, "observation appears to contain a private URL (URL credentials)" if url_userinfo?(text) | ||
|
|
||
| URI::DEFAULT_PARSER.extract(text, %w[http https]).each do |url| |
There was a problem hiding this comment.
Scan non-HTTP private URL hosts
The private-host/path/query checks only run for http and https URLs, so a sanitized observation containing a non-HTTP internal endpoint such as ssh://internal.example.test/repo is appended without hitting private_host?. The helper already rejects private HTTP URLs without query strings, so the same privacy guarantee is bypassed for private repository/database-style URLs unless host scanning is made scheme-agnostic before writing the JSONL record.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0d5370a4ff
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ALLOWED_KINDS = %w[correction gap skill-improvement simplification cross-cutting self-improvement].freeze | ||
| MAX_FIELD_LENGTH = 500 | ||
| SENSITIVE_PATTERNS = [ | ||
| /(?:password|passwd|secret|token|api[\s_-]?key|access[\s_-]?key|private[\s_-]?key|session[\s_-]?cookie)\s*[:=]/i, |
There was a problem hiding this comment.
When a summary/source contains a pasted private-key block or prefix such as -----BEGIN OPENSSH PRIVATE KEY-----, the assignment regex on this line does not match because it requires private key to be followed by : or =. Fresh evidence beyond the earlier assignment-form comment is that PEM/OpenSSH key headers are still accepted and appended to observation memory, which violates the skill privacy rule to never store private keys.
Useful? React with 👍 / 👎.
| elsif value.any? { |entry| !entry.is_a?(String) || entry.strip.empty? } | ||
| failures << "#{relative}: #{field} entries must be non-empty strings" | ||
| elsif field == "related_files" | ||
| missing = value.reject { |entry| File.exist?(File.join(root, entry)) } |
There was a problem hiding this comment.
Keep related_files inside the repo
When a solution doc lists related_files: ['../../etc/passwd'], this check follows the traversal via File.join(root, entry) and passes on normal systems because the expanded path exists outside the checkout. That lets bin/validate accept non-portable lesson metadata even though docs/solutions/README.md defines related_files as pack files; normalize and require the path to remain under root before accepting it.
Useful? React with 👍 / 👎.
Summary
docs/solutionsconvention with schema validation and two durable lessons for coordination degradation and untrusted GitHub content.docs/review-finding-schema.mdand adopts it inadversarial-pr-reviewwhile preserving human-readable output.spec,plan-pr-batch,pr-batch, andworkflows/pr-processing.md.autoreview-target-statestate-machine fixtures and an optionaltask-observerskill with attribution, privacy rules, staged observation storage, metadata, and docs inventory links.Batch Evidence
Validation
ruby bin/validate-solutions-test.rbruby skills/pr-batch/bin/goal-completion-contract-test.rbruby skills/autoreview/bin/autoreview-target-state-test.rbruby skills/task-observer/bin/task-observer-test.rbgit diff --check origin/main...HEADbin/validateQA Evidence
qa/knowledge-contractscoverage verified locally for schema/fixture validation, optional task-observer metadata, docs inventory links, and repo-wide validation.rebelytics/one-skill-to-rule-them-allandAllstarGER/one-skill-to-rule-them-allboth report CC BY 4.0 via GitHub metadata.Merge Authority
merge_authority: ask- do not merge without maintainer approval.Summary by CodeRabbit
UNKNOWNhandling rules.