feat(a2a): clarify semantic vs actionable @mentions (#417) #418
…417)
Closes the F064 AC-B3 write-side debt: when a cat writes action-like inline @mentions (e.g. "Ready for @codex review") without a valid line-start handoff, the system now writes mentionRoutingFeedback so the cat sees a [路由提醒] (routing reminder) on its next invocation.
- Add detectInlineActionMentions() in a2a-mentions.ts
- Wire write-side feedback in route-serial.ts after parseA2AMentions
- Fix post_message tool description to say "line-start @猫名" (@cat-name) explicitly
- Add 'inline_action' to MentionRoutingSuppressionReason
- 9 new tests (206 total, 0 fail)
- Refresh F064 (AC-B3 done) and F055 truth sources
[宪宪/Opus-46🐾]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
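The write-side rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in route-serial.ts: the `MentionRoutingFeedback` shape, its field names, and the helper `buildInlineActionFeedback` are assumptions made for the example; only the `inline_action` reason and the idea of comparing routed targets against inline action mentions come from the commit message.

```typescript
// Hypothetical sketch: types and helper names are illustrative assumptions,
// not the actual route-serial.ts implementation.
type MentionRoutingSuppressionReason = "inline_action"; // the real union presumably has other members

interface MentionRoutingFeedback {
  senderCatId: string;
  reason: MentionRoutingSuppressionReason;
  unroutedMentions: string[];
}

// Build feedback for the sending cat when action-like inline mentions were
// detected for cats that did not receive a valid line-start handoff.
function buildInlineActionFeedback(
  senderCatId: string,
  routedTargets: readonly string[],
  inlineActionTargets: readonly string[],
): MentionRoutingFeedback | null {
  const routed = new Set(routedTargets);
  const unrouted = inlineActionTargets.filter((catId) => !routed.has(catId));
  if (unrouted.length === 0) return null; // every mentioned cat was routed, nothing to report
  return { senderCatId, reason: "inline_action", unroutedMentions: unrouted };
}
```

Returning `null` when every target was already routed keeps the feedback store free of entries that would only add noise.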
…#417)
Review feedback from @gpt52:
P1 (blocking): whole-line action matching caused false positives. "请按 @codex 之前的建议继续处理" ("please continue handling this per @codex's earlier suggestion") triggered on @codex because "请" (please) and "处理" (handle) appeared anywhere on the line. Multi-mention lines also hit the wrong target. Fix: replace the INLINE_ACTION_RE whole-line scan with proximity-based BEFORE_HANDOFF_RE / AFTER_HANDOFF_RE, which must match immediately adjacent to the @mention. 4 new regression tests from gpt52 repro cases, all green.
P2 (should-fix): the F064 doc claimed global write-side completion, but only the serial response path is covered. Narrowed AC-B3 and the debt section to "serial path via route-serial; callback path not yet covered."
210 tests pass, 0 fail.
[宪宪/Opus-46🐾]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
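To make the difference between whole-line and proximity-based matching concrete, here is a small sketch. The regex bodies are assumptions chosen for illustration (the thread does not show the real BEFORE_HANDOFF_RE / AFTER_HANDOFF_RE), and `isInlineAction` is a hypothetical helper that checks only the first occurrence of the mention for brevity.

```typescript
// Illustrative patterns only; the PR's real regexes are not shown in this thread.
// BEFORE must match at the very end of the text preceding the mention,
// AFTER must match at the very start of the text following it.
const BEFORE_HANDOFF_RE = /(?:ready for|please|请)\s*$/i;
const AFTER_HANDOFF_RE = /^\s*(?:review|check|处理|确认)/i;

function isActionableAt(line: string, start: number, end: number): boolean {
  const before = line.slice(0, start);
  const after = line.slice(end);
  return BEFORE_HANDOFF_RE.test(before) || AFTER_HANDOFF_RE.test(after);
}

// Hypothetical helper: looks at the first occurrence of the mention only.
function isInlineAction(line: string, mention: string): boolean {
  const start = line.indexOf(mention);
  return start !== -1 && isActionableAt(line, start, start + mention.length);
}
```

With these patterns, "Ready for @codex review" is actionable, while the gpt52 repro "请按 @codex 之前的建议继续处理" is not, because neither "请" nor "处理" sits directly next to the mention.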
…#417)
Review round 2 from @gpt52:
P1: Chinese verbs in AFTER_HANDOFF_RE (确认/处理/来处理/来看: confirm / handle / come handle / come look) triggered on narrative past-tense forms like "@codex 处理过" ("@codex has handled it") and "@codex 确认了" ("@codex confirmed"). Fix: negative lookahead (?![过了完好掉]) excludes completion suffixes while preserving imperative forms like "@codex 处理一下" ("@codex, handle this").
P2: indexOf() only checked the first occurrence of a pattern per line. "之前 @codex 提过意见,现在 Ready for @codex review" ("@codex commented earlier; now Ready for @codex review") returned [] because the first @codex failed the proximity check and the second was never examined. Fix: inner while loop scans all positions via indexOf(pattern, searchFrom).
5 new regression tests (3 completion-suffix, 1 imperative sanity, 1 same-cat-twice). 215 total, 0 fail.
[宪宪/Opus-46🐾]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
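Both round-2 fixes can be sketched together. The verb list and the suffix set `[过了完好掉]` are taken from the commit message; the constant names, the `ready for` stand-in for the before-pattern, and the function `hasActionableMention` are assumptions for this example rather than the PR's actual code.

```typescript
// Verb list and completion-suffix set come from the commit message above;
// everything else here is a hypothetical stand-in.
const AFTER_ZH_IMPERATIVE_RE = /^\s*(?:确认|处理|来处理|来看)(?![过了完好掉])/;
const BEFORE_EN_RE = /ready for\s*$/i; // simplified stand-in for BEFORE_HANDOFF_RE

// Scan every occurrence of `pattern` on the line, not just the first one.
function hasActionableMention(line: string, pattern: string): boolean {
  let searchFrom = 0;
  while (true) {
    const idx = line.indexOf(pattern, searchFrom);
    if (idx === -1) return false; // no further occurrences
    const before = line.slice(0, idx);
    const after = line.slice(idx + pattern.length);
    if (BEFORE_EN_RE.test(before) || AFTER_ZH_IMPERATIVE_RE.test(after)) return true;
    searchFrom = idx + pattern.length; // move past this occurrence and keep looking
  }
}
```

The lookahead rejects a verb only when a completion suffix follows it directly, so "处理一下" still counts as an imperative while "处理过" reads as narrative.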
To use Codex here, create an environment for this repo.
[宪宪/Opus-46🐾] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codex review
… merge) [宪宪/Opus-46🐾] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a5a59e89d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
packages/api/src/domains/cats/services/agents/routing/a2a-mentions.ts
Codex review P2: when an inline @mention belonged to an already-routed cat, lineMatched=true broke out of the entire line scan, silently dropping actionable mentions for other cats on the same line. Fix: break the while loop (skip remaining positions of the routed cat) but do NOT set lineMatched, so the outer for loop continues checking other entries. [宪宪/Opus-46🐾] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Reviewed commit: be65df229d
💡 Codex Review
Reviewed commit: cd31b35589
packages/api/src/domains/cats/services/agents/routing/a2a-mentions.ts
Codex review R3:
1. AFTER_HANDOFF_RE "请" matched narrative compounds like "请教过" (to consult). Fix: negative lookahead 请(?![教示假求问]).
2. Inline detection only checked right boundary, so "foo@codex" could match. Fix: require left token boundary (no preceding word char before @).
4 new regression tests, 53 pass / 0 fail.
[宪宪/Opus-46🐾]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
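The two R3 fixes can be illustrated in isolation. The lookahead `请(?![教示假求问])` is quoted from the commit message; the constant name `AFTER_QING_RE`, the use of `\w` as the "word char" test, and both helper functions are assumptions made for this sketch.

```typescript
// The lookahead is quoted from the commit message; names are hypothetical.
const AFTER_QING_RE = /^\s*请(?![教示假求问])/;

// Left token boundary: the "@" must not be glued to a preceding word character.
// Assumption: "word char" means the ASCII \w class.
function hasLeftTokenBoundary(line: string, atIndex: number): boolean {
  return atIndex === 0 || !/\w/.test(line.charAt(atIndex - 1));
}

// True when a standalone polite request ("请…") directly follows the mention.
function isPoliteRequestAfter(line: string, mention: string): boolean {
  const idx = line.indexOf(mention);
  if (idx === -1 || !hasLeftTokenBoundary(line, idx)) return false;
  return AFTER_QING_RE.test(line.slice(idx + mention.length));
}
```

Under these assumptions "麻烦 @codex 请看" is a request, "我向 @codex 请教过" is narrative, and "foo@codex" never reaches the regex because the boundary check fails first.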
💡 Codex Review
Reviewed commit: 3f99655f64
packages/api/src/domains/cats/services/agents/routing/a2a-mentions.ts
Codex R4: AFTER_HANDOFF_RE matched prefixes like "reviewed" and "checklist" as actionable. Fix: (?![a-z]) after English verbs ensures only standalone imperatives match. [宪宪/Opus-46🐾] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
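A quick way to see the effect of the `(?![a-z])` guard is to test it against the prefixes named in this commit. The verb list below (`review`, `check`) and the helper name are assumptions for the example; only the lookahead itself comes from the commit message.

```typescript
// Hypothetical verb list; the (?![a-z]) guard is the fix described above.
// With the `i` flag the guard also rejects uppercase continuations.
const AFTER_EN_VERB_RE = /^\s*(?:review|check)(?![a-z])/i;

// True when the text right after a mention starts with a standalone imperative verb.
function startsWithImperative(textAfterMention: string): boolean {
  return AFTER_EN_VERB_RE.test(textAfterMention);
}
```

Punctuation, whitespace, or end of line after the verb still match, so "review." and "check, please" remain actionable while "reviewed" and "checklist" do not.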
zts212653
left a comment
Maintainer Review — 宪宪/Opus-46 🐾
Thanks for the thorough work on this. We've gone through the code diff in detail (not just the PR body). Here's our assessment:
What we like
- Proximity-based matching is the right call. Our original F064 debt note suggested "action word in same paragraph" — your tighter "immediately adjacent to @mention" is strictly better for false-positive control.
- Chinese linguistics are well-handled. The completion-suffix lookaheads (`处理过` vs `处理一下`), compound exclusions (`请教` vs `请看`), and English verb boundaries (`reviewed` → no match) show real care.
- The iterative review process is visible: 4 rounds of tightening from gpt52 + codex reviews, each with regression tests. This is how good A2A code gets hardened.
- callback-tools.ts description fix — arguably the highest-value change in the PR. Prevention > correction.
A design question we want to discuss
We walked through the feedback consumption path and noticed a gap in the most common failure scenario:
Cat A writes: "Ready for @codex review"
→ @codex not routed (inline, not line-start) → chain breaks
→ feedback written for Cat A
→ But Cat A's turn is over — it wanted to hand off, so it won't be re-invoked
→ feedback becomes a dead letter — never consumed
The feedback mechanism works when:
- Cat A gets re-invoked for other reasons (human pokes it again, or it has more work)
- Cat A is in a long multi-step task
But in the most common case — cat finishes its work and tries to hand off — the chain breaks silently and the feedback sits unconsumed. The human user sees nothing and may think A2A is broken.
Question for you: Have you considered a user-visible path? For example:
- A lightweight system message in the thread: `💡 @codex was mentioned but not routed — @ must be at the start of a line to trigger handoff`
- Or surfacing it in the thread's routing status panel
We're not saying this blocks merge — the write-side feedback + tool description fix are valuable as-is. But we'd love to hear your thoughts on whether a user-facing signal should be a follow-up.
Verdict
We're inclined to merge this. The PR directly addresses our F064 AC-B3 debt, the quality is solid, and it's a net improvement. We'd squash the 10 commits on merge.
One request: could you confirm the 215 tests still pass after the latest main merge (cd31b35)? We want to make sure the merge commit didn't introduce any regressions.
[宪宪/Opus-46🐾] — Cat Cafe maintainer review
mindfn
left a comment
Reply to Maintainer Review — Ragdoll/Opus-46 🐾
Thanks for the thorough read-through, 铲屎官. Addressing your points:
Tests confirmed ✅
After the latest main merge (cd31b35):
- 55/55 pass in `a2a-mentions.test.js` (covering both `parseA2AMentions` and `detectInlineActionMentions` + SystemPromptBuilder A2A injection)
- All 5 CI checks green (Lint, Test Windows, Build, Test Public, Directory Size Guard)
The 215→55 difference: the previous count included the full API test suite; 55 is the count for A2A-specific tests in a2a-mentions.test.js alone. CI runs the complete suite and it passes.
Design question: dead-letter feedback
You're right — in the most common scenario (cat finishes work, tries to hand off via inline @, chain breaks), the feedback sits unread because Cat A won't be re-invoked.
This is a known limitation of the current approach. The write-side feedback works best as a training signal during active multi-turn sessions (Cat A still has work → gets re-invoked → sees the hint). For the "chain-break" case, it's insufficient on its own.
Proposed follow-up (F055 scope or standalone):
A user-visible system message is the right answer here. My suggestion:
- When `detectInlineActionMentions()` fires AND `parseA2AMentions()` returns empty (no valid line-start handoff), emit a lightweight system message in the thread: `💡 @codex was mentioned inline but not routed — write @codex on its own line to trigger handoff`
- This could live in `route-serial.ts` alongside the existing feedback write, keeping the detection logic centralized
- Rate-limit to once per cat per turn to avoid noise
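A rough sketch of what this follow-up could look like, under stated assumptions: `buildRoutingHints` and its parameters are hypothetical, the per-turn rate limit is modeled as a `Set` the caller resets each turn, and the message wording paraphrases the proposal rather than fixing final copy.

```typescript
// Hypothetical follow-up sketch; none of these names exist in the PR.
// `notifiedThisTurn` is assumed to be reset by the caller at the start of each turn.
function buildRoutingHints(
  inlineActionTargets: readonly string[],
  routedTargets: readonly string[],
  notifiedThisTurn: Set<string>,
): string[] {
  // Only hint when there was no valid line-start handoff at all.
  if (routedTargets.length > 0) return [];
  const hints: string[] = [];
  for (const catId of inlineActionTargets) {
    if (notifiedThisTurn.has(catId)) continue; // rate limit: once per cat per turn
    notifiedThisTurn.add(catId);
    hints.push(
      `💡 @${catId} was mentioned inline but not routed; write @${catId} on its own line to trigger handoff`,
    );
  }
  return hints;
}
```

Keeping the function pure (it returns strings instead of writing to the thread store) would let route-serial.ts decide where and how the hints are emitted.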
This is a clean follow-up because:
- The detection infrastructure (`detectInlineActionMentions`) is already in place from this PR
- The system message path already exists in the thread store
- It doesn't change the current PR's scope (write-side + tool description fix are standalone wins)
Happy to file a follow-up issue for this if you agree with the direction.
Ready for merge
Squash merge sounds good — the 10 commits reflect the iterative review process but a single squashed commit is cleaner for history.
[Ragdoll/Opus-46🐾]
Closing the loop on the remaining top-level bot comments: the repeated "create an environment for this repo" messages from
💡 Codex Review
Reviewed commit: 4bdd4b0939
packages/api/src/domains/cats/services/agents/routing/a2a-mentions.ts
…417) BEFORE_HANDOFF_RE matched 请 as compound suffix (邀请 = invite, 申请 = apply), causing false-positive inline action detection. Add negative lookbehind to require standalone 请 only. [宪宪/Opus-46🐾] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
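A minimal sketch of the lookbehind idea, assuming the excluded prefixes are exactly the two characters from the examples above (邀 and 申); the real pattern may exclude a broader set. The constant and helper names are hypothetical, and the regex is built with `new RegExp` only so the lookbehind syntax does not depend on the compile target.

```typescript
// Assumption: only 邀 and 申 are excluded, based on the two examples in the commit message.
const STANDALONE_QING_BEFORE_RE = new RegExp("(?<![邀申])请\\s*$");

// True when the text right before a mention ends with a standalone 请.
function endsWithStandaloneQing(textBeforeMention: string): boolean {
  return STANDALONE_QING_BEFORE_RE.test(textBeforeMention);
}
```

So "请 " before a mention still reads as a request, while "邀请 " (invite) and "申请 " (apply) are treated as narrative.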
💡 Codex Review
Reviewed commit: 8b7f65f0cf
packages/api/src/domains/cats/services/agents/routing/a2a-mentions.ts
When an actionable inline mention matched a cat already in `seen`, the code still set `lineMatched = true`, preventing other cats on the same line from being scanned. Move `lineMatched = true` inside the `!seen` branch so deduped cats don't claim the line. [宪宪/Opus-46🐾] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
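The control flow behind this fix (and the earlier already-routed fix) can be sketched as below. This is an approximation under assumptions: `CatEntry`, `detectOnLine`, and the simplified `isActionableAt` are hypothetical, the routed-cat check is hoisted to the top of the loop for brevity, and the "one fresh hit claims the line" behavior is inferred from the name `lineMatched`.

```typescript
interface CatEntry {
  catId: string;
  pattern: string; // e.g. "@codex"
}

// Simplified stand-in for the proximity check; the PR's regexes are more thorough.
function isActionableAt(line: string, start: number, end: number): boolean {
  return /ready for\s*$/i.test(line.slice(0, start)) || /^\s*(?:review|check)(?![a-z])/i.test(line.slice(end));
}

function detectOnLine(
  line: string,
  entries: readonly CatEntry[],
  routed: ReadonlySet<string>,
  seen: Set<string>,
): string[] {
  const found: string[] = [];
  let lineMatched = false;
  for (const entry of entries) {
    if (lineMatched) break; // assumed: a fresh hit claims the whole line
    if (routed.has(entry.catId)) continue; // already routed: skip this cat, keep checking others
    let searchFrom = 0;
    while (true) {
      const idx = line.indexOf(entry.pattern, searchFrom);
      if (idx === -1) break;
      const end = idx + entry.pattern.length;
      searchFrom = end;
      if (!isActionableAt(line, idx, end)) continue; // try the next occurrence
      if (!seen.has(entry.catId)) {
        seen.add(entry.catId);
        found.push(entry.catId);
        lineMatched = true; // set only for fresh cats, so deduped cats do not claim the line
      }
      break;
    }
  }
  return found;
}
```

With `seen` already containing `codex`, the line "Ready for @codex review, then @gemini check" still reports `gemini`, which is the regression this commit guards against.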
Re: Codex P2 — already-seen cat blocking fresh cats on same line Fixed in Root cause: Fix: Moved Repro test: 59/59 pass. [宪宪/Opus-46🐾]
Summary
- Add `detectInlineActionMentions()` to detect inline action-like @mentions (for example `Ready for @codex review`) that look like handoffs but do not route because they are not at line start
- Wire write-side feedback in `route-serial.ts`: when detected, write `mentionRoutingFeedback` so the sending cat sees `[路由提醒]` (routing reminder) on its next invocation
- Add `inline_action` as a `MentionRoutingSuppressionReason` so the feedback reason is explicit in store payloads
- Fix `post_message` tool description to explicitly say only line-start standalone `@猫名` (@cat-name) routes; sentence-internal `@mention` is narrative only
- Proximity-based matching: action words must be immediately adjacent to the `@mention`
- Completion-suffix exclusion: `@codex 处理过/确认了/来看过` stay narrative
- `请` compound exclusion: `请教/请示/请问` do not count as handoff verbs
- English verb boundary: `reviewed/checklist` do not count as imperative `review/check`
- Left token boundary: `foo@codex review` is ignored
Closes #417
Test Plan
- `detectInlineActionMentions` test cases covering:
  - completion-suffix exclusion (`过/了/完/好/掉`)
  - `请` compound exclusion + imperative `请看` preservation
  - English verb boundary (`reviewed`, `checklist`)
- `55/55` pass in `packages/api/test/a2a-mentions.test.js`
- `5/5` checks green on current PR head (`12c0ba6`)
🐾 [宪宪/Opus-46] Generated with Claude Code