Problem
Regex-based censors cannot distinguish between discussing a tool pattern and performing it, so they falsely block normal conversation.
Six censors were deactivated on March 31 for this reason:
71dfcc16 — web_fetch + learn_fact chain
93d4c933 — web_fetch + http patterns
a557e770 — raw web content ingestion
0a136e16 — web_fetch + create_censor/record_decision
dd464186 — duplicate email censor
66b7ecbc — procedure discussion false positives
Solution
Move tool-chain enforcement from regex censors to Critic diagnostics. The Critic sees the tool calls actually recorded in ExecutionLedger, so it can tell a performed chain from a discussed one.
New Critic Diagnostics
#7 Raw Content Gate
- Detects: web_fetch result in context + learn_fact call without summarization
- Nudge: Summarize fetched content before storing
- Severity: block
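A minimal sketch of how the Raw Content Gate check might look, assuming the ledger can be read as an ordered list of tool calls. The `ToolCall` shape and the `summarize` tool name are hypothetical stand-ins; the real ExecutionLedger schema and summarization step may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolCall:
    """Hypothetical ledger entry; not the real ExecutionLedger schema."""
    name: str
    args: dict = field(default_factory=dict)


def raw_content_gate(calls: List[ToolCall]) -> Optional[str]:
    """Return a nudge if learn_fact follows web_fetch with no summarization between."""
    unsummarized_fetch = False
    for call in calls:  # calls are assumed to be in execution order
        if call.name == "web_fetch":
            unsummarized_fetch = True
        elif call.name == "summarize":  # assumed name for the summarization step
            unsummarized_fetch = False
        elif call.name == "learn_fact" and unsummarized_fetch:
            return "Summarize fetched content before storing"
    return None
```

Because the check walks recorded calls rather than message text, a conversation that only mentions `web_fetch` and `learn_fact` produces no ledger entries and so no nudge.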
#8 Web-to-Action Chain
- Detects: web_fetch output flowing into send_email, create_censor, bash
- Nudge: Web content must pass through isolation boundary
- Severity: block
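One way the Web-to-Action Chain check could be sketched is as taint propagation over the ledger. This assumes each ledger entry records which earlier calls fed its inputs; the `call_id` and `input_ids` fields and the `isolate` tool name are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Tools named in the diagnostic as actions that must not consume raw web content.
SENSITIVE_TOOLS = {"send_email", "create_censor", "bash"}


@dataclass
class ToolCall:
    """Hypothetical ledger entry with provenance; the real schema may differ."""
    call_id: str
    name: str
    input_ids: List[str] = field(default_factory=list)  # calls whose output feeds this one


def web_to_action_chain(calls: List[ToolCall]) -> Optional[str]:
    """Return a nudge if web_fetch output reaches a sensitive tool without isolation."""
    tainted = set()
    for call in calls:  # calls are assumed to be in execution order
        if call.name == "web_fetch":
            tainted.add(call.call_id)
            continue
        if not any(i in tainted for i in call.input_ids):
            continue  # this call does not consume web content
        if call.name == "isolate":  # assumed name for the isolation boundary
            continue  # its output is treated as clean, so taint stops here
        if call.name in SENSITIVE_TOOLS:
            return "Web content must pass through isolation boundary"
        tainted.add(call.call_id)  # intermediate step: taint carries forward
    return None
```

Propagating taint through intermediate calls means a chain such as fetch, reformat, then `bash` is still caught, which a pairwise check would miss.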
#9 Unverified Claim Storage
- Detects: multiple learn_fact calls with claims not backed by tool results
- Nudge: Verify claims before storing
- Severity: warn
#10 Duplicate Action Guard
- Detects: same tool called with near-identical args within a turn
- Nudge: Duplicate action detected
- Severity: warn
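The Duplicate Action Guard needs some definition of "near-identical args". The sketch below is one possible reading: it compares canonical JSON encodings of the arguments with `difflib.SequenceMatcher`. The 0.9 threshold and the `(tool_name, args)` tuple shape are assumptions, not values from this proposal.

```python
import json
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def duplicate_action_guard(
    calls: List[Tuple[str, dict]], threshold: float = 0.9
) -> Optional[str]:
    """Return a nudge if one tool is called twice in a turn with near-identical args."""
    seen: List[Tuple[str, str]] = []
    for name, args in calls:
        # Sorting keys makes the encoding independent of dict insertion order.
        encoded = json.dumps(args, sort_keys=True)
        for prev_name, prev_encoded in seen:
            if prev_name != name:
                continue
            similarity = SequenceMatcher(None, prev_encoded, encoded).ratio()
            if similarity >= threshold:
                return "Duplicate action detected"
        seen.append((name, encoded))
    return None
```

The comparison is quadratic in the number of calls per turn, which should be acceptable for the small call counts of a single turn.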
Implementation
- Add to CriticAgent._run_diagnostics() in critic.py
- Each diagnostic checks ExecutionLedger for actual tool calls, not text patterns
- Returns DiagnosticResult with nudge injection
- A block-severity result prevents output (same as censor BLOCK); a warn-severity result only injects the nudge
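The flow described above can be sketched as follows. The `DiagnosticResult` fields, the dict-based ledger entries, and the helper names `run_diagnostics` and `finalize_output` are assumptions for illustration; the real definitions in critic.py may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class DiagnosticResult:
    """Assumed shape; the real DiagnosticResult may carry other fields."""
    diagnostic: str
    severity: str  # "block" or "warn"
    nudge: str


Diagnostic = Callable[[List[dict]], Optional[DiagnosticResult]]


def raw_content_gate(ledger: List[dict]) -> Optional[DiagnosticResult]:
    """Simplified example diagnostic: web_fetch followed later by learn_fact."""
    tools = [entry["tool"] for entry in ledger]
    if "web_fetch" in tools and "learn_fact" in tools[tools.index("web_fetch"):]:
        return DiagnosticResult(
            "raw_content_gate", "block", "Summarize fetched content before storing"
        )
    return None


def run_diagnostics(ledger: List[dict], diagnostics: List[Diagnostic]) -> List[DiagnosticResult]:
    """Run every diagnostic against recorded tool calls and collect the hits."""
    return [r for r in (d(ledger) for d in diagnostics) if r is not None]


def finalize_output(draft: str, results: List[DiagnosticResult]) -> Tuple[Optional[str], List[str]]:
    """Suppress the draft if any result is block-severity; always return the nudges."""
    nudges = [r.nudge for r in results]
    blocked = any(r.severity == "block" for r in results)
    return (None if blocked else draft), nudges
```

Note that `finalize_output` never inspects the draft text, so a reply that merely describes a `web_fetch` to `learn_fact` chain passes through untouched when the ledger is empty.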
Benefits
- No false positives from merely discussing a tool pattern, because diagnostics inspect recorded tool calls rather than message text
- Leverages existing ExecutionLedger + DiagnosticResult infrastructure
- Replaces brittle regex with contextual intelligence
- Natural path to Phase 1a DAG enforcement
Depends On
Effort
~6 hours