Skip to content

F024: Migrate tool-chain censors to Critic diagnostics #223

@tfatykhov

Description

@tfatykhov

Problem

Regex-based censors cannot distinguish between discussing a tool pattern vs performing it. This causes false blocks on normal conversation.

Six censors were deactivated on March 31 due to this:

  • 71dfcc16 — web_fetch + learn_fact chain
  • 93d4c933 — web_fetch + http patterns
  • a557e770 — raw web content ingestion
  • 0a136e16 — web_fetch + create_censor/record_decision
  • dd464186 — duplicate email censor
  • 66b7ecbc — procedure discussion false positives

Solution

Move tool-chain enforcement from regex censors to Critic diagnostics. The Critic has full context awareness via ExecutionLedger.

New Critic Diagnostics

#7 Raw Content Gate

  • Detects: web_fetch result in context + learn_fact call without summarization
  • Nudge: Summarize fetched content before storing
  • Severity: block

#8 Web-to-Action Chain

  • Detects: web_fetch output flowing into send_email, create_censor, bash
  • Nudge: Web content must pass through isolation boundary
  • Severity: block

#9 Unverified Claim Storage

  • Detects: multiple learn_fact calls with claims not backed by tool results
  • Nudge: Verify claims before storing
  • Severity: warn

#10 Duplicate Action Guard

  • Detects: same tool called with near-identical args within a turn
  • Nudge: Duplicate action detected
  • Severity: warn

Implementation

  • Add to CriticAgent._run_diagnostics() in critic.py
  • Each diagnostic checks ExecutionLedger for actual tool calls, not text patterns
  • Returns DiagnosticResult with nudge injection
  • Block-severity prevents output (same as censor BLOCK)

Benefits

  • Zero false positives on discussion
  • Leverages existing ExecutionLedger + DiagnosticResult infrastructure
  • Replaces brittle regex with contextual intelligence
  • Natural path to Phase 1a DAG enforcement

Depends On

Effort

~6 hours

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions