24 changes: 24 additions & 0 deletions guardrails/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
[workspace]

[package]
name = "iii-guardrails"
version = "0.1.0"
edition = "2021"
publish = false

[[bin]]
name = "iii-guardrails"
path = "src/main.rs"

[dependencies]
iii-sdk = { version = "0.10.0", features = ["otel"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros", "sync", "signal"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
serde_yaml = "0.9"
anyhow = "1"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["fmt", "env-filter"] }
clap = { version = "4", features = ["derive"] }
chrono = { version = "0.4", features = ["serde"] }
regex = "1"
78 changes: 78 additions & 0 deletions guardrails/README.md
@@ -0,0 +1,78 @@
# iii-guardrails

Every LLM call should pass through a safety check before and after. iii-guardrails does this with zero LLM overhead — pure regex and keyword matching, all patterns pre-compiled at startup. It detects PII (email, phone, SSN, credit cards, IP addresses), prompt injection attempts (9 keyword patterns), and leaked secrets (API keys, tokens, private keys). Wire it as middleware in front of any function, or call it on-demand from the agent.

**Plug and play:** Build with `cargo build --release`, then run `./target/release/iii-guardrails --url ws://your-engine:49134`. It registers 3 functions with 5 PII patterns and 7 secret patterns compiled from defaults — no config file needed. Call `guardrails::check_input` before processing user input, `guardrails::check_output` before returning responses, or `guardrails::classify` for a lightweight risk score.

## Functions

| Function ID | Description |
|---|---|
| `guardrails::check_input` | Validate input text for PII, injections, and length limits |
| `guardrails::check_output` | Validate output text for PII leakage and secret exposure |
| `guardrails::classify` | Lightweight risk classification without blocking or audit trail |

## iii Primitives Used

- **State** -- audit trail of checks, custom rules (future), aggregate stats (future)
- **PubSub** -- subscribes to `guardrails.check` topic for async input checks
- **HTTP** -- all functions exposed as POST endpoints

## Prerequisites

- Rust 1.75+
- Running iii engine on `ws://127.0.0.1:49134`

## Build

```bash
cargo build --release
```

## Usage

```bash
./target/release/iii-guardrails --url ws://127.0.0.1:49134 --config ./config.yaml
```

```
Options:
      --config <PATH>  Path to config.yaml [default: ./config.yaml]
      --url <URL>      WebSocket URL of the iii engine [default: ws://127.0.0.1:49134]
      --manifest       Output module manifest as JSON and exit
  -h, --help           Print help

## Configuration

```yaml
pii_patterns:
  - name: "email"
    pattern: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
  - name: "phone"
    pattern: "\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b"
  - name: "ssn"
    pattern: "\\b\\d{3}-\\d{2}-\\d{4}\\b"
  - name: "credit_card"
    pattern: "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b"
  - name: "ip_address"
    pattern: "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b"
injection_keywords:
  - "ignore previous instructions"
  - "ignore all instructions"
  - "disregard the above"
  - "you are now"
  - "pretend you are"
  - "act as if"
  - "system prompt"
  - "reveal your instructions"
  - "what are your rules"
max_input_length: 50000    # max input text length before flagging
max_output_length: 100000  # max output text length before flagging
```

## Tests

```bash
cargo test
```
145 changes: 145 additions & 0 deletions guardrails/SPEC.md
@@ -0,0 +1,145 @@
# iii-guardrails

Safety layer worker for the iii engine that checks function I/O for PII, injection attacks, jailbreaks, and content policy violations.

## Architecture

Pure regex + keyword matching. No LLM calls. Designed to be called on every function invocation as middleware.

## Functions

### `guardrails::check_input`
Validates input text before it reaches a function.

**Input:**
```json
{
  "text": "string (required)",
  "context": {
    "function_id": "string (optional)",
    "user_id": "string (optional)"
  }
}
```

**Output:**
```json
{
  "passed": true,
  "risk": "none|low|medium|high",
  "pii": [{ "pattern_name": "email", "count": 1 }],
  "injections": [{ "keyword": "ignore previous instructions", "position": 0 }],
  "over_length": false,
  "check_id": "chk-in-1712345678-42"
}
```
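
The `injections` entries above pair a matched keyword with a position. As a rough sketch of how such a scan could work using only the standard library: the `InjectionHit` struct and `find_injections` function are hypothetical names, and both the case-insensitive matching and the meaning of `position` (byte offset of the first match in the lowercased text) are assumptions, not behavior confirmed by this spec.

```rust
/// Hypothetical result type mirroring one entry of the `injections` array.
#[derive(Debug, PartialEq, Eq)]
pub struct InjectionHit {
    pub keyword: String,
    /// Assumed meaning: byte offset of the first match in the lowercased text.
    pub position: usize,
}

/// Hypothetical helper: reports the first occurrence of each keyword.
/// Matching is case-insensitive here, which is an assumption.
pub fn find_injections(text: &str, keywords: &[&str]) -> Vec<InjectionHit> {
    let haystack = text.to_lowercase();
    keywords
        .iter()
        .filter_map(|kw| {
            haystack
                .find(kw.to_lowercase().as_str())
                .map(|position| InjectionHit {
                    keyword: (*kw).to_string(),
                    position,
                })
        })
        .collect()
}
```

Calling `find_injections(text, &["ignore previous instructions", "you are now"])` returns one hit per keyword that occurs in `text`; an empty vector means no keyword matched.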

### `guardrails::check_output`
Validates output text for PII leakage and secret exposure.

**Input:**
```json
{
  "text": "string (required)",
  "context": {
    "function_id": "string (optional)",
    "user_id": "string (optional)"
  }
}
```

**Output:**
```json
{
  "passed": true,
  "risk": "none|low|medium|high",
  "pii": [{ "pattern_name": "ssn", "count": 1 }],
  "secrets": [{ "pattern_name": "openai_key", "count": 1 }],
  "over_length": false,
  "check_id": "chk-out-1712345678-42"
}
```

### `guardrails::classify`
Lightweight classification without blocking or audit trail.

**Input:**
```json
{
  "text": "string (required)"
}
```

**Output:**
```json
{
  "risk": "none|low|medium|high",
  "categories": ["pii", "injection", "secrets", "over_length"],
  "pii_types": ["email", "phone"],
  "details": {
    "pii_count": 2,
    "injection_count": 0,
    "secret_count": 0,
    "text_length": 150,
    "within_input_limit": true
  }
}
```

## Triggers

| Type | Path/Topic | Function |
|------|-----------|----------|
| HTTP POST | `guardrails/check_input` | `guardrails::check_input` |
| HTTP POST | `guardrails/check_output` | `guardrails::check_output` |
| HTTP POST | `guardrails/classify` | `guardrails::classify` |
| Subscribe | `guardrails.check` | `guardrails::check_input` |

## State Scopes

| Scope | Purpose |
|-------|---------|
| `guardrails:checks` | Audit trail of all checks performed |
| `guardrails:rules` | Custom rules (future: user-defined patterns) |
| `guardrails:stats` | Aggregate stats (future: checks/day, block rate) |

## Risk Classification

| Level | Condition |
|-------|-----------|
| `high` | Any injection keyword detected |
| `medium` | More than 2 PII matches OR over length limit |
| `low` | 1-2 PII matches |
| `none` | Clean |
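
The table above can be read as a top-down decision: the first matching row wins. A minimal sketch of that reading follows; the `Risk` enum and `classify_risk` function are hypothetical names, and because the table does not say how detected secrets affect the level, the sketch leaves them out.

```rust
/// Risk levels from the table, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Risk {
    None,
    Low,
    Medium,
    High,
}

/// Hypothetical helper: maps check results to a risk level,
/// evaluating the table rows from most to least severe.
pub fn classify_risk(injection_count: usize, pii_count: usize, over_length: bool) -> Risk {
    if injection_count > 0 {
        // Any injection keyword detected.
        Risk::High
    } else if pii_count > 2 || over_length {
        // More than 2 PII matches OR over the length limit.
        Risk::Medium
    } else if pii_count > 0 {
        // 1-2 PII matches.
        Risk::Low
    } else {
        Risk::None
    }
}
```

Deriving `Ord` on the enum lets callers compare levels directly, for example to block anything at `Risk::Medium` or above.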

## PII Patterns (default config)

- Email addresses
- US phone numbers
- Social Security Numbers
- Credit card numbers
- IP addresses

## Secret Patterns (hardcoded in check_output)

- Bearer tokens
- OpenAI API keys (`sk-`)
- GitHub PATs (`ghp_`, `ghs_`, `ghr_`)
- AWS access keys (`AKIA`)
- Private key blocks (`-----BEGIN`)
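
To illustrate the shape of this check, here is a deliberately simplified sketch that counts the literal markers listed above with plain substring matching. It is not the worker's implementation: the real patterns are presumably stricter regexes, bare markers such as `sk-` will produce false positives, and every pattern name except `openai_key` (which appears in the `check_output` example) is a hypothetical label.

```rust
/// (pattern name, literal marker) pairs taken from the list above.
/// Names other than "openai_key" are hypothetical.
const SECRET_MARKERS: &[(&str, &str)] = &[
    ("bearer_token", "Bearer "),
    ("openai_key", "sk-"),
    ("github_ghp", "ghp_"),
    ("github_ghs", "ghs_"),
    ("github_ghr", "ghr_"),
    ("aws_access_key", "AKIA"),
    ("private_key", "-----BEGIN"),
];

/// Hypothetical helper: returns (pattern name, match count) for every
/// marker that occurs at least once, in the order of SECRET_MARKERS.
pub fn scan_secrets(text: &str) -> Vec<(&'static str, usize)> {
    SECRET_MARKERS
        .iter()
        .map(|(name, marker)| (*name, text.matches(*marker).count()))
        .filter(|(_, count)| *count > 0)
        .collect()
}
```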

## Configuration

See `config.yaml` for default patterns, keywords, and length limits. All PII regex patterns are compiled once at startup and stored in `Arc` for zero-copy sharing across async handlers.
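
The `Arc` sharing described above can be sketched with the standard library alone. This example uses OS threads instead of async handlers to stay dependency-free, and `CompiledRules` and `count_over_length` are hypothetical names; the point is only that each handler clones the pointer, not the rule data.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the rules built once at startup.
pub struct CompiledRules {
    pub injection_keywords: Vec<String>,
    pub max_input_length: usize,
}

/// Checks each input on its own thread against the same shared rules
/// and returns how many inputs exceed the length limit.
pub fn count_over_length(rules: Arc<CompiledRules>, inputs: Vec<String>) -> usize {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            // Cloning the Arc bumps a reference count; the rules are not copied.
            let rules = Arc::clone(&rules);
            thread::spawn(move || input.len() > rules.max_input_length)
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .filter(|over| *over)
        .count()
}
```

Because the rules are immutable after startup, no lock is needed around them; `Arc` alone is enough for read-only sharing.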

## Running

```bash
cargo run --release -- --url ws://127.0.0.1:49134 --config ./config.yaml
```

## Manifest

```bash
cargo run --release -- --manifest
```
6 changes: 6 additions & 0 deletions guardrails/build.rs
@@ -0,0 +1,6 @@
fn main() {
    println!(
        "cargo:rustc-env=TARGET={}",
        std::env::var("TARGET").unwrap()
    );
}
23 changes: 23 additions & 0 deletions guardrails/config.yaml
@@ -0,0 +1,23 @@
pii_patterns:
  - name: "email"
    pattern: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
  - name: "phone"
    pattern: "\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b"
  - name: "ssn"
    pattern: "\\b\\d{3}-\\d{2}-\\d{4}\\b"
  - name: "credit_card"
    pattern: "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b"
  - name: "ip_address"
    pattern: "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b"
injection_keywords:
  - "ignore previous instructions"
  - "ignore all instructions"
  - "disregard the above"
  - "you are now"
  - "pretend you are"
  - "act as if"
  - "system prompt"
  - "reveal your instructions"
  - "what are your rules"
max_input_length: 50000
max_output_length: 100000