ntntlang · larimonious · Feb 28, 2026
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -719,6 +719,9 @@ server 8080 {
 | `NTNT_ENV` | `production`, `prod` | Disables hot-reload for better performance |
 | `NTNT_STRICT` | `1`, `true` | Blocks execution on type errors (runs type checker before `ntnt run`) |
 | `NTNT_ALLOW_PRIVATE_IPS` | `true` | Allows `fetch()` to connect to private/internal IPs (see below) |
+| `NTNT_BLOCKING_THREADS` | integer | `spawn_blocking` thread pool size for per-request interpreters (default: Tokio's default of 512) |
+| `NTNT_REQUEST_TIMEOUT` | integer (seconds) | Max handler execution time before 504 (default: 30) |
+| `NTNT_HOT_RELOAD_INTERVAL_MS` | integer (ms) | File watcher poll interval in dev mode (default: 500) |
 
 ```bash
 # Development (default) - hot-reload enabled
@@ -746,6 +749,63 @@ services:
 
 ⚠️ Only enable this when your app needs to call internal services. Keep disabled in public-facing apps that don't need internal network access.
 
+### Per-Request Interpreter Architecture
+
+ntnt uses a **per-request interpreter** model for HTTP serving. Each incoming HTTP request gets its own fresh `Interpreter` instance running in a `spawn_blocking` task, enabling true parallel request handling across all CPU cores.
+
+**Key implications:**
+
+- **Module-level mutable state is isolated per request.** Each request starts from a snapshot taken when routes are registered at server startup (and retaken on each hot-reload). Mutations to module-level variables in one request are not visible to other requests.
+- **Database connections work correctly.** PostgreSQL, Redis/KV, and SQLite use global static registries — connection handles are integer IDs, not live objects, so they resolve correctly in any interpreter instance.
+- **No migration needed for stateless handlers.** If your handler only reads module-level constants and uses database calls, it works identically.
+
+#### Migrating Module-Level Mutable State
+
+If your code relies on shared mutable state across requests, migrate to Redis:
+
+```ntnt
+// ❌ BROKEN: each request starts from count=0 (isolated snapshot), so this always returns 1
+let mut count = 0
+fn counter(req) {
+    count = count + 1
+    return text(str(count))
+}
+
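+// ✅ CORRECT: use Redis for cross-request state
+// Note: this get-then-set is not atomic, so parallel requests can lose
+// increments; prefer an atomic increment if your KV layer provides one.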
+fn counter(req) {
+    let count = int(kv_get("request_count") ?? "0") + 1
+    kv_set("request_count", str(count))
+    return text(str(count))
+}
+```
+
+The same applies to in-memory session stores — any middleware that writes session data to a module-level map must migrate to Redis-backed sessions using `kv_set`/`kv_get`.
+
+#### Thread Pool Sizing
+
+Size `NTNT_BLOCKING_THREADS` based on your workload:
+
+| Target RPS | Avg Handler Time | Threads Needed |
+|-----------|-----------------|----------------|
+| 1,000 | 10ms | 10 |
+| 5,000 | 10ms | 50 |
+| 10,000 | 10ms | 100 |
+| 30,000 | 5ms | 150 |
+
+Formula: `threads = target_rps × avg_handler_ms / 1000` (Little's law). Round up, and treat the result as a minimum: add headroom for latency spikes.
+
+#### Performance
+
+Interpreter construction benchmarks (release build, criterion):
+
+| Benchmark | Median |
+|-----------|--------|
+| `Interpreter::new()` — full construction + all 23 stdlib modules | **43.9 µs** |
+| `new()` + eval trivial expression | **44.1 µs** |
+| `new()` + define fn + call realistic handler | **53.3 µs** |
+
+At 43.9 µs per construction, one core can build roughly 22,800 interpreters per second (1 s / 43.9 µs), so construction adds about 0.044 ms to each request, which is small next to the 5 to 10 ms handler times assumed in the sizing table above. Static files bypass the interpreter entirely via Axum/tower-http.
+
 ### Response Builder Functions
 
 All response builders are imported from `std/http/server`:

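Review note on the `.github/copilot-instructions.md` hunk above: the thread pool sizing formula and the per-core construction estimate are plain arithmetic, so they can be sanity-checked in a few lines. The following is a minimal Rust sketch, not code from this PR. The helper names `threads_needed` and `constructions_per_sec` are hypothetical, and rounding up to a whole thread is an assumption added here.

```rust
/// Minimum number of blocking threads by Little's law:
/// threads = target_rps × avg_handler_ms / 1000, rounded up (assumption).
fn threads_needed(target_rps: u64, avg_handler_ms: f64) -> u64 {
    (target_rps as f64 * avg_handler_ms / 1000.0).ceil() as u64
}

/// Whole interpreter constructions one core can perform per second,
/// given the construction time in microseconds.
fn constructions_per_sec(construction_us: f64) -> u64 {
    (1_000_000.0 / construction_us).floor() as u64
}

fn main() {
    // The four rows of the sizing table: (target RPS, average handler time in ms).
    for (rps, ms) in [(1_000, 10.0), (5_000, 10.0), (10_000, 10.0), (30_000, 5.0)] {
        println!("{rps} rps at {ms} ms -> {} threads", threads_needed(rps, ms));
    }
    // The benchmarked median for `Interpreter::new()` is 43.9 µs.
    println!("constructions/sec per core: {}", constructions_per_sec(43.9));
}
```

The results are steady-state minimums. Bursty traffic or slow database calls raise the average handler time, so it is reasonable to provision above the computed value.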
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -42,9 +42,9 @@ src/
     ├── crypto.rs        # std/crypto - SHA256, HMAC, UUID, random
     ├── url.rs           # std/url - URL encoding/parsing
     ├── http.rs          # std/http - HTTP client (fetch, download)
-    ├── http_server.rs   # std/http/server - Response builders
-    ├── http_server_async.rs  # Async HTTP server (Axum + Tokio)
-    ├── http_bridge.rs   # Bridge between async server and sync interpreter
+    ├── http_server.rs   # std/http/server - Route registration, SharedState, StoredHandler, response builders
+    ├── http_server_async.rs  # Axum server, per-request execution, hot-reload watcher
+    ├── http_bridge.rs   # Send-safe HTTP types (BridgeRequest/BridgeResponse)
-    ├── http_server_async.rs  # Axum server, per-request execution, hot-reload watcher
-    ├── http_bridge.rs   # Send-safe HTTP types (BridgeRequest/BridgeResponse)
+    ├── http_server_async.rs  # Axum server, per-request execution, hot-reload watcher, BridgeRequest/BridgeResponse
     ├── template.rs      # External template loading
     ├── postgres.rs      # std/db/postgres - PostgreSQL client
     └── concurrent.rs    # std/concurrent - Channels, sleep
@@ -112,34 +112,57 @@ Runtime contract checking:
 
 ## HTTP Server Architecture
 
-The HTTP server uses a bridge pattern to connect async Axum handlers to the synchronous interpreter:
+The HTTP server uses a **per-request interpreter** model for true parallel request handling:
 
 ```
-┌─────────────────────────────────────────────────────────────────┐
-│                     Tokio Async Runtime                         │
-│  ┌─────────┐  ┌─────────┐  ┌─────────┐                         │
-│  │ Task 1  │  │ Task 2  │  │ Task N  │  (async handlers)       │
-│  └────┬────┘  └────┬────┘  └────┬────┘                         │
-│       └────────────┼────────────┘                               │
-│                    │                                            │
-│              ┌─────▼─────┐                                      │
-│              │  Channel  │  (mpsc + oneshot reply)              │
-│              └─────┬─────┘                                      │
-└────────────────────┼────────────────────────────────────────────┘
-                     │
-┌────────────────────▼────────────────────────────────────────────┐
-│                  Interpreter Thread                              │
-│  - Receives requests via channel                                 │
-│  - Finds and calls NTNT handler function                         │
-│  - Sends response back via oneshot channel                       │
-│  - Uses Rc<RefCell<>> (not thread-safe, hence single thread)     │
-└─────────────────────────────────────────────────────────────────┘
+Startup phase:
+  Parse .tnt files → register routes/middleware → snapshot closures into SharedState
+  Wrap SharedState in Arc<RwLock<SharedState>>
+
+Request phase (fully parallel):
+  ┌─────────────────────────────────────────────────────────────────┐
+  │                     Tokio Async Runtime (Axum)                  │
+  │                                                                 │
+  │  Request 1 → route lookup → spawn_blocking → Interpreter → Resp│
+  │  Request 2 → route lookup → spawn_blocking → Interpreter → Resp│
+  │  Request N → route lookup → spawn_blocking → Interpreter → Resp│
+  │                                                                 │
+  │  Static files → Axum/tower-http directly (no interpreter)       │
+  └─────────────────────────────────────────────────────────────────┘
 ```
 
+Each request gets its own `Interpreter` instance with its own `Environment` chain. No locks are held and no channels are used during handler execution, so interpreters do not contend with each other for interpreter state. The interpreter uses `Rc<RefCell<>>` internally (not thread-safe), but this is safe because each instance is confined to a single `spawn_blocking` task.
+
+### SharedState
+
+Route handlers are stored as `StoredHandler` — a `Send`-safe representation that snapshots the handler's closure environment at registration time, converting all `Value::Function` instances to `Value::FlatFunction` (no `Rc`). At request time, `StoredHandler::to_call_value()` reconstitutes a live `Value::Function` with a fresh `Rc<RefCell<Environment>>`.
+
+`SharedState` also carries type context (`structs`, `enums`, `type_aliases`, `trait_definitions`, `trait_implementations`) so per-request interpreters can use user-defined types.
+
+### Hot-Reload
+
+A background async task polls for file changes every `NTNT_HOT_RELOAD_INTERVAL_MS` milliseconds (default: 500). On a change, `rebuild_shared_state()` creates a fresh interpreter, re-parses all `.tnt` files, and atomically swaps the `Arc<RwLock<SharedState>>`. In-flight requests complete with the old state and new requests use the new state, so no requests are dropped.
+
+### Key Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `NTNT_BLOCKING_THREADS` | Tokio default (512) | `spawn_blocking` thread pool size |
+| `NTNT_REQUEST_TIMEOUT` | 30s | Max handler execution time (504 on breach) |
+| `NTNT_HOT_RELOAD_INTERVAL_MS` | 500ms | File watcher poll interval (dev mode only) |
+
+### Performance
+
+Interpreter construction: **43.9 µs** (full construction + 23 stdlib modules, release build). This allows roughly 22,800 constructions/sec per core (1 s / 43.9 µs). Static files bypass the interpreter entirely.
+
+### Behavioral Change: Module-Level Mutable State
+
+Module-level mutable state is snapshotted at registration time. Each request starts from that snapshot — mutations in one request are not visible to others. Use `kv_set`/`kv_get` (Redis) for cross-request state.
+
 **Key files:**
-- `http_server_async.rs` - Axum server setup, async handlers, static files
-- `http_bridge.rs` - Request/response types, channel communication
-- `http_server.rs` - Response builders (`json()`, `html()`, etc.)
+- `http_server.rs` - Route registration, `SharedState`, `StoredHandler`, response builders, stdlib HTTP functions
+- `http_server_async.rs` - Axum runner, `execute_request()`, static files, graceful shutdown, hot-reload watcher
+- `http_bridge.rs` - Bridge types (`BridgeRequest`/`BridgeResponse`) for `Send`-safe HTTP representation
-- `http_server_async.rs` - Axum runner, `execute_request()`, static files, graceful shutdown, hot-reload watcher
-- `http_bridge.rs` - Bridge types (`BridgeRequest`/`BridgeResponse`) for `Send`-safe HTTP representation
+- `http_server_async.rs` - Axum runner, `execute_request()`, bridge types (`BridgeRequest`/`BridgeResponse`) for `Send`-safe HTTP representation, static files, graceful shutdown, hot-reload watcher
 
 ## Intent Assertion Language (IAL)
 

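Review note on the `ARCHITECTURE.md` hunk above: the per-request isolation and hot-reload behavior it describes can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions, not the PR's implementation. `Snapshot`, `handle_request`, and `run_requests` are hypothetical names, plain OS threads stand in for `spawn_blocking` tasks, and the `RwLock<Arc<Snapshot>>` layout is one possible way to let in-flight requests keep the old state while new requests see the new one.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the Send-safe state captured at startup.
#[derive(Clone)]
struct Snapshot {
    count: i64,
}

/// Simulates one request: build thread-local mutable state from the
/// current snapshot, mutate it, and return what the handler observes.
fn handle_request(shared: &RwLock<Arc<Snapshot>>) -> i64 {
    // Hold the read lock only long enough to clone the Arc.
    let snapshot: Arc<Snapshot> = {
        let guard = shared.read().unwrap();
        Arc::clone(&*guard)
    };
    // Rc<RefCell<_>> is not Send, so it is created inside the worker thread
    // and never leaves it.
    let env = Rc::new(RefCell::new((*snapshot).clone()));
    env.borrow_mut().count += 1;
    let seen = env.borrow().count;
    seen
}

/// Runs `n` simulated requests in parallel and collects what each one saw.
fn run_requests(shared: &Arc<RwLock<Arc<Snapshot>>>, n: usize) -> Vec<i64> {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let shared = Arc::clone(shared);
            thread::spawn(move || handle_request(&shared))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let shared = Arc::new(RwLock::new(Arc::new(Snapshot { count: 0 })));
    // Every request increments its own copy, so none sees another's mutation.
    println!("before reload: {:?}", run_requests(&shared, 4));
    // Hot-reload analogue: swap in a new snapshot under the write lock.
    *shared.write().unwrap() = Arc::new(Snapshot { count: 100 });
    println!("after reload: {:?}", run_requests(&shared, 4));
}
```

Because each simulated request clones the snapshot before mutating it, the shared value never changes between reloads. This mirrors why the module-level counter in the documentation returns the same value on every request.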
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -739,6 +739,9 @@ server 8080 {
 | `NTNT_ENV` | `production`, `prod` | Disables hot-reload for better performance |
 | `NTNT_STRICT` | `1`, `true` | Blocks execution on type errors (runs type checker before `ntnt run`) |
 | `NTNT_ALLOW_PRIVATE_IPS` | `true` | Allows `fetch()` to connect to private/internal IPs (see below) |
+| `NTNT_BLOCKING_THREADS` | integer | `spawn_blocking` thread pool size for per-request interpreters (default: Tokio's default of 512) |
+| `NTNT_REQUEST_TIMEOUT` | integer (seconds) | Max handler execution time before 504 (default: 30) |
+| `NTNT_HOT_RELOAD_INTERVAL_MS` | integer (ms) | File watcher poll interval in dev mode (default: 500) |
 
 ```bash
 # Development (default) - hot-reload enabled
@@ -766,6 +769,63 @@ services:
 
 ⚠️ Only enable this when your app needs to call internal services. Keep disabled in public-facing apps that don't need internal network access.
 
+### Per-Request Interpreter Architecture
+
+ntnt uses a **per-request interpreter** model for HTTP serving. Each incoming HTTP request gets its own fresh `Interpreter` instance running in a `spawn_blocking` task, enabling true parallel request handling across all CPU cores.
+
+**Key implications:**
+
+- **Module-level mutable state is isolated per request.** Each request starts from a snapshot taken when routes are registered at server startup (and retaken on each hot-reload). Mutations to module-level variables in one request are not visible to other requests.
+- **Database connections work correctly.** PostgreSQL, Redis/KV, and SQLite use global static registries — connection handles are integer IDs, not live objects, so they resolve correctly in any interpreter instance.
+- **No migration needed for stateless handlers.** If your handler only reads module-level constants and uses database calls, it works identically.
+
+#### Migrating Module-Level Mutable State
+
+If your code relies on shared mutable state across requests, migrate to Redis:
+
+```ntnt
+// ❌ BROKEN: each request starts from count=0 (isolated snapshot), so this always returns 1
+let mut count = 0
+fn counter(req) {
+    count = count + 1
+    return text(str(count))
+}
+
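+// ✅ CORRECT: use Redis for cross-request state
+// Note: this get-then-set is not atomic, so parallel requests can lose
+// increments; prefer an atomic increment if your KV layer provides one.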
+fn counter(req) {
+    let count = int(kv_get("request_count") ?? "0") + 1
+    kv_set("request_count", str(count))
+    return text(str(count))
+}
+```
+
+The same applies to in-memory session stores — any middleware that writes session data to a module-level map must migrate to Redis-backed sessions using `kv_set`/`kv_get`.
+
+#### Thread Pool Sizing
+
+Size `NTNT_BLOCKING_THREADS` based on your workload:
+
+| Target RPS | Avg Handler Time | Threads Needed |
+|-----------|-----------------|----------------|
+| 1,000 | 10ms | 10 |
+| 5,000 | 10ms | 50 |
+| 10,000 | 10ms | 100 |
+| 30,000 | 5ms | 150 |
+
+Formula: `threads = target_rps × avg_handler_ms / 1000` (Little's law). Round up, and treat the result as a minimum: add headroom for latency spikes.
+
+#### Performance
+
+Interpreter construction benchmarks (release build, criterion):
+
+| Benchmark | Median |
+|-----------|--------|
+| `Interpreter::new()` — full construction + all 23 stdlib modules | **43.9 µs** |
+| `new()` + eval trivial expression | **44.1 µs** |
+| `new()` + define fn + call realistic handler | **53.3 µs** |
+
+At 43.9 µs per construction, one core can build roughly 22,800 interpreters per second (1 s / 43.9 µs), so construction adds about 0.044 ms to each request, which is small next to the 5 to 10 ms handler times assumed in the sizing table above. Static files bypass the interpreter entirely via Axum/tower-http.
+
 ### Response Builder Functions
 
 All response builders are imported from `std/http/server`: