diff --git a/docs/features/F014-reasoning-scaffolds.md b/docs/features/F014-reasoning-scaffolds.md
new file mode 100644
index 0000000..5b4153a
--- /dev/null
+++ b/docs/features/F014-reasoning-scaffolds.md
@@ -0,0 +1,444 @@
+# F014 — Frame Reasoning Scaffolds
+
+**Status:** Draft (v4 — approved with required revisions)  
+**Author:** Nous (with Tim)  
+**Created:** 2026-03-05  
+**Revised:** 2026-03-05  
+**Depends on:** F003 (Frames), F002 (Brain/Decisions), F001 (Heart/Memory)  
+**Inspired by:** Paper #11 (Semi-Formal Reasoning Templates, arXiv 2603.01896v1)
+
+---
+
+## Problem
+
+Nous has 7 cognitive frames (task, question, decision, creative, conversation, debug, initiation) that control **tool availability** and **context budgets**, but they provide only generic tool-use nudges. There is no structured reasoning guidance — the agent decides *how* to think about each task ad hoc.
+
+Research (Paper #11, Kostka 2026) shows that semi-formal reasoning templates improve LLM accuracy by up to 30% on code analysis tasks. The key insight: **structured scaffolds constrain the reasoning path without constraining the conclusion**, reducing hallucination, improving consistency, and producing auditable deliberation traces.
+
+Currently:
+- Frame instructions in `runner.py::_get_frame_instructions()` are tool-use nudges, not reasoning scaffolds
+- `deliberation.py` has a basic `_should_deliberate()` gate and `_validate_decision_quality()` but no structured deliberation process
+- `record_decision` captures *what* was decided but not the structured *reasoning process*
+
+## Solution
+
+Add **Reasoning Scaffold Templates** — structured step-by-step thinking patterns that **replace** existing frame instructions (not layer on top). Pilot with Decision frame first, measure, then expand.
+
+### Design Principles (from Paper #11)
+
+1. **Guide process, not conclusions** — scaffolds tell you *what steps to take*, never *what to decide*
+2. **Frame-native** — each scaffold matches its frame's cognitive purpose
+3. **Token-efficient** — 150-300 tokens per scaffold with turn-aware compression
+4. **Auditable** — scaffold steps create traceable deliberation artifacts
+5. **Replace, don't duplicate** — scaffolds REPLACE existing tool nudges, merging tool guidance into scaffold steps rather than layering on top
+
+---
+
+## Phase 1 — Decision Frame Pilot (~2h)
+
+### Rollout Strategy
+
+**Decision frame only** for 2 weeks. Measure before expanding.
+
+Why pilot Decision first:
+- Highest-stakes frame — bad decisions are expensive
+- Most measurable — `decisions` table tracks confidence, reasons, categories, stakes
+- Already has deliberation infrastructure (`_should_deliberate()`, `_validate_decision_quality()`)
+- Richest existing instructions — most to merge/replace
+
+### Success Criteria (2-week measurement window)
+
+**Success (expand to next frame):**
+- Average reason count per decision increases over the pre-pilot baseline
+- Confidence calibration improves (fewer decisions recorded at 0.95+ confidence that are later marked as failures)
+- Scaffold-related token cost per decision turn rises by no more than 15%
+- Qualitative: decision descriptions become more structured without feeling formulaic
+
+**Failure (kill or redesign):**
+- Token cost increase > 25% with no measurable quality improvement
+- Scaffold steps are parroted mechanically without genuine reasoning (cargo cult)
+- Decision recording frequency drops (scaffold overhead discourages recording)
+- Tim reports the agent feels formulaic or robotic in decision conversations
+
+**Pre-pilot baseline task:** Before enabling scaffolds, run a query against the `decisions` table to capture:
+- Average reason count per decision (last 30 days)
+- Confidence distribution histogram (buckets: 0-0.5, 0.5-0.7, 0.7-0.85, 0.85-0.95, 0.95-1.0)
+- Decision-to-failure rate by confidence bucket
+- Average token usage per decision-frame turn
+
+Store as a fact in Heart for comparison at pilot end.
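+
+A minimal sketch of how the first three baseline numbers could be computed once the rows are fetched. The row shape (`reasons`, `confidence`, `failed`) is a hypothetical stand-in, not the real `decisions` schema, and the buckets are assumed to be half-open with 1.0 folded into the last one. Token usage per turn is left out because it depends on how turns are logged.
+
+```python
+from bisect import bisect_right
+
+# Upper-exclusive bucket edges from the list above; the last bucket also includes 1.0.
+EDGES = [0.5, 0.7, 0.85, 0.95]
+LABELS = ["0-0.5", "0.5-0.7", "0.7-0.85", "0.85-0.95", "0.95-1.0"]
+
+
+def bucket_for(confidence: float) -> str:
+    """Map a confidence in [0, 1] to its histogram bucket label."""
+    return LABELS[bisect_right(EDGES, confidence)]
+
+
+def baseline(rows: list) -> dict:
+    """Summarize decisions. Each row is a hypothetical dict with
+    'reasons' (list), 'confidence' (float) and 'failed' (bool)."""
+    counts = {label: 0 for label in LABELS}
+    failures = {label: 0 for label in LABELS}
+    for row in rows:
+        label = bucket_for(row["confidence"])
+        counts[label] += 1
+        failures[label] += int(row["failed"])
+    avg_reasons = sum(len(r["reasons"]) for r in rows) / len(rows) if rows else 0.0
+    # Failure rate is only defined for buckets that contain decisions.
+    failure_rate = {
+        label: failures[label] / counts[label] for label in LABELS if counts[label]
+    }
+    return {"avg_reasons": avg_reasons, "histogram": counts, "failure_rate": failure_rate}
+```
+
+Running the same function again at pilot end gives a like-for-like comparison, provided both runs use the same 30-day window length.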
+
+### File Structure
+
+Create `nous/nous/cognitive/scaffolds.py` — standalone module, imported by `runner.py`.
+
+```python
+# nous/nous/cognitive/scaffolds.py
+"""Reasoning scaffold templates for cognitive frames.
+
+Each scaffold provides structured step-by-step reasoning guidance.
+Scaffolds REPLACE frame tool-nudge instructions (not layered on top).
+Phase 1: Decision frame only. Expand after measurement.
+"""
+
+# Full scaffold — injected on turn 1 of a frame
+DECISION_SCAFFOLD = """## Reasoning Scaffold — Decision Frame
+
+Follow these steps. You may skip steps that don't apply, but don't skip RECALL or CALIBRATE.
+
+1. **CONTEXT** — State what needs to be decided and why now.
+2. **RECALL** — Search memory for similar past decisions. Note what worked and what didn't.
+   → Use `recall_deep` to find relevant prior decisions.
+3. **CONSTRAINTS** — List hard constraints (time, resources, compatibility, censors).
+4. **OPTIONS** — Enumerate viable alternatives. If only one reasonable option exists, state why others were rejected.
+5. **TRADEOFFS** — For each option: pros, cons, risks, reversibility.
+   → Use `web_search` and `web_fetch` to research options if needed.
+6. **EVIDENCE** — What data supports each option? Flag gaps where you're guessing.
+7. **CALIBRATE** — Set confidence honestly. What would change your mind? What's your uncertainty?
+8. **DECIDE** — Record with `record_decision`. Include category, stakes, confidence, and structured reasons.
+
+Do NOT record status reports, routine completions, or greetings as decisions."""
+
+# Compressed reminder — injected on turn 2+ within same frame
+DECISION_SCAFFOLD_SHORT = (
+    "Continue following the Decision scaffold: "
+    "CONTEXT → RECALL → CONSTRAINTS → OPTIONS → TRADEOFFS → EVIDENCE → CALIBRATE → DECIDE. "
+    "Use `record_decision` for real decisions only."
+)
+
+
+import os
+
+# Kill switch — set NOUS_REASONING_SCAFFOLDS=false to disable all scaffolds
+SCAFFOLDS_ENABLED = os.getenv("NOUS_REASONING_SCAFFOLDS", "true").lower() in ("true", "1", "yes")
+
+
+def get_scaffold(frame_id: str, turn_in_frame: int, message_length: int) -> str | None:
+    """Return the appropriate scaffold for a frame and turn.
+
+    Args:
+        frame_id: The active cognitive frame identifier.
+        turn_in_frame: Which turn within this frame (1-indexed).
+        message_length: Character length of the user's message. Unused in
+            Phase 1; reserved for the Phase 1.5 complexity gate.
+
+    Returns:
+        Scaffold string, or None if no scaffold applies.
+    """
+    # Kill switch
+    if not SCAFFOLDS_ENABLED:
+        return None
+
+    # Phase 1: Only Decision frame gets a scaffold
+    if frame_id != "decision":
+        return None
+
+    # Turn-aware compression
+    if turn_in_frame <= 1:
+        return DECISION_SCAFFOLD
+    else:
+        return DECISION_SCAFFOLD_SHORT
+```
+
+### Integration Point — `runner.py::_get_frame_instructions()`
+
+```python
+# In _get_frame_instructions():
+from nous.cognitive.scaffolds import get_scaffold
+
+def _get_frame_instructions(self, turn_context: TurnContext) -> str:
+    frame_id = turn_context.frame.frame_id
+
+    # Check for reasoning scaffold (replaces tool nudges for scaffolded frames)
+    scaffold = get_scaffold(
+        frame_id=frame_id,
+        turn_in_frame=turn_context.turn_in_frame,  # tracked in runner.py turn loop
+        message_length=turn_context.message_length,  # set by runner.py from user message
+    )
+    if scaffold:
+        return scaffold
+
+    # Fallback to existing tool nudges for non-scaffolded frames
+    if frame_id == "task":
+        return "## Tool Instructions\n\n..."  # existing
+    # ... etc
+```
+
+Key: scaffolds **replace** the existing `_get_frame_instructions()` return for their frame. No layering. The tool guidance ("Use `recall_deep`", "Use `record_decision`") is merged INTO the scaffold steps.
+
+### Turn Tracking
+
+Add two fields to `TurnContext` in `nous/nous/cognitive/schemas.py`:
+
+```python
+class TurnContext(BaseModel):
+    # ... existing fields ...
+    turn_in_frame: int = 1       # resets when frame changes
+    message_length: int = 0      # char length of user message (for complexity gate)
+```
+
+**Tracking site: `runner.py` turn loop.** The runner already owns the turn loop and has access to the user message. Before calling `_get_frame_instructions()`:
+- Set `turn_context.message_length = len(user_message)`
+- Track frame changes: if `current_frame != previous_frame`, reset `turn_in_frame` to 1; otherwise increment
+- This avoids passing raw user messages into TurnContext (only the length is needed for the complexity gate)
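+
+The reset-or-increment rule above could be sketched as follows. `FrameTurnTracker` is a hypothetical helper used only for illustration; the real turn loop in `runner.py` may hold this state differently.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class FrameTurnTracker:
+    """Hypothetical helper: counts turns within the current frame."""
+
+    previous_frame: Optional[str] = None
+    turn_in_frame: int = 0
+
+    def advance(self, current_frame: str) -> int:
+        """Return the 1-indexed turn number for this turn."""
+        if current_frame != self.previous_frame:
+            self.turn_in_frame = 1  # frame changed: restart the count
+        else:
+            self.turn_in_frame += 1  # same frame: next turn
+        self.previous_frame = current_frame
+        return self.turn_in_frame
+```
+
+For the frame sequence `decision`, `decision`, `debug`, `decision`, `advance()` returns 1, 2, 1, 1, so returning to a frame after a switch triggers the full scaffold again.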
+
+**Config flag:** Add `NOUS_REASONING_SCAFFOLDS=true` to `.env` (kill switch, follows existing `NOUS_*_ENABLED` pattern).
+
+Token savings from compression:
+- Turn 1: ~280 tokens (full scaffold)
+- Turn 2+: ~45 tokens (compressed)
+- 5-turn decision session: ~460 tokens total vs ~1,400 if full scaffold every turn (67% savings)
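+
+The arithmetic behind these figures, as a small sketch. The token counts are the approximate values quoted above, and `session_tokens` is an illustrative name.
+
+```python
+def session_tokens(turns: int, full: int = 280, short: int = 45) -> int:
+    """Scaffold tokens for one session: full on turn 1, compressed afterwards."""
+    if turns <= 0:
+        return 0
+    return full + (turns - 1) * short
+
+
+compressed = session_tokens(5)   # 280 + 4 * 45
+uncompressed = 5 * 280           # full scaffold on every turn
+savings = 1 - compressed / uncompressed
+print(compressed, uncompressed, f"{savings:.0%}")  # prints: 460 1400 67%
+```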
+
+---
+
+## Phase 1.5 — Expand to Other Frames (after 2-week pilot succeeds)
+
+Only proceed if Decision frame pilot meets success criteria.
+
+### Debug Frame Scaffold
+
+```python
+DEBUG_SCAFFOLD = """## Reasoning Scaffold — Debug Frame
+
+1. **SYMPTOM** — What's the observable problem? Error messages, unexpected behavior, log output.
+2. **REPRODUCE** — Can you trigger it reliably? What are the exact steps?
+3. **ISOLATE** — Narrow the search space. Which component, file, function?
+   → Use `bash` and `read_file` for investigation.
+4. **HYPOTHESIZE** — What could cause this? List 2-3 candidates ranked by likelihood.
+   → Use `recall_deep` to check for similar past bugs.
+5. **TEST** — Design a test for each hypothesis. Run it.
+   → Use `web_search` and `web_fetch` to look up error messages or docs.
+6. **VERIFY** — Confirm the root cause. Don't fix symptoms.
+7. **FIX** — Implement the fix. Explain why it addresses the root cause.
+8. **RECORD** — Store root cause with `learn_fact`. Record meaningful debugging decisions with `record_decision` (root cause identified, fix approach chosen). Do NOT record routine debug steps.
+
+Do NOT record routine status observations as decisions."""
+
+DEBUG_SCAFFOLD_SHORT = (
+    "Continue following the Debug scaffold: "
+    "SYMPTOM → REPRODUCE → ISOLATE → HYPOTHESIZE → TEST → VERIFY → FIX → RECORD. "
+    "Store root causes with `learn_fact`."
+)
+```
+
+### Task Frame Scaffold
+
+```python
+TASK_SCAFFOLD = """## Reasoning Scaffold — Task Frame
+
+1. **UNDERSTAND** — What's being asked? Restate the goal in your own terms.
+2. **PLAN** — Break into steps. Identify dependencies and order.
+   → Use `recall_deep` for relevant past work.
+3. **EXECUTE** — Work through each step. Use all available tools.
+4. **VERIFY** — Check your work. Does the output match the goal?
+5. **REPORT** — Summarize what was done and any follow-ups needed.
+   → Store important outcomes with `learn_fact`."""
+
+TASK_SCAFFOLD_SHORT = (
+    "Continue following the Task scaffold: "
+    "UNDERSTAND → PLAN → EXECUTE → VERIFY → REPORT."
+)
+```
+
+### Question Frame Scaffold
+
+```python
+QUESTION_SCAFFOLD = """## Reasoning Scaffold — Question Frame
+
+1. **RECALL** — Search memory first. What do you already know?
+   → Use `recall_deep` to search for relevant knowledge.
+2. **ASSESS** — Is memory sufficient, or do you need external info?
+   → Use `web_search` and `web_fetch` for current events or topics not in memory.
+3. **SYNTHESIZE** — Combine sources. Flag confidence level and gaps.
+4. **ANSWER** — Respond clearly. Cite sources when possible."""
+
+QUESTION_SCAFFOLD_SHORT = (
+    "Continue following the Question scaffold: "
+    "RECALL → ASSESS → SYNTHESIZE → ANSWER."
+)
+```
+
+### Complexity Gate (Phase 1.5)
+
+Add complexity sensing for lighter frames:
+
+```python
+# Frames that ALWAYS get scaffolds regardless of message length
+_ALWAYS_SCAFFOLD = {"decision", "debug", "task"}
+
+# Frames that skip scaffolds for very short messages
+_COMPLEXITY_GATED = {"question"}
+
+# Frames that NEVER get scaffolds
+_NEVER_SCAFFOLD = {"conversation", "creative", "initiation"}
+
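+# (full, compressed) pair per scaffolded frame. `_SCAFFOLDS` is an illustrative
+# name; the constants are the ones defined earlier in scaffolds.py.
+_SCAFFOLDS = {
+    "decision": (DECISION_SCAFFOLD, DECISION_SCAFFOLD_SHORT),
+    "debug": (DEBUG_SCAFFOLD, DEBUG_SCAFFOLD_SHORT),
+    "task": (TASK_SCAFFOLD, TASK_SCAFFOLD_SHORT),
+    "question": (QUESTION_SCAFFOLD, QUESTION_SCAFFOLD_SHORT),
+}
+
+def get_scaffold(frame_id: str, turn_in_frame: int, message_length: int) -> str | None:
+    # Kill switch applies to every frame, as in Phase 1
+    if not SCAFFOLDS_ENABLED:
+        return None
+
+    if frame_id in _NEVER_SCAFFOLD:
+        return None
+
+    # Very short questions skip the scaffold
+    if frame_id in _COMPLEXITY_GATED and message_length < 30:
+        return None
+
+    if frame_id in _ALWAYS_SCAFFOLD or frame_id in _COMPLEXITY_GATED:
+        full, short = _SCAFFOLDS[frame_id]
+        return full if turn_in_frame <= 1 else short
+
+    # Unknown frame: no scaffold
+    return None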
+```
+
+### Frames NOT scaffolded (rationale)
+
+- **Conversation** — should feel natural, not transactional. Existing minimal nudge is sufficient.
+- **Creative** — scaffolds constrain creative thinking. Keep it open.
+- **Initiation** — has its own guided flow (store_identity, complete_initiation).
+
+---
+
+## Research Subtask Scaffold (separate from frame system)
+
+"Research" is NOT a frame in FRAME_TOOLS. Research runs as subtasks via `spawn_task(frame_type="research")`.
+
+### Injection Point — `build_subtask_prefix()` in `nous/nous/api/tools.py`
+
+```python
+# In nous/nous/api/tools.py, alongside build_subtask_prefix():
+RESEARCH_SUBTASK_SCAFFOLD = (
+    "You are executing a research subtask. Follow this process:\n"
+    "1. SCOPE — Define what you're researching and success criteria.\n"
+    "2. RECALL — Use recall_deep to check what is already known. Don't re-research existing knowledge.\n"
+    "3. SEARCH — Use web_search with multiple query angles for gaps not covered by memory.\n"
+    "4. GATHER — Use web_fetch to read promising sources. Note conflicts between sources and memory.\n"
+    "5. SYNTHESIZE — Combine memory + new findings. Separate facts from inference.\n"
+    "6. DELIVER — Structured summary with key findings, sources, and confidence.\n"
+    "Deliver a clear, complete result. Do not ask questions."
+)
+
+def build_subtask_prefix(task: str, frame_type: str | None = None) -> str:
+    if frame_type == "research":
+        return f"{RESEARCH_SUBTASK_SCAFFOLD}\n\nTask: {task}"
+
+    # existing logic for other frame types
+    base = "You are executing a background subtask.\n..."
+    ...
+```
+
+---
+
+## Phase 2 — DB-Driven Templates (future, ~4h)
+
+**Phase 2 scope is narrower than v2 spec.** No scaffold compliance checking — that creates perverse incentives (performing steps for appearance rather than genuine reasoning). The existing `_validate_decision_quality()` in `deliberation.py` handles structural quality.
+
+Phase 2 is about **extensibility without code changes**:
+
+### Schema
+
+```sql
+CREATE TABLE reasoning_templates (
+    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+    frame_id VARCHAR(30) NOT NULL,              -- 'decision', 'debug', etc.
+    domain VARCHAR(50),                          -- 'architecture', 'security', 'performance', 'integration', 'process', 'memory', 'debugging'
+    name VARCHAR(100) NOT NULL,
+    steps JSONB NOT NULL,                        -- [{"name": "CONTEXT", "instruction": "...", "tool_hint": "recall_deep"}]
+    compressed TEXT NOT NULL,                    -- 1-line reminder version
+    is_default BOOLEAN DEFAULT false,
+    created_at TIMESTAMPTZ DEFAULT now(),
+    updated_at TIMESTAMPTZ DEFAULT now()
+);
+
+-- Concrete domain values (aligned with record_decision categories):
+-- architecture, security, performance, integration, process, memory, debugging
+-- NULL domain = default template for that frame
+```
+
+### Lookup logic
+
+```python
+def get_scaffold(frame_id, turn_in_frame, message_length, domain=None):
+    # 1. Check DB for domain-specific template
+    # 2. Fall back to DB default template (domain=NULL)
+    # 3. Fall back to hardcoded static scaffolds
+    ...
+```
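+
+A minimal sketch of that three-step fallback, assuming the database rows have already been loaded. `DB_TEMPLATES`, `STATIC_SCAFFOLDS`, and `resolve_scaffold` are hypothetical names with placeholder contents, and the kill switch and `message_length` gate are omitted to keep the lookup order visible.
+
+```python
+# Hypothetical in-memory stand-in for reasoning_templates rows:
+# (frame_id, domain) -> (full_text, compressed_text); domain None = frame default.
+DB_TEMPLATES = {
+    ("decision", "security"): ("<security decision scaffold>", "<security reminder>"),
+    ("decision", None): ("<default decision scaffold>", "<default reminder>"),
+}
+
+# Hypothetical stand-in for the hardcoded scaffold constants.
+STATIC_SCAFFOLDS = {
+    "debug": ("<static debug scaffold>", "<static debug reminder>"),
+}
+
+
+def resolve_scaffold(frame_id, turn_in_frame, domain=None):
+    """Try domain-specific DB template, then DB default, then static scaffold."""
+    candidates = []
+    if domain is not None:
+        candidates.append(DB_TEMPLATES.get((frame_id, domain)))
+    candidates.append(DB_TEMPLATES.get((frame_id, None)))
+    candidates.append(STATIC_SCAFFOLDS.get(frame_id))
+    for pair in candidates:
+        if pair is not None:
+            full, short = pair
+            # Same turn-aware compression as the static scaffolds
+            return full if turn_in_frame <= 1 else short
+    return None
+```
+
+A decision in an unlisted domain such as `performance` falls through to the frame default, and a frame with no database rows falls through to the static scaffold.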
+
+### CSTP Extension (future)
+
+When Cognition Engine protocol ships:
+- Templates served to connected agents via CSTP
+- Agent declares frame/domain → receives appropriate scaffold
+- Reasoning traces stored with decision artifacts
+- Domain template library as productizable feature
+
+---
+
+## Affected Files
+
+| File | Change |
+|------|--------|
+| `nous/nous/cognitive/scaffolds.py` | **NEW** — scaffold templates, `get_scaffold()` function, kill switch |
+| `nous/nous/api/runner.py` | Modify `_get_frame_instructions()` to call `get_scaffold()` first; track `turn_in_frame` + `message_length` in turn loop |
+| `nous/nous/api/tools.py` | Modify `build_subtask_prefix()` for research scaffold |
+| `nous/nous/cognitive/schemas.py` | Add `turn_in_frame: int` and `message_length: int` fields to `TurnContext` |
+| `.env` | Add `NOUS_REASONING_SCAFFOLDS=true` config flag |
+| `nous/nous/heart/heart.py` | Phase 2 only — template storage/retrieval |
+
+---
+
+## What This Spec Does NOT Do
+
+- **No scaffold compliance checking** — verifying "did you follow step 3?" creates perverse incentives. Removed per review.
+- **No conversation/creative/initiation scaffolds** — these frames don't benefit from structured reasoning.
+- **No automatic expansion** — each frame gets scaffolds only after the previous frame's pilot shows measurable improvement.
+- **No modifications to `record_decision` schema** — scaffolds improve the reasoning *input*, the decision schema captures the *output* unchanged.
+
+---
+
+## Token Budget
+
+### Phase 1 (Decision only)
+- Turn 1: ~280 tokens (full scaffold)
+- Turn 2+: ~45 tokens (compressed)
+- 5-turn decision session: ~460 tokens total
+- vs no compression: ~1,400 tokens (67% savings)
+- vs current tool nudges (~150 tokens/turn): scaffolds add ~130 tokens on turn 1 and save ~105 tokens on each later turn (~460 vs ~750 over 5 turns)
+
+### Phase 1.5 (All scaffolded frames)
+- Each scaffold: ~120-280 tokens (turn 1), 30-50 tokens (turn 2+)
+- Worst case (decision): +130 tokens on turn 1 vs current nudges
+- Best case (question): ~120 tokens, similar to current nudges
+
+---
+
+## Revision History
+
+**v4 (2026-03-05) — Approved with required revisions (Round 2 review):**
+- **Kill switch re-added:** `NOUS_REASONING_SCAFFOLDS=true` env var with `SCAFFOLDS_ENABLED` gate in `get_scaffold()` (follows `NOUS_*_ENABLED` pattern; caught as regression from v2)
+- **TurnContext fixed:** `turn_context.user_message` reference replaced with `message_length: int` field (`user_message` doesn't exist on TurnContext)
+- **Affected files fixed:** `context.py` → `schemas.py` for TurnContext changes; added `.env` to affected files
+- **Turn tracking clarified:** `runner.py` is the tracking site; both `turn_in_frame` and `message_length` set there before `_get_frame_instructions()` call
+- **Research scaffold:** Added RECALL step (step 2) before SEARCH — check memory before hitting the web
+- **Failure criteria added:** Explicit kill/redesign conditions alongside success metrics
+- **Baseline metrics:** Promoted from open question to concrete pre-pilot task with specific queries
+
+**v3 (2026-03-05) — Post-architecture-review:**
+- **Rollout changed:** Decision frame pilot first (2 weeks), expand only after measurement
+- **Research scaffold:** Moved to `build_subtask_prefix()` in tools.py (not frame system)
+- **Tool nudge merge:** Scaffolds REPLACE `_get_frame_instructions()` output, not layer on top
+- **File path fixed:** `nous/nous/heart/heart.py` (was `nous/memory/heart.py`)
+- **scaffolds.py:** Extracted to `nous/nous/cognitive/scaffolds.py` from day 1
+- **Complexity gate:** Added intra-frame sensing (question frame skips for < 30 char messages)
+- **Phase 2 compliance checking removed:** Creates perverse incentives, existing `_validate_decision_quality()` is sufficient
+- **Success criteria added:** Measurable metrics for pilot evaluation
+
+**v2 (2026-03-05) — Post-review updates:**
+- Dropped conversation and initiation scaffolds
+- Renamed Research Frame → Research Subtask Scaffold
+- Softened OPTIONS step from "at least 2 alternatives" to "enumerate if viable"
+- Added turn-aware scaffold compression
+- Merged Phase 2 into existing Brain validation flow
+- Fixed complexity gate, Phase 3 schema, added run_python interaction section
+
+**v1 (2026-03-05) — Initial draft**
+
+---
+
+## Open Questions
+
+1. Should scaffolds be visible in the agent's response, or purely internal (thinking block only)?
+2. Template inheritance — should subtasks inherit the parent's scaffold or get their own based on frame_type?
+3. How should scaffold compression handle frame switches mid-conversation? (Reset turn count per frame switch — proposed default)