diff --git a/docs/features/F013-frame-splitting.md b/docs/features/F013-frame-splitting.md new file mode 100644 index 0000000..9384902 --- /dev/null +++ b/docs/features/F013-frame-splitting.md @@ -0,0 +1,206 @@ +# F013: Frame Splitting — Parallel Cognitive Frames via Sub-Agents + +## Status: Draft +## Priority: Medium +## Dependencies: F003 (Cognitive Layer), F009 (Async Subtasks) + +## Summary + +Enable Nous to decompose complex multi-faceted tasks into parallel cognitive sub-tasks, each running in a purpose-specific frame with appropriate tools, recall patterns, and working memory scope. Results from parallel frames are synchronized and synthesized into a unified response. + +## Motivation + +### Problem + +Nous currently operates in a single cognitive frame per turn. Complex tasks requiring multiple cognitive modes (research + analysis + writing, or investigation + decision + implementation) are processed serially within one frame. This creates: + +1. **Mode interference** — Research context pollutes decision-making, creative output is constrained by analytical framing +2. **Serial latency** — Multi-step tasks that could run in parallel are bottlenecked +3. **Cognitive dilution** — A single frame can't optimize for multiple task types simultaneously +4. **Underutilized infrastructure** — F009 subtasks exist but lack cognitive frame awareness + +### Solution + +Frame Splitting introduces a **Split → Execute → Synthesize** pattern: +- The parent agent identifies sub-problems and maps each to the most appropriate cognitive frame +- Sub-agents execute in parallel via the existing subtask worker pool, each with frame-specific configuration +- Results are collected at a synchronization barrier and synthesized into a coherent response + +### Minsky Alignment + +Society of Mind, Chapter 18 (Parallel Bundles): Multiple agencies work simultaneously on different aspects of a problem. 
Each agency operates within its own frame of reference, contributing partial solutions that are combined by higher-level processes. + +Chapter 19 (Words and Ideas): The meaning of a concept emerges from the interaction of multiple partial representations — each frame contributes a perspective that no single frame could provide alone. + +## Design + +### Core Concepts + +#### Frame Split +A cognitive operation where the parent agent decomposes a task into N parallel sub-tasks, each assigned to a specific frame type. A Frame Split is: +- **Planned** — The parent agent explicitly decides the decomposition +- **Frame-typed** — Each sub-task carries a target frame (question, decision, creative, task, debug) +- **Scoped** — Each sub-agent receives filtered context, not the full conversation +- **Bounded** — Maximum split width and depth are configurable + +#### Frame-Aware Subtask +An extension of F009 subtasks that carries: +- Explicit frame type assignment (overrides auto-detection) +- Frame-specific tool set (from FRAME_TOOLS) +- Frame-specific system prompt overlay (from _get_frame_instructions) +- Scoped context (working memory subset, relevant recalls) +- Frame-specific recall boost patterns (frame_boost from CognitiveLayer) + +#### Sync Barrier +A coordination point where the parent agent waits for all sub-agents in a split to complete before proceeding. Unlike fire-and-forget subtasks (spawn_task), Frame Splits use a blocking-within-turn pattern with configurable timeout. + +#### Synthesis +The merge step where parallel frame outputs are combined into a coherent response. Three modes: +1. **Inline** — Parent agent receives all results as tool output and synthesizes in the same turn +2. **Agent** — A dedicated synthesis sub-agent evaluates and combines (more expensive, higher quality) +3. 
**Template** — Structured concatenation based on frame type ordering (cheapest, lowest quality) + +### Architecture + +``` +┌─────────────────────────────────────┐ +│ Parent Agent Turn │ +│ │ +│ 1. Identify multi-faceted task │ +│ 2. Plan frame decomposition │ +│ 3. Call split_frames tool │ +└──────────────┬──────────────────────┘ + │ + ┌──────────┼──────────┐ + ▼ ▼ ▼ +┌─────────┐ ┌─────────┐ ┌─────────┐ +│ Question│ │Decision │ │ Task │ +│ Frame │ │ Frame │ │ Frame │ +│ │ │ │ │ │ +│Research │ │Evaluate │ │ Build │ +│& recall │ │& decide │ │& create │ +└────┬────┘ └────┬────┘ └────┬────┘ + │ │ │ + └───────────┼───────────┘ + ▼ + ┌────────────────────────┐ + │ Sync Barrier │ + │ (await all results) │ + └───────────┬────────────┘ + ▼ + ┌────────────────────────┐ + │ Synthesis Step │ + │ (merge results) │ + └───────────┬────────────┘ + ▼ + ┌────────────────────────┐ + │ Response to User │ + └────────────────────────┘ +``` + +### New Tool: split_frames + +```python +def split_frames( + subtasks: list[FrameSplitTask], + synthesis_mode: str = "inline", # inline | agent | template + timeout_seconds: int = 120, +) -> FrameSplitResult +``` + +Each FrameSplitTask: +```python +{ + "task": str, # What this sub-agent should do + "frame_type": str, # question | decision | creative | task | debug + "context_hints": list[str], # Working memory items / keywords to include + "max_turns": int, # Max tool-use loops for this sub-agent (default 5) +} +``` + +### Context Scoping + +Each sub-agent receives: +- The specific sub-task instruction (as user message) +- Core identity (from character config, always included) +- Active censors (always enforced) +- Relevant working memory items (filtered by context_hints) +- Access to shared memory (Brain, Heart) for recall +- **No access** to sibling sub-agent state (isolation) +- **No access** to parent conversation history (clean slate) + +### Result Schema + +```python +@dataclass +class FrameSplitResult: + split_id: str + subtasks: 
list[FrameSplitSubResult] + synthesis: str | None # Populated if synthesis_mode is "agent" or "template" + total_duration_ms: int + +@dataclass +class FrameSplitSubResult: + task_id: str + frame_type: str + task: str + result: str + confidence: float | None + artifacts: list[str] # files created, decisions recorded, facts learned + duration_ms: int + tool_calls_count: int + status: str # completed | failed | timeout +``` + +### Safety & Limits + +| Constraint | Default | Configurable | +|---|---|---| +| Max split width | 5 sub-agents | Yes | +| Max split depth | 1 (no recursive splits) | Future (max 3) | +| Max turns per sub-agent | 5 | Yes, per sub-task | +| Timeout per sub-agent | 60s | Yes | +| Timeout for full split | 120s | Yes | +| Max concurrent sub-agents | 3 | Via worker pool | + +### Memory Coherence + +Sub-agents share read access to Heart/Brain but have isolated working memory: +- **Facts**: Sub-agents can create facts (additive, no conflicts) +- **Decisions**: Sub-agents can record decisions (additive, no conflicts) +- **Episodes**: Each sub-agent creates its own episode +- **Censors**: Sub-agents cannot create censors (parent only) +- **Working memory**: Isolated per sub-agent (no cross-contamination) + +## Integration Points + +### F003 Cognitive Layer +- Frame selection for sub-agents uses fixed assignment (overrides auto-detection) +- Pre/post turn hooks still execute per sub-agent +- Deliberation traces recorded per sub-agent + +### F009 Async Subtasks +- Frame Splits use the existing SubtaskWorkerPool +- New subtask_type: "frame_split" (vs existing "spawn") +- Split_id groups related subtasks for barrier synchronization + +### F006 Event Bus +- New events: frame_split.started, frame_split.completed, frame_split.failed +- Per-sub-agent events: frame_split.subtask.started, frame_split.subtask.completed + +## Open Questions + +1. **Automatic splitting**: Should Nous auto-detect split-worthy tasks, or always require explicit tool call? +2. 
**Cost governance**: Per-turn cost ceiling for splits? Budget parameter? +3. **Streaming**: Can sub-agent progress be streamed to user during execution? +4. **Partial failure**: If one sub-agent fails, retry just that one or abort all? +5. **Dependencies**: Should sub-tasks be able to depend on each other (DAG vs parallel)? +6. **Model selection**: Should sub-agents use cheaper models (Haiku) for simple frames? + +## Risks + +1. **Cost explosion** — 5 parallel sub-agents × 5 turns = 25 API calls per split +2. **Complexity** — Significant architectural addition +3. **Synthesis quality** — Merging parallel outputs coherently is hard +4. **Debugging** — Parallel execution harder to trace than serial +5. **Latency** — Sync barrier + synthesis adds overhead despite parallelism diff --git a/docs/implementation/012.1-frame-splitting.md b/docs/implementation/012.1-frame-splitting.md new file mode 100644 index 0000000..4a00315 --- /dev/null +++ b/docs/implementation/012.1-frame-splitting.md @@ -0,0 +1,582 @@ +# 012.1 Frame Splitting — Parallel Cognitive Frames via Sub-Agents + +**Feature**: F013 Frame Splitting +**Priority**: P2 +**Estimated effort**: 10-13 hours (3 phases) +**Dependencies**: F003 (Cognitive Layer), F009 (Async Subtasks) — both shipped +**Review score**: 7.0/10 composite (3-agent review) + +## Overview + +Add the ability for Nous to decompose complex tasks into parallel cognitive sub-agents, each running in a purpose-specific frame. This builds on the existing subtask worker pool (F009) by adding frame-awareness, a synchronization barrier, and inline result synthesis. + +The implementation follows a **Split → Execute → Synthesize** pattern: +1. Parent agent calls `split_frames` tool with a list of frame-typed sub-tasks +2. Sub-agents execute in parallel via the subtask worker pool, each with frame-specific tools and prompts +3. 
Results are collected at a sync barrier and returned to the parent for inline synthesis + +### Minsky Reference +Society of Mind Ch. 18 (Parallel Bundles): Multiple agencies work simultaneously on different aspects of a problem, each in its own frame of reference. + +## Architecture + +### Data Flow + +``` +User message → Parent agent turn + → Parent calls split_frames([ + {task: "Research X", frame: "question"}, + {task: "Evaluate Y", frame: "decision"}, + {task: "Draft Z", frame: "creative"} + ]) + → Tool handler creates 3 Subtask records (type="frame_split", split_id=UUID) + → Worker pool executes 3 sub-agents in parallel + → Each sub-agent gets: identity + censors + frame tools + frame prompt + task instruction + → Each sub-agent runs up to max_turns tool loops + → Results collected as FrameSplitSubResult + → Sync barrier: asyncio.gather with timeout + → Results returned as tool output to parent + → Parent synthesizes and responds to user +``` + +### Key Design Decisions + +1. **New tool (not extended spawn_task)** — split_frames has blocking/sync semantics that are fundamentally different from fire-and-forget spawn_task. Mixing them would confuse the agent. + +2. **Inline synthesis only (v1)** — Parent agent receives all sub-agent results as tool output and synthesizes in the same turn. No separate synthesis agent. + +3. **No recursive splits (v1)** — Sub-agents cannot call split_frames. Max depth = 1. + +4. **Explicit invocation only** — No auto-detection of split-worthy tasks. The agent must choose to split. + +5. **Frame assignment by parent** — Parent specifies which frame each sub-task runs in. No self-correction by sub-agents. 
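The sync barrier step in the data flow above can be sketched as follows. This is a minimal illustration and not the project's handler: it assumes each sub-agent is represented by an `asyncio.Task`, and the names `sync_barrier`, `demo`, and the shape of the result records are hypothetical. It uses `asyncio.wait`, which leaves unfinished tasks running on timeout, so completed results can be kept and only the stragglers are cancelled and marked as timed out.

```python
import asyncio


async def sync_barrier(tasks: dict[str, asyncio.Task], timeout: float) -> dict[str, dict]:
    """Wait for all sub-agent tasks and return one status record per task id.

    Hypothetical helper: `tasks` maps a task id to an already-started asyncio.Task.
    """
    if not tasks:
        return {}  # asyncio.wait raises ValueError on an empty collection

    # asyncio.wait does not cancel anything on timeout; it only reports
    # which tasks finished (done) and which are still running (pending).
    done, pending = await asyncio.wait(tasks.values(), timeout=timeout)

    # Cancel stragglers and let them unwind before reporting.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    results: dict[str, dict] = {}
    for task_id, task in tasks.items():
        if task in pending:
            results[task_id] = {"status": "timeout", "result": "Sub-agent timed out"}
        elif task.cancelled():
            results[task_id] = {"status": "failed", "result": "Sub-agent was cancelled"}
        elif task.exception() is not None:
            results[task_id] = {"status": "failed", "result": f"Error: {task.exception()}"}
        else:
            results[task_id] = {"status": "completed", "result": task.result()}
    return results


async def demo() -> dict[str, dict]:
    """Run three stand-in sub-agents: one fast, one too slow, one failing."""

    async def agent(delay: float, value: str) -> str:
        await asyncio.sleep(delay)
        return value

    async def broken() -> str:
        raise RuntimeError("boom")

    tasks = {
        "fast": asyncio.create_task(agent(0.01, "research notes")),
        "slow": asyncio.create_task(agent(5.0, "never returned")),
        "bad": asyncio.create_task(broken()),
    }
    return await sync_barrier(tasks, timeout=0.2)
```

Calling `asyncio.run(demo())` yields one record per stand-in sub-agent, each with a `status` of `completed`, `timeout`, or `failed`, which is the distinction the parent needs for inline synthesis.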
+ +## Schema Changes + +### Subtask Model (Alembic migration) + +Add two columns to the `subtasks` table: + +```python +# In nous/storage/models.py - Subtask class +frame_type: Mapped[str | None] = mapped_column(String(32), nullable=True, default=None) +split_id: Mapped[str | None] = mapped_column(String(36), nullable=True, default=None, index=True) +``` + +- `frame_type`: The cognitive frame this subtask runs in (question, decision, creative, task, debug). NULL for regular spawn_task subtasks. +- `split_id`: Groups related subtasks belonging to the same frame split. Indexed for efficient barrier queries. + +Also update `subtask_type` to include "frame_split": +```python +subtask_type: Mapped[str] = mapped_column(String(32), default="spawn") +# Values: "spawn" (existing), "schedule" (existing), "frame_split" (new) +``` + +### New Pydantic Schemas + +```python +# In nous/cognitive/schemas.py + +class FrameSplitTask(BaseModel): + """A single sub-task within a frame split.""" + task: str # What this sub-agent should do + frame_type: str # question | decision | creative | task | debug + context_hints: list[str] = [] # Keywords for filtering working memory + max_turns: int = 5 # Max tool-use loops + +class FrameSplitRequest(BaseModel): + """Request to split work across parallel frames.""" + subtasks: list[FrameSplitTask] + synthesis_mode: str = "inline" # Only "inline" in v1 + timeout_seconds: int = 120 + +class FrameSplitSubResult(BaseModel): + """Result from a single sub-agent.""" + task_id: str + frame_type: str + task: str + result: str + status: str # completed | failed | timeout + duration_ms: int + tool_calls_count: int = 0 + artifacts: list[str] = [] # Files, decisions, facts created + +class FrameSplitResult(BaseModel): + """Aggregated result from a frame split.""" + split_id: str + subtasks: list[FrameSplitSubResult] + total_duration_ms: int + subtasks_completed: int + subtasks_failed: int +``` + +## Implementation Details + +### Phase 1: Frame-Aware Subtask 
Worker (~4-6h) + +#### 1.1 Modify SubtaskWorker prompt building + +Current state (subtask_worker.py): +```python +system_prompt = f"You are Nous, completing a background task.\n\nTask: {task.instruction}" +``` + +New state — build a proper frame-aware prompt: + +```python +# nous/handlers/subtask_worker.py + +async def _build_frame_prompt(self, task: Subtask) -> str: + """Build a frame-aware system prompt for sub-agents.""" + parts = [] + + # 1. Core identity (always include) + identity = await self._load_identity() + parts.append(identity) + + # 2. Active censors (always enforce) + censors = await self._load_censors() + if censors: + parts.append(f"## Active Censors\n\n{censors}") + + # 3. Frame-specific instructions + if task.frame_type: + frame_instructions = self._get_frame_instructions(task.frame_type) + if frame_instructions: + parts.append(frame_instructions) + + # 4. Task instruction + parts.append(f"## Task\n\n{task.instruction}") + + # 5. Scoped working memory (if context_hints provided) + if task.metadata and task.metadata.get("context_hints"): + wm_items = await self._load_scoped_working_memory(task.metadata["context_hints"]) + if wm_items: + parts.append(f"## Relevant Context\n\n{wm_items}") + + return "\n\n".join(parts) +``` + +#### 1.2 Apply frame-specific tool gating + +```python +# In subtask_worker.py - when calling run_turn + +async def _execute_subtask(self, task: Subtask) -> str: + """Execute a subtask with optional frame-specific configuration.""" + system_prompt = await self._build_frame_prompt(task) + + # Determine tool set based on frame + if task.frame_type: + from nous.api.runner import FRAME_TOOLS + allowed_tools = FRAME_TOOLS.get(task.frame_type, []) + else: + allowed_tools = None # Default: all tools (existing behavior) + + # Remove split_frames from sub-agent tools (no recursive splits) + if allowed_tools and "split_frames" in allowed_tools: + allowed_tools = [t for t in allowed_tools if t != "split_frames"] + + result = await 
self.runner.run_turn( + system_prompt=system_prompt, + user_message=task.instruction, + allowed_tools=allowed_tools, + max_turns=task.metadata.get("max_turns", 5) if task.metadata else 5, + ) + return result +``` + +#### 1.3 Alembic migration + +```python +# alembic/versions/xxx_add_frame_split_columns.py + +def upgrade(): + op.add_column('subtasks', sa.Column('frame_type', sa.String(32), nullable=True)) + op.add_column('subtasks', sa.Column('split_id', sa.String(36), nullable=True)) + op.create_index('ix_subtasks_split_id', 'subtasks', ['split_id']) + +def downgrade(): + op.drop_index('ix_subtasks_split_id', 'subtasks') + op.drop_column('subtasks', 'split_id') + op.drop_column('subtasks', 'frame_type') +``` + +### Phase 2: split_frames Tool + Sync Barrier (~3-4h) + +#### 2.1 Tool definition + +```python +# In nous/api/tools.py - add to BUILTIN_TOOLS + +{ + "name": "split_frames", + "description": ( + "Split a complex task into parallel sub-tasks, each running in a specific " + "cognitive frame. Sub-agents execute in parallel and results are returned " + "together. Use when a task has distinct facets that benefit from different " + "cognitive modes (e.g., research + decision + writing)." 
+ ), + "input_schema": { + "type": "object", + "properties": { + "subtasks": { + "type": "array", + "description": "List of sub-tasks to execute in parallel", + "items": { + "type": "object", + "properties": { + "task": { + "type": "string", + "description": "What this sub-agent should do" + }, + "frame_type": { + "type": "string", + "enum": ["question", "decision", "creative", "task", "debug"], + "description": "Cognitive frame for this sub-task" + }, + "context_hints": { + "type": "array", + "items": {"type": "string"}, + "description": "Keywords to filter relevant working memory" + }, + "max_turns": { + "type": "integer", + "default": 5, + "description": "Max tool-use loops for this sub-agent" + } + }, + "required": ["task", "frame_type"] + }, + "minItems": 2, + "maxItems": 5 + }, + "timeout_seconds": { + "type": "integer", + "default": 120, + "description": "Max seconds to wait for all sub-agents" + } + }, + "required": ["subtasks"] + } +} +``` + +#### 2.2 Tool handler + +```python +# In nous/api/tools.py or nous/handlers/frame_split.py + +async def handle_split_frames( + params: dict, + session_id: str, + storage: StorageManager, + worker_pool: SubtaskWorkerPool, +) -> str: + """Handle split_frames tool call.""" + import asyncio + import uuid + from datetime import datetime + + split_id = str(uuid.uuid4()) + timeout = params.get("timeout_seconds", 120) + subtask_specs = params["subtasks"] + + # Validate + if len(subtask_specs) < 2: + return "Error: split_frames requires at least 2 subtasks" + if len(subtask_specs) > 5: + return "Error: split_frames supports at most 5 subtasks" + + # Create subtask records + subtask_ids = [] + for spec in subtask_specs: + task = Subtask( + id=str(uuid.uuid4()), + session_id=session_id, + instruction=spec["task"], + subtask_type="frame_split", + frame_type=spec["frame_type"], + split_id=split_id, + status="pending", + metadata={ + "context_hints": spec.get("context_hints", []), + "max_turns": spec.get("max_turns", 5), + }, + 
created_at=datetime.utcnow(), + ) + await storage.save_subtask(task) + subtask_ids.append(task.id) + + # Submit to worker pool and collect futures + start_time = datetime.utcnow() + futures = [] + for task_id in subtask_ids: + future = await worker_pool.submit(task_id) + futures.append((task_id, future)) + + # Sync barrier: wait for all with timeout + results = [] + try: + completed = await asyncio.wait_for( + asyncio.gather(*[f for _, f in futures], return_exceptions=True), + timeout=timeout, + ) + + for i, (task_id, _) in enumerate(futures): + task = await storage.get_subtask(task_id) + result_value = completed[i] + + # BaseException also covers asyncio.CancelledError, which is not an Exception subclass + if isinstance(result_value, BaseException): + results.append({ + "task_id": task_id, + "frame_type": task.frame_type, + "task": task.instruction, + "result": f"Error: {str(result_value)}", + "status": "failed", + "duration_ms": _elapsed_ms(task.created_at), + }) + else: + results.append({ + "task_id": task_id, + "frame_type": task.frame_type, + "task": task.instruction, + "result": task.result or str(result_value), + "status": "completed", + "duration_ms": _elapsed_ms(task.created_at), + }) + + except asyncio.TimeoutError: + # Collect whatever finished, mark rest as timeout. + # wait_for cancels the gather on timeout, which cancels every unfinished + # future, so a cancelled future must not be counted as completed. + for task_id, future in futures: + task = await storage.get_subtask(task_id) + if future.done() and not future.cancelled(): + results.append({ + "task_id": task_id, + "frame_type": task.frame_type, + "task": task.instruction, + "result": task.result or "Completed", + "status": "completed", + "duration_ms": _elapsed_ms(task.created_at), + }) + else: + future.cancel() + results.append({ + "task_id": task_id, + "frame_type": task.frame_type, + "task": task.instruction, + "result": "Sub-agent timed out", + "status": "timeout", + "duration_ms": timeout * 1000, + }) + + total_ms = _elapsed_ms(start_time) + completed_count = sum(1 for r in results if r["status"] == "completed") + failed_count = len(results) - completed_count + + # Format results for parent agent + output_parts = [ + f"## Frame Split Results 
(split_id: {split_id})", + f"**Duration**: {total_ms}ms | **Completed**: {completed_count}/{len(results)}", + "", + ] + + for r in results: + status_icon = "✅" if r["status"] == "completed" else "❌" if r["status"] == "failed" else "⏰" + output_parts.append(f"### {status_icon} [{r['frame_type'].upper()}] {r['task'][:60]}") + output_parts.append(r["result"]) + output_parts.append("") + + if failed_count > 0: + output_parts.append(f"⚠️ {failed_count} sub-task(s) failed or timed out.") + + output_parts.append("\nSynthesize the above results into a coherent response for the user.") + + return "\n".join(output_parts) +``` + +#### 2.3 Register tool in FRAME_TOOLS + +```python +# In nous/api/runner.py - FRAME_TOOLS + +# Add split_frames to conversation and task frames only +"conversation": [..., "split_frames"], +"task": ["*"], # Already has all tools +``` + +#### 2.4 Prevent recursive splits + +```python +# In subtask_worker.py - when building tool list for sub-agents +# Always exclude split_frames from sub-agent tool sets +# (allowed_tools may be None for regular subtasks, so guard before the membership test) +if allowed_tools and "split_frames" in allowed_tools: + allowed_tools = [t for t in allowed_tools if t != "split_frames"] +# If allowed_tools is ["*"], expand it and exclude split_frames +if allowed_tools == ["*"]: + allowed_tools = [t for t in ALL_TOOLS if t != "split_frames"] +``` + +### Phase 3: Result Enrichment + Measurement (~2-3h) + +#### 3.1 Structured sub-agent output + +Modify the subtask worker to collect execution metadata: + +```python +# In subtask_worker.py +from dataclasses import dataclass + +@dataclass # Generates the keyword __init__ that from_run_turn relies on +class SubtaskExecutionResult: + """Rich result from sub-agent execution.""" + text: str + tool_calls_count: int + duration_ms: int + artifacts: list[str] # Files written, decisions recorded + status: str + + @classmethod + def from_run_turn(cls, result, start_time, tool_log): + return cls( + text=result, + tool_calls_count=len(tool_log), + duration_ms=_elapsed_ms(start_time), + artifacts=_extract_artifacts(tool_log), + status="completed", + ) +``` + +#### 3.2 Cost tracking + +```python +# Track API 
costs per split +class FrameSplitMetrics: + split_id: str + total_api_calls: int + total_input_tokens: int + total_output_tokens: int + total_duration_ms: int + subtask_count: int + completed_count: int + +# Log to event bus +await event_bus.emit("frame_split.completed", metrics.dict()) +``` + +#### 3.3 Status messages + +During long-running splits, emit status to the user: + +```python +# In handle_split_frames, after submitting tasks: +await notify_user( + session_id, + f"⏳ Running {len(subtask_specs)} parallel sub-agents " + f"({', '.join(s['frame_type'] for s in subtask_specs)})...", + ephemeral=True, # Status message, not permanent +) +``` + +## Testing Requirements + +### Unit Tests + +1. **Frame prompt building** — Verify sub-agent prompts include identity, censors, frame instructions, and scoped working memory +2. **Tool gating** — Verify sub-agents get frame-appropriate tools, never split_frames +3. **Schema validation** — Verify FrameSplitTask validation (min 2, max 5, valid frame types) +4. **Timeout handling** — Verify partial results returned on timeout +5. **Result formatting** — Verify tool output is well-structured for parent synthesis + +### Integration Tests + +1. **Parallel execution** — 3 sub-agents run concurrently and return results +2. **Frame isolation** — Sub-agents in different frames get different tools +3. **Memory isolation** — Sub-agents don't see each other's working memory +4. **Censor propagation** — Active censors apply to all sub-agents +5. **Partial failure** — One sub-agent fails, others succeed, results collected correctly + +### E2E Tests + +1. **Full split-execute-synthesize** — Parent splits a task, sub-agents execute, parent synthesizes +2. **Real frame differentiation** — Question frame sub-agent uses recall_deep, decision frame uses record_decision +3. 
**Cost tracking** — Verify metrics are logged per split + +## Acceptance Criteria + +### Phase 1 (Frame-Aware Worker) +- [ ] Subtask model has frame_type and split_id columns +- [ ] Alembic migration runs cleanly +- [ ] Subtask worker builds frame-aware prompts when frame_type is set +- [ ] Sub-agents receive frame-appropriate tools +- [ ] Existing spawn_task subtasks work unchanged (backward compatible) +- [ ] Censors are enforced in sub-agents + +### Phase 2 (split_frames Tool) +- [ ] split_frames tool is registered and available in conversation/task frames +- [ ] Tool creates N parallel subtasks with correct frame_type and split_id +- [ ] Sync barrier waits for all sub-agents with configurable timeout +- [ ] Partial results returned on timeout (completed tasks + timeout markers) +- [ ] Results formatted as structured tool output for parent synthesis +- [ ] Recursive splits prevented (sub-agents cannot call split_frames) +- [ ] Min 2, max 5 subtasks enforced + +### Phase 3 (Enrichment) +- [ ] Sub-agent results include tool_calls_count, duration_ms, artifacts +- [ ] Cost metrics logged per split via event bus +- [ ] Status message sent to user during long-running splits + +## Configuration + +```python +# In nous/config.py or environment variables + +FRAME_SPLIT_MAX_WIDTH = 5 # Max sub-agents per split +FRAME_SPLIT_MAX_DEPTH = 1 # No recursive splits in v1 +FRAME_SPLIT_DEFAULT_TIMEOUT = 120 # Seconds +FRAME_SPLIT_SUBTASK_TIMEOUT = 60 # Per sub-agent timeout +FRAME_SPLIT_MAX_TURNS = 5 # Default max tool loops per sub-agent +FRAME_SPLIT_ALLOWED_FRAMES = [ # Frames that can invoke split_frames + "conversation", "task" +] +``` + +## Dependencies and Risks + +### Dependencies +- F009 SubtaskWorkerPool must be operational (✅ shipped) +- F003 Cognitive Layer frames must be stable (✅ shipped) +- Worker pool must support concurrent execution (✅ uses asyncio) + +### Risks + +1. **Cost explosion** — 5 sub-agents × 5 turns = 25 API calls. 
Mitigation: default max_turns=5, max_width=5, cost tracking. +2. **Timeout cascading** — Sub-agent timeouts + sync barrier timeout + API timeout. Mitigation: conservative defaults (60s sub-agent, 120s barrier). +3. **Synthesis quality** — Parent may struggle to merge heterogeneous outputs. Mitigation: structured output format, synthesis prompt in tool output. +4. **Worker pool contention** — Frame splits compete with regular subtasks for worker pool capacity. Mitigation: worker pool size is configurable. +5. **Backward compatibility** — Must not break existing spawn_task / schedule_task flows. Mitigation: frame_type and split_id are nullable, worker checks before applying frame config. + +## Future Enhancements (Not in v1) + +- **DAG dependencies** — Sub-tasks can depend on each other (sequencing within a split) +- **Agent synthesis mode** — Dedicated synthesis sub-agent for complex merges +- **Recursive splits** — Sub-agents can split further (depth > 1) +- **Model selection** — Use cheaper models (Haiku) for simple frame sub-tasks +- **Streaming progress** — Stream sub-agent progress to user during execution +- **Auto-detection** — Cognitive layer automatically detects split-worthy tasks +- **Frame negotiation** — Sub-agents can request frame change if misassigned +- **Cross-agent messaging** — Sibling sub-agents share intermediate results + +## Files Modified + +### New Files +- `nous/handlers/frame_split.py` — split_frames tool handler +- `nous/cognitive/frame_split_schemas.py` — FrameSplitTask, FrameSplitResult, etc. 
+- `alembic/versions/xxx_add_frame_split_columns.py` — Migration +- `tests/unit/test_frame_split.py` +- `tests/integration/test_frame_split_e2e.py` + +### Modified Files +- `nous/storage/models.py` — Add frame_type, split_id to Subtask +- `nous/handlers/subtask_worker.py` — Frame-aware prompt building, tool gating +- `nous/api/tools.py` — Register split_frames tool +- `nous/api/runner.py` — Add split_frames to FRAME_TOOLS map +- `nous/cognitive/schemas.py` — Add frame split schemas (or separate file) diff --git a/docs/reviews/F013-3-agent-review.md b/docs/reviews/F013-3-agent-review.md new file mode 100644 index 0000000..9f55776 --- /dev/null +++ b/docs/reviews/F013-3-agent-review.md @@ -0,0 +1,196 @@ +# F013 Frame Splitting — 3-Agent Review + +**Date**: 2025-03-03 +**Spec Reviewed**: F013-frame-splitting.md (Draft) +**Review Format**: Three specialized agent perspectives + +--- + +## Agent 1: Architecture Reviewer + +**Focus**: System design, Minsky alignment, existing patterns, scalability + +### Strengths + +1. **Clean Minsky alignment** — Chapter 18 (Parallel Bundles) directly supports this pattern. The idea that multiple agencies work simultaneously on different facets of a problem is core to Society of Mind. Frame Splitting operationalizes this. + +2. **Natural infrastructure extension** — Builds on two proven systems (F003 frames, F009 subtasks) rather than introducing entirely new infrastructure. The SubtaskWorkerPool, FRAME_TOOLS map, and _get_frame_instructions() all exist and can be reused directly. + +3. **Context isolation is correct** — Preventing sub-agents from seeing sibling state avoids the "cognitive contamination" problem. Each frame operates independently, producing cleaner outputs. + +4. **Bounded by default** — Width limits, depth limits, and timeouts prevent runaway splits. Good defensive design. + +### Concerns + +1. **Sync barrier is a paradigm shift** — Current subtasks are fire-and-forget (spawn_task). 
Frame Splits require blocking-within-turn semantics (the parent tool call blocks while sub-agents execute). This is fundamentally different and adds complexity to the runner's tool execution model. The tool execution timeout needs to accommodate multiple sub-agent turns. + +2. **Synthesis is the weakest link** — The spec offers three synthesis modes but doesn't deeply address the hardest problem: how to merge potentially contradictory or overlapping outputs. "Inline" synthesis (parent LLM just gets all results) works but means the parent does another expensive LLM call with potentially large context. This is the "reduce" in map-reduce and deserves more design attention. + +3. **Frame selection fidelity** — The parent agent must correctly assign frame types to sub-tasks. If it assigns "question" to a task that needs "decision" tools, the sub-agent is handicapped. There's no self-correction mechanism. Consider allowing sub-agents to request a frame change. + +4. **Over-engineering risk** — Many tasks that *could* be split don't *need* to be split. Serial processing in a single frame is often adequate and cheaper. Without clear heuristics for when splitting adds value, it may be used unnecessarily. + +5. **Memory write semantics** — The spec says facts and decisions are additive (no conflicts), but semantic conflicts are possible. Two sub-agents could learn contradictory facts or make contradictory decisions from different perspectives. Need a coherence check during synthesis. + +### Score: 7.0 / 10 + +### Recommendations +- Start with inline synthesis only. Agent synthesis and template merge can come later. +- Add a "split plan review" step where the parent agent evaluates its own decomposition before executing (self-check). +- Consider a frame negotiation mechanism where sub-agents can flag frame misfit. +- Implement cost tracking per split to build empirical data on value-vs-cost. 
+ +--- + +## Agent 2: Implementation Reviewer + +**Focus**: Code complexity, existing infrastructure reuse, testing strategy, migration path + +### Strengths + +1. **High code reuse** — The core pieces exist: + - `SubtaskWorkerPool` handles worker lifecycle and concurrency + - `FRAME_TOOLS` maps frame_id → tool list + - `_get_frame_instructions()` generates frame-specific prompts + - `ContextAssembler` builds prompts with identity, censors, working memory + - `AgentRunner.run_turn()` already handles the full cognitive loop + +2. **Clear schema changes** — Adding `frame_type` and `split_id` to the Subtask model is a simple Alembic migration. The FrameSplitResult dataclass maps cleanly to existing patterns. + +3. **Worker modification is contained** — The subtask_worker.py changes are isolated: read frame_type from subtask, apply frame config when building prompt. The worker already calls `AgentRunner.run_turn()` which handles everything downstream. + +4. **Testing strategy is clear** — Unit test frame assignment, integration test parallel execution with mock workers, E2E test a full split-execute-synthesize cycle. + +### Concerns + +1. **Timeout calculus is complex** — We have: per-sub-agent timeout, full split timeout, worker pool limits, and the Anthropic API response timeout (~10 min). These interact: + - If 3 sub-agents each take 60s, the split takes ~60s (parallel) + synthesis overhead + - But if a sub-agent is mid-tool-call when timeout hits, cleanup is messy + - The parent's tool call blocks for the full duration — Anthropic's API must not timeout first + +2. **Context assembly for sub-agents needs refactoring** — The current subtask_worker builds a minimal prompt: + ```python + system_prompt = f"You are Nous, completing a background task.\n\nTask: {task.instruction}" + ``` + For frame-aware sub-agents, we need proper context assembly: identity, censors, frame instructions, scoped working memory. 
This means either: + - Calling ContextAssembler from the worker (tight coupling) + - Building a lighter "sub-agent context builder" (new code) + +3. **Result enrichment** — Current subtask result is a text string. FrameSplitSubResult needs structured data (confidence, artifacts, tool_calls_count, duration). The worker needs to collect this during execution and serialize it. + +4. **Tool registration** — `split_frames` is a new tool that needs to be registered in the builtin_tools registry, added to FRAME_TOOLS for appropriate frames (which frames can split?), and have a handler implemented. The handler is complex: it creates subtasks, submits to pool, awaits results, and formats output. + +5. **Streaming during split** — The spec mentions this as an open question, but users will wonder what's happening during a 60-120s wait. Even a simple "working on 3 parallel tasks..." status would require streaming support changes. + +### Score: 6.5 / 10 + +### Recommendations + +**Phase the implementation:** + +- **Phase 1** (~4-6h): Add frame_type to Subtask model. Modify worker to apply frame config. No new tool yet — test via direct DB/API. +- **Phase 2** (~3-4h): Implement split_frames tool with sync barrier. Basic inline synthesis (results returned as tool output). +- **Phase 3** (~2-3h): Result enrichment (structured FrameSplitSubResult), cost tracking, status messages during execution. + +**Cut from v1:** +- Streaming sub-agent progress +- Recursive splits (depth > 1) +- Agent synthesis mode +- Template synthesis mode +- Model selection per sub-agent +- Cross-agent dependencies (DAG) + +**Total estimated effort: 9-13 hours across 3 phases** + +--- + +## Agent 3: Research Reviewer + +**Focus**: Comparison with literature, best practices, novel contribution + +### Strengths + +1. **AgentOS alignment (2025)** — AgentOS proposes treating agent tasks as OS processes with scheduling, isolation, and resource management.
Frame Splitting maps directly: each sub-agent is a "process" with its own environment (frame), isolated memory (working memory scope), and shared filesystem (Heart/Brain). The sync barrier is analogous to process join/wait. + +2. **SCL validation (Structured Cognitive Layer)** — SCL advocates for modular cognition where specialized modules handle different cognitive functions. Frame types *are* cognitive modules. Frame Splitting enables parallel module activation, which SCL identifies as key for complex reasoning. + +3. **MIRIX memory model match** — MIRIX's multi-agent memory architecture recommends shared long-term memory with isolated working memory. F013's design exactly matches: sub-agents share Heart/Brain (long-term) but have isolated working memory (short-term). MIRIX found this pattern prevents interference while enabling knowledge sharing. + +4. **A-MEM (Agentic Memory)** — A-MEM's memory indexing approach supports the context_hints mechanism. Sub-agents can use targeted recall (semantic search) against shared memory, retrieving only relevant context for their specific sub-task. + +5. **Society of Mind grounding** — Beyond Ch. 18, this connects to Ch. 13 (Reformulation) — breaking a problem into parts handled by different agencies — and Ch. 15 (Diplomats and Compromises) — the synthesis step where frame outputs are negotiated into a coherent response. + +### Concerns + +1. **Map-reduce limitations** — Research on multi-agent task decomposition (Khot et al., 2023; Anthropic, 2024) shows map-reduce patterns work well for independent subtasks but poorly for interdependent ones. 
The spec assumes task independence, but many real tasks have dependencies: + - "Research options, then decide" — decision depends on research + - "Analyze code, then write tests" — tests depend on analysis + - Purely parallel tasks (research A, research B, research C) are less common + + **This limits the practical applicability of v1.** True Frame Splitting may need DAG support eventually. + +2. **Synthesis quality gap** — ACC (Agent Computer Collaboration) research identifies result merging as the hardest challenge in multi-agent collaboration. Key findings: + - Simple concatenation produces incoherent, redundant output + - Agent-based synthesis adds its own error modes (hallucination, information loss) + - Best results come from structured output formats that facilitate merging + - Recommendation: enforce a common result schema for all sub-agents + +3. **Decomposition quality** — The parent agent's ability to decompose tasks correctly is critical but untested. Research on self-decomposition (task planning) shows LLMs tend to: + - Over-decompose simple tasks (waste resources) + - Under-decompose complex tasks (miss aspects) + - Misassign subtask granularity + Consider a decomposition evaluation step before execution. + +4. **Cost-quality empirics** — No existing research establishes when parallel frames outperform serial processing for LLM agents. The spec should include a measurement framework to validate that splits actually improve quality/speed vs. serial processing. + +### Score: 7.5 / 10 + +### Recommendations + +- **Explicit-only for v1** — Always require the agent to call split_frames intentionally. No auto-detection. +- **Structured sub-agent output** — Enforce a common result schema (summary, key_findings, confidence, artifacts) to facilitate synthesis. +- **Decomposition review** — Add a self-check step where the parent evaluates its split plan before executing. 
+- **Measurement framework** — Track metrics per split: total cost, total latency, quality (user satisfaction or self-rated), vs. estimated serial cost/latency. Build empirical data. +- **DAG support in roadmap** — Even if v1 is parallel-only, design the data model to support dependencies later (e.g., a `depends_on` field on FrameSplitTask). + +--- + +## Composite Assessment + +### Scores +- Architecture: **7.0 / 10** +- Implementation: **6.5 / 10** +- Research: **7.5 / 10** +- **Composite: 7.0 / 10** + +### Cross-Cutting Recommendations (Consensus) + +1. **Phase the implementation** — All three reviewers agree: start minimal, iterate. Phase 1 = frame-aware subtasks, Phase 2 = sync barrier + tool with basic inline synthesis, Phase 3 = structured results + measurement. + +2. **Explicit invocation only** — No auto-splitting in v1. The agent must call split_frames intentionally. + +3. **Inline synthesis first** — Parent agent synthesizes results in its turn. Skip agent/template synthesis for v1. + +4. **Cut scope aggressively** — Remove from v1: + - Recursive splits (depth > 1) + - Streaming progress + - Agent synthesis mode + - Model selection per sub-agent + - DAG dependencies + - Auto-detection + +5. **Enforce structured output** — Sub-agents should return results in a common schema to aid synthesis quality. + +6. **Build measurement from day 1** — Track cost, latency, quality per split. This data informs whether and when to expand the feature. + +7. **Address dependency gap** — Design the data model to support task dependencies even if v1 doesn't implement them. Add a `depends_on` field to FrameSplitTask that's ignored in v1 but available for v2. + +### Open Decisions for 012.1 + +1. **New tool vs. extended spawn_task** — Architecture says extend, Implementation says new tool (different semantics). Recommendation: **new tool** — the blocking/sync semantics are fundamentally different from fire-and-forget spawn_task. + +2.
**Which frames can split?** — Should all frames have access to split_frames, or only certain ones (task, conversation)? Recommendation: **task and conversation only** — these are the frames that handle complex, multi-faceted requests. + +3. **Sub-agent episode handling** — Should each sub-agent create a full episode, or should all sub-agents share the parent's episode? Recommendation: **sub-agents create child episodes linked to parent** — maintains audit trail. + +4. **Censor enforcement in sub-agents** — Must sub-agents respect all parent censors? Recommendation: **yes, always** — censors are safety-critical and must propagate.
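
The consensus recommendations and open decisions above can be sketched as a small data model. This is a rough illustration under stated assumptions, not the spec's actual schema: the field sets of `FrameSplitTask` and `FrameSplitSubResult`, the `SPLIT_CAPABLE_FRAMES` constant, the string-ID representation of censors, and the helper names are all hypothetical. It shows a `depends_on` field that v1 would ignore, a common result schema, a gate on which frames may split, censor propagation, and a formatter for inline synthesis.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumption: only these frames may call split_frames (the recommendation above).
SPLIT_CAPABLE_FRAMES = {"task", "conversation"}


@dataclass
class FrameSplitTask:
    instruction: str
    frame_type: str
    # Reserved for future DAG support; a v1 executor would ignore it.
    depends_on: list = field(default_factory=list)


@dataclass
class FrameSplitSubResult:
    """Common result schema every sub-agent returns (hypothetical fields)."""
    frame_type: str
    summary: str
    key_findings: list
    confidence: float
    artifacts: list = field(default_factory=list)
    # Link from the sub-agent's child episode back to the parent episode.
    parent_episode_id: Optional[str] = None


def can_split(parent_frame: str) -> bool:
    """Gate split_frames to the frames allowed to decompose work."""
    return parent_frame in SPLIT_CAPABLE_FRAMES


def effective_censors(parent_censors, requested_censors=()):
    """Sub-agents always inherit every parent censor; they may only add more."""
    return sorted(set(parent_censors) | set(requested_censors))


def format_for_inline_synthesis(results) -> str:
    """Render sub-results as one tool-output string, highest confidence first."""
    sections = []
    for r in sorted(results, key=lambda r: r.confidence, reverse=True):
        findings = "\n".join(f"- {item}" for item in r.key_findings)
        sections.append(
            f"## {r.frame_type} (confidence {r.confidence:.2f})\n"
            f"{r.summary}\n{findings}"
        )
    return "\n\n".join(sections)


if __name__ == "__main__":
    results = [
        FrameSplitSubResult("question", "Three options found.", ["A", "B", "C"], 0.6),
        FrameSplitSubResult("decision", "Option B preferred.", ["lowest cost"], 0.9),
    ]
    print(format_for_inline_synthesis(results))
```

Ordering by confidence is one possible choice for inline synthesis. Contradictory findings across frames would still need the coherence check the architecture review calls for.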