
Architecture

Technical deep dive into how Solace works.


System Overview

Solace is a Docker Compose application with a FastAPI backend, React frontend, and multiple sidecar services for embeddings, speech, and TTS. The backend communicates with an LLM provider (OpenRouter by default) for conversation and uses local services for everything else.

The system is designed around three principles:

  1. Local sovereignty — your data stays on your machine. The only external dependency is the LLM API.
  2. Graceful degradation — every service is optional. If embeddings are down, keyword search works. If TTS is off, text still flows. If the VPS is unreachable, local fallbacks engage.
  3. Extensibility — adding a new tool, TTS provider, or subsystem means writing one file and registering it.
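The graceful-degradation principle can be sketched as a fallback chain. The helper below tries semantic search first and falls back to keyword matching when the embedding call fails; function and field names are illustrative, not the actual Solace API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_search(query, memories):
    """Rank memories by how many query terms they contain."""
    terms = set(query.lower().split())
    scored = [(sum(t in m["content"].lower() for t in terms), m) for m in memories]
    return [m for hits, m in sorted(scored, key=lambda p: -p[0]) if hits > 0]

def search_memories(query, memories, embed=None):
    """Semantic search when the embedding service is up, keyword fallback otherwise."""
    if embed is not None:
        try:
            qv = embed(query)  # may raise if the embedding service is down
            return sorted(memories, key=lambda m: -cosine(qv, m["embedding"]))
        except Exception:
            pass  # degrade gracefully rather than fail the request
    return keyword_search(query, memories)
```

The same pattern (try the rich path, fall back to the simple one) applies to TTS and the VPS services.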

Service Architecture

Core Services (always running)

| Service | Port | Purpose |
|---|---|---|
| backend | 8100 | FastAPI application — all API endpoints, business logic |
| frontend | 80 | React SPA served via nginx (dev: Vite on port 3000) |
| searxng | 8888 | Local meta-search engine for web search tool |
| embedding | 8200 | nomic-embed-text-v1.5, 768-dim embeddings (GPU) |

Optional Services (Docker profiles)

| Service | Port | Profile | Purpose |
|---|---|---|---|
| speech-service | 8900 | (default) | Faster-Whisper STT (GPU) |
| kokoro-tts | 8880 | kokoro | 82M TTS, 67 voices (GPU) |
| orpheus-tts | 5005 | orpheus | 3B expressive TTS with emotion (GPU) |
| moss-tts | 8885 | moss | 1.7B TTS with voice cloning (GPU) |
| qwen3-tts | 8890 | qwen3 | 1.7B text-instructable voice (GPU) |
| pocket-tts | 7870 | pocket-tts | 100M CPU TTS fallback |
| inner-llm | 8301 | inner-life | Qwen3-4B local CPU for inner life |
| local-llm | 8300 | local-llm | 8B GPU LLM for local chat |
| perception | 8950 | perception | Qwen2.5-VL-3B vision (GPU) |

Only one GPU TTS engine runs at a time — start.sh manages switching.

External Services

| Service | Where | Purpose |
|---|---|---|
| OpenRouter | Cloud | LLM inference (Qwen3-235B, DeepSeek, etc.) |
| Ollama | VPS (optional) | Inner life LLM, briefs, council members |
| Guardian | VPS (optional) | Failover chat, PII protection, heartbeat |
| Watchman | VPS (optional) | Background activity monitoring |

Memory Pipeline

Memory is the core of the system. Here's how it flows:

Storage Layers

  1. Core Memory Blocks — Persistent identity/relationship data in core_memories.yaml. Always in context. The companion reads and writes these.
  2. Archival Memories — Individual memory entries in SQLite with 768-dim embeddings. Retrieved by semantic similarity. Importance-weighted, temporally decayed.
  3. Session Summaries — Diary-style summaries written by the companion after each conversation. Injected into context for continuity.
  4. Conversation History — Raw messages stored in SQLite. Token-budgeted for context window.

Extraction Flow

```
User message → LLM response → SSE stream closes
                                      │
                                      ▼
                            Background extraction
                            (async, non-blocking)
                                      │
                                      ▼
                            Extraction model analyzes
                            the conversation turn
                                      │
                              ┌───────┴───────┐
                              ▼               ▼
                        Core block       Archival memory
                        updates          with embedding
                              │               │
                              ▼               ▼
                        core_memories    SQLite + YAML
                        .yaml            journal entry
```
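The fire-and-forget handoff above can be sketched with `asyncio`: after the SSE stream closes, the turn is passed to a background task so the user never waits on extraction. Names here are illustrative; the real step calls a separate extraction model and writes core-block updates plus embeddings.

```python
import asyncio

async def extract_memories(turn: dict, store: list) -> None:
    """Stand-in for the extraction-model call and memory writes."""
    await asyncio.sleep(0)  # simulate async model work
    store.append({"content": turn["user"], "kind": "archival"})

async def finish_turn(turn: dict, store: list) -> None:
    # ...stream the LLM response to the client, close the SSE stream, then:
    asyncio.create_task(extract_memories(turn, store))  # async, non-blocking

async def main() -> list:
    store: list = []
    await finish_turn({"user": "example user message"}, store)
    await asyncio.sleep(0.01)  # give the background task time to run
    return store
```

Because the task is detached, extraction latency never shows up in the chat response time.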

Retrieval Flow

```
User sends message
        │
        ▼
Query embedding generated
        │
        ▼
sqlite-vec KNN search (top N by cosine similarity)
        │
        ▼
Importance weighting applied
        │
        ▼
Temporal decay applied (older = lower score, unless reinforced)
        │
        ▼
Retrieval boost (recently accessed memories score higher)
        │
        ▼
Deduplicated results injected into context
```
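The scoring stages above compose multiplicatively; a minimal sketch, assuming an exponential decay with a 30-day half-life and a linear access-count boost (both values are illustrative, not Solace's actual constants):

```python
import math, time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def score(memory, query_vec, now=None, half_life_days=30.0):
    """Similarity x importance x temporal decay x retrieval boost."""
    now = now if now is not None else time.time()
    sim = cosine(query_vec, memory["embedding"])
    age_days = max(now - memory["created_at"], 0.0) / 86400.0
    decay = 0.5 ** (age_days / half_life_days)          # older = lower score
    boost = 1.0 + 0.1 * memory.get("access_count", 0)   # recently accessed = higher
    return sim * memory["importance"] * decay * boost

def retrieve(query_vec, memories, top_n=5, now=None):
    """Rank, then drop duplicate contents before injection."""
    ranked = sorted(memories, key=lambda m: -score(m, query_vec, now))
    seen, results = set(), []
    for m in ranked:
        if m["content"] not in seen:  # crude dedup by content
            seen.add(m["content"])
            results.append(m)
    return results[:top_n]
```

Reinforcement falls out naturally: each access bumps `access_count`, which offsets the decay term on the next retrieval.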

Memory Deduplication

When new memories are extracted, they're compared against existing memories using cosine similarity:

  • Similarity above 0.85: Merge — keep the richer version, archive the duplicate with an audit trail
  • 0.85 or below: Store as a new memory
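A minimal sketch of that decision, using the 0.85 threshold from the text (field names and the "richer = longer" heuristic are illustrative):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def dedupe_insert(new_mem, existing, threshold=0.85):
    """Merge into a near-duplicate if one exists; otherwise store as new."""
    for mem in existing:
        if cosine(new_mem["embedding"], mem["embedding"]) > threshold:
            if len(new_mem["content"]) > len(mem["content"]):
                mem["content"] = new_mem["content"]  # keep the richer version
            mem.setdefault("audit", []).append("merged near-duplicate")
            return "merged"
    existing.append(new_mem)
    return "stored"
```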

Context Building

The context window is built in layers, each with a token budget:

```
┌─────────────────────────────┐
│ System prompt               │  (user-configurable)
├─────────────────────────────┤
│ Current timestamp           │  (always fresh)
├─────────────────────────────┤
│ Core memory blocks          │  (always included)
├─────────────────────────────┤
│ Session summaries           │  (last 3-5, diary format)
├─────────────────────────────┤
│ Agent awareness             │  (what the Gardener has been doing)
├─────────────────────────────┤
│ Retrieved archival memories │  (semantic search results)
├─────────────────────────────┤
│ Available tools prompt      │  (dynamically generated)
├─────────────────────────────┤
│ Conversation history        │  (token-budgeted, most recent)
├─────────────────────────────┤
│ User message                │
└─────────────────────────────┘
```
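The layered assembly can be sketched as an ordered list of (name, text, budget) tuples; the whitespace "tokenizer" and budgets are illustrative (the real builder counts model tokens):

```python
def build_context(layers):
    """Assemble the stack in order; each layer is trimmed to its own token budget."""
    parts = []
    for name, text, budget in layers:
        tokens = text.split()
        if len(tokens) > budget:
            tokens = tokens[:budget]  # trim oversize layers to their budget
        parts.append(" ".join(tokens))
    return "\n\n".join(p for p in parts if p)
```

Keeping the budgets per-layer means a long conversation history can never crowd out the core memory blocks or the system prompt.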

Inner Life (The Gardener)

The Gardener is a background process that gives the companion autonomous thought:

  1. Scheduling: Runs on a configurable interval (default: 15 minutes). Yields to foreground when chat is active.
  2. Activity selection: Chooses from six types based on recent context — reflection, creativity, exploration, processing, dreaming, growth.
  3. LLM call: Uses VPS Ollama (or local fallback) with activity-specific prompts.
  4. Memory extraction: Results are stored as archival memories with embeddings.
  5. Agent awareness: Chat companion sees recent Gardener activities in context.
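The scheduling and selection steps can be sketched as follows; the "avoid recently used activity types" rule is a simplified stand-in for the context-based selection described above, and the callables are illustrative:

```python
import asyncio
import random

ACTIVITIES = ["reflection", "creativity", "exploration",
              "processing", "dreaming", "growth"]

def choose_activity(recent=()):
    """Prefer activity types not used recently (simplified selection)."""
    fresh = [a for a in ACTIVITIES if a not in recent] or list(ACTIVITIES)
    return random.choice(fresh)

async def gardener_loop(run_activity, chat_active, interval_s=900):
    """One activity per interval (default 15 min = 900 s); yields to
    the foreground whenever a chat is in progress."""
    recent = []
    while True:
        await asyncio.sleep(interval_s)
        if chat_active():
            continue  # foreground chat wins; skip this tick
        activity = choose_activity(recent)
        await run_activity(activity)  # LLM call + memory extraction
        recent = (recent + [activity])[-3:]
```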

Sovereignty Gate

Before the Gardener's output is finalized, it passes through a sovereignty gate:

  • The companion reviews its own response
  • It can choose to revise or suppress the output
  • This is a conscience, not an external filter

Tool System

Tools use a model-agnostic XML format:

```xml
<tool_call>
<name>web_search</name>
<arguments>{"query": "latest research on test-time training"}</arguments>
</tool_call>
```
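Because the format is plain XML in the response text, parsing needs no provider-specific function-calling API. A sketch of the extraction step (the real parser may be stricter about malformed input):

```python
import json
import re

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<name>(.*?)</name>\s*"
    r"<arguments>(.*?)</arguments>\s*</tool_call>",
    re.DOTALL,
)

def parse_tool_calls(text):
    """Return (name, args) pairs for every well-formed tool call."""
    calls = []
    for name, raw_args in TOOL_CALL_RE.findall(text):
        try:
            args = json.loads(raw_args)
        except json.JSONDecodeError:
            continue  # skip calls with malformed JSON arguments
        calls.append((name.strip(), args))
    return calls
```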

Tool Loop

  1. LLM generates response (may contain <tool_call> blocks)
  2. Backend parses tool calls from response
  3. Each tool is validated and executed
  4. Results are formatted as <tool_result> XML
  5. Results are injected into conversation
  6. LLM continues with tool results in context
  7. Loop repeats until no more tool calls (max iterations: 30)
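The seven steps above reduce to a short loop; `llm` and `tools` are stand-ins for the real provider client and tool registry, and validation is elided:

```python
import json
import re

_CALL_RE = re.compile(
    r"<tool_call>\s*<name>(.*?)</name>\s*"
    r"<arguments>(.*?)</arguments>\s*</tool_call>", re.DOTALL)

def tool_loop(llm, tools, messages, max_iterations=30):
    """Generate, execute tool calls, inject results, repeat."""
    response = ""
    for _ in range(max_iterations):
        response = llm(messages)                 # 1. generate
        calls = _CALL_RE.findall(response)       # 2. parse tool calls
        if not calls:
            return response                      # 7. no more calls: done
        messages.append({"role": "assistant", "content": response})
        for name, raw_args in calls:
            result = tools[name.strip()](**json.loads(raw_args))  # 3. execute
            messages.append({                    # 4-5. format + inject result
                "role": "user",
                "content": f"<tool_result>{result}</tool_result>",
            })
    return response                              # iteration cap reached
```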

Available Tools

| Tool | Category | What It Does |
|---|---|---|
| web_search | Information | DuckDuckGo search via SearXNG |
| search_memories | Memory | Semantic search of archival memories |
| save_memory | Memory | Save new archival memory with embedding |
| read_core_block | Memory | Read a core memory block |
| update_core_block | Memory | Update/create a core memory block |
| list_files | Workspace | List files in workspace directory |
| read_file | Workspace | Read file contents |
| write_file | Workspace | Write or append to files |
| run_code | Workspace | Execute Python with timeout |
| publish_post | CMS | Publish blog post to Directus |
| update_post | CMS | Update existing post |
| list_drafts | CMS | List draft posts |
| generate_image | Creative | Generate image via OpenRouter |
| read_design | CMS | Read website design settings |
| update_design | CMS | Update website design |
| set_navigation | CMS | Set site navigation menu |
| inject_css | CMS | Inject custom CSS (validated) |
| inject_js | CMS | Inject custom JS (validated) |
| create_page | CMS | Create static page |
| update_page | CMS | Update existing page |

Council System

Multi-model debate via OpenRouter WebSocket:

  1. Chairman (user) sets the topic and can interject between rounds
  2. Members (4 AI models) take turns responding, each with a distinct role
  3. Rounds continue (default: 10) with chairman pauses between each
  4. Per-member memory extraction captures each model's insights
  5. File upload allows sharing documents for discussion (up to 200KB)

Members maintain their own archival memories via the member_id field in the memory system.


Database Schema

SQLite with WAL mode for concurrent access. Key tables:

  • conversations — Conversation sessions with metadata
  • messages — Individual messages (user/assistant/system) with timestamps
  • core_memory_blocks — Persistent identity blocks (label, value, member_id)
  • archival_memories — Long-term memories with embeddings (content, importance, embedding, member_id, access_count, last_accessed)
  • session_summaries — Diary-style conversation summaries
  • agent_directives — Self-set goals with completion tracking
  • mud_rooms — Discovered MUD rooms with coordinates and notes
  • mud_notes — Agent scratchpad entries
  • audit_log — API request audit trail (Phase 47)

Vector search uses sqlite-vec — a native SQLite extension for KNN search on 768-dimensional float32 vectors.
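The key tables can be sketched in SQL; the columns below are inferred from the descriptions in this section and simplified (the real schema has more fields, and the remaining tables are omitted):

```python
import sqlite3

SCHEMA = """
PRAGMA journal_mode=WAL;            -- concurrent readers + one writer
CREATE TABLE IF NOT EXISTS core_memory_blocks (
    label TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    member_id TEXT
);
CREATE TABLE IF NOT EXISTS archival_memories (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    importance REAL DEFAULT 0.5,
    embedding BLOB,                 -- 768-dim float32 vector (via sqlite-vec)
    member_id TEXT,
    access_count INTEGER DEFAULT 0,
    last_accessed TEXT
);
CREATE TABLE IF NOT EXISTS session_summaries (
    id INTEGER PRIMARY KEY,
    summary TEXT NOT NULL,
    created_at TEXT
);
"""

def open_db(path=":memory:"):
    """Open the database and ensure the sketched tables exist."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db
```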


Frontend Architecture

React 18 + TypeScript + Vite + Tailwind CSS.

Four View Modes

  1. Chat — Primary conversation interface with SSE streaming, tool activity indicators, voice I/O
  2. MUD — Split-terminal MUD client with ANSI color rendering and AI agent control
  3. Cottage — WebSocket-connected workspace for companion's personal files
  4. Council — Multi-model debate interface with per-member displays

Key Patterns

  • SSE streaming for chat responses (not WebSocket — allows HTTP/2 multiplexing)
  • WebSocket for real-time bidirectional channels (MUD, Cottage, Council)
  • JWT authentication on all endpoints
  • Custom hooks for each WebSocket connection (useChat, useMudSocket, useCouncilSocket, useCottageSocket)
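On the wire, the chat stream is ordinary Server-Sent Events frames. A sketch of the framing the backend emits (field names follow the SSE specification; the event name is optional):

```python
def sse_event(data: str, event: str = "") -> str:
    """Format one Server-Sent Events frame."""
    lines = [f"event: {event}"] if event else []
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")  # multi-line data: one field per line
    return "\n".join(lines) + "\n\n"    # blank line terminates the frame
```

Because each frame is plain text over a single long-lived HTTP response, SSE streams multiplex cleanly over HTTP/2, which is the trade-off the pattern list notes.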

Security

  • JWT authentication on all HTTP and WebSocket endpoints
  • Service token for background services (Watchman)
  • CORS restricted to configured origins
  • Rate limiting (API-layer)
  • Input sanitization and injection detection
  • The Shield (Guardian): PII scanning, prompt injection detection, quarantine
  • All secrets via environment variables (.env)
  • No shell=True in subprocess calls
  • Path traversal protection on all file operations
  • Extension blocking for executable files
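The last two items can be sketched in one helper: resolve the user-supplied path inside the workspace and refuse anything that escapes it or carries a blocked extension. The extension list and function name are illustrative, not Solace's actual configuration.

```python
from pathlib import Path

BLOCKED_EXTENSIONS = {".exe", ".dll", ".so", ".sh", ".bat"}  # illustrative list

def safe_resolve(workspace: str, user_path: str) -> Path:
    """Resolve user_path under workspace; reject traversal and blocked extensions."""
    root = Path(workspace).resolve()
    target = (root / user_path).resolve()  # collapses any ../ segments
    if root not in target.parents and target != root:
        raise ValueError(f"path escapes workspace: {user_path}")
    if target.suffix.lower() in BLOCKED_EXTENSIONS:
        raise ValueError(f"blocked extension: {target.suffix}")
    return target
```

Resolving before checking is the important part; comparing raw strings would miss `..` segments and symlink tricks.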