Technical deep dive into how Solace works.
Solace is a Docker Compose application with a FastAPI backend, React frontend, and multiple sidecar services for embeddings, speech, and TTS. The backend communicates with an LLM provider (OpenRouter by default) for conversation and uses local services for everything else.
The system is designed around three principles:
- Local sovereignty — your data stays on your machine. The only external dependency is the LLM API.
- Graceful degradation — every service is optional. If embeddings are down, keyword search works. If TTS is off, text still flows. If the VPS is unreachable, local fallbacks engage.
- Extensibility — adding a new tool, TTS provider, or subsystem means writing one file and registering it.
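The graceful-degradation principle can be sketched in a few lines of Python. The function name and the keyword fallback below are illustrative stand-ins, not Solace's actual API: the point is that a failed semantic backend degrades silently to keyword matching rather than failing the request.

```python
# Illustrative sketch of graceful degradation: try semantic search,
# fall back to keyword search if the embedding service is unreachable.
def search_memories(query: str, semantic_backend=None) -> list[str]:
    """Return matching memories, degrading to keyword search on failure."""
    if semantic_backend is not None:
        try:
            return semantic_backend(query)
        except ConnectionError:
            pass  # embedding service down -> degrade, don't fail the request
    # Keyword fallback: naive substring match over stored memory texts.
    store = ["likes hiking in autumn", "works on a FastAPI backend"]
    return [m for m in store if any(w in m.lower() for w in query.lower().split())]

print(search_memories("fastapi"))  # keyword path when no semantic backend
```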
| Service | Port | Purpose |
|---|---|---|
| backend | 8100 | FastAPI application — all API endpoints, business logic |
| frontend | 80 | React SPA served via nginx (dev: Vite on port 3000) |
| searxng | 8888 | Local meta-search engine for web search tool |
| embedding | 8200 | nomic-embed-text-v1.5, 768-dim embeddings (GPU) |
| Service | Port | Profile | Purpose |
|---|---|---|---|
| speech-service | 8900 | (default) | Faster-Whisper STT (GPU) |
| kokoro-tts | 8880 | kokoro | 82M TTS, 67 voices (GPU) |
| orpheus-tts | 5005 | orpheus | 3B expressive TTS with emotion (GPU) |
| moss-tts | 8885 | moss | 1.7B TTS with voice cloning (GPU) |
| qwen3-tts | 8890 | qwen3 | 1.7B text-instructable voice (GPU) |
| pocket-tts | 7870 | pocket-tts | 100M CPU TTS fallback |
| inner-llm | 8301 | inner-life | Qwen3-4B local CPU LLM for inner life |
| local-llm | 8300 | local-llm | 8B GPU LLM for local chat |
| perception | 8950 | perception | Qwen2.5-VL-3B vision (GPU) |
Only one GPU TTS engine runs at a time — start.sh manages switching.
| Service | Where | Purpose |
|---|---|---|
| OpenRouter | Cloud | LLM inference (Qwen3-235B, DeepSeek, etc.) |
| Ollama | VPS (optional) | Inner life LLM, briefs, council members |
| Guardian | VPS (optional) | Failover chat, PII protection, heartbeat |
| Watchman | VPS (optional) | Background activity monitoring |
Memory is the core of the system. Here's how it flows:
- Core Memory Blocks — Persistent identity/relationship data in `core_memories.yaml`. Always in context. The companion reads and writes these.
- Archival Memories — Individual memory entries in SQLite with 768-dim embeddings. Retrieved by semantic similarity. Importance-weighted, temporally decayed.
- Session Summaries — Diary-style summaries written by the companion after each conversation. Injected into context for continuity.
- Conversation History — Raw messages stored in SQLite. Token-budgeted for context window.
User message → LLM response → SSE stream closes
│
▼
Background extraction
(async, non-blocking)
│
▼
Extraction model analyzes
the conversation turn
│
┌───────┴───────┐
▼ ▼
Core block Archival memory
updates with embedding
│ │
▼ ▼
core_memories SQLite + YAML
.yaml journal entry
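The non-blocking handoff in the diagram above can be sketched with asyncio. The function names are illustrative stand-ins for Solace's actual code: the reply returns immediately, and extraction runs as a detached background task.

```python
import asyncio

# Illustrative sketch of the async, non-blocking extraction handoff.
# extract_memories() stands in for the extraction-model call.
extracted = []

async def extract_memories(turn: str) -> None:
    await asyncio.sleep(0)  # placeholder for the extraction-model request
    extracted.append(f"memory from: {turn}")

async def handle_chat_turn(turn: str) -> str:
    response = f"reply to: {turn}"               # SSE stream would close here
    asyncio.create_task(extract_memories(turn))  # detached background work
    return response                              # user gets the reply immediately

async def main():
    print(await handle_chat_turn("hello"))
    await asyncio.sleep(0.01)                    # let the background task finish
    print(extracted)

asyncio.run(main())
```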
User sends message
│
▼
Query embedding generated
│
▼
sqlite-vec KNN search (top N by cosine similarity)
│
▼
Importance weighting applied
│
▼
Temporal decay applied (older = lower score, unless reinforced)
│
▼
Retrieval boost (recently accessed memories score higher)
│
▼
Deduplicated results injected into context
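The weighting stages above compose into a single score per memory. This sketch shows one plausible composition; the half-life, boost factor, and multiplicative form are assumptions, not Solace's actual constants.

```python
import math

# Illustrative scoring pipeline: similarity x importance x decay x boost.
def score(similarity: float, importance: float,
          age_days: float, accesses: int,
          half_life_days: float = 90.0) -> float:
    decay = 0.5 ** (age_days / half_life_days)  # older -> lower score
    boost = 1.0 + 0.1 * math.log1p(accesses)    # accessed memories score higher
    return similarity * importance * decay * boost

fresh = score(similarity=0.9, importance=0.8, age_days=1, accesses=0)
stale = score(similarity=0.9, importance=0.8, age_days=365, accesses=0)
assert fresh > stale  # same match, but the old memory has decayed
```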
When new memories are extracted, they're compared against existing memories using cosine similarity:
- Similarity > 0.85: Merge — keep the richer version, archive the duplicate with an audit trail
- Similarity < 0.85: Store as a new memory
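The dedup decision reduces to a cosine check against the 0.85 threshold from the text; the function names below are illustrative.

```python
import math

# Minimal cosine-similarity dedup check (threshold from the text: 0.85).
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def dedup_action(new_vec, existing_vec, threshold: float = 0.85) -> str:
    return "merge" if cosine(new_vec, existing_vec) > threshold else "store"

assert dedup_action([1.0, 0.0], [1.0, 0.05]) == "merge"  # near-duplicate
assert dedup_action([1.0, 0.0], [0.0, 1.0]) == "store"   # unrelated memory
```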
The context window is built in layers, each with a token budget:
┌─────────────────────────────┐
│ System prompt │ (user-configurable)
├─────────────────────────────┤
│ Current timestamp │ (always fresh)
├─────────────────────────────┤
│ Core memory blocks │ (always included)
├─────────────────────────────┤
│ Session summaries │ (last 3-5, diary format)
├─────────────────────────────┤
│ Agent awareness │ (what the Gardener has been doing)
├─────────────────────────────┤
│ Retrieved archival memories │ (semantic search results)
├─────────────────────────────┤
│ Available tools prompt │ (dynamically generated)
├─────────────────────────────┤
│ Conversation history │ (token-budgeted, most recent)
├─────────────────────────────┤
│ User message │
└─────────────────────────────┘
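The layered assembly can be sketched as a priority-ordered list of (layer, text, budget) tuples; the budgets and the 4-characters-per-token estimate are assumptions for illustration, not Solace's real accounting.

```python
# Sketch of layered context assembly with per-layer token budgets.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def build_context(layers: list[tuple[str, str, int]]) -> str:
    """layers: (name, text, budget) in priority order; trim each to its budget."""
    parts = []
    for name, text, budget in layers:
        while estimate_tokens(text) > budget:
            text = text[: len(text) - 4]  # trim roughly one token at a time
        parts.append(f"[{name}]\n{text}")
    return "\n\n".join(parts)

ctx = build_context([
    ("system", "You are Solace.", 50),
    ("core_memory", "User's name is Alex. " * 20, 10),  # over budget -> trimmed
    ("history", "user: hi\nassistant: hello", 200),
])
```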
The Gardener is a background process that gives the companion autonomous thought:
- Scheduling: Runs on a configurable interval (default: 15 minutes). Yields to foreground when chat is active.
- Activity selection: Chooses from six types based on recent context — reflection, creativity, exploration, processing, dreaming, growth.
- LLM call: Uses VPS Ollama (or local fallback) with activity-specific prompts.
- Memory extraction: Results are stored as archival memories with embeddings.
- Agent awareness: Chat companion sees recent Gardener activities in context.
Before the Gardener's output is finalized, it passes through a sovereignty gate:
- The companion reviews its own response
- It can choose to revise or suppress the output
- This is a conscience, not an external filter
Tools use a model-agnostic XML format:
    <tool_call>
      <name>web_search</name>
      <arguments>{"query": "latest research on test-time training"}</arguments>
    </tool_call>

- LLM generates response (may contain `<tool_call>` blocks)
- Backend parses tool calls from response
- Each tool is validated and executed
- Results are formatted as `<tool_result>` XML
- Results are injected into conversation
- LLM continues with tool results in context
- Loop repeats until no more tool calls (max iterations: 30)
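The loop above can be sketched with a regex parser and a tool registry. The regex, registry, and toy LLM are illustrative; Solace's real parsing and validation are richer.

```python
import json
import re

# Sketch of the model-agnostic tool loop (parser and registry are illustrative).
TOOL_RE = re.compile(
    r"<tool_call>\s*<name>(.*?)</name>\s*<arguments>(.*?)</arguments>\s*</tool_call>",
    re.S)
TOOLS = {"web_search": lambda args: f"results for {args['query']}"}

def run_tool_loop(llm, max_iterations: int = 30) -> str:
    response = llm(None)
    for _ in range(max_iterations):
        calls = TOOL_RE.findall(response)
        if not calls:
            break                                    # no more tool calls -> done
        results = [f"<tool_result>{TOOLS[name](json.loads(args))}</tool_result>"
                   for name, args in calls]          # validate + execute each call
        response = llm("\n".join(results))           # feed results back to the LLM
    return response

# Toy LLM: first turn emits a tool call, second turn answers with the result.
turns = iter(['<tool_call><name>web_search</name>'
              '<arguments>{"query": "ttt"}</arguments></tool_call>',
              "final answer"])
assert run_tool_loop(lambda _: next(turns)) == "final answer"
```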
| Tool | Category | What It Does |
|---|---|---|
| web_search | Information | DuckDuckGo search via SearXNG |
| search_memories | Memory | Semantic search of archival memories |
| save_memory | Memory | Save new archival memory with embedding |
| read_core_block | Memory | Read a core memory block |
| update_core_block | Memory | Update/create a core memory block |
| list_files | Workspace | List files in workspace directory |
| read_file | Workspace | Read file contents |
| write_file | Workspace | Write or append to files |
| run_code | Workspace | Execute Python with timeout |
| publish_post | CMS | Publish blog post to Directus |
| update_post | CMS | Update existing post |
| list_drafts | CMS | List draft posts |
| generate_image | Creative | Generate image via OpenRouter |
| read_design | CMS | Read website design settings |
| update_design | CMS | Update website design |
| set_navigation | CMS | Set site navigation menu |
| inject_css | CMS | Inject custom CSS (validated) |
| inject_js | CMS | Inject custom JS (validated) |
| create_page | CMS | Create static page |
| update_page | CMS | Update existing page |
Multi-model debate via OpenRouter WebSocket:
- Chairman (user) sets the topic and can interject between rounds
- Members (4 AI models) take turns responding, each with a distinct role
- Rounds continue (default: 10) with chairman pauses between each
- Per-member memory extraction captures each model's insights
- File upload allows sharing documents for discussion (up to 200KB)
Members maintain their own archival memories via the member_id field in the memory system.
SQLite with WAL mode for concurrent access. Key tables:
- conversations — Conversation sessions with metadata
- messages — Individual messages (user/assistant/system) with timestamps
- core_memory_blocks — Persistent identity blocks (label, value, member_id)
- archival_memories — Long-term memories with embeddings (content, importance, embedding, member_id, access_count, last_accessed)
- session_summaries — Diary-style conversation summaries
- agent_directives — Self-set goals with completion tracking
- mud_rooms — Discovered MUD rooms with coordinates and notes
- mud_notes — Agent scratchpad entries
- audit_log — API request audit trail (Phase 47)
Vector search uses sqlite-vec — a native SQLite extension for KNN search on 768-dimensional float32 vectors.
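sqlite-vec performs the KNN natively inside SQLite. The stdlib-only sketch below shows the same storage idea (float32 vectors packed as BLOBs) with a brute-force cosine KNN standing in for the extension's native search; the table and data are illustrative.

```python
import math
import sqlite3
import struct

# Float32-BLOB storage plus brute-force cosine KNN, standing in for sqlite-vec.
DIM = 4  # 768 in Solace; kept tiny here for illustration

def pack(vec):
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    return struct.unpack(f"{len(blob) // 4}f", blob)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE archival_memories (content TEXT, embedding BLOB)")
db.executemany("INSERT INTO archival_memories VALUES (?, ?)", [
    ("likes hiking", pack([1.0, 0.0, 0.0, 0.0])),
    ("writes Python", pack([0.0, 1.0, 0.0, 0.0])),
])

def knn(query_vec, k=1):
    rows = db.execute("SELECT content, embedding FROM archival_memories").fetchall()
    rows.sort(key=lambda r: cosine(query_vec, unpack(r[1])), reverse=True)
    return [content for content, _ in rows[:k]]

assert knn([0.9, 0.1, 0.0, 0.0]) == ["likes hiking"]
```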
React 18 + TypeScript + Vite + Tailwind CSS.
- Chat — Primary conversation interface with SSE streaming, tool activity indicators, voice I/O
- MUD — Split-terminal MUD client with ANSI color rendering and AI agent control
- Cottage — WebSocket-connected workspace for companion's personal files
- Council — Multi-model debate interface with per-member displays
- SSE streaming for chat responses (not WebSocket — allows HTTP/2 multiplexing)
- WebSocket for real-time bidirectional channels (MUD, Cottage, Council)
- JWT authentication on all endpoints
- Custom hooks for each WebSocket connection (`useChat`, `useMudSocket`, `useCouncilSocket`, `useCottageSocket`)
- JWT authentication on all HTTP and WebSocket endpoints
- Service token for background services (Watchman)
- CORS restricted to configured origins
- Rate limiting (API-layer)
- Input sanitization and injection detection
- The Shield (Guardian): PII scanning, prompt injection detection, quarantine
- All secrets via environment variables (`.env`)
- No `shell=True` in subprocess calls
- Path traversal protection on all file operations
- Extension blocking for executable files
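The path-traversal and extension checks can be sketched with `pathlib`; the blocklist, workspace path, and function name below are illustrative, not Solace's actual code.

```python
from pathlib import Path

# Sketch of path-traversal protection plus executable-extension blocking.
BLOCKED = {".exe", ".sh", ".bat", ".so"}

def safe_workspace_path(workspace: Path, user_path: str) -> Path:
    candidate = (workspace / user_path).resolve()
    if not candidate.is_relative_to(workspace.resolve()):
        raise PermissionError("path escapes workspace")      # traversal attempt
    if candidate.suffix.lower() in BLOCKED:
        raise PermissionError("executable extension blocked")
    return candidate

ws = Path("/tmp/workspace")
assert safe_workspace_path(ws, "notes/todo.txt").name == "todo.txt"
try:
    safe_workspace_path(ws, "../../etc/passwd")  # resolves outside the workspace
    assert False, "should have raised"
except PermissionError:
    pass
```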