# Synapto

> ⚠️ **Research Prototype.** This project is under active development and not production-ready. APIs may change without notice.

Synapto is a research prototype exploring whether reinforcement-learning-based memory decisions can outperform heuristic approaches for AI agents. It provides learned policies for what, where, and when to store and retrieve memories.

## Status
| Component | Status |
|---|---|
| Memory Stores (Redis, PostgreSQL, pgvector) | ✅ Implemented |
| Dueling DQN with Prioritized Replay | ✅ Implemented |
| MCP Server Integration | ✅ Implemented |
| Online RL Training | 🚧 In Progress |
| Benchmark Framework | 🚧 In Progress |
| GNN Path Optimizer | ❌ Not Started |
| Decision Graph Hot Paths | ❌ Not Started |
| Multi-tenant Support | ❌ Not Started |
## Features
- RL Decision Controller: Dueling DQN with prioritized experience replay for memory routing decisions
- 3-Tier Memory Architecture:
- Working Memory (Redis): Sub-millisecond access, session-scoped, TTL-based expiration
- Episodic Memory (PostgreSQL): Timestamped events, timeline queries
- Semantic Memory (pgvector): Vector similarity search, entity tagging
- MCP Server: Integration with Claude Code via Model Context Protocol
- Configurable Embeddings: Local (sentence-transformers) or OpenAI API
- Benchmark Framework: Compare RL vs heuristic baselines
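The prioritized replay mentioned above samples transitions in proportion to their priority (typically the TD error). A minimal, framework-free sketch of proportional sampling with importance-sampling weights; this is an illustration, not the project's actual `replay_buffer.py`:

```python
import random

def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4):
    """Sample indices with probability proportional to priority**alpha,
    and return importance-sampling weights that correct the induced bias
    (normalized by the max weight in the batch for stability)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    indices = random.choices(range(len(priorities)), weights=probs, k=batch_size)
    n = len(priorities)
    weights = [(1.0 / (n * probs[i])) ** beta for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]
```

Setting `alpha=0` recovers uniform sampling; `beta` is usually annealed toward 1 over training.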
## Quick Start

### Prerequisites

- Python 3.11+
- Docker and Docker Compose
- (Optional) OpenAI API key for OpenAI embeddings
### Setup

```bash
# Clone the repository
cd /Users/arjun/Personal/synapto

# Start infrastructure
docker-compose up -d redis postgres

# Install dependencies
pip install -e ".[dev]"
```

### CLI Usage

```bash
# Interactive mode
synapto interactive

# Store a memory
synapto store "I prefer Python for data science" --importance 0.8 --tags "preference,python"

# Retrieve memories
synapto retrieve "programming preferences" --k 5

# View stats
synapto stats
```

## Claude Code Integration (MCP)

Add the following to `~/.claude/claude_code_config.json`:
```json
{
  "mcpServers": {
    "synapto-memory": {
      "command": "python",
      "args": ["-m", "synapto.mcp.server"],
      "cwd": "/Users/arjun/Personal/synapto",
      "env": {
        "SYNAPTO_REDIS_URL": "redis://localhost:6379",
        "SYNAPTO_DATABASE_URL": "postgresql://synapto:synapto_dev@localhost:5432/synapto",
        "SYNAPTO_EMBEDDING_PROVIDER": "local"
      }
    }
  }
}
```

Then in Claude Code:

```text
Use synapto_store to remember that I prefer vim keybindings
What are my editor preferences?
```
## Architecture

```text
┌────────────────────────────────────────────────────────────────┐
│                          SYNAPTO MVP                           │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                   MCP SERVER (FastMCP)                   │  │
│  │ Tools: synapto_store, synapto_retrieve, synapto_feedback,│  │
│  │        synapto_context, synapto_stats                    │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            │                                   │
│  ┌─────────────────────────▼────────────────────────────────┐  │
│  │                  RL DECISION CONTROLLER                  │  │
│  │  • Dueling DQN with Double DQN updates                   │  │
│  │  • Prioritized Experience Replay                         │  │
│  │  • 14 discrete actions (store/retrieve/maintenance)      │  │
│  │  • Multi-objective reward function                       │  │
│  └─────────────────────────┬────────────────────────────────┘  │
│                            │                                   │
│  ┌─────────────────────────▼────────────────────────────────┐  │
│  │                      MEMORY STORES                       │  │
│  │  ┌──────────┐   ┌───────────┐   ┌──────────────┐         │  │
│  │  │ WORKING  │   │ EPISODIC  │   │   SEMANTIC   │         │  │
│  │  │ (Redis)  │   │ (Postgres)│   │  (pgvector)  │         │  │
│  │  │   <1ms   │   │   ~10ms   │   │     ~20ms    │         │  │
│  │  └──────────┘   └───────────┘   └──────────────┘         │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                    EMBEDDING SERVICE                     │  │
│  │  Local: sentence-transformers (bge-base-en-v1.5, 768d)   │  │
│  │  API:   OpenAI text-embedding-3-small (1536d)            │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```
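The dueling head combines a state value V(s) with per-action advantages A(s, a), and Double DQN decouples action selection from evaluation. A toy, framework-free illustration of both ideas (the actual `agent.py` presumably uses a neural network; these scalar helpers only show the arithmetic):

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage makes V and A identifiable."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, gamma, online_next_q, target_next_q, done):
    """Double DQN target: the online net *picks* the next action,
    the target net *scores* it, reducing overestimation bias."""
    if done:
        return reward
    best = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    return reward + gamma * target_next_q[best]
```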
## Action Space

The agent selects from 14 discrete actions:
| Category | Actions |
|---|---|
| Store | STORE_WORKING, STORE_EPISODIC, STORE_SEMANTIC, STORE_SKIP |
| Retrieve | RETRIEVE_WORKING, RETRIEVE_EPISODIC, RETRIEVE_SEMANTIC, RETRIEVE_ALL |
| Maintenance | CONSOLIDATE, PROMOTE, DEMOTE, FORGET |
| Meta | PRELOAD, REINDEX |
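For reference, the 14 actions above fit in a single enum (names taken from the table; the project's actual `actions.py` may define them differently):

```python
from enum import Enum, auto

class MemoryAction(Enum):
    # Store decisions
    STORE_WORKING = auto()
    STORE_EPISODIC = auto()
    STORE_SEMANTIC = auto()
    STORE_SKIP = auto()
    # Retrieve decisions
    RETRIEVE_WORKING = auto()
    RETRIEVE_EPISODIC = auto()
    RETRIEVE_SEMANTIC = auto()
    RETRIEVE_ALL = auto()
    # Maintenance
    CONSOLIDATE = auto()
    PROMOTE = auto()
    DEMOTE = auto()
    FORGET = auto()
    # Meta
    PRELOAD = auto()
    REINDEX = auto()
```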
## Reward Function

Multi-objective reward with tunable weights:

```
R = 0.6 × task_success + 0.2 × precision + 0.1 × latency_bonus + 0.1 × efficiency
```
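A direct transcription of the formula with the weights made configurable (the dictionary keys are illustrative; see `rewards.py` for the real implementation):

```python
DEFAULT_WEIGHTS = {
    "task_success": 0.6,
    "precision": 0.2,
    "latency_bonus": 0.1,
    "efficiency": 0.1,
}

def compute_reward(signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of per-objective signals, each expected in [0, 1].
    Missing signals default to 0."""
    return sum(w * signals.get(k, 0.0) for k, w in weights.items())
```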
## Configuration

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `SYNAPTO_REDIS_URL` | `redis://localhost:6379` | Redis connection URL |
| `SYNAPTO_DATABASE_URL` | `postgresql://synapto:synapto_dev@localhost:5432/synapto` | PostgreSQL URL |
| `SYNAPTO_EMBEDDING_PROVIDER` | `local` | `local` or `openai` |
| `OPENAI_API_KEY` | - | Required if using OpenAI embeddings |
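The variables above can be read with plain `os.environ` lookups and the documented defaults (a sketch only; the project's `config.py` may use a settings library instead):

```python
import os

def load_config(env=os.environ):
    """Build a config dict from environment variables, falling back
    to the documented defaults."""
    cfg = {
        "redis_url": env.get("SYNAPTO_REDIS_URL", "redis://localhost:6379"),
        "database_url": env.get(
            "SYNAPTO_DATABASE_URL",
            "postgresql://synapto:synapto_dev@localhost:5432/synapto",
        ),
        "embedding_provider": env.get("SYNAPTO_EMBEDDING_PROVIDER", "local"),
    }
    # OPENAI_API_KEY is only required for the OpenAI embedding provider
    if cfg["embedding_provider"] == "openai" and not env.get("OPENAI_API_KEY"):
        raise RuntimeError(
            "OPENAI_API_KEY is required when SYNAPTO_EMBEDDING_PROVIDER=openai"
        )
    return cfg
```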
## Testing

```bash
# Start test infrastructure
docker-compose up -d redis postgres

# Run tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=synapto --cov-report=html
```

## Benchmarks

Compare the RL policy against heuristic baselines:

```bash
# Run benchmark
python benchmarks/run_benchmark.py \
    --policies rl,random,recency,semantic \
    --episodes 100 \
    --scenario coding \
    --output benchmarks/results/
```
## Training

```bash
# Generate synthetic training data
python scripts/generate_data.py --output data/synthetic_scenarios.json

# Pre-train on synthetic data
python scripts/train_offline.py --data data/synthetic_scenarios.json --output models/pretrained.pt
```

## Project Structure

```text
synapto/
├── synapto/
│   ├── config.py          # Configuration management
│   ├── engine.py          # SynaptoEngine orchestrator
│   ├── cli.py             # Command-line interface
│   ├── mcp/               # MCP server
│   │   ├── server.py
│   │   └── tools.py
│   ├── rl/                # RL components
│   │   ├── agent.py       # Dueling DQN
│   │   ├── state.py       # State representation
│   │   ├── actions.py     # Action definitions
│   │   ├── rewards.py     # Reward function
│   │   ├── replay_buffer.py
│   │   └── trainer.py
│   └── memory/            # Memory stores
│       ├── base.py
│       ├── working.py     # Redis
│       ├── episodic.py    # PostgreSQL
│       ├── semantic.py    # pgvector
│       └── embeddings.py
├── tests/
├── benchmarks/
├── scripts/
├── docker-compose.yml
└── pyproject.toml
```
## Known Issues

| Issue | Description | Mitigation |
|---|---|---|
| Cold Start Problem | RL agent starts with random policy, poor initial performance | Pre-trained model included, but may not generalize to all use cases |
| Training Instability | DQN training can diverge with small sample sizes | Fallback to heuristic policies when RL confidence is low |
| Reward Design | Current reward function is hand-tuned, may not capture all objectives | Configurable weights, but optimal values are task-dependent |
| Exploration vs Exploitation | Agent may over-explore or under-explore | Epsilon decay schedule needs tuning per deployment |
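The exploration issue above is governed by the epsilon schedule. A common exponential-decay form with a floor, as a sketch (the hyperparameter names and defaults here are illustrative, not the project's actual values):

```python
import math

def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Exponentially anneal exploration rate from eps_start toward eps_end.
    Larger decay_steps means slower decay (more exploration)."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay_steps)
```

Tuning `decay_steps` per deployment controls how long the agent keeps exploring before settling into its learned policy.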
## Limitations

- Single-node only: No distributed deployment support yet
- No authentication/authorization: Memory is not user-isolated
- No encryption at rest: Sensitive data stored in plaintext
- Limited temporal reasoning: Episodic memory queries are basic compared to Graphiti/Zep
- No graph traversal: Semantic memory uses vector similarity only, missing multi-hop reasoning
- No hot path caching: Every query hits the database (target <10ms not achieved)
## Performance

| Metric | Target | Current Status |
|---|---|---|
| Working memory retrieval | <5ms (p95) | ~2-5ms ✅ |
| Semantic retrieval | <50ms (p95) | ~30-80ms |
| RL vs random improvement | >20% | Not validated ❌ |
| Memory capacity | 10k+ | Not stress-tested ❌ |
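The p95 figures above can be reproduced from raw latency samples with the standard library alone (no external profiler assumed):

```python
from statistics import quantiles

def p95(latencies_ms):
    """95th percentile of a list of latency samples, via 100 inclusive
    quantile cut points (index 94 is the 95th)."""
    return quantiles(latencies_ms, n=100, method="inclusive")[94]
```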
Operational caveats:

- Embedding model loading: First request is slow (~5-10s) while loading sentence-transformers
- PostgreSQL connection pool: May exhaust connections under high load
- Redis TTL race conditions: Memories may expire during active use
- No graceful degradation: If Redis/PostgreSQL is down, entire system fails
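One mitigation for the "no graceful degradation" point is a fallback wrapper that degrades to a best-effort in-process store when the backend errors, instead of failing the whole request. This is a sketch of the pattern, not part of the current codebase:

```python
class FallbackStore:
    """Serve reads/writes from `primary`; on any backend error, fall back
    to an in-memory dict so the agent keeps working (with reduced
    durability) while Redis/PostgreSQL is down."""

    def __init__(self, primary):
        self.primary = primary
        self._local = {}

    def put(self, key, value):
        try:
            self.primary.put(key, value)
        except Exception:
            self._local[key] = value  # degraded mode: keep it locally

    def get(self, key):
        try:
            return self.primary.get(key)
        except Exception:
            return self._local.get(key)
```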
## Roadmap

From the research design, these features are not yet implemented:
- Decision Graph Engine: GNN-based path optimization for <10ms retrieval
- Hot Path Cache: LRU cache for frequent query patterns
- Proactive Memory Loading: Predictive prefetching based on context
- Memory Consolidation: Automatic merging of related memories
- Bi-temporal Queries: Tracking both system time and real-world valid time
- Multi-tool Integration: Currently only Claude Code via MCP
Success criteria (not yet validated):
- RL policy outperforms random by >20%
- RL policy matches or beats best heuristic
- Working memory retrieval < 5ms (p95)
- Semantic retrieval < 50ms (p95)
- Works with 10k+ memories
## License

MIT
## Contributing

This is an early-stage research prototype. Contributions are welcome in the following areas:
- Additional heuristic baselines for comparison
- Benchmark scenarios (coding, research, multi-session)
- RL algorithm improvements (PPO, SAC alternatives)
- Memory store optimizations
- Decision Graph / GNN path optimizer implementation
- Bug fixes and documentation
Note: The codebase is evolving rapidly. Please open an issue before starting major work to avoid conflicts.