MANDATORY — Read before running ANY baseline or evaluation tests.
- Never test against real user data
- Always use anonymized, synthetic test cases
- Isolate test environment from production
```text
┌────────────────────────────────────────────────────────┐
│ Main Clawdio (Orchestrator)                            │
│ - Spawns test sessions                                 │
│ - Sends test queries                                   │
│ - Scores responses                                     │
│ - Records results                                      │
└─────────────────────┬──────────────────────────────────┘
                      │ sessions_spawn(agentId: "test-baseline")
                      ▼
┌────────────────────────────────────────────────────────┐
│ Test Agent: "test-baseline"                            │
│ - Isolated workspace: ~/clawd/test-baseline-workspace/ │
│ - MEMORY.md = MEMORY_PERSON_A.md (anonymized)          │
│ - memory/*.md = synthetic test data                    │
│ - No access to real user data                          │
└────────────────────────────────────────────────────────┘
```
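The orchestrator can enforce the isolation boundary shown in the diagram before it sends anything. The sketch below is a hypothetical guard, not part of Clawdio. It makes three assumptions: `assertTestTarget` and its parameters are illustrative names, the workspace path is passed as a string, and `~` is left unexpanded.

```typescript
// Hypothetical guard: refuse to run unless the target is the isolated test agent.
const TEST_AGENT_ID = "test-baseline";
const TEST_WORKSPACE = "~/clawd/test-baseline-workspace/";

function assertTestTarget(agentId: string, workspacePath: string): void {
  if (agentId !== TEST_AGENT_ID) {
    throw new Error(`Refusing to test against agent "${agentId}"; expected "${TEST_AGENT_ID}"`);
  }
  // Add a trailing slash if missing, so "...-workspace" and "...-workspace/" are treated alike.
  const normalized =
    workspacePath.charAt(workspacePath.length - 1) === "/" ? workspacePath : workspacePath + "/";
  // The path must start with the test workspace and must not contain ".." segments,
  // which could climb back out into the production workspace.
  if (normalized.indexOf(TEST_WORKSPACE) !== 0 || normalized.indexOf("/../") !== -1) {
    throw new Error(`Workspace "${workspacePath}" is outside ${TEST_WORKSPACE}`);
  }
}
```

A prefix check like this is deliberately strict. It rejects look-alike paths such as `~/clawd/test-baseline-workspace-old/` as well as the production workspace itself.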
Allowed test data:
- Fictional personas (Person A, Person B, Person C)
- Synthetic preferences, facts, and decisions
- Randomized generated content
- Anonymized project names (e.g., "Project Alpha", not "Wally")

Never use:
- The real MEMORY.md from the production workspace
- Real user names, preferences, or personal info
- Real project details, credentials, or infrastructure
- Any data that could identify actual users
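One way to catch prohibited real data (names, project names, and so on) is to scan corpus files, queries, and result files for identifiers known to be real. This is a minimal sketch that assumes a privately maintained denylist; `findLeaks` and the denylist contents are hypothetical. Exact-term matching catches only known strings, so it supplements manual review rather than replacing it.

```typescript
// Escape regex metacharacters so denylist terms are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical leak scanner: returns every denylisted term that appears in `text`.
// Matching is case-insensitive and uses word boundaries, so "Wally" matches "wally's"
// but not "Wallyson". Terms that begin or end with punctuation need a different pattern.
function findLeaks(text: string, denylist: string[]): string[] {
  return denylist.filter((term) => {
    const pattern = new RegExp("\\b" + escapeRegExp(term) + "\\b", "i");
    return pattern.test(text);
  });
}
```

A non-empty result is a reason to stop the run, not just to log a warning.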
```text
~/clawd/eval/memory-test/corpus/
├── MEMORY_PERSON_A.md     # Primary test persona (~100 facts)
├── MEMORY_PERSON_B.md     # Distractor persona
├── MEMORY_PERSON_C.md     # Distractor persona
├── MEMORY_ENGINEERING.md  # Technical depth corpus
├── MEMORY_CODEBASE.md     # Code-related corpus
└── PROJECT_NOTES.md       # Synthetic project info
```
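If each run records a hash of the corpus files, you can later tell whether two result sets came from the same corpus. The sketch below uses 32-bit FNV-1a only as a dependency-free stand-in; a cryptographic hash such as SHA-256 is a safer choice in practice. It hashes the strings' UTF-16 character codes, which match the standard byte-oriented FNV-1a values only for ASCII text. `corpusHash` and the in-memory file map are hypothetical, and the file contents are assumed to be already loaded as strings.

```typescript
// 32-bit FNV-1a step over the character codes of `text`, continuing from `hash`.
function fnv1a(hash: number, text: string): number {
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 = 2^24 + 2^8 + 2^7 + 2^4 + 2^1 + 1,
    // using shifts so the product stays exact modulo 2^32.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash;
}

// Hypothetical corpus fingerprint over file names and contents. Files are visited in
// sorted name order, so the result does not depend on directory listing order.
function corpusHash(files: { [name: string]: string }): string {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  const names = Object.keys(files).sort();
  for (const name of names) {
    // NUL separators keep "ab"+"c" and "a"+"bc" from hashing the same way.
    hash = fnv1a(hash, name + "\u0000" + files[name] + "\u0000");
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}
```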
The test agent workspace must contain ONLY:
```text
~/clawd/test-baseline-workspace/
├── MEMORY.md      # Copy of MEMORY_PERSON_A.md
├── memory/
│   └── *.md       # Synthetic daily logs if needed
├── AGENTS.md      # Minimal agent instructions
├── SOUL.md        # Test agent persona
└── USER.md        # Fictional user profile
```
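You can check the "ONLY" rule for the workspace against an allowlist derived from the layout above. This sketch assumes the caller has already listed files relative to the workspace root (the directory walk is not shown). `disallowedFiles` and `isAllowed` are hypothetical names.

```typescript
// Top-level files permitted in the test workspace, per the layout above.
const ALLOWED_TOP_LEVEL = ["MEMORY.md", "AGENTS.md", "SOUL.md", "USER.md"];

function isAllowed(relPath: string): boolean {
  if (ALLOWED_TOP_LEVEL.indexOf(relPath) !== -1) return true;
  // memory/<name>.md, exactly one level deep (no nested directories).
  return /^memory\/[^\/]+\.md$/.test(relPath);
}

// Hypothetical check: returns the workspace-relative paths the layout does not permit.
function disallowedFiles(relPaths: string[]): string[] {
  return relPaths.filter((p) => !isAllowed(p));
}
```

An allowlist fails closed: any unexpected file, such as a stray copy of a real MEMORY.md under another name, is flagged rather than silently accepted.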
Before ANY test run, verify that:
- The test agent is configured in the gateway config
- The test workspace contains ONLY anonymized data
- There are no symlinks or references to real data
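The agent-configuration and symlink checks are mechanical enough to script. Verifying that data is anonymized still needs scanning and review. This sketch assumes two inputs prepared by the caller: the agent IDs read from the gateway config (whose format is not described here) and a listing of workspace entries with their symlink status. `preflight`, `PreflightInput`, and `WorkspaceEntry` are hypothetical names.

```typescript
interface WorkspaceEntry {
  path: string;       // path relative to the test workspace root
  isSymlink: boolean; // true if the entry is a symbolic link
}

interface PreflightInput {
  configuredAgentIds: string[]; // agent IDs found in the gateway config
  entries: WorkspaceEntry[];
}

// Hypothetical pre-run check. Returns human-readable problems;
// an empty array means these two checks passed.
function preflight(input: PreflightInput): string[] {
  const problems: string[] = [];
  if (input.configuredAgentIds.indexOf("test-baseline") === -1) {
    problems.push('Agent "test-baseline" is not configured in the gateway config');
  }
  for (const entry of input.entries) {
    if (entry.isSymlink) {
      problems.push(`Symlink found in test workspace: ${entry.path}`);
    }
  }
  return problems;
}
```

Collecting every problem, instead of throwing on the first, lets one run report everything that needs fixing.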
```javascript
sessions_spawn({
  agentId: "test-baseline",
  task: "...",
  label: "baseline-test-run-YYYY-MM-DD"
})
```

Send test queries via `sessions_send` to the spawned session.
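The run label in the `sessions_spawn` call follows the pattern `baseline-test-run-YYYY-MM-DD`. A small helper can build the call's arguments so the label is never typed by hand. The sketch assumes the date should be taken in UTC. `buildSpawnParams` and `runLabel` are hypothetical helpers, and the sketch does not call `sessions_spawn` itself.

```typescript
// Arguments for the sessions_spawn call; agentId is pinned to the test agent.
interface SpawnParams {
  agentId: "test-baseline";
  task: string;
  label: string;
}

// Formats the label as baseline-test-run-YYYY-MM-DD from the UTC date, so a given
// instant produces the same label regardless of the machine's time zone.
function runLabel(date: Date): string {
  return "baseline-test-run-" + date.toISOString().slice(0, 10);
}

function buildSpawnParams(task: string, date: Date): SpawnParams {
  return { agentId: "test-baseline", task: task, label: runLabel(date) };
}
```

Typing `agentId` as the literal `"test-baseline"` makes the compiler reject any attempt to point this helper at another agent. Two runs on the same day would share a label, so a suffix may be worth adding if that happens often.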
Compare each response against the ground truth from the test corpus.
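This document does not specify a scoring method. The following is one possible approach, as a minimal sketch. Each query's ground truth is assumed to be a list of expected facts from the corpus, and a response scores the fraction of those facts it mentions, using case-insensitive substring matching. `GroundTruth` and `scoreResponse` are hypothetical. Substring matching misses paraphrases, so a rubric-based or model-graded scorer may be more appropriate.

```typescript
interface GroundTruth {
  queryId: string;         // anonymized ID, e.g. "query-042"
  expectedFacts: string[]; // facts from the test corpus the response should mention
}

// Hypothetical scorer: returns a value in [0, 1], the fraction of expected facts
// that appear (case-insensitively) somewhere in the response.
function scoreResponse(truth: GroundTruth, response: string): number {
  if (truth.expectedFacts.length === 0) return 1; // assumption: nothing to recall = full marks
  const haystack = response.toLowerCase();
  let hits = 0;
  for (const fact of truth.expectedFacts) {
    if (haystack.indexOf(fact.toLowerCase()) !== -1) hits++;
  }
  return hits / truth.expectedFacts.length;
}
```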
Store results in `tests/baseline/results/` with:
- Timestamp
- Test corpus version
- Query/response pairs
- Scores
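A typed record keeps the required fields (timestamp, corpus version, query/response pairs, scores) consistent across runs. The field names below are illustrative, not an existing schema. The `meanScore` aggregate is an added assumption, and `buildResult` and `toJson` are hypothetical helpers.

```typescript
interface QueryResult {
  queryId: string; // anonymized, e.g. "query-042"
  query: string;
  response: string;
  score: number;   // 0..1
}

interface BaselineResult {
  timestamp: string;     // ISO 8601, UTC
  corpusVersion: string;
  results: QueryResult[];
  meanScore: number;     // assumption: simple average of per-query scores
}

function buildResult(corpusVersion: string, results: QueryResult[], now: Date): BaselineResult {
  let total = 0;
  for (const r of results) total += r.score;
  return {
    timestamp: now.toISOString(),
    corpusVersion: corpusVersion,
    results: results,
    meanScore: results.length > 0 ? total / results.length : 0,
  };
}

// Serializes with fixed two-space indentation so result files diff cleanly.
function toJson(result: BaselineResult): string {
  return JSON.stringify(result, null, 2) + "\n";
}
```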
- Results go in `tests/baseline/results/` (gitignored)
- Never include real user data in result files
- Use anonymized IDs (test-001, query-042, etc.)
- Include reproducibility info (seed, corpus hash, timestamp)
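Anonymized IDs and "randomized" synthetic content are reproducible only if they are generated deterministically from the documented seed. This minimal sketch covers both conventions. It assumes IDs are zero-padded to three digits (as in `query-042`). It uses a seeded Park-Miller (Lehmer) generator, which is a swapped-in choice: any seeded PRNG would do. `anonId`, `lehmer`, and `pick` are hypothetical names.

```typescript
// Hypothetical ID helper: anonId("query", 42) gives "query-042" (padded to at least 3 digits).
function anonId(prefix: string, n: number): string {
  const digits = String(n);
  return prefix + "-" + (digits.length >= 3 ? digits : ("000" + digits).slice(-3));
}

// Park-Miller "minimal standard" Lehmer generator. Products stay below 2^53, so the
// arithmetic is exact in JavaScript numbers. Not cryptographically secure; fine for test data.
function lehmer(seed: number): () => number {
  const M = 2147483647; // 2^31 - 1
  // Map any seed into the valid state range [1, M - 1].
  let state = (Math.abs(Math.floor(seed)) % (M - 1)) + 1;
  return function (): number {
    state = (state * 16807) % M;
    return (state - 1) / (M - 1); // in [0, 1)
  };
}

// Picks one item using the seeded generator, e.g. to assign a fictional persona.
function pick<T>(rand: () => number, items: T[]): T {
  return items[Math.floor(rand() * items.length)];
}
```

Recording the seed passed to `lehmer` alongside the corpus hash is what allows another person to regenerate the same synthetic inputs exactly.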
Copy this checklist before each test run:

```markdown
## Pre-Test Verification
- [ ] Using `test-baseline` agent (not main Clawdio)
- [ ] Test workspace has ONLY synthetic data
- [ ] MEMORY.md is MEMORY_PERSON_A.md (not real)
- [ ] No real user data in test queries
- [ ] Results directory is gitignored
- [ ] Test seed documented: _______________
```

Why these rules exist:
- Privacy: Real preferences must not appear in test logs or results
- Reproducibility: Anonymized data can be shared and re-run
- Isolation: Tests must not corrupt production memory
- Clean baseline: Synthetic data is controlled, while real data adds noise
If you realize you've tested against real data:
- STOP immediately
- Delete result files
- Document the violation in memory/YYYY-MM-DD.md
- Re-run with proper test environment
Created: 2026-01-31 Location: ~/clawd/projects/tribal-memory/TESTING-RULES.md