Continuous progress tracker for the AgentOps Replay project.
Mission: Become the system of record for AI agent behavior.
| Metric | Value |
|---|---|
| Phase | Phase 7 — Replay System |
| Spec Version | v0.6 |
| Status | Completed |
| Last Updated | January 29, 2026 |
Date: January 22, 2026
Established the immutable core — "The Moat"
Artifacts Created:
- CONSTITUTION.md — Non-negotiable laws (Immutable Logs, Verifiable Evidence)
- EVENT_LOG_SPEC.md (v0.5) — Technical implementation with hash-chaining
- SCHEMA.md — Event type definitions with field-level documentation
- agentops_events.schema.json — JSON Schema for validation
Key Decisions:
- Rejected "move fast and break things" in favor of cryptographic auditability
- Built the verifier BEFORE the SDK, so SDK output is graded by an independent reference rather than by the SDK itself
- RFC 8785 (JCS) canonicalization with UTF-16BE sorting
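To make the hash-chaining idea concrete, here is a minimal sketch, not the project's implementation. It assumes SHA-256 as the hash, an all-zero genesis value, and hypothetical field names (`sequence`, `prev_hash`, `payload`, `event_hash`). The `canonicalize` helper is a simplified stand-in for RFC 8785 and is not a compliant JCS implementation.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first event's prev_hash


def canonicalize(obj) -> bytes:
    """Simplified stand-in for RFC 8785 (JCS); NOT a compliant implementation.

    Sorted keys and compact separators only approximate JCS for ASCII keys
    and plain string/integer values.
    """
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def append_event(chain: list, payload: dict) -> dict:
    """Append an event whose hash covers its payload and the previous hash."""
    prev_hash = chain[-1]["event_hash"] if chain else GENESIS_HASH
    body = {"sequence": len(chain), "prev_hash": prev_hash, "payload": payload}
    event = dict(body, event_hash=hashlib.sha256(canonicalize(body)).hexdigest())
    chain.append(event)
    return event


def verify_chain(chain: list) -> bool:
    """Recompute every hash from scratch instead of trusting stored values."""
    prev_hash = GENESIS_HASH
    for index, event in enumerate(chain):
        if event["sequence"] != index or event["prev_hash"] != prev_hash:
            return False
        body = {
            "sequence": event["sequence"],
            "prev_hash": event["prev_hash"],
            "payload": event["payload"],
        }
        if hashlib.sha256(canonicalize(body)).hexdigest() != event["event_hash"]:
            return False
        prev_hash = event["event_hash"]
    return True
```

Because each hash covers the previous one, editing any earlier payload invalidates every later link, which is what makes the log tamper-evident.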
Date: January 22, 2026
Artifacts Created:
- `agentops_sdk/`: Python SDK (untrusted producer)
- `sdk/python/`: Production Python SDK
- `verifier/agentops_verify.py`: Reference verifier (zero-dependency)
- `verifier/jcs.py`: RFC 8785 canonicalization
- `verifier/test_vectors/`: Canonical valid/invalid test logs
SDK Features:
- Local authority mode (testing only)
- Ring buffer with `LOG_DROP` meta-events
- Strict types via Pydantic/dataclasses
- Vendored dependencies (standalone)
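The ring buffer with `LOG_DROP` meta-events can be sketched roughly as follows. This is an illustration under assumptions, not the SDK's code: the class name `EventRingBuffer`, the `dropped_count` field, and the drop-oldest eviction policy are all hypothetical.

```python
from collections import deque


class EventRingBuffer:
    """Bounded buffer that records how many events it had to discard."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._events = deque()
        self._capacity = capacity
        self._pending_drops = 0

    def push(self, event: dict) -> None:
        if len(self._events) >= self._capacity:
            self._events.popleft()  # assumed policy: evict the oldest event
            self._pending_drops += 1
        self._events.append(event)

    def flush(self) -> list:
        """Return buffered events, led by a LOG_DROP meta-event if any were lost."""
        batch = []
        if self._pending_drops:
            batch.append({"event_type": "LOG_DROP", "dropped_count": self._pending_drops})
            self._pending_drops = 0
        batch.extend(self._events)
        self._events.clear()
        return batch
```

The point of the meta-event is that loss is recorded as evidence rather than hidden: a reader of the log sees that events are missing and how many.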
Technical Fixes:
- RFC 8785 compliance (UTF-16BE code unit sorting)
- Added `content_hash`, `args_hash`, `result_hash` for redaction
- Fixed Server Mode `prev_hash` tracking
- Buffer safety for `LOG_DROP` counters
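A rough sketch of how a hash field such as `args_hash` can preserve verifiability after redaction. The helper names, the `[REDACTED]` marker, and the use of SHA-256 over sorted-key JSON are assumptions for illustration. The real SDK presumably hashes the RFC 8785 canonical form.

```python
import hashlib
import json


def sha256_hex(value) -> str:
    """Hash a JSON-serializable value in a deterministic (sorted-key) form."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def redact_field(payload: dict, field: str, hash_field: str) -> dict:
    """Return a copy with `field` replaced by a marker and its hash preserved."""
    redacted = dict(payload)
    redacted[hash_field] = sha256_hex(payload[field])
    redacted[field] = "[REDACTED]"  # hypothetical marker value
    return redacted
```

Anyone holding the original value can later prove it matches the stored hash, while the log itself no longer contains the sensitive content.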
Date: January 23, 2026
Addressed three existential risks identified in feedback.
Artifacts Created:
- CHAIN_AUTHORITY_INVARIANTS.md (v1.0) — Evidence classification rules
- FAILURE_MODES.md (v1.0) — Component failure documentation
- EVENT_LOG_SPEC.md (v0.5 → v0.6) — Hardened specification
Key Changes:
| Aspect | Before (v0.5) | After (v0.6) |
|---|---|---|
| Evidence Classification | None | 3-state (AUTHORITATIVE, PARTIAL, NON) |
| Authority Separation | Semantic label | Cryptographic (CHAIN_SEAL) |
| `prev_hash` Semantics | "hint" (ambiguous) | "MUST recompute" (strict) |
| LOG_DROP Spec | Basic mention | Forensic specification |
| Failure Modes | Implicit | Explicit tables per component |
| Language Precision | "allows", "exception" | RFC 2119 (MUST, SHALL, MAY) |
Three-State Evidence Classification:
- `AUTHORITATIVE_EVIDENCE`: Server authority, sealed, complete (compliance-grade)
- `PARTIAL_AUTHORITATIVE_EVIDENCE`: Server authority, unsealed/incomplete (incident analysis)
- `NON_AUTHORITATIVE_EVIDENCE`: SDK/local authority (testing only)
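The three states reduce to a small decision function. The sketch below assumes the inputs are a `chain_authority` string plus `sealed` and `complete` booleans. The function name and signature are hypothetical; the authoritative rules live in CHAIN_AUTHORITY_INVARIANTS.md.

```python
from enum import Enum


class EvidenceClass(str, Enum):
    AUTHORITATIVE_EVIDENCE = "AUTHORITATIVE_EVIDENCE"
    PARTIAL_AUTHORITATIVE_EVIDENCE = "PARTIAL_AUTHORITATIVE_EVIDENCE"
    NON_AUTHORITATIVE_EVIDENCE = "NON_AUTHORITATIVE_EVIDENCE"


def classify_evidence(chain_authority: str, sealed: bool, complete: bool) -> EvidenceClass:
    """Map authority, seal, and completeness onto the three evidence classes."""
    if chain_authority != "SERVER":
        # SDK/local chains never qualify, even when sealed and complete.
        return EvidenceClass.NON_AUTHORITATIVE_EVIDENCE
    if sealed and complete:
        return EvidenceClass.AUTHORITATIVE_EVIDENCE
    return EvidenceClass.PARTIAL_AUTHORITATIVE_EVIDENCE
```

Note that authority is checked first: a locally produced chain stays non-authoritative no matter how well-formed it is.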
Verifier Updates:
- Evidence classification in output
- `--reject-local-authority` policy flag
- CHAIN_SEAL metadata validation
- LOG_DROP tracking with `total_drops`
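As an illustration of how the drop total and the local-authority policy might surface in a verifier summary, consider the sketch below. It is not the reference verifier's API: `summarize_policy` and the event fields `event_type`, `dropped_count`, and `chain_authority` are assumed names, and hash verification is deliberately left out.

```python
def summarize_policy(events: list, reject_local_authority: bool = False) -> dict:
    """Summarize drop counts and apply the local-authority policy (sketch only)."""
    total_drops = sum(
        event.get("dropped_count", 0)
        for event in events
        if event.get("event_type") == "LOG_DROP"
    )
    # Any event not stamped by the server makes the chain local-authority.
    local = any(event.get("chain_authority") != "SERVER" for event in events)
    policy_status = "FAIL" if (local and reject_local_authority) else "PASS"
    return {
        "policy_status": policy_status,
        "total_drops": total_drops,
        "local_authority": local,
    }
```

With the policy flag off, a local chain still passes but is reported as local, so a downstream consumer can decide how much weight to give it.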
Date: January 24, 2026
Artifacts Created:
- `sdk/python/agentops_replay/integrations/langchain/`: LangChain integration package
- `sdk/python/agentops_replay/integrations/langchain/callback.py`: Callback handler
- `sdk/python/agentops_replay/integrations/langchain/version.py`: Version compatibility
- `examples/langchain_demo/`: Demo agent with verification workflow
- `examples/langchain_demo/INCIDENT_INVESTIGATION.md`: PII incident simulation
Integration Features:
- `AgentOpsCallbackHandler` extends LangChain's `BaseCallbackHandler`
- Captures: LLM calls, tool invocations, agent actions, errors
- Version pinning and compatibility warnings
- PII redaction with hash preservation
- Safe serialization of complex objects
Demo Agent:
- Customer support agent with tools: `lookup_order`, `issue_refund`, `send_email`
- Mock mode for testing without API keys
- Full verification workflow documented
Validation Results:
Session: 88f970ff-22d6-47fd-850f-01d4aed5140f
Status: PASS
Evidence Class: NON_AUTHORITATIVE_EVIDENCE
Sealed: True
Complete: True
Fingerprint: 4272bdc7...
Date: January 29, 2026
Artifacts Created:
- `backend/app/compliance/json_export.py`: RFC 8785 canonical JSON export (locked to verifier's JCS)
- `backend/app/compliance/pdf_export.py`: Human-readable PDF from verified JSON
- `backend/app/compliance/gdpr.py`: PII detection (WARNING) + redaction validation (ERROR)
Key Changes:
- JSON export locked to verifier's JCS implementation
- Strict ISO 8601 formatting (YYYY-MM-DDTHH:MM:SS.sssZ)
- Explicit `evidence_class` field in export header
- PDF consumes verified JSON, not raw DB
- GDPR severity levels (ERROR/WARNING)
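The strict timestamp format (YYYY-MM-DDTHH:MM:SS.sssZ) can be produced with the standard library alone. This is a sketch, not the exporter's code: the function name is hypothetical, and truncating (rather than rounding) microseconds to milliseconds is an assumption.

```python
from datetime import datetime, timezone


def format_timestamp(moment: datetime) -> str:
    """Render an aware datetime as YYYY-MM-DDTHH:MM:SS.sssZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    utc = moment.astimezone(timezone.utc)
    # Assumption: milliseconds are truncated, not rounded.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"
```

Rejecting naive datetimes keeps the export from silently guessing a timezone, in line with the fail-loudly principle.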
Date: January 29, 2026
Artifacts Created:
- `backend/app/services/ingestion/hasher.py`: Server-side hash recomputation
- `backend/app/services/ingestion/sealer.py`: Chain sealing with authority invariants
Key Changes:
- Server-side hash recomputation (never trust SDK)
- Rejection invariants: non-monotonic, gaps, duplicates
- CHAIN_SEAL emission logic
- No re-sealing invariant
- PARTIAL_AUTHORITATIVE for incomplete chains
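The rejection invariants amount to a strict check on sequence numbers plus a guard on sealed sessions. The sketch below is one possible shape for that check. `ChainRejection`, `validate_batch`, and the convention that an empty session has `last_sequence == -1` are assumptions, not the ingestion service's actual interface.

```python
class ChainRejection(Exception):
    """Raised when a batch violates a chain invariant."""


def validate_batch(last_sequence: int, sealed: bool, sequences: list) -> None:
    """Reject sealed sessions, duplicates, non-monotonic numbers, and gaps."""
    if sealed:
        raise ChainRejection("session already sealed; no new events and no re-sealing")
    previous = last_sequence
    for seq in sequences:
        if seq == previous:
            raise ChainRejection(f"duplicate sequence {seq}")
        if seq < previous:
            raise ChainRejection(f"non-monotonic sequence {seq} after {previous}")
        if seq != previous + 1:
            raise ChainRejection(f"gap: expected {previous + 1}, got {seq}")
        previous = seq
```

Raising on the first violation, instead of repairing or reordering the batch, matches the "no chain repair" stance: a bad batch is rejected whole.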
Tests:
- `backend/tests/compliance/test_jcs_canonicalization.py`: Adversarial whitespace test
Date: January 30, 2026
Artifacts Created:
- `backend/app/services/ingestion/service.py`: Ingestion orchestrator (atomic transactions, locking)
- `backend/app/api/v1/endpoints/ingestion.py`: Batch ingestion endpoint (POST /v1/ingest/batch)
- `backend/app/schemas/ingestion.py`: Strict Pydantic schemas (RawEventCreate, IngestBatchRequest)
- `backend/tests/ingestion/test_ingestion_service.py`: Adversarial test suite (8 scenarios)
Key Achievements:
- Server Authority: Implemented `IngestionService` stamping `chain_authority=SERVER`.
- Fail-Loudly: Mapped state conflicts to HTTP 409 and bad requests to HTTP 400.
- Atomic Writes: Single transaction block for batch persistence.
- Seal Gate: Enforced invariant: `seal=true` REQUIRES `SESSION_END`.
- Adversarial Testing: Verified rejection of gaps, duplicates, and sealed session tampering.
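A compact sketch of the seal gate and the fail-loudly status mapping. The exception classes and function names are hypothetical, and two choices are assumptions rather than documented behavior: treating a seal request without `SESSION_END` as a bad request (400), and returning 200 on success.

```python
class BadRequest(Exception):
    """Malformed or self-contradictory request (HTTP 400)."""
    status_code = 400


class StateConflict(Exception):
    """Request conflicts with stored session state (HTTP 409)."""
    status_code = 409


def check_seal_gate(seal: bool, event_types: list, already_sealed: bool) -> None:
    """Enforce: sealed sessions are closed, and seal=true requires SESSION_END."""
    if already_sealed:
        raise StateConflict("session is already sealed")
    if seal and "SESSION_END" not in event_types:
        raise BadRequest("seal=true requires a SESSION_END event")


def ingest_status(seal: bool, event_types: list, already_sealed: bool) -> int:
    """Translate gate failures into HTTP status codes; 200 on success (assumed)."""
    try:
        check_seal_gate(seal, event_types, already_sealed)
    except (BadRequest, StateConflict) as exc:
        return exc.status_code
    return 200
```

In a FastAPI endpoint the same mapping would typically be done with exception handlers, but the decision logic stays this small.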
Status:
- Core Ingestion Service: Verified
- Queue Worker: Deferred (to Policy Phase)
Date: January 29, 2026
Artifacts Created:
- `backend/app/replay/`: Core replay package
- `backend/app/replay/frames.py`: Frame types with single-origin invariant
- `backend/app/replay/warnings.py`: Warning system with stable codes
- `backend/app/replay/engine.py`: Verified-first replay engine
- `backend/app/schemas/replay_v2.py`: Pydantic response models
Key Changes:
- Verified-first: Replay only serves verified chains
- Frame-based: EVENT, GAP, LOG_DROP, REDACTION types
- VerificationStatus as enum (not string)
- Single-origin frame invariant enforced
- No-bypass constraint on frame endpoint
- Explicit gap marking (no smoothing)
- Anti-inference: No synthetic events
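To show what frame-based replay with explicit gap marking might look like, here is a sketch under assumptions. The four frame types come from the list above; the `Frame` fields, the `redacted` flag, and `build_frames` are hypothetical and not taken from `frames.py` or `engine.py`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FrameType(Enum):
    EVENT = "EVENT"
    GAP = "GAP"
    LOG_DROP = "LOG_DROP"
    REDACTION = "REDACTION"


@dataclass(frozen=True)
class Frame:
    frame_type: FrameType
    sequence: Optional[int]  # None for GAP frames, which have no source event
    detail: str


def build_frames(events: list) -> list:
    """Turn events into frames, marking missing sequences instead of smoothing."""
    frames = []
    expected = None
    for event in sorted(events, key=lambda e: e["sequence"]):
        seq = event["sequence"]
        if expected is not None and seq > expected:
            frames.append(Frame(FrameType.GAP, None, f"missing sequences {expected}-{seq - 1}"))
        if event["event_type"] == "LOG_DROP":
            kind = FrameType.LOG_DROP
        elif event.get("redacted"):
            kind = FrameType.REDACTION
        else:
            kind = FrameType.EVENT
        frames.append(Frame(kind, seq, event["event_type"]))
        expected = seq + 1
    return frames
```

A GAP frame states only that sequences are absent. It never invents what might have happened in between, which is the anti-inference rule in practice.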
Tests:
- `backend/tests/replay/test_replay_engine.py`: All 5 core tests passing
Date: February 6-7, 2026
Artifacts Created:
- `agentops_verify/test_cli.py`: CLI coverage tests
- `agentops_verify/test_verifier_coverage.py`: Targeted coverage gap tests
- `agentops_verify/test_verifier_session_id_mismatch.py`: Session ID consistency test
- `.agent/rules/strictlooping.md`: Strict first-principles debugging rules
- `verifier/test_vectors_bak/`: Backup test vectors for regression testing
Key Achievements:
- CLI Verification: Added tests for CLI argument parsing and output formatting.
- Coverage Gaps: Targeted uncovered branches in verifier logic (JCS exceptions, fallback paths).
- Session ID Consistency: Explicit test for session ID mismatch detection.
- Strict Looping: Enforced first-principles debugging protocol.
Date: February 7, 2026
Artifacts Created:
- `docker-compose.yml`: Reference deployment (postgres, api, verifier services)
- `backend/Dockerfile`: Multi-stage production build for FastAPI
- `DEPLOYMENT.md`: Deployment documentation and production checklist
- `INCIDENT_RESPONSE.md`: Incident response playbook (4 failure scenarios)
- `tests/resilience/test_network_partition.py`: SDK buffer overflow and LOG_DROP tests
Key Achievements:
- Reference Deployment: Docker Compose with postgres, API, and verifier services
- Dockerfile: Multi-stage build with health checks
- Incident Playbooks: Documented Hash Mismatch, Sequence Gap, PII Exposure, Missing Seal
- Resilience Tests: SDK buffer overflow simulation and LOG_DROP verification
Success Criteria Status:
- ✅ `docker-compose.yml` defines complete deployment topology
- ✅ `INCIDENT_RESPONSE.md` covers 4 critical failure modes
- ✅ `test_network_partition.py` validates SDK buffer behavior
- ✅ Deployment verified: API is healthy (`curl localhost:8000/health` -> OK)
Agent SDK (Untrusted)
        |
        | (batched events)
        v
Ingestion Service (Authoritative) ---> Queue
        |                                |
        v                                v
Append-only Event Store            Policy Engine
        |                                |
        v                                v
    Replay API                    Violation Store
        |
        v
Compliance Export (JSON/PDF)
Trust Boundaries:
- SDK → Ingestion: UNTRUSTED
- Ingestion → Store: AUTHORITATIVE
- Store → Verifier: VERIFIED
- Store → Replay: AUTHORITATIVE
| Document | Purpose |
|---|---|
| agentops_prd_v2.md | Product Requirements (authoritative) |
| goal.md | Win-or-Die Execution Plan |
| CONSTITUTION.md | Inviolable system rules |
| EVENT_LOG_SPEC.md | Technical specification (v0.6) |
| CHAIN_AUTHORITY_INVARIANTS.md | Evidence classification |
| FAILURE_MODES.md | Component failure tables |
| SCHEMA.md | Event type definitions |
From PRD:
- ✅ Verifier passes 100% of adversarial tests
- System survives simulated network partition
- Compliance export accepted by legal team
- Security audit complete (internal)
- Incident response playbook validated
- Reference deployment on production agent (internal)
Blockers: Any verifier bug, any silent data loss, any chain repair.
- Constitution-First: All features start with constitutional review
- Verifier-Driven: Implementation follows verifier specification
- Fail Loudly: Never silently lose evidence
- Evidence > Interpretation: Record facts, not narratives
- Overbuilding
- Vague language
- Logging "thoughts" (Legal blocker)
- Trying to impress Twitter instead of security teams
Built for production. Designed for trust.