AgentOps Replay — Development Progress

Continuous progress tracker for the AgentOps Replay project.
Mission: Become the system of record for AI agent behavior.

Current Status

Metric	Value
Phase	Phase 8: Operational Readiness & Resilience
Spec Version	v0.6
Status	Completed
Last Updated	February 7, 2026

Completed Milestones

✅ Phase 1-2: Constitutional Foundation

Date: January 22, 2026

Established the immutable core — "The Moat"

Artifacts Created:

CONSTITUTION.md — Non-negotiable laws (Immutable Logs, Verifiable Evidence)
EVENT_LOG_SPEC.md (v0.5) — Technical implementation with hash-chaining
SCHEMA.md — Event type definitions with field-level documentation
agentops_events.schema.json — JSON Schema for validation

Key Decisions:

Rejected "move fast and break things" for cryptographic auditability
Built the verifier BEFORE the SDK so the SDK cannot grade its own homework
RFC 8785 (JCS) canonicalization, with property names sorted by UTF-16 code units
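
The key-ordering rule can be illustrated with a minimal sketch. This is not the project's verifier/jcs.py; the function names are hypothetical, and floats are rejected because RFC 8785 number serialization is out of scope here. It only shows why sorting by UTF-16 code units differs from sorting by code point.

```python
import json


def utf16_sort_key(name: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
    return name.encode("utf-16-be")


def canonicalize(value) -> str:
    """Simplified canonical JSON for dicts, lists, strings, ints, bools and None."""
    if isinstance(value, dict):
        items = sorted(value.items(), key=lambda kv: utf16_sort_key(kv[0]))
        members = (
            json.dumps(key, ensure_ascii=False) + ":" + canonicalize(val)
            for key, val in items
        )
        return "{" + ",".join(members) + "}"
    if isinstance(value, list):
        return "[" + ",".join(canonicalize(item) for item in value) + "]"
    if isinstance(value, float):
        # Assumption of this sketch: number formatting per RFC 8785 is omitted.
        raise TypeError("float serialization is not covered by this sketch")
    return json.dumps(value, ensure_ascii=False)


# U+1F600 is encoded as the surrogate pair D83D DE00, so it sorts before U+FB33
# by code unit even though its code point is larger.
print(canonicalize({"\ufb33": 1, "\U0001F600": 2}))
```

A plain sorted() on the keys would place U+FB33 first, which is the kind of divergence that breaks hash agreement between producer and verifier.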

✅ Phase 3: SDK Implementation

Date: January 22, 2026

Artifacts Created:

agentops_sdk/ — Python SDK (untrusted producer)
sdk/python/ — Production Python SDK
verifier/agentops_verify.py — Reference verifier (zero-dependency)
verifier/jcs.py — RFC 8785 canonicalization
verifier/test_vectors/ — Canonical valid/invalid test logs

SDK Features:

Local authority mode (testing only)
Ring buffer with LOG_DROP meta-events
Strict types via Pydantic/dataclasses
Vendored dependencies (standalone)
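
As an illustration of the ring buffer behaviour, here is a minimal sketch. The class name, the eviction policy (oldest event first), and the event_type / dropped_count field names are assumptions, not the SDK's actual API.

```python
from collections import deque


class EventRingBuffer:
    """Bounded buffer that counts evictions and reports them as a LOG_DROP meta-event."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._events = deque()
        self._dropped = 0

    def append(self, event: dict) -> None:
        # Assumption: when full, the oldest event is evicted and counted.
        if len(self._events) >= self._capacity:
            self._events.popleft()
            self._dropped += 1
        self._events.append(event)

    def drain(self) -> list:
        """Return buffered events, preceded by a LOG_DROP record if anything was lost."""
        batch = []
        if self._dropped:
            batch.append(
                {"event_type": "LOG_DROP", "payload": {"dropped_count": self._dropped}}
            )
            self._dropped = 0
        batch.extend(self._events)
        self._events.clear()
        return batch
```

The point of the meta-event is that loss is recorded in the log itself rather than disappearing silently.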

Technical Fixes:

RFC 8785 compliance (UTF-16BE code unit sorting)
Added content_hash, args_hash, result_hash for redaction
Fixed Server Mode prev_hash tracking
Buffer safety for LOG_DROP counters

✅ Phase 3.5: Constitutional Hardening

Date: January 23, 2026

Addressed three existential risks identified in feedback.

Artifacts Created:

CHAIN_AUTHORITY_INVARIANTS.md (v1.0) — Evidence classification rules
FAILURE_MODES.md (v1.0) — Component failure documentation
EVENT_LOG_SPEC.md (v0.5 → v0.6) — Hardened specification

Key Changes:

Aspect	Before (v0.5)	After (v0.6)
Evidence Classification	None	3-state (AUTHORITATIVE, PARTIAL_AUTHORITATIVE, NON_AUTHORITATIVE)
Authority Separation	Semantic label	Cryptographic (CHAIN_SEAL)
`prev_hash` Semantics	"hint" (ambiguous)	"MUST recompute" (strict)
LOG_DROP Spec	Basic mention	Forensic specification
Failure Modes	Implicit	Explicit tables per component
Language Precision	"allows", "exception"	RFC 2119 (MUST, SHALL, MAY)

Three-State Evidence Classification:

AUTHORITATIVE_EVIDENCE — Server authority, sealed, complete (compliance-grade)
PARTIAL_AUTHORITATIVE_EVIDENCE — Server authority, unsealed/incomplete (incident analysis)
NON_AUTHORITATIVE_EVIDENCE — SDK/local authority (testing only)
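
The three states above reduce to a small decision function. The sketch below is illustrative only: the function name and the boolean inputs are assumptions, and the authoritative rules live in CHAIN_AUTHORITY_INVARIANTS.md.

```python
from enum import Enum


class EvidenceClass(str, Enum):
    AUTHORITATIVE_EVIDENCE = "AUTHORITATIVE_EVIDENCE"
    PARTIAL_AUTHORITATIVE_EVIDENCE = "PARTIAL_AUTHORITATIVE_EVIDENCE"
    NON_AUTHORITATIVE_EVIDENCE = "NON_AUTHORITATIVE_EVIDENCE"


def classify(chain_authority: str, sealed: bool, complete: bool) -> EvidenceClass:
    # Anything not stamped by the server is testing-grade, regardless of seal state.
    if chain_authority != "SERVER":
        return EvidenceClass.NON_AUTHORITATIVE_EVIDENCE
    # Server authority, sealed and complete: compliance-grade.
    if sealed and complete:
        return EvidenceClass.AUTHORITATIVE_EVIDENCE
    # Server authority but unsealed or incomplete: usable for incident analysis.
    return EvidenceClass.PARTIAL_AUTHORITATIVE_EVIDENCE
```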

Verifier Updates:

Evidence classification in output
--reject-local-authority policy flag
CHAIN_SEAL metadata validation
LOG_DROP tracking with total_drops

✅ Phase 4: LangChain Integration

Date: January 24, 2026

Artifacts Created:

sdk/python/agentops_replay/integrations/langchain/ — LangChain integration package
sdk/python/agentops_replay/integrations/langchain/callback.py — Callback handler
sdk/python/agentops_replay/integrations/langchain/version.py — Version compatibility
examples/langchain_demo/ — Demo agent with verification workflow
examples/langchain_demo/INCIDENT_INVESTIGATION.md — PII incident simulation

Integration Features:

AgentOpsCallbackHandler extends LangChain's BaseCallbackHandler
Captures: LLM calls, tool invocations, agent actions, errors
Version pinning and compatibility warnings
PII redaction with hash preservation
Safe serialization of complex objects
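
Redaction with hash preservation can be sketched as follows. The helper name and the "[REDACTED]" placeholder are hypothetical; only the idea of keeping a hash such as content_hash alongside the redacted field comes from this document.

```python
import hashlib


def redact_field(payload: dict, field: str) -> dict:
    """Return a copy of payload with `field` redacted and `<field>_hash` preserved."""
    redacted = dict(payload)
    original = str(payload[field]).encode("utf-8")
    # The hash lets a holder of the original value prove it matches the record.
    redacted[f"{field}_hash"] = hashlib.sha256(original).hexdigest()
    redacted[field] = "[REDACTED]"
    return redacted


event = {"content": "alice@example.com"}
print(redact_field(event, "content"))
```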

Demo Agent:

Customer support agent with tools: lookup_order, issue_refund, send_email
Mock mode for testing without API keys
Full verification workflow documented

Validation Results:

Session: 88f970ff-22d6-47fd-850f-01d4aed5140f
Status: PASS
Evidence Class: NON_AUTHORITATIVE_EVIDENCE
Sealed: True
Complete: True
Fingerprint: 4272bdc7...

✅ Phase 5: Compliance Artifacts

Date: January 29, 2026

Artifacts Created:

backend/app/compliance/json_export.py — RFC 8785 canonical JSON export (locked to verifier's JCS)
backend/app/compliance/pdf_export.py — Human-readable PDF from verified JSON
backend/app/compliance/gdpr.py — PII detection (WARNING) + redaction validation (ERROR)

Key Changes:

JSON export locked to verifier's JCS implementation
Strict ISO 8601 formatting (YYYY-MM-DDTHH:MM:SS.sssZ)
Explicit evidence_class field in export header
PDF consumes verified JSON, not raw DB
GDPR severity levels (ERROR/WARNING)
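
The strict timestamp format can be produced as in the sketch below. The function name is hypothetical, and truncating (rather than rounding) to milliseconds is an assumption; the export module may do it differently.

```python
from datetime import datetime, timezone


def format_timestamp(ts: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDTHH:MM:SS.sssZ in UTC."""
    if ts.tzinfo is None:
        raise ValueError("naive datetimes are ambiguous; attach a timezone first")
    utc = ts.astimezone(timezone.utc)
    # Assumption: sub-millisecond precision is truncated, not rounded.
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


print(format_timestamp(datetime(2026, 1, 29, 12, 0, 0, 123456, tzinfo=timezone.utc)))
```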

✅ Phase 6: Ingestion Service (Core)

Date: January 29, 2026

Artifacts Created:

backend/app/services/ingestion/hasher.py — Server-side hash recomputation
backend/app/services/ingestion/sealer.py — Chain sealing with authority invariants

Key Changes:

Server-side hash recomputation (never trust SDK)
Rejection invariants: non-monotonic, gaps, duplicates
CHAIN_SEAL emission logic
No re-sealing invariant
PARTIAL_AUTHORITATIVE for incomplete chains
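
A minimal sketch of server-side rechaining under these invariants is shown below. It is not the code in hasher.py: the field names, the all-zero genesis hash, and the use of json.dumps with sorted keys as a stand-in for JCS are all assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumption: placeholder for the first event's prev_hash


class ChainError(ValueError):
    """Raised when a batch violates a sequence invariant."""


def event_hash(prev_hash: str, sequence: int, payload: dict) -> str:
    # Stand-in for RFC 8785 canonicalization; adequate only for simple payloads.
    body = json.dumps(
        {"payload": payload, "prev_hash": prev_hash, "sequence": sequence},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def rechain(events, start_sequence: int = 0, prev_hash: str = GENESIS_HASH) -> list:
    """Recompute the chain, ignoring any hashes the SDK supplied."""
    expected = start_sequence
    chained = []
    for event in events:
        seq = event["sequence"]
        if seq < expected:
            raise ChainError(f"duplicate or non-monotonic sequence {seq}")
        if seq > expected:
            raise ChainError(f"gap: expected sequence {expected}, got {seq}")
        digest = event_hash(prev_hash, seq, event["payload"])
        chained.append(
            {"sequence": seq, "payload": event["payload"],
             "prev_hash": prev_hash, "event_hash": digest}
        )
        prev_hash = digest
        expected += 1
    return chained
```

Because prev_hash is always recomputed, a forged value sent by a client has no effect on the stored chain.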

Tests:

backend/tests/compliance/test_jcs_canonicalization.py — Adversarial whitespace test

✅ Phase 6: Ingestion Service Implementation (Complete)

Date: January 30, 2026

Artifacts Created:

backend/app/services/ingestion/service.py — Ingestion orchestrator (atomic transactions, locking)
backend/app/api/v1/endpoints/ingestion.py — Batch ingestion endpoint (POST /v1/ingest/batch)
backend/app/schemas/ingestion.py — Strict Pydantic schemas (RawEventCreate, IngestBatchRequest)
backend/tests/ingestion/test_ingestion_service.py — Adversarial test suite (8 scenarios)

Key Achievements:

Server Authority: Implemented IngestionService stamping chain_authority=SERVER.
Fail-Loudly: Mapped state conflicts to HTTP 409 and bad requests to HTTP 400.
Atomic Writes: Single transaction block for batch persistence.
Seal Gate: Enforced invariant: seal=true REQUIRES SESSION_END.
Adversarial Testing: Verified rejection of gaps, duplicates, and sealed session tampering.
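
The seal gate and the fail-loudly mapping might look like the sketch below. The exception classes, the function name, and the choice of 400 for a seal request without SESSION_END are assumptions; this document only states that state conflicts map to HTTP 409 and bad requests to HTTP 400.

```python
class BadRequest(Exception):
    status_code = 400


class StateConflict(Exception):
    status_code = 409


def check_seal_gate(events, seal: bool, session_sealed: bool = False,
                    session_ended: bool = False) -> None:
    """Raise if the batch may not be accepted; return None otherwise."""
    # A sealed session is final: no further events and no re-sealing.
    if session_sealed:
        raise StateConflict("session is already sealed")
    ends_here = any(e.get("event_type") == "SESSION_END" for e in events)
    # Assumption: SESSION_END may arrive in this batch or an earlier one.
    if seal and not (session_ended or ends_here):
        raise BadRequest("seal=true requires SESSION_END")
```

An API layer would translate each exception's status_code into the response, so a rejected batch is never reported as accepted.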

Status:

Core Ingestion Service: Verified
Queue Worker: Deferred (to Policy Phase)

✅ Phase 7: Replay System

Date: January 29, 2026

Artifacts Created:

backend/app/replay/ — Core replay package
backend/app/replay/frames.py — Frame types with single-origin invariant
backend/app/replay/warnings.py — Warning system with stable codes
backend/app/replay/engine.py — Verified-first replay engine
backend/app/schemas/replay_v2.py — Pydantic response models

Key Changes:

Verified-first: Replay only serves verified chains
Frame-based: EVENT, GAP, LOG_DROP, REDACTION types
VerificationStatus as enum (not string)
Single-origin frame invariant enforced
No-bypass constraint on frame endpoint
Explicit gap marking (no smoothing)
Anti-inference: No synthetic events
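
The frame model can be sketched as follows. This is not the code in frames.py or engine.py: the Frame fields, the VerificationStatus members, and build_frames are hypothetical, and REDACTION frames are declared but not produced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VerificationStatus(Enum):
    PASS = "PASS"  # assumption: member names are illustrative
    FAIL = "FAIL"


class FrameType(str, Enum):
    EVENT = "EVENT"
    GAP = "GAP"
    LOG_DROP = "LOG_DROP"
    REDACTION = "REDACTION"


@dataclass(frozen=True)
class Frame:
    frame_type: FrameType
    sequence_start: int
    sequence_end: int
    event: Optional[dict] = None  # each frame wraps at most one source event


def build_frames(status: VerificationStatus, events) -> list:
    # Verified-first: refuse to replay anything that did not pass verification.
    if status is not VerificationStatus.PASS:
        raise PermissionError("replay is only served for verified chains")
    frames = []
    expected = None
    for event in sorted(events, key=lambda e: e["sequence"]):
        seq = event["sequence"]
        if expected is not None and seq > expected:
            # Mark the hole explicitly; never synthesize the missing events.
            frames.append(Frame(FrameType.GAP, expected, seq - 1))
        is_drop = event.get("event_type") == "LOG_DROP"
        kind = FrameType.LOG_DROP if is_drop else FrameType.EVENT
        frames.append(Frame(kind, seq, seq, event))
        expected = seq + 1
    return frames
```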

Tests:

backend/tests/replay/test_replay_engine.py — All 5 core tests passing

✅ Phase 7.5: Verifier Hardening & Testing

Date: February 6-7, 2026

Artifacts Created:

agentops_verify/test_cli.py — CLI coverage tests
agentops_verify/test_verifier_coverage.py — Targeted coverage gap tests
agentops_verify/test_verifier_session_id_mismatch.py — Session ID consistency test
.agent/rules/strictlooping.md — Strict first-principles debugging rules
verifier/test_vectors_bak/ — Backup test vectors for regression testing

Key Achievements:

CLI Verification: Added tests for CLI argument parsing and output formatting.
Coverage Gaps: Targeted uncovered branches in verifier logic (JCS exceptions, fallback paths).
Session ID Consistency: Explicit test for session ID mismatch detection.
Strict Looping: Enforced first-principles debugging protocol.

✅ Phase 8: Operational Readiness & Resilience (Completed)

Date: February 7, 2026

Artifacts Created:

docker-compose.yml — Reference deployment (postgres, api, verifier services)
backend/Dockerfile — Multi-stage production build for FastAPI
DEPLOYMENT.md — Deployment documentation and production checklist
INCIDENT_RESPONSE.md — Incident response playbook (4 failure scenarios)
tests/resilience/test_network_partition.py — SDK buffer overflow and LOG_DROP tests

Key Achievements:

Reference Deployment: Docker Compose with postgres, API, and verifier services
Dockerfile: Multi-stage build with health checks
Incident Playbooks: Documented Hash Mismatch, Sequence Gap, PII Exposure, Missing Seal
Resilience Tests: SDK buffer overflow simulation and LOG_DROP verification

Success Criteria Status:

✅ docker-compose.yml defines complete deployment topology
✅ INCIDENT_RESPONSE.md covers 4 critical failure modes
✅ test_network_partition.py validates SDK buffer behavior
✅ Deployment verified: API is healthy (curl localhost:8000/health -> OK)

Architecture Overview

Agent SDK (Untrusted)
    |
    |  (batched events)
    v
Ingestion Service (Authoritative) ---> Queue
    |                                    |
    v                                    v
Append-only Event Store           Policy Engine
    |                                    |
    v                                    v
Replay API                        Violation Store
    |
    v
Compliance Export (JSON/PDF)

Trust Boundaries:

SDK → Ingestion: UNTRUSTED
Ingestion → Store: AUTHORITATIVE
Store → Verifier: VERIFIED
Store → Replay: AUTHORITATIVE

Key Documents

Document	Purpose
agentops_prd_v2.md	Product Requirements (authoritative)
goal.md	Win-or-Die Execution Plan
CONSTITUTION.md	Inviolable system rules
EVENT_LOG_SPEC.md	Technical specification (v0.6)
CHAIN_AUTHORITY_INVARIANTS.md	Evidence classification
FAILURE_MODES.md	Component failure tables
SCHEMA.md	Event type definitions

Success Criteria (v1.0 Launch)

From PRD:

✅ Verifier passes 100% of adversarial tests
System survives simulated network partition
Compliance export accepted by legal team
Security audit complete (internal)
Incident response playbook validated
Reference deployment on production agent (internal)

Blockers: Any verifier bug, any silent data loss, any chain repair.

Notes

Principles

Constitution-First: All features start with constitutional review
Verifier-Driven: Implementation follows verifier specification
Fail Loudly: Never silently lose evidence
Evidence > Interpretation: Record facts, not narratives

What Kills This

Overbuilding
Vague language
Logging "thoughts" (Legal blocker)
Trying to impress Twitter instead of security teams

Built for production. Designed for trust.
