
feat(F035): Observability — event bus stats, causal tracing, drift detection, context visibility #250

Merged
tfatykhov merged 7 commits into main from feat/f035-observability
Apr 4, 2026

@tfatykhov
Owner

Summary

Implements F035 Observability with all 4 sub-features, giving Nous full visibility into what its autonomous systems are doing and what the LLM actually sees.

F035.1 — Event Bus Observability

  • EventBusStats with in-memory counters, per-handler success/error/timing, ring buffer of recent events
  • Wired into EventBus._dispatch and _safe_handle (zero-allocation hot path)
  • get_stats() on SessionTimeoutMonitor, SleepHandler, HeartbeatRunner
  • GET /events/stats, GET /events/recent endpoints
  • format_event_bus_status() for Telegram /status
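A minimal sketch of the counters-plus-ring-buffer idea behind EventBusStats follows. It assumes synchronous handlers that take a single payload argument. HandlerStats, record_event, record_handler and the free function safe_handle are hypothetical names chosen for illustration and do not mirror the PR's actual code. This version allocates a tuple per recorded event, so it does not attempt the zero-allocation property claimed for the real hot path.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class HandlerStats:
    """Per-handler outcome counters and cumulative timing (hypothetical fields)."""
    success: int = 0
    error: int = 0
    total_ms: float = 0.0

    @property
    def avg_ms(self) -> float:
        calls = self.success + self.error
        return self.total_ms / calls if calls else 0.0


class EventBusStats:
    """In-memory counters plus a bounded ring buffer of recent event types."""

    def __init__(self, recent_size: int = 100) -> None:
        self.emitted: dict[str, int] = {}
        self.handlers: dict[str, HandlerStats] = {}
        # deque(maxlen=...) silently drops the oldest item once full: a ring buffer.
        self.recent: deque = deque(maxlen=recent_size)

    def record_event(self, event_type: str) -> None:
        self.emitted[event_type] = self.emitted.get(event_type, 0) + 1
        self.recent.append((time.time(), event_type))

    def record_handler(self, name: str, ok: bool, elapsed_ms: float) -> None:
        stats = self.handlers.setdefault(name, HandlerStats())
        if ok:
            stats.success += 1
        else:
            stats.error += 1
        stats.total_ms += elapsed_ms


def safe_handle(stats: EventBusStats, name: str,
                handler: Callable[[Any], None], payload: Any) -> bool:
    """Run one handler, time it, and record success or failure."""
    start = time.perf_counter()
    try:
        handler(payload)
    except Exception:
        # Swallowed so one failing handler cannot break dispatch;
        # a real bus would also log the exception here.
        ok = False
    else:
        ok = True
    stats.record_handler(name, ok, (time.perf_counter() - start) * 1000.0)
    return ok
```

Because the ring buffer is bounded, an endpoint like GET /events/recent can return it directly without any cleanup logic.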

F035.2 — Causal Chain Tracing

  • event_id (12-char hex), trace_id, caused_by on Event dataclass + ORM model
  • DB migration with indexed columns
  • Root event emission: heartbeat_tick, session_ended, turn_completed, sleep_started
  • Child propagation across all handlers (episode_summarizer, fact_extractor, sleep_handler, outcome_detector, subtask_worker)
  • State-modifying events tagged with a modifies field in their data
  • GET /events/trace/{id}, GET /events/recent-traces (CTE query), GET /events/modifications
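The root/child rule can be sketched with a plain dataclass. Only event_id, trace_id and caused_by come from this PR. The type and data fields and the helpers child_of and trace_chain are hypothetical. trace_chain walks an in-memory list, whereas the real endpoints query the database.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field


def new_event_id() -> str:
    # 6 random bytes rendered as 12 hex characters.
    return secrets.token_hex(6)


@dataclass
class Event:
    type: str
    data: dict = field(default_factory=dict)
    event_id: str = field(default_factory=new_event_id)
    trace_id: str | None = None
    caused_by: str | None = None

    def __post_init__(self) -> None:
        # A root event starts its own trace: trace_id == event_id.
        if self.trace_id is None:
            self.trace_id = self.event_id


def child_of(parent: Event, event_type: str, data: dict | None = None) -> Event:
    """Create a follow-up event that inherits the parent's trace."""
    return Event(type=event_type, data=data or {},
                 trace_id=parent.trace_id, caused_by=parent.event_id)


def trace_chain(events: list[Event], trace_id: str) -> list[tuple[int, str]]:
    """Return (depth, type) pairs for one trace, depth-first along caused_by."""
    children: dict[str | None, list[Event]] = {}
    for e in events:
        if e.trace_id == trace_id:
            children.setdefault(e.caused_by, []).append(e)

    out: list[tuple[int, str]] = []

    def walk(parent_id: str | None, depth: int) -> None:
        for e in children.get(parent_id, []):
            out.append((depth, e.type))
            walk(e.event_id, depth + 1)

    walk(None, 0)  # roots have caused_by=None
    return out
```

In this sketch, a state-modifying child would carry something like {"modifies": "facts"} in data. The PR does not specify what the modifies value contains.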

F035.3 — Behavioral Drift Detection

  • BehaviorSnapshot (27 metrics), DriftDetector (z-score analysis, per-metric thresholds)
  • BehaviorDriftCheck as heartbeat check — captures snapshots, stores to DB, detects anomalies
  • Warning (2σ) vs Alert (3σ) severity tiers
  • GET /behavior/snapshot/latest, /behavior/trends, /behavior/anomalies, /behavior/drift-report

F035.4 — Context Visibility

  • ContextLogger with section parser, token estimation, ring buffer for full payloads
  • Hooks into _build_api_payload() — logs every API call with token breakdown by section
  • update_response() wired to capture actual vs estimated tokens after API returns
  • Memory leak prevention (_sync_entries_index)
  • DB writer for persistent context_log table
  • GET /context/log, /context/log/{id}, /context/log/{id}/payload, /context/log/{id}/sections, /context/diff
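The following is a rough sketch of per-section token accounting with bounded payload storage. It rests on two assumptions: sections are delimited by "## " headings, and a token is roughly four characters. Both assumptions, along with the names parse_sections, estimate_tokens, ContextEntry and diff, are hypothetical. Evicting an index entry together with its payload shows one way a ring buffer with a side index can avoid leaking memory. It is not necessarily what _sync_entries_index does.

```python
from __future__ import annotations

import re
from collections import OrderedDict
from dataclasses import dataclass

SECTION_RE = re.compile(r"^## (.+)$", re.MULTILINE)  # assumed heading convention


def estimate_tokens(text: str) -> int:
    # Crude heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4) if text else 0


def parse_sections(prompt: str) -> dict[str, str]:
    """Split a prompt into {heading: body}; text before the first heading is 'preamble'."""
    matches = list(SECTION_RE.finditer(prompt))
    sections: dict[str, str] = {}
    preamble = prompt[: matches[0].start()] if matches else prompt
    if preamble.strip():
        sections["preamble"] = preamble
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(prompt)
        sections[m.group(1).strip()] = prompt[m.end():end]
    return sections


@dataclass
class ContextEntry:
    entry_id: int
    section_tokens: dict[str, int]
    estimated_total: int
    actual_input_tokens: int | None = None


class ContextLogger:
    def __init__(self, max_entries: int = 50) -> None:
        self.max_entries = max_entries
        self._entries: OrderedDict[int, ContextEntry] = OrderedDict()
        self._payloads: dict[int, str] = {}
        self._next_id = 1

    def log(self, prompt: str) -> int:
        tokens = {name: estimate_tokens(body) for name, body in parse_sections(prompt).items()}
        entry = ContextEntry(self._next_id, tokens, sum(tokens.values()))
        self._next_id += 1
        self._entries[entry.entry_id] = entry
        self._payloads[entry.entry_id] = prompt
        # Evict the oldest entries and their payloads together so neither map grows unbounded.
        while len(self._entries) > self.max_entries:
            old_id, _ = self._entries.popitem(last=False)
            self._payloads.pop(old_id, None)
        return entry.entry_id

    def update_response(self, entry_id: int, actual_input_tokens: int) -> None:
        entry = self._entries.get(entry_id)
        if entry is not None:  # the entry may already have been evicted
            entry.actual_input_tokens = actual_input_tokens

    def diff(self, a: int, b: int) -> dict[str, int]:
        """Per-section token change from entry a to entry b (unchanged sections omitted)."""
        ta, tb = self._entries[a].section_tokens, self._entries[b].section_tokens
        return {name: tb.get(name, 0) - ta.get(name, 0)
                for name in ta.keys() | tb.keys()
                if tb.get(name, 0) != ta.get(name, 0)}
```

A character-count heuristic diverges from the provider's tokenizer. That divergence is why recording the actual count after the API returns, as update_response does, is useful.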

Review Process

  • 3-agent plan review (architect + DB specialist + devil's advocate): 11 P1s identified and fixed
  • Implementation review: 1 new P1 + 3 P2s identified and fixed
  • agent_id scoping enforced on all new tables
  • 76 tests passing (68 new + 8 existing EventBus)

New Files

  • nous/observability/__init__.py, context_logger.py, snapshots.py, drift.py
  • sql/migrations/026_observability.sql
  • tests/test_event_bus_observability.py, test_causal_tracing.py, test_context_logger.py, test_drift_detection.py

Modified Files (15)

  • nous/events.py — EventBusStats + Event causal fields
  • nous/storage/models.py — ORM Event columns
  • nous/brain/brain.py — emit_event passthrough
  • nous/api/rest.py — 15 new endpoints
  • nous/api/runner.py — context logger hook + update_response
  • nous/config.py — 7 new settings
  • nous/main.py — wiring for all components
  • nous/telegram_bot.py — 3 formatting functions
  • nous/heartbeat/runner.py, checks.py — stats + drift check
  • nous/handlers/* — trace propagation
  • nous/cognitive/layer.py — root event trace IDs

Test plan

  • 68 new unit tests pass across 4 new test files (76 total with the existing EventBus tests)
  • 8 existing EventBus tests pass (no regressions)
  • All review P1 fixes verified applied
  • Integration test with running Postgres + migration 026
  • Verify Telegram formatting renders correctly
  • Verify dashboard endpoints return correct data shapes

🤖 Generated with Claude Code

tfatykhov and others added 5 commits April 4, 2026 15:12
…oints, telegram

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ery API

Add event_id/trace_id/caused_by fields to Event dataclass and ORM model,
enabling causal chain reconstruction across the event bus. Root events
(turn_completed, session_ended, heartbeat_tick, sleep_started) set
trace_id=event_id; child handlers propagate trace_id and set caused_by
to parent event_id. Three new REST endpoints for trace queries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ore, endpoints

Adds structured logging of what the LLM actually sees on each API call:
- ContextLogger with section parsing, token estimation, ring buffer payload store
- 5 REST endpoints (/context/log, detail, payload, sections, diff)
- Runner integration storing context metadata per turn
- Telegram formatting for context summaries
- Config settings for enable/disable, full payload capture, retention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r, heartbeat check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review fixes:
- P1-A: Add agent_id filter to all behavior_snapshots queries
- P2-A: Persist anomalies to DB (reorder detect before store)
- P2-B: Wire update_response in runner tool loop
- Clean exports in observability/__init__.py

Docs:
- Mark all F035 sub-specs as SHIPPED
- Add implementation plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

tfatykhov and others added 2 commits April 4, 2026 15:54
…ces, drift, context

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tfatykhov merged commit e5af3e3 into main on Apr 4, 2026
1 of 2 checks passed