AgentOps for Claude Code. Real-time monitoring and behavioral intervention for AI agents + post-session analytics for developers.
NOT a memory MCP replacement. NOT infrastructure observability. NOT LLM API monitoring. Operations for the AI agent itself during development. Cost is inferred from Claude Code's local usage data, not from API-side tracing.
| | Memory Tools | LLM Observability | claudewatch (AgentOps) |
|---|---|---|---|
| Category | Storage | API monitoring | Agent operations |
| When | After session | After API call | During session + after |
| For | AI (read past) | Humans (API dashboards) | AI (live feedback) + Humans (ops dashboards) |
| Monitors | Conversations | API costs/latency | Agent behavior + workflow friction |
| Examples | claude-memory-mcp | LangSmith, Langfuse | PostToolUse interventions, drift alerts, agent performance, CLAUDE.md effectiveness |
Like DevOps is operations for software delivery and MLOps is operations for ML models, AgentOps is operations for AI agents:
- Monitor agent behavior - Error rates, drift patterns, context pressure, cost velocity
- Intervene automatically - Block retry loops, surface known blockers, detect stuck states
- Provide analytics - Friction trends, cost per commit, agent success rates, exportable metrics
- Enable self-awareness - Agent queries its own performance mid-session via MCP tools
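To make one of the monitored signals concrete, here is a minimal sketch of how a cost-velocity figure could be derived from two cumulative cost samples. The function name, the cents-based units, and the numbers are illustrative assumptions, not claudewatch's actual metric definition:

```shell
# Hypothetical sketch only: cost velocity as cents spent per hour, derived
# from two cumulative cost samples. Costs are in cents so the shell
# arithmetic stays integer-only.
cost_velocity() {
  start_cents=$1   # cumulative cost at the first sample
  now_cents=$2     # cumulative cost at the second sample
  mins=$3          # minutes elapsed between the samples
  echo $(( (now_cents - start_cents) * 60 / mins ))
}

# 100 cents spent over 30 minutes -> 200 cents/hour
cost_velocity 120 220 30
```

The real implementation infers cost from Claude Code's local usage data, as noted above; the point of the sketch is only that the signal is a rate, not a total.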
claudewatch brings AgentOps to the development experience—monitoring Claude Code sessions during your workflow, not production API calls.
Three concrete examples:

- Error loops - Memory tools store that you hit an error. Observability tools log the retry count. claudewatch fires a PostToolUse hook on the third consecutive error and tells Claude "you're looping, call get_blockers() to check for known solutions" - during the session, where it can act.
- Drift detection - Memory tools archive the 15 files you read. Observability tools chart read/write ratios. claudewatch detects 8 consecutive reads with zero writes and alerts Claude "you're exploring without implementing, stuck or avoiding?" - before 20 more reads burn your context budget.
- Agent performance - Memory tools store transcripts containing agent launches. Observability tools count API calls. claudewatch parses agent lifecycles from transcripts, computes success rates by type, and exposes `get_agent_performance()`, so Claude learns "plan agents get killed 40% of the time on this project" and skips plan mode - before spawning an agent that will fail.
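To illustrate the mechanics behind the first example, here is a minimal sketch of a PostToolUse-style hook that tracks consecutive tool errors in a state file and alerts at a threshold. The state-file path, function name, and threshold of 3 are assumptions for illustration; claudewatch's actual hook layers rate limiting and chronic-pattern detection on top of this idea:

```shell
# Hypothetical sketch, not claudewatch's implementation: count consecutive
# tool errors across hook invocations and emit an alert on the third.
STATE_FILE="${TMPDIR:-/tmp}/hook_error_count"

record_tool_result() {
  status="$1"   # 0 = the tool call succeeded, non-zero = it errored
  if [ "$status" -eq 0 ]; then
    rm -f "$STATE_FILE"   # any success resets the streak
    return 0
  fi
  count=$(( $(cat "$STATE_FILE" 2>/dev/null || echo 0) + 1 ))
  echo "$count" > "$STATE_FILE"
  if [ "$count" -ge 3 ]; then
    # This message is surfaced back into the session so the agent can act on it
    echo "You're looping: $count consecutive errors. Call get_blockers() for known solutions."
  fi
}
```

In practice the hook is wired through Claude Code's hook settings; `claudewatch install` handles that wiring for you.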
The differentiation: Other tools give humans dashboards. claudewatch gives Claude queryable access to its own performance inside the session where decisions are being made.
![Demo GIF placeholder - shows PostToolUse hook firing on error loop, Claude calling get_blockers(), finding documented solution, applying fix instead of rediscovering it]
Demo coming soon: real-time intervention cycle from error detection to blocker lookup to solution application
claudewatch reads local session data from ~/.claude/ and turns it into actionable insights through three layers:
Automatic intervention - SessionStart briefing on project health + PostToolUse alerts on error loops, context pressure, cost spikes, drift. Claude doesn't need to remember to check - the system tells it when something is wrong.
Self-reflection API - 29 MCP tools let Claude query its own metrics mid-session: get_project_health, get_drift_signal, get_task_history, get_blockers, get_agent_performance. No other tool gives an AI agent this kind of introspective access.
Cross-session learning - Task history, blockers, and solutions tracked automatically. Claude queries "did we try this before?" and gets "yes, JWT approach hit rate limits, pivoted to sessions" - without you having to remember or explain.
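As a flavor of what one of those self-reflection queries computes, a drift signal like the read/write streak can be sketched as a fold over the session's ordered tool calls. The tool-name buckets and the threshold of 8 are assumptions for the sketch, not the real `get_drift_signal` implementation:

```shell
# Hypothetical sketch: detect "exploring without implementing" from an
# ordered list of tool names. Exploration tools extend the streak; any
# other tool (Edit, Write, Bash, ...) resets it.
drift_signal() {
  streak=0
  for tool in "$@"; do
    case "$tool" in
      Read|Grep|Glob) streak=$((streak + 1)) ;;
      *)              streak=0 ;;
    esac
  done
  if [ "$streak" -ge 8 ]; then
    echo "drift: $streak consecutive reads with zero writes"
  else
    echo "ok"
  fi
}
```

The MCP tool exposes the resulting signal to Claude mid-session, so the agent can decide to commit to an approach instead of reading more files.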
All local. Reads ~/.claude/ files on disk. No network calls. No telemetry.
```shell
# Get a baseline on all your projects
claudewatch scan

# Find what's costing you time
claudewatch gaps

# See top 3 improvements ranked by impact
claudewatch suggest --limit 3
```

Enable Claude's self-monitoring (one-time setup):

```shell
# Install behavioral rules + MCP server config
claudewatch install

# Restart Claude Code to load the MCP server
```

That's it. Claude now has real-time awareness of its own behavior.
- PostToolUse hooks detect problems during execution
- 29 MCP tools for mid-session queries
- Cross-session history and blocker tracking
- Measure what's slowing you down
- Multi-agent workflow analytics
- Parallel search across all context sources
- Installation - Homebrew, direct download, from source
- First Session - Complete walkthrough from scan to fix
- Configuration - Hooks, MCP setup, behavioral rules
- Reduce Friction - From 45% friction rate to 28%
- Improve Agent Success - Track and optimize agent workflows
- Optimize CLAUDE.md - Data-driven improvements with effectiveness scoring
- Hooks - SessionStart briefings + PostToolUse alerts
- MCP Tools - All 29 tools with usage patterns
- CLI Commands - scan, metrics, gaps, suggest, fix, track, watch, export
- Memory System - Task history, blockers, cross-session persistence
- Context Search - Unified search across all context sources
- Metrics & Analytics - Friction, cost-per-outcome, effectiveness scoring
- Agent Analytics - Success rates, timing, parallelization
- Architecture - How the three layers work together
- Hooks Implementation - Rate limiting, chronic pattern detection
- MCP Integration - Server setup, tool design, data freshness
- Data Model - Session parsing, friction scoring, attribution
- vs Memory Tools - Archive vs live feedback
- vs Observability Platforms - Dashboards vs agent introspection
- vs Built-in Claude Features - What Claude Code provides vs what's missing
- Contributing - How to contribute code, docs, bug reports
- Roadmap - Planned features and improvements
- Changelog - Version history and release notes
```shell
# 1. Baseline - where are you now?
claudewatch scan
# → Project "shelfctl" scores 42/100, friction rate 45%

# 2. Diagnose - what's causing friction?
claudewatch gaps
# → Missing: testing section in CLAUDE.md
# → Stale pattern: "go vet" errors in 55% of sessions

# 3. Fix - apply data-driven patches
claudewatch fix shelfctl --dry-run
claudewatch fix shelfctl
# → Added testing section
# → Added pre-edit lint hook

# 4. Measure - did it work?
claudewatch track
# ... work for a week ...
claudewatch track --compare
# → Friction rate: 45% → 28% (-17%)
# → Tool errors/session: 4.2 → 1.1 (-74%)
```

Homebrew (macOS/Linux):

```shell
brew install blackwell-systems/tap/claudewatch
```

Direct download:

```shell
# Download from https://github.com/blackwell-systems/claudewatch/releases/latest
tar -xzf claudewatch_*_$(uname -s)_$(uname -m).tar.gz
sudo mv claudewatch /usr/local/bin/
```

From source (requires Go 1.26+):

```shell
go install github.com/blackwell-systems/claudewatch/cmd/claudewatch@latest
```

See Installation Guide for detailed instructions and troubleshooting.
Zero network calls. Reads only local files under ~/.claude/. Writes only to a local SQLite database for snapshot storage. No telemetry, no analytics, no crash reporting. Nothing leaves your machine.
commitmux - Semantic commit search across repositories. Find "when did we add authentication?" without remembering branch names or grep patterns.
scout-and-wave - Protocol for safely parallelizing human-guided agentic workflows. Orchestrator + Scout + Wave agents with explicit handoff contracts.
Dual-licensed under MIT and Apache 2.0.
Questions? Issues? Contributions? Open an issue or PR. We respond to everything.