claudewatch

AgentOps for Claude Code. Real-time monitoring and behavioral intervention for AI agents + post-session analytics for developers.

NOT a memory MCP replacement. NOT infrastructure observability. NOT LLM API monitoring. Operations for the AI agent itself during development. Cost is inferred from Claude Code's local usage data, not from API-side tracing.

The Gap We Fill

|          | Memory Tools      | LLM Observability       | claudewatch (AgentOps)                                                                |
|----------|-------------------|-------------------------|---------------------------------------------------------------------------------------|
| Category | Storage           | API monitoring          | Agent operations                                                                      |
| When     | After session     | After API call          | During session + after                                                                |
| For      | AI (read past)    | Humans (API dashboards) | AI (live feedback) + Humans (ops dashboards)                                          |
| Monitors | Conversations     | API costs/latency       | Agent behavior + workflow friction                                                    |
| Examples | claude-memory-mcp | LangSmith, Langfuse     | PostToolUse interventions, drift alerts, agent performance, CLAUDE.md effectiveness   |

What is AgentOps?

Just as DevOps is operations for software delivery and MLOps is operations for ML models, AgentOps is operations for AI agents:

  • Monitor agent behavior - Error rates, drift patterns, context pressure, cost velocity
  • Intervene automatically - Block retry loops, surface known blockers, detect stuck states
  • Provide analytics - Friction trends, cost per commit, agent success rates, exportable metrics
  • Enable self-awareness - Agent queries its own performance mid-session via MCP tools

claudewatch brings AgentOps to the development experience: it monitors Claude Code sessions during your workflow, not production API calls.


Three concrete examples:

  1. Error loops - Memory tools store that you hit an error. Observability tools log the retry count. claudewatch fires a PostToolUse hook on the third consecutive error and tells Claude "you're looping, call get_blockers() to check for known solutions" - during the session where it can act.

  2. Drift detection - Memory tools archive the 15 files you read. Observability tools chart read/write ratios. claudewatch detects 8 consecutive reads with zero writes and alerts Claude "you're exploring without implementing, stuck or avoiding?" - before 20 more reads burn your context budget.

  3. Agent performance - Memory tools store transcripts containing agent launches. Observability tools count API calls. claudewatch parses agent lifecycles from transcripts, computes success rates by type, and exposes get_agent_performance() so Claude queries "plan agents get killed 40% of the time on this project" and skips plan mode - before spawning an agent that will fail.
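The first two examples both come down to counting a streak of consecutive events. Here is a minimal Python sketch of that idea, not claudewatch's actual implementation. The thresholds (3 errors, 8 reads) come from the examples above; `StreakDetector`, the alert labels, and the tool-name sets are hypothetical.

```python
from dataclasses import dataclass

ERROR_LOOP_THRESHOLD = 3   # "third consecutive error" in example 1
DRIFT_READ_THRESHOLD = 8   # "8 consecutive reads with zero writes" in example 2

# Hypothetical tool classification; a real session may use other tool names.
READ_TOOLS = {"Read", "Grep", "Glob"}
WRITE_TOOLS = {"Edit", "Write"}


@dataclass
class StreakDetector:
    """Tracks consecutive failures and consecutive reads across tool calls."""
    consecutive_errors: int = 0
    consecutive_reads: int = 0

    def observe(self, tool_name: str, failed: bool) -> list[str]:
        """Record one tool call and return any alerts it triggers."""
        alerts = []

        # A success resets the error streak; the alert fires once, at the threshold.
        self.consecutive_errors = self.consecutive_errors + 1 if failed else 0
        if self.consecutive_errors == ERROR_LOOP_THRESHOLD:
            alerts.append("error_loop")

        if tool_name in WRITE_TOOLS:
            self.consecutive_reads = 0  # any write ends the drift streak
        elif tool_name in READ_TOOLS:
            self.consecutive_reads += 1
            if self.consecutive_reads == DRIFT_READ_THRESHOLD:
                alerts.append("drift")
        return alerts
```

Firing only when a counter equals its threshold keeps the alert from repeating on every later call in the same streak.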

The differentiation: Other tools give humans dashboards. claudewatch gives Claude queryable access to its own performance inside the session where decisions are being made.


![Demo GIF placeholder - shows PostToolUse hook firing on error loop, Claude calling get_blockers(), finding documented solution, applying fix instead of rediscovering it]

Demo coming soon: real-time intervention cycle from error detection to blocker lookup to solution application


How It Works

claudewatch reads local session data from ~/.claude/ and turns it into actionable insights through three layers:

1. Push (Hooks)

Automatic intervention - SessionStart briefing on project health + PostToolUse alerts on error loops, context pressure, cost spikes, drift. Claude doesn't need to remember to check - the system tells it when something is wrong.
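As a rough illustration of the push layer, a PostToolUse hook can be pictured as a function from session state to an optional message for Claude. The sketch assumes the hook reports back by printing a JSON object with `decision` and `reason` fields; verify that shape against the current Claude Code hooks documentation before relying on it. `build_hook_output` and its parameters are hypothetical, not claudewatch internals.

```python
import json
from typing import Optional


def build_hook_output(consecutive_errors: int, threshold: int = 3) -> Optional[str]:
    """Return JSON feedback for Claude once the error streak reaches the threshold.

    Returning None means the hook stays silent and the session continues.
    """
    if consecutive_errors < threshold:
        return None
    reason = (
        f"{consecutive_errors} consecutive tool errors: you may be looping. "
        "Call get_blockers() to check for known solutions."
    )
    # Assumed output shape: {"decision": "block", "reason": "..."}
    return json.dumps({"decision": "block", "reason": reason})
```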

2. Pull (MCP Tools)

Self-reflection API - 29 MCP tools let Claude query its own metrics mid-session, including get_project_health, get_drift_signal, get_task_history, get_blockers, and get_agent_performance. Dashboard-oriented tools do not give an AI agent this kind of introspective access.

3. Persistent (Ops Memory)

Cross-session learning - Task history, blockers, and solutions tracked automatically. Claude queries "did we try this before?" and gets "yes, JWT approach hit rate limits, pivoted to sessions" - without you having to remember or explain.

All local. Reads ~/.claude/ files on disk. No network calls. No telemetry.
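To give a sense of what reading local session data involves, the sketch below parses a JSON Lines file into events. It assumes transcripts are stored as `.jsonl` files somewhere under `~/.claude/`; the exact layout is not described here, and both helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def read_transcript(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line, skipping lines that are not valid JSON."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written or corrupt line
            if isinstance(event, dict):
                yield event


def find_transcripts(root: Path = Path.home() / ".claude") -> list[Path]:
    """List candidate transcript files under the (assumed) data directory."""
    return sorted(root.rglob("*.jsonl")) if root.exists() else []
```

Both functions only read from disk, in keeping with the no-network design described above.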

Quick Start

# Get a baseline on all your projects
claudewatch scan

# Find what's costing you time
claudewatch gaps

# See top 3 improvements ranked by impact
claudewatch suggest --limit 3

Enable Claude's self-monitoring (one-time setup):

# Install behavioral rules + MCP server config
claudewatch install

# Restart Claude Code to load the MCP server

That's it. Claude now has real-time awareness of its own behavior.

Core Capabilities

Real-Time Alerts

PostToolUse hooks detect problems during execution

  • Error loops (3+ consecutive failures)
  • Context pressure (window filling)
  • Cost velocity spikes
  • Drift (stuck reading without writing)

Hook Implementation
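Context pressure can be pictured as the fraction of the context window already consumed. This is a sketch under stated assumptions: the 0.8 warning level and the `context_pressure` helper are illustrative, not values or names taken from claudewatch.

```python
def context_pressure(tokens_used: int, window_size: int,
                     warn_at: float = 0.8) -> tuple[float, bool]:
    """Return (fraction of window used, whether it crosses the warning level)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    used = min(tokens_used / window_size, 1.0)  # cap at a full window
    return used, used >= warn_at
```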

Self-Reflection Tools

29 MCP tools for mid-session queries

  • get_project_health - friction rate, agent success
  • get_drift_signal - exploring vs implementing
  • get_session_dashboard - all live metrics at once
  • get_cost_velocity - burn rate over last 10 min

MCP Tools Reference
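A burn rate over a trailing window, as in get_cost_velocity, could be computed along these lines. This is an assumption about how such a metric might work, not the definition claudewatch uses; `cost_velocity` and the `(timestamp, cost)` sample format are hypothetical.

```python
def cost_velocity(samples: list[tuple[float, float]], now: float,
                  window_seconds: float = 600.0) -> float:
    """Return dollars per minute spent during the trailing window.

    samples: (unix_timestamp, cost_in_dollars) pairs, one per billed event.
    """
    cutoff = now - window_seconds
    spent = sum(cost for ts, cost in samples if cutoff < ts <= now)
    return spent / (window_seconds / 60.0)
```

With the default 600-second window, the result is the average spend per minute over the last 10 minutes.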

Ops Memory

Cross-session history and blocker tracking

  • Query previous attempts by description
  • Retrieve known blockers with solutions
  • Checkpoint progress mid-session
  • Full-text transcript search

Memory System
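One way to picture blocker retrieval is a keyword query against a small local table. The sketch uses Python's built-in `sqlite3` module with an in-memory database; the `blockers` table, its columns, and the helper functions are hypothetical and do not reflect claudewatch's actual schema.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical blockers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blockers (problem TEXT, solution TEXT)")
    return conn


def add_blocker(conn: sqlite3.Connection, problem: str, solution: str) -> None:
    """Record a blocker together with the solution that resolved it."""
    conn.execute("INSERT INTO blockers VALUES (?, ?)", (problem, solution))


def find_blockers(conn: sqlite3.Connection, keyword: str) -> list[tuple[str, str]]:
    """Return (problem, solution) rows whose problem text contains the keyword."""
    rows = conn.execute(
        "SELECT problem, solution FROM blockers WHERE problem LIKE ? ORDER BY rowid",
        (f"%{keyword}%",),
    )
    return rows.fetchall()
```

SQLite's `LIKE` is case-insensitive for ASCII by default, so a query for "jwt" also matches "JWT".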

Friction Analysis

Measure what's slowing you down

  • Session trends (friction rate, corrections)
  • Tool error patterns by type
  • Zero-commit session tracking
  • Stale problems that persist across weeks

Metrics & Analytics
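Two of these metrics reduce to simple ratios over session records. The sketch assumes a friction rate defined as the share of sessions with at least one friction event, which may differ from claudewatch's own definition; the `Session` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    friction_events: int  # e.g. user corrections or tool errors (assumed)
    commits: int


def friction_rate(sessions: list[Session]) -> float:
    """Fraction of sessions with at least one friction event (assumed definition)."""
    if not sessions:
        return 0.0
    return sum(s.friction_events > 0 for s in sessions) / len(sessions)


def zero_commit_sessions(sessions: list[Session]) -> int:
    """Count sessions that ended without producing a commit."""
    return sum(s.commits == 0 for s in sessions)
```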

Agent Performance

Multi-agent workflow analytics

  • Success rates by agent type
  • Kill patterns and cost per task
  • Parallelization ratios
  • Duration and token breakdowns

Agent Analytics
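Success rates by agent type amount to a grouped ratio. Here is a minimal sketch, assuming each agent run is summarized as an `(agent_type, outcome)` pair; the outcome labels and the `success_rates` helper are hypothetical.

```python
from collections import defaultdict


def success_rates(runs: list[tuple[str, str]]) -> dict[str, float]:
    """Map each agent type to the fraction of its runs that completed.

    runs: (agent_type, outcome) pairs; only "completed" counts as success.
    """
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for agent_type, outcome in runs:
        totals[agent_type] += 1
        if outcome == "completed":
            successes[agent_type] += 1
    return {agent_type: successes[agent_type] / totals[agent_type] for agent_type in totals}
```

A plan agent killed in 2 of 5 runs would show a success rate of 0.6, the kind of figure an agent could weigh before spawning another one.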

Unified Context Search

Parallel search across all context sources

  • Commits, memory, task history, transcripts
  • Deduplicated, relevance-ranked results
  • Source attribution for every match
  • Sub-second response times

Context Search
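The search pattern described here (query sources in parallel, deduplicate, rank, attribute) can be sketched with a thread pool. All names below are hypothetical, and the scoring is left to each source; this is not claudewatch's ranking logic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A searcher takes a query and returns (text, score) hits.
Searcher = Callable[[str], list[tuple[str, float]]]


def unified_search(query: str, sources: dict[str, Searcher]) -> list[tuple[str, str, float]]:
    """Query every source in parallel, dedupe by text, and sort by score.

    Returns (source_name, text, score) tuples, highest score first.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        results = {name: future.result() for name, future in futures.items()}

    best: dict[str, tuple[str, str, float]] = {}
    for name, hits in results.items():
        for text, score in hits:
            # Keep only the highest-scoring copy of a duplicate hit.
            if text not in best or score > best[text][2]:
                best[text] = (name, text, score)
    return sorted(best.values(), key=lambda hit: hit[2], reverse=True)
```

Each result carries the name of the source it came from, which covers source attribution.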

Documentation Hub

📘 Getting Started

🎯 Use Cases

🔧 Features

🏗️ Technical

🆚 Comparison

🤝 Community

  • Contributing - How to contribute code, docs, bug reports
  • Roadmap - Planned features and improvements
  • Changelog - Version history and release notes

Example: Friction Reduction Cycle

# 1. Baseline - where are you now?
claudewatch scan
# → Project "shelfctl" scores 42/100, friction rate 45%

# 2. Diagnose - what's causing friction?
claudewatch gaps
# → Missing: testing section in CLAUDE.md
# → Stale pattern: "go vet" errors in 55% of sessions

# 3. Fix - apply data-driven patches
claudewatch fix shelfctl --dry-run
claudewatch fix shelfctl
# → Added testing section
# → Added pre-edit lint hook

# 4. Measure - did it work?
claudewatch track
# ... work for a week ...
claudewatch track --compare
# → Friction rate: 45% → 28% (-17%)
# → Tool errors/session: 4.2 → 1.1 (-74%)
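The two deltas in the sample output appear to use different bases: 45% to 28% is a drop of 17 percentage points, while 4.2 to 1.1 is a relative drop of about 74%. The short sketch below shows both calculations; the function names are illustrative only.

```python
def point_change(before: float, after: float) -> float:
    """Absolute difference, e.g. percentage points between two rates."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change expressed as a percentage of the starting value."""
    return (after - before) / before * 100.0


print(round(point_change(45, 28)))       # -17 (percentage points)
print(round(relative_change(4.2, 1.1)))  # -74 (percent)
print(round(relative_change(45, 28)))    # -38 (percent)
```

On a relative basis, the friction-rate improvement would read as roughly -38% rather than -17%.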

Installation

Homebrew (macOS/Linux):

brew install blackwell-systems/tap/claudewatch

Direct download:

# Download from https://github.com/blackwell-systems/claudewatch/releases/latest
tar -xzf claudewatch_*_$(uname -s)_$(uname -m).tar.gz
sudo mv claudewatch /usr/local/bin/

From source (requires Go 1.26+):

go install github.com/blackwell-systems/claudewatch/cmd/claudewatch@latest

See Installation Guide for detailed instructions and troubleshooting.

Privacy

Zero network calls. Reads only local files under ~/.claude/. Writes only to a local SQLite database for snapshot storage. No telemetry, no analytics, no crash reporting. Nothing leaves your machine.

Related Projects

commitmux - Semantic commit search across repositories. Find "when did we add authentication?" without remembering branch names or grep patterns.

scout-and-wave - Protocol for safely parallelizing human-guided agentic workflows. Orchestrator + Scout + Wave agents with explicit handoff contracts.

License

Dual-licensed under MIT and Apache 2.0.


Questions? Issues? Contributions? Open an issue or PR. We respond to everything.
