
fix(heartbeat): isolate heartbeat API client from main streaming session #247

Merged
tfatykhov merged 1 commit into main from fix/heartbeat-client-isolation
Apr 4, 2026

@tfatykhov
Owner

Summary

  • Problem: HeartbeatRunner shared the same AnthropicClient (an httpx pool of 5 connections) as the main chat streaming session. When heartbeat cognitive triage fired during active tool execution, both competed for the same connections, causing httpx pool timeouts.
  • Fix: Give HeartbeatRunner its own dedicated AnthropicClient with an isolated connection pool via a new AgentRunner.fork() method.
  • Review: a 3-agent architecture review (architect, concurrency specialist, devil's advocate) plus an implementation review. All issues raised were resolved.
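The contention described in the summary can be reproduced in miniature. The sketch below is a stdlib-only simulation, not nous code: ToyPool is a hypothetical stand-in for an httpx connection pool (in httpx itself the corresponding knobs are httpx.Limits(max_connections=...) and the pool timeout in httpx.Timeout), and all timings are arbitrary assumptions. Five long-lived main-session requests occupy every slot, so a heartbeat request on the same pool times out, while the same request on its own pool succeeds.

```python
import asyncio


class ToyPool:
    """Hypothetical stand-in for an httpx connection pool of fixed size."""

    def __init__(self, max_connections: int, pool_timeout: float) -> None:
        self._slots = asyncio.Semaphore(max_connections)
        self._pool_timeout = pool_timeout

    async def request(self, hold_for: float) -> str:
        # Wait for a free connection, giving up after the pool timeout.
        try:
            await asyncio.wait_for(self._slots.acquire(), self._pool_timeout)
        except asyncio.TimeoutError:
            return "pool timeout"
        try:
            await asyncio.sleep(hold_for)  # simulated streaming / response time
            return "ok"
        finally:
            self._slots.release()


async def scenario(shared: bool) -> str:
    main_pool = ToyPool(max_connections=5, pool_timeout=0.05)
    # shared=True models the old setup; shared=False models a dedicated client.
    heartbeat_pool = main_pool if shared else ToyPool(max_connections=5, pool_timeout=0.05)
    # Main session: five concurrent long-lived requests during tool execution.
    busy = [asyncio.create_task(main_pool.request(hold_for=0.2)) for _ in range(5)]
    await asyncio.sleep(0.01)  # let the main requests take every connection
    triage = await heartbeat_pool.request(hold_for=0.01)
    await asyncio.gather(*busy)
    return triage


print(asyncio.run(scenario(shared=True)))   # pool timeout
print(asyncio.run(scenario(shared=False)))  # ok
```

Giving the heartbeat its own pool removes the dependency on how many connections the main session happens to be holding, rather than just making the shared pool larger.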

Changes

| File | Change |
|---|---|
| nous/api/runner.py | Add fork(api_client) method: creates a sibling runner that shares the cognitive layer and dispatcher but uses an isolated API client |
| nous/heartbeat/runner.py | Accept an api_client param, create a dedicated runner via fork(), route triage through it, clean up with try/finally |
| nous/main.py | Create a second AnthropicClient for heartbeat |
| tests/test_runner_fork.py | 4 tests for fork() correctness |
| tests/test_heartbeat_isolation.py | 11 tests for routing, cleanup, start() integration, backward compatibility |
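A minimal sketch of the fork() idea, assuming the runner keeps its dependencies as plain attributes. Only fork(api_client) and the _api_shared flag come from this PR's description; the attribute names (cognitive, dispatcher, api_client) and the FakeApiClient class are hypothetical, and the real nous/api/runner.py may differ.

```python
import asyncio


class FakeApiClient:
    """Hypothetical stand-in for AnthropicClient; records whether close() ran."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class AgentRunner:
    """Illustrative runner holding a cognitive layer, a dispatcher and an API client."""

    def __init__(self, cognitive: object, dispatcher: object, api_client: FakeApiClient) -> None:
        self.cognitive = cognitive
        self.dispatcher = dispatcher
        self.api_client = api_client
        # False means this runner owns api_client and closes it in close().
        self._api_shared = False

    def fork(self, api_client: FakeApiClient) -> "AgentRunner":
        # The sibling reuses the same cognitive layer and dispatcher objects...
        sibling = AgentRunner(self.cognitive, self.dispatcher, api_client)
        # ...but never closes the injected client; whoever passed it in owns it.
        sibling._api_shared = True
        return sibling

    async def close(self) -> None:
        if not self._api_shared:
            await self.api_client.close()


# Usage: the fork shares state objects but not the connection pool.
cognitive, dispatcher = object(), object()
main_runner = AgentRunner(cognitive, dispatcher, FakeApiClient("main"))
heartbeat_client = FakeApiClient("heartbeat")
forked = main_runner.fork(heartbeat_client)

asyncio.run(forked.close())
print(forked.cognitive is main_runner.cognitive)    # True
print(forked.api_client is main_runner.api_client)  # False
print(heartbeat_client.closed)                      # False
```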

Key Design Decisions

  • fork() sets _api_shared=True so the forked runner does not close the client when it is itself closed; HeartbeatRunner.stop() owns the client's lifecycle
  • try/finally blocks in stop() ensure api_client is closed even if dedicated runner cleanup fails
  • _get_triage_runner() falls back to the shared runner, logging a warning, if start() wasn't called
  • Backward compatible: existing callers without api_client param work unchanged
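The lifecycle rules above can be sketched as follows. StubClient and StubRunner are hypothetical stand-ins for AnthropicClient and AgentRunner; the names start(), stop(), _get_triage_runner(), the optional api_client parameter and the warning fallback follow the PR description, but the method bodies are assumptions, not the merged code.

```python
from __future__ import annotations

import asyncio
import logging

logger = logging.getLogger("heartbeat_sketch")


class StubClient:
    """Hypothetical API client stand-in."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class StubRunner:
    """Hypothetical AgentRunner stand-in with a fork() as described in the PR."""

    def __init__(self, api_client: StubClient, api_shared: bool = False) -> None:
        self.api_client = api_client
        self._api_shared = api_shared
        self.fail_on_close = False  # lets the demo simulate a failing cleanup

    def fork(self, api_client: StubClient) -> StubRunner:
        return StubRunner(api_client, api_shared=True)

    async def close(self) -> None:
        if self.fail_on_close:
            raise RuntimeError("runner cleanup failed")
        if not self._api_shared:
            await self.api_client.close()


class HeartbeatRunner:
    def __init__(self, runner: StubRunner, api_client: StubClient | None = None) -> None:
        self._runner = runner          # shared main-session runner
        self._api_client = api_client  # optional dedicated client (None = old behaviour)
        self._triage_runner: StubRunner | None = None

    async def start(self) -> None:
        # Only fork when a dedicated client was supplied, so old callers are unchanged.
        if self._api_client is not None:
            self._triage_runner = self._runner.fork(self._api_client)

    def _get_triage_runner(self) -> StubRunner:
        if self._triage_runner is not None:
            return self._triage_runner
        if self._api_client is not None:
            logger.warning("api_client provided but start() not called; using shared runner")
        return self._runner

    async def stop(self) -> None:
        try:
            if self._triage_runner is not None:
                await self._triage_runner.close()
        finally:
            # Runs even if the forked runner's close() raised.
            if self._api_client is not None:
                await self._api_client.close()
            self._triage_runner = None


async def demo() -> tuple[bool, bool, bool]:
    main_runner = StubRunner(StubClient())
    hb_client = StubClient()
    heartbeat = HeartbeatRunner(main_runner, api_client=hb_client)
    fell_back = heartbeat._get_triage_runner() is main_runner  # logs a warning
    await heartbeat.start()
    heartbeat._get_triage_runner().fail_on_close = True  # simulate failing cleanup
    try:
        await heartbeat.stop()
    except RuntimeError:
        pass
    return fell_back, hb_client.closed, main_runner.api_client.closed


print(asyncio.run(demo()))  # (True, True, False)
```

In this shape an exception from the forked runner's close() still propagates out of stop(), but only after the finally block has closed the dedicated client and left the main session's client untouched. Whether the merged stop() re-raises or logs such errors is not stated in the PR.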

Test plan

  • 4 tests: AgentRunner.fork() creates isolated runner, shares cognitive/dispatcher, doesn't close client on close()
  • 3 tests: triage routing uses dedicated runner when available, falls back with warning when not
  • 4 tests: stop() cleanup closes both runner and client, handles errors, backward compat
  • 2 tests: start() calls fork() when api_client provided, skips when not
  • 65 existing heartbeat tests pass (no regressions)
  • 67 heartbeat lifecycle/intelligent/tuner tests pass
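As an illustration of the two start() integration tests listed above, here is how they might look with unittest.mock. The HeartbeatRunner below is a hypothetical minimal stand-in containing only start(); the real tests in tests/test_heartbeat_isolation.py may be structured differently.

```python
import unittest
from unittest.mock import MagicMock


class HeartbeatRunner:
    """Hypothetical minimal HeartbeatRunner with only the start() behaviour under test."""

    def __init__(self, runner, api_client=None):
        self._runner = runner
        self._api_client = api_client
        self._triage_runner = None

    async def start(self):
        if self._api_client is not None:
            self._triage_runner = self._runner.fork(self._api_client)


class TestStartFork(unittest.IsolatedAsyncioTestCase):
    async def test_start_forks_when_api_client_given(self):
        runner, client = MagicMock(), MagicMock()
        heartbeat = HeartbeatRunner(runner, api_client=client)
        await heartbeat.start()
        # fork() must receive the dedicated client, and its result becomes the triage runner.
        runner.fork.assert_called_once_with(client)
        self.assertIs(heartbeat._triage_runner, runner.fork.return_value)

    async def test_start_skips_fork_without_api_client(self):
        runner = MagicMock()
        heartbeat = HeartbeatRunner(runner)
        await heartbeat.start()
        # Backward compatible path: no dedicated client, no fork.
        runner.fork.assert_not_called()
        self.assertIsNone(heartbeat._triage_runner)
```

Run it with python -m unittest against the module that contains it. Using MagicMock for the runner keeps the test focused on whether fork() is called, not on what a forked runner does.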

🤖 Generated with Claude Code

HeartbeatRunner shared the same AgentRunner (and its httpx connection pool
of 5) as the main chat streaming session. When heartbeat cognitive triage
fired during active tool execution, both competed for connections, causing
pool timeouts.

Changes:
- Add AgentRunner.fork(api_client) to create sibling runners with isolated
  connection pools but shared cognitive layer and dispatcher
- HeartbeatRunner accepts optional api_client parameter and creates a
  dedicated forked runner for cognitive triage
- main.py creates a second AnthropicClient for heartbeat
- Proper cleanup: try/finally in stop() ensures api_client is closed even
  if dedicated runner cleanup fails
- Log a warning when api_client is provided but start() was not called

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tfatykhov merged commit 56eb3cc into main on Apr 4, 2026
1 of 2 checks passed
tfatykhov deleted the fix/heartbeat-client-isolation branch on Apr 4, 2026 at 17:02