Problem
Sleep has still never fired (0 sleep_reflection facts), even though PR #161 has been merged and the container restarted.
Root Cause Analysis
Issue 1: Working Memory Accumulates Forever
heart.working_memory had 211 entries dating back to Feb 24 — test sessions, subtasks, debug sessions, backfill jobs. These are never cleaned up.
While the session monitor uses an in-memory dict (not the DB table) for tracking, the /status endpoint reads from this table and reports all entries as "active sessions." This is misleading and makes debugging harder.
Fix needed: TTL-based cleanup. Delete working_memory entries older than N hours (e.g. 24h). Could be:
- Periodic task in session_monitor
- Part of the sleep handler
- DB-level: ON DELETE CASCADE or scheduled cleanup
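A minimal sketch of the periodic-task option, to make the TTL rule concrete. It uses an in-memory SQLite table as a stand-in for heart.working_memory, and the column layout (session_id, updated_at stored as epoch seconds) is an assumption for illustration, not the real schema. The function name mirrors the call proposed under Recommended Fixes, but the body is hypothetical.

```python
import sqlite3
import time


def cleanup_stale_working_memory(conn, max_age_hours=24, now=None):
    """Delete working_memory rows older than the TTL; return how many were removed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    cur = conn.execute(
        "DELETE FROM working_memory WHERE updated_at < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


def make_demo_db(now):
    """Build a throwaway table with one fresh and two stale entries (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE working_memory (session_id TEXT PRIMARY KEY, updated_at REAL)"
    )
    conn.executemany(
        "INSERT INTO working_memory VALUES (?, ?)",
        [
            ("fresh", now - 3600),                     # 1 hour old: kept
            ("stale-test", now - 48 * 3600),           # 2 days old: deleted
            ("stale-backfill", now - 30 * 24 * 3600),  # 30 days old: deleted
        ],
    )
    conn.commit()
    return conn
```

Passing `now` explicitly keeps the cutoff deterministic in tests. Running the cleanup a second time deletes nothing, so it should be safe to call on every timeout check.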
Issue 2: Session Lifecycle - No Explicit Close
Sessions are never explicitly ended. The timeout path works like this:
- User sends message → message_received event → monitor tracks in _last_activity
- User stops talking
- After 30 min (session_idle_timeout): monitor calls cognitive.end_session() → removes from _last_activity
- After 2h global idle (sleep_timeout): monitor emits sleep_started
But the monitor only tracks sessions it has seen events for since the last restart. On restart, _last_activity is empty and _global_last_activity is set to time.monotonic() (the current time). If no messages come in, global_idle starts counting from the restart time.
So after restart, sleep should fire after 2h of silence. But Tim reports 8+ hours of inactivity with no sleep.
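The expected behaviour can be sketched as a small model of the timeout path described above. This is not the real session monitor: the class name, the injectable clock, the `emitted` list (standing in for cognitive.end_session() and the event bus), and the `_asleep` guard are all assumptions made so the logic can be exercised in isolation.

```python
import time


class SessionMonitorSketch:
    """Toy model of the idle/sleep timers; not the production monitor."""

    def __init__(self, session_idle_timeout=30 * 60, sleep_timeout=2 * 60 * 60,
                 clock=time.monotonic):
        self._clock = clock
        self._session_idle_timeout = session_idle_timeout
        self._sleep_timeout = sleep_timeout
        self._last_activity = {}              # empty after every restart
        self._global_last_activity = clock()  # restart time
        self._asleep = False                  # assumed guard against re-emitting
        self.emitted = []                     # stand-in for real side effects

    def on_event(self, session_id):
        """Any tracked event resets both the session timer and the global timer."""
        now = self._clock()
        self._last_activity[session_id] = now
        self._global_last_activity = now
        self._asleep = False

    def check_timeouts(self):
        now = self._clock()
        for sid, last in list(self._last_activity.items()):
            if now - last >= self._session_idle_timeout:
                del self._last_activity[sid]
                self.emitted.append(("session_ended", sid))
        global_idle = now - self._global_last_activity
        if not self._asleep and global_idle >= self._sleep_timeout:
            self._asleep = True
            self.emitted.append(("sleep_started", None))
```

With a fake clock, a freshly constructed monitor that receives no events emits sleep_started once the clock has advanced 2h. If the deployed monitor does not, either something is calling the equivalent of `on_event`, or the check is not running at all.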
Possible Remaining Causes
- Container not running the fix: there is no version endpoint to verify this. The fix is on main (ba4def0), but the deployed container may be on an older commit.
- Something emitting events internally: if any internal process (e.g. subtask scheduler, background task) emits turn_completed or message_received, it resets the timers.
- Crash in sleep handler: if sleep_started fires but the handler crashes, sleep_reflection facts are never created and it looks like sleep never happened.
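To tell the third cause apart from sleep never firing, the handler needs to leave a trace even when it fails. A minimal sketch, assuming the handler is an async callable: the decorator name `log_handler` and the logger name "sleep" are hypothetical, and how handlers are actually registered is not shown here.

```python
import asyncio
import functools
import logging

logger = logging.getLogger("sleep")  # hypothetical logger name


def log_handler(event_name):
    """Wrap an async event handler so entry, exit, and crashes are all logged."""
    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(*args, **kwargs):
            logger.info("%s handler: entered", event_name)
            try:
                result = await handler(*args, **kwargs)
            except Exception:
                # logger.exception records the traceback, then we re-raise
                # so existing error handling is unaffected.
                logger.exception("%s handler: crashed", event_name)
                raise
            logger.info("%s handler: completed", event_name)
            return result
        return wrapper
    return decorator
```

An "entered" line with no matching "completed" line points at a crash or hang inside the handler. No "entered" line at all means sleep_started was never emitted or never reached the handler.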
Recommended Fixes
- Add working_memory TTL cleanup to session monitor _check_timeouts():
# Clean working_memory entries older than 24h
await self._heart.cleanup_stale_working_memory(max_age_hours=24)
- Add /status version info: expose the git SHA or version so we can verify deployments:
{"version": "0.2.0", "commit": "ba4def0", "started_at": "..."}
- Add sleep handler logging: ensure the sleep_started handler logs entry/exit so we can verify it ran.
- Consider hydrating _last_activity from DB on startup: query working_memory updated_at to pre-populate with real timestamps instead of starting empty.
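One subtlety with the hydration idea: updated_at is a wall-clock timestamp, while the monitor compares against time.monotonic(), so the two cannot be mixed directly. A hedged sketch of one way to convert, assuming rows arrive as (session_id, updated_at) pairs with timezone-aware datetimes; the function name and return shape are hypothetical.

```python
import time
from datetime import datetime, timedelta, timezone


def hydrate_last_activity(rows, wall_now=None, mono_now=None):
    """Translate DB wall-clock timestamps into monotonic-clock equivalents.

    Returns (last_activity, global_last_activity).
    """
    wall_now = wall_now or datetime.now(timezone.utc)
    mono_now = time.monotonic() if mono_now is None else mono_now
    last_activity = {}
    for session_id, updated_at in rows:
        # Age of the entry in seconds; clamp to 0 in case of clock skew.
        age = max((wall_now - updated_at).total_seconds(), 0.0)
        last_activity[session_id] = mono_now - age
    # With no rows, fall back to the current behaviour: count from restart.
    global_last = max(last_activity.values(), default=mono_now)
    return last_activity, global_last
```

Rows older than the idle timeout hydrate as already expired, so the first timeout check after startup would end them immediately. That is probably the desired outcome, but it only stays cheap if the TTL cleanup from the first item keeps the table small.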
Manual Workaround Applied
Cleaned up 209 stale working_memory entries via SQL:
DELETE FROM heart.working_memory WHERE updated_at < NOW() - INTERVAL '24 hours';
Related: #160, PR #161