Skip to content

fix(sessions): guard syncFromHermes vs SessionDeleter race + import all sources on startup#354

Open
29206394 wants to merge 1 commit intoEKKOLearnAI:mainfrom
29206394:fix/session-deleter-race-and-cli-sync
Open

fix(sessions): guard syncFromHermes vs SessionDeleter race + import all sources on startup#354
29206394 wants to merge 1 commit intoEKKOLearnAI:mainfrom
29206394:fix/session-deleter-race-and-cli-sync

Conversation

@29206394
Copy link
Copy Markdown

Closes #352.

Summary

Two related bugs in the v0.5.3 session pipeline:

Bug 1 — SessionDeleter race condition (#352)

chat-run-socket.syncFromHermes asynchronously reads a Hermes session and writes its assistant/tool messages into the local mirror, then enqueues the Hermes session id into gc_pending_session_deletes. SessionDeleter drains that queue on a 5-minute timer, on profile switch, and immediately on start — with no coordination against in-flight reads. A drain that fires while a sync is mid-flight deletes the Hermes session out from under the reader, and assistant/tool messages never make it into the local DB.

Visible symptom: switching away and back to a session shows assistant replies vanishing; message_count decrements over time (6 → 5 → 4 …).

Fix: introduce an in-memory inFlightHermesSessionIds: Set<string> exported from session-deleter.ts. syncFromHermes adds the id BEFORE awaiting the read and removes it after the read completes — handling all three exit paths:

  • Successful sync → enqueue delete, then release guard (ordering matters: the row must exist in the queue before drain can observe a guard-less candidate)
  • Empty-detail early return → release guard, do not enqueue
  • .catch() → release guard, do not enqueue (next successful sync will pick it up)

drain() filters out any candidate currently in the set, so periodic / profile-switch ticks defer deletion until the read is done.

Bug 2 — Startup sync silently drops CLI sessions

syncProfileSessions hardcoded 'api_server' as the source filter on listHermesSessionSummaries, so first-startup sync only imported chat sessions originated by the WebUI itself. CLI conversations (source='cli'), Telegram, Discord, Slack, etc. were never synced into the local DB and stayed invisible in the sidebar forever (sync only runs when the local DB is empty, so they couldn't be recovered without dropping the DB).

As a side effect, long CLI conversations that had been compressed into a parent_session_id chain weren't aggregated either, since they weren't synced at all.

Fix: pass undefined to listSessionSummaries so it returns roots from ALL sources. That function already filters out tool / compress_* sessions and aggregates parent_session_id chains, so no other change is needed.

Files changed

  • packages/server/src/services/hermes/session-deleter.ts — export inFlightHermesSessionIds, filter eligible rows in drain()
  • packages/server/src/services/hermes/chat-run-socket.ts — register/release guard around syncFromHermes
  • packages/server/src/services/hermes/session-sync.ts — import all sources, not just api_server

3 files, +65 / −6 lines.

Test plan

  • Local clean re-sync against a Hermes state.db containing mixed api_server + cli sessions across 4 profiles → imported 11 chains (previously 0 from this code path), correctly aggregating multi-step chains (max thread_session_count=5) into single sidebar entries.
  • npm run build passes (vue-tsc + tsc + esbuild).
  • Server starts cleanly, [session-sync] sync complete: synced=11, errors=0 in logs.
  • Maintainer to verify: chat run → switch session → wait > 5 min → switch back → assistant messages should still be present.

Notes

  • The guard is intentionally a process-local in-memory set rather than a DB column. Drain only runs in this process, and syncFromHermes only registers in this process, so a Set is sufficient and avoids schema migration.
  • Ordering inside the success path (enqueue → then release) is load-bearing — reversing it reintroduces a small window where drain could miss the guard. This is documented in the code with a // IMPORTANT: comment referencing the issue.

…ll sources on startup (EKKOLearnAI#352)

Two related bugs in the session pipeline introduced in v0.5.3.

Bug 1 — SessionDeleter race condition (issue EKKOLearnAI#352):
  chat-run-socket.syncFromHermes asynchronously reads a Hermes session
  and writes its assistant/tool messages into the local mirror. It then
  enqueues the Hermes session for deletion via gc_pending_session_deletes.
  SessionDeleter drains that queue on a 5-minute timer (and on profile
  switch + immediately on start), with no coordination against in-flight
  reads. A drain that fires while a sync is mid-flight deletes the Hermes
  session out from under the reader, and the assistant/tool messages
  never make it into the local DB.

  Visible symptom: switching away and back to a session shows assistant
  replies vanishing; message_count decrements over time (6 -> 5 -> 4 ...).

  Fix: introduce an in-memory `inFlightHermesSessionIds` set. syncFromHermes
  marks the id BEFORE awaiting the read, releases it after the read
  completes (success, failure, and the empty-detail early return are all
  handled). drain() filters out any candidate that is currently in the set,
  so periodic / profile-switch ticks defer deletion until the read is done.

  Important ordering: enqueueEphemeralDelete is called BEFORE clearing the
  guard, so drain never observes the pending row without the guard active.

Bug 2 — Startup sync silently drops CLI sessions:
  syncProfileSessions hardcoded `'api_server'` as the source filter on
  listHermesSessionSummaries, so first-startup sync only imported chat
  sessions originated by the WebUI itself. CLI conversations
  (source='cli'), Telegram, Discord, Slack, etc. were never synced into
  the local DB and stayed invisible in the sidebar forever (sync only
  runs when the local DB is empty).

  As a side effect, long CLI conversations that had been compressed into
  a parent_session_id chain also weren't aggregated by sync, since they
  weren't synced at all.

  Fix: pass `undefined` to listSessionSummaries so it returns roots from
  ALL sources. That function already filters out tool / compress_*
  sessions and aggregates parent_session_id chains, so no other change
  is needed.

Verified locally: clean re-sync against a Hermes state.db with mixed
api_server + cli sessions now imports 11 chains across 4 profiles
(previously: 0 from this code path), correctly aggregating multi-step
chains (thread_session_count up to 5) into single sidebar entries.
albert748 added a commit to albert748/hermes-web-ui that referenced this pull request May 1, 2026
…eleter race guard

## Root cause

The session-sync service introduced in PR EKKOLearnAI#294 has four defects:
1. Hardcoded source='api_server' filter — only api_server sessions imported
2. One-shot sync gate (count > 0) — permanently skipped after first run
3. Hardcoded source label in createSession — source info destroyed
4. addMessage() failed on NULL content from tool-call/system messages

Additionally, SessionDeleter race with syncFromHermes (EKKOLearnAI#352).

## Changes

### session-sync.ts — incremental dedup
- Remove count > 0 gate — sync runs on every startup
- Dedup by hermes_id column: query existing ids, skip already-imported
- Pass hermes_id to createSession for future dedup
- Import ALL sources (pass undefined instead of 'api_server')

### schemas.ts + session-store.ts — hermes_id tracking
- Add hermes_id TEXT column (auto-migrated via ensureTable)
- CREATE UNIQUE INDEX for fast dedup lookups
- createSession() accepts optional hermes_id parameter

### SessionDeleter race fix (credit: EKKOLearnAI#354 by moxian)
- Export inFlightHermesSessionIds Set from session-deleter.ts
- chat-run-socket.ts: register/release guard around async reads
- session-deleter.ts: filter out in-flight sessions in drain()

### Sidebar collapse toggle
- Add collapse button for compact icon-rail mode (persisted)

Fixes EKKOLearnAI#322, Fixes EKKOLearnAI#321, Fixes EKKOLearnAI#352
@albert748
Copy link
Copy Markdown
Contributor

Hey @29206394, thanks for the SessionDeleter fix — I've incorporated the inFlightHermesSessionIds guard into #373 along with the source filter fix and incremental sync (hermes_id dedup column). The incremental sync is the key piece: without it, the source filter fix alone doesn't help existing users who already have a populated DB (the count > 0 gate blocks re-sync forever). Hope we can consolidate — my PR has all three fixes in one commit. cc @54laowang

albert748 added a commit to albert748/hermes-web-ui that referenced this pull request May 1, 2026
…eleter race guard

## Root cause

The session-sync service introduced in PR EKKOLearnAI#294 has four defects:
1. Hardcoded source='api_server' filter — only api_server sessions imported
2. One-shot sync gate (count > 0) — permanently skipped after first run
3. Hardcoded source label in createSession — source info destroyed
4. addMessage() failed on NULL content from tool-call/system messages

Additionally, SessionDeleter race with syncFromHermes (EKKOLearnAI#352).

## Changes

### session-sync.ts — incremental dedup
- Remove count > 0 gate — sync runs on every startup
- Dedup by hermes_id column: query existing ids, skip already-imported
- Pass hermes_id to createSession for future dedup
- Import ALL sources (pass undefined instead of 'api_server')

### schemas.ts + session-store.ts — hermes_id tracking
- Add hermes_id TEXT column (auto-migrated via ensureTable)
- CREATE UNIQUE INDEX for fast dedup lookups
- createSession() accepts optional hermes_id parameter

### SessionDeleter race fix (credit: EKKOLearnAI#354 by moxian)
- Export inFlightHermesSessionIds Set from session-deleter.ts
- chat-run-socket.ts: register/release guard around async reads
- session-deleter.ts: filter out in-flight sessions in drain()

Fixes EKKOLearnAI#322, Fixes EKKOLearnAI#321, Fixes EKKOLearnAI#352
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: SessionDeleter race condition causes assistant message loss in v0.5.3

3 participants