design(streaming): clarify session-scoped runtime ownership for running WebUI sessions #1694

@dso2ng

Context

Several recent fixes have improved WebUI behavior around running sessions, stream reattachment, reload recovery, and multi-tab streaming.

These changes fixed important concrete bugs, but they also expose a broader architectural question: which runtime state is pane-owned, which runtime state is session-owned, and what invariants should future PRs preserve?

If this direction matches the intended architecture, this issue could also serve as a tracking issue for small follow-up PRs.

Proposed mental model

The WebUI appears to have two different concepts that should stay separate:

  1. Active pane state

    • S.session
    • S.messages
    • S.busy
    • S.activeStreamId
    • currently rendered approval / clarify / composer UI
  2. Running session state

    • INFLIGHT[session_id]
    • LIVE_STREAMS[session_id]
    • server-side active_stream_id
    • pending user message / checkpoint state
    • approval / clarify ownership
    • sidebar row runtime metadata
    • background completion / canonical-session rotation

The active pane is a projection of one session. A running turn belongs to the session that owns the stream, even when that session is not the currently viewed pane.
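A minimal sketch of this split, under stated assumptions: S, INFLIGHT, and LIVE_STREAMS are the names used above, but their field shapes here are guesses, and startTurn / projectSession are hypothetical helpers invented for illustration. The point is only that projecting a session into the pane reads session-owned state and never writes it.

```javascript
// Hypothetical sketch: shapes and helper names are assumptions, not the WebUI's code.

// Active pane state: describes only what is currently rendered.
const S = { session: null, messages: [], busy: false, activeStreamId: null };

// Running session state: keyed by session_id, independent of the pane.
const INFLIGHT = {};     // session_id -> { streamId, messages }
const LIVE_STREAMS = {}; // session_id -> transport handle (a stub object here)

// Start a turn for a session without touching the pane.
function startTurn(sessionId, streamId) {
  INFLIGHT[sessionId] = { streamId, messages: [] };
  LIVE_STREAMS[sessionId] = { streamId, open: true };
}

// Project one session into the pane. Only reads session-owned state.
function projectSession(sessionId) {
  const running = INFLIGHT[sessionId];
  S.session = { session_id: sessionId };
  S.messages = running ? running.messages.slice() : [];
  S.busy = Boolean(running);
  S.activeStreamId = running ? running.streamId : null;
}

startTurn("a", "stream-1");
projectSession("a"); // pane shows running session "a"
projectSession("b"); // switch away: pane is idle, "a" keeps running
```

After the switch, S.busy is false for the pane showing "b", while INFLIGHT["a"] and LIVE_STREAMS["a"] are untouched.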

Why this matters

Long-running Hermes turns may continue while the user:

  • switches to another sidebar session;
  • opens the same session in another tab;
  • reloads /session/<id>;
  • restores from BFCache;
  • opens the root / page in a new tab;
  • receives approval / clarify prompts in the background;
  • cancels a running session from the sidebar;
  • hits network drops or EventSource reconnect paths;
  • eventually needs crash recovery / replay behavior.

If code paths rely too heavily on active-pane globals, several classes of bugs can reappear:

  • switching panes tears down or duplicates a still-running stream;
  • a background session completion mutates the active pane;
  • cancel / approval / clarify actions target the wrong session;
  • a root tab projects into a saved running session and appears globally blocked;
  • S.busy prevents unrelated pane actions even though the running turn belongs to another session;
  • transport fan-out and runtime ownership get conflated with replay / WAL durability.

Suggested invariants

I would like to confirm whether the following invariants match the intended direction:

  1. S.session, S.messages, S.busy, and S.activeStreamId should describe the currently viewed pane only.
  2. Running state should be keyed by session_id / stream_id, not by whichever pane is currently active.
  3. Returning to the same running session should reuse the existing live transport when possible.
  4. Switching away from a running session should not inherently close, hide, or delete that session's runtime state.
  5. Background completion should refresh sidebar / alias / canonical-session metadata without mutating an unrelated active pane.
  6. Approval and clarify state should be owned by the session that requested it, and rendered only when that session is active.
  7. Sidebar row actions should use row-owned runtime metadata, such as session.active_stream_id, not active-pane globals.
  8. Multi-tab live streaming should be handled by stream fan-out; replay / crash recovery should remain a separate WAL or snapshot design.
  9. Root / boot behavior should be considered separately from explicit /session/<id> reload / reattach behavior, especially when the saved session is currently running.
  10. Future refactors should preserve the current no-build-step / vanilla JS architecture and land as small, reviewable PRs.
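To make invariants 5 and 7 concrete, here is a rough sketch of a session-scoped terminal handler and a sidebar cancel action. S and INFLIGHT follow the names above; SIDEBAR, onStreamDone, cancelFromRow, and cancelRequests are hypothetical names used only for illustration, and the array stands in for whatever request the real code sends to the server.

```javascript
// Hypothetical sketch: only S, INFLIGHT, and active_stream_id come from the issue text.
const S = { session: { session_id: "b" }, messages: ["draft in b"], busy: false, activeStreamId: null };
const INFLIGHT = { a: { streamId: "stream-1" } };
const SIDEBAR = {
  a: { session_id: "a", active_stream_id: "stream-1" },
  b: { session_id: "b", active_stream_id: null },
};
const cancelRequests = []; // stands in for cancel requests sent to the server

// Terminal handler: cleanup is scoped to the session that owns the stream.
function onStreamDone(sessionId, streamId) {
  const running = INFLIGHT[sessionId];
  if (!running || running.streamId !== streamId) return; // stale or foreign event
  delete INFLIGHT[sessionId];
  SIDEBAR[sessionId].active_stream_id = null; // refresh row metadata
  // Touch the pane only if it is currently showing the owning session.
  if (S.session && S.session.session_id === sessionId) {
    S.busy = false;
    S.activeStreamId = null;
  }
}

// Sidebar cancel: reads the row's own metadata, never S.activeStreamId.
function cancelFromRow(row) {
  if (!row.active_stream_id) return false;
  cancelRequests.push({ session_id: row.session_id, stream_id: row.active_stream_id });
  return true;
}

cancelFromRow(SIDEBAR.a);      // targets session "a" while "b" is viewed
onStreamDone("a", "stream-1"); // background completion of "a"
```

In this sketch the pane showing "b" keeps its messages and busy flag, and the cancel request carries the row's stream id even though S.activeStreamId is null.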

Possible follow-up PR ladder

If this direction is correct, I think the safest path is not a large runtime rewrite, but a small PR ladder. Examples:

  1. Add/expand regression coverage for remaining running-session ownership paths.
  2. Audit terminal handlers so completion / cancel / error cleanup is scoped to the owning session.
  3. Audit approval / clarify rendering so pending state survives switching away and reappears when returning.
  4. Clarify root / boot policy for saved running sessions vs explicit /session/<id> reload recovery.
  5. Ensure background canonical-session rotation updates sidebar / aliases even when the completed session is not active.
  6. Only after the invariants are pinned, consider centralizing per-session runtime state behind a small helper object.
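Steps 1 and 3 could be pinned with checks along these lines. This is a sketch under assumptions, not existing WebUI code: PENDING_APPROVALS, requestApproval, visibleApproval, and resolveApproval are hypothetical names, and the same pattern would apply to clarify prompts.

```javascript
// Hypothetical sketch: all names except S are invented for illustration.
const S = { session: null };
const PENDING_APPROVALS = {}; // session_id -> pending approval request

function requestApproval(sessionId, prompt) {
  PENDING_APPROVALS[sessionId] = { sessionId, prompt };
}

// What the pane should render: only the active session's pending approval.
function visibleApproval() {
  const sid = S.session && S.session.session_id;
  return (sid && PENDING_APPROVALS[sid]) || null;
}

// Resolve against the owning session, not whichever pane is active.
function resolveApproval(sessionId) {
  const pending = PENDING_APPROVALS[sessionId] || null;
  delete PENDING_APPROVALS[sessionId];
  return pending;
}

S.session = { session_id: "a" };
requestApproval("a", "Run command?");
S.session = { session_id: "b" };
const whileAway = visibleApproval(); // nothing pending for "b"
S.session = { session_id: "a" };
const onReturn = visibleApproval();  // the prompt for "a" reappears
```

A regression test would assert that whileAway is null and that onReturn still carries the original prompt.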

A possible future shape might be:

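// Shape sketch only: one runtime record per session id.
SESSION_RUNTIME[sid] = {
  streamId,            // stream that owns the running turn
  status,              // lifecycle state of the turn
  eventSource,         // live transport for this session
  messages,            // session-owned messages, separate from S.messages
  toolCalls,
  approvalState,       // pending approval owned by this session
  clarifyState,        // pending clarify prompt owned by this session
  canonicalSessionId,  // updated on canonical-session rotation
  lastEventAt,         // time of the most recent stream event
};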

But this issue is not proposing a broad rewrite up front. The immediate goal is to confirm the ownership model and use it to guide narrow fixes.
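For reference, a small helper around that shape might look like the following. This is only a sketch: the default values, the "idle" / "running" status strings, and the getRuntime / markRunning / markDone names are assumptions, not a proposed API.

```javascript
// Hypothetical helper: defaults, status strings, and function names are assumptions.
const SESSION_RUNTIME = {};

// Return the runtime record for a session, creating an idle one on first use.
function getRuntime(sid) {
  if (!SESSION_RUNTIME[sid]) {
    SESSION_RUNTIME[sid] = {
      streamId: null,
      status: "idle",
      eventSource: null,
      messages: [],
      toolCalls: [],
      approvalState: null,
      clarifyState: null,
      canonicalSessionId: sid,
      lastEventAt: null,
    };
  }
  return SESSION_RUNTIME[sid];
}

function markRunning(sid, streamId, now = Date.now()) {
  const rt = getRuntime(sid);
  rt.streamId = streamId;
  rt.status = "running";
  rt.lastEventAt = now;
  return rt;
}

// Ignore terminal events for a stream the session no longer owns.
function markDone(sid, streamId) {
  const rt = getRuntime(sid);
  if (rt.streamId !== streamId) return false;
  rt.streamId = null;
  rt.status = "idle";
  return true;
}

markRunning("a", "stream-1", 1000);
```

Keeping every mutation behind sid-keyed functions like these means no caller needs to consult active-pane globals to find the owning session.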

Questions

  1. Does the active-pane vs session-owned runtime model above match the intended architecture?
  2. Are there any current globals that should intentionally remain app-wide rather than session-scoped?
  3. Should the root / page restore a saved running session automatically, or should explicit /session/<id> be the only path that projects into a running session on boot?
  4. Should late subscribers rely only on fan-out from their attach point, or should a compact live-progress snapshot be considered separately from full WAL replay?
  5. Is #1585 (Crash recovery: WAL design + integration for in-flight assistant tokens) the right place to keep crash-recovery / replay design, while this issue tracks active-pane / runtime ownership?
  6. Would maintainers prefer follow-up PRs under one tracking issue, or individual bug reports for each remaining scenario?
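To illustrate the two options in question 4, here is a toy in-memory fan-out where a late subscriber either sees only events from its attach point or first receives a compact snapshot of progress so far. createFanout and its options are hypothetical; the real transport is SSE on the server side, and neither variant here is a WAL or a crash-recovery mechanism.

```javascript
// Hypothetical sketch: createFanout and the withSnapshot option are invented names.
function createFanout() {
  const subscribers = new Set();
  let snapshot = ""; // compact live-progress state, not a WAL
  return {
    subscribe(onEvent, { withSnapshot = false } = {}) {
      // Optionally catch the late subscriber up before live events arrive.
      if (withSnapshot && snapshot) onEvent({ type: "snapshot", text: snapshot });
      subscribers.add(onEvent);
      return () => subscribers.delete(onEvent);
    },
    publish(token) {
      snapshot += token;
      for (const fn of subscribers) fn({ type: "token", text: token });
    },
  };
}

const fanout = createFanout();
const early = [];
const lateFanoutOnly = [];
const lateWithSnapshot = [];

fanout.subscribe((e) => early.push(e.text));
fanout.publish("Hel");
fanout.subscribe((e) => lateFanoutOnly.push(e.text));
fanout.subscribe((e) => lateWithSnapshot.push(e.text), { withSnapshot: true });
fanout.publish("lo");
```

The fan-out-only late subscriber misses the tokens published before it attached, while the snapshot variant can reconstruct the full text without any durable replay log.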

Labels: enhancement, streaming, tracking, ux
