feat(harness): SQLite bindings table for thread resume#12
Conversation
Add harness_bindings table to persist opaque resume cursors across session close. Codex thread IDs survive BEAM restarts, enabling thread/resume without full conversation replay. Key design decisions: - Bindings survive session close (that's the point — resume from closed sessions) - resume_cursor_json is opaque pass-through — harness never parses it - DELETE on any resume failure (before retry) prevents livelock from stale cursors - created_at/updated_at as TEXT (ISO 8601) for consistency with existing schema Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughPersists Codex thread resume cursors as bindings: storage gains binding CRUD and a DB table; Codex session upserts bindings after successful Changes
Sequence DiagramsequenceDiagram
actor "Codex RPC"
participant "Codex Session"
participant "Harness.Storage"
participant "SQLite DB"
"Codex RPC"->>"Codex Session": thread/start or thread/resume (success)
"Codex Session"->>"Codex Session": persist_binding() — encode cursor
"Codex Session"->>"Harness.Storage": upsert_binding(thread_id, provider, cursor_json)
"Harness.Storage"->>"SQLite DB": INSERT ... ON CONFLICT(thread_id) DO UPDATE
"SQLite DB"-->>"Harness.Storage": OK
"Harness.Storage"-->>"Codex Session": OK
alt resume RPC error
"Codex RPC"-->>"Codex Session": thread/resume error
"Codex Session"->>"Harness.Storage": delete_binding(thread_id)
"Harness.Storage"->>"SQLite DB": DELETE FROM harness_bindings WHERE thread_id=...
"SQLite DB"-->>"Harness.Storage": OK
"Harness.Storage"-->>"Codex Session": OK
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/harness/lib/harness/storage.ex`:
- Around line 72-75: Update the `@doc` for get_binding to match the actual return
value (a map) instead of a tuple: state that get_binding(thread_id) returns a
map like %{thread_id: tid, provider: provider, resume_cursor_json: cursor_json};
locate the doc above the get_binding/1 function and do_get_binding/2 to ensure
both descriptions match the implementation and tests expecting atom-keyed map.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 17e78a57-a4ff-4b17-b04d-ae309e221123
📒 Files selected for processing (3)
apps/harness/lib/harness/providers/codex_session.exapps/harness/lib/harness/storage.exapps/harness/test/harness/storage_test.exs
apps/harness/lib/harness/storage.ex
Outdated
| GenServer.call(__MODULE__, {:upsert_binding, thread_id, provider, resume_cursor_json}) | ||
| end | ||
|
|
||
| @doc "Get the binding for a thread. Returns {thread_id, provider, resume_cursor_json} or nil." |
There was a problem hiding this comment.
🟡 get_binding @doc claims tuple return type but implementation returns a map
The @doc on get_binding/1 at apps/harness/lib/harness/storage.ex:72 states Returns {thread_id, provider, resume_cursor_json} or nil, which in Elixir denotes a 3-element tuple. However, the actual implementation at apps/harness/lib/harness/storage.ex:478 returns %{thread_id: tid, provider: provider, resume_cursor_json: cursor_json} — a map. Any caller relying on the documented tuple contract (e.g., pattern matching with {tid, provider, cursor}) would crash with a MatchError at runtime.
| @doc "Get the binding for a thread. Returns {thread_id, provider, resume_cursor_json} or nil." | |
| @doc "Get the binding for a thread. Returns %{thread_id, provider, resume_cursor_json} map or nil." |
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/harness/lib/harness/storage.ex`:
- Around line 271-273: The harness_bindings table and binding operations
currently use thread_id as the sole primary key so one provider can overwrite
another; update the schema to use a composite primary key (thread_id, provider)
and adjust all binding operations (the upsert path and the lookup/delete paths
referenced around lines 451-460 and 474-485) to include provider in
WHERE/DELETE/INSERT/UPSERT logic so bindings are provider-scoped; ensure the
column names (thread_id, provider) and table name (harness_bindings) are used
consistently and any existing queries/functions that fetch or remove bindings
accept and pass the provider parameter.
- Around line 77-79: The docstring for delete_binding is outdated: it says
deletion is only for non-recoverable resume failures but
Harness.Providers.CodexSession.handle_rpc_error/3 deletes bindings for both
recoverable and non-recoverable thread/resume failures. Update the comment(s)
(the `@doc` for delete_binding and the other comment near the second mention) to
state that delete_binding/1 is invoked when a thread resume fails (both
recoverable and non-recoverable cases) and briefly note that callers such as
Harness.Providers.CodexSession.handle_rpc_error/3 may call it for either outcome
so the documentation matches runtime behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b9cd80fe-5ea5-4484-856d-1b8946e2b641
📒 Files selected for processing (1)
apps/harness/lib/harness/storage.ex
| CREATE TABLE IF NOT EXISTS harness_bindings ( | ||
| thread_id TEXT PRIMARY KEY, | ||
| provider TEXT NOT NULL, |
There was a problem hiding this comment.
Binding keying currently breaks multi-provider coexistence.
With thread_id as the sole key (Line 272) and lookup/delete keyed only by thread_id (Lines 474-485), one provider’s upsert will overwrite another provider’s cursor for the same thread. That conflicts with the PR objective to support multi-provider coexistence.
💡 Proposed fix (provider-scoped bindings)
- `@doc` "Get the binding for a thread. Returns %{thread_id, provider, resume_cursor_json} or nil."
- def get_binding(thread_id) do
- GenServer.call(__MODULE__, {:get_binding, thread_id})
+ `@doc` "Get the binding for a thread+provider. Returns %{thread_id, provider, resume_cursor_json} or nil."
+ def get_binding(thread_id, provider) do
+ GenServer.call(__MODULE__, {:get_binding, thread_id, provider})
end
- `@doc` "Delete a binding. Called when resume fails non-recoverably (before fresh start)."
- def delete_binding(thread_id) do
- GenServer.call(__MODULE__, {:delete_binding, thread_id})
+ `@doc` "Delete a binding for a thread+provider."
+ def delete_binding(thread_id, provider) do
+ GenServer.call(__MODULE__, {:delete_binding, thread_id, provider})
end- def handle_call({:get_binding, thread_id}, _from, %{conn: conn} = state) do
- result = do_get_binding(conn, thread_id)
+ def handle_call({:get_binding, thread_id, provider}, _from, %{conn: conn} = state) do
+ result = do_get_binding(conn, thread_id, provider)
{:reply, result, state}
end
- def handle_call({:delete_binding, thread_id}, _from, %{conn: conn} = state) do
- result = do_delete_binding(conn, thread_id)
+ def handle_call({:delete_binding, thread_id, provider}, _from, %{conn: conn} = state) do
+ result = do_delete_binding(conn, thread_id, provider)
{:reply, result, state}
end- CREATE TABLE IF NOT EXISTS harness_bindings (
- thread_id TEXT PRIMARY KEY,
+ CREATE TABLE IF NOT EXISTS harness_bindings (
+ thread_id TEXT NOT NULL,
provider TEXT NOT NULL,
resume_cursor_json TEXT,
created_at TEXT NOT NULL,
- updated_at TEXT NOT NULL
+ updated_at TEXT NOT NULL,
+ PRIMARY KEY (thread_id, provider)
)- ON CONFLICT(thread_id) DO UPDATE SET
+ ON CONFLICT(thread_id, provider) DO UPDATE SET
provider = excluded.provider,
resume_cursor_json = excluded.resume_cursor_json,
updated_at = excluded.updated_at- defp do_get_binding(conn, thread_id) do
- sql = "SELECT thread_id, provider, resume_cursor_json FROM harness_bindings WHERE thread_id = ?1"
+ defp do_get_binding(conn, thread_id, provider) do
+ sql = """
+ SELECT thread_id, provider, resume_cursor_json
+ FROM harness_bindings
+ WHERE thread_id = ?1 AND provider = ?2
+ """
- case query_one(conn, sql, [thread_id]) do
+ case query_one(conn, sql, [thread_id, provider]) do
[tid, provider, cursor_json] -> %{thread_id: tid, provider: provider, resume_cursor_json: cursor_json}
nil -> nil
end
end
- defp do_delete_binding(conn, thread_id) do
- sql = "DELETE FROM harness_bindings WHERE thread_id = ?1"
+ defp do_delete_binding(conn, thread_id, provider) do
+ sql = "DELETE FROM harness_bindings WHERE thread_id = ?1 AND provider = ?2"
...
- :ok = Sqlite3.bind(stmt, [thread_id])
+ :ok = Sqlite3.bind(stmt, [thread_id, provider])Also applies to: 451-460, 474-485
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/harness/lib/harness/storage.ex` around lines 271 - 273, The
harness_bindings table and binding operations currently use thread_id as the
sole primary key so one provider can overwrite another; update the schema to use
a composite primary key (thread_id, provider) and adjust all binding operations
(the upsert path and the lookup/delete paths referenced around lines 451-460 and
474-485) to include provider in WHERE/DELETE/INSERT/UPSERT logic so bindings are
provider-scoped; ensure the column names (thread_id, provider) and table name
(harness_bindings) are used consistently and any existing queries/functions that
fetch or remove bindings accept and pass the provider parameter.
delete_binding is called on all resume failures (recoverable and non-recoverable), not just non-recoverable ones. Update @doc and migration comment to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SessionManager now reads Storage.get_binding on session start and injects resumeCursor into params when the provider matches. Completes the bindings table feature: write path (PR #12) + read path (this). Provider mismatch check prevents injecting a Codex cursor into a Cursor session. Caller-supplied resumeCursor is never overwritten. 7 new tests including multi-session concurrent and cross-provider. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
harness_bindingstable to persist opaque resume cursors (resume_cursor_json) across session close, enabling Codexthread/resumewithout full conversation replayCodexSession: upserts binding afterthread/startorthread/resumesucceeds, deletes stale binding on any resume failure before retry (prevents livelock from permanently invalid cursors)reset!cleanupDesign decisions
resume_cursor_jsonis opaque pass-through — harness never parses it, just round-trips the JSON blob the adapter producesthread/startwhich upserts a fresh binding on successstorage.excreated_atcolumn added for future stale-binding cleanup queriesMeiotic review findings incorporated
reset!updated to includeDELETE FROM harness_bindingsfor test isolationget_binding/1returns provider so callers can verify provider match before using cursorTest plan
mix test test/harness/storage_test.exs— 31/31 pass (8 new binding tests)mix compile --warnings-as-errors— cleanthread/resumeuses persisted cursor🤖 Generated with Claude Code
Summary by CodeRabbit