Skip to content

fix(session): atomic file locking to prevent race conditions in all harnesses#94

Merged
yuh-yang merged 1 commit intoHKUDS:mainfrom
ZJZAC:fix/session-race-condition
Mar 17, 2026
Merged

fix(session): atomic file locking to prevent race conditions in all harnesses#94
yuh-yang merged 1 commit intoHKUDS:mainfrom
ZJZAC:fix/session-race-condition

Conversation

@ZJZAC
Copy link
Contributor

@ZJZAC ZJZAC commented Mar 17, 2026

Summary

Closes #51

The problem

Every harness saved session JSON with bare open("w") + json.dump(). Three compounding bugs:

# Bug Impact
1 open("w") truncates before any lock is acquired Concurrent opens zero the file even while another process holds a lock
2 No locking at all in current code Concurrent json.dump() calls interleave bytes → invalid JSON
3 No flush() before close Data may not be written to disk when another process reads

The fix

def _locked_save_json(path, data, **dump_kwargs) -> None:
    f = open(path, "r+")          # no truncation
    fcntl.flock(f, fcntl.LOCK_EX) # exclusive lock
    f.seek(0); f.truncate()       # truncate INSIDE the lock
    json.dump(data, f, **dump_kwargs)
    f.flush()                     # flush BEFORE releasing
    fcntl.flock(f, fcntl.LOCK_UN) # always released in finally

Key properties:

  • No pre-lock truncation"r+" mode avoids zeroing the file before the lock
  • Truncate inside lock — safe against concurrent readers
  • flush() before unlock — ensures data is written before other processes can read
  • finally block — lock is always released, even on exceptions
  • Windows fallbackfcntl import failure gracefully falls back to unlocked writes

Files changed (12)

  • */agent-harness/cli_anything/*/core/session.py — all 10 harnesses
  • cli-anything-plugin/HARNESS.md — added session locking guidance
  • gimp/agent-harness/cli_anything/gimp/tests/test_core.py — 5 new tests

Design decision: local copies, not a shared package

Per discussion in PR #73, the _locked_save_json function is kept as a local copy in each harness rather than a shared package. This avoids adding distribution complexity for a single utility function.

Test plan

  • All 1,051 existing unit tests pass (10 harnesses verified locally)
  • 5 new tests added: basic save, overwrite, truncation, parent dir creation, 4-thread concurrent stress test
  • All 69 gimp tests pass (64 existing + 5 new)

🤖 Generated with Claude Code

…ll harnesses

Closes HKUDS#51

The root cause: all 10 harnesses used bare `open("w") + json.dump()` for
session saves. `open("w")` truncates the file before any lock can be
acquired, so concurrent writes silently corrupt or lose data.

Fix: add `_locked_save_json()` to each session.py that opens with "r+"
(no truncation), acquires fcntl.LOCK_EX, then truncates inside the lock.
Falls back gracefully on Windows where fcntl is unavailable.

Also:
- Update HARNESS.md with session file locking guidance (per maintainer
  request on PR HKUDS#52)
- Add concurrent write tests (4 threads × 50 writes) to verify the fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yuh-yang
Copy link
Collaborator

A mirror of #73.

I'm merging this one which is cleaner and less noisy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Race conditions in multi-session project state syncing

2 participants