feat(runner): fire on_interval on a real timer + thread-safe finding flush #1179

Open
ocervell wants to merge 1 commit into feat/mongodb-batch-finding-writes from feat/runner-interval-thread

@ocervell

Stacked on #1176 (base = feat/mongodb-batch-finding-writes) so that one stays a clean, deployable batch fix. Review/merge #1176 first.

1. on_interval on a real timer

Today on_interval only fires after the runner produces an item (__iter__ line 560), so it stalls during quiet periods — a task that bursts then goes quiet won't flush/update until the next item or on_end. (It's already time-throttled in run_hooks via backend_update_frequency; the problem is it's only checked on item production.)

Add a daemon interval thread that fires on_interval every backend_update_frequency seconds:

  • created lazily in __iter__ (not __init__) and nulled in __getstate__ → runner stays picklable for Celery, mirroring the monitor thread;
  • stopped in _finalize before the final on_end flush;
  • not started when backend_update_frequency <= 0 (-1 = no time-based backend updates → rely on size cap + on_end).
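
The lifecycle in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `IntervalRunner`, the attributes `_interval_thread`, `_interval_stop` and `quiet_seconds`, and the helper `_start_interval_thread` are hypothetical names, and `produce` stands in for whatever the real runner yields.

```python
import threading
import time


class IntervalRunner:
    """Hypothetical stand-in for the runner, for illustration only."""

    def __init__(self, backend_update_frequency=5, quiet_seconds=0.0):
        self.backend_update_frequency = backend_update_frequency
        self.quiet_seconds = quiet_seconds
        self.interval_calls = 0
        self.ended = False
        # Created lazily in __iter__, so a freshly built runner holds no thread.
        self._interval_thread = None
        self._interval_stop = None

    def on_interval(self):
        self.interval_calls += 1

    def on_end(self):
        self.ended = True

    def produce(self):
        yield "item"
        time.sleep(self.quiet_seconds)  # quiet period: no items produced

    def _interval_loop(self):
        # Event.wait returns False on timeout (fire the hook) and True once set (stop).
        while not self._interval_stop.wait(self.backend_update_frequency):
            self.on_interval()

    def _start_interval_thread(self):
        if self.backend_update_frequency <= 0:
            return  # e.g. -1: no time-based updates, rely on size cap + on_end
        self._interval_stop = threading.Event()
        self._interval_thread = threading.Thread(target=self._interval_loop, daemon=True)
        self._interval_thread.start()

    def _finalize(self):
        # Stop the timer first so the final flush in on_end cannot race with it.
        if self._interval_thread is not None:
            self._interval_stop.set()
            self._interval_thread.join()
            self._interval_thread = None
            self._interval_stop = None
        self.on_end()

    def __iter__(self):
        self._start_interval_thread()
        try:
            yield from self.produce()
        finally:
            self._finalize()

    def __getstate__(self):
        # Threads and events are not picklable, so drop them from the pickled state.
        state = self.__dict__.copy()
        state["_interval_thread"] = None
        state["_interval_stop"] = None
        return state
```

Waiting on a `threading.Event` rather than calling `time.sleep` lets `_finalize` wake the thread immediately instead of waiting out a full interval.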

The per-item on_interval call + the existing throttle are kept (harmless; throttle dedupes item- vs timer-triggered firings).
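
To illustrate why keeping both triggers is harmless, here is a small sketch of a time-based throttle. It assumes the throttle records the time of the last firing and skips calls that arrive sooner than `backend_update_frequency` seconds later; the actual logic in run_hooks may differ. `ThrottledHook`, `maybe_fire` and the injectable `clock` are hypothetical names, and the lock around the check is an addition for the two-thread case, not something this PR states.

```python
import threading
import time


class ThrottledHook:
    """Hypothetical time throttle shared by item- and timer-triggered calls."""

    def __init__(self, frequency, clock=time.monotonic):
        self.frequency = frequency
        self.clock = clock
        self.last_fired = None
        self.fired = 0
        self._lock = threading.Lock()

    def maybe_fire(self):
        # Check-and-update under a lock so two threads cannot both pass the check.
        with self._lock:
            now = self.clock()
            if self.last_fired is not None and now - self.last_fired < self.frequency:
                return False  # too soon since the last firing: skip
            self.last_fired = now
            self.fired += 1
            return True


# Usage with a fake clock: an item-triggered call and a timer-triggered call
# at the same instant result in a single firing.
fake_now = [0.0]
hook = ThrottledHook(frequency=5, clock=lambda: fake_now[0])
first = hook.maybe_fire()    # item-triggered
second = hook.maybe_fire()   # timer-triggered, same instant
fake_now[0] = 5.0
third = hook.maybe_fire()    # a full interval later
```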

2. Thread-safety (your point about the discarded on_interval return)

With flushing now possible from the interval thread, the per-runner findings buffer is touched by two threads (append on on_item / main, flush on on_interval / thread):

  • guard the buffer with a lock; swap it out under the lock, then bulk_write outside the lock;
  • toDict() snapshots its mutable collections (errors/warnings/celery_ids) since the interval thread may read it while the main thread appends.
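
A minimal sketch of the swap-under-lock pattern and the snapshot idea described above. The names `FindingBuffer` and `snapshot_state` are hypothetical, and `bulk_write` here is just a callable passed in by the caller, standing in for the real database write.

```python
import threading


class FindingBuffer:
    """Hypothetical per-runner buffer shared by the main and interval threads."""

    def __init__(self, bulk_write):
        self._lock = threading.Lock()
        self._items = []
        self._bulk_write = bulk_write  # assumed: any callable taking a list

    def append(self, item):
        with self._lock:
            self._items.append(item)

    def flush(self):
        # Swap the buffer out while holding the lock...
        with self._lock:
            batch, self._items = self._items, []
        # ...then do the slow write outside it, so append() is never blocked on I/O.
        if batch:
            self._bulk_write(batch)
        return len(batch)


def snapshot_state(errors, warnings, celery_ids):
    # Copy the mutable collections so a reader on another thread sees a stable view.
    return {
        "errors": list(errors),
        "warnings": list(warnings),
        "celery_ids": list(celery_ids),
    }
```

Because each flush takes ownership of the old list, an item appended during a write lands in the new list and is picked up by the next flush.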

Context invariant (your point 2): update_finding adds all in-memory context to the item synchronously at on_item, before the yield (it mints item._uuid client-side); the batched flush is DB-write-only and never mutates the item, so nothing is yielded missing context even though the flush is deferred / off-thread.

⚠️ Needs your review + local validation

Concurrency on a core runner — please validate against the local MongoDB repro (counts, chaining, no RuntimeError: changed size during iteration under load). Open question for you: the buffer is lock-guarded and toDict reads are snapshotted, but if you want stronger guarantees on all runner-state reads from the thread we could add a runner-level state lock — flagged rather than assumed.

🤖 Generated with Claude Code

…flush

Builds on the batch-finding-writes PR. Two improvements:

1. on_interval ran only when the runner produced an item, so it stalled during
   quiet periods. Add a daemon interval thread that fires on_interval every
   backend_update_frequency seconds. Created lazily in __iter__ (not __init__)
   and nulled in __getstate__ so the runner stays picklable for Celery, like the
   monitor thread; stopped in _finalize before the final on_end flush. Disabled
   when backend_update_frequency <= 0 (-1 = no time-based backend updates).

2. Thread-safety: the findings buffer can now be appended (on_item, main thread)
   and flushed (on_interval, interval thread) concurrently. Guard the buffer with
   a lock and swap it out under the lock before the bulk_write (done outside the
   lock). toDict() now snapshots its mutable collections (errors/warnings/
   celery_ids) since it may be read from the interval thread.

Context invariant preserved: update_finding adds all in-memory context to the
item synchronously at on_item (before the yield); the batched flush is DB-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.
