feat(runner): fire on_interval on a real timer + thread-safe finding flush#1179
Open
ocervell wants to merge 1 commit into
Open
feat(runner): fire on_interval on a real timer + thread-safe finding flush#1179ocervell wants to merge 1 commit into
ocervell wants to merge 1 commit into
Conversation
…flush Builds on the batch-finding-writes PR. Two improvements: 1. on_interval ran only when the runner produced an item, so it stalled during quiet periods. Add a daemon interval thread that fires on_interval every backend_update_frequency seconds. Created lazily in __iter__ (not __init__) and nulled in __getstate__ so the runner stays picklable for Celery, like the monitor thread; stopped in _finalize before the final on_end flush. Disabled when backend_update_frequency <= 0 (-1 = no time-based backend updates). 2. Thread-safety: the findings buffer can now be appended (on_item, main thread) and flushed (on_interval, interval thread) concurrently. Guard the buffer with a lock and swap it out under the lock before the bulk_write (done outside the lock). toDict() now snapshots its mutable collections (errors/warnings/ celery_ids) since it may be read from the interval thread. Context invariant preserved: update_finding adds all in-memory context to the item synchronously at on_item (before the yield); the batched flush is DB-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #1176 (base =
feat/mongodb-batch-finding-writes) so that one stays a clean, deployable batch fix. Review/merge #1176 first.1.
on_intervalon a real timerToday
on_intervalonly fires after the runner produces an item (__iter__line 560), so it stalls during quiet periods — a task that bursts then goes quiet won't flush/update until the next item oron_end. (It's already time-throttled inrun_hooksviabackend_update_frequency; the problem is it's only checked on item production.)Add a daemon interval thread that fires
on_intervaleverybackend_update_frequencyseconds:__iter__(not__init__) and nulled in__getstate__→ runner stays picklable for Celery, mirroring the monitor thread;_finalizebefore the finalon_endflush;backend_update_frequency <= 0(-1= no time-based backend updates → rely on size cap +on_end).The per-item
on_intervalcall + the existing throttle are kept (harmless; throttle dedupes item- vs timer-triggered firings).2. Thread-safety (your point about the discarded
on_intervalreturn)With flushing now possible from the interval thread, the per-runner findings buffer is touched by two threads (append on
on_item/ main, flush onon_interval/ thread):bulk_writeoutside the lock;toDict()snapshots its mutable collections (errors/warnings/celery_ids) since the interval thread may read it while the main thread appends.Context invariant (your point 2):
update_findingadds all in-memory context to the item synchronously aton_item, before the yield (it mintsitem._uuidclient-side); the batched flush is DB-write-only and never mutates the item, so nothing is yielded missing context even though the flush is deferred / off-thread.Concurrency on a core runner — please validate against the local MongoDB repro (counts, chaining, no
RuntimeError: changed size during iterationunder load). Open question for you: the buffer is lock-guarded andtoDictreads are snapshotted, but if you want stronger guarantees on all runner-state reads from the thread we could add a runner-level state lock — flagged rather than assumed.🤖 Generated with Claude Code