fix: shorten cron profile lock for manual runs #1746
Michaelyklam wants to merge 2 commits into nesquena:master
d97c630 to e650de2
Thanks @Michaelyklam. The goal of #1574 (releasing the parent profile lock during long manual cron runs) is exactly right, and the subprocess boundary is the right shape for it. We're deferring this PR from the v0.51.11 release pass because the Opus advisor caught a real correctness blocker in the parent-side queue handling that would manifest on real-world job sizes. Blocker:
Addressed the queue-drain blocker from #1746 (comment) in follow-up commit 65ff8a9. What changed:
Verification:
I kept the multiprocessing context unchanged in this narrow follow-up rather than combining the blocker fix with a spawn migration; the deadlock is fixed by the queue-order change, and a spawn switch would be a separate bootstrap/import-behavior change around the profile-pinned cron module setup.
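The queue-order change can be illustrated with a minimal sketch. This is not the PR's code: `_child`, `run_in_subprocess`, and the default timeout are hypothetical names and values chosen for illustration. It only shows the general drain-then-join pattern, where the parent reads from the `multiprocessing.Queue` before calling `process.join()`, and recovers from `queue.Empty` if the child never produces a result.

```python
import multiprocessing
import queue


def _child(result_queue, size):
    # Hypothetical stand-in for the real cron job: emit one large result.
    result_queue.put("x" * size)


def run_in_subprocess(size, timeout=30.0):
    # "fork" mirrors the context the PR keeps; it is POSIX-only.
    ctx = multiprocessing.get_context("fork")
    result_queue = ctx.Queue()
    process = ctx.Process(target=_child, args=(result_queue, size))
    process.start()
    try:
        # Drain first: a child that has put data on a queue does not exit
        # until that data has been flushed to the underlying pipe, so
        # joining before reading can hang on payloads larger than the
        # pipe buffer.
        result = result_queue.get(timeout=timeout)
    except queue.Empty:
        # Assumed recovery path: no result arrived in time.
        result = None
        if process.is_alive():
            process.terminate()
    # Join only after the queue has been drained (or the child stopped).
    process.join()
    return result
```

Calling `run_in_subprocess(200_000)` exercises a payload well above a typical 64KB pipe buffer, the size range in which the join-before-drain ordering would hang.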
Thanks @Michaelyklam, this shipped in v0.51.12 (commit […]). GitHub didn't auto-close the PR because the merge commit only references the squash-merged stage branch, not your fork's commit directly, so I'm closing it manually for hygiene. Live now on https://get-hermes.ai/ and on existing installs after […]. Release notes: https://github.com/nesquena/hermes-webui/releases/tag/v0.51.12
…-color meta, quote-strip) + test-isolation hardening (nesquena#1746 deferred)

Constituent PRs:
- nesquena#1747 (@Michaelyklam): wait for model catalog before opening picker (closes nesquena#1743)
- nesquena#1748 (@nesquena-hermes): theme-color meta tag for native chrome bridges (nesquena APPROVED)
- nesquena#1750 (@nesquena-hermes): strip surrounding quotes from Add Space path (nesquena APPROVED)

Deferred to v0.51.12:
- nesquena#1746: Opus caught multiprocessing.Queue deadlock pattern (parent process.join() before queue drain hangs on output >64KB pipe buffer). Deferral comment with two specific fix options posted on PR.

Plus 1 in-stage absorbed test-isolation fix:
- test_issue1426 + test_issue1680: skip on detected prefix pollution (prong 2 of test-isolation-flake-recipe). Failure rate ~25% in full suite from sys.modules pollution; standalone always passes.

Tests: 4596 → 4622 passing (+26). 0 regressions. Stably green.

Pre-release verification:
- 3 PRs CI-green individually + rebased onto master
- pytest 4622 passed, 0 failed
- node -c clean on static/ui.js + static/boot.js
- 11/11 browser API endpoints PASS
- Opus advisor: SHIP nesquena#1747/nesquena#1748/nesquena#1750, MUST-FIX block on nesquena#1746

Closes nesquena#1743.
… custom provider routing + session runtime invariants)

Constituent PRs:
- nesquena#1746 (@Michaelyklam): shorten cron profile lock for manual runs (closes nesquena#1574, RETURNS from v0.51.11 deferral with queue-drain blocker fixed)
- nesquena#1752 (@Michaelyklam): route custom provider models dict selections (slice of nesquena#1240 umbrella)
- nesquena#1753 (@Michaelyklam): guard session-owned runtime invariants (refs nesquena#1694)

nesquena#1746 v2 fix: result_queue.get(timeout=...) BEFORE process.join() (drain-then-join), with queue.Empty recovery + 200,000-char regression test. Opus stage-306 verified the fix correct + complete; the prior fork→spawn SHOULD-FIX filed as follow-up issue nesquena#1754 (separate architectural change).

Tests: 4622 → 4632 passing (+10). 0 regressions. Stably green on first try.

Pre-release verification:
- All 3 PRs CI-green individually + rebased onto master with NO conflicts (disjoint files: api/config.py + static/messages.js + api/routes.py)
- pytest 4632 passed, 0 failed
- node -c clean on static/messages.js
- 11/11 browser API endpoints PASS
- Opus advisor: SHIP all 3, 0 MUST-FIX, 1 SHOULD-FIX filed as nesquena#1754

Closes nesquena#1574.
Thinking Path
- … run_job() execution.

What Changed
- … cron.scheduler.run_job() inside the selected profile context.
- … _run_cron_tracked() so it no longer imports/calls run_job() directly in the parent worker thread.

Why It Matters

Verification
- /home/michael/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_issue1574_cron_profile_lock.py tests/test_issue617_cron_profile_selector.py tests/test_cron_run_job_import.py tests/test_scheduled_jobs_profile_isolation.py -q
- git diff --check
- test (3.11), test (3.12), and test (3.13) are passing on commit e650de2c.

Risks / Follow-ups
- … multiprocessing.get_context("fork"), matching the current Linux/self-hosted deployment target. If the WebUI later needs first-class Windows support for this path, this subprocess boundary may need a spawn-compatible entrypoint.

Model Used
Closes #1574
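The fork note under Risks / Follow-ups can be made concrete with a small sketch of what a spawn-compatible setup might involve. This is an assumption-laden illustration, not the PR's code or the change tracked in nesquena#1754: `run_job_entrypoint` and `pick_context` are hypothetical names. It relies on two things: `spawn` needs a module-level target with picklable arguments, and the start method can be chosen at runtime from `multiprocessing.get_all_start_methods()`.

```python
import multiprocessing


def run_job_entrypoint(job_id, profile_name):
    # Hypothetical module-level entrypoint. Under "spawn" the child
    # re-imports this module, so the target must be importable by name
    # and take only picklable arguments (plain strings here).
    return f"{profile_name}:{job_id}"


def pick_context(preferred="fork"):
    # Use the preferred start method where the platform offers it
    # (e.g. "fork" on Linux); otherwise fall back to "spawn", which is
    # available on every platform, including Windows.
    methods = multiprocessing.get_all_start_methods()
    method = preferred if preferred in methods else "spawn"
    return multiprocessing.get_context(method)
```

On a Linux host `pick_context()` would keep returning the fork context, so a helper like this would not change the current behavior there; the import-time differences between fork and spawn would still need separate review.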