GH-2222: Expand dispatcher recovery + add periodic loop (internal/executor/) - In internal/executor/dispatcher.go: (a) Add StaleRunningThreshold, StaleQueuedThreshold, and StaleRecoveryInterval fields to DispatcherConfig (keeping StaleTaskDuration as backwards-compat alias for StaleRunningThreshold, defaults: 30min running, 5min queued, 5min interval). (b) Rewrite recoverStaleTasks() to recover both running and queued orphans, marking them failed (not re-queued #2224
Closed
alekspetrov wants to merge 1 commit into main from
- Add `GetStaleQueuedExecutions` to memory store for orphaned queued tasks
- Expand `DispatcherConfig` with `StaleQueuedDuration` (1h) and `StaleRecoveryInterval` (5m)
- Rewrite `recoverStaleTasks()` to handle both running and queued orphans, marking them as "failed" instead of re-queuing (re-queuing without a worker just recreates the orphan)
- Add `runStaleRecoveryLoop` goroutine launched from `Start(ctx)`
- Change `Start()` to accept `context.Context`, update callers in main.go
- Add summary log on every recovery pass (even when 0) for GH-2213 diagnosability
- Add tests: `TestRecoverStaleTasks_QueuedAndRunning`, `TestRecoverStaleTasks_RespectsThresholds`, `TestRunStaleRecoveryLoop_Periodic`, `TestQueueTask_AfterRecovery`
Merge conflict detected. Auto-rebase failed; closing PR so the issue can be re-executed from updated main.
Summary
Automated PR created by Pilot for task GH-2222.
Closes #2222
Changes
re-queuing without a worker just recreates the orphan). (c) Change `Start()` to accept `context.Context` and launch a `runStaleRecoveryLoop` goroutine that ticks every `StaleRecoveryInterval`. (d) Add summary log line `"stale recovery complete, reset N tasks"` on every pass (even when 0) for diagnosability of the GH-2213 survival mystery. (e) Update `Start()` callers if signature changes. (f) Add tests: `TestRecoverStaleTasks_QueuedAndRunning`, `TestRecoverStaleTasks_RespectsThresholds`, `TestRunStaleRecoveryLoop_Periodic`, `TestQueueTask_AfterRecovery` in `internal/executor/dispatcher_test.go`.