
GH-2222: Expand dispatcher recovery + add periodic loop (internal/executor/) - In internal/executor/dispatcher.go: (a) Add StaleRunningThreshold, StaleQueuedThreshold, and StaleRecoveryInterval fields to DispatcherConfig (keeping StaleTaskDuration as a backwards-compatible alias for StaleRunningThreshold; defaults: 30min running, 5min queued, 5min interval). (b) Rewrite recoverStaleTasks() to recover both running and queued orphans, marking them failed (not re-queued… #2224

Closed
alekspetrov wants to merge 1 commit into main from pilot/GH-2222


@alekspetrov
Collaborator

Summary

Automated PR created by Pilot for task GH-2222.

Closes #2222

Changes

…re-queuing without a worker just recreates the orphan).
(c) Change `Start()` to accept `context.Context` and launch a `runStaleRecoveryLoop` goroutine that ticks every `StaleRecoveryInterval`.
(d) Add a summary log line, "stale recovery complete, reset N tasks", on every pass (even when N is 0) to help diagnose the GH-2213 survival mystery.
(e) Update `Start()` callers if the signature changes.
(f) Add tests in `internal/executor/dispatcher_test.go`: `TestRecoverStaleTasks_QueuedAndRunning`, `TestRecoverStaleTasks_RespectsThresholds`, `TestRunStaleRecoveryLoop_Periodic`, `TestQueueTask_AfterRecovery`.

- Add `GetStaleQueuedExecutions` to memory store for orphaned queued tasks
- Expand `DispatcherConfig` with `StaleQueuedDuration` (default 1h) and `StaleRecoveryInterval` (default 5m) fields
- Rewrite `recoverStaleTasks()` to handle both running and queued orphans,
  marking them as "failed" instead of re-queuing (re-queuing without a worker
  just recreates the orphan)
- Add `runStaleRecoveryLoop` goroutine launched from `Start(ctx)`
- Change `Start()` to accept `context.Context` and update its callers in `main.go`
- Add a summary log line on every recovery pass (even when zero tasks are recovered) for GH-2213 diagnosability
- Add tests: `TestRecoverStaleTasks_QueuedAndRunning`, `TestRecoverStaleTasks_RespectsThresholds`,
  `TestRunStaleRecoveryLoop_Periodic`, `TestQueueTask_AfterRecovery`
@alekspetrov
Collaborator Author

Merge conflict detected and auto-rebase failed. Closing this PR so the issue can be re-executed from the updated main.

@alekspetrov closed this Apr 7, 2026
@alekspetrov deleted the pilot/GH-2222 branch April 7, 2026 11:11

Development

Successfully merging this pull request may close these issues.

Expand dispatcher recovery + add periodic loop (internal/executor/) - In `i...