fix(delegation-watchdog): resolve merge conflicts and address review feedback#456
fix(delegation-watchdog): resolve merge conflicts and address review feedback#456
Conversation
|
Bosun CI signal: Bosun-created PR currently has failing checks.
|
There was a problem hiding this comment.
Pull request overview
This PR introduces a “delegation watchdog” concept intended to detect/recover from stalled delegated workflows (especially non-task delegations), and updates workflow templates/tests to support the new behavior and Bosun-created PR detection.
Changes:
- Added delegation watchdog defaults/config plumbing in
action.run_agent, task lifecycle variables, and GitHub templates. - Added engine-side watchdog-based decisioning during interrupted-run recovery (
resumeInterruptedRuns()). - Updated/added tests and mocks around templates and delegation recovery behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| workflow/workflow-nodes/actions.mjs | Adds delegation watchdog config and retry loop to action.run_agent (but appears not wired into the main node registration path). |
| workflow/workflow-engine.mjs | Adds env-backed defaults + watchdog-based retry/exhaust decision during interrupted-run recovery. |
| workflow-templates/task-lifecycle.mjs | Adds watchdog variables + passes them into agentPhase(...) config. |
| workflow-templates/github.mjs | Expands “Bosun-created PR” detection to include body marker/phrase checks. |
| tests/workflow-templates.test.mjs | Loosens template-variable “is referenced” detection logic. |
| tests/workflow-templates-e2e.test.mjs | Extends child_process mock to include execFileSync. |
| tests/workflow-task-lifecycle.test.mjs | Adds an assertion test for newly installed watchdog defaults. |
| tests/workflow-run-history-ui-regression.test.mjs | Updates UI regression expectations for run-history labels/copy. |
| tests/workflow-engine.test.mjs | Adds engine watchdog recovery tests and action.run_agent stall recovery tests (currently contains a syntax-breaking stray block). |
Comments suppressed due to low confidence (1)
workflow/workflow-nodes/actions.mjs:322
workflow/workflow-nodes/actions.mjsdoes not appear to be imported/used anywhere in the repo (tests and runtime importworkflow/workflow-nodes.mjs). As a result, the new delegation watchdog behavior and schema here likely won't take effect, and the new watchdog tests will still run against the implementation inworkflow/workflow-nodes.mjs. Please mirror these changes intoworkflow/workflow-nodes.mjs(or wire this module into the registration path) so there's a single authoritativeaction.run_agentimplementation.
delegationWatchdogTimeoutMs: { type: "number", default: 300000, description: "Stall threshold for delegated non-task workflows in ms" },
delegationWatchdogMaxRecoveries: { type: "number", default: 1, description: "Maximum watchdog recovery retries for delegated workflows" },
},
required: ["prompt"],
},
async execute(node, ctx, engine) {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| agentPhase("run-agent-plan", "Agent Plan", | ||
| "{{_taskPrompt}}\n\nExecution phase: planning. Produce a concrete implementation plan and identify required tests. Do not make code changes in this phase.", | ||
| {}, { x: 200, y: 1740 }), | ||
| { delegationWatchdogTimeoutMs: "{{delegationWatchdogTimeoutMs}}", delegationWatchdogMaxRecoveries: "{{delegationWatchdogMaxRecoveries}}" }, { x: 200, y: 1740 }), | ||
|
|
||
| // ── Execute agent (phase 2: tests-first) ──────────────────────────── | ||
| agentPhase("run-agent-tests", "Agent Tests", | ||
| "{{_taskPrompt}}\n\nExecution phase: tests. Write or update tests first for the target behavior, then validate failures/pass criteria before implementation changes.", | ||
| {}, { x: 200, y: 1545 }), | ||
| { delegationWatchdogTimeoutMs: "{{delegationWatchdogTimeoutMs}}", delegationWatchdogMaxRecoveries: "{{delegationWatchdogMaxRecoveries}}" }, { x: 200, y: 1545 }), | ||
|
|
||
| // ── Execute agent (phase 3: implementation + verification) ────────── | ||
| agentPhase("run-agent-implement", "Agent Implement", | ||
| "{{_taskPrompt}}\n\nExecution phase: implementation. Complete implementation after tests exist, run required verification (tests/lint/build), then commit, push, and create/update PR.", | ||
| {}, { x: 200, y: 1610 }), | ||
| { delegationWatchdogTimeoutMs: "{{delegationWatchdogTimeoutMs}}", delegationWatchdogMaxRecoveries: "{{delegationWatchdogMaxRecoveries}}" }, { x: 200, y: 1610 }), |
There was a problem hiding this comment.
Workflow template changes here will affect the generated demo defaults bundle. The repo has a sync test that expects ui/demo-defaults.js and site/ui/demo-defaults.js to match the output of node tools/generate-demo-defaults.mjs, so these files likely need to be regenerated and committed as part of this PR.
tests/workflow-templates.test.mjs
Outdated
| const used = strings.some((text) => | ||
| text.includes(`{{${key}}}`) || | ||
| text.includes(`$data?.${key}`) || | ||
| text.includes(`$data.${key}`), | ||
| text.includes(`$data.${key}`) || | ||
| text.includes(`?.${key}`) || | ||
| text.includes(`.${key}`), |
There was a problem hiding this comment.
Expanding the “variable is referenced” check to text.includes(?.${key}) / text.includes(.${key}) makes the test prone to false positives (e.g. a variable named id, sdk, timeoutMs will match many non-variable property accesses or JSON snippets). This weakens the guardrail and can let genuinely unused template variables slip through. Prefer a more targeted pattern (e.g. regex for {{key}} or $data access paths that end in the key) rather than any .${key} substring.
| const mockEngine = { | ||
| list: vi.fn().mockReturnValue([ | ||
| { | ||
| id: "wf-backend", | ||
| name: "Backend Agent", | ||
| enabled: true, |
There was a problem hiding this comment.
This block leaves a stray const mockEngine = { ... } outside of any it(...)/function scope, which will cause a syntax/runtime error and prevent the test suite from running. Remove the orphaned block or wrap it in the intended test case (and ensure braces/parentheses are balanced).
workflow/workflow-engine.mjs
Outdated
| Math.trunc(Number( | ||
| watchdog.maxRecoveries | ||
| ?? watchdog.delegationWatchdogMaxRecoveries | ||
| ?? detail?.data?.delegationWatchdogMaxRecoveries | ||
| ?? DEFAULT_DELEGATION_WATCHDOG_MAX_RECOVERIES, | ||
| ) || DEFAULT_DELEGATION_WATCHDOG_MAX_RECOVERIES), | ||
| ); | ||
| const recoveryAttempts = Math.max( | ||
| 0, | ||
| Math.trunc(Number(watchdog.recoveryAttempts) | ||
| || (watchdog.recoveryAttempted === true ? 1 : 0)), |
There was a problem hiding this comment.
buildDelegationWatchdogDecision() currently uses Number(...) || DEFAULT... when parsing maxRecoveries and recoveryAttempts. That means explicit 0 values are ignored (e.g. setting maxRecoveries: 0 will be treated as the default, not "disable retries"). Use nullish coalescing / explicit Number.isFinite checks so 0 is honored and only null/undefined/NaN fall back to defaults.
| Math.trunc(Number( | |
| watchdog.maxRecoveries | |
| ?? watchdog.delegationWatchdogMaxRecoveries | |
| ?? detail?.data?.delegationWatchdogMaxRecoveries | |
| ?? DEFAULT_DELEGATION_WATCHDOG_MAX_RECOVERIES, | |
| ) || DEFAULT_DELEGATION_WATCHDOG_MAX_RECOVERIES), | |
| ); | |
| const recoveryAttempts = Math.max( | |
| 0, | |
| Math.trunc(Number(watchdog.recoveryAttempts) | |
| || (watchdog.recoveryAttempted === true ? 1 : 0)), | |
| Math.trunc( | |
| (() => { | |
| const raw = watchdog.maxRecoveries | |
| ?? watchdog.delegationWatchdogMaxRecoveries | |
| ?? detail?.data?.delegationWatchdogMaxRecoveries | |
| ?? DEFAULT_DELEGATION_WATCHDOG_MAX_RECOVERIES; | |
| const parsed = Number(raw); | |
| return Number.isFinite(parsed) | |
| ? parsed | |
| : DEFAULT_DELEGATION_WATCHDOG_MAX_RECOVERIES; | |
| })(), | |
| ), | |
| ); | |
| const recoveryAttempts = Math.max( | |
| 0, | |
| Math.trunc( | |
| (() => { | |
| const parsed = Number(watchdog.recoveryAttempts); | |
| if (Number.isFinite(parsed)) return parsed; | |
| return watchdog.recoveryAttempted === true ? 1 : 0; | |
| })(), | |
| ), |
|
Bosun PR classification: Bosun-created.
|
Co-authored-by: bosun-ve[bot] <262908237+bosun-ve[bot]@users.noreply.github.com>
Co-authored-by: bosun-ve[bot] <262908237+bosun-ve[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@copilot This branch has conflicts that must be resolved tests/workflow-engine.test.mjs Resolve the above conflicts |
Co-authored-by: jaeko44 <9289791+jaeko44@users.noreply.github.com>
... Conflicts in all 4 files have been resolved in commit 39059d6. In addition to the conflict resolution, the following reviewer feedback was addressed: |
…nding selects Agent-Logs-Url: https://github.com/virtengine/bosun/sessions/64fcb827-1494-487f-b973-13675f742421 Co-authored-by: jaeko44 <9289791+jaeko44@users.noreply.github.com>
"Select source port"placeholder option to source port select inui/tabs/workflows.js"Select target port"placeholder option to target port select inui/tabs/workflows.jsworkflow-run-history-ui-regression.test.mjsandworkflow-heavy-runner-integration.test.mjs)📱 Kick off Copilot coding agent tasks wherever you are with GitHub Mobile, available on iOS and Android.