[harness][subagents] Timeout result can be overwritten by late completion #2553

@hetaoBackend

Problem

The subagent timeout handler marks a shared result holder as timed out and sets the cancellation state, but the still-running execution path can later overwrite the same result holder with a completed or failed status, after the timeout has already been reported.

Impact

Subagent status can become nondeterministic around timeouts. A caller may observe a timeout while internal state or later callbacks report completion, making orchestration and telemetry inconsistent.

Suggested Fix

Treat timeout/completed/failed as terminal transitions guarded by a lock or atomic check. Once timeout is recorded, late completion should not overwrite the result holder. The running task should observe cancellation and exit without changing terminal state.
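
One way to picture the guard is the minimal sketch below. It is not the code in `executor.py`: the names `Status`, `ResultHolder`, and `try_finish` are hypothetical and chosen only for illustration. The sketch shows a lock-protected "first terminal transition wins" check.

```python
import threading
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):  # hypothetical status enum, not the project's own
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"


TERMINAL = {Status.COMPLETED, Status.FAILED, Status.TIMED_OUT}


@dataclass
class ResultHolder:  # hypothetical stand-in for the shared result holder
    status: Status = Status.RUNNING
    output: Optional[str] = None
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def try_finish(self, status: Status, output: Optional[str] = None) -> bool:
        """Record a terminal status only if none has been recorded yet."""
        if status not in TERMINAL:
            raise ValueError(f"{status} is not a terminal status")
        with self._lock:
            if self.status in TERMINAL:
                # A terminal state already won; leave status and output untouched.
                return False
            self.status = status
            self.output = output
            return True
```

Under this sketch, both the timeout handler and the execution path would call `try_finish`, and whichever loses the race gets `False` back and changes nothing. The running task can use that return value as one more signal to exit quietly.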

Tests

  • Force a subagent execution to exceed its timeout but complete shortly after cancellation.
  • Verify the final recorded status remains timed out.
  • Verify late completion does not overwrite output or status.
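
The scenario in these test cases could be reproduced roughly as follows. This is a self-contained sketch using plain threads rather than the harness's real executor; `run_with_timeout` and the dictionary-based state are assumptions made for illustration, and the worker deliberately ignores cancellation so that it attempts a late completion.

```python
import threading
import time


def run_with_timeout(work_seconds: float, timeout_seconds: float) -> dict:
    """Run a simulated subagent and return its final recorded state."""
    lock = threading.Lock()
    state = {"status": "running", "output": None}

    def finish(status: str, output=None) -> bool:
        # Only the first terminal transition is allowed to take effect.
        with lock:
            if state["status"] != "running":
                return False
            state["status"] = status
            state["output"] = output
            return True

    def worker() -> None:
        # Simulated work that does not observe cancellation in time.
        time.sleep(work_seconds)
        finish("completed", "late output")

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join(timeout_seconds)
    if thread.is_alive():
        finish("timed_out")  # the timeout is recorded first
    thread.join()  # wait for the worker's late completion attempt
    return state


if __name__ == "__main__":
    final = run_with_timeout(work_seconds=0.2, timeout_seconds=0.05)
    assert final["status"] == "timed_out"
    assert final["output"] is None
```

A real test would drive the actual subagent executor instead. The assertions are the part that carries over: after the late completion attempt, both status and output still reflect the timeout.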

References

  • backend/packages/harness/deerflow/subagents/executor.py:434
  • backend/packages/harness/deerflow/subagents/executor.py:576
