OpenCode/Trellis dispatch can still mis-detect child completion in long sessions #148

@1508324011

Summary

PR #147 improves the OpenCode dispatch prompt by switching child phase examples from background TaskOutput polling to synchronous Task(..., run_in_background: false) calls.

However, after additional testing, this appears to be only a mitigation, not a root fix.

In long conversations, the dispatch agent can still mis-handle child completion, and the underlying TaskOutput / background_output behavior can still report completion or error before the child session has truly finished.

Why PR #147 is not sufficient by itself

The PR changes Trellis prompts/templates, but it does not change the underlying runtime semantics of background task completion.

We now have evidence that:

  1. In real usage, background_output / TaskOutput can disagree with session_info / session_read

    • the surface-level background status may return early or report an error
    • the underlying child session can still show continued progress at that point
  2. This is especially visible in longer sessions

    • even if the dispatch prompt says "use synchronous child calls", the agent may still drift away from that instruction in long conversations
  3. There is at least one environment-level signal that the runtime/wrapper can override the requested mode

    • we observed a system message equivalent to:
      • run_in_background=false was automatically converted to background mode for reliability monitoring
    • this suggests the effective execution semantics may be controlled below the Trellis prompt layer
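
The disagreement described in item 1 can be expressed as a small check. This is only a sketch: the status strings, the Observation type, and premature_completion are hypothetical names chosen for illustration, not the runtime's actual API or vocabulary.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed status vocabularies; the real runtime's values may differ.
SURFACE_DONE = {"completed", "error"}
SESSION_TERMINAL = {"completed", "error", "cancelled"}


@dataclass(frozen=True)
class Observation:
    """One paired reading of the two views of a child task (hypothetical type)."""
    task_id: str
    surface_status: str  # what background_output / TaskOutput reported
    session_status: str  # what session_info / session_read reported


def premature_completion(obs: Observation) -> Optional[str]:
    """Describe the mismatch when the surface view is done but the session is not.

    Returns None when the two views are consistent.
    """
    surface_done = obs.surface_status in SURFACE_DONE
    session_done = obs.session_status in SESSION_TERMINAL
    if surface_done and not session_done:
        return (
            f"task {obs.task_id}: surface={obs.surface_status!r} "
            f"but session={obs.session_status!r}"
        )
    return None
```

Logging the result of a check like this alongside each dispatch step would make it easier to attach concrete reproductions to this issue.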

Current understanding

Based on inspection so far:

  • Trellis seems to control prompts, context injection, and workflow guidance
  • Trellis does not appear to implement the actual TaskOutput / background_output completion state machine
  • Therefore, PR #147 ("fix(opencode): make dispatch wait for child tasks") should be treated as a patch that reduces one bad dispatch pattern, not as a complete fix for early child-completion reporting

Likely root-cause area

The real fix likely needs to happen in one of these lower layers:

  1. the OpenCode background task runtime / completion state machine
  2. an agent wrapper that reports task completion before the child session reaches a terminal state
  3. possibly oh-my-openagent or another environment-level integration layer if it wraps subagent execution

What would be a real fix

A real fix would make background task completion depend on the child session reaching a terminal state, rather than only the wrapper task returning.

In other words, TaskOutput / background_output should not report completed/error while the associated child session is still actively progressing.
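
A minimal sketch of that rule, assuming the runtime can see both the wrapper task's status and the child session's status. The Status enum and effective_status are hypothetical names for illustration; the actual state machine may have more states.

```python
from enum import Enum


class Status(Enum):
    # Assumed, simplified state set; not the runtime's actual vocabulary.
    RUNNING = "running"
    COMPLETED = "completed"
    ERROR = "error"


TERMINAL = {Status.COMPLETED, Status.ERROR}


def effective_status(wrapper_status: Status, session_status: Status) -> Status:
    """Status that TaskOutput / background_output would report under the proposed rule.

    A terminal status is reported only once the child session itself is
    terminal, no matter what the wrapper task has already returned.
    """
    if session_status not in TERMINAL:
        # The child is still progressing: never report completed/error yet.
        return Status.RUNNING
    if wrapper_status not in TERMINAL:
        # The session finished but the wrapper has not returned yet.
        return Status.RUNNING
    if Status.ERROR in (wrapper_status, session_status):
        return Status.ERROR
    return Status.COMPLETED
```

With this gate, a wrapper that returns early while the child session is still running yields a running status rather than completed or error, which is the behavior the dispatch agent needs.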

Suggested next step

  • keep PR #147 ("fix(opencode): make dispatch wait for child tasks") as a mitigation / workflow improvement
  • investigate the lower-level runtime semantics for child-session completion
  • if needed, document that Trellis alone cannot guarantee correct completion detection until the runtime layer is fixed
