Timeouts should restart hung Python subprocess/worker #40

@bbopen

Description

Summary

A JS-side timeout only rejects the pending call. The Python process/worker keeps running it, which can wedge the bridge and cause cascading timeouts or protocol errors from late responses.

Failure mode

  • Long-running call times out; subsequent calls continue to hit the same hung process.
  • Late responses can still arrive and be parsed; once the timed-out request ID is no longer tracked, they cause protocol errors.
  • Mixed concurrency: a slow call can starve a fast call on the same subprocess, so the fast call times out even though it would complete quickly (serial bridge limitation).
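The serial-bridge failure can be modelled in a few lines. This is a hypothetical simulation, not NodeBridge code: `SerialBridgeModel`, `call(workMs)` and `lateResponses` are illustrative names. It assumes one worker that processes requests strictly in order and a JS-side timeout that only rejects the caller's promise. No Python process is involved; work is simulated with `setTimeout`.

```typescript
// Hypothetical model of the current behaviour: a serial worker plus a JS-side
// timeout that rejects the caller's promise but leaves the job running.
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(() => r(), ms));

class SerialBridgeModel {
  private readonly timeoutMs: number;
  private tail: Promise<void> = Promise.resolve(); // end of the worker's queue
  private nextId = 1;
  readonly lateResponses: number[] = []; // IDs that finished after their caller gave up

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  call(workMs: number): Promise<number> {
    const id = this.nextId++;
    // Serial worker: this job starts only after every earlier job finishes,
    // including jobs whose callers have already timed out.
    const done = this.tail.then(() => sleep(workMs)).then(() => id);
    this.tail = done.then(() => undefined);

    return new Promise<number>((resolve, reject) => {
      let timedOut = false;
      const timer = setTimeout(() => {
        timedOut = true;
        // Only the caller's promise is rejected; the job keeps occupying the worker.
        reject(new Error(`request ${id} timed out`));
      }, this.timeoutMs);
      done.then((value) => {
        if (timedOut) {
          this.lateResponses.push(value); // nobody is waiting for this response
        } else {
          clearTimeout(timer);
          resolve(value);
        }
      });
    });
  }
}
```

With a 100 ms timeout, `call(300)` followed by `call(10)` rejects both: the fast job cannot start until the slow one finishes at about 300 ms, and both results later land in `lateResponses` with no caller waiting for them.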

Proposed fix

  • On timeout, terminate and respawn the Python process (NodeBridge) or worker (OptimizedNodeBridge).
  • Clear pending/timed-out IDs for the terminated process.
  • Consider a per-request cancellation protocol, if feasible.
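A minimal sketch of restart-on-timeout, assuming a bridge that matches responses to requests by numeric ID. `RestartingBridge`, `WorkerHandle`, `SpawnFn` and `fakeSerialWorker` are hypothetical names. In the real bridges, `spawn` would wrap the Python subprocess (NodeBridge) or worker (OptimizedNodeBridge); the fake here just sleeps for `payload` milliseconds so the sketch runs without Python. Each spawned worker gets a generation number, so a response still in flight from a killed worker is dropped rather than matched against the new worker's pending table.

```typescript
// Hypothetical handle over a Python subprocess or worker.
interface WorkerHandle {
  send(id: number, payload: unknown): void;
  kill(): void;
}
type SpawnFn = (onResponse: (id: number, result: unknown) => void) => WorkerHandle;

type Pending = {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
  timer: ReturnType<typeof setTimeout>;
};

class RestartingBridge {
  private readonly spawn: SpawnFn;
  private readonly timeoutMs: number;
  private worker: WorkerHandle;
  private generation = 0;
  private nextId = 1;
  private pending = new Map<number, Pending>();
  restarts = 0;

  constructor(spawn: SpawnFn, timeoutMs: number) {
    this.spawn = spawn;
    this.timeoutMs = timeoutMs;
    this.worker = this.start();
  }

  private start(): WorkerHandle {
    const gen = ++this.generation;
    return this.spawn((id, result) => {
      if (gen !== this.generation) return; // late response from a replaced worker
      const entry = this.pending.get(id);
      if (!entry) return; // unknown or already-settled ID
      this.pending.delete(id);
      clearTimeout(entry.timer);
      entry.resolve(result);
    });
  }

  call(payload: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise<unknown>((resolve, reject) => {
      const timer = setTimeout(() => this.onTimeout(id), this.timeoutMs);
      this.pending.set(id, { resolve, reject, timer });
      this.worker.send(id, payload);
    });
  }

  private onTimeout(id: number): void {
    if (!this.pending.has(id)) return;
    // Terminate the hung worker and fail every request it still owed.
    this.worker.kill();
    this.pending.forEach((entry, pendingId) => {
      clearTimeout(entry.timer);
      entry.reject(new Error(pendingId === id
        ? `request ${pendingId} timed out`
        : `request ${pendingId} aborted: worker restarted`));
    });
    this.pending.clear();
    this.restarts++;
    this.worker = this.start(); // respawn with a new generation
  }
}

// Stand-in for a serial Python worker: each payload is a sleep duration in ms,
// handled one at a time. After kill() it never responds.
const fakeSerialWorker: SpawnFn = (onResponse) => {
  let tail = Promise.resolve();
  let killed = false;
  return {
    send(id, payload) {
      tail = tail.then(() => new Promise<void>((done) => setTimeout(() => {
        if (!killed) onResponse(id, `slept ${payload}ms`);
        done();
      }, Number(payload))));
    },
    kill() {
      killed = true;
    },
  };
};
```

This design rejects every request still owed by the killed worker, which matches "clear pending/timed-out IDs", but a fast call queued behind the slow one then fails with a retryable error. Re-sending those requests to the fresh worker would let the fast call succeed, as the concurrent acceptance criterion asks, but is only safe for idempotent calls; routing the fast call to a separate worker avoids the question.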

Acceptance criteria

  • A call that sleeps beyond timeout triggers a process/worker restart.
  • Next call after timeout succeeds without protocol errors.
  • Integration test uses a slow Python function and asserts recovery.
  • Concurrent test: one slow call and one fast call; slow call times out, fast call still succeeds (after restart or via separate worker).

Metadata

Labels

  • area:runtime-node: Node runtime bridge
  • bug: Something isn't working
  • priority:p1: Priority P1 (high)
