feat(pty): implement lifecycle PTY management for task execution#1364
feat(pty): implement lifecycle PTY management for task execution#1364yashdev9274 wants to merge 1 commit intogeneralaction:mainfrom
Conversation
|
@yashdev9274 is attempting to deploy a commit to the General Action Team on Vercel. A member of the Team first needs to authorize it. |
Greptile SummaryThis PR replaces the raw While the overall direction is sound, there are three critical logic bugs introduced by the change:
Additionally, tests were significantly weakened: Confidence Score: 1/5
|
| Filename | Overview |
|---|---|
| src/main/services/TaskLifecycleService.ts | Integrates PTY-based process management for run/lifecycle phases, but introduces critical bugs: runTeardown checks the old runProcesses map instead of runPtys so it never waits for PTY runs to stop; lifecyclePtys is never cleaned up in clearTask/shutdown; and spawnWithFallback drops the 'error' event handler causing potential infinite hangs. |
| src/main/services/ptyManager.ts | Adds LifecyclePtyHandle interface and startLifecyclePty function; functionally reasonable but spawns the shell with -il (interactive + login) flags which source profile scripts and can contaminate lifecycle output and slow startup. |
| src/test/main/TaskLifecycleService.test.ts | Sets EMDASH_DISABLE_PTY=1 to force fallback paths in tests; several tests were significantly weakened or hollowed out (e.g., the setup error/exit test and clearTask in-flight test no longer validate their originally intended behavior). |
Sequence Diagram
sequenceDiagram
participant Caller
participant TaskLifecycleService
participant ptyManager
participant FallbackSpawn
Caller->>TaskLifecycleService: startRun(taskId, ...)
TaskLifecycleService->>ptyManager: startLifecyclePty({ id, command, cwd, env })
alt PTY available
ptyManager-->>TaskLifecycleService: LifecyclePtyHandle
TaskLifecycleService->>TaskLifecycleService: runPtys.set(taskId, handle)
else PTY unavailable (EMDASH_DISABLE_PTY=1 or error)
ptyManager--xTaskLifecycleService: throws Error
TaskLifecycleService->>FallbackSpawn: spawnWithFallback(id, script, cwd, env)
FallbackSpawn-->>TaskLifecycleService: LifecyclePtyHandle (wraps ChildProcess)
TaskLifecycleService->>TaskLifecycleService: runPtys.set(taskId, handle)
end
TaskLifecycleService-->>Caller: { ok: true }
Note over TaskLifecycleService: handle.onData → emitLifecycleEvent 'line'
Note over TaskLifecycleService: handle.onExit → update state, emitLifecycleEvent 'exit'
Caller->>TaskLifecycleService: stopRun(taskId)
TaskLifecycleService->>TaskLifecycleService: stopIntents.add(taskId)
TaskLifecycleService->>TaskLifecycleService: ptyHandle.kill()
TaskLifecycleService->>TaskLifecycleService: state.run = idle (EAGER - before actual exit)
TaskLifecycleService-->>Caller: { ok: true }
Caller->>TaskLifecycleService: runTeardown(taskId, ...)
TaskLifecycleService->>TaskLifecycleService: existingRun = runProcesses.get(taskId)
Note over TaskLifecycleService: ⚠️ BUG: always undefined when PTY is used
TaskLifecycleService->>TaskLifecycleService: runFinite(..., 'teardown') — no wait for PTY exit
Comments Outside Diff (3)
-
src/main/services/TaskLifecycleService.ts, line 520-538 (link)runTeardownnever waits for PTY-managed run to stoprunTeardownstill guards against an active run process by checkingthis.runProcesses.get(taskId)(line 520), but whenstartLifecyclePtysucceeds the run handle is stored inthis.runPtys, notrunProcesses. SoexistingRunis alwaysundefinedwhen a PTY-based run is active, and teardown immediately proceeds while the run process is still alive. This is a race condition that can corrupt state and cause conflicts between the concurrent run and teardown scripts.The fix requires also checking
runPtysand awaiting its exit before continuing. BecauseLifecyclePtyHandlehas noonce('exit', …)API, a small helper (e.g., wrappingonExitin aPromise) is needed:// Ensure PTY-managed run is stopped before teardown starts. const existingPty = this.runPtys.get(taskId); if (existingPty) { this.stopRun(taskId); await new Promise<void>((resolve) => { const timer = setTimeout(() => { log.warn('Timed out waiting for run PTY to exit before teardown', { taskId }); resolve(); }, 10_000); existingPty.onExit(() => { clearTimeout(timer); resolve(); }); }); } // (existing runProcesses block stays for the fallback path) const existingRun = this.runProcesses.get(taskId); if (existingRun) { … }
-
src/main/services/TaskLifecycleService.ts, line 559-601 (link)lifecyclePtysnot cleaned up inclearTaskorshutdownlifecyclePtysis populated for every setup/teardown phase invocation (this.lifecyclePtys.set(ptyId, ptyHandle)), and the entry is removed in theonExitcallback. However,clearTask()andshutdown()iterate overrunPtysandrunProcessesbut never touchlifecyclePtys.If a task is cleared (or the service shuts down) while a setup or teardown PTY is in-flight, those PTY processes will keep running in the background, leaking both OS-level PTY file descriptors and the
ptysmap entries inptyManager.ts. -
src/test/main/TaskLifecycleService.test.ts, line 265-278 (link)Test no longer exercises the intended behavior
The original test
'clearTask stops in-flight setup/teardown processes'verified that callingclearTaskwhile a setup process was in-flight would kill the running child and clean upfiniteProcesses. The new version only callsgetState(which creates empty state) and thenclearTask, and asserts that the state is deleted — which is trivially true and exercises no meaningful lifecycle behavior.The key regression is that the test no longer validates that in-flight PTY (or fallback process) resources are actually killed when
clearTaskis called mid-execution. ThelifecyclePtyscleanup gap noted elsewhere makes this especially risky to leave untested.
Last reviewed commit: 84cef58
| }); | ||
| } | ||
|
|
||
| private spawnWithFallback( | ||
| id: string, | ||
| script: string, | ||
| cwd: string, | ||
| env: NodeJS.ProcessEnv | ||
| ): LifecyclePtyHandle { | ||
| const child = spawn(script, { | ||
| cwd, | ||
| shell: true, | ||
| env, | ||
| detached: true, | ||
| }); | ||
| this.trackFiniteProcess(id, child); | ||
| const dataCallbacks: ((data: string) => void)[] = []; | ||
| const exitCallbacks: ((exitCode: number | null, signal: string | null) => void)[] = []; | ||
|
|
||
| const onData = (buf: Buffer) => { | ||
| const line = buf.toString(); | ||
| for (const cb of dataCallbacks) { | ||
| cb(line); | ||
| } | ||
| }; | ||
| child.stdout?.on('data', onData); | ||
| child.stderr?.on('data', onData); | ||
|
|
||
| child.on('exit', (code) => { | ||
| for (const cb of exitCallbacks) { | ||
| cb(code, null); | ||
| } | ||
| }); | ||
|
|
||
| return { | ||
| onData: (cb) => dataCallbacks.push(cb), | ||
| onExit: (cb) => exitCallbacks.push(cb), | ||
| kill: () => { | ||
| this.killProcessTree(child, 'SIGTERM'); | ||
| }, | ||
| }; | ||
| } | ||
|
|
||
| async runSetup( | ||
| taskId: string, | ||
| taskPath: string, |
There was a problem hiding this comment.
spawnWithFallback drops the 'error' event handler
The original spawn path attached a child.on('error', …) listener that called finish({ ok: false, … }) when the process failed to start (e.g., ENOENT, permission denied). spawnWithFallback omits this listener entirely.
If the shell process fails to spawn (or crashes at the OS level before writing to stdout/stderr), neither exit nor data will ever fire, and the finish callback inside runFinite will never be called. This leaves the task's phase status stuck at 'running' indefinitely and causes setupInflight / teardownInflight promises to hang forever.
| stopRun(taskId: string): LifecycleResult { | ||
| const ptyHandle = this.runPtys.get(taskId); | ||
| if (ptyHandle) { | ||
| this.stopIntents.add(taskId); | ||
| try { | ||
| ptyHandle.kill(); | ||
| const cur = this.ensureState(taskId); | ||
| cur.run = { | ||
| ...cur.run, | ||
| status: 'idle', | ||
| finishedAt: this.nowIso(), | ||
| pid: null, | ||
| error: null, | ||
| }; | ||
| return { ok: true }; | ||
| } catch (error) { | ||
| this.stopIntents.delete(taskId); | ||
| const message = error instanceof Error ? error.message : String(error); | ||
| log.warn('Failed to stop run PTY', { taskId, error: message }); | ||
| return { ok: false, error: message }; | ||
| } |
There was a problem hiding this comment.
Premature state update in stopRun before PTY actually exits
When ptyHandle.kill() is called (line 457), stopRun immediately updates cur.run.status to 'idle' and returns { ok: true } — before the PTY process has actually terminated. PTY processes can take a moment to die after SIGTERM is delivered (or may ignore it entirely).
Meanwhile, the onExit callback registered in startRun will fire later with wasStopped = true and overwrite status with 'idle' again (harmless, but inconsistent). More importantly, if kill() silently swallows an error and the process does not exit, the state remains 'idle' while the process is still running.
Consider updating state in onExit (where it already happens correctly via the wasStopped flag) rather than eagerly here, and removing the early state mutation from stopRun.
| const proc = pty.spawn(defaultShell, ['-ilc', command], { | ||
| name: 'xterm-256color', | ||
| cols: 120, | ||
| rows: 32, | ||
| cwd: cwd || os.homedir(), | ||
| env: useEnv, | ||
| }); |
There was a problem hiding this comment.
Interactive shell flags may interfere with script output
The PTY is spawned as pty.spawn(defaultShell, ['-ilc', command], …). The -i (interactive) and -l (login) flags cause shells like bash/zsh to source profile files (~/.bashrc, ~/.zshrc, /etc/profile, etc.). This can:
- Produce extra output (prompts, greeting messages,
echoin profile files) that contaminates the data delivered toonDatacallbacks and appears as lifecycle log lines. - Significantly slow startup on systems with heavyweight profile scripts.
- Alter environment variables (e.g.,
PATH) in unexpected ways that differ from theenvobject explicitly passed tostartLifecyclePty.
For non-interactive lifecycle scripts, -c alone (or --norc --noprofile combined with -c) is typically sufficient.
|
hey @arnestrickmann do review this ! |
- Added interface and function to manage PTY processes. - Integrated PTY handling into for improved task execution and lifecycle event management. - Updated tests to reflect changes in PTY management and ensure proper handling of lifecycle events.
|
hey @arnestrickmann any update on this PR |
Summary
Fixes
Fixes #1304
Type of change
Mandatory Tasks
Checklist
pnpm run format)pnpm run lint)