feat(pty): implement lifecycle PTY management for task execution #1364

Open
yashdev9274 wants to merge 1 commit into generalaction:main from yashdev9274:feat-yd-3

Conversation

@yashdev9274
Contributor

Summary

  • Added the LifecyclePtyHandle interface and the startLifecyclePty function to manage PTY processes.
  • Integrated PTY handling into TaskLifecycleService for improved task execution and lifecycle event management.
  • Updated tests to reflect changes in PTY management and ensure proper handling of lifecycle events.

Fixes

Fixes #1304

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • This change requires a documentation update

Mandatory Tasks

  • I have self-reviewed the code
  • A decent size PR without self-review might be rejected

Checklist

  • I have read the contributing guide
  • My code follows the style guidelines of this project (pnpm run format)
  • I have commented my code, particularly in hard-to-understand areas
  • I have checked if my PR needs changes to the documentation
  • I have checked if my changes generate no new warnings (pnpm run lint)
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked if new and existing unit tests pass locally with my changes

@vercel

vercel bot commented Mar 9, 2026

@yashdev9274 is attempting to deploy a commit to the General Action Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps
Contributor

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR replaces the raw child_process.spawn calls inside TaskLifecycleService with a PTY-backed execution model (startLifecyclePty in ptyManager.ts), falling back to a wrapped ChildProcess when PTY support is disabled or unavailable. The intent is to give lifecycle scripts a proper terminal environment (colours, interactive tools, etc.).

While the overall direction is sound, there are three critical logic bugs introduced by the change:

  • runTeardown ignores PTY-managed runs — it still checks this.runProcesses.get(taskId) (the old map) instead of this.runPtys. When a PTY run is active, teardown launches immediately without waiting for the run to stop, causing a race condition between concurrent run and teardown scripts.
  • lifecyclePtys is never killed on clearTask or shutdown — setup/teardown PTY handles are stored in lifecyclePtys but that map is not iterated or cleared in either cleanup path, leaking OS PTY file descriptors on task removal or service shutdown.
  • spawnWithFallback has no 'error' event handler — the original spawn paths caught OS-level spawn errors (e.g., ENOENT) and resolved the phase promise as failed. The fallback omits this, so a spawn error silently leaves the phase status as 'running' forever and the corresponding setupInflight/teardownInflight promise never settles.

Additionally, tests were significantly weakened: EMDASH_DISABLE_PTY=1 forces all tests through the fallback path (never exercising PTY code), and two previously meaningful tests ('keeps setup failed when child emits error and exit' and 'clearTask stops in-flight setup/teardown processes') were hollowed out to the point of no longer testing their original scenarios.

Confidence Score: 1/5

  • Not safe to merge — contains critical race conditions and resource leaks that were not present before this change.
  • Three logic-level bugs: runTeardown never waits for PTY-managed runs to finish (race condition), lifecyclePtys leaks PTY resources on task clear/shutdown, and the fallback spawn path can hang forever due to missing error handling. Tests were also significantly weakened, leaving the new PTY code path entirely untested.
  • Primary attention required on src/main/services/TaskLifecycleService.ts (teardown wait logic, lifecyclePtys cleanup, spawnWithFallback error handling). src/test/main/TaskLifecycleService.test.ts needs the hollowed-out tests restored.

Important Files Changed

Filename | Overview
--- | ---
src/main/services/TaskLifecycleService.ts | Integrates PTY-based process management for run/lifecycle phases, but introduces critical bugs: runTeardown checks the old runProcesses map instead of runPtys so it never waits for PTY runs to stop; lifecyclePtys is never cleaned up in clearTask/shutdown; and spawnWithFallback drops the 'error' event handler causing potential infinite hangs.
src/main/services/ptyManager.ts | Adds LifecyclePtyHandle interface and startLifecyclePty function; functionally reasonable but spawns the shell with -il (interactive + login) flags which source profile scripts and can contaminate lifecycle output and slow startup.
src/test/main/TaskLifecycleService.test.ts | Sets EMDASH_DISABLE_PTY=1 to force fallback paths in tests; several tests were significantly weakened or hollowed out (e.g., the setup error/exit test and clearTask in-flight test no longer validate their originally intended behavior).

Sequence Diagram

sequenceDiagram
    participant Caller
    participant TaskLifecycleService
    participant ptyManager
    participant FallbackSpawn

    Caller->>TaskLifecycleService: startRun(taskId, ...)
    TaskLifecycleService->>ptyManager: startLifecyclePty({ id, command, cwd, env })
    alt PTY available
        ptyManager-->>TaskLifecycleService: LifecyclePtyHandle
        TaskLifecycleService->>TaskLifecycleService: runPtys.set(taskId, handle)
    else PTY unavailable (EMDASH_DISABLE_PTY=1 or error)
        ptyManager--xTaskLifecycleService: throws Error
        TaskLifecycleService->>FallbackSpawn: spawnWithFallback(id, script, cwd, env)
        FallbackSpawn-->>TaskLifecycleService: LifecyclePtyHandle (wraps ChildProcess)
        TaskLifecycleService->>TaskLifecycleService: runPtys.set(taskId, handle)
    end
    TaskLifecycleService-->>Caller: { ok: true }

    Note over TaskLifecycleService: handle.onData → emitLifecycleEvent 'line'
    Note over TaskLifecycleService: handle.onExit → update state, emitLifecycleEvent 'exit'

    Caller->>TaskLifecycleService: stopRun(taskId)
    TaskLifecycleService->>TaskLifecycleService: stopIntents.add(taskId)
    TaskLifecycleService->>TaskLifecycleService: ptyHandle.kill()
    TaskLifecycleService->>TaskLifecycleService: state.run = idle (EAGER - before actual exit)
    TaskLifecycleService-->>Caller: { ok: true }

    Caller->>TaskLifecycleService: runTeardown(taskId, ...)
    TaskLifecycleService->>TaskLifecycleService: existingRun = runProcesses.get(taskId)
    Note over TaskLifecycleService: ⚠️ BUG: always undefined when PTY is used
    TaskLifecycleService->>TaskLifecycleService: runFinite(..., 'teardown') — no wait for PTY exit

Comments Outside Diff (3)

  1. src/main/services/TaskLifecycleService.ts, lines 520-538

    runTeardown never waits for PTY-managed run to stop

    runTeardown still guards against an active run process by checking this.runProcesses.get(taskId) (line 520), but when startLifecyclePty succeeds the run handle is stored in this.runPtys, not runProcesses. So existingRun is always undefined when a PTY-based run is active, and teardown immediately proceeds while the run process is still alive. This is a race condition that can corrupt state and cause conflicts between the concurrent run and teardown scripts.

    The fix requires also checking runPtys and awaiting its exit before continuing. Because LifecyclePtyHandle has no once('exit', …) API, a small helper (e.g., wrapping onExit in a Promise) is needed:

    // Ensure PTY-managed run is stopped before teardown starts.
    const existingPty = this.runPtys.get(taskId);
    if (existingPty) {
      this.stopRun(taskId);
      await new Promise<void>((resolve) => {
        const timer = setTimeout(() => {
          log.warn('Timed out waiting for run PTY to exit before teardown', { taskId });
          resolve();
        }, 10_000);
        existingPty.onExit(() => {
          clearTimeout(timer);
          resolve();
        });
      });
    }
    
    // (existing runProcesses block stays for the fallback path)
    const existingRun = this.runProcesses.get(taskId);
    if (existingRun) {  }
  2. src/main/services/TaskLifecycleService.ts, lines 559-601

    lifecyclePtys not cleaned up in clearTask or shutdown

    lifecyclePtys is populated for every setup/teardown phase invocation (this.lifecyclePtys.set(ptyId, ptyHandle)), and the entry is removed in the onExit callback. However, clearTask() and shutdown() iterate over runPtys and runProcesses but never touch lifecyclePtys.

    If a task is cleared (or the service shuts down) while a setup or teardown PTY is in-flight, those PTY processes will keep running in the background, leaking both OS-level PTY file descriptors and the ptys map entries in ptyManager.ts.

  3. src/test/main/TaskLifecycleService.test.ts, lines 265-278

    Test no longer exercises the intended behavior

    The original test 'clearTask stops in-flight setup/teardown processes' verified that calling clearTask while a setup process was in-flight would kill the running child and clean up finiteProcesses. The new version only calls getState (which creates empty state) and then clearTask, and asserts that the state is deleted — which is trivially true and exercises no meaningful lifecycle behavior.

    The key regression is that the test no longer validates that in-flight PTY (or fallback process) resources are actually killed when clearTask is called mid-execution. The lifecyclePtys cleanup gap noted elsewhere makes this especially risky to leave untested.
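One way to close the lifecyclePtys gap from comment 2, and to make the behaviour from comment 3 testable again, might be to keep in-flight setup/teardown handles in a small registry that records which task owns each handle. The following is only a sketch under assumptions, not the PR's code: LifecyclePtyRegistry, killForTask, killAll, and count are hypothetical names, and the only thing borrowed from the snippets quoted in this review is that a handle exposes kill().

```typescript
// Hypothetical sketch: every name here is illustrative and not taken from the PR.
interface KillableHandle {
  kill(): void; // the only member of LifecyclePtyHandle this sketch relies on
}

class LifecyclePtyRegistry {
  private readonly entries = new Map<string, { taskId: string; handle: KillableHandle }>();

  // Record an in-flight setup/teardown handle together with its owning task.
  set(ptyId: string, taskId: string, handle: KillableHandle): void {
    this.entries.set(ptyId, { taskId, handle });
  }

  // Intended to be called from the handle's onExit callback.
  delete(ptyId: string): void {
    this.entries.delete(ptyId);
  }

  // Kill and forget every handle owned by one task (the clearTask path).
  killForTask(taskId: string): number {
    return this.killWhere((owner) => owner === taskId);
  }

  // Kill and forget every handle (the shutdown path).
  killAll(): number {
    return this.killWhere(() => true);
  }

  count(): number {
    return this.entries.size;
  }

  // Returns how many kill() calls completed without throwing.
  private killWhere(matches: (taskId: string) => boolean): number {
    let killed = 0;
    this.entries.forEach((entry, ptyId) => {
      if (!matches(entry.taskId)) return;
      try {
        entry.handle.kill();
        killed += 1;
      } catch {
        // A failed kill must not stop cleanup of the remaining handles.
      }
      this.entries.delete(ptyId); // deleting the visited key during Map.forEach is safe
    });
    return killed;
  }
}
```

With something like this in place, clearTask could call killForTask(taskId) next to its existing runPtys cleanup and shutdown could call killAll(). A restored 'clearTask stops in-flight setup/teardown processes' test could then register a fake handle and assert that its kill() was invoked.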

Last reviewed commit: 84cef58

Comment on lines 281 to 326
    });
  }

  private spawnWithFallback(
    id: string,
    script: string,
    cwd: string,
    env: NodeJS.ProcessEnv
  ): LifecyclePtyHandle {
    const child = spawn(script, {
      cwd,
      shell: true,
      env,
      detached: true,
    });
    this.trackFiniteProcess(id, child);
    const dataCallbacks: ((data: string) => void)[] = [];
    const exitCallbacks: ((exitCode: number | null, signal: string | null) => void)[] = [];

    const onData = (buf: Buffer) => {
      const line = buf.toString();
      for (const cb of dataCallbacks) {
        cb(line);
      }
    };
    child.stdout?.on('data', onData);
    child.stderr?.on('data', onData);

    child.on('exit', (code) => {
      for (const cb of exitCallbacks) {
        cb(code, null);
      }
    });

    return {
      onData: (cb) => dataCallbacks.push(cb),
      onExit: (cb) => exitCallbacks.push(cb),
      kill: () => {
        this.killProcessTree(child, 'SIGTERM');
      },
    };
  }

  async runSetup(
    taskId: string,
    taskPath: string,

spawnWithFallback drops the 'error' event handler

The original spawn path attached a child.on('error', …) listener that called finish({ ok: false, … }) when the process failed to start (e.g., ENOENT, permission denied). spawnWithFallback omits this listener entirely.

If the shell process fails to spawn (or crashes at the OS level before writing to stdout/stderr), neither exit nor data will ever fire, and the finish callback inside runFinite will never be called. This leaves the task's phase status stuck at 'running' indefinitely and causes setupInflight / teardownInflight promises to hang forever.
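A minimal sketch of how the fallback wrapper could route a spawn failure into the same path as a normal exit is shown here. It is an illustration under assumptions rather than the project's implementation: wrapChildExit and SettlingHandle are hypothetical names, the optional third error argument on the callback is an addition that the quoted LifecyclePtyHandle does not have, and a plain EventEmitter stands in for ChildProcess (which extends EventEmitter). Node's documentation notes that 'exit' may or may not fire after 'error', so the sketch delivers only the first of the two.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical callback shape: the optional `error` argument is an assumption of this sketch.
type ExitCallback = (exitCode: number | null, signal: string | null, error?: Error) => void;

interface SettlingHandle {
  onExit(cb: ExitCallback): void;
}

// Wraps a ChildProcess-like emitter so that both a spawn failure ('error') and a
// normal 'exit' reach the onExit callbacks, and only the first event is delivered.
function wrapChildExit(child: EventEmitter): SettlingHandle {
  const exitCallbacks: ExitCallback[] = [];
  let settled = false;

  const settle = (code: number | null, signal: string | null, error?: Error): void => {
    if (settled) return; // ignore a late 'exit' after 'error', or the reverse
    settled = true;
    for (const cb of exitCallbacks) {
      cb(code, signal, error);
    }
  };

  // Without this listener, a failed spawn would never notify the phase logic.
  child.on('error', (err: Error) => settle(null, null, err));
  child.on('exit', (code: number | null, signal: string | null) => settle(code, signal));

  return {
    onExit: (cb) => {
      exitCallbacks.push(cb);
    },
  };
}
```

In the lifecycle service, the onExit consumer could treat a non-undefined error the same way the original child.on('error', …) listener did, by finishing the phase as failed instead of leaving it at 'running'.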

Comment on lines 452 to +472
  stopRun(taskId: string): LifecycleResult {
    const ptyHandle = this.runPtys.get(taskId);
    if (ptyHandle) {
      this.stopIntents.add(taskId);
      try {
        ptyHandle.kill();
        const cur = this.ensureState(taskId);
        cur.run = {
          ...cur.run,
          status: 'idle',
          finishedAt: this.nowIso(),
          pid: null,
          error: null,
        };
        return { ok: true };
      } catch (error) {
        this.stopIntents.delete(taskId);
        const message = error instanceof Error ? error.message : String(error);
        log.warn('Failed to stop run PTY', { taskId, error: message });
        return { ok: false, error: message };
      }

Premature state update in stopRun before PTY actually exits

When ptyHandle.kill() is called (line 457), stopRun immediately updates cur.run.status to 'idle' and returns { ok: true } — before the PTY process has actually terminated. PTY processes can take a moment to die after SIGTERM is delivered (or may ignore it entirely).

Meanwhile, the onExit callback registered in startRun will fire later with wasStopped = true and overwrite status with 'idle' again (harmless, but inconsistent). More importantly, if kill() silently swallows an error and the process does not exit, the state remains 'idle' while the process is still running.

Consider updating state in onExit (where it already happens correctly via the wasStopped flag) rather than eagerly here, and removing the early state mutation from stopRun.
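The suggestion above can be sketched as a small tracker in which stopping only records intent and requests the kill, while the status changes solely inside the exit callback. This is a simplified illustration under assumptions: RunTracker, RunHandle, and the three-value RunStatus are hypothetical stand-ins, and the real service's state shape (finishedAt, pid, error) is left out.

```typescript
// Hypothetical sketch: simplified stand-ins for the service's run state and PTY handle.
type RunStatus = 'idle' | 'running' | 'failed';

interface RunHandle {
  onExit(cb: (exitCode: number | null) => void): void;
  kill(): void;
}

class RunTracker {
  private readonly handles = new Map<string, RunHandle>();
  private readonly stopIntents = new Set<string>();
  private readonly statuses = new Map<string, RunStatus>();

  start(taskId: string, handle: RunHandle): void {
    this.handles.set(taskId, handle);
    this.statuses.set(taskId, 'running');
    handle.onExit((exitCode) => {
      // State is updated only here, once the process has actually exited.
      const wasStopped = this.stopIntents.delete(taskId);
      this.handles.delete(taskId);
      this.statuses.set(taskId, wasStopped || exitCode === 0 ? 'idle' : 'failed');
    });
  }

  // Records the stop intent and requests termination; does not touch the status.
  stop(taskId: string): boolean {
    const handle = this.handles.get(taskId);
    if (!handle) return false;
    this.stopIntents.add(taskId);
    try {
      handle.kill();
      return true;
    } catch {
      this.stopIntents.delete(taskId);
      return false;
    }
  }

  getStatus(taskId: string): RunStatus {
    return this.statuses.get(taskId) ?? 'idle';
  }
}
```

With this shape, a process that ignores the signal stays visibly 'running' after stop() instead of being reported as 'idle' while it is still alive.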

Comment on lines +1635 to +1641
  const proc = pty.spawn(defaultShell, ['-ilc', command], {
    name: 'xterm-256color',
    cols: 120,
    rows: 32,
    cwd: cwd || os.homedir(),
    env: useEnv,
  });

Interactive shell flags may interfere with script output

The PTY is spawned as pty.spawn(defaultShell, ['-ilc', command], …). The -i (interactive) and -l (login) flags cause shells like bash/zsh to source profile files (~/.bashrc, ~/.zshrc, /etc/profile, etc.). This can:

  1. Produce extra output (prompts, greeting messages, echo in profile files) that contaminates the data delivered to onData callbacks and appears as lifecycle log lines.
  2. Significantly slow startup on systems with heavyweight profile scripts.
  3. Alter environment variables (e.g., PATH) in unexpected ways that differ from the env object explicitly passed to startLifecyclePty.

For non-interactive lifecycle scripts, -c alone (or, in bash, --noprofile --norc combined with -c) is typically sufficient.
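As a rough illustration of the alternative, the argument list could be chosen per shell instead of always passing -ilc. The helper below is a hypothetical sketch (lifecycleShellArgs is not part of the PR): it uses bash's --noprofile and --norc options, zsh's -f option (which skips the user's startup files), and falls back to plain -c for any other shell.

```typescript
// Hypothetical helper: builds non-interactive, non-login shell arguments for a lifecycle command.
function lifecycleShellArgs(shellPath: string, command: string): string[] {
  // Take the last path segment, accepting both POSIX and Windows separators.
  const name = shellPath.split(/[\\/]/).pop() ?? shellPath;
  if (name === 'bash') {
    // bash requires long options to come before single-character options.
    return ['--noprofile', '--norc', '-c', command];
  }
  if (name === 'zsh') {
    return ['-f', '-c', command];
  }
  // Assumption: any other shell at least understands -c.
  return ['-c', command];
}
```

The result would be passed where ['-ilc', command] is passed to pty.spawn in the quoted diff. One trade-off to weigh: if the lifecycle scripts rely on PATH entries that are only set in the user's profile files, dropping the login flag may change which tools are found, so the env passed to startLifecyclePty would need to carry them.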

@yashdev9274
Contributor Author

hey @arnestrickmann do review this !

@yashdev9274
Contributor Author

hey @arnestrickmann any update on this PR


Development

Successfully merging this pull request may close these issues.

Bug: Lifecycle scripts use child_process.spawn instead of PTY, breaking interactive tools
