Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What should run_process do if it's cancelled but it can't kill the child process? #1209

Closed
dj-foxxy opened this issue Sep 13, 2019 · 5 comments · Fixed by #1525
Closed

What should run_process do if it's cancelled but it can't kill the child process? #1209

dj-foxxy opened this issue Sep 13, 2019 · 5 comments · Fixed by #1525

Comments

@dj-foxxy
Copy link

The async context manager tries to SIGKILL the subprocess, which causes a PermissionError if that subprocess is sudo

import trio


async def main():
    with trio.move_on_after(1):
        await trio.run_process(['sleep', '2'])
    print('Cancelled 1')

    with trio.move_on_after(1):
        await trio.run_process(['sudo', 'sleep', '2'])
    print('Cancelled 2')


trio.run(main)
@dj-foxxy dj-foxxy changed the title Operation not permitted with sudo subprocess cancellation Operation not permitted with sudo subprocess cancelletion Sep 13, 2019
@dj-foxxy dj-foxxy changed the title Operation not permitted with sudo subprocess cancelletion Operation not permitted with sudo subprocess cancellation Sep 13, 2019
@njsmith njsmith changed the title Operation not permitted with sudo subprocess cancellation What should run_process do if it's cancelled but it can't kill the child process? Sep 14, 2019
@njsmith
Copy link
Member

njsmith commented Sep 14, 2019

So the problem here is that the sudo process runs as root, so a regular user doesn't have the necessary permissions to kill it. As far as I know, there's nothing we can do about the underlying problem – we just can't kill it. (As a user you can generally kill it by hitting control-C, but to take advantage of that we'd have to allocate a pseudo-tty and set that as the subprocess's controlling terminal... and sudo tries to read its password from its controlling terminal, so if we inserted our own controlling terminal it would break the password prompt.)

So I guess run_process/open_process need to cope with the fact that their kill might fail, and we should decide what they should do when this happens.

Options:

  • crash with a long traceback, like we do now
  • just wait for the process to exit normally (which might take an arbitrarily long time)
  • abandon the process, letting it continue running in the background

None of these are super appealing, but I think they're the only options we have? Anyone have any thoughts on why one would be better than another?

@dj-foxxy What would you expect to happen, given that what you'd actually expect (the process being killed) isn't possible?

@njsmith
Copy link
Member

njsmith commented Sep 14, 2019

In case anyone's curious, here's what the traceback looks like currently:

Traceback (most recent call last):
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 489, in run_process
    await proc.wait()
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 725, in __aexit__
    raise combined_error_from_nursery
trio.MultiError: Cancelled(), Cancelled()

Details of embedded exception 1:

  Traceback (most recent call last):
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 489, in run_process
      await proc.wait()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 217, in wait
      await wait_child_exiting(self)
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess_platform/waitid.py", line 108, in wait_child_exiting
      await process._wait_for_exit_data.wait()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_sync.py", line 83, in wait
      await self._lot.park()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_parking_lot.py", line 137, in park
      await _core.wait_task_rescheduled(abort_fn)
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_traps.py", line 165, in wait_task_rescheduled
      return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/outcome/_sync.py", line 111, in unwrap
      raise captured_error
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 1061, in raise_cancel
      raise Cancelled._create()
  trio.Cancelled: Cancelled

Details of embedded exception 2:

  Traceback (most recent call last):
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 489, in run_process
      await proc.wait()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 217, in wait
      await wait_child_exiting(self)
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess_platform/waitid.py", line 108, in wait_child_exiting
      await process._wait_for_exit_data.wait()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_sync.py", line 83, in wait
      await self._lot.park()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_parking_lot.py", line 137, in park
      await _core.wait_task_rescheduled(abort_fn)
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_traps.py", line 165, in wait_task_rescheduled
      return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/outcome/_sync.py", line 111, in unwrap
      raise captured_error
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 1061, in raise_cancel
      raise Cancelled._create()
  trio.Cancelled: Cancelled

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 845, in _nested_child_finished
      await checkpoint()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 2029, in checkpoint
      await _core.wait_task_rescheduled(lambda _: _core.Abort.SUCCEEDED)
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_traps.py", line 165, in wait_task_rescheduled
      return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/outcome/_sync.py", line 111, in unwrap
      raise captured_error
    File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 1061, in raise_cancel
      raise Cancelled._create()
  trio.Cancelled: Cancelled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 201, in aclose
    await self.wait()
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 215, in wait
    async with self._wait_lock:
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_sync.py", line 100, in __aenter__
    await self.acquire()
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_sync.py", line 574, in acquire
    await trio.hazmat.checkpoint_if_cancelled()
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 2052, in checkpoint_if_cancelled
    await _core.checkpoint()
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 2029, in checkpoint
    await _core.wait_task_rescheduled(lambda _: _core.Abort.SUCCEEDED)
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_traps.py", line 165, in wait_task_rescheduled
    return (await _async_yield(WaitTaskRescheduled(abort_func))).unwrap()
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/outcome/_sync.py", line 111, in unwrap
    raise captured_error
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 1061, in raise_cancel
    raise Cancelled._create()
trio.Cancelled: Cancelled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/fdsa.py", line 16, in <module>
    trio.run(main)
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_core/_run.py", line 1783, in run
    raise runner.main_task_outcome.error
  File "/tmp/fdsa.py", line 12, in main
    await trio.run_process(['sudo', 'sleep', '2'])
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 489, in run_process
    await proc.wait()
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_abc.py", line 264, in __aexit__
    await self.aclose()
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 204, in aclose
    self.kill()
  File "/home/njs/.user-python3.7/lib/python3.7/site-packages/trio/_subprocess.py", line 263, in kill
    self._proc.kill()
  File "/usr/lib/python3.7/subprocess.py", line 1756, in kill
    self.send_signal(signal.SIGKILL)
  File "/usr/lib/python3.7/subprocess.py", line 1746, in send_signal
    os.kill(self.pid, sig)
PermissionError: [Errno 1] Operation not permitted

@dj-foxxy
Copy link
Author

@njsmith In general, I think the current behaviour is the correct default: either cleanup happened or raise. However, the number of specific use cases is so broad that I think this needs to be a customisation point.

For example, in code that promoted this issue, I'd really like clean up to use another subprocess to call sudo kill -9 $PID. In other use case, I'd like to send SIGTERM, wait, then SIGKILL.

This might already been possible, I'm too new the project to know for sure.

Thanks.

@njsmith
Copy link
Member

njsmith commented May 16, 2020

So the policy proposed in #1525 is:

  • run_process always waits for the child to exit
  • If kill() fails (like in the case that started this thread), then we print a RuntimeWarning on the console
  • You can define a custom cancellation policy (e.g. sudo kill -9 {proc.pid}) using the deliver_cancel= argument

And if you'd prefer to abandon the child on cancellation, you can't do that with run_process, but you can always fall back on open_process instead.

@dj-foxxy
Copy link
Author

@njsmith Look like what I need, this is going to simplify so much code!

Had a look at the PR. Is it worth mentioning in the delivery_cancel= docs that the function is guaranteed to be shielded from further cancellation (at least I think it is from looking a the PR)?

Looking forward to removing all my hacks! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants