Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signal handling omni issue #2473

Open
casey opened this issue Nov 19, 2024 · 8 comments
Open

Signal handling omni issue #2473

casey opened this issue Nov 19, 2024 · 8 comments
Labels

Comments

@casey
Copy link
Owner

casey commented Nov 19, 2024

There are a bunch of issues related to how just handles signals. I've been trying to figure out what they are, so I created this issue to collect all the information in one place, and summarize what's going on.

Way back in 2018, in #302, @bheisler opened an issue that just was not waiting for child processes to exit on CTRL-C (which sends SIGINT). This would cause them to be orphaned, and continue writing output to the terminal, and have to be cleaned up manually.

In response to #302, I merged #345, which made just ignore SIGINT, SIGHUP, and SIGTERM while running commands. SIGINT is CTRL-C, SIGHUP is sent when you close the terminal ("hup" is for "hangup"), and SIGTERM is the signal which the kill command sends by default.

This gave just the behavior it has now. When running a long-running command, for example sleep 60, if you hit CTRL-C, just will ignore the generated SIGINT signal, but, depending on your terminal or setup, hitting CTRL-C sends SIGINT to all processes in the foreground process group, so sleep 60 will receive SIGINT which will terminate it, just will see it was terminated with a signal, and just will return with an error reporting that it terminated with a signal.

There appear to be a number of issues with this behavior, reported in:

  • Running Firestore Emulator From Just Prevents Control C #1558 by @LLBlumire. Here, just is used to run gcloud emulators firestore start. When gcloud emulators firestore start is run on its own, CTRL-C stops it, but when run under just, CTRL-C does nothing.

    I'm not entirely sure what's going on here. When you do sleep 60 under just, CTRL-C gets sent to just, which ignores it, but also gets sent to sleep 60, which terminates it.

    What's going on in this case? Why isn't CTRL-C getting sent to the gcloud command? When CTRL-C is hit, it is the system (some combination of the kernel, shell, terminal) which arranges to send SIGINT to the relevant processes, so I'm not sure what just could be doing to interfere with this.

  • Trap signals? #1560 by @OJFord. I'm not sure what's going on here, because the signals mentioned are SIGUSR1, and SIGINFO (generated by CTRL-T), which I don't think just touches at all. However, @stefanchrobot mentions that his app terminates when it receives SIGTERM, but nothing happens if the same command is run under just.

  • SIGHUP seems to not be propagated to inner child processes #1781 by @davoclavo. Here, just is not responding to SIGHUP. This is expected, because just ignores these signals while running child processes. This isn't necessarily bad, because in the normal case, SIGHUP is sent to all members of the process group, which should terminate the child and cause it to exit.

  • Improve signal/interrupt handling #1803 PR by @davoclavo. Reworks the signal handler to actively poll for new signals, and terminate the child when it receives one. Never got merged, I think due it being hard to implement/test on Windows.

I'll also summarize what I know about signals, which may be wildly off.

The relevant signals are:

  • SIGHUP Sent when the terminal is closed.
  • SIGINT Sent on CTRL-C.
  • SIGTERM Sent by default by the kill command.

Currently, just running sleep 60, will close when you hit CTRL-C or close the terminal. This is, I believe, because CTRL-C and closing the terminal not only send the signal to the foreground process, but also to members of the foreground process group. So just ignores the SIGHUP or SIGINT signal, but sleep receives it and terminates, so just, which is waiting for it, sees that it has stopped and terminates it as well.

If you generate SIGHUP, SIGINT, or SIGTERM manually, with kill -s HUP/INT/TERM and send it to just, nothing will happen, because only just is receiving the signal, not the child sleep process.

I think the fact that in some cases, a signal is sent to a single process, and in some cases, all members of a process group, causes confusion, since it means that a SIGINT generated by a CTRL-C will have a different effect from one generated by the kill command, even though it's the same signal.

As far as I know, a parent process is not responsible for "forwarding" signals to child processes, it's the terminal / system which does this, so I don't think just is interfering with that. (The only possible exception to this is the process signal mask, which indicates some signals that should be ignored, and which is inherited by child processes. However, just does not set the signal mask, so this should not be an issue.)

Also, I don't know how this varies between different setups, because there are a lot of variables.

  • Do different terminals handle this differently?
  • What even happens on Windows? Can you CTRL-C just at all on Windows?
  • What about shell background jobs? I.e., just &?
  • What about tmux, zellij, dtach, etc?

Here are my current questions:

  • When is ignoring SIGINT and SIGHUP, just's current behavior, desirable? If SIGINT and SIGHUP are sent with CTRL-C or closing the terminal, they should be sent to all children, and thus everything should get cleaned up. I think it would only be in the case of child processes which don't handle these signals that anything doesn't get cleaned up, so a reasonable argument might be that just shouldn't try to handle this, and it's the fault of the child processes that they aren't responding to these signals.

  • When is ignoring SIGTERM, just's current behavior, desirable? This is a bit different from SIGINT and SIGHUP, since SIGTERM, when sent with kill $JUST_PID, is not sent to child processes, so if just exited, it would leave behind child processes.

  • How can I reproduce examples of just processes not terminating when they should? An example from SIGHUP seems to not be propagated to inner child processes #1781 is with zellij, but I wasn't able to reproduce this. If I create a zellij session, open two tabs, run just with a recipe that calls just sleep, and then close the tab, just sleep is not left running in the background.

  • just ignores SIGINT and SIGHUP. As far as I can tell, this is fine, since SIGINT and SIGHUP should be sent to child processes on CTRL-C and terminal close, so just can just ignore them and wait for children to exit. Are there cases when this is problematic?

  • just ignores SIGTERM, which is sent by kill. As far as I know, this signal only gets sent to just, so it may be surprising that it doesn't stop just. Should we change just to forward SIGTERM to child processes, so that kill $JUST_PID will cause just to exit? (Assuming that child processes respond to SIGTERM by exiting.)

My current thoughts are:

  • I think that just should probably not ignore SIGINT and SIGHUP. These signals are generated by CTRL-C and closing the terminal, respectively, and should be sent to children of just as well. So, unless something is wrong, just and all children should exit, which is what the user wants.

  • I think that just should probably either ignore SIGTERM, or forward it to any running child processes. If the user runs kill $JUST_PID, this only sends SIGTERM to just, so if it exits, it will leave child processes behind.

  • I think the original reason for ignoring SIGINT was not necessarily a good one. In that particular use-case, just was being used to run a test runner which would interpret CTRL-C as a failed test, but continue with further tests. So if just ignored CTRL-C, you could hold it down and it would eventually run out of tests to run and exit. However, most programs exit immediately on CTRL-C, so I don't think it makes sense to optimize for the case that CTRL-C doesn't terminate child processes.

(Sorry for tagging a bunch of people! I wanted to make sure people were aware of the new issue. Feel free to unsubscribe!)

@casey
Copy link
Owner Author

casey commented Nov 20, 2024

Here's my tentative plan so far. The interesting signals are SIGINT, SIGTERM, SIGHUP, and SIGQUIT.

When not running a child process, just should exit when it receives SIGINT, SIGTERM, SIGHUP, or SIGQUIT. This is the default behavior with no signal handlers or signal mask installed. The only reason to not exit when receiving a fatal signal is to avoid leaving behind a child process, so when there are no children, we can just exit.

When a child process is running:

  • SIGINT: When receiving SIGINT, we should assume that it's because the user has hit CTRL-C On CTRL-C, SIGINT is sent to all processes in the foreground process group. just should ignore SIGINT, since it can assume that SIGINT was sent to child processes, so it can wait for them to finish.

  • SIGHUP: When receiving SIGHUP, we should assume that it's because the user has closed their terminal. When the user closes their terminal, SIGHUP is sent to all processes in the foreground process group. Again, just should ignore SIGHUP, since it can assume SIGHUP was sent to child processes, so it can wait for them to finish.

  • SIGTERM: When receiving SIGTERM, we should assume it's from kill $JUST_PID. In this case, SIGTERM is not sent to child processes, so just should send SIGTERM to all child processes, and wait for them to exit.

  • SIGQUIT: SIGQUIT is like a more "violent" version of SIGINT, and is sent to the entire foreground process group. I think we should probably handle it like SIGINT. We could also consider quitting without waiting for children to finish, so if you want a way to kill just with 100% certainty, you have one.

In all of the above cases, just should record that a fatal signal was received, and when the child returns, even if it returns successfully, just stops execution and returns an error. I.e., it does not continue to the next recipe/line/command whatever.

Obviously, all this should be documented thoroughly.

Where does this go wrong?

  • I believe the only case that this goes wrong is in the case of running just remotely via SSH. In some cases (I think when there's no interactive shell, but there is a tty) just will receive SIGHUP, but SIGHUP is not also sent to child processes. So, just will ignore it, but child processes will not exit. It's possible that we can manually create a process group or something, which will cause SIGHUP to be sent to children, but I'm quite shaky on the mechanics.

How is this different from current behavior?

  • SIGTERM is forwarded to children, so kill $JUST_PID will work.
  • After receiving a fatal signal, just will not continue to the next command and will exit, even if that command exited successfully.

PS For fun, I think we should also print out something useful if just receives SIGINFO, like which recipe is running, which command in that recipe is running, etc. So you can hit CTRL-T while just is running and have it tell you what it's doing.

@OJFord
Copy link

OJFord commented Nov 21, 2024

#1560 by @OJFord. I'm not sure what's going on here, because the signals mentioned are SIGUSR1, and SIGINFO (generated by CTRL-T), which I don't think just touches at all.

I think that's the problem really, just receives them & does nothing with them; for whatever I was doing at the time, I wanted to be able to trap them in my child process of just.

@casey
Copy link
Owner Author

casey commented Nov 21, 2024

@OJFord Using this justfile:

foo:
    #!/usr/bin/env bash

    function sigusr1() {
      echo sigusr
    }

    function siginfo() {
      echo siginfo
    }

    trap sigusr1 SIGUSR1
    trap siginfo SIGINFO

    while true; do
      sleep 1
    done

I was able to invoke the handler functions with CTRL-T and kill -USR1 $BASH_PID. I think maybe the issue you were running into was that the script was running gdlv, and while a bash script is executing a process, trap functions won't run. (Notice how in the above recipe, sleep will terminate and be rerun every 1 second, giving trap functions a chance to run.)

@casey casey added the coveted label Dec 21, 2024
@betaboon
Copy link

betaboon commented Dec 23, 2024

I've just stumbled upon this issue while trying to figure out how just handles signals. specifically SIGINT.

i have a justfile like this:

tmpdir := `mktemp -d`

build: && _delete_tmpdir
  sleep 60

_delete_tmpdir:
  rm -rf {{tmpdir}}

so my intention here is to always delete the tmp-dir after the build is done.
hitting ctrl+c while the build is running (here the sleep) will prevent that from happening.

i tried -sleep 30, kinda thinking "hah, ignoring the 'error' might allow me to continue on and delete the tmp directory.

but no dice.

tbh I'm not fully sure what I'm asking here, either:

  • give me a way to control what happens with signals (specifically sigint)
  • give me a way to create a temp-directory that is guaranteed to get deleted after the run

based on the above comment, the following does work tho:

tmpdir := `mktemp -d`

build:
  #!/usr/bin/env bash
  function delete_tmpdir() {
    echo "deleting {{tmpdir}}"
    rm -rf {{tmpdir}}
  }

  trap delete_tmpdir SIGINT

  sleep 30
  delete_tmpdir

I don't know if this is the right issue to write this in, if not please tell me :)

@casey
Copy link
Owner Author

casey commented Dec 23, 2024

This is definitely the right issue!

- currently only suppresses errors when the child process returns a nonzero exit status, but I think it should also suppress errors when the child process exits because of a signal, which would make -sleep 30 do what you want.

@betaboon
Copy link

even tho that might solve my "usecase" tbh i don't know if that would be obvious/non-surprising behavior

@casey
Copy link
Owner Author

casey commented Dec 23, 2024

I think it can be argued both ways. It is maybe a little surprising that an infallible line isn't infallible if it's terminated by a signal.

@betaboon
Copy link

betaboon commented Dec 23, 2024

true.

i guess the question is really what the user thinks they are "stopping" when hitting ctrl+c: the current "step" or the whole recipe.

the approach that you described above regarding SIGINT seems reasonable for this tho.
interrupting the current step would in most cases lead to non-0 exit-code which would stop consecutive steps unless suppressed with -

it might also be worth thinking about something like this:
- ignore non-zero exit code excluding sigint
! (or something like that) - ignore non-zero exit code including sigint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants