InvalidOperationException : Process must exit before requested information can be determined upon cancellation #133

borland · 2024-11-28T19:58:46Z

Failed Tests.ShellCommandFixtureStdInput.ShouldBeCancellable(behaviour: Async) [8 ms]
Error Message:
 System.InvalidOperationException : Process must exit before requested information can be determined.
Stack Trace:
   at System.Diagnostics.Process.EnsureState(State state)
 at System.Diagnostics.Process.get_ExitCode()
 at Octopus.Shellfish.ShellCommandExecutorHelpers.SafelyGetExitCode(Process process) in /opt/buildagent/work/c4d67eb417aa367f/source/Shellfish/ShellCommandExecutorHelpers.cs:line 21
 at Octopus.Shellfish.ShellCommand.ExecuteAsync(CancellationToken cancellationToken) in /opt/buildagent/work/c4d67eb417aa367f/source/Shellfish/ShellCommand.cs:line 243
 at Tests.ShellCommandFixtureStdInput.ShouldBeCancellable(SyncBehaviour behaviour) in /opt/buildagent/work/c4d67eb417aa367f/source/Tests/ShellCommandFixture.StdInput.cs:line 228

Analysis

The relevant code is (edited for brevity) doing this:

try {
    await exitedTask.ConfigureAwait(true);
}
catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested) {
    TryKillProcessAndChildrenRecursively(process);
}
// Exception thrown from SafelyGetExitCode
return new ShellCommandResult(SafelyGetExitCode(process));

What is most likely happening.

On linux, TryKillProcessAndChildrenRecursively is implemented by sending a SIGTERM to the process.
SIGTERM is a message asking the process to exit itself. It may not exit immediately, or even if it is trying to, it may not be fast enough before we try and get the exit code

Discussion

The old ShellExecutor API does not have this bug.
The old code goes like this:

In background, register cancellation
Wait for the process to exit (this cannot be cancelled)
When the cancellation token fires, the background registration kills the process
The wait does not return until after the process has actually exited.

Whereas the new code goes like this:

Wait for the process to exit (with cancellation)
Cancellation token fires, cancelling the wait
Proceed to trying to kill the process, then SafelyGetExitCode; we have not waited for the "kill" to actually kill.

Internally in OctopusDeploy we use a forked copy of the old code, called SilentProcessRunner. It also looks like it will have this bug, as the cancellation will cancel the wait operation before the process has actually exited.

Possible Solutions

We could simply handle the "not yet exited" code path and return -1 from SafelyGetExitCode
We could insert a second wait-for-exit on the cancellation path to ensure the process has actually exited before we proceed to SafelyGetExitCode
We could change the linux codepath to use SIGKILL rather than SIGTERM so it should be dead immediately. On the windows codepath we call Process.Kill (which is the Win32 TerminateProcess API) which is equivalent to SIGKILL
Or some combination of the above

My ideal solution:

Change windows to send a "soft shutdown" message to the process to match the linux SIGTERM.
We could post a WM_QUIT or GenerateConsoleCtrlEvent or as this person on StackOverflow found, writing the Ctrl+C escape character to the child process stdinput stream
On both windows/linux, after sending a soft shutdown message, install a short additional WaitForExit with a timer. 5 seconds seems to be a common value and seems fair.
If the process does not exit after this time, hard kill it with TerminateProcess / SIGKILL
Proceed to SafelyGetExitCode, and install the fallback code so that it returns -1 rather than throwing exceptions if the process is still running for some reason.

That's quite a lot of work though, in practice I'm thinking probably just handling the "not yet exited" code path and return -1 from SafelyGetExitCode

The text was updated successfully, but these errors were encountered:

adam-mccoy · 2024-11-29T09:13:44Z

If I'm understanding the proposed fix correctly, on cancellation the reported exit code will vary between success (0) if the process managed to exit on time, or -1 if it didn't. If anything, I think the result should consistently be -1 in this situation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

InvalidOperationException : Process must exit before requested information can be determined upon cancellation #133

InvalidOperationException : Process must exit before requested information can be determined upon cancellation #133

borland commented Nov 28, 2024 •

edited

Loading

adam-mccoy commented Nov 29, 2024

InvalidOperationException : Process must exit before requested information can be determined upon cancellation #133

InvalidOperationException : Process must exit before requested information can be determined upon cancellation #133

Comments

borland commented Nov 28, 2024 • edited Loading

Analysis

Discussion

Possible Solutions

adam-mccoy commented Nov 29, 2024

borland commented Nov 28, 2024 •

edited

Loading