Error out if we fail to kill child test processes #80
Comments
@yarikoptic When I ran

```python
with anyio.fail_after(TIMEOUT):
    await anyio.run_process(...)
```

When it comes to just ...

so this code might need to be modified here after analysis of what ...
@yarikoptic I believe the relevant code in anyio is: (Shouldn't GitHub show the code in the comment here? It's not doing it for me.) Specifically, if a subprocess needs to be cancelled (e.g., due to an enclosing timeout), anyio kills the process and then waits uncancellably for it to exit, and that's where our processes were hanging. I believe that handling things any other way (without reimplementing anyio) would require changes to anyio itself.
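For readers following along, the cleanup behaviour described above is roughly the following pattern. This is a paraphrased sketch, not anyio's actual source, and `close_process` is an illustrative name: on cancellation the child is killed and then awaited inside a shielded cancel scope, so the wait itself cannot be interrupted — and if the child ignores SIGKILL, that wait hangs forever.

```python
# Paraphrased sketch of the cleanup pattern described above -- NOT anyio's
# actual source code.  close_process() is an illustrative name.
import anyio
from anyio.abc import Process

async def close_process(process: Process) -> None:
    if process.returncode is None:
        process.kill()                    # SIGKILL the child on cancellation
    with anyio.CancelScope(shield=True):  # shield=True: outer timeouts and
        await process.wait()              # cancellations cannot interrupt this wait
```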
it used to do that I think for me but no longer does. Could we subclass the Process class there with the desired logic and use it instead? Could you then prepare a PR to anyio with the needed logic?
@yarikoptic Subclassing ... I've filed a feature request with ...

@yarikoptic The maintainer of ...
I followed up there. Can we have another thread/whatever which would monitor if any of the tests stall and do extra killing/erroring out?
@yarikoptic Even if we could come up with a decent way to check for stalling, if we wanted the program to error out on a stall, the exception would still trigger the same process cleanup code I linked to above, and any cleanup currently stalled would just continue to be stalled. I believe the only way out would be for the healthstatus program to send ...
@yarikoptic Ping; do you still want to do this somehow?

@yarikoptic Ping.
Well, the main problem ATM is that we simply have no indication that a stall has happened. If we detected it and exited with some ERROR, causing some kind of email to be sent to notify us, we would be good even if there is still a stuck process -- we would know that there is an issue and would come to mitigate it. Overall it might also be a matter of establishing some "invocation/progress monitoring", e.g. via https://healthchecks.io/, where we curl the ping point after one run completion and expect that to happen at least daily. WDYT?
@yarikoptic I think just using healthchecks.io or similar for monitoring should be cleaner.
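A minimal sketch of what that could look like on the healthstatus side; the ping URL and function name below are placeholders, not anything that exists in the project:

```python
# Minimal sketch of the healthchecks.io-style "dead man's switch" proposed
# above.  The URL and function name are placeholders, not project code.
import requests

HEALTHCHECKS_PING_URL = "https://hc-ping.com/<check-uuid>"  # placeholder UUID

def ping_healthcheck() -> None:
    """Call once after a full healthstatus run completes.

    If the daily pings stop arriving, healthchecks.io alerts us that the
    run has stalled, even though any stuck child processes remain.
    """
    try:
        requests.get(HEALTHCHECKS_PING_URL, timeout=10)
    except requests.RequestException:
        pass  # a monitoring hiccup should not fail the run itself
```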
Let's close whenever we add some such monitoring (thought to do it now but failing to log in... will try later)
Inspired by the davfs2 stall and our test processes also stalling and not succumbing to `kill -9`: we should test (for up to a minute) that the process we kill upon timeout dies off. If it doesn't, even after we tried `kill -9`, error out the entire process to bring attention to the matter.
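A rough sketch of the requested behaviour, assuming the child is an `anyio.abc.Process`; the function and constant names here are illustrative, not existing healthstatus code:

```python
# Illustrative sketch of the requested "kill, wait up to a minute, escalate,
# then error out" behaviour.  Names are made up, not healthstatus code.
import anyio
from anyio.abc import Process

KILL_GRACE = 60  # seconds to wait for a killed child to actually exit

async def ensure_killed(process: Process) -> None:
    process.terminate()                    # polite SIGTERM first
    try:
        with anyio.fail_after(KILL_GRACE):
            await process.wait()
        return                             # child exited within the grace period
    except TimeoutError:
        process.kill()                     # escalate to SIGKILL (kill -9)
    try:
        with anyio.fail_after(KILL_GRACE):
            await process.wait()
    except TimeoutError:
        # Even SIGKILL did not take effect; error out the whole run so that
        # someone gets notified and can investigate the stuck process.
        raise RuntimeError(f"child process {process.pid} survived kill -9")
```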