#2691 added code to detect qstat failures by searching stderr for "Connection refused". However, this does not work on our new system, so jobs are incorrectly reported as failed when polled.
Information at the time indicated we could expect to see errors like this if qstat failed to contact the server:
Connection refused
qstat: cannot connect to server xxxxxx (errno=111)
However, we are now seeing errors like this from PBS 2022.1.7:
Connection timed out
qstat: cannot connect to server xxxxxx (errno=xxxxx)
For the moment I think we would be safe to change the search string to "cannot connect" (or possibly "errno").
Longer term, we should consider other ways to make the polling more robust; see #3436.
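To illustrate the short-term change, here is a minimal sketch of a stderr check that uses the broader "cannot connect" string. The names `QSTAT_CONNECT_FAILURE` and `qstat_could_not_connect` are hypothetical and are not taken from the existing code; the sample messages are the two quoted in this issue.

```python
# Hypothetical sketch: names here are illustrative, not the project's actual API.

# Search string proposed in this issue; it appears in both the
# "Connection refused" and "Connection timed out" variants of the error.
QSTAT_CONNECT_FAILURE = "cannot connect"


def qstat_could_not_connect(stderr: str) -> bool:
    """Return True if qstat's stderr suggests it failed to reach the server."""
    return QSTAT_CONNECT_FAILURE in stderr


# The two error outputs quoted in this issue (server name and errno elided
# exactly as in the report).
old_style = "Connection refused\nqstat: cannot connect to server xxxxxx (errno=111)"
new_style = "Connection timed out\nqstat: cannot connect to server xxxxxx (errno=xxxxx)"

print(qstat_could_not_connect(old_style))
print(qstat_could_not_connect(new_style))
```

With a check like this, a poll whose stderr matches would be treated as a failed poll rather than as evidence that the job itself has failed.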