- If both os.kill and container.restart() fail, set _healthy=False
and return distinct "cleanup failed" error instead of silently
returning a normal timeout result
- Add thread.join(timeout=2) after kill/restart to prevent daemon
thread leaks on repeated timeout failures
- Log warning if worker thread is still alive after post-kill join
- Add test cases for total cleanup failure and thread leak scenarios
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
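
The escalation described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the committed code: the class name `TimeoutCleanup`, the method `cleanup_after_timeout`, the logger name, and the exact error text are assumptions. Only `os.kill`, `container.restart()`, `_healthy`, and `thread.join(timeout=2)` come from the message itself. It also assumes a POSIX host where `signal.SIGKILL` exists.

```python
import logging
import os
import signal
import threading

logger = logging.getLogger("executor")  # hypothetical logger name


class TimeoutCleanup:
    """Hypothetical helper illustrating the kill -> restart -> unhealthy chain."""

    def __init__(self, container):
        self._container = container  # any object exposing .restart()
        self._healthy = True

    def cleanup_after_timeout(self, host_pid, thread):
        """Return None if cleanup worked, or an error string on total failure."""
        error = None
        try:
            # First attempt: kill the timed-out process from the host.
            os.kill(host_pid, signal.SIGKILL)
        except OSError as kill_exc:  # includes PermissionError
            try:
                # Last resort: restart the container (destroys in-container state).
                self._container.restart()
            except Exception as restart_exc:
                # Both paths failed: mark the executor unhealthy and report it.
                self._healthy = False
                error = (
                    "timeout: cleanup failed "
                    f"(kill: {kill_exc!r}; restart: {restart_exc!r})"
                )
        # Give the worker thread a short window to exit after kill/restart.
        thread.join(timeout=2)
        if thread.is_alive():
            logger.warning("worker thread still alive after post-kill join")
        return error
```

A caller would presumably check `_healthy` before accepting new work and raise early when it is `False`, since recovery requires reinitializing the container.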
 - Dependency on `procps`/`pkill` being installed in the image

-3. **Container restart as last resort** — If `os.kill` fails (e.g.,
+3. **Post-kill thread join** — After kill/restart, a short
+`thread.join(timeout=2)` gives the worker thread time to exit
+cleanly. If it's still alive, a warning is logged. The thread is
+a daemon, so it will not prevent process exit, but repeated
+timeout failures without this join could accumulate leaked threads.
+
+4. **Unhealthy state on total cleanup failure** — If both `os.kill`
+and `container.restart()` fail, the executor sets `self._healthy
+= False` and returns a distinct error message. Subsequent calls
+should check `self._healthy` and raise early rather than queueing
+work against a broken container. Reinitialization (stop + start)
+is required to recover.
+
+5. **Container restart as last resort** — If `os.kill` fails (e.g.,
 insufficient permissions when Docker runs rootless), restart the
 container. This is the most reliable fallback but destroys
 in-container state.
@@ -1153,6 +1192,8 @@ is new and has no tests yet.
 | Timeout tests | Scripts with `time.sleep()` to verify enforcement | Per-executor timeout tests |
 | Timeout kill fallback | Verify `PermissionError` from `os.kill` triggers container restart | Mock `os.kill` to raise `PermissionError`, assert `container.restart()` called and `CodeExecutionResult.stderr` contains timeout message |
 | Timeout kill success | Verify `os.kill(host_pid)` path when permitted | Mock `exec_inspect` to return PID, assert `os.kill` called with correct signal |
+| Timeout total failure | Verify both `os.kill` and `container.restart()` fail → unhealthy | Mock both to raise, assert `_healthy` is `False` and `stderr` contains "cleanup failed" |
+| Timeout thread leak | Verify post-kill `join(2)` is called and warning logged if thread lingers | Mock thread to stay alive after kill, assert warning logged |
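
The two added test rows could look something like the sketch below. It runs against a small stand-in function rather than the real executor, because the executor's code is not shown in this diff: `cleanup`, the logger name, the `(healthy, stderr)` return shape, and the message text are all assumptions made for illustration.

```python
import logging
import os
import signal
import unittest
from unittest import mock

log = logging.getLogger("executor.timeout")  # hypothetical logger name


def cleanup(container, host_pid, thread):
    """Hypothetical stand-in for the executor's timeout cleanup.

    Returns (healthy, stderr).
    """
    healthy, stderr = True, "execution timed out"
    try:
        os.kill(host_pid, signal.SIGKILL)
    except OSError:
        try:
            container.restart()
        except Exception:
            healthy, stderr = False, "execution timed out; cleanup failed"
    thread.join(timeout=2)
    if thread.is_alive():
        log.warning("worker thread still alive after post-kill join")
    return healthy, stderr


class TimeoutCleanupTests(unittest.TestCase):
    def test_total_failure_marks_unhealthy(self):
        # Both os.kill and container.restart() raise.
        container = mock.Mock()
        container.restart.side_effect = RuntimeError("docker unavailable")
        thread = mock.Mock()
        thread.is_alive.return_value = False
        with mock.patch("os.kill", side_effect=PermissionError):
            healthy, stderr = cleanup(container, 4242, thread)
        self.assertFalse(healthy)
        self.assertIn("cleanup failed", stderr)

    def test_lingering_thread_logs_warning(self):
        # The mocked thread stays alive after the post-kill join.
        thread = mock.Mock()
        thread.is_alive.return_value = True
        with mock.patch("os.kill"), self.assertLogs(log, level="WARNING") as captured:
            cleanup(mock.Mock(), 4242, thread)
        thread.join.assert_called_once_with(timeout=2)
        self.assertEqual(len(captured.records), 1)
```

Saved as a module, these can be run with `python -m unittest`. Mocking the thread object keeps the test fast, since no real 2-second join ever happens.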