It would be good to build on the code in PR #38 to enable rerunning only the subset of array tasks that failed or were killed. This could be generalized to cover a wider variety of cases: not only OOM-killed tasks but also tasks that ran out of time, etc.
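As a rough sketch of what the generalized version might look like, assuming the array jobs run under Slurm and that task states are read from `sacct` (neither is confirmed here, and this is not the code from PR #38). The helper names `failed_task_ids`, `to_array_spec`, and `query_sacct`, as well as the `RETRY_STATES` set, are hypothetical and only for illustration:

```python
import subprocess

# Assumed set of Slurm job states worth retrying; adjust as needed.
RETRY_STATES = {"FAILED", "OUT_OF_MEMORY", "TIMEOUT", "CANCELLED", "NODE_FAIL"}


def failed_task_ids(sacct_output, retry_states=RETRY_STATES):
    """Return sorted array task indices whose state is in retry_states.

    Expects 'JobID|State' lines, e.g. from
    `sacct --format=JobID,State --parsable2 --noheader`.
    """
    ids = set()
    for line in sacct_output.splitlines():
        if "|" not in line:
            continue
        job_id, state = line.strip().split("|")[:2]
        # Skip job steps such as "123_4.batch" and non-array job IDs.
        if "." in job_id or "_" not in job_id:
            continue
        task = job_id.split("_", 1)[1]
        # Skip entries that are not a single index, e.g. pending "[5-9]".
        if not task.isdigit():
            continue
        # sacct may report "CANCELLED by <uid>"; compare the first word only.
        words = state.split()
        if words and words[0] in retry_states:
            ids.add(int(task))
    return sorted(ids)


def to_array_spec(task_ids):
    """Compress [1, 2, 3, 7] into "1-3,7" for use with `sbatch --array`."""
    parts = []
    start = prev = None
    for i in sorted(set(task_ids)):
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = i
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def query_sacct(job_id):
    """Fetch 'JobID|State' lines for one job (requires Slurm's sacct)."""
    cmd = ["sacct", "-X", "-j", str(job_id), "--format=JobID,State",
           "--parsable2", "--noheader"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

With something like this, a rerun could be `sbatch --array=$(spec) job.sh`, where the spec comes from `to_array_spec(failed_task_ids(query_sacct(job_id)))`. Passing a narrower `retry_states` (say, only `{"TIMEOUT"}`) would let the caller pair each failure type with a different fix, such as more memory for OOM kills and a longer time limit for timeouts.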