If workers fail to post a task_update, or fail for other reasons, they start communicating with the server more. This includes five retries of the task update, a failure message, a PGN upload for the task, a request for a new task, etc. If the failures are actually caused by load on the server, that load suddenly increases significantly. The result is an unstable, runaway situation in which most workers fail and from which the server can't recover.

The attached patch tries to improve on this by progressively increasing the delay between task-update retries after each failure. When an update really fails, the worker starts with a 2-minute sleep before retrying. This patch was successfully tested over the past couple of days and let the server recover automatically under fairly large load.
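A minimal sketch of the progressive retry idea, assuming a hypothetical `post_update` callable that returns True on success. Only the 5 retries and the 2-minute starting delay come from the description above. The helper name, the doubling factor, and treating exceptions as failures are illustrative assumptions, not the worker's actual code.

```python
import time
from typing import Callable


def update_task_with_backoff(
    post_update: Callable[[], bool],
    max_retries: int = 5,
    initial_delay: float = 120.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: try post_update up to max_retries times.

    The delay starts at initial_delay seconds (2 minutes) and is
    multiplied by factor after every failed attempt (doubling is an
    assumption). Returns True on the first success, False otherwise.
    """
    delay = initial_delay
    for attempt in range(max_retries):
        try:
            if post_update():
                return True
        except Exception:
            # Assumption: a network error counts as a failed update.
            pass
        if attempt < max_retries - 1:
            # Back off before retrying, so a struggling server
            # receives fewer requests instead of more.
            sleep(delay)
            delay *= factor
    return False
```

With these assumed defaults, a worker that never succeeds waits 120, 240, 480 and 960 seconds between its five attempts, spreading its requests out exactly when the server is most likely overloaded. The `sleep` parameter exists only so the delays can be observed without actually waiting.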