Batch acceleration
A typical app might have jobs that take about 1 hour of CPU time on average. But the turnaround time on a particular host H may be much higher because:
- H has a large work buffer, and other jobs must complete before this one starts
- H computes only sporadically
- H is slower than average
So turnaround time on H could be several days, and the 'max delay' setting for the app may need to be, say, 1 week.
If there's a large batch - say 1000 jobs - some of them will get sent to hosts that never complete them. After a week these jobs time out and we resend them to other hosts. But some of these hosts may also never complete them, or complete them only after a long turnaround time.
As a result, the 'makespan' of the batch - the time from submission to 100% completion - may be several weeks.
We'd like to reduce batch makespan using scheduling techniques; we call this 'batch acceleration'.
Our goal is to reduce makespan with minimal complexity. We're not concerned with performance; current projects have a few thousand hosts, not millions.
The basic idea: mark certain hosts as 'low turnaround' (for particular app versions). Mark the last 10% or so of each batch as 'high priority'. Use low-turnaround hosts to run high-priority jobs.
This involves the following components:
- Scheduler:
  - generate a 'job log' file, with a line for each reported or timed-out job;
  - enforce the above scheduling rule.
- batch_stats.php (new): periodically scan the job log file, compute statistics, and mark hosts as low-turnaround.
- batch_accel.php: periodically scan in-progress batches, identifying those that need acceleration. Mark jobs as high-priority, and possibly create new instances.
batch_stats.php runs every hour or so.
It skips (and deletes) job log entries older than, say, 2 weeks.
First pass:
- Group entries by batch.
- For each batch in which at least 50% of the jobs succeeded, compute the mean turnaround time (TT) of the successful jobs.
Second pass:
- For each job in such a batch, its 'TT ratio' is TT/mean if the job succeeded, and a fixed penalty of about 10 if not.
For a host H and app version AV, the 'recent average TT ratio' RATTR(H, AV) is the average of TT ratios of its jobs using AV (over all batches).
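The two passes above can be sketched as follows. This is an illustrative Python sketch, not the actual batch_stats.php; the job log format is not specified here, so the entry fields (`batch`, `host`, `app_version`, `success`, `tt`) and the function name `compute_rattr` are assumptions.

```python
from collections import defaultdict
from statistics import mean

FAIL_TT_RATIO = 10.0        # penalty ratio for jobs that did not succeed ("~10")
MIN_SUCCESS_FRACTION = 0.5  # batches below this success rate are ignored


def compute_rattr(entries):
    """Return {(host, app_version): recent average TT ratio}.

    entries: iterable of dicts with the (hypothetical) keys
    batch, host, app_version, success (bool), tt (seconds).
    """
    # First pass: group entries by batch and compute the mean TT of the
    # successful jobs, for batches with enough successes.
    by_batch = defaultdict(list)
    for e in entries:
        by_batch[e["batch"]].append(e)

    batch_mean = {}
    for batch, jobs in by_batch.items():
        ok = [j["tt"] for j in jobs if j["success"]]
        if ok and len(ok) >= MIN_SUCCESS_FRACTION * len(jobs):
            m = mean(ok)
            if m > 0:  # avoid dividing by zero below
                batch_mean[batch] = m

    # Second pass: compute each job's TT ratio and collect the ratios
    # per (host, app version), over all qualifying batches.
    ratios = defaultdict(list)
    for batch, m in batch_mean.items():
        for j in by_batch[batch]:
            ratio = j["tt"] / m if j["success"] else FAIL_TT_RATIO
            ratios[(j["host"], j["app_version"])].append(ratio)

    return {hav: mean(rs) for hav, rs in ratios.items()}
```

A host that usually returns jobs faster than the batch mean ends up with a value below 1; a host that times out often is pulled toward 10.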
Clear the low-turnaround (LTT) flag in all host/app version (HAV) records.
Set the LTT flag for (H, AV) if RATTR(H, AV) is in the lowest 20% of values and is less than 1.
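A minimal sketch of the flagging rule, in Python for illustration. The text does not say whether the 20% cutoff is taken over all HAVs or per app version; this sketch assumes it is taken over all HAVs. The function name `select_ltt` and its parameters are hypothetical.

```python
def select_ltt(rattr, percent=20, max_ratio=1.0):
    """Return the set of (host, app_version) keys to flag as LTT.

    rattr: {(host, app_version): recent average TT ratio}.
    Any HAV not in the returned set should have its flag cleared.
    """
    # Rank HAVs from lowest (best) to highest ratio.
    ranked = sorted(rattr.items(), key=lambda kv: kv[1])
    # Integer arithmetic avoids floating-point surprises in the cutoff;
    # the cutoff rounds down, so tiny populations flag nothing.
    cutoff = len(ranked) * percent // 100
    # A HAV must be in the lowest `percent` AND beat the batch mean.
    return {hav for hav, ratio in ranked[:cutoff] if ratio < max_ratio}
```

The `< 1` condition matters when most hosts are slow: without it, the best 20% would be flagged even if none of them beat the batch mean.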
The scheduler writes job log entries for completed jobs.
Job selection:
    if a job is high priority
        if we're using a LTT HAV
            boost its score
        else
            lower its score, and don't send at all unless job has been in shmem > 20 min
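The rule above could look like this as a score adjustment. This is a Python sketch for illustration only, not the scheduler's actual code; the function name, the size of the score change (`SCORE_DELTA`), and the use of `None` to mean 'don't send' are assumptions.

```python
MIN_SHMEM_AGE_SECS = 20 * 60  # the 20-minute threshold from the rule above
SCORE_DELTA = 1.0             # hypothetical size of the boost or penalty


def adjust_score(score, high_priority, hav_is_ltt, shmem_age_secs):
    """Return the adjusted score, or None if the job must not be sent."""
    if not high_priority:
        return score  # normal jobs are unaffected
    if hav_is_ltt:
        return score + SCORE_DELTA  # prefer high-priority jobs on LTT hosts
    if shmem_age_secs <= MIN_SHMEM_AGE_SECS:
        return None  # hold the job back for an LTT host
    # The job has waited long enough: allow it, but at a lower score.
    return score - SCORE_DELTA
```

The 20-minute escape hatch keeps a high-priority job from waiting indefinitely when no LTT host shows up.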
batch_accel.php runs every hour or so.
For each in-progress batch B that's at least 90% complete:
    For each uncompleted WU
        mark WU as high priority
        mark its unsent results as high priority
        if there are no unsent results and the in-progress results are older than the average TT
            increment wu.target_nresults and schedule a transition (to create a new instance)
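The loop above can be sketched as follows, in Python for illustration (the real script would operate on the project database). The `Result` and `Workunit` classes, the `needs_transition` flag, and the reading of 'older than average TT' as 'every in-progress result was sent more than one mean TT ago' are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    state: str               # "unsent", "in_progress", or "over"
    sent_time: float = 0.0
    high_priority: bool = False


@dataclass
class Workunit:
    completed: bool
    target_nresults: int = 1
    results: list = field(default_factory=list)
    high_priority: bool = False
    needs_transition: bool = False  # stands in for 'schedule transition'


def accelerate_batch(wus, mean_tt, now, min_done_percent=90):
    """Apply the acceleration rule to one batch; return # of new instances."""
    done = sum(1 for wu in wus if wu.completed)
    if not wus or done * 100 < min_done_percent * len(wus):
        return 0  # batch is not far enough along
    created = 0
    for wu in wus:
        if wu.completed:
            continue
        wu.high_priority = True
        unsent = [r for r in wu.results if r.state == "unsent"]
        for r in unsent:
            r.high_priority = True
        in_prog = [r for r in wu.results if r.state == "in_progress"]
        stale = in_prog and all(now - r.sent_time > mean_tt for r in in_prog)
        if not unsent and stale:
            wu.target_nresults += 1   # ask for one more instance
            wu.needs_transition = True
            created += 1
    return created
```

Under these assumptions, repeated hourly runs don't pile up instances: a newly created instance is first unsent, and once sent it is younger than the mean TT, so the condition fails until it too becomes stale.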
Ideally, we want both high- and low-priority jobs in shmem, so that we have work for both LTT and non-LTT hosts.
But this conflicts with other constraints, e.g. dividing slots between apps.
So it's probably better to enumerate by priority, then by create time.
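As a small illustration of that ordering, here is a Python sketch. The field names `priority` and `create_time`, and the convention that a larger priority value is enumerated first, are assumptions.

```python
def enumeration_order(results):
    """Order results by priority (highest first), then by create time
    (oldest first)."""
    return sorted(results, key=lambda r: (-r["priority"], r["create_time"]))
```

With this ordering, all high-priority results are enumerated before any normal ones, and within each priority level the oldest results go first.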
If we use the above scheduling policy, we may sometimes fill shmem with high-priority jobs. If only non-LTT hosts then arrive, we won't send anything for 20 minutes.
In general, we want to have high-priority results only if we have a reasonable number of LTT hosts for that app.