In this section, we will turn our attention to methods adopting asynchronous decision-making, which tend to be more efficient than their synchronous counterparts.
In synchronous successive halving (SH), decisions on whether to promote a trial
or not can be delayed for a long time. In our example, say we are lucky and
sample an excellent configuration early on, among the 243 initial ones. In order
to promote it to train for 81 epochs, we first need to train 243 trials for 1
epoch, then 81 for 3 epochs, 27 for 9 epochs, and 9 for 27 epochs. Our excellent
trial will always be among the top third at each rung, but it cannot be promoted until all other trials at that rung have finished.
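The cost of this delay can be made concrete with a few lines of Python. The sketch below uses a hypothetical helper (not part of Syne Tune). It lists the rungs of synchronous SH for the example above and adds up the epochs that must be spent at the lower rungs before any trial is promoted to 81 epochs. Like the text, it charges every trial at a rung that rung's full number of epochs, which means it ignores any savings from resuming checkpoints.

```python
def sync_sh_schedule(n_initial, min_epochs, max_epochs, eta):
    """Return (num_trials, epochs) for each rung of synchronous SH."""
    schedule = []
    n, r = n_initial, min_epochs
    while r <= max_epochs:
        schedule.append((n, r))
        n, r = n // eta, r * eta  # keep the top 1/eta, train eta times longer
    return schedule


# Setup from the text: 243 trials, 1 to 81 epochs, reduction factor 3.
schedule = sync_sh_schedule(n_initial=243, min_epochs=1, max_epochs=81, eta=3)
print(schedule)  # [(243, 1), (81, 3), (27, 9), (9, 27), (3, 81)]

# Every rung below the top one must be complete before any trial is
# promoted to 81 epochs. As in the text, each trial at a rung is charged
# that rung's full epoch budget.
epochs_before_promotion = sum(n * r for n, r in schedule[:-1])
print(epochs_before_promotion)  # 972
```

Under this counting, even an excellent configuration sampled first must wait for 972 epochs of training, spread over all workers, before its 81-epoch run can begin.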
In asynchronous successive halving (ASHA),
the aim is to promote promising configurations as early as possible. There are
two different variants of ASHA, and we will begin with the (arguably) simpler
one. Whenever a worker becomes available, a new configuration is sampled at
random, and a new trial starts training from scratch. Whenever a trial reaches a
rung level, a decision is made immediately on whether to stop training or
let it continue. This decision is made based on all data available at the rung
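until now. If the trial is among the top 1/η of all trials that have reached this rung so far (η being the reduction factor, η = 3 in our example), it continues training. Otherwise, it is stopped.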
Unlike synchronous SH, ASHA has no fixed rung sizes. Instead, each rung grows over time. ASHA is free of synchronization points: promising trials can be trained for many epochs without having to wait for delayed promotion decisions. Asynchronous decision-making can be much more efficient at running good configurations to the end. However, it runs the risk of making bad decisions based on too little data.
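The stopping decision can be sketched as a small rule object. This is a minimal illustration that rests on our own assumptions. Lower metric values are better. The top 1/η fraction is rounded up, so the first trial to reach a rung always continues. The class and method names are hypothetical and do not reflect Syne Tune's internals.

```python
import math
from collections import defaultdict


class AshaStoppingRule:
    """Minimal sketch of ASHA's stopping decision (lower metric is better).

    Illustration only; this is not Syne Tune's implementation.
    """

    def __init__(self, rung_levels, eta=3):
        self.rung_levels = set(rung_levels)
        self.eta = eta
        self.rungs = defaultdict(list)  # rung level -> metrics recorded so far

    def on_result(self, epoch, metric):
        """Return "continue" or "stop" for a trial reporting metric at epoch."""
        if epoch not in self.rung_levels:
            return "continue"  # decisions are only made at rung levels
        recorded = self.rungs[epoch]
        recorded.append(metric)  # results of stopped trials stay in the rung
        # Size of the top 1/eta fraction, rounded up so the first trial survives.
        num_top = math.ceil(len(recorded) / self.eta)
        threshold = sorted(recorded)[num_top - 1]
        return "continue" if metric <= threshold else "stop"


rule = AshaStoppingRule(rung_levels=[1, 3, 9, 27], eta=3)
print(rule.on_result(1, 0.5))  # continue (first result at this rung)
print(rule.on_result(1, 0.7))  # stop (not in the top third of [0.5, 0.7])
print(rule.on_result(1, 0.3))  # continue (best of [0.5, 0.7, 0.3])
print(rule.on_result(2, 0.9))  # continue (epoch 2 is not a rung level)
```

Each decision uses only the results that happen to be at the rung when the trial arrives. This is exactly where both the speed and the risk of acting on too little data come from.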
A launcher script for ASHA is given in launch_method.py, passing method="ASHA-STOP". With type="stopping", we select the early stopping variant of ASHA (more on this below).
In fact, the algorithm originally proposed as ASHA
is slightly different from what has been detailed above. Instead of starting a
trial once and relying on early stopping, this promotion variant is of the
pause-and-resume type. Namely, whenever a trial reaches a rung, it is paused
there. Whenever a worker becomes available, all rungs are scanned top to bottom.
If a paused trial is found which lies in the top 1/η of all trials at its rung
and has not been promoted yet, it is resumed and trained up to the next rung
level. If no such trial exists, a new trial is started from scratch.
A launcher script for ASHA (promotion type) is given in launch_method.py, passing method="ASHA-PROM". Here, type="promotion" selects the promotion variant of ASHA.
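The scheduling step of the promotion variant runs whenever a worker becomes free, and it might look as follows. This sketch follows the pause-and-resume description above under our own assumptions. Lower metric values are better. The top 1/η is rounded down, so at least η trials must be paused at a rung before one of them can be promoted. All names are hypothetical and are not Syne Tune's API.

```python
class AshaPromotion:
    """Minimal sketch of ASHA's promotion (pause-and-resume) decision.

    Lower metric is better. Illustration only, not Syne Tune's implementation.
    """

    def __init__(self, rung_levels, eta=3):
        self.rung_levels = list(rung_levels)  # increasing, last one is the maximum
        self.eta = eta
        self.rungs = [{} for _ in self.rung_levels]  # trial_id -> metric at rung
        self.promoted = [set() for _ in self.rung_levels]
        self.next_trial_id = 0

    def on_rung_reached(self, trial_id, rung_index, metric):
        """Record the metric of a trial that has been paused at a rung."""
        self.rungs[rung_index][trial_id] = metric

    def get_job(self):
        """Called when a worker is free: resume a paused trial or start a new one."""
        # Scan rungs from the top down (the maximum level has nothing to promote to).
        for k in reversed(range(len(self.rung_levels) - 1)):
            recorded = self.rungs[k]
            num_top = len(recorded) // self.eta  # top 1/eta, rounded down
            top = sorted(recorded, key=recorded.get)[:num_top]
            for trial_id in top:
                if trial_id not in self.promoted[k]:
                    self.promoted[k].add(trial_id)
                    return ("resume", trial_id, self.rung_levels[k + 1])
        trial_id = self.next_trial_id
        self.next_trial_id += 1
        return ("new", trial_id, self.rung_levels[0])


asha = AshaPromotion(rung_levels=[1, 3, 9], eta=3)
jobs = [asha.get_job() for _ in range(3)]
print(jobs)  # [('new', 0, 1), ('new', 1, 1), ('new', 2, 1)]
for trial_id, metric in [(0, 0.5), (1, 0.2), (2, 0.9)]:
    asha.on_rung_reached(trial_id, rung_index=0, metric=metric)
promotion = asha.get_job()
print(promotion)  # ('resume', 1, 3): trial 1 is in the top third at level 1
follow_up = asha.get_job()
print(follow_up)  # ('new', 3, 1): no other paused trial qualifies yet
```

A resumed trial continues from its paused state. This is why the promotion variant profits from checkpointing.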
If these two variants (stopping and promotion) are compared under ideal conditions, neither consistently outperforms the other. However, they come with different requirements. The promotion variant pauses and resumes trials, and therefore benefits from checkpointing being implemented in the training code. If this is not the case, the stopping variant may be more attractive.
On the other hand, the stopping variant requires the backend to frequently stop
workers and bring them back in order to start a new trial. For some backends,
the turnaround time for this process may be slow, in which case the promotion
type can be more attractive. In this context, it is important to understand the
relevance of passing max_resource_attr to the scheduler (and, in our case, also
to the BlackboxRepositoryBackend). Recall the discussion here. If the
configuration space contains an entry with the maximum resource, whose key is
passed to the scheduler as max_resource_attr, the scheduler can modify this
value when calling the backend to start or resume a trial. For example, if a
trial paused at rung level 3 is resumed in order to be promoted to rung level 9,
the scheduler passes {max_resource_attr: 9} in the trial's configuration. This
means that the training code knows how long it has to run, so it does not have
to be stopped by the backend.
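The way max_resource_attr lets the training code stop by itself can be sketched in plain Python. Everything here is hypothetical. The function config_for_backend stands in for what the scheduler does before calling the backend, and "epochs" plays the role of the max_resource_attr key. The function train assumes that it has already restored the checkpoint saved at the rung where the trial was paused.

```python
def config_for_backend(config, max_resource_attr, target_level):
    """Copy of config whose maximum-resource entry is set to the next rung level."""
    new_config = dict(config)  # leave the original configuration untouched
    new_config[max_resource_attr] = target_level
    return new_config


def train(config, resume_from_epoch, report):
    """Hypothetical training loop that reads its stopping point from the config."""
    for epoch in range(resume_from_epoch + 1, config["epochs"] + 1):
        # ... one epoch of training would go here ...
        report(epoch)
    # The loop ends by itself at config["epochs"], so no stop signal is needed.


config = {"learning_rate": 0.01, "epochs": 81}  # "epochs" acts as max_resource_attr
resumed = config_for_backend(config, max_resource_attr="epochs", target_level=9)
reported = []
train(resumed, resume_from_epoch=3, report=reported.append)
print(reported)  # [4, 5, 6, 7, 8, 9]
```

Because the trial finishes exactly at the next rung level, the backend never has to interrupt it. This keeps the promotion variant cheap even on backends where stopping workers is slow.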
Finally, ASHA can also be extended to use multiple brackets. Namely, whenever
a new trial is started, its bracket (or, equivalently, its starting rung level)
is sampled at random from a distribution over brackets.
A launcher script for asynchronous Hyperband (stopping type) is given in launch_method.py, passing method="ASHA6-STOP". Here, brackets=6 sets the number of brackets. The default is 1, which corresponds to asynchronous successive halving.
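The following sketch shows how a new trial might be assigned to a bracket. It makes several assumptions. Rung levels are min_resource * η^k below the maximum resource. A trial started in bracket b faces its first decision at the b-th rung level, and the highest bracket runs straight to the maximum. The bracket distribution is given as a list of probabilities. For the 1 to 81 epoch example with η = 3, this gives five possible brackets. The uniform distribution is purely illustrative and is not Syne Tune's default. All function names are hypothetical.

```python
import random


def rung_levels(min_resource, max_resource, eta):
    """Rung levels below the maximum resource: min_resource * eta**k."""
    levels = []
    r = min_resource
    while r < max_resource:
        levels.append(r)
        r *= eta
    return levels


def sample_bracket(bracket_probs, rng):
    """Draw a bracket index according to the given probabilities."""
    return rng.choices(range(len(bracket_probs)), weights=bracket_probs, k=1)[0]


def decision_levels(bracket, levels):
    """Rung levels at which a trial started in this bracket faces a decision."""
    return levels[bracket:]


levels = rung_levels(min_resource=1, max_resource=81, eta=3)
print(levels)  # [1, 3, 9, 27]
num_brackets = len(levels) + 1  # the last bracket runs straight to 81 epochs
uniform = [1.0 / num_brackets] * num_brackets  # illustrative, not a library default
rng = random.Random(0)
bracket = sample_bracket(uniform, rng)
print(bracket, decision_levels(bracket, levels))
print(decision_levels(2, levels))  # [9, 27]
print(decision_levels(4, levels))  # [] (no early decisions at all)
```

With a single bracket, every trial starts at the lowest rung, which recovers asynchronous successive halving.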
As also noted in the ASHA paper, the algorithm often works best with a single
bracket, so that brackets=1 is the default in Syne Tune. However, we will see
further below that model-based variants of ASHA with multiple brackets can
outperform the single-bracket version if the distribution over brackets is
adapted based on the data observed so far.
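Syne Tune implements two variants of ASHA with brackets > 1. In the
default variant, there is only a single system of rungs. For each new trial, a
bracket is sampled, which determines the lowest rung level at which the trial is
subject to a decision, and at each rung, trials from all brackets compete with
each other. The other variant is selected by passing rung_system_per_bracket=True
to HyperbandScheduler. In this case, each bracket has its own rung system, and
trials started in one bracket only have to compete with others in the same
bracket.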
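The difference between the two rung systems comes down to how results are keyed. In this hypothetical sketch, the class is ours and only the flag name mirrors the rung_system_per_bracket argument. The default shares one rung per level across all brackets, while the per-bracket variant keys rungs by (bracket, level).

```python
from collections import defaultdict


class RungRecords:
    """Where results are filed: one shared rung system, or one per bracket."""

    def __init__(self, rung_system_per_bracket=False):
        self.rung_system_per_bracket = rung_system_per_bracket
        self.records = defaultdict(list)

    def _key(self, bracket, level):
        # Shared system: key by level only; per-bracket: key by (bracket, level).
        return (bracket, level) if self.rung_system_per_bracket else level

    def add(self, bracket, level, metric):
        self.records[self._key(bracket, level)].append(metric)

    def competitors(self, bracket, level):
        """Metrics a result at (bracket, level) is compared against."""
        return list(self.records[self._key(bracket, level)])


for per_bracket in (False, True):
    rungs = RungRecords(rung_system_per_bracket=per_bracket)
    rungs.add(bracket=0, level=9, metric=0.5)  # trial started at level 1
    rungs.add(bracket=2, level=9, metric=0.3)  # trial started at level 9
    print(per_bracket, rungs.competitors(bracket=0, level=9))
# False [0.5, 0.3]
# True [0.5]
```

In the shared system, the bracket-0 trial at level 9 competes against the result from bracket 2 as well. With one rung system per bracket, it is compared only against trials from its own bracket.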
In the next section, we dive into model-based extensions of synchronous Hyperband, where configurations are chosen based on how others have performed before, instead of drawing them at random.