Syne Tune provides powerful generic scheduler templates for popular methods like successive halving and Hyperband. These can be run with synchronous or asynchronous decision-making. The most important generic templates at the moment are:
- `FIFOScheduler`: "Full evaluation" scheduler, base class for many others.
- `HyperbandScheduler`: Asynchronous successive halving and Hyperband.
- `SynchronousHyperbandScheduler`: Synchronous successive halving and Hyperband.
Chances are your idea for a new scheduler maps to one of these templates, in which case you can save a lot of time and headache by extending the template rather than reinventing the wheel. Due to Syne Tune's modular design of schedulers and their components (e.g., searchers, decision rules), you may even get more than you bargained for.
In this section, we will walk through an example of how to furnish the asynchronous successive halving scheduler with a specific searcher.
Details about asynchronous successive halving and Hyperband are given in
the Multi-fidelity HPO tutorial. This is a
multi-fidelity scheduler, where trials report intermediate results (e.g.,
validation error at the end of each epoch of training). We can formalize this
notion by the concept of resource. `HyperbandScheduler` inherits from `FIFOScheduler` and adds a number of arguments; a construction sketch is given below the list.
- `resource_attr`: This mandatory argument is the name of a field in the `result` dictionary passed to `scheduler.on_trial_report`. It contains the resource $r$ for which metric values have been reported. For example, if a trial reports validation error at the end of the 5-th epoch of training, `result` contains `{resource_attr: 5}`.
- `max_resource_attr`, `max_t`: We already noted these arguments in `FIFOScheduler`. They are used to determine the maximum resource $r_{max}$ (e.g., the total number of epochs a trial is to be trained, if not stopped before). As discussed in detail here, it is best practice to reserve a field in the configuration space `scheduler.config_space` to contain $r_{max}$. If this is done, its name should be passed in `max_resource_attr`. Now, every configuration sent to the training script contains $r_{max}$, which should not be hardcoded in the script. Moreover, if `max_resource_attr` is used, a pause-and-resume scheduler (e.g., `HyperbandScheduler` with `type="promotion"`) can modify this field in the configuration of a trial which is only to be run until a certain resource less than $r_{max}$. If `max_resource_attr` is not used, $r_{max}$ has to be passed explicitly via `max_t`.
- `reduction_factor`, `grace_period`, `brackets`: These important parameters are detailed in the tutorial. If `brackets > 1`, we run asynchronous Hyperband with this number of brackets, while for `brackets = 1` (the default) we run asynchronous successive halving.
- `type`: As detailed in the tutorial, this determines whether the method uses early stopping (`type="stopping"`) or pause-and-resume scheduling (`type="promotion"`). Further choices of `type` activate specific algorithms such as RUSH, PASHA, or cost-sensitive successive halving.
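The following is a minimal sketch of how `HyperbandScheduler` could be set up with these arguments. The configuration space, metric name, and values are made up for illustration, and defaults may differ between Syne Tune versions.

```python
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

# Hypothetical configuration space; the "epochs" entry holds r_max
config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "batch_size": randint(16, 256),
    "epochs": 81,
}

scheduler = HyperbandScheduler(
    config_space,
    searcher="random",           # swapped for a model-based searcher below
    metric="validation_error",   # field of `result` containing the metric
    mode="min",
    resource_attr="epoch",       # field of `result` containing the resource r
    max_resource_attr="epochs",  # r_max is read from the configuration space
    grace_period=1,
    reduction_factor=3,
    brackets=1,                  # 1 bracket: asynchronous successive halving
    type="promotion",            # pause-and-resume scheduling
)
```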
One of the most flexible ways of extending `HyperbandScheduler` is to provide
it with a novel searcher. In order to understand how this is done, we will walk
through `MultiFidelityKernelDensityEstimator` and `KernelDensityEstimator`.
This searcher implements `suggest` as in BOHB, as also detailed in the
tutorial. In a nutshell, the searcher splits all observations into two parts
("good" and "bad"), depending on whether their metric values lie above or below
a certain quantile, and fits kernel density estimators to these two subsets. It
then makes decisions based on a particular ratio of these densities, which
approximates a variant of the expected improvement acquisition function.
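As an illustration only (the actual searcher uses multivariate KDEs better suited to mixed configuration spaces), the decision rule can be sketched with `scipy` as follows; all names in this snippet are hypothetical:

```python
import numpy as np
from scipy.stats import gaussian_kde


def suggest_by_density_ratio(X, y, candidates, good_fraction=0.15):
    """Pick the candidate maximizing the "good" over "bad" density ratio.

    X: (n, d) feature matrix, y: (n,) metric values (lower is better),
    candidates: (m, d) feature matrix of randomly sampled configurations.
    """
    threshold = np.quantile(y, good_fraction)
    good, bad = X[y <= threshold], X[y > threshold]
    kde_good = gaussian_kde(good.T)  # gaussian_kde expects shape (d, n)
    kde_bad = gaussian_kde(bad.T)
    # Maximizing good(x) / bad(x) approximates maximizing expected improvement
    scores = kde_good(candidates.T) / np.maximum(kde_bad(candidates.T), 1e-12)
    return candidates[np.argmax(scores)]
```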
We begin with the base class `KernelDensityEstimator`, which works together with
`FIFOScheduler`, but already implements most of what is needed in the
multi-fidelity context.
- The code does quite some bookkeeping concerned with mapping configurations to feature vectors. If you want to do this from scratch for your searcher, we recommend using `HyperparameterRanges` (see the sketch after this list). However, `KernelDensityEstimator` was extracted from the original BOHB implementation.
- Observation data is collected in `self.X` (feature vectors for configurations) and `self.y` (values for `self._metric`, negated if `self.mode == "max"`). In particular, the `_update` method simply appends new data to these members.
- `get_config` fits KDEs to the "good" and "bad" parts of `self.X`, `self.y`. It then samples `self.num_candidates` configurations at random, evaluates the TPE acquisition function for each candidate, and returns the best one.
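Here is a brief sketch of how `HyperparameterRanges` maps a configuration to a feature vector. The import path and the factory function are taken from the searcher utilities and may differ between Syne Tune versions, so treat them as assumptions to verify:

```python
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.searchers.utils import (
    make_hyperparameter_ranges,  # assumed location of the factory function
)

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "batch_size": randint(16, 256),
}
hp_ranges = make_hyperparameter_ranges(config_space)

config = {"learning_rate": 1e-3, "batch_size": 64}
features = hp_ranges.to_ndarray(config)  # encoded feature vector for the config
print(features.shape)
```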
The class `MultiFidelityKernelDensityEstimator` inherits from
`KernelDensityEstimator` (a usage sketch follows the list):
- On top of `self.X` and `self.y`, it also maintains the resource values $r$ for each datapoint in `self.resource_levels`.
- `get_config` remains the same; only its subroutine `train_kde` for training the "good" and "bad" density models is modified. The idea is to fit these models to data from a single rung level, namely the largest level at which we have observed at least `self.num_min_data_points` points.
- `configure_scheduler` restricts usage to `HyperbandScheduler` (asynchronous Hyperband) and `SynchronousHyperbandScheduler` (synchronous Hyperband). Also, `self.resource_attr` is obtained from the scheduler, so it does not have to be passed.
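Plugging this searcher into the scheduler is then a matter of selecting it when constructing `HyperbandScheduler`. The sketch below assumes that the searcher name `"kde"` maps to this class and that searcher arguments such as `num_min_data_points` can be passed via `search_options`; both assumptions should be checked against the searcher factory of your Syne Tune version.

```python
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "batch_size": randint(16, 256),
    "epochs": 81,
}

scheduler = HyperbandScheduler(
    config_space,
    searcher="kde",  # assumed to select the multi-fidelity KDE searcher
    search_options={"num_min_data_points": 5},  # assumed constructor argument
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    type="promotion",
)
```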
While functional and simple, the `MultiFidelityKernelDensityEstimator`
does not showcase the full range of information exchanged between
`HyperbandScheduler` and a searcher. In particular:
- `register_pending`: BOHB does not take pending evaluations into account.
- `remove_case`, `evaluation_failed` are not implemented.
- `get_state`, `clone_from_state` are not implemented, so schedulers with this searcher are not properly serialized (a rough sketch of such hooks is given after this list).
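Purely as a hypothetical shape (the class below is a stand-in, not a Syne Tune searcher; only the hook names `get_state` and `clone_from_state` come from the searcher API mentioned above), such serialization hooks could look like this:

```python
import copy


class MyKDESearcher:
    """Stand-in for a searcher keeping its data in ``self.X`` / ``self.y``."""

    def __init__(self, config_space, metric):
        self.config_space = config_space
        self._metric = metric
        self.X = []  # feature vectors of observed configurations
        self.y = []  # (negated) metric values

    def get_state(self) -> dict:
        # Everything needed to reconstruct the searcher after de-serialization
        return {"X": copy.deepcopy(self.X), "y": copy.deepcopy(self.y)}

    def clone_from_state(self, state: dict) -> "MyKDESearcher":
        # Build a fresh searcher and restore the collected data
        new_searcher = MyKDESearcher(self.config_space, self._metric)
        new_searcher.X = copy.deepcopy(state["X"])
        new_searcher.y = copy.deepcopy(state["y"])
        return new_searcher
```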
For a more complete and advanced example, the reader is invited to study `GPMultiFidelitySearcher` and `GPFIFOSearcher`. These searchers take pending evaluations into account (by way of fantasizing). Moreover, they can be configured with a Gaussian process model and an acquisition function, which is optimized in a gradient-based manner.
Moreover, as already noted here, `HyperbandScheduler` also allows configuring the decision rule for stop/continue or pause/resume as part of `on_trial_report`. Examples for this are found in `StoppingRungSystem`, `PromotionRungSystem`, `RUSHStoppingRungSystem`, `PASHARungSystem`, and `CostPromotionRungSystem`.
In the next section, we show how extensions of synchronous successive halving and Hyperband can be implemented.