Scaling Up Experiments
Robert L. Logan IV edited this page Jan 15, 2021 · 5 revisions
This codebase contains a couple of utilities for running a large number of experiments.
- scripts/launch.py: This script takes a YAML file listing all of the jobs you want to run and distributes them across the available GPUs so that the GPUs stay fully utilized. It is intended to replace the earlier approach of bash scripts, which do not guarantee that work is properly distributed across GPUs.
- jobs/: This folder contains the jinja2 templates used to create those YAML files. To render them, use scripts/render_template.py.
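The idea of keeping every GPU busy can be illustrated with a small worker pool. This is only a sketch under stated assumptions, not the actual implementation of scripts/launch.py: the function name `run_jobs` is hypothetical, each job is given as a plain command list, and a job is pinned to a GPU by setting the `CUDA_VISIBLE_DEVICES` environment variable. One worker thread per GPU pulls the next pending job as soon as its previous job finishes, so no GPU sits idle while work remains.

```python
import os
import queue
import subprocess
import sys
import threading


def run_jobs(commands, gpu_ids):
    """Run each command once, with at most one job per GPU at a time.

    Hypothetical helper for illustration; not part of the codebase.
    """
    pending = queue.Queue()
    for index, command in enumerate(commands):
        pending.put((index, command))
    results = [None] * len(commands)

    def worker(gpu_id):
        while True:
            try:
                index, command = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this GPU
            # Restrict the child process to this worker's GPU.
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
            proc = subprocess.run(command, env=env, capture_output=True, text=True)
            results[index] = (gpu_id, proc.returncode, proc.stdout)

    threads = [threading.Thread(target=worker, args=(g,)) for g in gpu_ids]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Stand-in jobs that only report which GPU they were assigned.
    probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    for gpu_id, returncode, stdout in run_jobs([probe] * 4, gpu_ids=[0, 1]):
        print(gpu_id, returncode, stdout.strip())
```

Pulling from a shared queue, rather than splitting the job list evenly in advance, is what keeps the load balanced when jobs take different amounts of time.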
Example usage:
# Render the jinja2 template
cat jobs/superglue_finetune.jinja2 | python scripts/render_template.py > jobs/superglue_finetune.yaml
# Launch the jobs
python scripts/launch.py --logdir results/superglue_finetune/ jobs/superglue_finetune.yaml
Each section of the YAML file needs to have three fields:
- out: A string containing the name of the experiment. The name is used to create the stdout and stderr logs in the logdir.
- script: The Python script to run. Usually one of autoprompt/continuous_trigger_classification.py, autoprompt/continuous_trigger_mlm.py, or autoprompt/finetune.py.
- args: A list of arguments to pass to the script.
Each section starts with --- and ends with ... (the YAML document start and end markers).
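To make the layout concrete, the sketch below generates a job file with two sections from plain Python. It is an illustration under assumptions rather than a replacement for the jinja2 templates: the helper names `render_section` and `render_jobs` and the experiment names `demo_job_0` and `demo_job_1` are hypothetical, `args` is assumed to be a YAML list of strings, and a real job would also list the script's own arguments, which are omitted here.

```python
def render_section(out, script, args):
    """Return one job section delimited by '---' and '...'."""
    lines = ["---", f"out: {out}", f"script: {script}", "args:"]
    # Quote each argument so values starting with '-' are read as strings.
    lines += [f'  - "{arg}"' for arg in args]
    lines.append("...")
    return "\n".join(lines)


def render_jobs(jobs):
    """Join all sections into the text of a single job file."""
    return "\n".join(render_section(**job) for job in jobs) + "\n"


# Hypothetical experiment names; only the documented flags are used.
jobs = [
    {
        "out": f"demo_job_{i}",
        "script": "autoprompt/finetune.py",
        "args": ["--tmp", "--quiet"],
    }
    for i in range(2)
]

if __name__ == "__main__":
    print(render_jobs(jobs), end="")
```

Redirecting the printed text to a file gives something that can be inspected by eye before being passed to scripts/launch.py.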
- Tensorboard logs are saved to the checkpoint directory.
- The --tmp flag removes the model weights after training. This saves a lot of disk space when you launch many training jobs.
- The --quiet flag mutes tqdm, which makes the stderr logs much cleaner.
- Filenames and stderr logs can be converted into a format that is easy to paste into Google Sheets with the scripts/print_results.py script, e.g.:
for f in results/superglue_finetune/*.stderr; do python scripts/print_results.py "$f"; done