Scaling Up Experiments

Robert L. Logan IV edited this page Jan 15, 2021 · 5 revisions

Launch Script & YAML Files

This codebase includes two utilities for running a large number of experiments.

  • scripts/launch.py: This script takes a YAML file listing all of the jobs you want to run and distributes them across the available GPUs so that the GPUs stay fully utilized. It replaces the earlier approach of bash scripts, which do not guarantee that work is properly distributed across GPUs.
  • jobs/: This folder contains the jinja2 templates used to create these YAML files. To render them, use scripts/render_template.py.

Example usage:

# Render the jinja2 template
cat jobs/superglue_finetune.jinja2 | python scripts/render_template.py > jobs/superglue_finetune.yaml

# Launch the jobs
python scripts/launch.py --logdir results/superglue_finetune/ jobs/superglue_finetune.yaml
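
To make the idea of distributing jobs across GPUs concrete, here is a minimal sketch of one way such a scheduler could work. It is not the actual implementation of scripts/launch.py. It assumes one worker thread per GPU, a shared queue of jobs, and GPU selection via the CUDA_VISIBLE_DEVICES environment variable. The function name run_jobs and the dict layout of each job are hypothetical; the .stdout and .stderr log names follow the convention described on this page.

```python
import os
import queue
import subprocess
import sys
import threading
from pathlib import Path


def run_jobs(jobs, gpu_ids, logdir):
    """Run every job on whichever GPU frees up first.

    Hypothetical helper, not part of this codebase. Each job is a dict with
    the keys "out", "script" and "args". Returns {out: exit code}.
    """
    logdir = Path(logdir)
    logdir.mkdir(parents=True, exist_ok=True)

    # Shared queue: a worker that finishes early simply takes the next job.
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    exit_codes = {}
    lock = threading.Lock()

    def worker(gpu_id):
        # Each worker owns one GPU and pulls jobs until the queue is empty.
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return
            # Assumption: the child process honors CUDA_VISIBLE_DEVICES.
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
            cmd = [sys.executable, job["script"], *job["args"]]
            stdout_path = logdir / f"{job['out']}.stdout"
            stderr_path = logdir / f"{job['out']}.stderr"
            with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
                code = subprocess.call(cmd, env=env, stdout=out, stderr=err)
            with lock:
                exit_codes[job["out"]] = code

    threads = [threading.Thread(target=worker, args=(g,)) for g in gpu_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return exit_codes
```

Because the workers pull from a shared queue, a GPU never sits idle while jobs remain. A bash script that assigns a fixed list of jobs to each GPU up front cannot offer that guarantee when job durations differ.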

YAML file structure

Each section of the YAML file needs to have three fields:

  • out: A string containing the name of the experiment. The name is used to create the stdout and stderr logs in the logdir.
  • script: The Python script to run. Usually one of: autoprompt/continuous_trigger_classification.py, autoprompt/continuous_trigger_mlm.py or autoprompt/finetune.py.
  • args: A list of arguments to pass to the script.

Each section begins with --- and ends with ... (the YAML document start and end markers).
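
As an illustration of this structure, the sketch below builds the text of one section. It assumes that args is written as a YAML sequence of strings, which this page does not spell out. The helper render_section and the experiment name are hypothetical, and the example assumes the chosen script accepts the --quiet and --tmp flags described on this page.

```python
import json


def render_section(out, script, args):
    """Return one job section as text, wrapped in the --- and ... markers.

    Hypothetical helper for illustration only.
    """
    lines = [
        "---",
        # json.dumps yields double-quoted strings, which YAML also accepts.
        f"out: {json.dumps(out)}",
        f"script: {json.dumps(script)}",
        # A JSON array is also a valid YAML flow sequence.
        f"args: {json.dumps(args)}",
        "...",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    section = render_section(
        "finetune-example",        # hypothetical experiment name
        "autoprompt/finetune.py",  # one of the scripts listed above
        ["--quiet", "--tmp"],      # flags described on this page
    )
    print(section, end="")
```

Concatenating the output for several jobs gives a file with one section per experiment, which is the shape the jinja2 templates in jobs/ are meant to produce.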

Other Useful Knowledge

  • TensorBoard logs are saved to the checkpoint directory.
  • The --tmp flag removes the model weights after training. This saves a lot of disk space when you launch many training jobs.
  • The --quiet flag mutes tqdm, which makes the stderr logs much cleaner.
  • Filenames and stderr logs can be converted into a format that is easy to paste into Google Sheets with the scripts/print_results.py script, e.g.:
    for f in results/superglue_finetune/*.stderr; do python scripts/print_results.py "$f"; done