WDL Best Practices

All rules below should be followed by contributors to this repo. Contributors should also follow the rules outlined in style-guide.md. Authors of pull requests that do not conform to these specifications will be asked to make changes.

Rules

All WDL should be written in v1.1
See template/common-parameter-meta.txt for common description strings.
- If applicable, use the same parameter name, help string, and parameter ordering as the underlying tool called by the task
Check all assumptions made about workflow inputs before beginning long-running executions
- Common examples of assumptions that should be checked: valid String choice, mutually exclusive parameters, missing optional file for selected parameters, filename extensions
- This can commonly be handled by a parse_input task (defined in the same file as the workflow in question)
  - When possible, avoid passing in entire files to the parse_input task. Coerce files to Booleans or Strings to avoid unnecessary disk space usage
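
The input checks described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the repository's own parse_input: the input names (strandedness, gtf_provided, run_annotation), the allowed values, and the ubuntu:22.04 container are all hypothetical. Note that the workflow passes defined(gtf) as a Boolean instead of handing the file itself to the task.

```wdl
version 1.1

# Hypothetical sketch: names, choices, and container are illustrative only.
task parse_input {
    input {
        String strandedness
        Boolean gtf_provided
        Boolean run_annotation
    }

    command <<<
        set -euo pipefail

        # Valid String choice
        case "~{strandedness}" in
            Stranded-Reverse|Stranded-Forward|Unstranded) ;;
            *)
                >&2 echo "strandedness must be Stranded-Reverse, Stranded-Forward, or Unstranded"
                exit 1
                ;;
        esac

        # Missing optional file for a selected parameter
        if ~{run_annotation} && ! ~{gtf_provided}; then
            >&2 echo "run_annotation requires a GTF to be provided"
            exit 1
        fi
    >>>

    output {
        String check = "passed"
    }

    runtime {
        container: "ubuntu:22.04"
        memory: "4 GB"
        disks: "10 GB"
        maxRetries: 1
    }
}

workflow example {
    input {
        String strandedness
        File? gtf
        Boolean run_annotation = true
    }

    # The optional File is coerced to a Boolean, so nothing is localized.
    call parse_input { input:
        strandedness = strandedness,
        gtf_provided = defined(gtf),
        run_annotation = run_annotation
    }

    output {
        String input_check = parse_input.check
    }
}
```
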
All tasks with multiple commands (including any use of a pipe, |) should run set -euo pipefail before any other command.
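
As an illustration of why this matters, here is a small sketch with a hypothetical task (count_lines) and a placeholder container. Without pipefail, the exit status of the pipeline is that of wc, so a gunzip failure on a corrupt input would go unnoticed.

```wdl
version 1.1

# Hypothetical task; the container is a placeholder.
task count_lines {
    input {
        File infile
        String outfile_name = "line_count.txt"
        Int memory_gb = 4
        Int disk_size_gb = 10
    }

    command <<<
        # -e: exit on the first failing command
        # -u: treat unset variables as errors
        # -o pipefail: a pipeline fails if any command in it fails
        set -euo pipefail

        gunzip -c "~{infile}" | wc -l > "~{outfile_name}"
    >>>

    output {
        File line_count = outfile_name
    }

    runtime {
        container: "ubuntu:22.04"
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
        maxRetries: 1
    }
}
```
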
Tasks with string parameters that accept only a limited set of choices must be documented following the template in string_choices_task (see template/task-examples.wdl.template)
- They should also fail quickly with an informative error message if an invalid input is provided
  - In most cases, just passing the parameter to the underlying tool should produce a satisfactory error, but this must be checked for each tool
- While redundant, it is still best practice to validate these strings in the parse_input task of any workflow which calls the task
  - This ensures the workflow will fail as fast as possible to save users time and resources
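
A rough sketch of a task with a restricted string parameter follows. The authoritative documentation layout is the one in string_choices_task; the description/choices keys shown here are an assumption about that layout, and the task itself (checksum, with an algorithm parameter) is hypothetical. The explicit check is included because simply running an unknown "<algorithm>sum" command would fail with an unhelpful "command not found".

```wdl
version 1.1

# Hypothetical task; the parameter_meta layout is an assumption, so defer to
# string_choices_task in template/task-examples.wdl.template.
task checksum {
    parameter_meta {
        infile: "File to checksum"
        algorithm: {
            description: "Hashing algorithm to use",
            choices: [
                "md5",
                "sha1",
                "sha256"
            ]
        }
        outfile_name: "Name for the checksum file"
    }

    input {
        File infile
        String algorithm = "md5"
        String outfile_name = basename(infile) + "." + algorithm
        Int memory_gb = 4
        Int modify_disk_size_gb = 0
    }

    Int disk_size_gb = ceil(size(infile, "GiB")) + 10 + modify_disk_size_gb

    command <<<
        set -euo pipefail

        # Fail fast with an informative message on an invalid choice
        case "~{algorithm}" in
            md5|sha1|sha256) ;;
            *)
                >&2 echo "algorithm must be one of: md5, sha1, sha256"
                exit 1
                ;;
        esac

        "~{algorithm}sum" "~{infile}" > "~{outfile_name}"
    >>>

    output {
        File checksum_file = outfile_name
    }

    runtime {
        container: "ubuntu:22.04"
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
        maxRetries: 1
    }
}
```
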
All tasks must have configurable memory and disk space allocations
- see the various tasks in the template directory for possible ways to allocate resources
  - Contributors can mix and match the available templates, copying and pasting subsections as appropriate
  - A single task may allocate one resource dynamically and another statically.
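
One possible combination is sketched below: memory is a static, user-overridable default, while disk space is derived from the input size plus a user-adjustable offset. The task (compress_file), the modify_disk_size_gb name, and the 2x multiplier are assumptions for illustration; the templates remain the reference.

```wdl
version 1.1

# Hypothetical task mixing a static memory default with a dynamic disk request.
task compress_file {
    input {
        File infile
        String outfile_name = basename(infile) + ".gz"
        Int memory_gb = 4             # static, but configurable
        Int modify_disk_size_gb = 0   # lets callers adjust the dynamic estimate
    }

    # Dynamic: room for the input and its compressed copy, plus a 10 GB floor
    Float infile_size = size(infile, "GiB")
    Int disk_size_gb = ceil(infile_size * 2) + 10 + modify_disk_size_gb

    command <<<
        set -euo pipefail

        gzip -c "~{infile}" > "~{outfile_name}"
    >>>

    output {
        File compressed = outfile_name
    }

    runtime {
        container: "ubuntu:22.04"
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
        maxRetries: 1
    }
}
```
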
Multi-core tasks should always follow the conventions laid out in the use_all_cores_task example (see template/task-examples.wdl.template)
- This caters to cloud users, who may be allocated a machine with more cores than are specified by the ncpu parameter
- Note that future versions of WDL will likely cause a change to this convention.
  - We plan to deprecate the ncpu param in favor of accessing the runtime section directly (n_cores=~{task.runtime.cpu})
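
The general shape of this convention might be sketched as follows. This is not a copy of use_all_cores_task; the task (sort_lines) and container are hypothetical, and sort --parallel stands in for whichever multi-threaded tool a real task would call.

```wdl
version 1.1

# Hypothetical task; consult use_all_cores_task for the actual convention.
task sort_lines {
    input {
        File infile
        String outfile_name = "sorted.txt"
        Boolean use_all_cores = false
        Int ncpu = 2
        Int memory_gb = 4
        Int modify_disk_size_gb = 0
    }

    Int disk_size_gb = ceil(size(infile, "GiB") * 2) + 10 + modify_disk_size_gb

    command <<<
        set -euo pipefail

        # Start from the requested core count, then optionally take every
        # core the backend actually provisioned.
        n_cores=~{ncpu}
        if ~{use_all_cores}; then
            n_cores=$(nproc)
        fi

        sort --parallel="$n_cores" "~{infile}" > "~{outfile_name}"
    >>>

    output {
        File sorted = outfile_name
    }

    runtime {
        container: "ubuntu:22.04"
        cpu: ncpu
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
        maxRetries: 1
    }
}
```
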
Tasks which assume a file and any accessory files (e.g. a BAM and a BAI) have specific extensions and/or are in the same directory should always follow the conventions laid out in the localize_files_task example (see template/task-examples.wdl.template)
- This is to accommodate as many backends as possible
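
A minimal sketch of the idea, assuming a BAM and its BAI: both files are symlinked into the working directory under matching names, so the tool finds the index beside the BAM no matter where the backend placed either file. The task name and the container string are hypothetical placeholders; localize_files_task is the reference to follow.

```wdl
version 1.1

# Hypothetical task; the container name is a placeholder, not a real image.
task idxstats {
    input {
        File bam
        File bam_index
        String outfile_name = basename(bam, ".bam") + ".idxstats.txt"
        Int memory_gb = 4
        Int modify_disk_size_gb = 0
    }

    Int disk_size_gb = ceil(size(bam, "GiB")) + 10 + modify_disk_size_gb

    command <<<
        set -euo pipefail

        # Do not assume the backend localized the BAM and BAI side by side,
        # or kept the extensions the tool expects.
        CWD_BAM="~{basename(bam)}"
        ln -s "~{bam}" "$CWD_BAM"
        ln -s "~{bam_index}" "$CWD_BAM.bai"

        samtools idxstats "$CWD_BAM" > "~{outfile_name}"

        # Remove the links so they are not mistaken for task outputs
        rm "$CWD_BAM" "$CWD_BAM.bai"
    >>>

    output {
        File stats = outfile_name
    }

    runtime {
        container: "example/samtools:placeholder"
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
        maxRetries: 1
    }
}
```
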
Output file names should always be determined by either the outfile_name parameter or the prefix parameter.
- Prefer outfile_name if no downstream tasks/tools rely on the file name/extension
- Tasks with multiple outputs should always use the prefix convention
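
The two naming conventions could be sketched side by side as below. Both tasks (md5, checksums) and their defaults are hypothetical; the point is only that a single-output task exposes the whole file name, while a multi-output task exposes a shared prefix and fixes the extensions itself.

```wdl
version 1.1

# Single output: the caller controls the entire file name.
task md5 {
    input {
        File infile
        String outfile_name = basename(infile) + ".md5"
        Int memory_gb = 4
        Int modify_disk_size_gb = 0
    }

    Int disk_size_gb = ceil(size(infile, "GiB")) + 10 + modify_disk_size_gb

    command <<<
        md5sum "~{infile}" > "~{outfile_name}"
    >>>

    output {
        File md5sum = outfile_name
    }

    runtime {
        container: "ubuntu:22.04"
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
        maxRetries: 1
    }
}

# Multiple outputs: the caller controls a prefix; extensions are fixed.
task checksums {
    input {
        File infile
        String prefix = basename(infile)
        Int memory_gb = 4
        Int modify_disk_size_gb = 0
    }

    Int disk_size_gb = ceil(size(infile, "GiB")) + 10 + modify_disk_size_gb

    command <<<
        set -euo pipefail

        md5sum "~{infile}" > "~{prefix}.md5"
        sha256sum "~{infile}" > "~{prefix}.sha256"
    >>>

    output {
        File md5sum = prefix + ".md5"
        File sha256sum = prefix + ".sha256"
    }

    runtime {
        container: "ubuntu:22.04"
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
        maxRetries: 1
    }
}
```
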
After the input sorting rules in style-guide.md have been applied, follow the rules below for further sorting.
- "Sample" files come before "reference" files
- If present, use_all_cores should be the last Boolean in its block
- The ncpu parameter comes before inputs that allocate memory, which come before inputs that allocate disk space
  - This block of 2-3 inputs should come after all other inputs.
Most tasks should have a default maxRetries of 1
- Certain tasks are prone to intermittent failure (often if an internet connection is involved) and can have a higher default maxRetries. This value should not exceed 3.
There are lower bounds for resource allocation
- Memory should be allocated a minimum of 4 GB
- Disk size should be allocated a minimum of 10 GB
- If the task is multi-core, it should use at least 2 CPUs by default
- These bounds were selected somewhat arbitrarily, but consistency is important for quickly identifying our lightweight tasks
- These bounds are subject to change pending a more empirical investigation
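
For a lightweight task, the lower bounds above, together with the default maxRetries of 1, might translate into a runtime section like the one in this sketch. The task (count_bytes) and container are hypothetical.

```wdl
version 1.1

# Hypothetical lightweight, single-core task sitting at the minimum allocations.
task count_bytes {
    input {
        File infile
        String outfile_name = "byte_count.txt"
        Int memory_gb = 4     # minimum memory allocation
        Int disk_size_gb = 10 # minimum disk allocation
    }

    command <<<
        wc -c < "~{infile}" > "~{outfile_name}"
    >>>

    output {
        File byte_count = outfile_name
    }

    runtime {
        container: "ubuntu:22.04"
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
        maxRetries: 1  # raise (to at most 3) only for failure-prone tasks
    }
}
```
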
All tasks should have an output
- This may be a hardcoded "dummy" output such as String check = "passed"
- This ensures the task can be cached by runners. Tasks without outputs may be required to rerun on the same input due to a cache miss.
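
A task that exists only to validate something has no natural output, which is where the dummy output comes in. The following is a sketch with a hypothetical task (check_nonempty) and a placeholder container.

```wdl
version 1.1

# Hypothetical validation task with a hardcoded "dummy" output.
task check_nonempty {
    input {
        File infile
        Int memory_gb = 4
        Int modify_disk_size_gb = 0
    }

    Int disk_size_gb = ceil(size(infile, "GiB")) + 10 + modify_disk_size_gb

    command <<<
        if [ ! -s "~{infile}" ]; then
            >&2 echo "Input file is empty: ~{basename(infile)}"
            exit 1
        fi
    >>>

    output {
        # Gives runners something to record, so a repeat call can hit the cache
        String check = "passed"
    }

    runtime {
        container: "ubuntu:22.04"
        memory: "~{memory_gb} GB"
        disks: "~{disk_size_gb} GB"
        maxRetries: 1
    }
}
```
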
Use the as keyword sparingly: only to improve readability or to avoid name collisions
- Prefer using as in the import block rather than at the task/workflow call level
- When using as to rename an invalid URI, attempt to make as few changes to the filename as possible (i.e. try not to abbreviate)
- To disambiguate a task or workflow file from its contents, you can respectively add the _tasks or _wf suffix in the import section
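
The import conventions above might look like this in practice. Every path, task name, input name, and output name in this fragment is hypothetical, so it will only resolve inside a repository laid out the same way. It also assumes that an import's default namespace is the file name without its .wdl extension, which is why a hyphenated file name needs an alias.

```wdl
version 1.1

# All paths and names below are hypothetical.

# No alias: the default namespace `samtools` is valid and unambiguous.
import "tools/samtools.wdl"

# The default namespace `rnaseq-core` would not be a valid identifier, so an
# alias is needed. Only the hyphen changes; nothing is abbreviated.
import "workflows/rnaseq-core.wdl" as rnaseq_core

# The file contains a task that is also named `md5sum`; the _tasks suffix
# keeps the namespace distinct from the task.
import "tools/md5sum.wdl" as md5sum_tasks

workflow example {
    input {
        File bam
    }

    # Aliasing happened at import time, so the call itself needs no `as`.
    call md5sum_tasks.md5sum { input:
        infile = bam
    }

    output {
        File checksum = md5sum.checksum_file
    }
}
```
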
Whenever possible, prefer a Docker image maintained by an external source (such as BioContainers) rather than creating your own image
When adding a Dockerfile to this repository, follow the conventions below
- The Dockerfile should be nested under the docker/ directory, then a folder named for the image (in most cases the name of the primary tool), and finally a folder named after the version being built.
- Docker images should be versioned according to the following convention
  - Start with the version of whatever tool is named in the path to the Dockerfile
    - If no specific tool is named (e.g. the util image), default to SemVer. Ignore the next 3 bullet points.
  - Followed by a dash-zero (-0)
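    - If the Docker image gets updated without updating the base tool's version, increment the number after the dash (-) by one (for a hypothetical tool at version 1.2.3, 1.2.3-0 becomes 1.2.3-1)
    - If the Docker image gets updated and the base tool's version is updated as well, revert to a dash-zero (-0) (continuing the hypothetical, 1.2.3-1 becomes 1.2.4-0)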
General-purpose tasks can use the util image maintained in this repo
