Skip to content

Latest commit

 

History

History
62 lines (59 loc) · 6.01 KB

best-practices.md

File metadata and controls

62 lines (59 loc) · 6.01 KB

WDL Best Practices

All rules below should be followed by contributors to this repo. Contributors should also follow the rules outlined in style-guide.md. Pull Requests which do not conform to these specifications will be asked to change.

Rules

  • All WDL should be written in v1.1
  • See template/common-parameter-meta.txt for common description strings.
    • If applicable, use the same parameter name, help string, and parameter ordering as the underlying tool called by the task
  • Check all assumptions made about workflow inputs before beginning long running executions
    • Common examples of assumptions that should be checked: valid String choice, mutually exclusive parameters, missing optional file for selected parameters, filename extensions
    • This can commonly be handled by a parse_input task (defined in the same file as the workflow in question)
      • When possible, avoid passing in entire files to the parse_input task. Coerce files to Booleans or Strings to avoid unnecessary disk space usage
  • All tasks with multiple commands (including any pipes (|)) should have set -euo pipefail before any other commands.
  • Tasks with string parameters for which a limited number of choices are valid, must be documented following the template in string_choices_task (see template/task-examples.wdl.template)
    • they should also fail quickly with an informative error message if an invalid input is provided
      • In most cases, just passing the parameter to the underlying tool should produce a satisfactory error, but this must be checked for each tool
    • While redundant, it is still best practice to validate these strings in the parse_input task of any workflow which calls the task
      • This ensures the workflow will fail as fast as possible to save users time and resources
  • All tasks must have configurable memory and disk space allocations
    • see the various tasks in the template directory for possible ways to allocate resources
      • Contributors can mix and match the available templates, copy and pasting subsections as appropriate
      • It is allowed to have one resource allocated dynamically, and another allocated statically in the same task.
  • multi-core tasks should always follow the conventions laid out in the use_all_cores_task example (see template/task-examples.wdl.template)
    • this is catering to cloud users, who may be allocated a machine with more cores than are specified by the ncpu parameter
    • Note that future versions of WDL will likely cause a change to this convention.
      • We plan to deprecate the ncpu param in favor of accessing the runtime section directly (n_cores=~{task.runtime.cpu})
  • Tasks which assume a file and any accessory files (e.g. a BAM and a BAI) have specific extensions and/or are in the same directory should always follow the conventions laid out in the localize_files_task example (see template/task-examples.wdl.template)
    • This is to accomodate as many backends as possible
  • output file names should always be determined with either the outfile_name parameter or the prefix parameter.
    • outfile_name should be preferred if no downstream tasks/tools rely on the file name/extension
    • tasks with multiple outputs should always use the prefix convention
  • After the input sorting rules in style-guide.md have been applied, follow the below rules for further sorting.
    • "sample" files come before "reference" files
    • If present, use_all_cores should be the last Boolean in its block
    • the ncpu parameter comes before inputs that allocate memory, which come before inputs that allocate disk space
      • This block of 2-3 inputs should come after all other inputs.
  • Most tasks should have a default maxRetries of 1
    • Certain tasks are prone to intermittent failure (often if an internet connection is involved) and can have a higher default maxRetries. This value should not exceed 3.
  • There are lower bounds for resource allocation
    • Memory should be allocated a minimum of 4gb
    • Disk size should be allocated a minimum of 10gb
    • If the task is multi-cored, it should use at least 2 cpu by default
    • These bounds were selected somewhat arbitrarily, but consistency is important for quickly identifying our light-weight tasks
    • These bounds are subject to change pending a more empirical investigation
  • All tasks should have an output
    • This may be a hardcoded "dummy" output such as String check = "passed"
    • This ensures the task can be cached by runners. Tasks without outputs may be required to rerun on the same input due to a cache miss.
  • Use the as keyword sparingly; only in the case of increased readability or to avoid name collisions
    • Prefer using as in the import block rather than at the task/workflow call level
    • When using as to rename an invalid URI, attempt to make as few changes to the filename as possible (i.e. try not to abbreviate)
    • To disambiguate a task or workflow file from it's contents, you can respectively add the _tasks or _wf suffix in the import section
  • Whenever possible, prefer a Docker image maintained by an external source (such as BioContainers) rather than creating your own image
  • When adding a Dockerfile to this repository, follow the below conventions
    • The Dockerfile should be nested under the docker/ directory, a folder with a name for the image (in most cases the name of the primary tool), and finally a folder named after the version being built.
    • Docker images should be versioned according to the following convention
      • Start with the version of whatever tool is named in the path to the Dockerfile
        • If no specific tool is named (e.g. the util image), default to SemVer. Ignore the next 3 bullet points.
      • Followed by a dash-zero (-0)
        • If the Docker image gets updated, without updating the base tool's version, increment the number after the dash (-) by one
        • If the Docker image gets updated, including updating the base tool's version, revert back to a dash-zero (-0)
  • general purpose tasks can use the util image maintained in this repo