File Formats groupinfo.yaml

Keiran Raine edited this page Jan 15, 2018 · 2 revisions

The groupinfo.yaml file is used with fastq input to bwa_mem.pl.

The file uses basic YAML syntax to describe each set of files and the data to be applied to the corresponding read group header line of the final BAM/CRAM file.

The sample SM tag is abstracted out of the individual records because it must match across all input data. It is validated against the command line argument for sample, which is still required.

Each fastq file (or pair) is described in the READGRPS section. Indentation is mandatory for parsing to work. The filename without its path is used to link input files to the relevant records (for compatibility with Dockstore/CWL).
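As an illustration of the filename-based linking, here is a minimal Python sketch. It is not the code used by bwa_mem.pl. The function name `find_record` and the path `/data/run1/1.fq` are hypothetical, and the sketch assumes the READGRPS section has already been parsed into a dictionary. How a pair of files is keyed is not covered here.

```python
import os


def find_record(readgrps, fastq_path):
    """Look up the READGRPS entry for an input file by its name without the path."""
    key = os.path.basename(fastq_path)
    if key not in readgrps:
        raise KeyError(f"no READGRPS entry for {key}")
    return readgrps[key]


# Parsed READGRPS section (hypothetical, reduced to one tag per file).
readgrps = {"1.fq": {"LB": "Library_id"}, "2.fq": {"LB": "Library2"}}

# The directory part of the path is ignored; only "1.fq" is used as the key.
record = find_record(readgrps, "/data/run1/1.fq")
```

Because only the base name is used, the same groupinfo.yaml keeps working when a workflow engine stages the input files into a different directory.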

All RG header information declared in the BAM/SAM specification is permitted. Unknown fields are rejected, but only minimal validation is applied, specifically:

  • ID: if it is missing or clashes, a partial UUID is generated in its place.
  • PL: forced to uppercase, as incorrect case is a common error in headers.
  • SM: checked against the command line value for -sample.

Undeclared items are omitted.

Example:

# abstracted as has to be same for all records
SM: sample
# the actual readgroups
READGRPS:
  1.fq:
      ID: 9
      CN: centre
      DS: Please don't use multiline
      LB: Library_id
      PI: 500
      PL: FORCED TO UPPER
      PM: HiSeq-XTen
      PU: 1234_1
  2.fq:
      LB: Library2
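
The checks described on this page can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation in bwa_mem.pl. The function name `normalise_readgroups` is hypothetical, the tag list is taken from the SAM specification rather than from the tool, the 8-character ID length is an arbitrary choice, and the input is assumed to be the already parsed YAML.

```python
import uuid

# @RG tags from the SAM specification; the exact list the tool accepts is an assumption.
ALLOWED_TAGS = {"ID", "BC", "CN", "DS", "DT", "FO", "KS", "LB",
                "PG", "PI", "PL", "PM", "PU", "SM"}


def normalise_readgroups(groupinfo, cli_sample):
    """Return {filename: {tag: value}} after applying the documented checks."""
    if groupinfo.get("SM") != cli_sample:
        raise ValueError("SM in groupinfo.yaml does not match the -sample value")

    seen_ids = set()
    result = {}
    for filename, tags in groupinfo["READGRPS"].items():
        unknown = set(tags) - ALLOWED_TAGS
        if unknown:
            raise ValueError(f"{filename}: unknown RG tags {sorted(unknown)}")

        # YAML parses values such as "ID: 9" as integers, so convert to text.
        record = {tag: str(value) for tag, value in tags.items()}

        # Missing or clashing ID: substitute part of a UUID (length is an assumption).
        if "ID" not in record or record["ID"] in seen_ids:
            record["ID"] = uuid.uuid4().hex[:8]
        seen_ids.add(record["ID"])

        if "PL" in record:
            record["PL"] = record["PL"].upper()

        # SM is shared by every record; tags that were not declared stay absent.
        record["SM"] = cli_sample
        result[filename] = record
    return result


# The example above, as it would look once parsed (some tags left out for brevity).
groupinfo = {
    "SM": "sample",
    "READGRPS": {
        "1.fq": {"ID": 9, "CN": "centre", "LB": "Library_id", "PL": "illumina"},
        "2.fq": {"LB": "Library2"},
    },
}
readgroups = normalise_readgroups(groupinfo, "sample")
```

In this sketch the record for 1.fq keeps its declared ID and has PL uppercased, while the record for 2.fq receives a generated ID and carries only LB and SM.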