File Formats groupinfo.yaml
The groupinfo.yaml file is used with fastq input to bwa_mem.pl.
The file uses basic YAML syntax to describe each set of input files and the data to be applied to the corresponding readgroup (@RG) header line of the final BAM/CRAM file.
The sample SM tag is abstracted out of the individual records because it must match across all input data. It is validated against the command line argument for the sample, which is still required.
Each fastq file (or pair) is described in the READGRPS section. Indentation is mandatory for parsing to work. The filename without its path is used to link input files to the relevant records (for compatibility with Dockstore/CWL).
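The filename-based linking can be pictured with a short Python sketch. This is an illustration only, not the code used by bwa_mem.pl: the function name link_inputs and the in-memory dict layout are assumptions, and how a pair of files is keyed is not covered.

```python
import os

def link_inputs(paths, readgrps):
    """Map each input path to its READGRPS record using the bare filename.

    Hypothetical helper: `readgrps` stands for the already-parsed
    READGRPS section, keyed by filename.
    """
    linked = {}
    for path in paths:
        # The directory part is dropped; only the filename is used as the key.
        name = os.path.basename(path)
        if name not in readgrps:
            raise KeyError(f"no READGRPS entry for {name}")
        linked[path] = readgrps[name]
    return linked

readgrps = {"1.fq": {"ID": 9, "LB": "Library_id"}, "2.fq": {"LB": "Library2"}}
linked = link_inputs(["/data/run1/1.fq", "/other/dir/2.fq"], readgrps)
```

Because only the filename is used, the same groupinfo.yaml keeps working when a workflow engine stages the inputs into a different directory.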
All RG header fields declared in the SAM/BAM specification are permitted. Unknown fields will be rejected, but only minimal validation is applied, specifically:
- ID: if not found, or if it clashes, a partial UUID will be generated.
- PL: forced to uppercase, as this is a common error in headers.
- SM: checked against the command line value for -sample.
Undeclared items are omitted.
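The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation in bwa_mem.pl: the function name normalise_readgroups, the 8-character length of the "partial UUID", and the choice to copy SM into every record are all hypothetical. The input is assumed to be the file already parsed into nested dicts.

```python
import uuid

# RG tags defined in the SAM specification; SM is handled at the top level.
ALLOWED_TAGS = {"ID", "BC", "CN", "DS", "DT", "FO", "KS",
                "LB", "PG", "PI", "PL", "PM", "PU"}

def normalise_readgroups(groupinfo, cli_sample):
    """Apply the minimal checks described above to parsed groupinfo data."""
    # SM must agree with the sample given on the command line.
    if groupinfo.get("SM") != cli_sample:
        raise ValueError("SM does not match the command line sample")

    seen_ids = set()
    result = {}
    for filename, record in groupinfo["READGRPS"].items():
        # Unknown fields are rejected.
        unknown = set(record) - ALLOWED_TAGS
        if unknown:
            raise ValueError(f"unknown RG fields for {filename}: {sorted(unknown)}")

        rg = dict(record)  # undeclared tags are simply left out
        rg_id = str(rg["ID"]) if "ID" in rg else None
        # Missing or clashing ID: generate a partial UUID (length is an assumption).
        while rg_id is None or rg_id in seen_ids:
            rg_id = uuid.uuid4().hex[:8]
        seen_ids.add(rg_id)
        rg["ID"] = rg_id

        # PL is forced to uppercase.
        if "PL" in rg:
            rg["PL"] = str(rg["PL"]).upper()

        rg["SM"] = groupinfo["SM"]
        result[filename] = rg
    return result

groupinfo = {
    "SM": "sample",
    "READGRPS": {
        "1.fq": {"ID": 9, "PL": "illumina"},
        "2.fq": {"LB": "Library2"},
    },
}
checked = normalise_readgroups(groupinfo, "sample")
```

In this sketch the second record has no ID, so it receives a generated one, and it gains no PL entry because none was declared.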
Example.
# abstracted as has to be same for all records
SM: sample
# the actual readgroups
READGRPS:
  1.fq:
    ID: 9
    CN: centre
    DS: Please don't use multiline
    LB: Library_id
    PI: 500
    PL: FORCED TO UPPER
    PM: HiSeq-XTen
    PU: 1234_1
  2.fq:
    LB: Library2
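
For orientation, a record from this file ends up as a tab-separated @RG line in the output header, as defined by the SAM specification. The Python sketch below shows how one record might be rendered. The function name rg_header_line and the tag ordering (ID first, SM last) are assumptions for illustration, not the behaviour of bwa_mem.pl.

```python
def rg_header_line(record, sample):
    """Render one readgroup record as a SAM @RG header line.

    Hypothetical helper: `record` is one parsed READGRPS entry that
    already has an ID, and `sample` is the top-level SM value.
    """
    parts = ["@RG", f"ID:{record['ID']}"]
    # Remaining declared tags follow in the order they were given.
    parts += [f"{tag}:{value}" for tag, value in record.items() if tag != "ID"]
    parts.append(f"SM:{sample}")
    # SAM header fields are separated by tabs.
    return "\t".join(parts)

line = rg_header_line({"ID": 9, "LB": "Library_id", "PL": "ILLUMINA"}, "sample")
```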