Bachelor evaluates the pathogenicity of germline variants from a snpeff annotated germline vcf in a configured panel of genes. Pathogenicity is determined based both on ClinVar and a list of configured snpeff effects that are deemed to be likely pathogenic even if unannotated in ClinVar.
The steps in the routine are as follows:
- Parse germline VCF to identify candidate variants.
- If the tumor alt and read depth counts are not in the germline VCF, obtain them from the sample's tumor BAM file.
- Enrich variants with copy number data from purple
- Enrich variants with ClinVar pathogenicity information and configured whitelist / blacklist and determine reportability
- Write final set of germline variant data to DB (germlineVariant table) and a TSV output file.
In addition to standard PURPLE annotations (https://github.com/hartwigmedical/hmftools/tree/master/purity-ploidy-estimator#10-somatic-enrichment), Bachelor also annotates the following information:
Pathogenicity is determined primarily from ClinVar, or may be set or overridden via a configured blacklist or whitelist.
Permitted values are:
- BLACK_LIST - Variant matches black list configuration of known benign variant. INDELs in repeats or with microhomology in splice and intronic regions are also blacklisted (ie. CodingEffect in (‘SPLICE’,’NONE’) AND Type = ‘INDEL’ AND (repeatCount>=8 OR indelSequence == microhomology))
- WHITE_LIST - Variant matches white list configuration of known pathogenic variants
- CLINVAR_PATHOGENIC - At least 1 interpretation of 'PATHOGENIC' and none ‘BENIGN’ or ‘LIKELY_BENIGN’
- CLINVAR_LIKELY_PATHOGENIC - No interpretation of PATHOGENIC, but at least 1 interpretation of 'LIKELY_PATHOGENIC' and none ‘BENIGN’ or ‘LIKELY_BENIGN’
- CLINVAR_CONFLICTING - Variant has both likely 'BENIGN'/'LIKELY_BENIGN' and 'PATHOGENIC'/'LIKELY_PATHOGENIC' interpretations
- CLINVAR_LIKELY_BENIGN - No interpretation of 'BENIGN' and at least 1 interpretation of 'LIKELY_BENIGN' and none ‘PATHOGENIC’ or ‘LIKELY_PATHOGENIC’
- CLINVAR_BENIGN - At least 1 interpretation of 'BENIGN' and none ‘PATHOGENIC’ or ‘LIKELY_PATHOGENIC’
- UNANNOTATED - Variant is not annotated in Clinvar
Note that custom white and blacklists can be specified per gene in the XML config by either:
- MinCodon OR
- Ref, Alt,Chromosome & Position
A filter field is populated. Permitted values are:
- PASS
- GERMLINE_FILTERED - Variant was soft filtered in input germline vcf file
- ARTEFACT - Phred Score < 150 and adjusted VAF < 0
This is a summary boolean flag which implements the HMF logic for reportable germline variants. A variant will be marked as reported if the filter = PASS and either one or both of the following criteria are met:
- Pathogenicity in ('WHITE_LIST','CLINVAR_PATHOGENIC','CLINVAR_LIKELY_PATHOGENIC')
- Pathogenicity = 'UNANNOTATED' and effect is configured as a known snpeffect
- Find candidate germline variants, write to file and optionally write to db.
java -jar bachelor.jar
-sample [sampleId]
-germline_vcf /sample_path/sampleid.germline_variants.vcf.gz
-tumor_bam_file /sample_path/sampleid.tumor.bam
-purple_data_dir /sample_path/purple/
-xml_config /path/bachelor_config.xml
-ext_filter_file /path/bachelor_clinvar_filters.csv
-ref_genome /path/ref_genome.fasta
-output_dir /sample_path/bachelor/
[-db_url [db_url] -db_user [user] -db_pass [password]]
- Load germline variants from file and upload to database
java -cp bachelor.jar com.hartwig.hmftools.bachelor.LoadGermlineVariants
-sample [sampleId]
-sample_data_dir /sample_path/bachelor/
-db_url [db_url] -db_user [user] -db_pass [password]
Optional config:
- include_vcf_filtered - process non-passing variants from the VCF, mark as filter = GERMLINE_FILTERED
- log_debug - verbose logging
Bachelor requires that VCF files have been annotated using SnpEff.
Bachelor uses 2 config files:
- an XML config file specifying the genes and effects to filter on, and blacklist entries for known exceptions
- optionally a file of ClinVar variant data to set pathogenic status
This very basic example has a single criteria: missense variants in BRCA2.
<?xml version="1.0" ?>
<Program name="ExampleStudy" xmlns="http://www.hartwigmedicalfoundation.nl/bachelor.xsd">
<Panel>
<Gene name="BRCA2" ensembl="ENST00000544455" refseq="NM_000059"/>
<SnpEffect>missense_variant</SnpEffect>
</Panel>
</Program>
Create a set of files from ClinVar variant classification with this command:
java -cp bachelor.jar com.hartwig.hmftools.bachelor.ExternalDBFilters
-xml_config /path/to/bachelor_config.xml
-clinvar_db_filter_file /path/to/clinvar_snp.vcf.gz
-output_filter_file /path/to/output_filter_file
log_debug can be passed an optional additional parameter for additional logging.