Some tools for NGS studies
- NGS_pipeline.sh
Usage: NGS_pipeline.sh <BAM_file/FASTQ_Files>
For example:
NGS_pipeline.sh id1.bam
OR:
NGS_pipeline.sh id1_R1.fastq.gz id1_R2.fastq.gz
This script implements the GATK Best Practices workflow (GATK v3.5).
If the input is a BAM file, it is first converted back to FASTQ files, and then the following steps are run:
1) mapping (bwa mem)
2) mark duplicates & sort (Picard)
3) Indel realignment
4) Base recalibration
5) Variant calling (HaplotypeCaller GVCF mode)
If the input is a pair of FASTQ files, the pipeline starts directly at the mapping step.
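The input handling described above can be sketched in a few lines of Python. This is only an illustration of the dispatch logic, not the contents of NGS_pipeline.sh: the function name `plan_steps` and the list of recognised file extensions are assumptions made for the example.

```python
from typing import List

# The five steps listed above, in order.
PIPELINE_STEPS = [
    "mapping (bwa mem)",
    "mark duplicates & sort (Picard)",
    "indel realignment",
    "base recalibration",
    "variant calling (HaplotypeCaller GVCF mode)",
]

# Assumed FASTQ extensions; the real script may accept others.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def plan_steps(inputs: List[str]) -> List[str]:
    """Return the ordered steps that would run for the given input files."""
    if len(inputs) == 1 and inputs[0].endswith(".bam"):
        # A BAM input is converted back to FASTQ before mapping.
        return ["convert BAM to FASTQ"] + PIPELINE_STEPS
    if len(inputs) == 2 and all(name.endswith(FASTQ_SUFFIXES) for name in inputs):
        # Paired FASTQ input goes straight to mapping.
        return list(PIPELINE_STEPS)
    raise ValueError("expected one BAM file or two FASTQ files")


if __name__ == "__main__":
    for step in plan_steps(["id1.bam"]):
        print(step)
```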
- fixFastq.py
usage: fixFastq.py [-h] [--checkEncodingOnly] fastq1 fastq2 output
Fix FASTQ files (remove singletons, resolve pairs, recode quality scores to
Illumina-1.8 if needed)
positional arguments:
fastq1 input fastq file for first read in paired end data
fastq2 input fastq file for second read in paired end data
output the prefix of output fastq files
optional arguments:
-h, --help show this help message and exit
--checkEncodingOnly Use this flag if you only want to check the encoding
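As a rough illustration of the encoding check and recoding, the sketch below uses a common heuristic: quality characters below ASCII 59 can only occur with a Phred+33 offset (the offset used by Illumina 1.8). The function names are hypothetical, fixFastq.py may use a different rule, and older Solexa scores are treated here as if they were plain Phred+64.

```python
from typing import Iterable


def guess_offset(quality_lines: Iterable[str]) -> int:
    """Guess the quality offset (33 or 64) from the lowest character seen."""
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    # Offset-64 encodings never use characters below ';' (ASCII 59).
    return 33 if lowest < 59 else 64


def recode_to_phred33(quality: str, offset: int) -> str:
    """Return the quality string re-expressed with a Phred+33 offset."""
    if offset == 33:
        return quality
    # Shift every character down by the difference between the offsets.
    return "".join(chr(ord(ch) - offset + 33) for ch in quality)


if __name__ == "__main__":
    old_style = ["hhhhgggg", "hhhffeee"]
    offset = guess_offset(old_style)
    print(offset, [recode_to_phred33(q, offset) for q in old_style])
```

With short or uniformly high-quality input the heuristic can be ambiguous, so scanning many reads before deciding is safer.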
- vcfSummary.py
usage: vcfSummary.py [-h] input output
Get variant- and individual-level summaries from a VCF file, for example AC, AF, missing rate, ... for variants and NVAR, Ti/Tv, ... for subjects
positional arguments:
input The VCF input file
output The prefix of output files
optional arguments:
-h, --help show this help message and exit
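To make the summary statistics concrete, here is a minimal sketch of how AC, AF, missing rate and Ti/Tv can be computed from genotype strings. It assumes biallelic sites and counts any non-reference allele towards AC; the function names are hypothetical and this is not necessarily how vcfSummary.py computes them.

```python
import re
from typing import Dict, List, Tuple

# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def variant_summary(genotypes: List[str]) -> Dict[str, float]:
    """Summarise one variant from GT strings such as '0/1', '1|1', './.'."""
    called_alleles: List[str] = []
    missing = 0
    for gt in genotypes:
        calls = re.split(r"[/|]", gt)
        if "." in calls:
            missing += 1  # any missing allele makes the call missing
            continue
        called_alleles.extend(calls)
    an = len(called_alleles)
    ac = sum(1 for allele in called_alleles if allele != "0")
    return {
        "AC": ac,
        "AN": an,
        "AF": ac / an if an else float("nan"),
        "missing_rate": missing / len(genotypes) if genotypes else float("nan"),
    }


def ti_tv(snvs: List[Tuple[str, str]]) -> float:
    """Transition/transversion ratio over (ref, alt) pairs of SNVs."""
    ti = sum(1 for pair in snvs if pair in TRANSITIONS)
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


if __name__ == "__main__":
    print(variant_summary(["0/0", "0/1", "1/1", "./."]))
    print(ti_tv([("A", "G"), ("C", "T"), ("A", "C")]))
```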
- vcfPedcheck.py
usage: vcfPedcheck.py [-h] [--zeroout] [--me N] vcf fam prefix
Scan the vcf file for Mendelian Errors
positional arguments:
vcf The VCF input file
fam The fam file
prefix The prefix of the output files
optional arguments:
-h, --help show this help message and exit
--zeroout, -z Create a new vcf file by zeroing out all Mendelian Errors
--me N Mark all variants with a Mendelian error rate > N (based on
trios) in the new vcf file
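For readers unfamiliar with the check, a Mendelian error in a trio means the child's genotype cannot be formed by taking one allele from each parent. The sketch below shows that test and a per-variant error rate for autosomal, diploid calls. The function names are hypothetical, and skipping trios with missing calls is an assumption rather than documented behaviour of vcfPedcheck.py.

```python
import re
from typing import List, Optional, Tuple


def split_gt(gt: str) -> Optional[List[str]]:
    """Split a GT string into alleles, or return None if any allele is missing."""
    parts = re.split(r"[/|]", gt)
    return None if "." in parts else parts


def is_mendelian_error(father: str, mother: str, child: str) -> bool:
    """True if the child's diploid genotype cannot come from these parents."""
    f, m, c = split_gt(father), split_gt(mother), split_gt(child)
    if f is None or m is None or c is None or len(c) != 2:
        return False  # incomplete or non-diploid calls cannot be judged
    a, b = c
    # One child allele must come from the father and the other from the mother.
    return not ((a in f and b in m) or (b in f and a in m))


def mendelian_error_rate(trios: List[Tuple[str, str, str]]) -> float:
    """Fraction of fully called trios that show a Mendelian error."""
    complete = [t for t in trios if all(split_gt(g) is not None for g in t)]
    if not complete:
        return 0.0
    return sum(is_mendelian_error(*t) for t in complete) / len(complete)


if __name__ == "__main__":
    trios = [("0/0", "0/0", "0/1"), ("0/1", "0/0", "0/1"), ("./.", "0/0", "0/1")]
    print(mendelian_error_rate(trios))
```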
- SelectVariants.py
usage: SelectVariants.py [-h] [--genemodel {ensembl,refSeq}] [--maf N]
[--splicing] [--frameshift] [--nonsynonymous]
[--stopgain] [--stoploss]
input output
Select variants based on MAF and functional categories
positional arguments:
input input file generated by 'table_annovar.pl'
output output file with all selected variants included
optional arguments:
-h, --help show this help message and exit
--genemodel {ensembl,refSeq}
which gene model to use (default: ensembl)
--maf N variants with MAF > N will be excluded (default: 0.01)
--splicing use this flag to select splicing variants
--frameshift use this flag to select frameshift-indels
--nonsynonymous use this flag to select non-synonymous variants
--stopgain use this flag to select stop-gain variants
--stoploss use this flag to select stop-loss variants
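The selection logic can be pictured as a per-row filter like the sketch below. Several things here are assumptions made for illustration: the frequency column name `MAF`, treating a missing frequency (`.`) as rare, and combining the category flags as a union. The `Func.*` and `ExonicFunc.*` column names follow the usual table_annovar.pl output for the refGene and ensGene protocols, but the actual script may read the annotation differently.

```python
from typing import Dict, Iterable

# Assumed mapping from --genemodel value to the ANNOVAR column suffix.
SUFFIX = {"ensembl": "ensGene", "refSeq": "refGene"}


def keep_variant(row: Dict[str, str], categories: Iterable[str],
                 maf: float = 0.01, genemodel: str = "ensembl",
                 maf_column: str = "MAF") -> bool:
    """Return True if the row passes the MAF cutoff and matches a category."""
    raw = row.get(maf_column, ".")
    freq = 0.0 if raw in (".", "") else float(raw)  # assumption: missing = rare
    if freq > maf:
        return False  # variants with MAF > N are excluded

    suffix = SUFFIX[genemodel]
    func = row.get("Func." + suffix, "").split(";")
    exonic = row.get("ExonicFunc." + suffix, "")

    labels = set()
    if "splicing" in func:
        labels.add("splicing")
    if exonic.startswith("frameshift"):
        labels.add("frameshift")
    if exonic.startswith("nonsynonymous"):
        labels.add("nonsynonymous")
    if exonic in ("stopgain", "stoploss"):
        labels.add(exonic)
    # Assumption: a variant is kept if it matches any requested category.
    return bool(labels & set(categories))


if __name__ == "__main__":
    row = {"Func.ensGene": "exonic", "ExonicFunc.ensGene": "stopgain", "MAF": "0.002"}
    print(keep_variant(row, {"stopgain", "stoploss"}))
```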
- backfillVCF.py
usage: backfillVCF.py [-h] [--subjects] input output
Backfilling is needed after different VCF files have been merged. This script will
go through the merged VCF file, and backfill the genotypes to '0/0' when all
subjects provided by the user have missing calls
positional arguments:
input Name of the input file (VCF format)
output Name of the output file
optional arguments:
-h, --help show this help message and exit
--subjects A file that includes groups of subjects for whom the
backfilling is needed (This file can have multiple lines, but
subjects that belong to the same group have to be on a single
line)
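A minimal sketch of the backfilling rule for a single variant follows. It reads the description as applying per group: if every subject in a group has a missing call, the whole group is set to '0/0'. That reading, the whitespace-separated subjects file format, and the function names are assumptions for illustration; only the GT value is handled, not other FORMAT fields.

```python
from typing import List

MISSING = {".", "./.", ".|."}


def parse_groups(text: str) -> List[List[str]]:
    """One group per non-empty line; subject IDs assumed whitespace-separated."""
    return [line.split() for line in text.splitlines() if line.strip()]


def backfill(samples: List[str], genotypes: List[str],
             groups: List[List[str]]) -> List[str]:
    """Return genotypes with all-missing groups replaced by '0/0'."""
    column = {sample: i for i, sample in enumerate(samples)}
    filled = list(genotypes)
    for group in groups:
        cols = [column[s] for s in group if s in column]
        # Backfill only when every subject in the group is missing.
        if cols and all(genotypes[i] in MISSING for i in cols):
            for i in cols:
                filled[i] = "0/0"
    return filled


if __name__ == "__main__":
    samples = ["a", "b", "c", "d"]
    groups = parse_groups("a b\nc d\n")
    print(backfill(samples, ["./.", "./.", "0/1", "./."], groups))
```

In this example the first group is backfilled, while subject d stays missing because subject c in the same group has a call.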