
Options

Danny Antaki edited this page Mar 8, 2018 · 33 revisions







Help

-h | --help         show this help message and exit

Display help message


Input Arguments

Required

BAM

-i, -bam    ...    BAM file(s)

More than one BAM file can be passed to SV2, separated by spaces.

$ sv2 -i HG00096.bam NA12878.bam ...

Details for BAM files.

SV files

SV2 can take multiple files containing SV predictions as input. BED and VCF files are supported.

BED

-b, -bed    ...    BED file(s) of SVs

Multiple BED files can be passed to SV2, separated by spaces.

$ sv2 -i HG00096.bam -b del.bed dup.bed 

BED files may be space- or tab-delimited and must contain the columns CHROM START END SVTYPE.

Details on the required BED format
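As a rough illustration of that layout, the Python sketch below (Python because SV2 itself is a Python tool) parses one BED line into its four fields. The parse_bed_line function and the SV tuple are hypothetical names, not part of SV2; ignoring any extra columns and rejecting records whose END is not greater than START are assumptions about what a valid line looks like.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str


def parse_bed_line(line: str) -> SV:
    """Parse a CHROM START END SVTYPE line (hypothetical helper, not SV2 code)."""
    # str.split() with no argument splits on any run of spaces or tabs,
    # so both space- and tab-delimited lines are accepted.
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected CHROM START END SVTYPE, got: {line!r}")
    chrom, start, end, svtype = fields[:4]  # assumption: extra columns are ignored
    start_i, end_i = int(start), int(end)
    if end_i <= start_i:
        raise ValueError(f"END must be greater than START: {line!r}")
    return SV(chrom, start_i, end_i, svtype)
```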

VCF

-v, -vcf    ...    VCF file(s) of SVs

Multiple VCF files can be passed to SV2, separated by spaces.

VCF files are tab-delimited, and the INFO column must contain the END= and SVTYPE= fields.

Details on the required VCF format
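The INFO requirement can be checked record by record. This is a minimal sketch, assuming the standard VCF column order (INFO is the eighth column); parse_info and check_sv_record are hypothetical helpers, not SV2 functions.

```python
def parse_info(info: str) -> dict:
    """Split an INFO string such as 'SVTYPE=DEL;END=5000' into a dict.
    Flag entries without '=' (e.g. 'IMPRECISE') map to True."""
    out = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        out[key] = value if sep else True
    return out


def check_sv_record(line: str) -> tuple:
    """Return (CHROM, POS, END, SVTYPE) or raise if END/SVTYPE are missing."""
    # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO [FORMAT samples...]
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF record needs at least 8 tab-delimited columns")
    info = parse_info(fields[7])
    missing = [key for key in ("END", "SVTYPE") if key not in info]
    if missing:
        raise ValueError(f"INFO is missing required field(s): {', '.join(missing)}")
    return fields[0], int(fields[1]), int(info["END"]), info["SVTYPE"]
```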

SNV VCF

-snv    ...    SNV VCF file(s) 

SNV calls are required to genotype duplications with imprecise breakpoints. For such variants, genotyping considers both coverage and the heterozygous allele ratio.
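To see why SNVs help, consider the allele ratio at a heterozygous site: in a normal diploid region about half of the reads carry the ALT allele, but a heterozygous duplication adds a third copy, so the expected ratio shifts to 1/3 or 2/3. The helpers below are a hypothetical sketch of that arithmetic only, not SV2's genotyping model.

```python
def allele_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the ALT allele at one SNV."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


def expected_het_ratios(copy_number: int) -> list:
    """Expected ALT fractions at a heterozygous SNV in a region with the given
    total copy number: k ALT copies out of copy_number, for 1 <= k < copy_number."""
    return [k / copy_number for k in range(1, copy_number)]


def closest_copy_number(ratio: float, candidates=(2, 3, 4)) -> int:
    """Pick the candidate copy number whose expected ratios lie nearest `ratio`.
    Ties go to the first candidate listed."""
    return min(
        candidates,
        key=lambda cn: min(abs(ratio - r) for r in expected_het_ratios(cn)),
    )
```

Note that a ratio of 0.5 fits both two copies (1:1) and four copies (2:2) equally well, which illustrates why allele ratio alone is not enough and coverage is considered as well.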

VCF files must be compressed with bgzip and indexed with tabix.

Multiple VCF files can be passed to SV2, separated by spaces.

Details for SNV VCF files

PED

-p, -ped    ...    PED file(s)

PED files follow the format defined by PLINK.

Multiple PED files can be passed to SV2, separated by spaces.

Details for PED files
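PLINK PED files begin with six columns: family ID, individual ID, paternal ID, maternal ID, sex (1 = male, 2 = female, anything else unknown) and phenotype. The hypothetical parser below reads only those six columns; which of them SV2 actually uses is not stated here, so treat this as a format sketch rather than SV2's reader.

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str  # "male", "female" or "unknown"
    phenotype: str


# PLINK sex codes: 1 = male, 2 = female, any other value = unknown
SEX_CODES = {"1": "male", "2": "female"}


def parse_ped_line(line: str) -> PedRecord:
    """Parse the first six whitespace-separated PED columns (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("a PED line needs at least 6 columns")
    fam, iid, pat, mat, sex, pheno = fields[:6]
    return PedRecord(fam, iid, pat, mat, SEX_CODES.get(sex, "unknown"), pheno)
```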


Genotype Arguments

Reference Genome

-g, -genome    STR    Reference genome build [hg19, hg38, mm10]. Default: hg19

SV2 accepts the reference genome builds hg19 (GRCh37), hg38 (GRCh38), and mm10 (GRCm38). The corresponding command-line strings are hg19, hg38, and mm10.

PCRfree Libraries

-pcrfree        GC content normalization for PCRfree libraries

SV2 performs a GC content normalization of coverage estimates, adapted from CNVnator. Supply this flag if the samples in the sample information list were sequenced with PCR-free library preparation.

By default this flag is off and SV2 assumes samples were sequenced with PCR protocols.
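For intuition, the general CNVnator-style correction scales each bin's read depth by the ratio of the genome-wide mean depth to the mean depth of all bins with similar GC content. The sketch below shows only that general idea; how SV2 adapts it, and how it differs between PCR-free and PCR libraries, is not described here, and gc_correct and its n_strata parameter are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gc_correct(depths, gc_fractions, n_strata=100):
    """Scale each bin's depth by (global mean depth) / (mean depth of bins in
    the same GC stratum). Bins are grouped into n_strata equal-width GC strata."""
    if len(depths) != len(gc_fractions):
        raise ValueError("depths and gc_fractions must be the same length")
    strata = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    by_stratum = defaultdict(list)
    for s, d in zip(strata, depths):
        by_stratum[s].append(d)
    stratum_mean = {s: mean(ds) for s, ds in by_stratum.items()}
    global_mean = mean(depths)
    # A stratum with zero mean depth has nothing to rescale, so keep those bins as-is.
    return [
        d * global_mean / stratum_mean[s] if stratum_mean[s] > 0 else d
        for s, d in zip(strata, depths)
    ]
```

In the test-style example where GC-rich bins show twice the depth of GC-poor bins, every corrected bin ends up at the global mean, which is the behavior a GC correction should have when the depth difference is purely a GC artifact.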

bwa mem -M compatibility

-M        bwa mem -M compatibility. Split-reads flagged as secondary instead of supplementary 

SV2 can accommodate legacy alignments with chimeric reads flagged as secondary. Pass the -M flag if samples in the sample information file were aligned with bwa mem -M.

By default SV2 assumes chimeric reads are flagged as supplementary (-M is off).
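The two conventions differ in a single SAM flag bit: 0x800 (supplementary) versus 0x100 (secondary). The hypothetical helper below sketches how a tool could decide whether a record is the non-primary part of a split read under each convention; it is not SV2's code, and it ignores that the secondary bit can also mark ordinary multi-mapping alignments.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def is_split_read_part(flag: int, bwa_mem_M: bool = False) -> bool:
    """True if this record is a non-primary piece of a chimeric (split) read."""
    if bwa_mem_M:
        # bwa mem -M flags the extra split hits as secondary instead of supplementary
        return bool(flag & SECONDARY)
    return bool(flag & SUPPLEMENTARY)
```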

Merging Divergent Breakpoints

-merge        Merge SV after genotyping

SV2 can merge SVs whose breakpoints reciprocally overlap by at least 80% (the default threshold). Merging is repeated until no more SVs can be merged, and the position of the SV with the maximum ALT genotype likelihood is retained.

By default SV2 does not merge breakpoints.
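The merging procedure can be sketched as follows. This is a minimal illustration, not SV2's implementation: the Call tuple, the alt_lik field and the requirement that merged calls share an SV type are assumptions. Reciprocal overlap is computed as the overlap length divided by the longer call's length, which is equivalent to requiring the overlap to cover at least the threshold fraction of each call.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str
    alt_lik: float  # ALT genotype likelihood (assumed: higher = stronger ALT support)


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Overlap length / longer call length; >= t means >= t of BOTH calls overlaps."""
    if a.chrom != b.chrom:
        return 0.0
    ovr = min(a.end, b.end) - max(a.start, b.start)
    if ovr <= 0:
        return 0.0
    return ovr / max(a.end - a.start, b.end - b.start)


def merge_calls(calls: List[Call], min_ovr: float = 0.8) -> List[Call]:
    """Repeatedly merge pairs meeting min_ovr, keeping the call with the higher
    ALT likelihood, until no pair can be merged."""
    merged = list(calls)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                if a.svtype == b.svtype and reciprocal_overlap(a, b) >= min_ovr:
                    merged[i] = a if a.alt_lik >= b.alt_lik else b
                    del merged[j]  # j > i, so index i is unaffected
                    changed = True
                    break
            if changed:
                break  # restart the scan on the updated list
    return merged
```

Restarting the scan after every merge keeps the loop simple at the cost of many passes; a larger-scale implementation would more likely sort calls by position and compare only neighbors.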

Minimum Reciprocal Overlap for Merging

-min-ovr    FLOAT    Minimum reciprocal overlap for merging SVs [0.8]

Users can set the minimum reciprocal overlap required to merge SVs after genotyping. Passing the -min-ovr option enables merging, so the -merge flag is not required.
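As a worked example of the threshold, take two hypothetical calls A spanning 1000 to 2000 (length 1000) and B spanning 1100 to 2150 (length 1050), with lengths taken as END minus START:

```latex
\begin{aligned}
\text{overlap}(A,B) &= \min(2000,\,2150) - \max(1000,\,1100) = 900 \\
\frac{900}{|A|} &= \frac{900}{1000} = 0.90, \qquad \frac{900}{|B|} = \frac{900}{1050} \approx 0.857 \\
\text{RO}(A,B) &= \min(0.90,\ 0.857) \approx 0.857
\end{aligned}
```

Since 0.857 is at least 0.8, the two calls would be merged with the default -min-ovr 0.8, but not with -min-ovr 0.9.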

Genotype without annotating

-no-anno        Genotype without annotating        

Skip variant annotation with the -no-anno flag. By default, SV2 will annotate each variant.

Skip Preprocessing

-pre    PATH    Preprocessing output directory. Skips preprocessing

Users can skip preprocessing by passing the path of an existing sv2_preprocessing/ directory to the -pre argument. SV2 then loads the values stored in sv2_preprocessing/ instead of recomputing them.

Skipping preprocessing is useful when genotyping a different set of variants. Example

Skip Feature Extraction

-feats    PATH    Feature output directory. Skips feature extraction

Passing the path of the sv2_features/ directory to the -feats argument skips feature extraction.

Skipping feature extraction does not require BAM or SNV files. Additionally, multiple samples can be passed to square off the genotype matrix, so that every variant is genotyped in every sample.

Skipping feature extraction example


Classifier Arguments

Load a New Classifier

-load-clf    PATH    Add custom classifiers. `-load-clf <clf.JSON>`

SV2 can incorporate new classifiers for genotyping. SV2 ships with a guide to training new classifiers; following it produces a JSON file containing the paths to the new classifier.

Pass the JSON file to the -load-clf argument to add more classifiers to SV2. Details for training

Genotype with a New Classifier

-clf    STR    Specify classifiers for genotyping [default]

After loading a new classifier, pass its name to the -clf argument to genotype variants with it. The classifier shipped with SV2 is named default and is used when -clf is not specified.


Config Arguments

Download the required resource files

Download

$ sv2 -download

Follow the instructions when prompted. You will be asked to download a zipped file of about 250 MB containing the resources SV2 uses for filtering and annotation. By default, these files are installed in the SV2 installation directory.

FASTA Files

Before genotyping, users must configure SV2 with the full path to each FASTA file. At least one FASTA file is required for SV2 to run. Configuration only needs to be run once, and again only if the FASTA paths change.

hg19 FASTA

-hg19    PATH    hg19 FASTA file

-hg19 takes the full path to a faidx indexed FASTA file for the hg19 (GRCh37) reference build.

hg38 FASTA

-hg38    PATH    hg38 FASTA file

-hg38 takes the full path to a faidx indexed FASTA file for the hg38 (GRCh38) reference build.

mm10 FASTA

-mm10    PATH    mm10 FASTA file

-mm10 takes the full path to a faidx indexed FASTA file for the mm10 reference build.
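Because these options expect faidx-indexed FASTA files, a quick pre-flight check can confirm the index exists before running configuration. The sketch below is a hypothetical helper, not an SV2 function; it assumes the samtools faidx index sits next to the FASTA as <fasta>.fai and returns each sequence's length.

```python
import os


def read_fai(fasta_path: str) -> dict:
    """Return {sequence name: length} from the faidx index next to fasta_path."""
    if not os.path.exists(fasta_path):
        raise FileNotFoundError(f"FASTA not found: {fasta_path}")
    fai_path = fasta_path + ".fai"
    if not os.path.exists(fai_path):
        raise FileNotFoundError(f"{fai_path} not found; index with: samtools faidx {fasta_path}")
    lengths = {}
    with open(fai_path) as fh:
        for line in fh:
            if not line.strip():
                continue
            # .fai columns: NAME LENGTH OFFSET LINEBASES LINEWIDTH
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths
```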


Optional Arguments

Log File

-L, -log    PATH    log file for standard error messages

Error messages and warnings are written to a log file. By default, the log file is $WORKING_DIR/sv2.err.

Temporary Directory

-T, -tmp-dir   PATH    directory for temporary files

SV2 generates temporary files that are placed by default in $WORKING_DIR/sv2_tmp/

Random Seed

-s, -seed    INT    Random seed for genome shuffling in preprocessing [42]

During preprocessing, SV2 randomly selects reads from each chromosome to generate basic alignment statistics. The random seed defaults to 42.
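The value of a fixed seed is reproducibility: the same seed yields the same random draw. SV2's actual sampling scheme is not documented here, so the hypothetical function below only illustrates the principle by drawing random positions per chromosome (where reads could then be fetched) from a dedicated, seeded random generator.

```python
import random


def sample_positions(chrom_lengths: dict, n_per_chrom: int, seed: int = 42) -> dict:
    """Draw n_per_chrom random 1-based positions from each chromosome.
    A dedicated Random instance seeded with `seed` makes the draw repeatable."""
    rng = random.Random(seed)
    return {
        chrom: sorted(rng.randint(1, length) for _ in range(n_per_chrom))
        for chrom, length in sorted(chrom_lengths.items())  # fixed iteration order
    }
```

Iterating over chromosomes in sorted order matters here: with a shared generator, the positions drawn for each chromosome depend on how many draws came before it.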

Output

-o, -out    STR    Output name

Prefix for the output files in sv2_genotypes/

