
Options

Danny Antaki edited this page Oct 26, 2017 · 33 revisions


Help

-h | --help         show this help message and exit

Display help message


Input Arguments

Required

BAM

-i, -bam    ...    BAM file(s)

More than one BAM file can be passed to SV2, separated by spaces.

$ sv2 -i HG00096.bam NA12878.bam ...

Details for BAM files.

SV files

SV2 can take multiple files containing SV predictions as input. BED and VCF files are supported.

BED

-b, -bed    ...    BED file(s) of SVs

Multiple BED files can be passed to SV2, separated by spaces.

$ sv2 -i HG00096.bam -b del.bed dup.bed

BED files are either space or tab delimited, formatted as CHROM START END SVTYPE.

Details on the required BED format
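As a rough illustration of the CHROM START END SVTYPE layout, the sketch below writes a tiny tab-delimited BED file and checks it with awk. The coordinates, the `chr` prefix, the `DEL`/`DUP` type labels, the file name, and the `check_bed` helper are all assumptions made for the example; they are not part of SV2.

```shell
# Write a two-record, tab-delimited BED file (coordinates are made up).
printf 'chr1\t1000\t5000\tDEL\nchr2\t20000\t45000\tDUP\n' > example_svs.bed

# Hypothetical helper: flag lines with fewer than four fields,
# non-numeric START/END, or START >= END. awk's default field
# splitting accepts both spaces and tabs.
check_bed() {
  awk 'NF < 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
         printf "line %d looks malformed: %s\n", NR, $0; bad = 1
       }
       END { exit bad }' "$1"
}

check_bed example_svs.bed && echo "example_svs.bed: OK"
```

The helper only checks the shape of each line; it does not know which SVTYPE values SV2 accepts.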

VCF

-v, -vcf    ...    VCF file(s) of SVs

Multiple VCF files can be passed to SV2, separated by spaces.

VCF files are tab delimited; END= and SVTYPE= are required in the INFO column.

Details on the required VCF format
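A quick pre-flight check for the INFO requirement might look like the sketch below. The `check_sv_vcf` helper and the sample record are hypothetical; the check only confirms that the eighth column of every non-header line carries both keys.

```shell
# Hypothetical helper: report records whose INFO column (field 8)
# lacks END= or SVTYPE=. Header lines starting with "#" are skipped.
check_sv_vcf() {
  awk -F '\t' '!/^#/ && ($8 !~ /(^|;)END=/ || $8 !~ /(^|;)SVTYPE=/) {
         printf "line %d is missing END= or SVTYPE=\n", NR; bad = 1
       }
       END { exit bad }' "$1"
}

# Minimal made-up VCF with one deletion record.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > example_svs.vcf
printf 'chr1\t1000\t.\tN\t<DEL>\t.\t.\tEND=5000;SVTYPE=DEL\n' >> example_svs.vcf

check_sv_vcf example_svs.vcf && echo "example_svs.vcf: OK"
```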

SNV VCF

-snv    ...    SNV VCF file(s) 

SNV information is leveraged for genotyping duplications that lie between repetitive sequences; such SVs typically lack split-reads or discordant paired-ends.

SNV VCF files must be compressed with bgzip and indexed with tabix.

Multiple SNV VCF files can be passed to SV2, separated by spaces.

Details for SNV VCF files
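Compressing and indexing could be wrapped as in the sketch below. `bgzip -c` and `tabix -p vcf` are standard htslib commands; the `prepare_snv_vcf` function name and the file names are made up for illustration, and the input VCF is assumed to be coordinate-sorted already, which tabix requires.

```shell
# Hypothetical helper: compress and index one SNV VCF for use with -snv.
prepare_snv_vcf() {
  vcf=$1                          # uncompressed, sorted VCF, e.g. snv.vcf
  bgzip -c "$vcf" > "$vcf.gz"     # keeps the original, writes snv.vcf.gz
  tabix -p vcf "$vcf.gz"          # writes snv.vcf.gz.tbi next to it
}
```

After `prepare_snv_vcf snv.vcf`, the compressed file would be passed as `-snv snv.vcf.gz`.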

PED

-p, -ped    ...    PED file(s)

PED files follow the format defined by PLINK.

Multiple PED files can be passed to SV2, separated by spaces.

Details for PED files


Genotype Arguments

Reference Genome

-g, -genome    STR    Reference genome build [hg19, hg38, mm10]. Default: hg19

Accepted reference genome builds for SV2 are hg19 (GRCh37), hg38 (GRCh38), or mm10. Accepted command line argument strings are either hg19, hg38, or mm10.

PCRfree Libraries

-pcrfree        GC content normalization for PCRfree libraries

SV2 performs a GC content normalization for coverage estimates adapted from CNVnator. Supply this flag if the samples in the sample information list were sequenced with PCRfree chemistries.

By default this flag is off and SV2 assumes samples were sequenced with PCR protocols.

bwa mem -M compatibility

-M        bwa mem -M compatibility. Split-reads flagged as secondary instead of supplementary 

SV2 can accommodate legacy alignments with chimeric reads flagged as secondary. Pass the -M flag if samples in the sample information file were aligned with bwa mem -M.

By default SV2 assumes chimeric reads are flagged as supplementary (-M is off).
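If it is unclear how a BAM file was aligned, the header sometimes records the aligner command line. The sketch below assumes the aligner wrote its command into the CL field of an @PG header line, which is common but not guaranteed; `uses_bwa_mem_M` is a hypothetical helper name.

```shell
# Hypothetical helper: read SAM header text on stdin and succeed if an
# @PG line records a "bwa mem" command line containing a standalone -M.
uses_bwa_mem_M() {
  grep '^@PG' | grep 'bwa mem' | grep -Eq '[[:space:]]-M([[:space:]]|$)'
}
```

A possible use is `samtools view -H HG00096.bam | uses_bwa_mem_M && echo "pass -M to sv2"`. A negative result is not conclusive, since the header may simply lack the command line.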

Merging Divergent Breakpoints

-merge        Merge SV after genotyping

SV2 can merge breakpoints that reciprocally overlap by at least 80% (the default). Merging is repeated until no more SVs can be merged. The SV position with the maximum ALT genotype likelihood is retained.

By default SV2 does not merge breakpoints.

Minimum Reciprocal Overlap for Merging

-min-ovr    FLOAT    Minimum reciprocal overlap for merging SVs [0.8]

Users can define the minimum reciprocal overlap required for merging SVs after genotyping. The -merge flag is not required if the -min-ovr option is passed.
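To make the threshold concrete, the sketch below computes reciprocal overlap in the usual sense: the shared length divided by each SV's length, taking the smaller of the two fractions. It illustrates the general definition, assuming END is greater than START; SV2's own implementation may treat coordinates differently. The `reciprocal_overlap` function is hypothetical.

```shell
# Hypothetical helper: reciprocal overlap of [s1, e1) and [s2, e2),
# printed as a fraction with two decimals.
reciprocal_overlap() {
  awk -v s1="$1" -v e1="$2" -v s2="$3" -v e2="$4" 'BEGIN {
    start = (s1 > s2) ? s1 : s2          # later of the two starts
    end   = (e1 < e2) ? e1 : e2          # earlier of the two ends
    ovr   = end - start
    if (ovr < 0) ovr = 0                 # disjoint intervals
    f1 = ovr / (e1 - s1)                 # fraction of the first SV covered
    f2 = ovr / (e2 - s2)                 # fraction of the second SV covered
    printf "%.2f\n", (f1 < f2) ? f1 : f2
  }'
}

reciprocal_overlap 1000 2000 1100 2100   # prints 0.90
reciprocal_overlap 1000 2000 1500 3000   # prints 0.33
```

Under the default of 0.8, the first pair would qualify for merging and the second would not.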

Skip Preprocessing

-pre    PATH    Preprocessing output directory. Skips preprocessing

Users can skip preprocessing by passing the path of the sv2_preprocessing/ directory to the -pre argument. SV2 then loads the values stored in sv2_preprocessing/ instead of recomputing them. This is useful for genotyping a different set of variants in previously processed samples.

Skip Feature Extraction

-feats    PATH    Feature output directory. Skips feature extraction

Passing the path of the sv2_features/ directory to the -feats argument skips feature extraction. This is useful for users who wish to generate a genotype matrix containing multiple samples. An example of skipping feature extraction.


Classifier Arguments

Load a New Classifier

-load-clf    PATH    Add custom classifiers. `-load-clf <clf.JSON>`

SV2 can incorporate new classifiers for genotyping. Packaged with SV2 is a guide on training new classifiers. The output of this guide is a JSON file containing paths to the new classifier.

Pass the JSON file to the -load-clf argument to add more classifiers to SV2. More details are located in the Training section of the User Guide.

Genotype with a New Classifier

-clf    STR    Specify classifiers for genotyping [default]

After loading a new classifier, specify the name of the classifier in the -clf argument to genotype variants with that classifier. The original classifier from SV2 is named default, and this is the default classifier.


Config Arguments

Before genotyping, users have to supply the full path to FASTA files for SV2. At least one FASTA file is required for SV2 to run. Configuration needs to be run only once, and again only if the FASTA paths change.

hg19 FASTA

-hg19    PATH    hg19 FASTA file

-hg19 takes the full path to a faidx indexed FASTA file for the hg19 (GRCh37) reference build.

hg38 FASTA

-hg38    PATH    hg38 FASTA file

-hg38 takes the full path to a faidx indexed FASTA file for the hg38 (GRCh38) reference build.

mm10 FASTA

-mm10    PATH    mm10 FASTA file

-mm10 takes the full path to a faidx indexed FASTA file for the mm10 reference build.
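One way to prepare a FASTA file for configuration is sketched below. `samtools faidx` is the standard indexing command; the `abs_path` and `configure_hg19` helpers are hypothetical, and calling `sv2 -hg19 PATH` on its own as a one-off configuration step is an assumption based on the description above.

```shell
# Hypothetical helper: turn a possibly relative path into a full path.
abs_path() {
  printf '%s/%s\n' "$(cd "$(dirname "$1")" && pwd)" "$(basename "$1")"
}

# Hypothetical helper: index the FASTA, then register its full path.
configure_hg19() {
  fasta=$(abs_path "$1")
  samtools faidx "$fasta"     # creates "$fasta.fai"
  sv2 -hg19 "$fasta"          # assumed one-off configuration call
}
```

The same pattern would apply to -hg38 and -mm10 with the matching FASTA files.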


Optional Arguments

Temporary Directory

-T, -tmp-dir   PATH    directory for temporary files

SV2 generates temporary files, which are placed in $WORKING_DIR/sv2_tmp/ by default.

Random Seed

-s, -seed    INT    Random seed for genome shuffling in preprocessing [42]

During preprocessing, SV2 randomly selects reads from each chromosome to generate basic alignment statistics. The default random seed is 42.

Output

-o, -out    STR    Output name

Prefix for the output files in sv2_genotypes/

