Options

Danny Antaki edited this page Oct 26, 2017 · 33 revisions


Help

-h | --help         show this help message and exit
$ sv2 --help

Support Vector Structural Variation Genotyper
Version 1.3.2    Author: Danny Antaki <[email protected]>    github.com/dantaki/SV2

optional arguments:
  -h, --help            show this help message and exit

input arguments:
  -i I, -in I           Tab or space delimited input [ID, BAM-PATH, VCF-PATH, M/F]
  -b [B [B ...]], -bed [B [B ...]]
                        BED file(s)
  -v [V [V ...]], -vcf [V [V ...]]
                        VCF file(s)

genotype arguments:
  -c C, -cpu C          Parallelize sample-wise. 1 per cpu
  -g G, -genome G       Reference genome build [hg19, hg38]
  -pcrfree              GC content normalization for PCR free chemistries
  -M                    bwa mem -M compatibility. Split-reads flagged as secondary instead of supplementary
  -s S, -seed S         Preprocessing integer seed for genome shuffling
  -o O, -out O          Output
  -merge                Merge SV after genotyping
  -min-ovr MIN_OVR      Minimum reciprocal overlap for merging SVs [0.8]
  -pre PRE              Preprocessing output directory
  -feats FEATS          Feature extraction output directory

classifier arguments:
  -load-clf LOAD_CLF    Add custom classifiers. -load-clf <clf.JSON>
  -clf CLF              Specify classifiers for genotyping [default]

config arguments:
  -hg19 HG19            hg19 FASTA
  -hg38 HG38            hg38 FASTA

Input Arguments

Sample Information

Required

-i | -in    FILE        Sample information [ID, BAM-PATH, VCF-PATH, GENDER]

The sample information file is required by SV2.

The sample information file must be either tab or space delimited and must contain four columns. Each line contains one sample to be genotyped by SV2.
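The four-column layout described above can be checked before launching a run. The following Python sketch is illustrative only: `read_sample_info` and the example paths are hypothetical and not part of SV2, and the accepted codes M and F are an assumption taken from the help text.

```python
def read_sample_info(lines):
    """Parse sample information lines into (id, bam, vcf, sex) tuples.

    Hypothetical helper for illustration; not part of SV2.
    """
    samples = []
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split()  # splits on tabs or spaces
        if len(fields) != 4:
            raise ValueError(f"line {n}: expected 4 columns, found {len(fields)}")
        sample_id, bam, vcf, sex = fields
        # Assumption: only M and F are accepted, as in the help text
        if sex not in ("M", "F"):
            raise ValueError(f"line {n}: expected M or F, found {sex!r}")
        samples.append((sample_id, bam, vcf, sex))
    return samples


# Placeholder sample IDs and paths, one tab delimited and one space delimited
example = [
    "sample1\t/path/to/sample1.bam\t/path/to/sample1.vcf.gz\tF",
    "sample2 /path/to/sample2.bam /path/to/sample2.vcf.gz M",
]
for sample in read_sample_info(example):
    print(sample)
```

Because the file may be space delimited, paths containing spaces would be split into extra columns and rejected by this check.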

Multiple samples can be run in parallel with the -c | -cpu argument.

SV files

SV2 can take multiple files containing SV predictions as input. BED and VCF files are supported.

BED

-b | -bed    ...        BED file(s) of SVs

One or more BED files can be passed to SV2, separated by spaces.

$ sv2 -i in.txt -b del.bed dup.bed 

BED files are either space or tab delimited, formatted as CHROM START END SVTYPE.
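A minimal Python sketch of reading one line in the CHROM START END SVTYPE layout follows. `parse_bed_line` is a hypothetical helper, not part of SV2, and the SVTYPE labels DEL and DUP in the example are assumptions; the linked format details list the accepted values.

```python
def parse_bed_line(line):
    """Split one BED line into (chrom, start, end, svtype).

    Hypothetical helper for illustration; not part of SV2.
    """
    fields = line.split()  # accepts tabs or spaces
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, found {len(fields)}")
    chrom, start, end, svtype = fields[:4]
    start, end = int(start), int(end)
    # Sanity check (an assumption, not a documented SV2 rule)
    if end <= start:
        raise ValueError("END must be greater than START")
    return chrom, start, end, svtype


# DEL and DUP are assumed SVTYPE labels used here for illustration
for line in ["chr1\t10000\t20000\tDEL", "chr2 500000 750000 DUP"]:
    print(parse_bed_line(line))
```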

Details on the required BED format

VCF

-v | -vcf    ...        VCF file(s) of SVs

One or more VCF files can be passed to SV2, separated by spaces.

VCF files are tab delimited. Both END= and SVTYPE= are required in the INFO column.
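To illustrate the INFO requirement, the Python sketch below extracts END and SVTYPE from a single VCF record and fails if either is missing. `info_end_and_svtype` is a hypothetical helper, not SV2's parser, and the example record is made up.

```python
def info_end_and_svtype(vcf_line):
    """Return (END, SVTYPE) from the INFO column of one VCF record.

    Hypothetical helper for illustration; not part of SV2.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF record needs at least 8 tab-delimited columns")
    info = {}
    for entry in fields[7].split(";"):
        # Flag entries without '=' (for example IMPRECISE) get an empty value
        key, _, value = entry.partition("=")
        info[key] = value
    missing = [key for key in ("END", "SVTYPE") if key not in info]
    if missing:
        raise ValueError("INFO is missing: " + ", ".join(missing))
    return int(info["END"]), info["SVTYPE"]


# Made-up record for illustration
record = "1\t10000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=20000"
print(info_end_and_svtype(record))
```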

Details on the required VCF format


Genotype Arguments

Parallelization

-c | -cpu    INT        Parallelize sample-wise: 1 per CPU [1]

Given more than one sample in the sample information input, SV2 can perform preprocessing and feature extraction in parallel. Each subprocess operates on one sample, and the number of subprocesses is limited by the number of available CPU cores.

By default SV2 will run each sample serially. Note that SV2 does not parallelize across chromosomes.

Reference Genome

-g | -genome    STR        Reference genome build [hg19, hg38]. Default: hg19

Accepted reference genome builds for SV2 are hg19 (GRCh37) or hg38 (GRCh38). Accepted command line argument strings are either hg19 or hg38.

PCRfree Libraries

-pcrfree        GC content normalization for PCRfree libraries

SV2 performs a GC content normalization for coverage estimates adapted from CNVnator. Supply this flag if the samples in the sample information list were sequenced with PCRfree chemistries.

By default this flag is off and SV2 assumes samples were sequenced with PCR protocols.
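For readers unfamiliar with GC normalization, the Python sketch below shows the general idea behind the CNVnator-style correction: each bin's depth is scaled by the ratio of the global mean depth to the mean depth of bins with the same GC content. This is not SV2's implementation; the function name, the use of integer GC percentages as keys, and the toy values are all assumptions, and the adjustment applied for PCRfree libraries is not shown.

```python
from collections import defaultdict
from statistics import mean


def gc_correct(depths, gc_percents):
    """Scale each bin's depth by global_mean / mean_depth_of_same_GC_bins.

    Illustrative sketch only; SV2's own normalization may differ.
    """
    by_gc = defaultdict(list)
    for depth, gc in zip(depths, gc_percents):
        by_gc[gc].append(depth)
    global_mean = mean(depths)
    gc_mean = {gc: mean(values) for gc, values in by_gc.items()}
    corrected = []
    for depth, gc in zip(depths, gc_percents):
        if gc_mean[gc] > 0:
            corrected.append(depth * global_mean / gc_mean[gc])
        else:
            corrected.append(0.0)  # every bin at this GC has zero depth
    return corrected


# Toy values: low-GC bins are under-covered, high-GC bins over-covered
depths = [20, 22, 40, 38]
gc_percents = [30, 30, 60, 60]
print(gc_correct(depths, gc_percents))
```

After correction, the mean depth within each GC group equals the global mean, which removes the GC-dependent trend while keeping bin-to-bin differences within a group.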

bwa mem -M compatibility

-M        bwa mem -M compatibility. Split-reads flagged as secondary instead of supplementary 

SV2 can accommodate legacy alignments with chimeric reads flagged as secondary. Pass the -M flag if samples in the sample information file were aligned with bwa mem -M.

By default SV2 assumes chimeric reads are flagged as supplementary (-M is off).
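The difference between the two conventions comes down to which SAM FLAG bit marks the extra segments of a chimeric alignment: 0x800 (supplementary) by default, or 0x100 (secondary) when reads were aligned with bwa mem -M. The Python sketch below illustrates the check; `is_split_segment` is a hypothetical name, and this is not how SV2 itself is written.

```python
SECONDARY = 0x100      # 256: secondary alignment (SAM specification)
SUPPLEMENTARY = 0x800  # 2048: supplementary alignment (SAM specification)


def is_split_segment(flag, bwa_mem_M=False):
    """Return True if a SAM FLAG marks a split-read segment.

    Hypothetical helper for illustration; not part of SV2.
    With bwa_mem_M=True the secondary bit is tested instead of
    the supplementary bit.
    """
    bit = SECONDARY if bwa_mem_M else SUPPLEMENTARY
    return bool(flag & bit)


# 2113 = paired + first in pair + supplementary
print(is_split_segment(2113))
# 321 = paired + first in pair + secondary
print(is_split_segment(321))
print(is_split_segment(321, bwa_mem_M=True))
```

Note that the secondary bit can also mark ordinary multi-mapping alignments, so a real implementation would need further evidence that a secondary record belongs to a split read.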

Random Seed

-s | -seed    INT        Random seed for genome shuffling in preprocessing [42]

During preprocessing, SV2 randomly selects reads from each chromosome to generate basic alignment statistics. The random seed defaults to 42; pass a different integer to -s to change it.
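Fixing the seed makes the random selection repeatable between runs. The Python sketch below demonstrates the principle with the standard library; `pick_reads` and the read names are hypothetical, and SV2's actual sampling procedure is not shown here.

```python
import random


def pick_reads(read_names, k, seed=42):
    """Draw k reads at random, reproducibly for a given seed.

    Hypothetical helper for illustration; not part of SV2.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.sample(read_names, k)


reads = [f"read{i}" for i in range(100)]
first = pick_reads(reads, 5)
second = pick_reads(reads, 5)
print(first == second)  # same seed, same selection
```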

Output

-o | -out    STR        Output name

Prefix for the output files written to sv2_genotypes/.

Merging Divergent Breakpoints

-merge        Merge SV after genotyping

SV2 can merge SVs whose breakpoints reciprocally overlap by at least 80% (the default threshold). Merging is repeated until no more SVs can be merged. The SV position with the maximum ALT genotype likelihood is retained.

By default SV2 does not merge breakpoints.
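Reciprocal overlap means the shared region must cover the required fraction of both SVs, not just one. The Python sketch below computes it and applies a simple greedy merge that keeps the call with the higher ALT likelihood. The function names, the pairwise greedy strategy, and the toy likelihood values are assumptions made for illustration; SV2's own merging procedure may group calls differently.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for (start, end) intervals."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def merge_svs(svs, min_ovr=0.8):
    """Greedily merge (start, end, alt_likelihood) calls.

    Assumes all calls share a chromosome and SV type. Hypothetical
    sketch; not SV2's implementation.
    """
    kept = list(svs)
    merged = True
    while merged:  # repeat until a full pass finds nothing to merge
        merged = False
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                if reciprocal_overlap(kept[i][:2], kept[j][:2]) >= min_ovr:
                    # Drop the call with the lower ALT likelihood
                    loser = j if kept[i][2] >= kept[j][2] else i
                    del kept[loser]
                    merged = True
                    break
            if merged:
                break
    return kept


calls = [(1000, 2000, 0.90), (1050, 2050, 0.95), (5000, 6000, 0.80)]
print(merge_svs(calls))
```

In this toy input the first two calls overlap reciprocally by 95%, so only the one with the higher likelihood survives; the third call is untouched.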

Minimum Reciprocal Overlap for Merging

-min-ovr    FLOAT        Minimum reciprocal overlap for merging SVs [0.8]

Users can define the minimum reciprocal overlap required for merging SVs after genotyping. The -merge flag is not required if the -min-ovr option is passed.

Skip Preprocessing

-pre    PATH        Preprocessing output directory. Skips preprocessing

Users can skip preprocessing by passing the path of the sv2_preprocessing/ directory to the -pre argument. SV2 then loads the values stored in sv2_preprocessing/ instead of recomputing them. This is useful for genotyping a different set of variants in previously processed samples.

Skip Feature Extraction

-feats    PATH        Feature output directory. Skips feature extraction

Passing the path of the sv2_features/ directory to the -feats argument skips feature extraction. This is useful for users who wish to generate a genotype matrix containing multiple samples. An example of skipping feature extraction.


Classifier Arguments

Load a New Classifier

-load-clf    PATH        Add custom classifiers. `-load-clf <clf.JSON>`

SV2 can incorporate new classifiers for genotyping. Packaged with SV2 is a guide on training new classifiers. The output of this guide is a JSON file containing paths to the new classifier.

Pass the JSON file to the -load-clf argument to add more classifiers to SV2. More details are located in the Training section of the User Guide.

Genotype with a New Classifier

-clf    STR        Specify classifiers for genotyping [default]

After loading a new classifier, specify the name of the classifier in the -clf argument to genotype variants with that classifier. The original classifier from SV2 is named default, and this is the default classifier.


Config Arguments

Before genotyping, users must supply the full path to the FASTA files for SV2. At least one FASTA file is required for SV2 to run. Configuration needs to be run only once, and again only if the FASTA paths change.

hg19 FASTA

-hg19    PATH        hg19 FASTA file

-hg19 takes the full path to a faidx-indexed FASTA file for the hg19 (GRCh37) reference build.

hg38 FASTA

-hg38    PATH        hg38 FASTA file

-hg38 takes the full path to a faidx-indexed FASTA file for the hg38 (GRCh38) reference build.
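Since both config arguments expect a faidx-indexed FASTA, it can help to confirm the index exists before configuring SV2. samtools faidx writes the index next to the FASTA with a .fai suffix, which the Python sketch below checks for. `has_faidx_index` and the example path are hypothetical, and the additional .gzi index used for bgzip-compressed FASTA files is not checked.

```python
import os


def has_faidx_index(fasta_path):
    """Return True if the FASTA and its <fasta>.fai index both exist.

    Hypothetical helper for illustration; not part of SV2.
    """
    return os.path.isfile(fasta_path) and os.path.isfile(fasta_path + ".fai")


# Placeholder path; replace with the full path to your reference
fasta = "/path/to/hg19.fa"
if not has_faidx_index(fasta):
    print(f"No index found for {fasta}; run: samtools faidx {fasta}")
```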
