Customized protein database construction

Introduction

Customprodbj is a Java-based tool for customized protein database construction:

Build a customized database based on single VCF file.
Build a customized database based on multiple VCF files from a sample.
Build a customized database based on multiple VCF files from multiple samples.

Usage

Please download Customprodbj program from the release page: https://github.com/wenbostar/customprodbj/releases

java -jar customprodbj.jar

 -d      mRNA fasta database
 -f      A file which includes multiple samples. This parameter is used to
         build a customized database for
 -h      Help
 -i      Annovar annotation result file. Multiple files are separated by
         ','.
 -o      Output folder
 -p1     The prefix of variant protein ID, default is VAR_
 -p2     The prefix of final output files, default is merge
 -r      Gene annotation data
 -ref    Output reference protein database file
 -t      Whether or not to add reference protein sequences to the output
         database file
 -v      Verbose

Example

Build a customized database based on single VCF file

Step 1: Variant annotation using ANNOVAR:

perl table_annovar.pl test.vcf annovar_database/humandb/ -buildver hg19 -out out/test -protocol refGene -operation g -nastring . -vcfinput --thread 30 --maxgenethread 30 -polish

Step 2: Build customized protein database using Customprodbj:

java -jar customprodbj.jar -i test.hg19_multianno.txt -d annovar_database/humandb/hg19_refGeneMrna.fa -r annovar_database/humandb/hg19_refGene.txt -t -o out/

The input file "test.hg19_multianno.txt" is from ANNOVAR annotation result. The input files "annovar_database/humandb/hg19_refGeneMrna.fa" and "annovar_database/humandb/hg19_refGene.txt" are two files used by ANNOVAR. Before uses do variant annotation using ANNOVAR, users need to download these files using ANNOVAR. Please follow the instruction described here: http://annovar.openbioinformatics.org/en/latest/user-guide/download/.

Build a customized database based on multiple VCF files from a sample

Step 1: Variant annotation using ANNOVAR:

Perform variant annotation for each VCF file using the same method described in above section.

Step 2: Build customized protein database using Customprodbj:

java -jar customprodbj.jar -f input_variant_file_list.txt -d annovar_database/humandb/hg19_refGeneMrna.fa -r annovar_database/humandb/hg19_refGene.txt -t -o out/

The format of input_variant_file_list.txt looks like below:

sample	somatic	germline	rna	msi
sample1	s1a.hg19_multianno.txt	s1b.hg19_multianno.txt	s1c.hg19_multianno.txt	s1d.hg19_multianno.txt

Please note the columns are separated by "\t".

Build a customized database based on multiple VCF files from multiple samples

Step 1: Variant annotation using ANNOVAR:

Perform variant annotation for each VCF file using the same method described in above section.

Step 2: Build customized protein database using Customprodbj:

java -jar customprodbj.jar -f input_variant_file_list.txt -d annovar_database/humandb/hg19_refGeneMrna.fa -r annovar_database/humandb/hg19_refGene.txt -t -o out/

The format of input_variant_file_list.txt looks like below:

sample	somatic	germline	rna	msi
sample1	s1a.hg19_multianno.txt	s1b.hg19_multianno.txt	s1c.hg19_multianno.txt	s1d.hg19_multianno.txt
sample2	s2a.hg19_multianno.txt	s2b.hg19_multianno.txt	s2c.hg19_multianno.txt	s2d.hg19_multianno.txt
sample3	s3a.hg19_multianno.txt	s3b.hg19_multianno.txt	s3c.hg19_multianno.txt	s3d.hg19_multianno.txt

Ouput

The final outputs consist of three files:

*-varInfo.txt: A table contains the detailed amino acid change information. Each row is a variant.

*-var.fasta: Customized protein database.

*-varStat.txt: Summary data.

How to cite:

Wen, B., Li, K., Zhang, Y. et al. Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis. Nature Communications 11, 1759 (2020). https://doi.org/10.1038/s41467-020-15456-w

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
src/main/java		src/main/java
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Customized protein database construction

Introduction

Usage

Example

Build a customized database based on single VCF file

Step 1: Variant annotation using ANNOVAR:

Step 2: Build customized protein database using Customprodbj:

Build a customized database based on multiple VCF files from a sample

Step 1: Variant annotation using ANNOVAR:

Step 2: Build customized protein database using Customprodbj:

Build a customized database based on multiple VCF files from multiple samples

Step 1: Variant annotation using ANNOVAR:

Step 2: Build customized protein database using Customprodbj:

Ouput

How to cite:

About

Releases 3

Packages

Languages

License

wenbostar/Customprodbj

Folders and files

Latest commit

History

Repository files navigation

Customized protein database construction

Introduction

Usage

Example

Build a customized database based on single VCF file

Step 1: Variant annotation using ANNOVAR:

Step 2: Build customized protein database using Customprodbj:

Build a customized database based on multiple VCF files from a sample

Step 1: Variant annotation using ANNOVAR:

Step 2: Build customized protein database using Customprodbj:

Build a customized database based on multiple VCF files from multiple samples

Step 1: Variant annotation using ANNOVAR:

Step 2: Build customized protein database using Customprodbj:

Ouput

How to cite:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 3

Packages 0

Languages

Packages