How to prepare a protein database

Proteomics pipelines and toolkits like Philosopher rely on properly formatted protein sequence databases to correctly identify peptides. Here are some tips on how to prepare a protein database for your experiment.

If you do not have a protein sequence database: `--id`

Run Philosopher from the command line to download one from UniProt by executing the following two commands:

philosopher workspace --init
philosopher database --reviewed --contam --id UP000005640

This generates a human UniProt/Swiss-Prot database (reviewed sequences only) with common contaminants and decoys added; decoys use the default prefix rev_. To use the full UniProt proteome instead, remove the --reviewed flag.

For mouse, for example, use the proteome ID UP000000589. To find the proteome ID for another organism, search the UniProt Proteomes listing.
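Once the database has been built, it can be useful to confirm that decoys were actually added. The following is a minimal sketch, not part of Philosopher: `count_entries` is a hypothetical helper, the FASTA path is whatever file the command produced in your workspace, and the prefix defaults to the rev_ mentioned above.

```shell
# Count total, decoy, and target entries in a FASTA file.
# Hypothetical helper for a quick sanity check; not a Philosopher command.
# Usage: count_entries <fasta_file> [decoy_prefix]
count_entries() (
    fasta="$1"
    prefix="${2:-rev_}"   # assumes the default decoy prefix unless one is given

    if [ ! -f "$fasta" ]; then
        echo "file not found: $fasta" >&2
        exit 1
    fi

    # grep -c exits non-zero when the count is 0, so keep going in that case.
    total=$(grep -c '^>' "$fasta" || true)
    decoys=$(grep -c -- "^>${prefix}" "$fasta" || true)

    echo "total=${total} decoys=${decoys} targets=$((total - decoys))"
)
```

For a database with decoys added, the decoy count would normally be about half of the total. A decoy count of zero suggests that the wrong prefix was passed or that decoys are missing.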

If you have your own database, without decoys and contaminants: `--custom`

Add decoys and contaminants and format it for FragPipe using the following commands:

philosopher workspace --init 
philosopher database --custom <file_name> --contam

If you have your own database, with decoys and contaminants: `--annotate`

Reformat it for FragPipe using the following commands:

philosopher workspace --init 
philosopher database --annotate <file_name> --prefix <prefix>

Header Formatting

If you use the --custom or --annotate option, inspect the formatted file manually to make sure it is compatible with Philosopher. Its headers should follow one of these formats:

UniProt
NCBI
ENSEMBL
Generic

Generic means a header that contains only the protein ID: >proteinID

If you are adding your own decoys, they also need to follow a specific format: each decoy must be a complete protein entry in the FASTA file, with a decoy prefix (e.g. rev_ or DECOY_) added at the very beginning of the header.

Examples of compatible formats:

>rev_tr|J3KNE0|J3KNE0_HUMAN
>DECOY_tr|J3KNE0|J3KNE0_HUMAN

Examples of incompatible formats:

>tr_REVERSED|J3KNE0|J3KNE0_HUMAN
>tr|fake_J3KNE0|J3KNE0_HUMAN RanBP2-like
>tr|J3KNE0_DECOY|J3KNE0_HUMAN
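The incompatible examples above share one problem: the decoy tag is somewhere other than the start of the header. A rough way to look for such entries in your own file is sketched below. `misplaced_tag` is a hypothetical helper, not a Philosopher command, and it only finds the exact, case-sensitive tag you give it.

```shell
# Print FASTA headers that contain a decoy tag anywhere except directly after '>'.
# Hypothetical helper; the tag is matched as a literal, case-sensitive string.
# Usage: misplaced_tag <fasta_file> <tag>
misplaced_tag() (
    fasta="$1"
    tag="$2"

    awk -v tag="$tag" '
        # Header line that mentions the tag but does not begin with ">" + tag.
        /^>/ && index($0, tag) > 0 && index($0, ">" tag) != 1 { print }
    ' "$fasta"
)
```

Running `misplaced_tag my_database.fas DECOY` (the file name is a placeholder) on a file containing the examples above would print the `>tr|J3KNE0_DECOY|J3KNE0_HUMAN` header and skip `>DECOY_tr|J3KNE0|J3KNE0_HUMAN`. Empty output means no misplaced copies of that tag were found.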
