protein-quest

Python package to search/retrieve/filter proteins and protein structures.

It uses

An example workflow:

graph TB;
    taxonomy[/Search taxon/] -. taxon_ids .-> searchuniprot[/Search UniprotKB/]
    goterm[/Search GO term/] -. go_ids .-> searchuniprot[/Search UniprotKB/]
    searchuniprot --> |uniprot_accessions|searchpdbe[/Search PDBe/]
    searchuniprot --> |uniprot_accessions|searchaf[/Search Alphafold/]
    searchuniprot -. uniprot_accessions .-> searchemdb[/Search EMDB/]
    searchpdbe -->|pdb_ids|fetchpdbe[Retrieve PDBe]
    searchaf --> |uniprot_accessions|fetchad(Retrieve AlphaFold)
    searchemdb -. emdb_ids .->fetchemdb[Retrieve EMDB]
    fetchpdbe -->|mmcif_files_with_uniprot_acc| chainfilter{Filter on chain of uniprot}
    chainfilter --> |mmcif_files| residuefilter{Filter on chain length}
    fetchad -->|pdb_files| confidencefilter{Filter out low confidence}
    confidencefilter --> |mmcif_files| ssfilter{Filter on secondary structure}
    residuefilter --> |mmcif_files| ssfilter
    classDef dashedBorder stroke-dasharray: 5 5;
    goterm:::dashedBorder
    taxonomy:::dashedBorder
    searchemdb:::dashedBorder
    fetchemdb:::dashedBorder

(Dotted nodes and edges are side-quests.)

Install

pip install protein-quest

Or to use the latest development version:

pip install git+https://github.com/haddocking/protein-quest.git

Usage

The main entry point is the protein-quest command line tool which has multiple subcommands to perform actions.

To use it programmatically, see the Jupyter notebooks and API documentation.

Search Uniprot accessions

protein-quest search uniprot \
    --taxon-id 9606 \
    --reviewed \
    --subcellular-location-uniprot nucleus \
    --subcellular-location-go GO:0005634 \
    --molecular-function-go GO:0003677 \
    --limit 100 \
    uniprot_accs.txt

(GO:0005634 is "Nucleus" and GO:0003677 is "DNA binding")
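
When driving the search from a Python script, one option is to assemble the same command line as an argument list and pass it to `subprocess.run`. The sketch below only builds and prints that list. The helper name `build_search_uniprot_cmd` is hypothetical and not part of protein-quest; it uses only the flags shown in the command above.

```python
from __future__ import annotations

import shlex


def build_search_uniprot_cmd(
    output: str,
    taxon_id: str | None = None,
    reviewed: bool = False,
    subcellular_location_uniprot: str | None = None,
    subcellular_location_go: str | None = None,
    molecular_function_go: str | None = None,
    limit: int | None = None,
) -> list[str]:
    """Hypothetical helper: build the argv list for `protein-quest search uniprot`."""
    cmd = ["protein-quest", "search", "uniprot"]
    if taxon_id is not None:
        cmd += ["--taxon-id", taxon_id]
    if reviewed:
        cmd.append("--reviewed")
    if subcellular_location_uniprot is not None:
        cmd += ["--subcellular-location-uniprot", subcellular_location_uniprot]
    if subcellular_location_go is not None:
        cmd += ["--subcellular-location-go", subcellular_location_go]
    if molecular_function_go is not None:
        cmd += ["--molecular-function-go", molecular_function_go]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    # The output file is the final positional argument.
    cmd.append(output)
    return cmd


cmd = build_search_uniprot_cmd(
    "uniprot_accs.txt",
    taxon_id="9606",
    reviewed=True,
    subcellular_location_uniprot="nucleus",
    subcellular_location_go="GO:0005634",
    molecular_function_go="GO:0003677",
    limit=100,
)
# Print the command for inspection; to execute it, use
# subprocess.run(cmd, check=True) with protein-quest installed.
print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell quoting problems if a value contains spaces.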

Search for PDBe structures of uniprot accessions

protein-quest search pdbe uniprot_accs.txt pdbe.csv

A pdbe.csv file is written containing the PDB ID and chain of each UniProt accession.
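
To inspect the search result before downloading, the CSV can be grouped per accession with the standard library. This is a minimal sketch under assumptions: the column names `uniprot_accession`, `pdb_id` and `chain` are hypothetical (check the header of your own pdbe.csv), and the rows are made-up placeholders, not real search results.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows; the real pdbe.csv header and columns may differ.
SAMPLE_CSV = """uniprot_accession,pdb_id,chain
ACC1,1abc,A
ACC1,2xyz,B
ACC2,3def,A
"""


def group_by_accession(handle):
    """Map each accession to the list of (pdb_id, chain) pairs found for it."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["uniprot_accession"]].append((row["pdb_id"], row["chain"]))
    return dict(grouped)


# For a real file, replace io.StringIO(...) with open("pdbe.csv", newline="").
structures = group_by_accession(io.StringIO(SAMPLE_CSV))
for accession, entries in structures.items():
    print(accession, len(entries), "structure(s)")
```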

Search for Alphafold structures of uniprot accessions

protein-quest search alphafold uniprot_accs.txt alphafold.csv

Search for EMDB structures of uniprot accessions

protein-quest search emdb uniprot_accs.txt emdbs.csv

To retrieve PDB structure files

protein-quest retrieve pdbe pdbe.csv downloads-pdbe/

To retrieve AlphaFold structure files

protein-quest retrieve alphafold alphafold.csv downloads-af/

For each entry, this downloads the summary.json and cif file.

To retrieve EMDB volume files

protein-quest retrieve emdb emdbs.csv downloads-emdb/

To filter AlphaFold structures on confidence

Filter AlphaFoldDB structures based on confidence (pLDDT). Keeps entries in which the number of residues with a confidence score above the threshold falls within the requested range. Also writes pdb files containing only those residues.

protein-quest filter confidence \
    --confidence-threshold 50 \
    --min-residues 100 \
    --max-residues 1000 \
    ./downloads-af ./filtered
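
The selection rule described above can be sketched in a few lines of Python. This is an illustration of the idea only, not protein-quest's implementation: the function name `filter_on_confidence` is hypothetical, the pLDDT values are invented, and treating "above the threshold" as a strict comparison is an assumption.

```python
def filter_on_confidence(plddt_per_residue, confidence_threshold, min_residues, max_residues):
    """Return indices of confident residues, or None if the entry is rejected.

    Assumption: a residue counts as confident when its pLDDT is strictly
    greater than the threshold.
    """
    confident = [
        index
        for index, score in enumerate(plddt_per_residue)
        if score > confidence_threshold
    ]
    # Keep the entry only if the confident residue count is within range.
    if min_residues <= len(confident) <= max_residues:
        return confident
    return None


# Invented per-residue pLDDT scores for a five-residue toy structure.
scores = [90.0, 45.0, 72.5, 30.0, 88.1]
kept = filter_on_confidence(scores, confidence_threshold=50, min_residues=2, max_residues=10)
rejected = filter_on_confidence(scores, confidence_threshold=50, min_residues=4, max_residues=10)
print(kept, rejected)
```

Note that the residue limits apply to the confident residues, not to the full chain, so raising the threshold can cause an entry to drop below the minimum.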

To filter PDBe files on chain of uniprot accession

Make PDBe files smaller by keeping only the first chain of the found UniProt entry and renaming it to chain A.

protein-quest filter chain \
    pdbe.csv \
    ./downloads-pdbe ./filtered-chains
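
Conceptually, this step selects the atoms of one chain and relabels them. The sketch below shows that idea on a simplified in-memory record; the `Atom` type and `keep_chain_as_a` function are hypothetical and do not parse mmCIF or reflect how protein-quest does it internally.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Simplified, hypothetical atom record (real mmCIF has many more fields)."""

    chain: str
    residue_number: int
    name: str


def keep_chain_as_a(atoms, chain_to_keep):
    """Drop atoms from other chains and rename the kept chain to 'A'."""
    return [
        atom._replace(chain="A")
        for atom in atoms
        if atom.chain == chain_to_keep
    ]


atoms = [
    Atom("B", 1, "CA"),
    Atom("C", 1, "CA"),
    Atom("B", 2, "CA"),
]
only_b = keep_chain_as_a(atoms, "B")
print(only_b)
```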

To filter PDBe files on number of residues

protein-quest filter residue \
    --min-residues 100 \
    --max-residues 1000 \
    ./filtered-chains ./filtered

To filter on secondary structure

Keep only structures that are mostly alpha helices and have no beta sheets.

protein-quest filter secondary-structure \
    --ratio-min-helix-residues 0.5 \
    --ratio-max-sheet-residues 0.0 \
    --write-stats filtered-ss/stats.csv \
    ./filtered-chains ./filtered-ss
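
The two ratio options can be read as bounds on the fraction of residues in each secondary structure class. A minimal sketch of that check follows; the function name `passes_secondary_structure` is hypothetical, the residue counts are invented, and whether protein-quest treats the bounds as inclusive is an assumption.

```python
def passes_secondary_structure(
    nr_residues,
    nr_helix_residues,
    nr_sheet_residues,
    ratio_min_helix=None,
    ratio_max_sheet=None,
):
    """Check helix and sheet fractions against optional bounds (assumed inclusive)."""
    if nr_residues == 0:
        return False
    helix_ratio = nr_helix_residues / nr_residues
    sheet_ratio = nr_sheet_residues / nr_residues
    if ratio_min_helix is not None and helix_ratio < ratio_min_helix:
        return False
    if ratio_max_sheet is not None and sheet_ratio > ratio_max_sheet:
        return False
    return True


# Invented counts: 100 residues, 60 in helices, none in sheets.
mostly_helix = passes_secondary_structure(100, 60, 0, ratio_min_helix=0.5, ratio_max_sheet=0.0)
# A single sheet residue already exceeds a maximum sheet ratio of 0.0.
has_sheet = passes_secondary_structure(100, 60, 1, ratio_min_helix=0.5, ratio_max_sheet=0.0)
print(mostly_helix, has_sheet)
```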

Search Taxonomy

protein-quest search taxonomy "Homo sapiens" -

Search Gene Ontology (GO)

You might not know the identifier of a Gene Ontology (GO) term that protein-quest search uniprot expects. You can use the following command to search for a GO term by name.

protein-quest search go --limit 5 --aspect cellular_component apoptosome -

Model Context Protocol (MCP) server

Protein quest can also help LLMs like Claude Sonnet 4 by providing a set of tools for protein structures.

To run the MCP server, install the mcp extra with:

pip install protein-quest[mcp]

The server can be started with:

protein-quest mcp

The MCP server contains a prompt template to search/retrieve/filter candidate structures.

Contributing

For development information and contribution guidelines, please see CONTRIBUTING.md.
