protein-quest

Python package to search/retrieve/filter proteins and protein structures.

It uses

Uniprot Sparql endpoint to search for proteins and their measured or predicted 3D structures.
Uniprot taxonomy to search for taxa.
QuickGO to search for Gene Ontology terms.
gemmi to work with macromolecular models.
dask-distributed to compute in parallel.

An example workflow:

graph TB;
    taxonomy[/Search taxon/] -. taxon_ids .-> searchuniprot[/Search UniprotKB/]
    goterm[/Search GO term/] -. go_ids .-> searchuniprot[/Search UniprotKB/]
    searchuniprot --> |uniprot_accessions|searchpdbe[/Search PDBe/]
    searchuniprot --> |uniprot_accessions|searchaf[/Search Alphafold/]
    searchuniprot -. uniprot_accessions .-> searchemdb[/Search EMDB/]
    searchpdbe -->|pdb_ids|fetchpdbe[Retrieve PDBe]
    searchaf --> |uniprot_accessions|fetchad(Retrieve AlphaFold)
    searchemdb -. emdb_ids .->fetchemdb[Retrieve EMDB]
    fetchpdbe -->|mmcif_files_with_uniprot_acc| chainfilter{Filter on chain of uniprot}
    chainfilter --> |mmcif_files| residuefilter{Filter on chain length}
    fetchad -->|pdb_files| confidencefilter{Filter out low confidence}
    confidencefilter --> |mmcif_files| ssfilter{Filter on secondary structure}
    residuefilter --> |mmcif_files| ssfilter
    classDef dashedBorder stroke-dasharray: 5 5;
    goterm:::dashedBorder
    taxonomy:::dashedBorder
    searchemdb:::dashedBorder
    fetchemdb:::dashedBorder

(Dotted nodes and edges are side-quests.)

Install

pip install protein-quest

Or to use the latest development version:

pip install git+https://github.com/haddocking/protein-quest.git

Usage

The main entry point is the protein-quest command line tool which has multiple subcommands to perform actions.

To use it programmatically, see the Jupyter notebooks and API documentation.
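
The subcommands described in the sections below can also be chained from a Python script by calling the command line tool. The following is a minimal sketch, not the package's own Python API: it assumes the protein-quest executable is on the PATH, and the helper names `pipeline_commands` and `run_pipeline` as well as the `quest` working directory are hypothetical. It follows the solid path of the example workflow (search Uniprot, search and retrieve PDBe, then the chain, residue, and secondary structure filters) and uses only flags that appear in this README.

```python
import subprocess
from pathlib import Path


def pipeline_commands(workdir: Path) -> list[list[str]]:
    """Build argument lists for the PDBe path of the example workflow."""
    accs = str(workdir / "uniprot_accs.txt")
    pdbe_csv = str(workdir / "pdbe.csv")
    downloads = str(workdir / "downloads-pdbe")
    chains = str(workdir / "filtered-chains")
    filtered = str(workdir / "filtered")
    filtered_ss = str(workdir / "filtered-ss")
    pq = "protein-quest"  # assumption: installed and available on PATH
    return [
        [pq, "search", "uniprot", "--taxon-id", "9606", "--reviewed",
         "--limit", "100", accs],
        [pq, "search", "pdbe", accs, pdbe_csv],
        [pq, "retrieve", "pdbe", pdbe_csv, downloads],
        [pq, "filter", "chain", pdbe_csv, downloads, chains],
        [pq, "filter", "residue", "--min-residues", "100",
         "--max-residues", "1000", chains, filtered],
        # Input is the residue filter output, as drawn in the workflow graph.
        [pq, "filter", "secondary-structure",
         "--ratio-min-helix-residues", "0.5",
         "--ratio-max-sheet-residues", "0.0", filtered, filtered_ss],
    ]


def run_pipeline(workdir: Path) -> None:
    """Run each step in order and stop at the first failing command."""
    workdir.mkdir(parents=True, exist_ok=True)
    for cmd in pipeline_commands(workdir):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(Path("quest"))` would run the steps one after another. Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an exception as soon as a step exits with a non-zero status.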

Search Uniprot accessions

protein-quest search uniprot \
    --taxon-id 9606 \
    --reviewed \
    --subcellular-location-uniprot nucleus \
    --subcellular-location-go GO:0005634 \
    --molecular-function-go GO:0003677 \
    --limit 100 \
    uniprot_accs.txt

(GO:0005634 is "Nucleus" and GO:0003677 is "DNA binding")

Search for PDBe structures of uniprot accessions

protein-quest search pdbe uniprot_accs.txt pdbe.csv

The pdbe.csv file is written, containing the PDB id and chain of each uniprot accession.

Search for AlphaFold structures of uniprot accessions

protein-quest search alphafold uniprot_accs.txt alphafold.csv

Search for EMDB structures of uniprot accessions

protein-quest search emdb uniprot_accs.txt emdbs.csv

To retrieve PDB structure files

protein-quest retrieve pdbe pdbe.csv downloads-pdbe/

To retrieve AlphaFold structure files

protein-quest retrieve alphafold alphafold.csv downloads-af/

For each entry, the summary.json and cif file are downloaded.

To retrieve EMDB volume files

protein-quest retrieve emdb emdbs.csv downloads-emdb/

To filter AlphaFold structures on confidence

Filter AlphaFoldDB structures based on confidence (pLDDT). Keeps entries in which the number of residues with a confidence score above the threshold falls within the requested range. Also writes pdb files containing only those residues.

protein-quest filter confidence \
    --confidence-threshold 50 \
    --min-residues 100 \
    --max-residues 1000 \
    ./downloads-af ./filtered
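
The selection rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's implementation: the names `ConfidenceQuery`, `high_confidence_residues`, and `keep_entry` are hypothetical, the comparison against the threshold is assumed to be strict, and the residue range is assumed to be inclusive. The per-residue pLDDT values are taken as a plain list here; in AlphaFold model files they are stored in the B-factor field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceQuery:
    confidence_threshold: float  # pLDDT cutoff, e.g. 50
    min_residues: int            # fewest high-confidence residues to keep an entry
    max_residues: int            # most high-confidence residues to keep an entry


def high_confidence_residues(plddts: list[float], threshold: float) -> list[int]:
    """Return indices of residues whose pLDDT is above the threshold (assumed strict)."""
    return [i for i, score in enumerate(plddts) if score > threshold]


def keep_entry(plddts: list[float], query: ConfidenceQuery) -> bool:
    """Keep an entry when its high-confidence residue count is within range (assumed inclusive)."""
    count = len(high_confidence_residues(plddts, query.confidence_threshold))
    return query.min_residues <= count <= query.max_residues
```

With the values from the command above, an entry with 120 residues above pLDDT 50 would be kept, while one with only 80 such residues would be dropped.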

To filter PDBe files on chain of uniprot accession

Make PDBe files smaller by keeping only the first chain of the found uniprot entry and renaming it to chain A.

protein-quest filter chain \
    pdbe.csv \
    ./downloads-pdbe ./filtered-chains

To filter PDBe files on number of residues

protein-quest filter residue \
    --min-residues 100 \
    --max-residues 1000 \
    ./filtered-chains ./filtered

To filter on secondary structure

Keep structures that are mostly alpha helices and have no beta sheets.

protein-quest filter secondary-structure \
    --ratio-min-helix-residues 0.5 \
    --ratio-max-sheet-residues 0.0 \
    --write-stats filtered-ss/stats.csv \
    ./filtered-chains ./filtered-ss
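
The two ratio options can be read as bounds on the fraction of residues in helices and in sheets. The sketch below shows that reading in Python; it is an assumption about the semantics, not the package's code. The function name `passes_secondary_structure` is hypothetical, the residue counts are taken as given (how the tool derives them from a structure file is not covered here), and both ratios are assumed to be relative to the total number of residues.

```python
from typing import Optional


def passes_secondary_structure(
    n_residues: int,
    n_helix: int,
    n_sheet: int,
    ratio_min_helix: Optional[float] = None,
    ratio_max_sheet: Optional[float] = None,
) -> bool:
    """Check helix and sheet fractions against optional bounds."""
    if n_residues <= 0:
        return False  # nothing to evaluate in an empty structure
    helix_ratio = n_helix / n_residues
    sheet_ratio = n_sheet / n_residues
    if ratio_min_helix is not None and helix_ratio < ratio_min_helix:
        return False  # too few helix residues
    if ratio_max_sheet is not None and sheet_ratio > ratio_max_sheet:
        return False  # too many sheet residues
    return True
```

With the values from the command above (0.5 and 0.0), a structure of 200 residues with 120 helix residues and no sheet residues passes, while a single sheet residue is enough to reject it.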

Search Taxonomy

protein-quest search taxonomy "Homo sapiens" -

Search Gene Ontology (GO)

When running protein-quest search uniprot, you might not know the identifier of a Gene Ontology (GO) term. Use the following command to search for one.

protein-quest search go --limit 5 --aspect cellular_component apoptosome -

Model Context Protocol (MCP) server

Protein quest can also help LLMs like Claude Sonnet 4 by providing a set of tools for protein structures.

To run the MCP server, first install the mcp extra with:

pip install "protein-quest[mcp]"

The server can be started with:

protein-quest mcp

The MCP server contains a prompt template to search/retrieve/filter candidate structures.

Contributing

For development information and contribution guidelines, please see CONTRIBUTING.md.
