How do we know which additional uniprot metadata features can be fetched by fetch_uniprot_metadata.py? #90

Open
taylorreiter opened this issue Oct 3, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@taylorreiter
Member

Description of feature

I see that the following code is used to fetch UniProt metadata fields.

        python ProteinCartography/fetch_uniprot_metadata.py \
            --input {input} \
            --output {output.uniprot_features} \
            --additional-fields {UNIPROT_ADDITIONAL_FIELDS}

Would you be willing to document how to provide additional fields?

I see in the script that:

REQUIRED_FIELDS_DICT = {
    "Entry": "accession",
    "Entry Name": "id",
    "Protein names": "protein_name",
    "Gene Names (primary)": "gene_primary",
    "Annotation": "annotation_score",
    "Organism": "organism_name",
    "Taxonomic lineage": "lineage",
    "Length": "length",
    "Fragment": "fragment",
    "Sequence": "sequence",
}
OTHER_FIELDS_DICT = {
    "Reviewed": "reviewed",
    "Gene Names": "gene_names",
    "Protein existence": "protein_existence",
    "Sequence version": "sequence_version",
    "RefSeq": "xref_refseq",
    "GeneID": "xref_geneid",
    "EMBL": "xref_embl",
    "AlphaFoldDB": "xref_alphafolddb",
    "PDB": "xref_pdb",
    "Pfam": "xref_pfam",
    "InterPro": "xref_interpro",
}

but I'm not sure:

  • how to provide those fields as part of the command line argument
  • if those are the only fields that are accepted
  • how to check what other fields exist. For example, I want to know whether my protein of interest has a signal peptide. I don't think PC currently retrieves that information, but I'm not totally sure.
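
For the last point, a quick check against the two dictionaries quoted above shows whether a given UniProt field name is already listed. This is only a sketch: the dictionaries are copied from this issue, `is_listed_field` is a hypothetical helper that does not exist in the script, and `ft_signal` is the signal peptide field name given later in this thread.

```python
# Dictionaries copied from the issue text; they map UniProt column names
# to the field names used when requesting them.
REQUIRED_FIELDS_DICT = {
    "Entry": "accession",
    "Entry Name": "id",
    "Protein names": "protein_name",
    "Gene Names (primary)": "gene_primary",
    "Annotation": "annotation_score",
    "Organism": "organism_name",
    "Taxonomic lineage": "lineage",
    "Length": "length",
    "Fragment": "fragment",
    "Sequence": "sequence",
}
OTHER_FIELDS_DICT = {
    "Reviewed": "reviewed",
    "Gene Names": "gene_names",
    "Protein existence": "protein_existence",
    "Sequence version": "sequence_version",
    "RefSeq": "xref_refseq",
    "GeneID": "xref_geneid",
    "EMBL": "xref_embl",
    "AlphaFoldDB": "xref_alphafolddb",
    "PDB": "xref_pdb",
    "Pfam": "xref_pfam",
    "InterPro": "xref_interpro",
}


def is_listed_field(field_name: str) -> bool:
    """Return True if field_name appears in either dictionary (hypothetical helper)."""
    listed = set(REQUIRED_FIELDS_DICT.values()) | set(OTHER_FIELDS_DICT.values())
    return field_name in listed


print(is_listed_field("xref_pfam"))  # listed in OTHER_FIELDS_DICT
print(is_listed_field("ft_signal"))  # signal peptide field, not in either dictionary
```

So a signal peptide field would have to be requested through `--additional-fields` rather than coming from either dictionary.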
@taylorreiter taylorreiter added the enhancement New feature or request label Oct 3, 2024
@braebigge
Contributor

Thanks for pointing this out, Taylor! We'll add more documentation to cover these questions.

In the meantime, all of the available fields can be found here. I tried it out for the signal peptide field, ft_signal, using the following command:

ProteinCartography/fetch_uniprot_metadata.py -i test/proteins.txt -o test/output/uniprot_features.tsv -a ft_signal

To request two or more fields, separate them with commas (no spaces), like this:

ProteinCartography/fetch_uniprot_metadata.py -i test/proteins.txt -o test/output/uniprot_features.tsv -a ft_signal,ft_act_site,ft_transmem

Both of these commands worked for me, so I assume you can use any fields that UniProt has to offer, but let me know if you run into any that give you errors!
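
One way to try a field name before running the pipeline might be to query UniProt directly. The sketch below splits a comma-separated value like the one passed to `-a` and builds a request URL from it. It is not the code in fetch_uniprot_metadata.py: `parse_additional_fields` and `build_uniprot_url` are hypothetical helpers, the UniProt REST search endpoint with its `query`, `fields`, and `format` parameters is an assumption not mentioned in this thread, and P69905 is only an example accession.

```python
from urllib.parse import urlencode

# Assumed endpoint: UniProt's public REST search API (not mentioned in this thread).
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def parse_additional_fields(raw: str) -> list[str]:
    """Split a comma-separated string such as "ft_signal,ft_act_site" into field names.

    Hypothetical helper; not taken from fetch_uniprot_metadata.py.
    """
    fields: list[str] = []
    for name in raw.split(","):
        name = name.strip()
        # Drop empty entries (e.g. from a trailing comma) and duplicates, keeping order.
        if name and name not in fields:
            fields.append(name)
    return fields


def build_uniprot_url(accession: str, fields: list[str]) -> str:
    """Build a search URL that requests the given return fields as TSV."""
    params = {
        "query": f"accession:{accession}",
        "fields": ",".join(fields),
        "format": "tsv",
    }
    return f"{UNIPROT_SEARCH_URL}?{urlencode(params)}"


fields = parse_additional_fields("ft_signal,ft_act_site,ft_transmem")
# P69905 is only an example accession; substitute one from your own input file.
url = build_uniprot_url("P69905", ["accession", *fields])
print(url)
```

Opening the printed URL in a browser should show one column per requested field. A field name that UniProt does not recognize would be expected to produce an error response instead of a table, though that behavior is not confirmed in this thread.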

@taylorreiter
Member Author

Thanks so much @braebigge! This looks perfect, but I'll let you know if any edge cases come up for me.
