CPTAC MS dataset files in s3 bucket are not found #75

Open
pixuenan opened this issue Sep 20, 2024 · 7 comments

Comments

@pixuenan

pixuenan commented Sep 20, 2024

Hi, thanks for creating this tool. Recently, when I used pepquery2 in both the web application and the standalone version, I got an error saying that an MGF file in the S3 bucket does not exist. I've attached a screenshot of the error from the web application.
[Screenshot from 2024-09-20 16-24-07]

My input peptide is MAEASPHPGRYFCHCCSVEIVPRLPIISVQDASLVLSRSFRKRPEHRKWFCPLHSSHRPEPATVGHVDQHLFTLPQGYGQFAFGIFDDSFEIPTFPPGAQADDGRDPESRRERDHPSRHRYGARQPRARLTTRRATGRHEGVPTLEG

@wenbostar
Collaborator

For a given peptide precursor (a combination of peptide sequence, charge and modification), if no spectra are matched in the query, it prints a message like "*.mgf doesn't exist". This is not an error from the search.

@wenbostar
Collaborator

It's quite common for some peptides to have no spectra matched in a query.

@pixuenan
Author

Thanks for the reply.

@pixuenan pixuenan reopened this Sep 22, 2024
@pixuenan
Author

Is there a way to query multiple protein sequences from a single input file in the standalone version? It seems that only querying multiple peptides from a single input file is currently supported.

@wenbostar
Collaborator

Yes. You can put your protein sequences in a FASTA file like the one below and then set the parameters to "-i target_proteins.fasta -t protein -s 1". This works only for novel protein search, not for known protein search.

target_proteins.fasta:

>sp|A0A087WT01|TVA27_HUMAN T cell receptor alpha variable 27 OS=Homo sapiens OX=9606 GN=TRAV27 PE=1 SV=1
MVLKFSVSLLWLQLAWVSTQLLEQSPQFLSLQEGENLTVYCNSSSVFSSLQWYRQEPGEG
PVLLVTVVTGGEVKKLKRLTFQFGDARKDSSLHLTAAQTGDTGLYLCAG
>sp|A0A1B0GTB2|TUNAR_HUMAN Protein TUNAR OS=Homo sapiens OX=9606 GN=TUNAR PE=1 SV=2
MVLTSENDEDRGGQEKESKEESVLAMLGLLGTLLNLLVLLFVYLYTTL
>sp|A0A1W2PP97|THSD8_HUMAN Thrombospondin type-1 domain-containing protein 8 OS=Homo sapiens OX=9606 GN=THSD8 PE=3 SV=2
MARTPGALLLAPLLLLQLATPALVYQDYQYLGQQGEGDSWEQLRLQHLKEVEDSLLGPWG
KWRCLCDLGKQERSREVVGTAPGPVFMDPEKLLQLRPCRQRDCPSCKPFDCDWRL
>sp|A0AUZ9|KAL1L_HUMAN KAT8 regulatory NSL complex subunit 1-like protein OS=Homo sapiens OX=9606 GN=KANSL1L PE=1 SV=2
MTPALREATAKGLSFSSLPSTMESDKMLYMESPRTVDEKLKGDTFSQMLGFPTPEPTLNT
NFVNLKHFGSPQSSKHYQTVFLMRSNSTLNKHNENYKQKKLGEPSCNKLKNLLYNGSNLQ
LSKLCLSHSEEFLKKEPLSDTTSQCMKDVQLLLDSNLTKDTNVDKVQLQNCKWYQENALL
DKVTDAELKKGLLHCTQKKLVPGHSNVPVSSSAAEKEEEVHARLLHCVSKQKLLLSQARR
TQKHLQMLLAKHVVKHYGQQMKLSMKHQLPKMKTFHEPTTLLGNSLPKCTELKPEVNTLT
AENKLWDDAKNGFARCTAAELQRFAFSATGLLSHVEEGLDSDATDSSSDDDLDEYTLRKN
VAVNCSTEWKWLVDRARVGSRWTWLQAQLSDLECKLQQLTDLHRQLRASKGLVVLEECQL
PKDLLKKQMQFADQAASLNLLGNPQVPQECQDPVPEQDFEMSPSSPTLLLRNLEKQSAQL
TELLNSLLAPLNLSPTSSPLSSKSCSHKCLANGLYRSASENLDELSSSSSWLLNQKHSKK
KRKDRTRLKSSSLTFMSTSARTRPLQSFHKRKLYRLSPTFYWTPQTLPSKETAFLNTTQM
PCLQSASTWSSYEHNSESYLLREHVSELDSSFHSVLSLPSDVPLHFHFETLLKKTELKGN
LAENKFVDEYLLSPSPVHSTLNQWRNGYSPLCKPQLRSESSAQLLQGRKKRHLSETALGE
RTKLEESDFQHTESGSHSNFTAVSNVNVLSRLQNSSRNTARRRLRSESSYDLDNLVLPMS
LVAPAKLEKLQYKELLTPSWRMVVLQPLDEYNLGKEELEDLSDEVFSLRHKKYEEREQAR
WSLWEQSKWHRRNSRAYSKNVEGQDLLLKEYPNNFSSSQQCAAASPPGLPSENQDLCAYG
LPSLNQSQETKSLWWERRAFPLKGEDMAALLCQDEKKDQVERSSTAFHGELFGTSVPENG
HHPKKQSDGMEEYKTFGLGLTNVKKNR
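
The setup described above can be scripted. Below is a minimal Python sketch that writes (header, sequence) pairs to a FASTA file wrapped at 60 residues per line, like the example file, and assembles the command line. Only the "-i", "-t protein" and "-s 1" options come from this thread; the jar name "pepquery.jar" and the helper names write_fasta and build_command are hypothetical placeholders, so substitute the jar file from your own download and add any other options your search needs.

```python
def write_fasta(records, path, width=60):
    """Write (header, sequence) pairs to a FASTA file, wrapping each sequence at `width` residues."""
    with open(path, "w") as fh:
        for header, seq in records:
            fh.write(f">{header}\n")
            for i in range(0, len(seq), width):
                fh.write(seq[i:i + width] + "\n")


def build_command(fasta_path, jar="pepquery.jar"):
    """Build the argument list for a novel protein search.

    "pepquery.jar" is a hypothetical placeholder for the actual jar file name.
    Only -i, -t protein and -s 1 are taken from the maintainer's reply.
    """
    return ["java", "-jar", jar, "-i", str(fasta_path), "-t", "protein", "-s", "1"]
```

For example, write_fasta([("novel_protein_1", "MAEASPHPGRY...")], "target_proteins.fasta") followed by subprocess.run(build_command("target_proteins.fasta"), check=True) would write the file and launch the search, where "novel_protein_1" is a made-up header.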

@pixuenan
Author

Thanks, that helps a lot. How can I tell whether a protein search result is confident? Looking at the pepquery results, psm_rank.txt reports results at the peptide level. Is any downstream analysis required for novel protein identification?

@wenbostar
Collaborator

The documentation at http://pepquery.org/document.html#saoutput describes how to interpret the results in the psm_rank.txt file, including how a match is considered confident in a query.
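
As a starting point for the peptide-to-protein question, here is a minimal Python sketch that counts confident PSMs per peptide in psm_rank.txt and then groups those peptides under the query proteins that contain them. It assumes the file is tab-delimited with a header row containing a "peptide" column and a confidence-flag column named "confident" with the value "Yes"; these column names are assumptions to check against the documentation linked above. The helper names are hypothetical, and the sketch does not decide what makes a protein confident.

```python
import csv
from collections import defaultdict


def confident_peptides(path, flag_column="confident", yes_value="Yes"):
    """Return {peptide: number of confident PSMs} from a psm_rank.txt file.

    Assumes a tab-delimited file with a header row; the "peptide" and
    "confident" column names are assumptions to verify against the docs.
    """
    counts = defaultdict(int)
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row.get(flag_column, "").strip() == yes_value:
                counts[row["peptide"]] += 1
    return dict(counts)


def map_to_proteins(peptides, proteins):
    """Group peptides under each query protein whose sequence contains them (exact substring match)."""
    hits = defaultdict(list)
    for pep in peptides:
        for prot_id, seq in proteins.items():
            if pep in seq:
                hits[prot_id].append(pep)
    return dict(hits)
```

How many distinct confident peptides a novel protein needs before it is considered identified is a judgement this sketch leaves open.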
