Custom Glycan Library and Enzyme Cleavage Sites for O-Glycopeptide Searches in O-Pair #2388

Open
MuZi-Y opened this issue Jul 25, 2024 · 3 comments

Comments

@MuZi-Y

MuZi-Y commented Jul 25, 2024

Hello,

First of all, I would like to commend you on the excellent work you have done with O-Pair. It is an incredibly powerful tool for O-glycopeptide identification, and it has significantly enhanced our research capabilities.

I have a couple of questions regarding the customization options in O-Pair for O-glycopeptide searches:

Custom Glycan Library: Is it possible to perform O-glycopeptide searches using a custom glycan library in O-Pair? If so, could you please provide guidance or instructions on how to set this up?

Custom Enzyme Cleavage Sites: Similarly, can O-Pair be configured to use custom enzyme cleavage sites for O-glycopeptide digestion? If this feature is available, I would appreciate it if you could provide detailed steps or an example on how to implement it.

Thank you for your time and for developing such a valuable tool. I look forward to your response.

Best regards,

Yue

@RayMSMS
Contributor

RayMSMS commented Jul 26, 2024

Hi Yue,

We appreciate your comment and are glad to help.

Regarding your questions, please open MetaMorpheus, click the "Settings" button, and then click the "Open mods/data folder" option.

For the glycan library, open the "Glycan_Mods" folder and then select "OGlycan". That is the folder where your custom glycan database should be placed (reference file: Olgycan Database 28 glycans.txt).

For a custom enzyme, open the "ProteolyticDigestion" folder and then open the xls file in it. Add the information for your custom enzyme to the Name, Sequence, and Cleavage Specificity columns. (Note: the bar in the Sequence column marks the cut position, i.e. whether the enzyme cuts after or before the listed residue.) Finally, remember to close and restart the whole software after adding the enzyme.
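To make the bar notation concrete, here is a minimal Python sketch of how a motif such as `K|` (cut after K) or `|T` (cut before T) could be interpreted. It only illustrates the notation described above and is not MetaMorpheus's own digestion code; the function name `digest` and the example sequence are made up for illustration.

```python
from __future__ import annotations


def digest(sequence: str, motif: str) -> list[str]:
    """Split `sequence` at every occurrence of a motif written with a bar.

    "K|" means "cut after K"; "|T" means "cut before T".
    Illustrative only: real enzymes may need several motifs and exceptions.
    """
    if motif.count("|") != 1:
        raise ValueError("motif must contain exactly one '|'")
    before, after = motif.split("|")
    residues = before + after
    if not residues:
        raise ValueError("motif must contain at least one residue")

    offset = len(before)  # cut position relative to the start of a match
    cuts = []
    start = sequence.find(residues)
    while start != -1:
        cut = start + offset
        if 0 < cut < len(sequence):  # ignore cuts at the very ends
            cuts.append(cut)
        start = sequence.find(residues, start + 1)

    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


print(digest("AKTGKS", "K|"))  # → ['AK', 'TGK', 'S']
print(digest("AKTGKS", "|T"))  # → ['AK', 'TGKS']
```

The position of the bar relative to the residue is the only thing that changes between the two calls, which is why the same peptide yields different fragments.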

I hope that solves your problem, and I look forward to your feedback.

Ray

@MuZi-Y
Author

MuZi-Y commented Jul 28, 2024

Hello Ray,

Thank you very much for your detailed response and the helpful instructions. Your guidance has been incredibly useful.

I have a few more questions I hope you can assist me with:

  1. IndividualFileResults Folder: Is this folder supposed to contain the results for each raw file? In my case, the folder is usually empty. Could you please provide some insight into why this might be happening and how I can ensure the results are correctly generated in this folder?

  2. protein_oglyco_localization File: This file is almost empty in my results. Is this normal?

  3. seen_oglyco_localization File: This file is supposed to contain protein and site information, but it seems to lack decoy information, which makes subsequent analysis challenging. Could you provide some guidance on how to include this information?

  4. ind Files in Raw Folder: After the search is complete, I notice ind files appearing in the folder containing the raw files. Could you explain the purpose of these files?

  5. In result file: "PSMs within 1% FDR: 0; Delta Score Used for FDR Analysis in result file: False; Posterior error probability analysis failed. This can occur for small data sets when some sample groups are missing positive or negative training examples." Does this indicate that my results are not reliable?

  6. T Task Running Indefinitely: Sometimes, after each raw file search is complete, the top T task keeps running and never outputs results. What could be causing this issue?

  7. Dual Dissociation (HCD+EThcD) Settings: When using dual dissociation (one precursor ion scan followed by both HCD and EThcD fragmentations), should the Dissociation type be set to EThcD and the Child Scan Dissociation set to HCD? I only get more search results with these settings.

  8. Enzyme Digestion Table: In the enzyme digestion table, should column G's "StcE/trpsin" be corrected to "StcE/trypsin"?

  9. Large Glycan Libraries: When using glycan libraries with 28 or 32 entries, the search seems to take significantly longer. With even larger libraries, the software tends to freeze or crash. Is there a way to address this issue?

  10. User Manual: I couldn't find a detailed user manual on GitHub. Could you provide a comprehensive user manual? I believe this would greatly enhance the usability of O-Pair.

Thank you once again for your assistance and for the excellent work on O-Pair. I look forward to your response.

Best regards,

Yue

@RayMSMS
Contributor

RayMSMS commented Jul 29, 2024

Hi Yue,
Regarding your questions, here are our replies:

  1. “Write individual result file” is an optional setting in the search parameters. If you select it, MetaMorpheus will generate an individual result file for each raw file. Otherwise, all results are written to the same combined PSM files.
  2. The “protein_oglyco_localization” file organizes the protein glycosite information, but it only shows the most confident glycan-localization pairs (100% localized and 0.01 Q value). So in some cases the file is empty, depending on your results.
  3. This file is intended to show all of the target glycan matches, listed by peptide classification, so it is not meant to include decoy information. I strongly recommend that you use the “o-glyco” to do the decoy search.
  4. These are internal files used by MetaMorpheus. MetaMorpheus creates an index lookup table prior to the search and then writes it to disk. You don’t need to worry about them.
  5. The posterior error probability calculator uses machine learning and requires a large amount of training data. Your results list was too short to train the model effectively. This does not mean your results are unreliable, but it would be a good idea to check them closely.
  6. Please send us the raw file, database, and toml so that we can run it on our end. If there is an error, we will find it.
  7. In your case, set the Dissociation type to “HCD” and the Child Scan Dissociation to “EThcD”. If you check the parameter toml file, you will see that this corresponds to DissociationType = "HCD", MS2ChildScanDissociationType = "EThcD". That should fit your case.
  8. That is a typo on our side. Thank you for helping us find it.
  9. A larger glycan database means more possible glycan combinations to consider. That is needed for a comprehensive result, but it leads to a longer analysis time. We are currently working on the algorithm to improve the speed.
  10. Sure, the link is below for reference.
    [https://drive.google.com/drive/folders/1_xAcH1k2Bqf-IyTAsUsK_WuM9o0qKj3Y?usp=sharing]
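For point 7, a quick way to confirm the two dissociation settings before launching a search is to scan the toml text for them. The sketch below is only an illustration: it uses a simple line match instead of a full TOML parser, the helper names are made up, and the example text contains just the two keys quoted above (a real search toml contains many more settings, and its layout is not assumed here).

```python
import re

# Matches simple `Key = "value"` lines; this is NOT a full TOML parser.
_ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')

# The two settings recommended in point 7 for HCD + EThcD paired scans.
EXPECTED = {
    "DissociationType": "HCD",
    "MS2ChildScanDissociationType": "EThcD",
}


def read_string_settings(toml_text: str) -> dict:
    """Collect every `Key = "value"` line, ignoring table headers."""
    settings = {}
    for line in toml_text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def check_dual_dissociation(toml_text: str) -> list:
    """Return a list of mismatches; an empty list means both keys match."""
    settings = read_string_settings(toml_text)
    return [
        f"{key}: expected {want!r}, found {settings.get(key)!r}"
        for key, want in EXPECTED.items()
        if settings.get(key) != want
    ]


example = '''
DissociationType = "HCD"
MS2ChildScanDissociationType = "EThcD"
'''
print(check_dual_dissociation(example))  # → []
```

To check a real file, read its contents (for example with `pathlib.Path(...).read_text()`) and pass the text to `check_dual_dissociation`; a non-empty list names the keys that differ from the recommended values.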

Thank you so much for your feedback. Please contact us if you have any problems.

Ray


2 participants