Home
The concept behind MetaMorpheus is simple. A significant percentage of peptides analyzed in bottom-up experiments contain post-translational modifications (PTMs) or sequence variants. Many search programs ignore these peptides and look only for unmodified peptides. We created a way to discover these modified peptides while maintaining quality control (i.e., false discovery rate control).
There are several methods of PTM discovery in MetaMorpheus.
- G-PTM search: You can search with a database that already contains the location of known PTMs. You can get databases which contain PTMs from UniProt. When downloading a list of proteins, select the "XML" format option. If you search with this database, MetaMorpheus will automatically interpret the PTMs found in the UniProt database, and PTM-containing peptides will appear in the search results. We call this a G-PTM search.
- G-PTM-D search: MetaMorpheus can also find PTMs that are not annotated in a UniProt database. We perform a two-pass search to do this; we refer to this strategy as Global PTM Discovery (G-PTM-D). The first search finds high-scoring matches between an experimental MS/MS spectrum and a theoretical MS/MS spectrum where the difference in mass corresponds to a known PTM (e.g., 79.97 Da for phosphorylation). MetaMorpheus then annotates the PTM in the protein database so that the peptide can carry a phosphorylation at all possible locations (e.g., S, T and Y). The second search uses the new database, so these modified peptides are included as theoretical peptides and will be reported in the results.
- Variable modification search: This is the "old school" way of searching for modifications that most search programs use. Generally, it is very slow and prone to high FDRs, which are often underestimated. We do not recommend variable modification-type searching unless the PTMs are very common in the sample (e.g., acetylation on protein N-term, or oxidation on M).
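The first-pass idea behind G-PTM-D can be sketched in a few lines of Python. This is only an illustration of matching a mass difference to a known PTM and listing the residues where it could sit; it is not MetaMorpheus code. The table `KNOWN_PTMS`, the function `candidate_sites`, and the 0.005 Da tolerance are hypothetical names and values chosen for the example.

```python
# Hypothetical lookup table: PTM name -> (monoisotopic mass shift in Da, allowed residues).
KNOWN_PTMS = {
    "Phosphorylation": (79.966331, "STY"),
    "Oxidation": (15.994915, "M"),
}


def candidate_sites(peptide, observed_mass, unmodified_mass, tol_da=0.005):
    """Return (ptm_name, zero-based position) pairs that could explain the mass difference.

    tol_da is an assumed absolute tolerance, not a MetaMorpheus default.
    """
    delta = observed_mass - unmodified_mass
    sites = []
    for name, (shift, residues) in KNOWN_PTMS.items():
        if abs(delta - shift) <= tol_da:
            # Annotate every residue that could carry this PTM.
            sites.extend((name, i) for i, aa in enumerate(peptide) if aa in residues)
    return sites


# A precursor about 79.966 Da heavier than the unmodified peptide "ASTYK".
sites = candidate_sites("ASTYK", 1079.9663, 1000.0)
```

Here `sites` lists phosphorylation at S, T and Y. In a second pass, each of those positions would become a theoretical modified peptide to be scored against the spectra.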
Label-free quantification in MetaMorpheus is performed with FlashLFQ (https://github.com/smith-chem-wisc/FlashLFQ). You can read about it here and here. We recently enabled the software to perform normalization across conditions, samples, fractions and replicates. To perform intensity normalization, you need to define the experimental design, create a search task, and then check the box for "Normalize quantification results" in the Quantification options.
One key feature of MetaMorpheus is mass calibration. This is very useful when trying to discriminate between very similar theoretical peptides. For example, several PTMs have very similar mass; sulfonation (79.956815 Da) and phosphorylation (79.966331 Da) are only 0.009516 Da apart. Acetylation (42.010565 Da) and trimethylation (42.046950 Da) are only 0.036385 Da apart. High-quality calibration can make accurately identifying these PTMs possible.
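To see why calibration matters here, it helps to express these gaps in parts per million of the peptide mass. The short Python sketch below does that arithmetic. The function name `ppm_separation` and the example peptide masses are assumptions made for illustration; the PTM masses are the ones quoted above.

```python
SULFONATION = 79.956815
PHOSPHORYLATION = 79.966331
ACETYLATION = 42.010565
TRIMETHYLATION = 42.046950


def ppm_separation(shift_a, shift_b, peptide_mass):
    """Gap between two PTM mass shifts, in ppm of an (assumed) peptide mass."""
    return abs(shift_a - shift_b) / peptide_mass * 1e6


# Example peptide masses in Da, chosen only for illustration.
for mass in (1000.0, 2000.0, 3000.0):
    sulfo_vs_phospho = ppm_separation(SULFONATION, PHOSPHORYLATION, mass)
    acetyl_vs_trimethyl = ppm_separation(ACETYLATION, TRIMETHYLATION, mass)
    print(f"{mass:.0f} Da: {sulfo_vs_phospho:.2f} ppm, {acetyl_vs_trimethyl:.2f} ppm")
```

For a 2000 Da peptide, the sulfonation/phosphorylation gap is roughly 4.8 ppm, so a systematic mass error of a few ppm is enough to blur the two.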
- Download a protein database in .XML or .fasta format from UniProt and drag it onto the MetaMorpheus application. It will go where it needs to go. By the way, there is no need to unzip the database and waste all that hard-drive space. MetaMorpheus reads .gz compressed databases.
- Next, drag a couple of .raw or .mzML files onto the MetaMorpheus application. Again, they'll go where they need to go.
- Click on the "New Search Task" tab. Open up "Some Search Properties" and make the appropriate settings adjustments.
- Click on "Post-Search Analysis" and decide if you want to aggregate proteins and quantify peptides. The choice is yours.
- Click on "Modifications" and choose the variable/fixed mods you want to keep. NOTE: This is not G-PTM-D! Use the G-PTM-D task to discover low-abundance PTMs.
- Click "Add the Search Task"
- Finally, click "Run all tasks!"
Search results for each file are generated automatically in the folder that contains the original files. PSMs and aggregate unique PSMs are automatically generated. If you selected "Construct protein groups" in the "Post Search Analysis" tab, then you will also have that result to look at.
Shouldn't I calibrate my files first?
Probably.
- Click on "New Calibrate Task" and adjust the settings appropriately.
- Click on "Add the Calibration Task"
- Click on "Run all tasks!"
This one takes a little longer. So, go get a cup of coffee.
- Simple. Click on "Discover PTMs"
- Select the modification that you want to discover.
- Click on "Add the GPTMD Task"
- Click on "Run All Tasks!"
But that only makes the database annotated with possible PTMs. What you want to do now is:
- Use that new database in a regular search. If you did things right, this database will appear in the Protein Databases list in the upper left and will already be selected.
- Follow the directions for a regular search, which are described above.
Good news. You can. Just add all the individual tasks before you click on "Run all tasks!" and MetaMorpheus will take care of everything. In order:
- Calibrate
- G-PTM-D
- Search - NOTE: This search will include all modifications discovered in G-PTM-D automatically. HURRAY!
When you come in to work in the morning, your data processing will be complete.
How does MetaMorpheus deal with contaminants?
With your help. A nice feature of MetaMorpheus is the ability to simultaneously use multiple database files in a single search. These can be any combination of .fasta and .xml. We recommend that you download an existing database of contaminants or create your own based on the type of sample you are analyzing and the probable contaminants in your lab. Once you've done that and dragged it into the MetaMorpheus GUI, you simply CHECK the box marked CONTAMINANT. That way MetaMorpheus knows that the database contains the contaminant proteins. During the search, any peptide matching a contaminant peptide gets assigned to a contaminant even if there is an exact duplicate in the target protein database. During protein parsimony, contaminant peptides and proteins get assigned first and are not included in target protein parsimony.
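The "contaminant wins" rule described above can be written down as a tiny Python sketch. This only illustrates the priority order, under the assumption that each database has been reduced to a set of peptide sequences. The function `assign_peptide` and its arguments are hypothetical and are not part of MetaMorpheus.

```python
def assign_peptide(peptide, contaminant_peptides, target_peptides):
    """Return which database a peptide is assigned to, checking contaminants first."""
    if peptide in contaminant_peptides:
        # A contaminant match takes priority even if the target database has the same peptide.
        return "contaminant"
    if peptide in target_peptides:
        return "target"
    return None


contaminants = {"PEPTIDEK"}
targets = {"PEPTIDEK", "OTHERK"}
shared = assign_peptide("PEPTIDEK", contaminants, targets)    # assigned to the contaminant
target_only = assign_peptide("OTHERK", contaminants, targets)  # assigned to the target
```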
Look no further. Below you will find all the technical details that make MetaMorpheus hum. If you have an issue or a question, please click on the "Issues" tab of this GitHub repository and let us know. We'll respond quickly.
Checking the "Aggregate Proteins" button in the MetaMorpheus "Add Search Task" window constructs the most concise possible list of proteins that could account for all observed peptides ("maximum parsimony"). Peptides are assigned to proteins by the following rules:
- All peptides that could be assigned to a decoy protein are removed from any target protein associations (i.e., they are only assigned to decoy protein(s)).
- A peptide that can only be assigned to one protein is a "unique" peptide; this protein is added and all peptides that could be assigned to that protein are assigned to that protein.
- The remaining unaccounted-for peptides are assigned by the "greedy algorithm", which iteratively chooses a protein by how many peptides it can account for. For instance, if a protein can account for 4 unaccounted-for peptides, this is superior to a protein that would only account for 2 peptides. If two proteins have the same number of unaccounted-for peptides in the given iteration, the protein with the most total peptides is added. The loop continues until all peptides are accounted for.
- Any protein that is indistinguishable (i.e., has the same set of peptides) from a protein in the resulting parsimonious list is added to that protein's group.
- Protein groups are scored by using the highest-scoring PSM that passes the 1% FDR threshold and belongs to that group. Peptides that do not pass the 1% FDR threshold are not displayed in the protein groups list.
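The assignment rules above can be sketched as a small Python function. This is a simplified illustration written from the rules as described, not the MetaMorpheus implementation. The function `parsimony`, its input format (a dict from protein name to a set of peptides, plus a set of decoy protein names), and the example data are assumptions. Protein group scoring is left out.

```python
def parsimony(protein_peptides, decoys=frozenset()):
    """Return a parsimonious list of protein groups (each group is a sorted list of names)."""
    # Rule 1: peptides that could belong to a decoy are removed from target proteins.
    decoy_peps = set()
    for prot in decoys:
        decoy_peps |= protein_peptides.get(prot, set())
    cleaned = {}
    for prot, peps in protein_peptides.items():
        kept = set(peps) if prot in decoys else set(peps) - decoy_peps
        if kept:
            cleaned[prot] = kept

    unaccounted = set().union(*cleaned.values())
    chosen = []

    # Rule 2: a protein holding a unique peptide is added with all of its peptides.
    owners = {}
    for prot, peps in cleaned.items():
        for pep in peps:
            owners.setdefault(pep, []).append(prot)
    for pep in sorted(owners):
        prots = owners[pep]
        if len(prots) == 1 and prots[0] not in chosen:
            chosen.append(prots[0])
    for prot in chosen:
        unaccounted -= cleaned[prot]

    # Rule 3: greedy loop. Pick the protein covering the most unaccounted-for
    # peptides; ties are broken by the total number of peptides.
    while unaccounted:
        best = max(
            (p for p in cleaned if p not in chosen),
            key=lambda p: (len(cleaned[p] & unaccounted), len(cleaned[p])),
        )
        chosen.append(best)
        unaccounted -= cleaned[best]

    # Rule 4: proteins with an identical peptide set join the chosen protein's group.
    groups = []
    for prot in chosen:
        group = sorted(p for p in cleaned if cleaned[p] == cleaned[prot])
        if group not in groups:
            groups.append(group)
    return groups


example = {
    "P1": {"a", "b"},
    "P2": {"b", "c", "d"},
    "P3": {"c", "d"},
    "P4": {"c", "d"},
}
groups = parsimony(example)
```

In this example, P1 is added first because peptide "a" is unique to it. P2, P3 and P4 then each cover the two remaining peptides, and P2 wins the tie because it has more peptides in total, so P3 and P4 are not needed.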
In addition to the set of included modifications, MetaMorpheus allows adding user-defined modifications.
Some proteins are present in biological samples as subsequences of the complete sequence specified in the database. Since they are common, and UniProt lists these protein fragments, we expanded the search functionality to look for those as well.
The open-mass search is enhanced by automatic mass-difference histogram generation. The mass-difference of each PSM below 1% FDR is used for this analysis. The results of the analysis are written in a separate file, and they include the total number of unique peptides associated with the mass shift, the fraction of decoys, mass match with any known entry in the UniMod or UniProt databases, amino acid addition/removal combination, combination of higher frequency peaks, fraction of localizable targets, localization residues and/or termini, and presence of any modifications in the matched peptides. All of this data can then be used to determine the nature of the mass-difference, and the characteristics of the corresponding modification.
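As a rough picture of the histogram step, the Python sketch below bins a list of PSM mass differences and ranks the bins by count. It assumes the input has already been filtered to PSMs that pass the FDR threshold. The function `mass_difference_histogram` and the 0.01 Da bin width are illustrative choices, not MetaMorpheus settings. The decoy fraction, database matching and localization columns are not reproduced.

```python
from collections import Counter


def mass_difference_histogram(mass_diffs, bin_width=0.01):
    """Bin mass differences (Da) and return (bin_center, count) pairs, most frequent first."""
    counts = Counter(round(d / bin_width) for d in mass_diffs)
    return [(round(idx * bin_width, 4), n) for idx, n in counts.most_common()]


# Made-up mass differences for illustration only.
diffs = [79.966, 79.967, 79.968, 15.9949, 0.001]
hist = mass_difference_histogram(diffs)
top_bin, top_count = hist[0]
```

With this made-up input, the most populated bin sits near 79.97 Da, which would then be compared against known modification masses such as phosphorylation.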