Questions about the intensity/density of the peak #2

Open
Cheng111 opened this issue Oct 25, 2018 · 7 comments
@Cheng111

Hi, in the README you mention the GUI file OpenSeachKDE.jar. However, I can't find it on the releases page. Where can I find it? Thank you so much.

@Cheng111
Author

Hi, I have another question. In the plot, the y-axis is "PSM density", and in the output file there is a column called "intensity". From the values, I guess they are the same thing. I'd like to double-check with you: is it the intensity from the mass spectrum? Also, is it possible that, under some conditions, two PSMs have the same delta mass? What does the software do in that case: does it take the average, the sum, or the largest intensity before drawing the graph?

@Cheng111 Cheng111 changed the title OpenSeachKDE.jar doesn't in the releases page Questions about the intensity/density of the peak Oct 25, 2018
@chhh
Owner

chhh commented Oct 25, 2018

  1. It's the same jar file (okde-...jar); I just forgot to change the name everywhere. (https://github.com/chhh/deltamass/releases/latest)
  2. The intensity is not from the mass spectrum. It can't be, because data from multiple peptide identification files is plotted simultaneously. You can have 100 input pepxml files, for example. The intensity of a peak roughly translates to the frequency of that particular mass shift.

@Cheng111
Author

Cheng111 commented Oct 26, 2018

Thank you for the reply. But I still have two questions. First, does "frequency" mean the number of PSMs? It looks a little inconsistent with my data. In the DeltaMass output, the density of the peak at 16.99 is around 350K; however, the number of PSMs with a delta mass in [16.9, 17.1) is less than 60K in my data. I attached a screenshot here. Other peaks show similar discrepancies.

Second, in the output table, the intensity values are decimals, such as 351732.6621, not integers. Why is the frequency not an integer? Do you apply some normalization? Thank you so much.
[Attached image: 16 99935]

@chhh
Owner

chhh commented Oct 26, 2018

@Cheng111 That's why I said "roughly translates". Peak height is proportional to the number of PSMs, but it is not the number of PSMs; it is the density of PSMs. The area under the curve equals the number of PSMs you have in your pepxml files. The absolute height is indicative of the frequency of a specific mass shift, but it can't be translated to the number of PSMs directly. To get the number of PSMs you need to integrate.

Another way to think about it is the following: let's say I give you a specific number, 17.00572, and ask you "how many PSMs in your files have this mass shift?". You can run your grep command and see that the result is zero. Well, that's because no PSM had that exact mass shift. Yet the density plot shows you that the region around Mass Delta 17.00572 is very dense: the density of PSMs around that area is about 300,000 PSMs/Da. The numbers look huge to you because the density is per Dalton, and one Dalton is a lot compared to the average peak width you see. In your example the peak width is more like 0.1 Da or less. If you had the same number of PSMs but dispersed "wider", the peak would become less tall and broader, but the integral (the area under the red line) would stay the same.
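
To make the "same area, different height" point concrete, here is a minimal sketch. It assumes a Gaussian kernel with a fixed bandwidth and uses synthetic mass shifts; the kernel and bandwidth DeltaMass actually uses are not stated in this thread, so `psm_density`, `trapezoid_area` and the `bandwidth` value are illustrative names and numbers only.

```python
import math
import random


def psm_density(mass_shifts, x, bandwidth):
    """Gaussian kernel density at x, in PSMs/Da.

    Each PSM contributes a kernel of unit area, so the whole curve
    integrates to len(mass_shifts) rather than to 1.
    """
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - m) / bandwidth) ** 2) for m in mass_shifts)


def trapezoid_area(xs, ys):
    """Area under a sampled curve using the trapezoidal rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))


random.seed(0)
n_psms = 1000
# Synthetic mass shifts (not real data): same count, different spread.
narrow = [random.gauss(17.0, 0.01) for _ in range(n_psms)]
wide = [random.gauss(17.0, 0.05) for _ in range(n_psms)]

bandwidth = 0.005  # assumed value, in Da
grid = [16.5 + 0.005 * i for i in range(201)]  # 16.5 .. 17.5 Da

narrow_curve = [psm_density(narrow, x, bandwidth) for x in grid]
wide_curve = [psm_density(wide, x, bandwidth) for x in grid]

narrow_area = trapezoid_area(grid, narrow_curve)
wide_area = trapezoid_area(grid, wide_curve)

print("narrow: peak height", round(max(narrow_curve)), "PSMs/Da, area", round(narrow_area))
print("wide:   peak height", round(max(wide_curve)), "PSMs/Da, area", round(wide_area))
```

Both curves integrate to roughly the number of PSMs, while the narrow one peaks much higher. This also shows why the values are not integers: each point on the curve is a sum of fractional kernel contributions.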

Let's use your example screenshot displaying mass shifts from 16.90 to 17.35. The main peak spans roughly from 16.95 to 17.05, so the width = (17.05-16.95) = 0.1 Da. The top of the peak is somewhere at 320,000 PSMs/Da, so our rough estimate of the number of PSMs contributing to this peak is num_psms = width * height = 0.1 * 320,000 = 32,000. This is a very crude estimate, but it's in agreement with your grep-based numbers (note that the values you're grepping for might occur multiple times in the file).
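
The arithmetic above can be written out as a short sketch. The three-row table is a made-up triangular peak, not DeltaMass output; it only illustrates that integrating sampled (mass shift, density) pairs yields a count, and that the rectangle estimate sits at the high end when the peak tapers toward its edges. The function names are hypothetical.

```python
def rough_psm_count(width_da, height_psms_per_da):
    """Rectangle estimate: peak width times peak height."""
    return width_da * height_psms_per_da


def trapezoid_area(masses, densities):
    """Integrate sampled density values over mass shift (trapezoidal rule)."""
    return sum(
        (masses[i + 1] - masses[i]) * (densities[i] + densities[i + 1]) / 2.0
        for i in range(len(masses) - 1)
    )


# Numbers quoted in the reply above: width 0.1 Da, height 320,000 PSMs/Da.
estimate = rough_psm_count(17.05 - 16.95, 320_000)

# Hypothetical (mass shift, density) rows describing a triangular peak.
rows = [(16.95, 0.0), (17.00, 320_000.0), (17.05, 0.0)]
masses = [mass for mass, _ in rows]
densities = [density for _, density in rows]
integrated = trapezoid_area(masses, densities)

print("rectangle estimate:", round(estimate), "PSMs")
print("trapezoid integral of the made-up rows:", round(integrated), "PSMs")
```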

@Cheng111
Author

Thank you so much for the detailed reply. It's very clear and useful.

@Cheng111
Author

Hi, could you also briefly explain how you calculate the quality and score of a delta mass peak? Is there a general critical value used to control the quality of detected peaks, or does it depend on the input data? Thank you so much.

@chhh
Owner

chhh commented Nov 13, 2018

Quality values are determined by the input data. The absolute value of quality has no meaning on its own; it reflects how "pointy" and "well defined" a peak is. A wide, shallow peak, or a small hump sitting on the side of another peak, gets a lower quality; a standalone, narrow, well-defined peak gets a higher quality. Score is simply quality multiplied by intensity, and it is only used to rank the peaks in the output list. If you sort the peaks by score you'll likely notice that peaks at the top of the list have matching annotations from unimod, while peaks at the bottom normally don't.
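
As a small illustration of the ranking described above, the sketch below computes score = quality * intensity and sorts by it. The peak values are invented for the example, and how quality itself is computed is not covered in this thread, so it is treated as a given number here.

```python
# Hypothetical peaks: mass shift (Da), quality (unitless), intensity (PSMs/Da).
peaks = [
    {"mass_shift": 0.98, "quality": 0.9, "intensity": 50_000.0},
    {"mass_shift": 15.99, "quality": 0.8, "intensity": 300_000.0},
    {"mass_shift": 17.20, "quality": 0.2, "intensity": 100_000.0},
]

# Score is quality multiplied by intensity, as stated in the reply above.
for peak in peaks:
    peak["score"] = peak["quality"] * peak["intensity"]

# Highest score first, mirroring how the output list is ranked.
ranked = sorted(peaks, key=lambda peak: peak["score"], reverse=True)

for peak in ranked:
    print(peak["mass_shift"], round(peak["score"]))
```

Note that the made-up peak at 17.20 has a higher intensity than the one at 0.98 but ranks below it, because its low quality (a shallow or shoulder-like shape) pulls the score down.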

@chhh chhh added the wiki label Nov 29, 2018