Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ppm paramenter for ChromPeakAreaParam #756

Open
stanstrup opened this issue Jun 27, 2024 · 1 comment
Open

ppm paramenter for ChromPeakAreaParam #756

stanstrup opened this issue Jun 27, 2024 · 1 comment

Comments

@stanstrup
Copy link
Contributor

stanstrup commented Jun 27, 2024

Hi,

I was looking into the newish ChromPeakAreaParam and trying to understand how wide windows it actually creates compared to the old way. It is definitely better than the old way that would go all the way to zero. But it still goes a lot lower than I think is likely optimal.
So to start with the conclusion I think it would still be advisable to have a minimum ppm window like in the old function.

So first I calculated what the windows will look like for each feature. The peak picking ppm was 30. But we see that some windows would go as low as 7 ppm for some features. From the analysis below I would argue that that could be too low.

peaks <- cbind(chromPeaks(xset_g_r_g_fill), chromPeakData(xset_g_r_g_fill)) %>% 
          as_tibble %>% 
          mutate(filled = if_else(is_filled, "filled", "not filled"))
feature_def <- featureDefinitions(xset_g_r_g_fill) %>% 
                as_tibble(rownames="feature_id") %>% 
                rename(f_mzmed = "mzmed", f_mzmin = "mzmin", f_mzmax = "mzmax", f_rtmed = "rtmed", f_rtmin = "rtmin", f_rtmax = "rtmax") %>% 
                unnest(peakidx)

peaks_sum <- peaks %>% 
                mutate(peakidx = 1:n()) %>% 
                full_join(feature_def, by = "peakidx") %>% 
                group_by(feature_id, filled) %>% 
                summarise(mz = quantile(mz, probs = 0.25, names = FALSE),
                          mzmin = quantile(mzmin, probs = 0.25, names = FALSE),
                          mzmax = quantile(mzmax, probs = 0.25, names = FALSE),
                          .groups = "drop"
                          )

peaks_sum_notfilled <- peaks_sum %>% 
  filter(filled == "not filled") %>% 
  mutate(ppm_dev = 1E6*(mzmax-mzmin)/mz)
ggplot(peaks_sum_notfilled, aes(x=ppm_dev)) +
  geom_histogram(position="identity", binwidth = 1,linewidth = NA) + 
  xlab("ppm") + ggtitle("Histogram of ppm window across matched peaks")+ 
  scale_x_continuous(breaks = seq(0,1000,10), lim = c(0,max(peaks_sum_notfilled$ppm_dev))) +
  theme_bw()

image


Then we can look at the range for the found peaks. We observe that for the peaks filled we more or less inherited the original ppm value, while the peak picked peaks go much higher, since that in the ROI definition the range can be more or less +/- the ppm setting.
ggplot(peaks, aes(x=1E6*(mzmax-mzmin)/mz)) +
  geom_histogram(position="identity", binwidth = 2,linewidth = NA) + 
  facet_grid(filled~., scales = "free_y") +
  xlab("ppm") + ggtitle("Histogram of ppm deviation for all peaks")+ 
  scale_x_continuous(breaks = seq(0,1000,10)) +
  theme_bw()

image

Finally I wanted to see how window size was correlated to intensity since I assume smaller peaks are less accurate and that gapfilling mainly will be look at small peaks.

ggplot(peaks, aes(x =into, y=1E6*(mzmax-mzmin)/mz)) +
  geom_point(size = 1, alpha = 0.5) + 
  facet_grid(filled~.) +
  xlab("intensity") + 
  ylab("ppm dev") + 
  ggtitle("Intensity, ppm deviation across matched peaks")+ 
  theme_bw()+ 
  scale_y_continuous(limits = c(0,NA))

image

My conclusion here would be that small peaks go as high as 60 ppm, but the windows we will be doing integrating in during gap-filling would be half that.


Bonus: if we set the prop to 1 the window sizes are still potentially too low.

image

So based on all this wouldn't a safely valve ppm paramenter make sense?

@jorainer
Copy link
Collaborator

Thanks for the thorough analysis/description @stanstrup . Actually, it's funny that you are worried the m/z width being too small - I was initially afraid that it might be in fact too large :)

One thing, when you compare the intensities between the gap-filled and detected data - I think it would be fair/correct to only consider the detected peak signal of features for which you also have gap-filled signal. otherwise you're comparing two different things with each other. With ChromPeakAreaParam(mzmin = min, mzmax = max, rtmin = min, rtmax = max) I would then actually expect that the ppm deviation should be identical - because the gap-filled signal, in the samples where no chrom peak was detected, will be integrated based on the m/z and rt ranges of all detected chrom peaks (for that feature).

But, if I understand you correctly, you ask for an additional parameter ppmMin that would allow the user to define a minimal m/z deviation/range that always should be used to integrate data from, i.e. that m/z min for a feature would be the smaller value of mzmin and the feature's "mzmed" - ppm("mzmed", ppmMin), and m/z max the same, just with max?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants