Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in filterPrecursorScan() #194

Open
lgatto opened this issue Apr 7, 2021 · 4 comments
Open

Bug in filterPrecursorScan() #194

lgatto opened this issue Apr 7, 2021 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@lgatto
Copy link
Member

lgatto commented Apr 7, 2021

> library(rpx)
> PXD022816 <- PXDataset("PXD022816")
> (mzmls <- pxget(PXD022816, grep("mzML", pxfiles(PXD022816))[1:2]))
Loading QEP2LC6_HeLa_50ng_251120_01-calib.mzML from cache.
Loading QEP2LC6_HeLa_50ng_251120_02-calib.mzML from cache.
[1] "~/.cache/rpx/207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML"
[2] "~/.cache/rpx/207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML"
> sp <- Spectra(mzmls)
> sp2 <- filterPrecursorScan(sp, 2490)
> length(sp2)
[1] 12
spectraData(sp2)[, c("msLevel", "acquisitionNum", "precScanNum", "dataOrigin")]
msLevel acquisitionNum precScanNum dataOrigin
1 2479 NA 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
2 2480 2479 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
1 2482 NA 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
2 2485 2482 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
1 2486 NA 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
2 2490 2486 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
1 2479 NA 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
1 2480 NA 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
2 2482 2480 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
1 2485 NA 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
2 2486 2485 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
2 2490 2485 207e03973c908_QEP2LC6_HeLa_50ng_251120_02-calib.mzML

Issues

  1. First, the acquisition number for all files are extracted, which is probably not what the user wanted. This could be handled explicitly with a call to filterDataOrigin() if I wanted to extract the set from a single file.
  2. The result is wrong. We get the acquisition numbers 2490 from the two files, their precursor scans 2485 and 2486 (again from both files), but then these recursively extract each other across different files. The correct results should contain 4 scans:
> spectraData(filterPrecursorScan(filterDataOrigin(sp, unique(dataOrigin(sp))[1]), 2490))[, c("msLevel", "acquisitionNum", "precScanNum", "dataOrigin")]
msLevel acquisitionNum precScanNum dataOrigin
1 2486 NA 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
2 2490 2486 207e04bccd3dd_QEP2LC6_HeLa_50ng_251120_01-calib.mzML

and

> spectraData(filterPrecursorScan(filterDataOrigin(sp, unique(dataOrigin(sp))[2]), 2490))[, c("msLevel", "acquisitionNum", "precScanNum", "dataOrigin")]
msLevel acquisitionNum precScanNum dataOrigin
1 2485 NA 13707d04ce6b_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
2 2490 248513707d04ce6b_QEP2LC6_HeLa_50ng_251120_02-calib.mzML

Suggested workarounds

  • We make filterPrecursorScan() only work with a data from a single data origin ( 👎🏻 )
  • We run it on different files to (1) avoid the wrong results (recursive scan selection across files) even though an acquisition should be file specific (not ideal)
  • We add a dataOrigin argument (that can be omitted when there's only a single data origin) that needs to be same length at the acquisition number ( 👍🏻 )

@sgibb @jorainer - any comments/suggestions?

@lgatto lgatto added the bug Something isn't working label Apr 7, 2021
@jorainer
Copy link
Member

jorainer commented Jul 1, 2021

I think the cleanest solution would be to add an argument f = dataOrigin(object) to the filterPrecursorScan method. Internally, the method should then split the data by this f and call the internal function to filter on each subset. We already use the f = dataOrigin(object) for other function to either perform calculations in parallel or to apply it on a per-file basis.

If OK I would implement that - and ask you @lgatto to confirm the results to be correct then.

@jorainer jorainer self-assigned this Jul 1, 2021
jorainer added a commit that referenced this issue Jul 1, 2021
- Add parameter `f = dataOrigin(object)` to `filterPrecursorScan` to allow
  filtering within each sample/file. This fixes issue #194.
lgatto added a commit that referenced this issue Jul 2, 2021
fix: per-file filtering in filterPrecursorScan (issue #194)
@lgatto
Copy link
Member Author

lgatto commented Jul 2, 2021

Fixed by PR #206. Thanks @jorainer

@jorainer
Copy link
Member

jorainer commented Jul 2, 2021

Then I guess we can close the issue now.

@jorainer jorainer closed this as completed Jul 2, 2021
@lgatto lgatto reopened this Apr 7, 2023
@lgatto
Copy link
Member Author

lgatto commented Apr 7, 2023

This bug isn't fixed.

Using the default f, i.e. all files is OK

> library(rpx)
> PXD022816 <- PXDataset("PXD022816")
Loading PXD022816 from cache.
> (mzmls <- pxget(PXD022816, grep("mzML", pxfiles(PXD022816))[1:2]))
[1] "/home/lgatto/.cache/R/rpx/3e55cc6b3df7f9_QEP2LC6_HeLa_50ng_251120_01-calib.mzML"
[2] "/home/lgatto/.cache/R/rpx/3e55cc7a53b04f_QEP2LC6_HeLa_50ng_251120_02-calib.mzML"
> sp <- Spectra(mzmls)
> (sp2 <- filterPrecursorScan(sp, 2490)) 
MSn data (Spectra) with 4 spectra in a MsBackendMzR backend:
    msLevel     rtime scanIndex
  <integer> <numeric> <integer>
1         1   915.757      2486
2         2   916.293      2490
3         1   915.252      2485
4         2   915.876      2490
 ... 33 more variables/columns.

file(s):
3e55cc6b3df7f9_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
3e55cc7a53b04f_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
Processing:
 Filter: select parent/children scans for 2490 [Fri Apr  7 23:57:32 2023] 

But of I want to get the results for one file, the f argument get ignores:

> filterPrecursorScan(sp, 2490, f = "/home/lgatto/.cache/R/rpx/3e55cc6b3df7f9_QEP2LC6_HeLa_50ng_251120_01-calib.mzML")
MSn data (Spectra) with 4 spectra in a MsBackendMzR backend:
    msLevel     rtime scanIndex
  <integer> <numeric> <integer>
1         1   915.757      2486
2         2   916.293      2490
3         1   915.252      2485
4         2   915.876      2490
 ... 33 more variables/columns.

file(s):
3e55cc6b3df7f9_QEP2LC6_HeLa_50ng_251120_01-calib.mzML
3e55cc7a53b04f_QEP2LC6_HeLa_50ng_251120_02-calib.mzML
Processing:
 Filter: select parent/children scans for 2490 [Fri Apr  7 23:59:34 2023] 

Indeed, looking at this line, if does get ignored.

If I change it, passing f on that same line:

> filterPrecursorScan(sp, 2490, f = "/home/lgatto/.cache/R/rpx/3e55cc6b3df7f9_QEP2LC6_HeLa_50ng_251120_01-calib.mzML")
Error in i2index(i, length(x), rownames(x@spectraData)) : 
  if i is logical it has to match the length of the object (58907)
In addition: Warning message:
In x[i] <- value[[j]] :
  number of items to replace is not a multiple of replacement length

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants