two CV values? #57

Open
aulke opened this issue Feb 2, 2021 · 23 comments
Comments

@aulke

aulke commented Feb 2, 2021

Hi Chris,
We are running our MS with FAIMS and multiple CV values and would like to analyse the data with MaxQuant. However, MaxQuant currently can't analyse these files.
Kevin had written a "RawSplit" tool for us to be able to analyse .raw files with MaxQuant. The tool split the .raw file into two files, depending on CV used, similar to the tool from the Coon lab (FAIMS-MzXML-Generator).
Do you have a RawSplit like functionality in RawTools? We rely heavily on RawTools and MaxQuant for our quality control pipeline, but it is quite limiting not being able to analyse multiple CV values.
We are having issues with the tool from the Coon lab not working.

Thanks for your help
Anne

@kevinkovalchik
Owner

kevinkovalchik commented Feb 2, 2021 via email

@chrishuges
Collaborator

Hi Anne,

I have actually been thinking about this quite a bit as we are now using FAIMS files quite often and I know RawTools has some issues with multi-CV raw files. I have been thinking about how best to implement this. The biggest problem I think RawTools has with it is with building precursor ion profiles because MS1 scans are no longer sequential.

I would echo Kevin's questions about how you envision using RawTools in your pipeline. What functionality is RawSplit not giving you?

Regards,
Chris
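
A minimal sketch of the MS1 ordering problem described above, not RawTools code. It assumes the Thermo scan filter string carries the compensation voltage as a `cv=-45.00` token; the `Scan` class and `assign_parent_ms1` function are hypothetical names used only for illustration. Each dependent scan is linked to the most recent MS1 scan acquired at the same CV, rather than to the MS1 scan that immediately precedes it in the file.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed filter-string shape, e.g.
# "FTMS + p NSI cv=-45.00 Full ms [350.0000-1500.0000]"
CV_PATTERN = re.compile(r"\bcv=(-?\d+(?:\.\d+)?)")


@dataclass
class Scan:
    """Hypothetical stand-in for one scan header."""
    number: int
    ms_level: int
    filter_string: str


def compensation_voltage(filter_string: str) -> Optional[float]:
    """Return the CV found in the filter string, or None if there is none."""
    match = CV_PATTERN.search(filter_string)
    return float(match.group(1)) if match else None


def assign_parent_ms1(scans: List[Scan]) -> Dict[int, Optional[int]]:
    """Map each MSn scan number to the latest MS1 scan with the same CV."""
    last_ms1_by_cv: Dict[Optional[float], int] = {}
    parents: Dict[int, Optional[int]] = {}
    for scan in sorted(scans, key=lambda s: s.number):
        cv = compensation_voltage(scan.filter_string)
        if scan.ms_level == 1:
            # Remember the newest survey scan for this CV only.
            last_ms1_by_cv[cv] = scan.number
        else:
            # None means no MS1 scan at this CV has been seen yet.
            parents[scan.number] = last_ms1_by_cv.get(cv)
    return parents
```

Keeping one "last MS1" entry per CV means a precursor profile for a given CV would only ever be built from survey scans acquired at that CV.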

@kevinkovalchik
Owner

kevinkovalchik commented Feb 3, 2021 via email

@aulke
Author

aulke commented Feb 3, 2021

Hi Kevin, Hi Chris,
you do recall correctly: RawSplit was for positive/negative small-molecule MS files, not for two CV values from an MS proteomics run. I clearly remembered this incorrectly, so RawSplit is not what we require.
For RawTools, we are using the parse outputs (ion injection time, intensity, etc.), not the QC options. In addition, we require a way to split the multiple-CV files into, e.g., two MGF files. We can feed those into MaxQuant for analysis.
Hope this answers your question.

@kevinkovalchik
Owner

Okay, so I think there are two things:

  1. Somehow split up the analysis in RawTools so it handles multiple CVs. Possibly we can filter the scans we extract by CV and just run the whole workflow for each CV, then combine the results at the end. It seems like this would take care of the gaps in the MS1 profiles. Not sure how easy it would be.
  2. Split a raw file into multiple MGF files by CV.

1 might be tricky. I'm really only working on RawTools in my free time right now, and there isn't much of it, so it would probably take me a while to get anything ready. I've been working on reporter ion impurity corrections for someone and it's taken me like 3 months to do anything at all... If Chris has time for it, though, we might go faster. Possibly the simplest thing would be to turn off the precursor peak profiling if that is all that is tripping it up. I don't recall if that is an option or not.

2 would be pretty simple, I think. We would just have to modify the RawSplit program to do it. Might be nice to put it into RawTools at the same time as it would be a useful feature.

Anne, could you upload the RawSplit program? I don't think I have the original code anymore, but I can get it out of the compiled files.
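
A rough sketch of option 2, splitting MS2 scans into one MGF per CV. This is not the RawSplit or RawTools implementation; `Ms2Scan`, `split_mgf_by_cv`, and the output naming are hypothetical, and the `cv=` token in the filter string is an assumption about how the CV is exposed. Only common MGF fields (`TITLE`, `PEPMASS`, `CHARGE`, `RTINSECONDS`) are written.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed token in the scan filter string, e.g. "... NSI cv=-45.00 d Full ms2 ..."
CV_PATTERN = re.compile(r"\bcv=(-?\d+(?:\.\d+)?)")


@dataclass
class Ms2Scan:
    """Hypothetical container for the fields an MGF entry needs."""
    number: int
    filter_string: str
    precursor_mz: float
    charge: int
    rt_seconds: float
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def cv_label(filter_string: str) -> str:
    """Return a file-name-friendly CV label such as 'cv-45', or 'noCV'."""
    match = CV_PATTERN.search(filter_string)
    return f"cv{float(match.group(1)):g}" if match else "noCV"


def to_mgf_entry(scan: Ms2Scan, title_prefix: str) -> str:
    """Format one scan as an MGF ion block."""
    lines = [
        "BEGIN IONS",
        f"TITLE={title_prefix}.{scan.number}",
        f"PEPMASS={scan.precursor_mz:.5f}",
        f"CHARGE={scan.charge}+",
        f"RTINSECONDS={scan.rt_seconds:.2f}",
    ]
    lines += [f"{mz:.5f} {intensity:.1f}" for mz, intensity in scan.peaks]
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


def split_mgf_by_cv(scans: List[Ms2Scan], run_name: str) -> Dict[str, str]:
    """Return {output file name: MGF text}, with one entry per CV value."""
    grouped: Dict[str, List[Ms2Scan]] = defaultdict(list)
    for scan in scans:
        grouped[cv_label(scan.filter_string)].append(scan)
    return {
        f"{run_name}_{label}.mgf": "\n".join(to_mgf_entry(s, run_name) for s in group)
        for label, group in grouped.items()
    }
```

The function returns text instead of writing files so the grouping can be checked before anything touches disk; writing each value to its key as a path would complete the split.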

@aulke
Author

aulke commented Feb 3, 2021

The second option sounds easier to me, but I am no programmer...
Here is RawSplit. It does not seem to work right now as it crashes upon opening, likely an easy fix.
RawSplit-new.zip

Thank you for your help!

@chrishuges
Collaborator

Yes, it seems that splitting the raw file into independent MGFs is best for now. In the future, though, it would be nice to have an option to split for the RawTools calculations and then either recombine into a single file or keep them separate.

But can you use MGF as input for MaxQuant? I don't think so...but I could be wrong. So are you wanting us to split by CV into new raw files (or mzML)?

@aulke
Author

aulke commented Feb 3, 2021

You are correct, MGF does not work with MaxQuant, I had that somewhere in the back of my head, too.
We usually use mzXML for small molecule files.
Let me check on my end what exactly we are doing with those.

@chrishuges
Collaborator

I guess I am just confused about exactly what your pipeline is here. RawTools + MaxQuant separately on the raw file? So, you would ideally want RawTools > split raw files by CV to mzML output > MaxQuant?

Perhaps this is a good time to code an mzML writer for RawTools

@kevinkovalchik
Owner

kevinkovalchik commented Feb 3, 2021 via email

@chrishuges
Collaborator

chrishuges commented Feb 3, 2021

The CompOmics group has one as well, rawfileparser, which seems to have a good writer.

To add - I can't find any indication that it works or doesn't work with FAIMS data though.

@aulke
Author

aulke commented Feb 3, 2021

> I guess I am just confused exactly what your pipeline is here. RawTools + MaxQuant separately on the raw file? So, you would ideally want RawTools > split raw files by CV to mzML output > MaxQuant?
>
> Perhaps this is a good time to code an mzML writer for RawTools

Yes, this is pretty much what we do. It is a quick and easy way to check the quality of our data. We have the advantage that the MS files we generate are pretty homogeneous, similar to your cancer TMT files.

@aulke
Author

aulke commented Feb 3, 2021

It looks like FAIMS data is still a unicorn, considering how difficult it is to process. I am nevertheless hoping that Mann / Cox will modify MaxQuant to handle multiple-CV files.

@chrishuges
Collaborator

I can work on this, but probably not until a bit later this month. The easiest thing is to just code something that spits out MGFs of the individual CVs, like Kevin mentioned above. But we should work towards having an mzML output and having RawTools as a whole work with these data. I have a lot of data I can use for testing.

@kevinkovalchik
Owner

kevinkovalchik commented Feb 3, 2021 via email

@kevinkovalchik
Owner

I mean it converts RAW files into mzML files.... not MGF into mzML

@aulke
Author

aulke commented Feb 3, 2021

Thanks guys, looking forward to it.
I can be your beta-tester if required.

@aulke
Author

aulke commented Feb 4, 2021

We have a converter in the lab that converts .raw into mzXML, and it also works for multiple-CV files. We would still need to sort the mzXML into two CV files, though.

@sorenwacker

This is an interesting discussion. I wonder if RawTools could use mzML files as input to enable processing of data from vendors other than Thermo. I am preparing a manuscript about our pipeline (same lab as @aulke), and currently the pipeline is limited to Thermo Orbitrap because we use RawTools. If RawTools were able to process mzML files that come from other vendors, we could drop this restriction.

@kevinkovalchik
Owner

That is something I have thought about for a while. I agree it would be nice to allow for mzML files. In principle, writing something that loads mzML files into RawTools' internal data structure wouldn't be too difficult, but I'm sure we would discover things that don't translate well for one reason or another.
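
As a rough illustration of what "loading mzML into an internal data structure" could involve, here is a sketch using only the Python standard library. `ScanRecord` and `read_scan_records` are hypothetical and do not reflect RawTools' actual internal types; the sketch reads only the MS level (`MS:1000511`) and scan start time (`MS:1000016`) from each spectrum and skips the binary peak arrays entirely.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterator, Optional

# mzML elements live in this XML namespace.
NS = "{http://psi.hupo.org/ms/mzml}"


@dataclass
class ScanRecord:
    """Hypothetical vendor-neutral scan header."""
    native_id: str
    ms_level: Optional[int]
    start_time: Optional[float]
    start_time_unit: Optional[str]


def read_scan_records(source) -> Iterator[ScanRecord]:
    """Yield one ScanRecord per <spectrum>; source is a path or binary file object."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != NS + "spectrum":
            continue
        ms_level = None
        start_time = None
        unit = None
        # cvParam elements may sit on the spectrum itself or on nested <scan> elements.
        for param in elem.iter(NS + "cvParam"):
            accession = param.get("accession")
            if accession == "MS:1000511":  # ms level
                ms_level = int(param.get("value"))
            elif accession == "MS:1000016":  # scan start time
                start_time = float(param.get("value"))
                unit = param.get("unitName")
        yield ScanRecord(elem.get("id", ""), ms_level, start_time, unit)
        elem.clear()  # release the parsed spectrum to keep memory use flat
```

Anything that has no direct controlled-vocabulary equivalent would need its own handling, which is likely where the "things that don't translate well" would show up.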

I have very limited time to work on RawTools at the moment, but it's possible I could get permission at work to dedicate a little time to it if you can include some extra names and institutions on the author list. That would also give me time to work on some other issues that have been raised over the past year that I was never able to get to. What is your timeline for publication?

@kevinkovalchik
Owner

Ah, well, never mind about that. :) But still, what is your timeline? I will see if I can do some work on this in my free time, but it's a toss-up if I will get it done in time for you.

@chrishuges
Collaborator

I don't know if it is worth making something that reads the mzML format versus reading the actual raw files themselves. It should be possible for vendors like Bruker and Sciex. I guess one advantage of having it read mzML would be that you could maintain complete functionality on Linux systems, which I am not sure would be possible if you worked directly with other vendor formats. But I also think adding this functionality is a somewhat large undertaking. I will need to think about it more.

@sorenwacker

I think the advantage would be to have a common entry point for all the formats. However, I agree that reading the RAW files from all the different vendors directly would be even better. I thought that right now only Thermo RAW files are compatible.


4 participants