OpenMS parsing #187

Open
enryH opened this issue Jan 16, 2025 · 0 comments

enryH commented Jan 16, 2025

First of all, thanks for the great tool. I use it within the nextflow pipeline bigbio/quantms and am trying to analyze the ProteoBench DDA ion quant module. For that, I subselected the relevant rows from the SDRF file provided alongside PXD028735. The file is the following.

dda_lfq_proteobench_SDRF.sdrf.txt

If I parse this for OpenMS as described in the latest nfcore workflow step:

https://github.com/bigbio/quantms/blob/655b159251206d862039d8e3990558607355c62b/modules/local/sdrfparsing/main.nf

then, translated to this example, the command would be

parse_sdrf convert-openms -t2 -l  --extension_convert raw:mzML -s dda_lfq_proteobench_SDRF.sdrf.txt

However, for OpenMS ProteomicsLFQ this produces an invalid experimental design file. The duplicated entries for the concentrations of the sample mixture are not collapsed, which leads to an error saying ((Fraction Group, Fraction, Label) combination can only appear once); these combinations are indeed not unique:

Fraction_Group	Fraction	Spectra_Filepath	Label	Sample
1	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML	1	1
1	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML	1	1
1	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML	1	1
2	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML	1	2
2	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML	1	2
2	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML	1	2
3	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML	1	3
3	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML	1	3
3	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML	1	3
4	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML	1	4
4	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML	1	4
4	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML	1	4
5	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML	1	5
5	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML	1	5
5	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML	1	5
6	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML	1	6
6	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML	1	6
6	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML	1	6

Sample	MSstats_Condition	MSstats_BioReplicate
1	cond_A	1
2	cond_A	2
3	cond_A	3
4	cond_B	4
5	cond_B	5
6	cond_B	6

instead of

Fraction_Group	Fraction	Spectra_Filepath	Label	Sample
1	3	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML	1	1
2	3	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML	1	2
3	3	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML	1	3
4	3	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML	1	4
5	3	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML	1	5
6	3	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML	1	6

Sample	MSstats_Condition	MSstats_BioReplicate
1	cond_A	1
2	cond_A	2
3	cond_A	3
4	cond_B	4
5	cond_B	5
6	cond_B	6
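
The uniqueness rule quoted in the error message can be checked before running ProteomicsLFQ. The following is only a rough sketch: it assumes the design file is tab-separated and that the file table and the sample table are separated by one blank line, as in the listings above. The function name `duplicated_run_keys` and the shortened example text are made up for illustration.

```python
import csv
import io
from collections import Counter


def duplicated_run_keys(design_text):
    """Return (Fraction_Group, Fraction, Label) keys that occur more than once."""
    # Assumption: the file table is the first block, followed by a blank line
    # and then the sample table.
    file_table = design_text.strip().split("\n\n")[0]
    reader = csv.DictReader(io.StringIO(file_table), delimiter="\t")
    counts = Counter(
        (row["Fraction_Group"], row["Fraction"], row["Label"]) for row in reader
    )
    return {key: n for key, n in counts.items() if n > 1}


# Shortened excerpt in the same layout as the invalid design shown above.
example = (
    "Fraction_Group\tFraction\tSpectra_Filepath\tLabel\tSample\n"
    "1\t1\tLFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML\t1\t1\n"
    "1\t1\tLFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML\t1\t1\n"
    "2\t1\tLFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML\t1\t2\n"
    "\n"
    "Sample\tMSstats_Condition\tMSstats_BioReplicate\n"
    "1\tcond_A\t1\n"
    "2\tcond_A\t2\n"
)

print(duplicated_run_keys(example))
```

An empty result means every combination appears once; any key that is returned corresponds to rows the error complains about.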

I can fix it manually by removing the duplicated raw file entries from the SDRF file, but I wonder whether this should be handled better?
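
For reference, the manual workaround could be scripted roughly like this. It is a sketch, not a proposal for how the tool should behave: it assumes the raw file name sits in the SDRF column `comment[data file]`, keeps only the first row per file, and therefore discards whatever differs between the duplicated rows. The function name `drop_duplicate_data_files` and the three example rows are invented for illustration and are not taken from the attached file.

```python
import csv
import io


def drop_duplicate_data_files(sdrf_text, file_column="comment[data file]"):
    """Keep only the first SDRF row for each data file (tab-separated input)."""
    # csv.reader instead of DictReader: SDRF files may repeat column names.
    rows = list(csv.reader(io.StringIO(sdrf_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    idx = header.index(file_column)
    seen = set()
    kept = [header]
    for row in body:
        if not row:  # skip empty lines
            continue
        if row[idx] not in seen:
            seen.add(row[idx])
            kept.append(row)
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(kept)
    return out.getvalue()


# Made-up rows: two entries point at the same raw file.
sdrf = (
    "source name\tcharacteristics[organism]\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun_01.raw\n"
    "sample 1\tSaccharomyces cerevisiae\trun_01.raw\n"
    "sample 2\tHomo sapiens\trun_02.raw\n"
)

print(drop_duplicate_data_files(sdrf))
```

Because only the first row per file survives, this is suitable just for generating the OpenMS design file and not for preserving the sample composition recorded in the SDRF.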
