
OpenMS parsing #187

@enryH

First of all, thanks for the great tool. I use it within the nextflow pipeline bigbio/quantms and am trying to analyze the ProteoBench DDA ion quant module. For that, I subselected the relevant rows from the SDRF file provided alongside PXD028735. The file is the following.

dda_lfq_proteobench_SDRF.sdrf.txt

I parse this for OpenMS as described in the latest nf-core workflow step:

https://github.com/bigbio/quantms/blob/655b159251206d862039d8e3990558607355c62b/modules/local/sdrfparsing/main.nf

Translated to this example, the command is:

parse_sdrf convert-openms -t2 -l  --extension_convert raw:mzML -s dda_lfq_proteobench_SDRF.sdrf.txt

However, for OpenMS ProteomicsLFQ this produces an invalid experimental design file. The concentration duplicates for the mixture of samples are not collapsed, leading to an error saying "(Fraction Group, Fraction, Label) combination can only appear once", and the combinations are indeed not unique.

Fraction_Group	Fraction	Spectra_Filepath	Label	Sample
1	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML	1	1
1	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML	1	1
1	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML	1	1
2	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML	1	2
2	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML	1	2
2	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML	1	2
3	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML	1	3
3	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML	1	3
3	1	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML	1	3
4	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML	1	4
4	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML	1	4
4	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML	1	4
5	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML	1	5
5	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML	1	5
5	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML	1	5
6	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML	1	6
6	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML	1	6
6	1	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML	1	6

Sample	MSstats_Condition	MSstats_BioReplicate
1	cond_A	1
2	cond_A	2
3	cond_A	3
4	cond_B	4
5	cond_B	5
6	cond_B	6

instead of

Fraction_Group	Fraction	Spectra_Filepath	Label	Sample
1	3	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML	1	1
2	3	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML	1	2
3	3	LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML	1	3
4	3	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML	1	4
5	3	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML	1	5
6	3	LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML	1	6

Sample	MSstats_Condition	MSstats_BioReplicate
1	cond_A	1
2	cond_A	2
3	cond_A	3
4	cond_B	4
5	cond_B	5
6	cond_B	6
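The uniqueness problem can be checked before handing the file to ProteomicsLFQ. The following is a minimal sketch, not part of sdrf-pipelines: the helper name `duplicated_keys` is made up for illustration, and it assumes the design file is tab-separated with the file section first, separated from the sample section by one empty line, as in the output above.

```python
import csv
import io
from collections import Counter


def duplicated_keys(design_text):
    """Return (Fraction_Group, Fraction, Label) keys that occur more than once."""
    # Assumption: the first table (up to the first empty line) is the file section.
    file_section = design_text.strip().split("\n\n")[0]
    reader = csv.DictReader(io.StringIO(file_section), delimiter="\t")
    counts = Counter(
        (row["Fraction_Group"], row["Fraction"], row["Label"]) for row in reader
    )
    # Keep only the combinations that are not unique.
    return {key: n for key, n in counts.items() if n > 1}


# Shortened version of the invalid output above.
design_example = (
    "Fraction_Group\tFraction\tSpectra_Filepath\tLabel\tSample\n"
    "1\t1\tLFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML\t1\t1\n"
    "1\t1\tLFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML\t1\t1\n"
    "2\t1\tLFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML\t1\t2\n"
    "\n"
    "Sample\tMSstats_Condition\tMSstats_BioReplicate\n"
    "1\tcond_A\t1\n"
    "2\tcond_A\t2\n"
)

duplicates = duplicated_keys(design_example)
print(duplicates)
```

An empty result means every combination appears only once; any entry in the result is a combination that would trigger the error.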

I can fix it manually by removing the duplicated raw file entries from the SDRF file, but I wonder whether the converter should handle this case itself?
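For reference, the manual workaround can be sketched as follows. This is only an illustration under assumptions: `drop_duplicate_raw_files` is a hypothetical helper, the raw file name is assumed to be in the `comment[data file]` column, and the example rows are invented rather than taken from the attached SDRF. It keeps the first row per raw file, so anything that differs in the dropped rows is lost.

```python
import csv
import io


def drop_duplicate_raw_files(sdrf_text, file_column="comment[data file]"):
    """Keep only the first SDRF row for each raw file."""
    rows = list(csv.reader(io.StringIO(sdrf_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # SDRF headers can repeat, so locate the column by position, not by dict key.
    idx = [name.strip().lower() for name in header].index(file_column)
    seen = set()
    kept = [header]
    for row in body:
        if row[idx] not in seen:
            seen.add(row[idx])
            kept.append(row)
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(kept)
    return out.getvalue()


# Invented example rows: one raw file listed once per organism in the mixture.
sdrf_example = (
    "source name\tcharacteristics[organism]\tcomment[data file]\n"
    "Sample 1\tHomo sapiens\tfile_01.raw\n"
    "Sample 1\tSaccharomyces cerevisiae\tfile_01.raw\n"
    "Sample 2\tHomo sapiens\tfile_02.raw\n"
)

deduplicated = drop_duplicate_raw_files(sdrf_example)
print(deduplicated)
```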
