Description
First of all, thanks for the great tool. I use it within the Nextflow pipeline bigbio/quantms
to analyze the ProteoBench DDA ion quant module. To do so, I selected the relevant rows from the SDRF file provided alongside PXD028735. The file is the following:
dda_lfq_proteobench_SDRF.sdrf.txt
I parse this for OpenMS as described in the latest nf-core workflow step, which for this example translates to:
parse_sdrf convert-openms -t2 -l --extension_convert raw:mzML -s dda_lfq_proteobench_SDRF.sdrf.txt
However, for OpenMS ProteomicsLFQ this produces an invalid experimental design file. The duplicate rows for the sample mixture (one per concentration) are not collapsed, which leads to the error "(Fraction Group, Fraction, Label) combination can only appear once". These combinations are indeed not unique:
Fraction_Group Fraction Spectra_Filepath Label Sample
1 1 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML 1 1
1 1 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML 1 1
1 1 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML 1 1
2 1 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML 1 2
2 1 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML 1 2
2 1 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML 1 2
3 1 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML 1 3
3 1 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML 1 3
3 1 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML 1 3
4 1 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML 1 4
4 1 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML 1 4
4 1 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML 1 4
5 1 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML 1 5
5 1 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML 1 5
5 1 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML 1 5
6 1 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML 1 6
6 1 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML 1 6
6 1 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML 1 6
Sample MSstats_Condition MSstats_BioReplicate
1 cond_A 1
2 cond_A 2
3 cond_A 3
4 cond_B 4
5 cond_B 5
6 cond_B 6
instead of the expected design:
Fraction_Group Fraction Spectra_Filepath Label Sample
1 3 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_01.mzML 1 1
2 3 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_02.mzML 1 2
3 3 LFQ_Orbitrap_DDA_Condition_A_Sample_Alpha_03.mzML 1 3
4 3 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_01.mzML 1 4
5 3 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_02.mzML 1 5
6 3 LFQ_Orbitrap_DDA_Condition_B_Sample_Alpha_03.mzML 1 6
Sample MSstats_Condition MSstats_BioReplicate
1 cond_A 1
2 cond_A 2
3 cond_A 3
4 cond_B 4
5 cond_B 5
6 cond_B 6
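The uniqueness constraint from the error message can be checked before running ProteomicsLFQ. Below is a minimal sketch, assuming a tab-separated design file whose file section ends at the first blank line (as in the layout above). `duplicate_fraction_keys` is a hypothetical helper, not part of sdrf-pipelines or OpenMS:

```python
import csv
import io
from collections import Counter


def duplicate_fraction_keys(design_text: str) -> dict:
    """Return (Fraction_Group, Fraction, Label) keys that occur more than once.

    Hypothetical helper; assumes a tab-separated OpenMS-style design where the
    file section is the first block, terminated by the first blank line.
    """
    file_section = design_text.strip().split("\n\n", 1)[0]
    reader = csv.DictReader(io.StringIO(file_section), delimiter="\t")
    counts = Counter(
        (row["Fraction_Group"], row["Fraction"], row["Label"]) for row in reader
    )
    # Only keys seen more than once violate the "can only appear once" rule.
    return {key: n for key, n in counts.items() if n > 1}
```

For the first design above, every key such as `("1", "1", "1")` would be reported with a count of 3; for the expected design, the result would be empty.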
I can fix it manually by removing the duplicated raw file entries from the SDRF file, but I wonder whether the converter should handle this case itself.
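The manual workaround can be scripted. Below is a minimal sketch, assuming a tab-separated SDRF that uses the standard `comment[data file]` column for raw file names. `dedupe_by_data_file` is a hypothetical helper, not an sdrf-pipelines feature. It keeps only the first row per raw file, so any organism or concentration annotations on the dropped rows are lost:

```python
import csv
import io


def dedupe_by_data_file(sdrf_text: str, column: str = "comment[data file]") -> str:
    """Keep only the first SDRF row for each raw file (hypothetical helper)."""
    # csv.reader instead of DictReader: SDRF may repeat column names
    # (e.g. several modification parameter columns), which a dict would collapse.
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    idx = header.index(column)
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        if not row:  # skip blank lines
            continue
        if row[idx] in seen:  # a later row for an already-seen raw file
            continue
        seen.add(row[idx])
        writer.writerow(row)
    return out.getvalue()
```

The deduplicated text could then be written to a new file and passed to `parse_sdrf convert-openms` with the same options as above.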