Description
Description of the bug
An empty FASTQ file is produced by this command:
umi_tools \
extract \
-I Healthy_E72a.umi_dedup.sorted.fastq.gz \
-S Healthy_E72a.umi_extract.fastq.gz \
--extract-method=regex --bc-pattern='.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)' \
> Healthy_E72a.umi_extract.log
The files in this work folder are:
0 Mar 21 07:56 .command.begin
672 Mar 21 07:56 .command.err
672 Mar 21 07:56 .command.log
0 Mar 21 07:56 .command.out
11K Mar 21 07:56 .command.run
764 Mar 21 07:56 .command.sh
264 Mar 21 07:56 .command.trace
1 Mar 21 07:56 .exitcode
151 Mar 21 07:56 Healthy_E72a.umi_dedup.sorted.fastq.gz -> .../Healthy_E72a.umi_dedup.sorted.fastq.gz
56 Mar 21 07:56 Healthy_E72a.umi_extract.fastq.gz
3.3K Mar 21 07:56 Healthy_E72a.umi_extract.log
56 Mar 21 07:56 versions.yml
In short, the input to this command is a valid FASTQ file (Healthy_E72a.umi_dedup.sorted.fastq.gz, with 10675 reads), but the output FASTQ is empty. The reason is that none of those reads contains the exact adapter (AACTGTAGGCACCATCAAT), so the pattern has nothing to match.
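A quick way to check this claim is to count the reads whose sequence line contains the adapter. The sketch below is only an illustration: the helper name `count_adapter_reads` is made up here, and it looks for exact matches only, so it ignores the two substitutions that `{s<=2}` would tolerate.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): count reads in a gzipped
# FASTQ whose sequence line contains the exact adapter string.
count_adapter_reads() {
    local fastq_gz="$1" adapter="$2"
    # In a 4-line FASTQ record the sequence is line 2, so test every line
    # where NR % 4 == 2; index() returns 0 when the adapter is absent.
    gzip -dc "$fastq_gz" |
        awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) { n++ } END { print n + 0 }'
}

# Example calls (file names taken from this report):
#   count_adapter_reads Healthy_E72a.fastq.gz AACTGTAGGCACCATCAAT
#   count_adapter_reads Healthy_E72a.umi_dedup.sorted.fastq.gz AACTGTAGGCACCATCAAT
```

Running it on both the raw input and the deduplicated file should show whether the adapter disappears between the two steps.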
As far as I can trace the workflow, smrnaseq executes the tasks in the following sequence:
- fastp for trimming the adapter
- umicollapse for deduplication
- umi_tools extract
I realized that fastp has already trimmed the adapter before the UMI is extracted, which is why it breaks.
When I check the fastp command:
fastp \
--in1 Healthy_E72a.fastq.gz \
--out1 Healthy_E72a.fastp.fastq.gz \
--thread 6 \
--json Healthy_E72a.fastp.json \
--html Healthy_E72a.fastp.html \
\
\
-l 17 --max_len1 100 --adapter_sequence AACTGTAGGCACCATCAAT \
2> >(tee Healthy_E72a.fastp.log >&2)
The output file (Healthy_E72a.fastp.fastq.gz) no longer contains the adapter, which results in the empty file after umi_tools extract. Shouldn't the adapter be trimmed after the UMI is processed? This is what I see in my data; I haven't checked the source code yet. Please comment and advise.
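To illustrate why a trimmed read can no longer satisfy the barcode pattern, here is a minimal sketch with two toy reads. Everything in it is an assumption for illustration: the reads are invented, the trimmed read assumes fastp removes the adapter together with everything 3' of it, and `grep -E` only does exact matching, so the `{s<=2}` fuzzy part and the named groups of the real pattern are left out.

```shell
#!/usr/bin/env bash
# Exact-match approximation of the bc-pattern; grep -E has no fuzzy {s<=2}.
adapter='AACTGTAGGCACCATCAAT'
pattern=".+${adapter}.{12}.*"

# Invented toy reads: insert + adapter + 12-nt UMI + tail, and the same
# read as it might look after 3' adapter trimming.
untrimmed="TGAGGTAGTAGGTTGTATAGTT${adapter}ACGTACGTACGTAAAA"
trimmed="TGAGGTAGTAGGTTGTATAGTT"

matches_pattern() {
    # Succeeds only if the whole read matches the pattern (-x).
    printf '%s\n' "$1" | grep -Eqx "$pattern"
}

for read in "$untrimmed" "$trimmed"; do
    if matches_pattern "$read"; then
        echo "match:    $read"
    else
        echo "no match: $read"
    fi
done
```

Only the untrimmed read matches. If every read reaching umi_tools extract looks like the trimmed one, none of them can match, which would be consistent with the empty output.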
Command used and terminal output
nextflow run nf-core/smrnaseq -r 2.4.0 -profile docker,qiaseq -resume \
--input samplesheet_220308.csv \
--outdir results_220308 \
--mirtrace_species "hsa" \
--skip_mirdeep \
--umitools_extract_method regex \
--umitools_bc_pattern '.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)' \
--save_umi_intermeds \
--with_umi
Relevant files
No response
System information
nf-core/smrnaseq: v2.4.0-g72c4c4c
Nextflow: 24.10.5