fastp trims the qiaseq adapter before UMI extract so no BC pattern is found for UMI extraction #503

@chaochungkuo

Description of the bug

This command produces an empty FASTQ file:

umi_tools \
    extract \
    -I Healthy_E72a.umi_dedup.sorted.fastq.gz \
    -S Healthy_E72a.umi_extract.fastq.gz \
    --extract-method=regex --bc-pattern='.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)' \
    > Healthy_E72a.umi_extract.log

The files in this work folder are:

0 Mar 21 07:56 .command.begin
672 Mar 21 07:56 .command.err
672 Mar 21 07:56 .command.log
  0 Mar 21 07:56 .command.out
11K Mar 21 07:56 .command.run
764 Mar 21 07:56 .command.sh
264 Mar 21 07:56 .command.trace
  1 Mar 21 07:56 .exitcode
151 Mar 21 07:56 Healthy_E72a.umi_dedup.sorted.fastq.gz -> .../Healthy_E72a.umi_dedup.sorted.fastq.gz
 56 Mar 21 07:56 Healthy_E72a.umi_extract.fastq.gz
3.3K Mar 21 07:56 Healthy_E72a.umi_extract.log
 56 Mar 21 07:56 versions.yml

In short, the input of this command is a valid FASTQ file (Healthy_E72a.umi_dedup.sorted.fastq.gz, with 10675 reads), but the output FASTQ is empty. The reason is that none of those reads contains the adapter (AACTGTAGGCACCATCAAT), so the barcode pattern never matches.
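For anyone who wants to confirm this on their own data, here is a minimal sketch that counts how many reads in a gzipped FASTQ still carry the adapter. It is not part of the pipeline. The function names `has_adapter` and `count_adapter_reads` are made up for illustration, the check assumes plain 4-line FASTQ records, and it only tests for the adapter with up to 2 substitutions (like `{s<=2}`), not the full `--bc-pattern`.

```python
import gzip

ADAPTER = "AACTGTAGGCACCATCAAT"  # adapter taken from the --bc-pattern in this issue


def has_adapter(seq, adapter=ADAPTER, max_subs=2):
    """Return True if `adapter` occurs in `seq` with at most `max_subs` substitutions."""
    n = len(adapter)
    for start in range(len(seq) - n + 1):
        window = seq[start:start + n]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= max_subs:
            return True
    return False


def count_adapter_reads(fastq_gz_path):
    """Return (total reads, reads containing the adapter) for a gzipped FASTQ file."""
    total = 0
    with_adapter = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # sequence is the 2nd line of each 4-line record
                total += 1
                if has_adapter(line.strip()):
                    with_adapter += 1
    return total, with_adapter
```

Calling `count_adapter_reads("Healthy_E72a.umi_dedup.sorted.fastq.gz")` should return a second value of 0 if the adapter really is gone from every read, which would be consistent with the empty output.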

In my tracing of the workflow, smrnaseq executes the tasks in the following sequence:

  1. fastp for trimming the adapter
  2. umicollapse for deduplication
  3. umi_tools extract

I realized that fastp has already trimmed the adapter before the UMI is extracted, which is why it breaks.
Here is the fastp command:

fastp \
    --in1 Healthy_E72a.fastq.gz \
    --out1 Healthy_E72a.fastp.fastq.gz \
    --thread 6 \
    --json Healthy_E72a.fastp.json \
    --html Healthy_E72a.fastp.html \
     \
     \
    -l 17 --max_len1 100 --adapter_sequence AACTGTAGGCACCATCAAT \
    2> >(tee Healthy_E72a.fastp.log >&2)

The output file (Healthy_E72a.fastp.fastq.gz) no longer contains the adapter. This results in the empty file after umi_tools extract. Shouldn't the adapter be trimmed after the UMI has been processed? This is what I see in my data; I haven't checked the source code yet. Please comment and advise.
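To illustrate why the order matters, here is a toy sketch in Python. It is an assumption-laden simplification, not the pipeline's code: `trim_adapter` mimics 3' adapter trimming by cutting the read at the adapter and dropping everything downstream, the pattern uses an exact adapter match because the `{s<=2}` fuzzy syntax is not supported by the standard `re` module, and the insert, UMI, and tail sequences are made up.

```python
import re

ADAPTER = "AACTGTAGGCACCATCAAT"

# Exact-match simplification of the --bc-pattern from this issue.
BC_PATTERN = re.compile(
    r".+(?P<discard_1>" + ADAPTER + r")(?P<umi_1>.{12})(?P<discard_2>.*)"
)


def trim_adapter(read):
    """Mimic 3' adapter trimming: keep only the bases upstream of the adapter."""
    pos = read.find(ADAPTER)
    return read if pos == -1 else read[:pos]


def extract_umi(read):
    """Return (insert, umi) if the pattern matches, else None (no UMI found)."""
    match = BC_PATTERN.match(read)
    if match is None:
        return None
    return read[: match.start("discard_1")], match.group("umi_1")


insert = "TAGCTTATCAGACTGATGTTGA"  # made-up small RNA insert
umi = "ACGTACGTACGT"                # made-up 12-nt UMI
read = insert + ADAPTER + umi + "AGATCGGAAG"  # made-up tail after the UMI

trimmed_first = extract_umi(trim_adapter(read))  # trim, then extract
extracted_first = extract_umi(read)              # extract on the untrimmed read

print(trimmed_first)    # None
print(extracted_first)  # ('TAGCTTATCAGACTGATGTTGA', 'ACGTACGTACGT')
```

In this toy case, trimming first leaves nothing for the pattern to anchor on, while extracting on the untrimmed read recovers both the insert and the UMI, since the pattern itself discards the adapter and the tail.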

Command used and terminal output

nextflow run nf-core/smrnaseq -r 2.4.0 -profile docker,qiaseq -resume \
     --input samplesheet_220308.csv \
     --outdir results_220308 \
     --mirtrace_species "hsa" \
     --skip_mirdeep \
     --umitools_extract_method regex \
     --umitools_bc_pattern '.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})(?P<discard_2>.*)' \
     --save_umi_intermeds \
     --with_umi

Relevant files

No response

System information

nf-core/smrnaseq: v2.4.0-g72c4c4c
Nextflow: 24.10.5

Labels: bug
