Unable to go from FASTQ to Salmon Quantification #1333

SuhasSrinivasan · 2024-07-02T07:24:25Z

Description of the bug

Dear Researchers and Developers,

Thank you for developing this pipeline.

I am trying to go from FASTQ files to Salmon pseudo-alignment and quantification, as per the flow chart (Phase 1 and 3 only): https://raw.githubusercontent.com/nf-core/rnaseq/3.14.0//docs/images/nf-core-rnaseq_metro_map_grey.png

Specifically, trying to achieve:

1. Infer strandedness
2. FastQC
3. FastP/TrimGalore
4. FastQC
5. SortMeRNA
6. Salmon (pseudo-alignment and quantification)
7. MultiQC on FastQC output

Issue 1:

Despite supplying a pre-built decoy-aware Salmon index for transcripts, both Genome fasta and GTF files are still needed.
It is not clear why this is needed.

Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file.
No GTF or GFF3 annotation specified! The pipeline requires at least one of these files.

Issue 2:

The fq subsample step is run; I am not sure whether this is necessary for Salmon to infer strandedness.

Issue 3:

At some point in the pipeline, there is a failure due to an RSEM error.
It is not clear why RSEM is being called for the reference genome, when it is not part of Phases 1 and 3.

process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA (rsem/GRCh38.primary_assembly.genome.fa) [  0%] 0 of 1

Issue 4:

The pipeline does not stop at Salmon quantification and tries to continue to unexpected next steps.

[78/cab4d8] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:SALMON_QUANT (ERR2179089)                             [100%] 1 of 1 ✔
[78/c6d1b3] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TX2GENE (gencode.v46.primary_assembly.annotation.gtf) [100%] 1 of 1 ✔
[8d/caed0e] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:TXIMPORT                                              [100%] 1 of 1, failed: 1 ✘

It would be very helpful to know which switches need to be toggled to execute only Steps 1–7.
Thank you for your consideration.

1. Infer strandedness
2. FastQC
3. FastP/TrimGalore
4. FastQC
5. SortMeRNA
6. Salmon (pseudo-alignment and quantification)
7. MultiQC on FastQC output

Command used and terminal output

nextflow run nf-core/rnaseq \
    --input samplesheet.csv \
    --outdir ~/bioinformatics/output/salmon/ \
    --fasta ~/bioinformatics/references/salmon_hs/GRCh38.primary_assembly.genome.fa.gz \
    --gtf ~/bioinformatics/references/salmon_hs/gencode.v46.primary_assembly.annotation.gtf.gz \
    --gencode \
    --trimmer fastp \
    --salmon_index ~/bioinformatics/references/salmon_hs/index/ \
    --pseudo_aligner salmon \
    --skip_gtf_filter \
    --skip_gtf_transc \
    --skip_umi_extract \
    --skip_bbsplit \
    --skip_alignment \
    --skip_markduplic \
    --skip_bigwig \
    --skip_stringtie \
    --skip_preseq \
    --skip_dupradar \
    --skip_qualimap \
    --skip_rseqc \
    --skip_biotype_qc \
    --skip_deseq2_qc \
    --skip_multiqc \
    --max_memory 100.GB \
    --max_cpus 24

System information

Nextflow: 24.04.2
Hardware: Desktop
Executor: local
Container: conda
OS: Ubuntu 22.04.4 LTS
nf-core/rnaseq v3.14.0-gb89fac3


pinin4fjords · 2024-08-20T10:27:13Z

So, as a general point, you need to treat the flow chart as a qualitative guide to what's going on. The workflow doesn't provide you with absolute control over the modules that are run. For that you'll need to make your own workflow (which is definitely an option for you to get exactly what you want).

We have a related feature request to reduce the genome requirements in this context, but haven't got to it yet.

Further:

Issue 2: The subsampling step is necessary. We don't need to examine all reads to infer the strandedness, so we down-sample first.
Issue 3: This is just using a utility from the RSEM suite to generate a transcriptome. We may be able to remove that dependency if and when we tackle the issue above.
Issue 4: We use tximport to construct matrices from the output of Salmon, and we don't have any plans to remove that.
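
To illustrate why down-sampling is enough for strandedness inference, here is a minimal Python sketch. It is not the pipeline's actual code: the function names, the reservoir-sampling approach, and the 0.8 threshold are all assumptions chosen for illustration. The idea is that the fraction of reads compatible with each orientation can be estimated well from a small random subset, so there is no need to process every read.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, separator, qualities)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group FASTQ lines into 4-line records (an incomplete trailing record is dropped)."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times yields consecutive groups of four lines
    for record in zip(stripped, stripped, stripped, stripped):
        yield record


def subsample(records: Iterable[FastqRecord], n: int, seed: int = 1) -> List[FastqRecord]:
    """Reservoir sampling: keep at most n records, each chosen with equal probability."""
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(records):
        if i < n:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = record
    return reservoir


def infer_strandedness(forward: int, reverse: int, threshold: float = 0.8) -> str:
    """Classify a library from counts of forward- and reverse-compatible reads.

    The threshold is a hypothetical cut-off, not a value taken from the pipeline.
    """
    total = forward + reverse
    if total == 0:
        return "undetermined"
    if forward / total >= threshold:
        return "forward"
    if reverse / total >= threshold:
        return "reverse"
    return "unstranded"
```

In practice the forward and reverse counts would come from a quick Salmon run on the subsampled reads; here they are simply passed in as integers.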

To summarise:

Reducing dependencies when using pseudo-aligners is a valid point we will try to address as priorities allow.
But you don't have absolute control over the specific modules used. For that, I would encourage you to build your own workflow using the pre-built nf-core modules and subworkflows that are available.

I'm closing this as not being a bug, and we're already tracking the feature request elsewhere.

SuhasSrinivasan added the bug Something isn't working label Jul 2, 2024

pinin4fjords closed this as completed Aug 20, 2024
