Add tximport for RSEM outputs #1320

Open
olgabot opened this issue Jun 18, 2024 · 0 comments
olgabot commented Jun 18, 2024

Description of feature

Hello,
Hope you are doing well! I am using the output of this pipeline, run with --aligner star_rsem, as input to nf-core/differentialabundance. I prefer the STAR + RSEM route because I saw odd results in a dataset where some samples were treated with plasmids and others weren't: with kallisto/salmon, reads from the samples without plasmids were assigned to the plasmid 😱 I didn't have this issue with RSEM.

Anyway, I'm trying to be a good bioinformatician and use the transcript lengths from #1123. However, I don't see the *.gene_lengths.tsv file that differentialabundance needs for --transcript_length_matrix among the output files in the star_rsem folder:

From the nf-core/rnaseq documentation:

STAR via RSEM

  • Output files
    • star_rsem/
      • rsem.merged.gene_counts.tsv: Matrix of gene-level raw counts across all samples.
      • rsem.merged.gene_tpm.tsv: Matrix of gene-level TPM values across all samples.
      • rsem.merged.transcript_counts.tsv: Matrix of isoform-level raw counts across all samples.
      • rsem.merged.transcript_tpm.tsv: Matrix of isoform-level TPM values across all samples.
      • *.genes.results: RSEM gene-level quantification results for each sample.
      • *.isoforms.results: RSEM isoform-level quantification results for each sample.
      • *.STAR.genome.bam: If --save_align_intermeds is specified, the original BAM file containing read alignments to the reference genome will be placed in this directory.
      • *.transcript.bam: If --save_align_intermeds is specified, the original BAM file containing read alignments to the transcriptome will be placed in this directory.
    • star_rsem/<SAMPLE>.stat/
      • *.cnt, *.model, *.theta: RSEM counts and statistics for each sample.
    • star_rsem/log/
      • *.log: STAR alignment report containing the mapping results summary.

RSEM is a software package for estimating gene and isoform expression levels from RNA-seq data. It has been widely touted as one of the most accurate quantification tools for RNA-seq analysis. RSEM wraps other popular tools to map the reads to the genome (e.g. STAR, Bowtie2, HISAT2; STAR is used in this pipeline), and the alignments are then filtered relative to a transcriptome before quantification at the gene and isoform level. Other advantages of RSEM are that it performs both alignment and quantification in a single package and that it makes effective use of ambiguously-mapping reads.

You can choose to align and quantify your data with RSEM by providing the --aligner star_rsem parameter.
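To make the request concrete, the idea behind tximport-style gene lengths can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline's or tximport's code: it assumes RSEM's tab-separated *.isoforms.results layout with gene_id, TPM and effective_length columns, and collapses isoform effective lengths into one TPM-weighted average per gene, which is roughly how tximport summarizes transcript lengths to genes. The function names and the zero-TPM fallback are hypothetical choices.

```python
import csv
from collections import defaultdict


def read_isoform_rows(path):
    """Yield (gene_id, TPM, effective_length) from one RSEM *.isoforms.results file.

    Assumes the standard tab-separated RSEM header with gene_id, TPM and
    effective_length columns.
    """
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            yield row["gene_id"], float(row["TPM"]), float(row["effective_length"])


def gene_lengths_from_isoforms(rows):
    """Collapse isoform effective lengths to one length per gene.

    Each gene's length is the TPM-weighted mean of its isoforms' effective
    lengths. Genes whose isoforms all have TPM == 0 fall back to the plain
    mean of isoform lengths (a simplifying assumption for this sketch).
    """
    weighted = defaultdict(float)   # sum of TPM * effective_length per gene
    tpm_total = defaultdict(float)  # sum of TPM per gene
    lengths = defaultdict(list)     # raw isoform lengths, for the fallback
    for gene_id, tpm, eff_len in rows:
        weighted[gene_id] += tpm * eff_len
        tpm_total[gene_id] += tpm
        lengths[gene_id].append(eff_len)
    return {
        gene_id: (weighted[gene_id] / tpm_total[gene_id]
                  if tpm_total[gene_id] > 0 else sum(lens) / len(lens))
        for gene_id, lens in lengths.items()
    }


if __name__ == "__main__":
    # Toy rows, not real data: geneA is expressed, geneB is not.
    demo = [
        ("geneA", 30.0, 1000.0),
        ("geneA", 10.0, 2000.0),
        ("geneB", 0.0, 500.0),
        ("geneB", 0.0, 700.0),
    ]
    print(gene_lengths_from_isoforms(demo))
    # → {'geneA': 1250.0, 'geneB': 600.0}
```

Running this per sample gives one length column per sample. My understanding is that RSEM's own effective_length column in *.genes.results is already an isoform-weighted average of this kind, so reading that column directly may be enough.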

I was able to work around this by creating my own gene lengths file (see the script in nf-core/differentialabundance#279 (comment)), but it would be great to have this built into nf-core/rnaseq itself, e.g. by running tximport on the RSEM outputs as the issue title suggests, so other RSEM users get it too.
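As a stopgap until something like this lands in the pipeline, a gene-by-sample length matrix can be assembled from the per-sample RSEM files. This is a minimal Python sketch, not the script linked above: it assumes each *.genes.results file has RSEM's standard gene_id and effective_length columns, and the output name rsem.merged.gene_lengths.tsv and its layout (a gene_id column followed by one column per sample) are hypothetical, chosen to resemble the other merged matrices. Check the result against what --transcript_length_matrix expects before relying on it.

```python
import csv
from pathlib import Path


def merge_gene_lengths(results_files, out_path):
    """Write a gene x sample matrix of RSEM gene-level effective lengths.

    results_files maps sample name -> path to that sample's *.genes.results
    file. The output layout (gene_id, then one column per sample) is an
    assumption, not a format defined by nf-core/rnaseq.
    """
    samples = list(results_files)
    matrix = {}  # gene_id -> {sample: effective_length as written by RSEM}
    for sample, path in results_files.items():
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                matrix.setdefault(row["gene_id"], {})[sample] = row["effective_length"]
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["gene_id", *samples])
        for gene_id in sorted(matrix):
            # Genes missing from a sample get "NA" rather than silently becoming 0.
            writer.writerow([gene_id, *(matrix[gene_id].get(s, "NA") for s in samples)])


if __name__ == "__main__":
    # Hypothetical location: the star_rsem/ folder from an nf-core/rnaseq run.
    rsem_dir = Path("star_rsem")
    if rsem_dir.is_dir():
        files = {p.name.removesuffix(".genes.results"): p
                 for p in sorted(rsem_dir.glob("*.genes.results"))}
        merge_gene_lengths(files, rsem_dir / "rsem.merged.gene_lengths.tsv")
```

Values are copied through as text so RSEM's own number formatting is preserved; genes are sorted by ID so the rows line up with other matrices sorted the same way.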

Thanks and hope you're having a great day!

Warmest,
Olga
