RNAseq processing with Salmon, extraction of unmapped reads and subsequent de novo transcriptome synthesis

This collection of scripts takes raw RNAseq FASTQ files, maps them to the transcriptome with Salmon, extracts the read pairs that did not map, then preprocesses these unmapped reads and assembles them de novo with Trinity.

Steps of the workflow are as follows:

Preprocess raw reads

Read quality is assessed before and after trimming with the script salmon_run.sh.

Initial quality control

FastQC and MultiQC are used to assess the quality of the raw reads.
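FastQC's per-base quality report is built from the Phred scores encoded in each read's quality line. The sketch below only illustrates that idea and is not FastQC's implementation (FastQC also reports quartiles and groups long reads into position bins): it assumes Phred+33 encoding and computes the mean quality at each read position. The function names are hypothetical.

```python
from typing import Iterable, List


def phred33_scores(quality_line: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def per_base_mean_quality(fastq_lines: Iterable[str]) -> List[float]:
    """Mean Phred score at each read position across all records."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    totals: List[int] = []
    counts: List[int] = []
    # A FASTQ record is 4 lines: @header, sequence, '+', quality.
    # Every 4th line starting at index 3 is therefore a quality line.
    for i in range(3, len(lines), 4):
        for pos, q in enumerate(phred33_scores(lines[i])):
            if pos == len(totals):  # first read this long: grow the tables
                totals.append(0)
                counts.append(0)
            totals[pos] += q
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


reads = ["@r1", "ACG", "+", "II#", "@r2", "ACG", "+", "I5#"]
print(per_base_mean_quality(reads))  # [40.0, 30.0, 2.0]
```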

Trimming adapters and low quality bases

Trim Galore is used for adapter and quality trimming.
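Trim Galore delegates quality trimming to Cutadapt, which uses the BWA-style algorithm: subtract the cutoff from each base quality, sum these values from the 3' end, and cut where the running sum is largest. A minimal sketch of that rule, assuming Phred+33 qualities and a cutoff of 20 (Trim Galore's default); adapter removal is not covered and the function name is hypothetical:

```python
def bwa_trim_3prime(quality: str, cutoff: int = 20, base: int = 33) -> int:
    """Return how many bases to keep after BWA-style 3' quality trimming."""
    running = 0
    best = 0
    keep = len(quality)
    # Walk from the 3' end, accumulating (cutoff - quality).
    for i in range(len(quality) - 1, -1, -1):
        running += cutoff - (ord(quality[i]) - base)
        if running < 0:  # good bases now outweigh the bad tail: stop
            break
        if running > best:  # largest accumulated "badness" so far: cut here
            best = running
            keep = i
    return keep


print(bwa_trim_3prime("IIIII##"))  # 5
```

For example, a read whose last two bases have quality 2 ('#') keeps only the five bases before them, while a uniformly high-quality read is left intact.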

QC trimmed reads

FastQC and MultiQC are used to assess read quality after trimming.

Quantify reads with Salmon

Generate Salmon index

Use index_salmon.sh to generate a Salmon index of the Ceratopteris richardii transcriptome.

Quantify reads and extract unmapped

Quantify the trimmed reads with Salmon using quantify_salmon.sh. After quantification, unmapped read pairs (pairs in which neither read mapped) are extracted and written to 4_unmapped.
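When run with --writeUnmappedNames, Salmon reports unmapped reads by name only, in aux_info/unmapped_names.txt, where (for paired-end data) the flag u marks pairs in which neither mate mapped. The sketch below, which may differ from what quantify_salmon.sh actually does, collects those names and pulls the matching records out of a FASTQ file; the helper names and the /1, /2 suffix handling are assumptions.

```python
from typing import Iterable, List, Set


def normalise_read_name(read_id: str) -> str:
    """Strip a leading '@', any header comment and a /1 or /2 mate suffix."""
    name = read_id[1:] if read_id.startswith("@") else read_id
    name = name.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def fully_unmapped_names(unmapped_lines: Iterable[str]) -> Set[str]:
    """Collect read names whose Salmon flag is 'u' (neither mate mapped)."""
    names: Set[str] = set()
    for line in unmapped_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "u":
            names.add(normalise_read_name(fields[0]))
    return names


def filter_fastq(fastq_lines: Iterable[str], keep: Set[str]) -> List[str]:
    """Return the 4-line FASTQ records whose read name is in `keep`."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    kept: List[str] = []
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        if normalise_read_name(record[0]) in keep:
            kept.extend(record)
    return kept
```

Apply filter_fastq to the R1 and R2 files with the same name set so that the extracted mates stay paired and in the same order.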

Preprocess unmapped reads

All preprocessing steps are in preprocess_unmapped.sh. The code roughly follows the pipeline at https://github.com/matevzl533/Noccaea_praecox_transcriptome/tree/main.

Initial quality control

FastQC and MultiQC are used to assess the quality of the "raw" unmapped reads.

Removing erroneous k-mers from Illumina paired-end reads

Rcorrector, a k-mer-based read error correction tool designed specifically for RNA-seq data, corrects reads and tags each read in its FASTQ output as corrected or uncorrectable.
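The intuition behind k-mer-based correction is that k-mers overlapping a sequencing error are rare compared with k-mers from the true transcript. The toy sketch below only illustrates that intuition: it uses a fixed count threshold, ignores reverse complements and corrects nothing, whereas Rcorrector judges k-mer counts relative to their local context, which matters for RNA-seq where coverage varies with expression. Function names are hypothetical.

```python
from collections import Counter
from typing import List


def count_kmers(reads: List[str], k: int) -> Counter:
    """Count every k-mer (forward strand only) across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read: str, counts: Counter, k: int, min_count: int) -> List[int]:
    """Start positions of k-mers in `read` seen fewer than `min_count` times."""
    return [
        i for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < min_count
    ]


# Five copies of a "true" read plus one copy with an error at position 4.
reads = ["ACGTAC"] * 5 + ["ACGTTC"]
counts = count_kmers(reads, 3)
print(weak_kmer_starts("ACGTTC", counts, k=3, min_count=2))  # [2, 3]
```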

Discard read pairs in which one or both reads are deemed unfixable

This step uses a Python script from the Harvard Informatics GitHub repository TranscriptomeAssemblyTools, updated here to Python 3 (FilterUncorrectabledPEfastq_P3.py).
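The filtering logic is simple enough to sketch. Assuming Rcorrector tags corrected reads with cor and reads it could not fix with unfixable_error in the FASTQ header, and that the R1 and R2 files list mates in the same order, a pair is dropped if either header carries the unfixable tag, and the cor tag is removed from the headers that are kept. This is a simplified illustration with hypothetical function names, not the TranscriptomeAssemblyTools script itself.

```python
from typing import Iterable, Iterator, List, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines (header, sequence, '+', quality)."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        yield stripped[i:i + 4]


def clean_header(header: str) -> str:
    """Drop the 'cor' tag that marks a corrected read."""
    return " ".join(token for token in header.split() if token != "cor")


def filter_unfixable_pairs(
    r1_lines: Iterable[str], r2_lines: Iterable[str]
) -> Tuple[List[str], List[str], int]:
    """Keep only pairs in which neither mate is tagged as unfixable."""
    kept_r1: List[str] = []
    kept_r2: List[str] = []
    dropped = 0
    # Assumes mate i in R1 corresponds to mate i in R2.
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        if "unfixable" in rec1[0] or "unfixable" in rec2[0]:
            dropped += 1
            continue
        kept_r1.extend([clean_header(rec1[0])] + rec1[1:])
        kept_r2.extend([clean_header(rec2[0])] + rec2[1:])
    return kept_r1, kept_r2, dropped
```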

Remove unwanted rRNA reads with Bowtie2

The SSUParc and LSUParc FASTA files were downloaded from SILVA (https://ftp.arb-silva.de/?pk_vid=8352a8ccf0ead1d7168388545541b6c1). Before running bowtie2-build, the two files were concatenated and U was translated to T, because SILVA stores rRNA sequences with U whereas the sequencing reads use T.

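cat *.fasta > SILVA.db
# translate U to T on sequence lines only; header lines (starting with >) are printed unchanged
# redirecting straight back to SILVA.db would truncate it before awk reads it, so go through a temporary file
awk '/^[^>]/ { gsub(/U/,"T"); print; next }1' SILVA.db > SILVA.tmp && mv SILVA.tmp SILVA.db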
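Bowtie2 can write pairs that failed to align to the rRNA index directly to FASTQ (its --un-conc family of options keeps pairs that did not align concordantly). If you prefer the stricter rule of keeping a pair only when neither mate aligned, the SAM FLAG bits make that explicit: 0x4 means the read is unmapped and 0x8 means its mate is unmapped. A sketch, with a hypothetical function name:

```python
from typing import Iterable, Set

UNMAPPED = 0x4       # this segment is unmapped
MATE_UNMAPPED = 0x8  # the next segment in the template (the mate) is unmapped


def pairs_without_rrna_hits(sam_lines: Iterable[str]) -> Set[str]:
    """Names of read pairs in which neither mate aligned to the rRNA index."""
    keep: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED and flag & MATE_UNMAPPED:
            keep.add(name)
    return keep
```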

Run FastQC on processed reads

Re-run the initial quality control (FastQC and MultiQC) on the processed reads.

De novo assembly with Trinity

Make sample table text file

Rather than looping over read files, Trinity accepts a tab-delimited sample table via --samples_file. Run make_sample_table.py and provide the directory containing your clean reads.
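For paired-end data, Trinity's samples file has four tab-separated columns: condition, replicate name, left reads and right reads. The sketch below shows one way such a table could be produced; it is not make_sample_table.py, it assumes a hypothetical <sample>_R1<suffix> / <sample>_R2<suffix> naming scheme, and it uses the sample name for both the condition and replicate columns.

```python
import os
import re
from typing import Iterable, List

# Hypothetical naming scheme: <sample>_R1<suffix> pairs with <sample>_R2<suffix>
PAIR_RE = re.compile(r"^(?P<sample>.+)_R1(?P<suffix>.*\.f(?:ast)?q(?:\.gz)?)$")


def sample_table_rows(filenames: Iterable[str], read_dir: str) -> List[str]:
    """Build one tab-separated samples-file row per R1/R2 pair."""
    names = set(filenames)
    rows: List[str] = []
    for fname in sorted(names):
        match = PAIR_RE.match(fname)
        if not match:
            continue  # R2 files and non-FASTQ files are skipped here
        sample = match.group("sample")
        mate = f"{sample}_R2{match.group('suffix')}"
        if mate not in names:
            raise ValueError(f"no R2 mate found for {fname}")
        rows.append("\t".join([
            sample,  # condition (placeholder: one condition per sample)
            sample,  # replicate name
            os.path.join(read_dir, fname),
            os.path.join(read_dir, mate),
        ]))
    return rows


def write_sample_table(read_dir: str, out_path: str) -> None:
    """Write the samples file for every read pair found in `read_dir`."""
    rows = sample_table_rows(os.listdir(read_dir), read_dir)
    with open(out_path, "w") as handle:
        handle.write("\n".join(rows) + "\n")
```

If several samples belong to the same biological condition, edit the first column so that they share a condition name.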

Run Trinity

Trinity is run with default parameters for de novo transcriptome assembly; the script is trinity.sh. Make sure the latest Trinity image is downloaded and stored in the same directory as your clean reads.

Post-processing

postprocessing.sh contains all steps for processing the de novo assembly and assessing its quality and completeness:

Remove redundancy with CD-HIT
Produce basic statistics with the Trinity script TrinityStats.pl
Quantify read representation by mapping reads back to the assembly with Bowtie2
Prepare a new gene-to-transcript map for the non-redundant assembly
Build gene expression matrices for DEG analysis with kallisto (can also be modified to run with Salmon)
Calculate ExN50 for the assembly
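Two of these post-processing steps are easy to illustrate. Trinity transcript IDs such as TRINITY_DN10_c0_g1_i1 encode the gene (everything before the _i isoform suffix), which is what a gene-to-transcript map records; ExN50 is the N50 computed over only the most highly expressed transcripts that together account for E% of total expression. The sketch below is a simplified, transcript-level version of both (Trinity ships its own utilities for these), and all names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

ISOFORM_SUFFIX = re.compile(r"^(?P<gene>.+)_i\d+$")


def gene_trans_map(fasta_lines: Iterable[str]) -> List[str]:
    """'gene<TAB>transcript' rows derived from Trinity-style FASTA headers."""
    rows: List[str] = []
    for line in fasta_lines:
        if line.startswith(">"):
            tx = line[1:].split()[0]
            match = ISOFORM_SUFFIX.match(tx)
            gene = match.group("gene") if match else tx
            rows.append(f"{gene}\t{tx}")
    return rows


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover half the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def ex_n50(lengths: Dict[str, int], expression: Dict[str, float], percent: float) -> int:
    """N50 of the top-expressed transcripts covering `percent`% of expression."""
    threshold = sum(expression.values()) * percent / 100
    running = 0.0
    selected: List[int] = []
    for tx in sorted(expression, key=expression.get, reverse=True):
        selected.append(lengths[tx])
        running += expression[tx]
        if running >= threshold:
            break
    return n50(selected)


lengths = {"a": 1000, "b": 200, "c": 500, "d": 50}
tpm = {"a": 10.0, "b": 60.0, "c": 25.0, "d": 5.0}
print(ex_n50(lengths, tpm, 90))  # 1000 (E90N50 over transcripts b, c, a)
```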

kirstymcmc/rnaseq-processing

Folders and files

Latest commit

History

Repository files navigation

RNAseq processing with Salmon, extraction of unmapped reads and subsequent de novo transcriptome synthesis

Preprocess raw reads

Initial quality control

Trimming adapters and low quality bases

QC trimmed reads

Quantify reads with Salmon

Generate Salmon index

Quantify reads and extract unmapped

Preprocess unmapped reads

Initial quality control

Removing erroneous k-mers from Illimina paired-end reads

Discard read pairs for which one or both reads is deemed unfixable

Remove unwanted rrna reads with Bowtie2

Run fastqc on processed reads

de novo assemble with Trinity

Make sample table text file

Run trinity

Post-processing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages