
kirstymcmc/rnaseq-processing


RNA-seq processing with Salmon, extraction of unmapped reads and subsequent de novo transcriptome assembly

This collection of scripts takes raw FASTQ files from RNA-seq, maps them to the transcriptome with Salmon, extracts the read pairs that did not map, and then preprocesses and assembles them de novo with the Trinity pipeline.

Steps of the workflow are as follows:

Preprocess raw reads

Read quality is assessed before and after trimming with the script salmon_run.sh.

Initial quality control

FastQC and MultiQC are used to assess the quality of the raw reads.
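FastQC's per-base sequence quality plot summarises Phred scores at each read position. The sketch below is a simplified illustration of that summary (mean only, no quartiles or position grouping), assuming Phred+33 quality encoding. It is not FastQC's implementation, and the function names are hypothetical.

```python
from typing import Iterable, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def per_position_mean_quality(fastq_lines: Iterable[str]) -> List[float]:
    """Mean Phred score at each read position across all records."""
    totals: List[int] = []
    counts: List[int] = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, q in enumerate(phred_scores(line.rstrip("\n"))):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += q
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    lines = [
        "@read1", "ACGT", "+", "IIII",  # 'I' encodes Phred 40
        "@read2", "ACG", "+", "II#",    # '#' encodes Phred 2
    ]
    print(per_position_mean_quality(lines))  # [40.0, 40.0, 21.0, 40.0]
```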

Trimming adapters and low quality bases

Trim Galore is used for adapter and quality trimming.
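Trim Galore delegates adapter and quality trimming to Cutadapt, and by default trims 3' bases below Phred 20. The function below sketches the BWA-style quality-trimming algorithm described in the Cutadapt documentation; it is an illustration of the technique, not the code Trim Galore actually runs.

```python
from typing import List


def quality_trim_index(quals: List[int], cutoff: int = 20) -> int:
    """Return the index at which to cut the 3' end of a read.

    Walk from the 3' end accumulating (cutoff - q). Stop as soon as the
    running sum drops below zero, and cut where the sum was largest.
    """
    running = 0
    best = 0
    cut = len(quals)  # default: keep the whole read
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


if __name__ == "__main__":
    quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
    keep = quality_trim_index(quals, cutoff=10)
    print(keep, quals[:keep])  # 4 [42, 40, 26, 27]
```

A single low-quality base inside an otherwise good tail does not trigger a cut, because the running sum has to stay non-negative all the way from the 3' end.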

QC trimmed reads

FastQC and MultiQC are used to assess read quality after trimming.

Quantify reads with Salmon

Generate Salmon index

Use index_salmon.sh to generate a Salmon index for the Ceratopteris richardii transcriptome.

Quantify reads and extract unmapped pairs

Run Salmon quantification of the trimmed reads with quantify_salmon.sh. After quantification, unmapped read pairs (pairs in which neither read mapped) are extracted and written to 4_unmapped.
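The README does not say exactly how quantify_salmon.sh identifies the unmapped pairs. Assuming Salmon is run with --writeUnmappedNames, it writes one read name and flag per line to aux_info/unmapped_names.txt, where for paired-end data the flag u marks a pair in which neither mate mapped. The sketch below keeps only those pairs from R1/R2 FASTQ data; the function names are hypothetical, and it assumes both files list mates in the same order.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def normalise(name: str) -> str:
    """Strip a leading '@', any description, and a trailing /1 or /2."""
    name = name.lstrip("@").split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def fully_unmapped(names_lines: Iterable[str]) -> Set[str]:
    """Names flagged 'u' (neither mate mapped) in lines of the form 'read_name flag'."""
    keep = set()
    for line in names_lines:
        parts = line.split()
        if len(parts) == 2 and parts[1] == "u":
            keep.add(normalise(parts[0]))
    return keep


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield record
            record = []


def extract_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str],
                  keep: Set[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield mate records whose read name is in `keep` (R1/R2 assumed in the same order)."""
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        if normalise(rec1[0]) in keep:
            yield rec1, rec2


if __name__ == "__main__":
    names = ["r1 u", "r2 m1"]
    r1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGCC", "+", "IIII"]
    r2 = ["@r1/2", "TTAA", "+", "IIII", "@r2/2", "CCGG", "+", "IIII"]
    for rec1, rec2 in extract_pairs(r1, r2, fully_unmapped(names)):
        print(rec1[0], rec2[0])  # @r1/1 @r1/2
```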

Preprocess unmapped reads

All preprocessing steps are in the file preprocess_unmapped.sh. The code roughly follows the pipeline at https://github.com/matevzl533/Noccaea_praecox_transcriptome/tree/main.

Initial quality control

FastQC and MultiQC are used to assess the quality of the "raw" unmapped reads.

Removing erroneous k-mers from Illumina paired-end reads

Rcorrector, a tool designed specifically for k-mer-based error correction of RNA-seq reads, is used to tag each read in the FASTQ output as corrected or uncorrectable.
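As a heavily simplified illustration of the idea behind k-mer-based correction (not Rcorrector's actual algorithm), the sketch below counts k-mers across all reads and flags the positions in a read whose k-mers are rare, which is where a sequencing error typically shows up. Real tools are considerably more involved; for example, they usually pool a k-mer with its reverse complement. The function names and the fixed `min_count` threshold here are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping k-mers of `seq`, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across all reads (forward strand only in this sketch)."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def weak_kmer_starts(read: str, counts: Counter, k: int, min_count: int) -> List[int]:
    """Start positions of k-mers in `read` seen fewer than `min_count` times."""
    return [i for i, km in enumerate(kmers(read, k)) if counts[km] < min_count]


if __name__ == "__main__":
    reads = ["ACGTACGT"] * 5 + ["ACGAACGT"]  # last read has a T->A error at index 3
    counts = count_kmers(reads, k=4)
    print(weak_kmer_starts("ACGAACGT", counts, k=4, min_count=2))  # [0, 1, 2, 3]
```

Every k-mer overlapping the erroneous base is rare, so the flagged window brackets the error; a corrector would then try base substitutions inside that window that turn the weak k-mers into frequent ones.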

Discard read pairs for which one or both reads are deemed unfixable

This step uses a Python script from the Harvard Informatics GitHub repository TranscriptomeAssemblyTools, updated to Python 3.
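The filtering logic can be sketched as follows, assuming Rcorrector appends unfixable_error to the header of reads it could not correct and cor to reads it did correct. This is not the Harvard Informatics script itself: it drops any pair in which either mate is flagged and removes the cor tag so headers stay clean for downstream tools. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield record
            record = []


def strip_cor_tag(header: str) -> str:
    """Remove a whitespace-separated 'cor' token added to corrected reads."""
    return " ".join(tok for tok in header.split() if tok != "cor")


def filter_unfixable(r1_lines: Iterable[str],
                     r2_lines: Iterable[str]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield mate pairs where neither header carries 'unfixable_error'."""
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        if "unfixable_error" in rec1[0] or "unfixable_error" in rec2[0]:
            continue  # discard the whole pair if either mate is uncorrectable
        rec1[0], rec2[0] = strip_cor_tag(rec1[0]), strip_cor_tag(rec2[0])
        yield rec1, rec2


if __name__ == "__main__":
    r1 = ["@a/1 cor", "ACGT", "+", "IIII", "@b/1 unfixable_error", "GGCC", "+", "IIII"]
    r2 = ["@a/2", "TTAA", "+", "IIII", "@b/2", "CCGG", "+", "IIII"]
    for rec1, rec2 in filter_unfixable(r1, r2):
        print(rec1[0], rec2[0])  # @a/1 @a/2
```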

Remove unwanted rRNA reads with Bowtie2

The SSUParc and LSUParc FASTA files were downloaded from SILVA (https://ftp.arb-silva.de/?pk_vid=8352a8ccf0ead1d7168388545541b6c1). Before running bowtie2-build, the two files were concatenated and U was replaced with T:

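cat *.fasta > SILVA.db
# Replace U with T on sequence lines only. Write to a new file and build the Bowtie2 index from it:
# redirecting awk's output to its own input file would truncate SILVA.db before awk reads it.
awk '/^[^>]/ { gsub(/U/,"T"); print; next }1' SILVA.db > SILVA_DNA.db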

Run FastQC on processed reads

Re-run the initial quality control (FastQC and MultiQC) on the processed reads.

De novo assembly with Trinity

Make a sample table text file

Rather than looping over individual read files, Trinity accepts a single tab-delimited text file describing all samples via --samples_file. Run make_sample_table.py and provide the directory containing your clean reads.
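The Trinity samples file has one tab-delimited line per replicate: condition, replicate name, left reads, right reads. The sketch below builds such a table from a directory listing. It is not make_sample_table.py itself; the <sample>_R1 / <sample>_R2 file naming and the rule that derives a condition by dropping the last underscore-separated field of the sample name are hypothetical conventions to adapt to your own files.

```python
import os
import re
import sys
from typing import List, Tuple

# Hypothetical naming convention: <sample>_R1<suffix> paired with <sample>_R2<suffix>
R1_PATTERN = re.compile(r"^(?P<sample>.+)_R1(?P<rest>.*\.f(ast)?q(\.gz)?)$")


def sample_rows(filenames: List[str], read_dir: str) -> List[Tuple[str, str, str, str]]:
    """Return (condition, replicate, left, right) rows for every complete R1/R2 pair."""
    available = set(filenames)
    rows = []
    for fname in sorted(filenames):
        m = R1_PATTERN.match(fname)
        if not m:
            continue  # not an R1 file
        mate = f"{m.group('sample')}_R2{m.group('rest')}"
        if mate not in available:
            continue  # skip R1 files without a matching R2
        sample = m.group("sample")
        # Hypothetical rule: "A_rep1" -> condition "A"
        condition = sample.rsplit("_", 1)[0] if "_" in sample else sample
        rows.append((condition, sample,
                     os.path.join(read_dir, fname), os.path.join(read_dir, mate)))
    return rows


def write_samples_file(rows: List[Tuple[str, str, str, str]], path: str) -> None:
    """Write rows as the tab-delimited file passed to Trinity's --samples_file."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write("\t".join(row) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    read_dir = sys.argv[1]
    write_samples_file(sample_rows(os.listdir(read_dir), read_dir), "samples.txt")
```

Usage (hypothetical script name): `python build_samples.py /path/to/clean_reads`, then pass `samples.txt` to Trinity with `--samples_file samples.txt`.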

Run Trinity

Trinity is run with default parameters for de novo transcriptome assembly; the script is trinity.sh. Make sure the latest Trinity image is downloaded and stored in the same directory as your clean reads.

Post-processing

postprocessing.sh contains all post-processing steps used to refine the de novo assembly and assess its quality and completeness:

  • Remove redundancy with CD-HIT
  • Produce basic statistics with the Trinity script TrinityStats.pl
  • Quantify read representation by mapping reads back to the assembly with Bowtie2
  • Prepare a new gene-to-transcript map for the non-redundant assembly
  • Build gene expression matrices for DEG analysis with kallisto (this can be modified to run with Salmon instead)
  • Calculate ExN50 for the assembly
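TrinityStats.pl reports contig N50, and ExN50 restricts the N50 calculation to the most highly expressed transcripts that together account for x% of total expression. The sketch below illustrates both definitions on plain Python dictionaries. It is a simplified transcript-level version for illustration, not Trinity's own implementation, and the input format is hypothetical.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def ex_n50(lengths: Dict[str, int], expression: Dict[str, float], pct: float) -> int:
    """N50 over the most highly expressed transcripts that together
    account for `pct` percent of total expression."""
    threshold = sum(expression.values()) * pct / 100
    selected: List[int] = []
    cumulative = 0.0
    for tid in sorted(expression, key=expression.get, reverse=True):
        if cumulative >= threshold:
            break
        selected.append(lengths[tid])
        cumulative += expression[tid]
    return n50(selected)


if __name__ == "__main__":
    lengths = {"t1": 1000, "t2": 500, "t3": 200}
    expression = {"t1": 15.0, "t2": 80.0, "t3": 5.0}
    print(n50(list(lengths.values())))      # 1000
    print(ex_n50(lengths, expression, 90))  # 1000
    print(ex_n50(lengths, expression, 50))  # 500
```

Plotting ExN50 across a range of x values shows whether N50 is driven by the well-supported, highly expressed transcripts or by lowly expressed fragments.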
