
RNASeq I: QC and Mapping with STAR

Meg Staton edited this page Nov 2, 2022 · 2 revisions

Log into Isaac Next Gen. After entering your password, type 1 to receive a Duo push.

ssh <yourusername>@login.isaac.utk.edu

Go to the project directory:

cd /lustre/isaac/proj/UTK0208/rnaseq

Let's take a peek at the raw data:

ls raw_data
  • A209 bud 600 chill hours rep1; Prunus persica; RNA-Seq (SRR10269867) - early blooming, bud in ecodormancy
  • A209 bud 600 chill hours rep2; Prunus persica; RNA-Seq (SRR10269868) - early blooming, bud in ecodormancy
  • A318 bud 600 chill hours rep1; Prunus persica; RNA-Seq (SRR10269871) - late blooming, bud in endodormancy
  • A318 bud 600 chill hours rep2; Prunus persica; RNA-Seq (SRR10269872) - late blooming, bud in endodormancy
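For a quick sanity check on sequencing depth before running FastQC, you can count the reads in each file. The sketch below is a minimal example that assumes standard 4-line FASTQ records and gzipped files ending in fastq.gz. The function name count_reads is hypothetical. Decompressing large files takes a while, so for many files consider running the loop inside a batch job rather than on the login node.

```shell
#!/bin/bash
# count_reads (hypothetical helper): print the number of reads in a gzipped FASTQ.
# Each FASTQ record is exactly 4 lines (@header, sequence, '+', qualities),
# so the read count is the total line count divided by 4.
count_reads() {
    local fq="$1"
    local lines
    lines=$(zcat "$fq" | wc -l)
    echo $(( lines / 4 ))
}

# Example (run from the project directory):
# for f in raw_data/*fastq.gz; do
#     echo "$f $(count_reads "$f")"
# done
#
# To see what a record looks like, print the first one:
# zcat raw_data/<file>.fastq.gz | head -n 4
```

For paired-end data, the _1 and _2 files of a sample should report the same count.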

You will also see a directory named analysis set up for our practice. cd into it and create a directory for your own work:

cd analysis
mkdir <yourusername>
cd <yourusername>

FastQC

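Let's check the quality of the files. This is worth a look, because FastQC results for RNA-seq differ in important ways from those for DNA. For example, RNA-seq libraries commonly show uneven base composition over the first dozen or so bases of each read (a side effect of random hexamer priming) and high sequence duplication (from highly expressed transcripts). FastQC flags both, but for RNA-seq they are expected and are not by themselves signs of a bad library.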

mkdir 1_fastqc
cd 1_fastqc
ln -s /lustre/isaac/proj/UTK0208/rnaseq/raw_data/*fastq.gz .

To run FastQC on the cluster, we need a SLURM job script. Open a new file named fastqc.sh and paste in:

#!/bin/bash
#SBATCH -J fastqc
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -A ISAAC-UTK0208
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:30:00

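module load fastqc

# Run FastQC on every gzipped FASTQ (the symlinks created above);
# each input gets an HTML report and a .zip of the underlying results
fastqc *gz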

Run on Isaac:

sbatch fastqc.sh

Monitor:

squeue -u <yourusername>

You can also monitor progress by checking the SLURM output file, which is named after the job ID that sbatch prints when you submit:

cat slurm-######.out
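Once the job finishes, the HTML reports are the easiest thing to read. To scan all files at once, you can pull the PASS/WARN/FAIL table out of each .zip archive's summary.txt. The sketch below is minimal and rests on two assumptions: FastQC's standard tab-separated summary.txt layout (status, module name, file name), and unzip being available on the login node. flagged_modules is a hypothetical helper name.

```shell
#!/bin/bash
# flagged_modules (hypothetical helper): print every FastQC module whose
# status is not PASS. FastQC's summary.txt is tab-separated:
#   STATUS <tab> module name <tab> input file
# With no file argument, awk reads the summary from standard input.
flagged_modules() {
    awk -F'\t' 'NF >= 2 && $1 != "PASS" { print $1 ": " $2 }' "$@"
}

# Example (run inside 1_fastqc after the job finishes):
# for z in *_fastqc.zip; do
#     echo "== $z"
#     unzip -p "$z" '*/summary.txt' | flagged_modules
# done
```

Modules such as Per base sequence content and Sequence Duplication Levels often appear in this list for RNA-seq data even when the library is fine.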

Index the genome for STAR

I already did this, so you do not have to do it. Here is the script I used:


#!/bin/bash
#SBATCH -J star-index
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH -A ISAAC-UTK0208
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:30:00
#SBATCH --mem-per-cpu=16G

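module load star

# Build the STAR index from the genome FASTA plus annotation. The annotation
# is GFF3 rather than GTF, so --sjdbGTFtagExonParentTranscript Parent tells
# STAR which attribute links each exon to its parent transcript.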
STAR \
--runMode genomeGenerate \
--genomeDir STAR_idx \
--genomeFastaFiles Ppersica_298_v2.0.fa \
--runThreadN 1 \
--genomeSAindexNbases 11 \
--sjdbGTFfile Ppersica_298_v2.1.gene_exons.gff3 \
--sjdbGTFtagExonParentTranscript Parent \
--sjdbOverhang 149

Note the added memory request (--mem-per-cpu=16G)! Without it, the job fails with this error:

slurmstepd: error: Detected 1 oom-kill event(s) in StepId=261994.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
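Two of the indexing parameters follow rules of thumb from the STAR manual. First, --sjdbOverhang should ideally be the read (mate) length minus 1, which gives 149 for 150 bp reads. Second, for small genomes --genomeSAindexNbases should be scaled down to min(14, log2(GenomeLength)/2 - 1). The manual presents this as an upper limit, and the index above was built successfully with 11.

The sketch below derives both values from your own files. It assumes a plain FASTA and a gzipped FASTQ. The helper names genome_length, sa_index_nbases and sjdb_overhang are hypothetical, and sjdb_overhang only samples the first 100,000 reads.

```shell
#!/bin/bash
# Hypothetical helpers for choosing two STAR genomeGenerate parameters.

# genome_length: total number of bases in a FASTA file (header lines skipped).
# printf "%.0f" avoids scientific notation for large totals in some awks.
genome_length() {
    awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}

# sa_index_nbases: min(14, floor(log2(L)/2 - 1)) for a genome of length L.
sa_index_nbases() {
    awk -v n="$1" 'BEGIN {
        v = int(log(n) / log(2) / 2 - 1)
        if (v > 14) v = 14
        print v
    }'
}

# sjdb_overhang: longest read length minus 1, estimated from the first
# 100,000 reads (400,000 lines) of a gzipped FASTQ file.
sjdb_overhang() {
    zcat "$1" | head -n 400000 |
        awk 'NR % 4 == 2 { if (length($0) > m) m = length($0) } END { print m - 1 }'
}

# Example:
# sa_index_nbases "$(genome_length Ppersica_298_v2.0.fa)"
# sjdb_overhang EarlyBlommingRep1_1.fastq.gz
```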

Mapping with STAR

Create and move into a new directory, 2_star, then symlink the fastq.gz files into it as you did for FastQC.

Let's start by mapping a single sample's read pair. Create a script named STAR.sh and paste in:

#!/bin/bash
#SBATCH -J star-map
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH -A ISAAC-UTK0208
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:30:00
#SBATCH --mem-per-cpu=8G

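module load star

# Map one paired-end sample against the prebuilt index:
#   --readFilesIn lists read 1 then read 2
#   --readFilesCommand zcat decompresses the .gz files on the fly
#   --runThreadN 2 matches the 2 CPUs requested with --ntasks=2
#   --outSAMtype BAM SortedByCoordinate writes a coordinate-sorted BAM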
STAR \
--genomeDir /lustre/isaac/proj/UTK0208/rnaseq/raw_data/STAR_idx \
--runThreadN 2 \
--readFilesIn EarlyBlommingRep1_1.fastq.gz EarlyBlommingRep1_2.fastq.gz \
--readFilesCommand zcat \
--outFileNamePrefix EarlyBlommingRep1 \
--outSAMtype BAM SortedByCoordinate
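STAR names its outputs by prepending --outFileNamePrefix. This run should therefore produce EarlyBlommingRep1Aligned.sortedByCoord.out.bam, plus a summary log named EarlyBlommingRep1Log.final.out. The "Uniquely mapped reads %" line in that log is a quick first check on mapping quality.

The sketch below is minimal and assumes the log's usual "label | value" layout. unique_map_rate is a hypothetical helper name.

```shell
#!/bin/bash
# unique_map_rate (hypothetical helper): print the "Uniquely mapped reads %"
# value from a STAR Log.final.out file. Lines in that file look like
#   "                   Uniquely mapped reads % |	87.50%"
# so we split on '|' and strip spaces, tabs and the percent sign.
unique_map_rate() {
    awk -F'|' '/Uniquely mapped reads %/ {
        gsub(/[ \t%]/, "", $2)
        print $2
    }' "$1"
}

# Example (after mapping several samples):
# for log in *Log.final.out; do
#     echo "${log%Log.final.out} $(unique_map_rate "$log")"
# done
```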

Map with task arrays