This repository contains an RNA-Seq analysis pipeline designed to process sequencing data through several stages on CHTC, from quality control to gene expression quantification.
The pipeline consists of the following stages:
- FastQC1:
  - Initial quality check of raw reads.
  - Docker image: staphb/fastqc
- Trim:
  - Adapter and quality trimming of reads.
  - Docker image: staphb/trimmomatic
- FastQC2:
  - Quality check of trimmed reads.
  - Docker image: staphb/fastqc
- BWA:
  - Alignment of reads to the reference genome.
  - Docker image: staphb/bwa
- Samtools View:
  - Conversion of SAM to BAM format.
  - Docker image: staphb/samtools
- Samtools Sort:
  - Sorting of BAM files.
  - Docker image: staphb/samtools
- HTSeq & Qualimap:
  - Gene expression quantification and quality control of BAM files.
  - Docker images: fischuu/htseq, andpegi3s/qualimap
- MultiQC:
  - Final aggregated quality control report covering the raw FASTQ files, the trimmed FASTQ files, and the BAM files.
  - Docker image: staphb/multiqc
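The stage order above can be sketched as a chain of DAGMan dependencies. This is only an illustration: the node names are hypothetical labels, and the strictly linear ordering is an assumption (the real RNAseq_dag.template may, for example, run HTSeq and Qualimap as separate nodes). Only the stage names and Docker images come from the list above.

```python
# Stage order as listed above. Node names are hypothetical labels chosen for
# this sketch; the real node names live in RNAseq_dag.template.
STAGES = [
    ("FastQC1", ["staphb/fastqc"]),
    ("Trim", ["staphb/trimmomatic"]),
    ("FastQC2", ["staphb/fastqc"]),
    ("BWA", ["staphb/bwa"]),
    ("SamtoolsView", ["staphb/samtools"]),
    ("SamtoolsSort", ["staphb/samtools"]),
    ("HTSeqQualimap", ["fischuu/htseq", "andpegi3s/qualimap"]),
    ("MultiQC", ["staphb/multiqc"]),
]


def dependency_lines(stages):
    """Return one DAGMan 'PARENT x CHILD y' line per consecutive stage pair.

    Assumes a strictly linear pipeline, which is a simplification.
    """
    names = [name for name, _images in stages]
    return [f"PARENT {a} CHILD {b}" for a, b in zip(names, names[1:])]


for line in dependency_lines(STAGES):
    print(line)
```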
To run the pipeline, the following files are needed:
- Reference File: MtbNCBIH37Rv.fa
- Adapter File: adapters.fa
- GTF File: MtbNCBIH37Rv.gtf
- Data Files: Raw sequencing reads (an R1 and an R2 file per sample), for example:
  - 3151_19_S13_R1_001.fastq.gz
  - 3151_19_S13_R2_001.fastq.gz
- Input File: input.txt (contains sample identifiers, one per line), for example:
  - 3151_17_S11
  - 3151_18_S12
  - 3151_19_S13
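Before submitting, it can help to confirm that every sample listed in input.txt has both of its read files. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the `<sample>_R1_001.fastq.gz` / `<sample>_R2_001.fastq.gz` naming is inferred from the example data files above.

```python
from pathlib import Path


def read_samples(input_path):
    """Return the non-empty, stripped lines of the sample list (one ID per line)."""
    with open(input_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def expected_fastqs(sample):
    """Paired read file names, following the pattern of the example data files."""
    return [f"{sample}_R1_001.fastq.gz", f"{sample}_R2_001.fastq.gz"]


def missing_inputs(samples, data_dir="."):
    """List every expected FASTQ file that is absent from data_dir."""
    data_dir = Path(data_dir)
    return [
        name
        for sample in samples
        for name in expected_fastqs(sample)
        if not (data_dir / name).exists()
    ]


# Names that would be looked up for one of the example samples.
print(expected_fastqs("3151_19_S13"))
```

A typical call would be `missing_inputs(read_samples("input.txt"))`; an empty list means all expected read files were found.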
The pipeline uses HTCondor DAG files to manage the workflow. These files are automatically generated and include:
- Top-Level DAG File: input_topLevel.dag
  - Runs individual DAG files for each sample.
- Example DAG files for individual samples:
  - 3151_17_S11_RNAseq.dag
  - 3151_18_S12_RNAseq.dag
  - 3151_19_S13_RNAseq.dag
- Template and Script for DAG Generation:
  - RNAseq_dag.template: Template DAG file with placeholders ($(RUN), $(REF), $(annot_gtf)) to be replaced.
  - make_RNAseq_dag.py: Script to generate individual DAG files by replacing placeholders with actual values.
- Generating DAG Files:
- To generate the individual DAG files and top-level DAG file from the template, run the following command:
python3 make_RNAseq_dag.py input.txt RNAseq_dag.template MtbNCBIH37Rv.fa MtbNCBIH37Rv.gtf
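The placeholder replacement described above can be sketched as follows. This is not the code of make_RNAseq_dag.py, only an illustration of the idea: the function names and the sample template line are hypothetical, and the use of SUBDAG EXTERNAL lines in the top-level DAG is an assumption about how it runs the per-sample DAG files.

```python
def fill_template(template_text, run, ref, annot_gtf):
    """Replace the three placeholders named in the README with actual values."""
    replacements = {
        "$(RUN)": run,
        "$(REF)": ref,
        "$(annot_gtf)": annot_gtf,
    }
    for placeholder, value in replacements.items():
        template_text = template_text.replace(placeholder, value)
    return template_text


def top_level_dag(samples):
    """Build one 'SUBDAG EXTERNAL <node> <dag file>' line per sample.

    Assumption: the real input_topLevel.dag may be structured differently.
    """
    return "".join(f"SUBDAG EXTERNAL {s} {s}_RNAseq.dag\n" for s in samples)


# Hypothetical template line; the real RNAseq_dag.template will differ.
template = 'VARS trim run="$(RUN)" ref="$(REF)" gtf="$(annot_gtf)"\n'
print(fill_template(template, "3151_19_S13", "MtbNCBIH37Rv.fa", "MtbNCBIH37Rv.gtf"))
print(top_level_dag(["3151_17_S11", "3151_18_S12", "3151_19_S13"]))
```

Plain string replacement is used here because the `$(NAME)` placeholders do not match the syntax of Python's `string.Template` or `str.format`.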
To submit the DAG job described in input_topLevel.dag on CHTC, use the following command:
condor_submit_dag input_topLevel.dag
To check the status of a DAG job on HTCondor, use the following command:
condor_q -nobatch
If a job in your DAG fails, HTCondor will generate a rescue file named after the DAG file with a numbered suffix (for example, input_topLevel.dag.rescue001). This file records which jobs had completed successfully at the time of failure and can be used to resume the DAG from where it left off.
To debug a failed job, first identify which job failed (see the .dag.rescue file), then inspect that job's error, log, and output files.
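As a rough aid for reading a rescue file, the sketch below lists the nodes that are not marked as finished. It assumes the rescue file marks each completed node with a line of the form `DONE <node name>`; the function names, node names, and example content are hypothetical. Nodes that are not marked DONE either failed or never started, so the job log files are still needed to tell the two apart.

```python
def done_nodes(rescue_text):
    """Return node names that appear on 'DONE <node>' lines of a rescue file."""
    done = []
    for line in rescue_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].upper() == "DONE":
            done.append(parts[1])
    return done


def not_done(all_nodes, rescue_text):
    """Return the nodes from all_nodes that the rescue file does not mark DONE."""
    finished = set(done_nodes(rescue_text))
    return [node for node in all_nodes if node not in finished]


# Illustrative content only, not real HTCondor output.
example = """\
# Rescue DAG file (illustrative)
DONE FastQC1
DONE Trim
"""
print(not_done(["FastQC1", "Trim", "FastQC2", "BWA"], example))
```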
After identifying and fixing the issue, update the corresponding submit/shell file. Then resume the job from the point of failure using the .dag.rescue file.
To resume the job, resubmit the original DAG file; HTCondor detects the rescue file automatically:
condor_submit_dag input_topLevel.dag
This command restarts the DAG from the point of failure, skipping the jobs that have already completed successfully (provided the rescue file has not been deleted).
For more detailed information, refer to the HTCondor Manual.