GitHub - HuangLab-Fudan/ASJA: ASJA: a program for Assembling Splice Junctions Analysis

ASJA: a program for Assembling Splice Junctions Analysis

What is the ASJA

RNA splicing may generate different kinds of splice junctions, such as linear, back-splice and fusion junctions. Only a limited number of programs are available for detection and quantification of splice junctions. Here, we present Assembling Splice Junctions Analysis (ASJA), a software package that identifies and characterizes all splice junctions from high-throughput RNA sequencing (RNA-seq) data. ASJA processes assembled transcripts and chimeric alignments from the STAR aligner and S tringTie assembler. ASJA provides the unique position and normalized expression level of each junction. Annotations and integrative analysis of the junctions enable additional filtering. It is also appropriate for the identification of novel junctions. Implementation and Dependencies

ASJA was developed with perl (v5) and shell (bash) language. Before running the program, it is necessary to check or download perl packages as follow: *File::Basename; *Getopt::Long; *List::Util qw/min sum max/;

Moreover, ASJA works based on the STAR and StringTie fearturecount, so these tools also should be installed and their pathway should be added in ~/.bashrc

STAR (version <= 2.5)
StringTie (version <= 1.2.3)
featureCounts (version >= 1.5.0)
sambamba (version >=0.6.6) ASJA Installation

Download the packages and then unzip it in Linux (CentOS or Ubuntu) Documentation

This chapter provides detailed commands arguments and description of output. the commands are labeled after ‘usage’

Files Needed:

Human genome sequence (hg38.fasta) and GTF File (we recommend GENCODE, and the program will report an error if you use GTF from UCSC) are used to generate STAR index.
Raw data of RNA-seq (fasta.gz)

Commands and arguments

Note: the absolute pathway is necessary to perform scripts

1, Mapping of RNA-seq data usage: perl runSTAR.pl [OPTIONS] The arguments of runSTAR.pl are as followings and if you want to work with single-read ,please see more detail of STAR on https://github.com/alexdobin/STAR:

-f1 <FASTA1>  
           Using Illumina paired-end reads, and the name of read1 has to be supplied.
-f2 <FASTA2>  
           Using Illumina paired-end reads, and the name of read2 has to be supplied.
-fq_dir <fastq dir> 
           Specifies path to files containing the sequences to be mapped
-G <path_and_gtf> 
            Specifies the path to the file with annotated transcripts in the standard GTF format.
-GA <genomeFastaFiles> 
            Specified one or more FASTA files with the genome reference sequences.
-O <outdir>   
            Specifies path to the directory (henceforth called "genome directory" where the alignment results are stored.)
-pass 
            Running STAR in the 2-pass alignment mode
-index
            Generating genome index of STAR with default settings
-SI_dir < genome index dir>  
            specifies path to the genome directory where genome indexes where generated
-I <path>  
            Specifies path to the directory where the ASJA installation 
-S <sample>   
            Name of sample

Generating genome indexes

usage: perl runSTAR.pl -I path/to/ASJA -index -SI_dir path/to/star_index -G path/to/genecode.annoataion.gtf -GA path/to/GRCh38.primary_assembly.genome.fa

running STAR in the 2-pass mode [Kahles et al., 2018, Cancer Cell 34, 1–14]

usage:  perl runSTAR.pl -I path/to/ASJA -pass -SI_dir /path/to/star_index -f1 R2.fq.gz -f2 R1.fq.gz -fq_dir path/to/fastq -GA path/to/GRCh38.primary_assembly.genome.fa -O path/to/out_dir -S sample_name

Output:

sample_mapped_reads.bam Chimeric.out.junction

SJ.out.tab

2, the extraction and processing of junctions

We provide step-by-step processing (ASJA.pl filtering.pl integration.pl) and quick processing(ASJA-all.pl) program to obtain junctions. However, preparing file for annotation only be implemented with ASJA.pl -setup, and generating transcripts for linear junction only be implemented with StringTie. ****** step-by-step processing ****** usage: perl ASJA.pl [options] The arguments of ASJA.pl are as followings:

	-I <ASJA dir>  
                 Specifies path to the directory where the ASJA installation
	-G < path_and_gtf >     
                 Specifies the path to the file with annotated transcripts in the standard GTF format.
	-setup           
                 Preparing reference file for annotation junctions
	-linear
                 Extraction linear junctions
	-backsplicing
                 Extraction back splicing junctions
	-fusion
                 Extraction fusion junctions
	-CI < alignment dir>   
                Specified path with the alignment result of STAR
	-SI <path_and_file >  
                Name(s) (with path) of the files containing generated transcript by StringTie.
				The path is also an out_dir
	-ann
                Annotation for junctions
	-ratio
                Calculation ratio

The mapped reads were further used to obtain transcripts by StringTie with reference-based transcriptome assembly. please see http://ccb.jhu.edu/software/stringtie/

usage: stringtie input_mapped_reads.bam -f 0.1 -o path/to/stringtie_assembly.gtf -p 4 -G path/to/gencode.v29.annotation.gtf

The process of preparing file for annotation junctions.

Usage: perl ASJA.pl -I /path/to/ASJA -G path/to/ref/gencode.v29.annotation.gtf -setup

The extraction of liner junction form stringtie_assembly

Usage: perl ASJA.pl -I path/to/ASJA -linear -G path/to/gencode.v29.annotation.gtf -SI path/to/example/assembly/input/stringtie_assembly.gtf -CI path/to/example/alignment/input -ann -ratio

The extraction of back splicing junction form Chimeric.out.junction

usage: perl ASJA.pl -I path/to/ASJA -backsplicing -G path/to/gencode.v29.annotation.gtf -SI path/to/example/assembly/input/stringtie_assembly.gtf -CI path/to/example/alignment/input -ann -ratio

The extraction of extraction fusion junction form Chimeric.out.junction

usage: perl ASJA.pl -I path/to/ASJA -fusion -G path/to/gencode.v29.annotation.gtf -SI path/to/example/assembly/input/stringtie_assembly.gtf -CI path/to/example/alignment/input -ann -ratio

usage: perl filtering.pl [options]

The arguments of filtering.pl are as followings:

	-read <1>
               Set threshold for filtration based on counts of junction reads (optional: e.g. 1)
	-ratio<0.01>
               Set threshold for filtration based on ratio (linear weight ratio/back splicing ratio /fused ratio) of junction (optional: e.g. 0.01)
	-linear
               Filtration of linear junctions
	-backsplicing
               Filtration of back splicing junctions
	-fusion
               Filtration of fusion junctions
	-IN<input file>
               Name(s) (with path) of the files for filtration
	-O<output file>
               Name(s) (with path) of the files for result

Generating junctions with high-confidence(NOTE: For any kind of junction, there should be a threshold to get a high-confidence junction. For example, we believe that the screening criteria for high-confidence liner junctions should satisfy the condition that ratio is greater than 0.01 and the number of reads are greater than 1.)

usage: perl filtering.pl -read 1 -ratio 0.08 -linear -IN path/to/Linear.txt -O path/to/F_linear.txt

The integration of three types of junctions(Note: These junctions need to be annotated)

usage：perl integration.pl -A liner.txt -B circRNA.txt -C fusion.txt -O all.txt

3. quick processing

usage: perl ASJA-all.pl [options]
-I <dir ASJA>  
          Specifies path to the directory where the ASJA installation
-G <path_and_gtf>     
          Specifies the path to the file with annotated transcripts in the standard GTF format.
-CI < dir alignment >           
          Specified path with the alignment result of STAR.
-SI <path_and_file >  
          Name(s) (with path) of the files containing generated transcript by StringTie
-O<outdir>
         Specifies path to the directory where the results are stored.

Quickly get three types of junctions using default parameters

Usage: perl ASJA-all.pl -I /path/to/ASJA -G path/to/gencode.v29.annotation.gtf -CI /path/to/example/alignment/input-SI path/to/example/assembly/input/stringtie_assembly.gtf  -O path/to/result

Other programs

The read counts of gene level can be calculated by featureCounts. Please see http://subread.sourceforge.net/

usage: featureCounts -p -T 6 -a genecode.annoataion.gtf -o path/to/featurecount.txt sample_mapped_reads.bam

the calculation of TPM from featureCounts

usage: perl TPM.pl -A featurecount -B featurecount.summary -O TPM.txt

The description of Output files generated by ASJA

Linear junction primary format

junctions: A unique identifier for a linear junction
CPT: The expression of junction with custom formal (CPT).
read: The read count of junction that SJ.out.tab matched.
transID: The transcript_id in the reference annotation that the instance matched.
geneID: The gene_id in the reference annotation that the instance matched.
gene: The gene_name in the reference annotation that the instance matched.
type: The gene_type in the reference annotation that the instance matched.
Weight ratio: the weight of junction in annotated gene.

Back splicing junction primary format

circID: A unique identifier for a back splicing junction
read: the sum of GT_AG_read and CT_AC_read.
GT_AG_read: The read count of back splicing that junction type=1(STAR manual) matched.
CT_AC_read: The read count of back splicing that junction type=2(STAR manual) matched.
left_backratio: 5’ratio of circRNA.
right_backratio: 3’ratio of circRNA.
annotation: the annotation of circRNA, including gene_id;trans_id;gene_type; gene_name
length_exon: the length of exon.
pos_exon: the position of exon

Fusion junction primary format

fusionID: A unique identifier for a fusion junction
read: the sum of GT_AG_read and CT_AC_read.
GT_AG_read: The read count of back splicing that junction type=1(STAR manual) matched.
CT_AC_read: The read count of back splicing that junction type=2(STAR manual) matched.
Leftbackratio: the ratio of accepter
Rightbackratio: the ratio of donor
left_type: the type of annotation in accepter
leftann: the annotation of accepter, including gene_id;transcript_id;gene_type;gene_name;exon_number
right_type: the type of annotation in donor
rightann: the annotation of accepter, including gene_id;transcript_id;gene_type;gene_name;exon_number

An integration output:

Gene_name: Gene symbol
Linear junctions: A unique identifier for a linear junction
circRNA: A unique identifier for circRNA related to linear junction and gene, separate the two circRNAs with a semicolon
fusion: A unique identifier for fusion related to linear junction and gene, separate the two fusions with a semicolon

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ASJA: a program for Assembling Splice Junctions Analysis

What is the ASJA

Download the packages and then unzip it in Linux (CentOS or Ubuntu) Documentation

Files Needed:

Commands and arguments

3. quick processing

The description of Output files generated by ASJA

About

Releases 1

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
scripts		scripts
ASJA-all.pl		ASJA-all.pl
ASJA.pl		ASJA.pl
FPKM-featurecounts.pl		FPKM-featurecounts.pl
Log.out		Log.out
README.txt		README.txt
TPM.pl		TPM.pl
filtering.pl		filtering.pl
integration.pl		integration.pl
readme.md		readme.md
runSTAR.pl		runSTAR.pl

HuangLab-Fudan/ASJA

Folders and files

Latest commit

History

Repository files navigation

ASJA: a program for Assembling Splice Junctions Analysis

What is the ASJA

Download the packages and then unzip it in Linux (CentOS or Ubuntu) Documentation

Files Needed:

Commands and arguments

3. quick processing

The description of Output files generated by ASJA

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages