CircCode

Introduction

CircCode is a Python 3-based pipeline for identifying translated circular RNAs (circRNAs). It automatically links each candidate sequence in tandem and processes the given ribosome profiling data (quality control, filtering, and alignment). Translated circRNAs are then predicted with J48 classification. The user only needs to fill in the provided configuration file and run the Python scripts to obtain the predicted translated circRNAs.

Overview

Requirement

Data:

Genome sequence (fasta format)
Candidate circRNA sequence (fasta/bed format)
rRNA sequence (fasta format)
Adapter sequence (fasta format)
Ribosome profiling data (sra format)
Coding and non-coding sequence (fasta format)

Supported operating systems：

Download

Open the terminal and input:

git clone https://github.com/PSSUN/CircCode.git

Install required packages

cd CircCode
./Install.sh

NOTE: This step is optional. If your environment already provides all the required packages, you can skip it and run the Python scripts directly. You can also install any missing dependencies yourself. Once all dependencies are met, no compilation is required.

Usage

- You can run the whole CircCode pipeline with one script

Fill in the config file (https://github.com/Sunpeisen/CircCode/blob/master/config.yaml), giving the full path of each required file.
Run the bash script on the command line with your config file.

  sh one_step_script.sh config.yaml

- Or you can run CircCode step by step

Fill in the config file (https://github.com/Sunpeisen/CircCode/blob/master/config.yaml), giving the full path of each required file.
Make virtual genomes

 python3 make_virtual_genomes.py -y config.yaml

This script outputs two files in tmp_file: *.fa and *.gff. The *.gff file holds the annotation of *.fa.
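
To illustrate why a virtual genome is built, here is a minimal sketch of the tandem-linking idea mentioned in the introduction: joining a circRNA sequence to a copy of itself places the back-splice junction in the middle of a linear sequence, so a read that spans the junction can align contiguously. The function name tandem_link and the toy sequences are illustrative assumptions; this is not the code of make_virtual_genomes.py, which may build its output differently.

```python
from typing import Tuple


def tandem_link(circ_seq: str) -> Tuple[str, int]:
    """Join a circRNA sequence to a copy of itself.

    Returns the doubled sequence and the 0-based position at which the
    second copy starts, i.e. where the back-splice junction sits.
    """
    seq = circ_seq.upper()
    return seq + seq, len(seq)


# Toy circRNA (hypothetical). A read crossing the junction starts near
# the 3' end of the linear form and continues at its 5' end.
circ = "ATGGCCTTAGCA"
read = circ[-4:] + circ[:4]

doubled, junction = tandem_link(circ)
print(read in circ)      # is the read found in the linear sequence?
print(read in doubled)   # is it found in the tandem-linked sequence?
print(doubled.find(read), junction)
```

The read is absent from the linear form but present in the doubled one, where it overlaps the junction position returned by the function.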

Filter reads and map them to the virtual genomes

 python3 map_to_virtual_genomes.py -y config.yaml

Find RPF-covered regions on junctions (RCRJ) and classify the RCRJs by sequence features

 python3 find_RCRJ_and_classify.py -y config.yaml
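
As a rough illustration of what an RPF-covered region on a junction means, the sketch below merges read alignments on one tandem-linked sequence and keeps the merged regions that cross the junction position. The interval representation, the min_overhang parameter, and the function name are assumptions made for this example; find_RCRJ_and_classify.py may define and detect RCRJs differently.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the tandem sequence


def junction_covered_regions(
    reads: List[Interval], junction: int, min_overhang: int = 3
) -> List[Interval]:
    """Merge overlapping read intervals and keep regions spanning the junction.

    A region is kept only if it extends at least `min_overhang` bases on
    both sides of the junction (a hypothetical threshold).
    """
    merged: List[Interval] = []
    for start, end in sorted(reads):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous region.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return [
        (s, e)
        for s, e in merged
        if s <= junction - min_overhang and e >= junction + min_overhang
    ]


# Toy alignments on a tandem sequence whose junction is at position 100.
reads = [(10, 40), (85, 113), (95, 125), (150, 180)]
print(junction_covered_regions(reads, junction=100))
```

Requiring an overhang on both sides is one simple way to avoid counting reads that merely touch the junction; the value 3 is arbitrary here.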

Run example

You can download the required sra file from NCBI-SRA. We also provide the other required files (including genome.fa, genome.gtf, etc.) in example.tar.xz. Fill in the path of each file under the corresponding item in config.yaml, then follow the steps above to run each script.
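
The step-by-step run described above can also be chained from Python. This is a minimal sketch that only assembles and launches the three commands already listed in this README, stopping at the first failure; the function names and the dry_run flag are illustrative and not part of CircCode.

```python
import subprocess
from typing import List

# Script names and the -y option are taken from the steps listed above.
STEPS = [
    "make_virtual_genomes.py",
    "map_to_virtual_genomes.py",
    "find_RCRJ_and_classify.py",
]


def build_commands(config_path: str) -> List[List[str]]:
    """Return one argument list per pipeline step, in run order."""
    return [["python3", script, "-y", config_path] for script in STEPS]


def run_pipeline(config_path: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(config_path):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails.
            subprocess.run(cmd, check=True)


run_pipeline("config.yaml")  # dry run: only prints the commands
```

Call run_pipeline("config.yaml", dry_run=False) from inside the CircCode folder to actually execute the steps.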

How to fill in the config.yaml file?

When you open the config file in a text editor, you will find the following items that need to be filled in:

genome_name:

Each fasta file has its own name. Similarly, this value is the name of the virtual genome generated by CircCode. You can fill in any text here (we strongly recommend using only English letters). Fill in the name only, without a suffix: for example, 'textGenome' rather than 'textGenome.fa'.
genome_fasta:

Fill in the absolute path (not a relative path!) of the genome of the corresponding species.
genome_gtf:

Fill in the absolute path (not a relative path!) of the annotation file for that genome.
raw_reads:

CircCode relies on Ribo-Seq data to identify circRNAs with translational potential. Fill in the absolute path of the Ribo-Seq data in sra format, which can be your own sequencing data or data downloaded from the NCBI public database. Multiple sra files can be given and are used for prediction at the same time; a single sra file is also accepted.
ribosome_fasta:

Provide the rRNA sequences of the corresponding species in fasta format; they are used to filter the Ribo-Seq data. Fill in the absolute path of this fasta file.
trimmomatic_jar:

The absolute path of the trimmomatic_jar file. This file is provided with CircCode, so you only need to fill in its absolute path on your computer.
circrnas:

The fasta file of the candidate circRNAs. CircCode predicts which circRNAs have translational potential from this given set of candidates, so the file should contain all of them. Fill in the absolute path of this fasta file.
riboseq_adapters:

The absolute path of the adapters file for Ribo-Seq data.
coding_seq:

The fasta file of the coding sequences of this species. On a machine with little memory, keep this file small; otherwise the run may fail with an out-of-memory error.
non_coding_seq:

The fasta file of the non-coding sequences of this species. On a machine with little memory, keep this file small; otherwise the run may fail with an out-of-memory error.
result_file_location:

Fill in the absolute path of a folder to hold the final run results.
tmp_file_location:

Fill in the absolute path of a folder to hold the temporary files.
reads_type:

The type of sequencing data: single-end or paired-end. Fill in single or pair accordingly.
thread:

The number of threads to use. Only an integer is accepted, for example 1, 2, 3, 4, or 5.
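
Because every item must be filled in and most of them must be absolute paths, a quick check before a run can catch mistakes early. The sketch below validates a configuration that has already been loaded into a dictionary (for example with PyYAML's yaml.safe_load, which is an assumption about how you read the file). The key names follow the list above; the function name, the error messages, and the treatment of multiple sra files as a list are illustrative assumptions.

```python
import os
from typing import Any, Dict, List

# Keys described above whose values should be absolute paths.
PATH_KEYS = [
    "genome_fasta", "genome_gtf", "raw_reads", "ribosome_fasta",
    "trimmomatic_jar", "circrnas", "riboseq_adapters", "coding_seq",
    "non_coding_seq", "result_file_location", "tmp_file_location",
]
OTHER_KEYS = ["genome_name", "reads_type", "thread"]


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none."""
    problems = []
    for key in PATH_KEYS + OTHER_KEYS:
        if cfg.get(key) in (None, "", []):
            problems.append(f"{key}: missing or empty")
    for key in PATH_KEYS:
        value = cfg.get(key)
        # raw_reads may hold several sra files; treat every value as a list.
        paths = value if isinstance(value, list) else [value]
        for path in paths:
            if isinstance(path, str) and path and not os.path.isabs(path):
                problems.append(f"{key}: not an absolute path: {path}")
    name = cfg.get("genome_name")
    if isinstance(name, str) and name.endswith((".fa", ".fasta")):
        problems.append("genome_name: give the name without a suffix")
    if cfg.get("reads_type") not in (None, "", "single", "pair"):
        problems.append("reads_type: expected 'single' or 'pair'")
    thread = cfg.get("thread")
    if thread not in (None, "") and not (isinstance(thread, int) and thread > 0):
        problems.append("thread: expected a positive integer")
    return problems


# Hypothetical configuration with three deliberate mistakes.
example = {key: "/data/" + key for key in PATH_KEYS}
example.update(genome_name="testGenome.fa", reads_type="paired", thread=4)
example["raw_reads"] = ["/data/a.sra", "b.sra"]
for problem in validate_config(example):
    print(problem)
```

The check only inspects the values themselves; it does not test whether the files exist or have the expected format.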

NOTE: The test files are only meant to check that the software runs smoothly; their output does not represent actual research results.

Update

2019-10-14: Fixed a bug where the executable lacked run permission, which caused an error.
2019-10-15: Updated README.md
2019-12-16: Updated README.md
2020-01-30: Removed useless code from the second script
2020-05-20: Updated the one_step_script.sh script
2020-07-06: Added a new parameter [merge] to the yaml file

FAQ

Q: In the prediction step, why does BASiNET not seem to have the expected results?

A: Make sure that the BASiNET package is properly installed on your computer. Note that rJava, a dependency of BASiNET, often fails to install, so check that the dependencies work correctly.
Q: Do I need to fill out all the items in the yaml configuration file?

A: Yes, all items need to be filled in. The previous section explains how to fill in the configuration file.
Q: I can't install the R package named 'rJava'.

A: Try apt-get install r-cran-rjava in the terminal.
Q: What do the result files mean?

A: Please see here

Taking code from GitHub and running on your data ：)

Citation

Sun P and Li G (2019) CircCode: A Powerful Tool for Identifying circRNA Coding Ability. Front. Genet. 10:981. doi: 10.3389/fgene.2019.00981

Acknowledgements

Zhang Jinwen’s ([email protected]) suggestion for the scripts was adopted.
luhan125’s suggestion for the scripts was adopted.

Contact us

If you encounter any problems while using CircCode, please send an email ([email protected] / [email protected]) or submit an issue on GitHub (https://github.com/Sunpeisen/circCode/issues), and we will resolve it as soon as possible.
