GitHub - dbgoodman/splicemod: A toolkit to modify and synthesize exons for splicing studies

Splicemod

A toolkit for scoring and modifying exons and their adjacent intronic boundaries.

Author: Daniel Bryan Goodman

Python Dependencies

We use Python 2.7.

The required Python packages are listed in requirements.txt and can be installed with:

pip install -r requirements.txt

Note that biopython MUST be version 1.57, which is quite old, because splicemod uses the deprecated motif package. We recommend installing the requirements in a Python virtualenv.
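Because the pin matters, it can help to check the installed version before running anything. A minimal sketch, assuming Biopython exposes its version as `Bio.__version__`; the helper names `biopython_ok` and `installed_biopython` are hypothetical and not part of splicemod:

```python
REQUIRED = "1.57"  # the version pinned in the note above


def biopython_ok(installed, required=REQUIRED):
    """Return True only when the installed version string matches the pin exactly."""
    return installed is not None and installed.strip() == required


def installed_biopython():
    """Return Bio.__version__, or None when Biopython cannot be imported."""
    try:
        import Bio
    except ImportError:
        return None
    # getattr guards against a release that does not define __version__ (assumption)
    return getattr(Bio, "__version__", None)


if __name__ == "__main__":
    version = installed_biopython()
    if biopython_ok(version):
        print("biopython %s found" % version)
    else:
        print("expected biopython %s, found %s" % (REQUIRED, version))
```

Run it inside the activated virtualenv so that it inspects the same packages splicemod will use.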

Bash commands for set-up

Read below for more info, but this set of commands should set up your virtual environment, install the packages, and download the wiggle tracks:

# choose a virtual environment dir and set it up
# (no space after "=", otherwise bash treats the path as a command to run)
venv_dir=~/.pyenv/splicemod-venv
virtualenv "$venv_dir"
source "$venv_dir/bin/activate"

# from the splicemod dir:
pip install -r requirements.txt
source scripts/get_wig.sh

Running splicemod

After installing the required data (see below) and python packages, Splicemod can be run from the base directory with the command:

`python src/ensembl.py`

This will write gbk/fas files for the natural and mutated exons to the data/ccds_ensembl dir. It might also be useful to save the output log to a file, like:

`python src/ensembl.py > data/2017.02.23.splicemod_output.txt`
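The dated log name can also be generated rather than typed by hand. A minimal sketch, assuming the `data/YYYY.MM.DD.splicemod_output.txt` pattern from the example above is the convention you want; `log_path` and `run_with_log` are hypothetical helpers, not part of splicemod:

```python
import datetime
import subprocess
import sys


def log_path(day):
    """Build a log file name that follows the dated pattern shown above."""
    return "data/%s.splicemod_output.txt" % day.strftime("%Y.%m.%d")


def run_with_log(day=None):
    """Run src/ensembl.py from the base directory, sending stdout to the dated log."""
    day = day or datetime.date.today()
    with open(log_path(day), "w") as handle:
        # sys.executable keeps the run inside the current (virtualenv) interpreter
        return subprocess.call([sys.executable, "src/ensembl.py"], stdout=handle)
```

Calling `run_with_log()` from the base directory has the same effect as the shell redirect, with today's date in the file name.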

Data required

- Motif definitions, which are included in the data/motifs dir.
- Ensembl MySQL database access (local or remote, see below).
- Wiggle tracks for conservation. These require approximately 5.3 GB and can be downloaded and indexed with an included bash script:

source scripts/get_wig.sh
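Since the download is large, it may be worth checking free disk space first. A minimal sketch, assuming a Unix-like system (it relies on `os.statvfs`) and reading the size quoted above as roughly 5.3 gigabytes; `free_gb` and `enough_room` are hypothetical helpers, and where the script stores the tracks is not specified here, so pass the directory you expect it to write to:

```python
import os


def free_gb(path="."):
    """Return the space available to the current user on path's filesystem, in GiB."""
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize / float(1024 ** 3)


def enough_room(path=".", needed_gb=5.3):
    """Return True when at least needed_gb is free on path's filesystem."""
    return free_gb(path) >= needed_gb


if __name__ == "__main__":
    print("free: %.1f GiB, enough for the wiggle tracks: %s"
          % (free_gb("."), enough_room(".")))
```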

Ensembl Database

Remote Ensembl Access

Ensembl can be accessed remotely; the host and port are set in src/cfg.py. The defaults currently work correctly, but a list of up-to-date URLs can be found here.
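If a remote run hangs or fails early, it can be useful to confirm that the configured host and port are reachable at all. A minimal sketch using only a TCP connection attempt; `can_connect` is a hypothetical helper, and the host and port in the usage comment are placeholders to be replaced with the values from src/cfg.py:

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True when a TCP connection to host:port opens within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


# Usage (placeholder values, substitute the ones from src/cfg.py):
#   can_connect("your.ensembl.host", 3306)
```

This only shows that something is listening on the port; it does not verify the MySQL login or that the expected database release is available.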

Local Ensembl Copy

A local copy of the Ensembl database can also be used for faster access. These directions are based on the Ensembl guide found here:

http://useast.ensembl.org/info/docs/webcode/mirror/install/ensembl-data.html

Download the Ensembl SQL files for both core and variation from the Ensembl FTP site and unzip them:

mkdir -p /path/to/ensembl_db_dir/core
mkdir -p /path/to/ensembl_db_dir/variation
# -nd saves the files directly in the current dir instead of recreating the FTP path
cd /path/to/ensembl_db_dir/core
wget -r -nd ftp://ftp.ensembl.org/pub/release-78/mysql/homo_sapiens_core_78_38/
gunzip *.gz
cd /path/to/ensembl_db_dir/variation
wget -r -nd ftp://ftp.ensembl.org/pub/release-78/mysql/homo_sapiens_variation_78_38/
gunzip *.gz

Install MySQL if it is not already installed, and create a database in the mysql console:

create database homo_sapiens_core_78_38;

Then load the schema. I created a user called ensembl and gave it full access to the new db.

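# the directories already exist from the download step, so change into core
cd /path/to/ensembl_db_dir/core
mysql -u ensembl homo_sapiens_core_78_38 < homo_sapiens_core_78_38.sql
# note: the variation dump in /path/to/ensembl_db_dir/variation is not loaded by these commands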

Then load all the txt data into the new schema. This takes a while.

mysqlimport -u ensembl --fields_escaped_by=\\ homo_sapiens_core_78_38 -L *.txt
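mysqlimport decides which table to load from each file name with its extension stripped, so loading the files one at a time makes it easier to see which table a failure belongs to. A minimal sketch that only builds the per-file commands; `table_name` and `import_commands` are hypothetical helpers, and the database and user defaults simply repeat the ones used above:

```python
import glob
import os


def table_name(txt_path):
    """Return the table mysqlimport would target for this file (name minus extension)."""
    return os.path.splitext(os.path.basename(txt_path))[0]


def import_commands(directory, database="homo_sapiens_core_78_38", user="ensembl"):
    """Build one mysqlimport argument list per .txt file in directory."""
    commands = []
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        # With no shell involved, a single backslash is passed as the escape character.
        commands.append(["mysqlimport", "-u", user, "--fields_escaped_by=\\",
                         database, "-L", path])
    return commands
```

Each list can be passed to `subprocess.check_call`, printing `table_name(cmd[-1])` beforehand to track progress through the long import.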

Cached Exon List

To speed things up, we cache a copy of all exons in the file data/ccds_ensembl/78_38_CCDS_exons.all.txt. This file was generated with the MySQL query in the get_ccds_exons() function in ensembl.py, which is hard coded with an exon size of 100. The easiest way to regenerate the file is to run that query in a program like Sequel Pro and copy the result into a text file whose name is set in cfg.ens_exon_fn.
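The caching idea itself is a plain read-if-present, otherwise-regenerate pattern. A minimal sketch of that pattern, not of splicemod's actual code: the tab-separated layout is an assumption (the real file format is not described here), and `load_cached_rows` and its `regenerate` callback are hypothetical names:

```python
import os


def load_cached_rows(path, regenerate):
    """Return rows from the cache file, creating it with regenerate() when missing."""
    if not os.path.exists(path):
        # regenerate() stands in for the slow database query
        rows = regenerate()
        with open(path, "w") as handle:
            for row in rows:
                handle.write("\t".join(str(field) for field in row) + "\n")
    with open(path) as handle:
        # every field comes back as a string, whether freshly written or cached
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]
```

Deleting the cache file forces the next call to run the query again, which is the simplest way to refresh it after changing the query.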
