Skip to content

Latest commit

 

History

History
499 lines (357 loc) · 20.3 KB

README.md

File metadata and controls

499 lines (357 loc) · 20.3 KB

Table of Contents

ViewBS

Workflow of ViewBS

ViewBS has several top level commands which determine the required and optimal arguments. These top level commands can be divided into two parts: methylation report and data visualization of functional regions.

Methylation report part has several different top commands which can generate report about read coverage, distribution of methylation level, global methylation leve, etc.

The part of visualization for functional regions also has several different top commands. For ViewBS, the first input that users should provide is the regions of interest. These regions could be functional elements, like genes, transposable elements (TE), or differentially methylated regions (DMR). The other type of input that the users should provide is the methylation information. Methylation information are the outputs from BS-seq aligner, like Bismark, etc.

Here is the workflow of ViewBS:

The workflow of ViewBS commands

Installation

Installation via conda [recommended]

Special thanks to @xuzhougeng for writing bioconda recipe for ViewBS (#57).

Install conda

First you need to install miniconda following the instructions here: https://conda.io/en/latest/miniconda.html

Scenario 1

If you want to install ViewBS in an existing conda environment, please run:

conda activate <your_environment_name>
## If you want to install a specific verison, 
## replace 'viewbs' with 'viewbs=<version_number>' (e.g. 'viewbs=0.1.10')
conda install -c bioconda viewbs

Scenario 2

If you want to install ViewBS in a new conda environment, please run:

## You can change `env4viewbs` to other name you want
conda create -n env4viewbs -c bioconda viewbs
## To activate the environment
conda activate env4viewbs

Installation with Docker

docker pull xie186/viewbs
## Use "docker run <image name>" ViewBS" to replace "ViewBS". Here is an example:
cd ViewBS_testdata/
docker run -v ${PWD}:/data -w /data bc1743f3418f ViewBS MethOneRegion --region chr5:19497000-19499600 --sample bis_WT.tab.gz,WT --sample bis_cmt23.tab.gz,cmt23 --outdir MethOneRegion --prefix chr5_19497000-19499600 --context CHG

## 1) ${PWD}:/data: means mount the current directory to /data in Docker image
## 2) bc1743f3418f: IMAGE ID (run `docker image ls` to get the IMAGE ID). 

Because docker needs root access, sometimes it's not available. But singularity is an alternative software. Please see the link here for details: https://github.com/xie186/ViewBS/wiki/Run-ViewBS-with-%60Docker%60-or-%60Singularity%60#singularity

Installation of dependencies step by step

Download the lastest version:

https://github.com/xie186/ViewBS/releases/latest

To make the installation of dependencies easier, a script was developped. perl INSTALL.pl can be used as a helper to install and check the dependencies.

You can also install it step by step shown as below:

  1. Install htslib

  2. Perl version: >v5.8.7

  3. Perl packages:

    • Getopt::Long::Subcommand - Process command-line options, with subcommands and completion
    • Bio::DB::HTS::Tabix - Object oriented access to the underlying tbx C methods
    • Bio::SeqIO - Handler for SeqIO Formats
    wget https://raw.githubusercontent.com/xie186/ViewBS/master/ext_tools/cpanm
    chmod 755 cpanm
    cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
    ./cpanm Getopt::Long::Subcommand
    ./cpanm Getopt::Long (> 2.50)
    ./cpanm Bio::DB::HTS::Tabix 
    ./cpanm Bio::SeqIO
    
  4. R version: > 3.3.0

  5. R packages

    • ggplot2
    • pheatmap
    • reshape2
    • cowplot

    Install the required libraries in R:

    install.packages("ggplot2", dep=T)
    install.packages("cowplot", dep=T)
    install.packages("pheatmap", dep=T)
    install.packages("reshape2", dep=T)
    

Preparation of input files

  • Input file: Genome-wide cytosine methylation report

ViewBS uses Genome-wide cytosine methylation report as input file. It is sorted by chromosomal coordinates but also contains the sequence context and is in the following format:

<chromosome> <position> <strand> <count methylated> <count unmethylated> <C-context> <trinucleotide context>

NOTES: If you use other tools rather than Bismark to generate the methylation information, you can still use ViewBS. You have two ways to use ViewBS: 1) if you have the bam file (e.g. bam file generated by bwa-meth), you can use MethylDackel with '--cytosine_report' to output the methylation information in Genome-wide cytosine methylation report format; 2) We also include the scripts to convert the results of other tools (BSseeker and Brat) https://github.com/xie186/ViewBS/tree/master/lib/scripts to Genome-wide cytosine methylation report format. If the script for your tool is not included, please feel free to contact us at [email protected]

Please see details in Bismark websites.

Tips: how to generate Genome-wide Cytosine Methylation Report

If you already have finished the mapping using Bismark, you should have a sam/bam file. Let's say you have a sam file named test.sam. What you can do to generate Genome-wide Cytosine Methylation Report is:

### This step will generate several files:
bismark_methylation_extractor --bedGraph --CX test.sam
### This step will generate a file named bis_test.tab
coverage2cytosine -CX -o test.bis_rep.cov --genome_folder ara/ test.bismark.cov

*For BS-seq that is processed by Bismark but by other tools like BRAT, BS seeker2, ViewBS provides supports to convert DNA methylation data in other format to the format of genome-wide cytosine methylation report. Supports for other tools will be developed upon requests from the users. If you have DNA methylation data generated by other tools and you have difficulties on converting the data format, just give a post in the issuse. We're happy to add new functions for the file format conversion. *

For details, please see the link below: https://github.com/xie186/ViewBS/wiki/Support-for-nonBismark-results

  • Tabix indexing

Since ViewBS uses Bio::DB::HTS::Tabix to quickly retrieves information from the input (TAB-delited) files, the Genome-wide Cytosine Methylation Report files should be bgzipped and tabix indexed. bgzip and tabix .

Note: tabix and bgzip binaries are now part of the HTSlib project. https://github.com/samtools/htslib

Here is an example:

bgzip test.bis_rep.cov            ## test.bis_rep.cov.gz will be generated. Note: test.bis_rep.cov shoud be sorted based on chromosome coordinates. 
tabix -C -p vcf test.bis_rep.cov.gz  ## test.bis_rep.cov.gz.csi will be generated. Now test.bis_rep.cov.gz can be used as input for ViewBS. 
## If there is no chromosome length beyond (2^29-1), you can also run: 
tabix -p vcf test.bis_rep.cov.gz  ## test.bis_rep.cov.gz.tbi will be generated. Now test.bis_rep.cov.gz can be used as input for ViewBS.

USAGE

Download test data

https://gitlab.com/BS-seq/ViewBS_testdata

Top commands of ViewBS

MethCoverage

An Example of Reverse Cumulative Plot with x-axis representing the coverage of BS-seq.

To generate the figure above, use the command shown as below:

ViewBS MethCoverage --reference TAIR10_chr_all.fasta --sample bis_WT.tab.gz,WT --sample bis_cmt23.tab.gz,cmt23 --sample bis_cmt2-3.tab.gz,cmt2-3 --sample bis_drm12cmt23.tab.gz,drm12cmt12 --sample bis_drm12cmt2.tab.gz,drm12cmt2 --outdir methCoverage --prefix cmt2_proj_allsam

Under methCoverage folder, there will be three files generated.

  • Table for global methylation level.
Sample Context Depth Percentage
cmt2-3 CG 1 93.3323115145888
cmt2-3 CG 2 91.6474703919394
... ... ... ...
... ... ... ...
WT CG 1 93.8364493009668
  • A shell script which can re-generate the figure in PDF file.
  • A figure in PDF file.

BisNonConvRate

An Example of BisNonConvRate

To generate the figure above, use the command shown as below:

ViewBS BisNonConvRate --chrom chrC --sample bis_WT.tab.gz,WT --sample bis_cmt23.tab.gz,cmt23 --sample bis_cmt2-3.tab.gz,cmt2-3 --sample bis_drm12cmt2.tab.gz,drm12cmt2 --sample bis_drm12cmt23.tab.gz,drm12cmt23 --outdir BisNonConvRate --prefix cmt2_proj_allsam

Under BisNonConvRate, there will be three files generated.

  • Table for global methylation level.
Sample BisNonConvRate
cmt2-3 0.053
drm12cmt2 0.048
drm12cmt12 0.040
cmt23 0.046
WT 0.075
  • A shell script which can re-generate the figure in PDF file.
  • A figure in PDF file.

GlobalMethLev

An Example of GlobalMethLev

To generate the figure above, use the command shown as below:

ViewBS GlobalMethLev --sample bis_WT.tab.gz,WT --sample bis_cmt23.tab.gz,cmt23 --sample bis_cmt2-3.tab.gz,cmt2-3 --sample bis_drm12cmt2.tab.gz,drm12cmt2 --sample bis_drm12cmt23.tab.gz,drm12cmt23 --outdir methGlobal --prefix cmt2_proj_allsam

Under methGlobal, there will be three files generated.

  • Table for global methylation level.
Sample CG CHG CHH
cmt2-3 0.227 0.062 0.010
drm12cmt2 0.220 0.058 0.005
cmt23 0.224 0.009 0.011
drm12cmt23 0.219 0.004 0.005
WT 0.245 0.079 0.029
  • A shell script which can re-generate the figure in PDF file.
  • A figure in PDF file.

MethLevDist

An Example of MethLevDist

To generate the figure above, use the command shown as below:

ViewBS.pl MethLevDist --sample bis_WT.tab.gz,WT --sample bis_cmt23.tab.gz,cmt23 --sample bis_cmt2-3.tab.gz,cmt2-3 --sample bis_drm12cmt23.tab.gz,drm12cmt12 --sample bis_drm12cmt2.tab.gz,drm12cmt2 --outdir methLevDist --prefix cmt2_proj_allsam --binMethLev 0.1
  • Table for numbers and percentages of sites in each methylation level bin.
Sample Context MethLevBinMidPoint Number Percentage
cmt2-3 CG 0.05 3305969 12.83
cmt2-3 CG 0.15 62823 0.24
cmt2-3 CG 0.25 25182 0.09
... ... ... ... ..
WT CG 0.05 3470693 13.73
  • A shell script which can re-generate the figure in PDF file.
  • A figure in PDF file.

MethGeno

An example of MethGeno

To generate the figure above, use the command shown as below:

ViewBS MethGeno --genomeLength TAIR10_chr_all.fasta.fai --sample bis_WT.tab.gz,WT --sample bis_cmt23.tab.gz,cmt23 --sample bis_cmt2-3.tab.gz,cmt2-3 --sample bis_drm12cmt2.tab.gz,drm12cmt2 --sample bis_drm12cmt23.tab.gz,drm12cmt23 --prefix bis_geno_sample --context CHH

Note: fai file can generated by samtools: samtools faidx TAIR10_chr_all.fasta

View MethHeatmap

Region file format:

  • 1st column: chromsome ID;
  • 2nd column: start position;
  • 3rd column: end position;
  • 4th column: region ID

Note: If the file has 4th column, each row in this column should be unique.

An example of MethHeatmap

To generate the figure above, use the command shown as below:

ViewBS MethHeatmap --region CHG_hypo_DMR_drm12cmt23_to_WT.txt --sample bis_WT.tab.gz,WT --sample bis_drm12cmt23.tab.gz,drm12cmt23 --sample bis_cmt23.tab.gz,cmt23 --sample bis_cmt2-3.tab.gz,cmt2-3 --sample bis_drm12cmt2.tab.gz,drm12cmt2 --prefix CHG_hypo_DMR_drm12cmt23_to_WT --context CHG --outdir MethHeatmap

MethOverRegion

An example of MethOverregion

ViewBS MethOverRegion --region TAIR10_Transposable_Elements.chr1.bed --sample bis_WT.tab.gz,WT --sample bis_cmt23.tab.gz,cmt23 --sample bis_cmt2-3.tab.gz,cmt2-3 --sample bis_drm12cmt2.tab.gz,drm12cmt2 --sample bis_drm12cmt23.tab.gz,drm12cmt23 --prefix bis_TE_chr1_sample --context CHG

Besides providing sample and region information in the commind line, you can also read the information from a TEXT file. For example, if you are interested in more than one group of genes and you want to study the differences of DNA methylation patterns in the one sample, the methylation information can also be read from a TEXT file. Instead of giving an explicit sample information pairs, you need to write "file:" followed by the name of the TEXT file. In this case, you can only use --sample once and you cann't use --region anymore.

The TEXT file should follow the following format:

#MethReportFile LegendName RegionFile
DNAmethylation RegionName1 Region_file2

Here is an example:

ViewBS MethOverRegion --sample file:sampl_info_tab.txt --prefix bis_gene_5rank --context CG --outdir MethOverRegion

The genes were devided into quintiles based on gene expression level. Rank1 group was the group with lowest expression level. Users can use this method to study the correlation between DNA methylation and gene expression.

#DNAmethylationData Region RegionFile
bis_WT.tab.gz Rank1 TAIR10_GFF3_genes.WT.rank1.tab
bis_WT.tab.gz Rank2 TAIR10_GFF3_genes.WT.rank2.tab
bis_WT.tab.gz Rank3 TAIR10_GFF3_genes.WT.rank3.tab
bis_WT.tab.gz Rank4 TAIR10_GFF3_genes.WT.rank4.tab
bis_WT.tab.gz Rank5 TAIR10_GFF3_genes.WT.rank5.tab
  • Input format for the region file
  1. Chromosome ID
  2. Start postion
  3. End position
  4. ID
  5. Strand ("+" or "-")

See an example here: https://gitlab.com/BS-seq/ViewBS_testdata/blob/master/testdata/TAIR10_Transposable_Elements.chr1.bed

Here is the figure generated by the command line above:

image

MethOneRegion

View MethOneRegion will output the methylation information for one region give by the users and then plot the methylation levels across the chromsomesome region.

Here is an example:

image

To generate the figure above, you can use the following command line:

ViewBS MethOneRegion --region chr5:19499001-19499600 --sample bis_WT.tab.gz,WT --sample bis_cmt23.tab.gz,cmt23 --prefix chr5_19499001-19499600 --context CHG

How to merge figures into one graph

In ViewBS, all the figure objects will be saved into RDS files. The users can restore the RDS files and merge the figures into one graph.

There are two ways to do this: 1) use the helper script named mer_fig.R ; 2) the users can write R script to read the RDS files and merge the figures into one graph with cowplot.

1. Use the R script in ViewBS

Example as below:

cd $PATH2testdata
Rscript ../../lib/scripts/mer_fig.R --input BisNonConvRate/cmt2_proj_allsam.tab.rds,MethGlobal/cmt2_proj_allsam.tab.rds,MethHeatmap/CHG_hypo_DMR_drm12cmt23_to_WT_MethHeatmap_CHG.pdf.rds --output testplot_col2.pdf --aspect_ratio 1.5 --ncol 2

Please see the following for the help information:

$ Rscript ../../lib/scripts/mer_fig.R -h

USAGE
    Usage: Rscript mer_fig.R --input <fig1.rds,fig2.rds> --labels <A,B,C,D> [options]

DESCRIPTION
    mer_fig.R is developed to merge figures into on graph.

Options
    -help | -h
            Prints the help message and exits.

    --input [required]
           - RDS files. <fig1.rds,fig2.rds...>

    --labels [optional]
           - Labesl for each figure. Default: <A,B,C,D...>

    --output [optional]
           - Output files for the graph. Default: cowplot_mer_fig.pdf

    --ncol [optional]
           - Number of columns on the graph.

    --base_height [optional]
           - The height (in inches) of each sub-plot

    --base_aspect_ratio [optional]
           -  The aspect ratio of each sub-plot. Default: 1.6

Error: Please check the help information!
Execution halted

2. Use the template below to merge multiple figures into one graph.

library(cowplot) # https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html
p1 <- readRDS("BisNonConvRate/cmt2_proj_allsam.tab.rds")
p2 <- readRDS("MethGlobal/cmt2_proj_allsam.tab.rds")

plot2by2 <- plot_grid(p1, p2,
                      labels=c("A", "B"), ncol = 2)

save_plot("plot2by2.pdf", plot2by2,
          ncol = 2, # we're saving a grid plot of 2 columns
          #nrow = 2, # and 2 rows
          # each individual subplot should have an aspect ratio of 1.3
          base_aspect_ratio = 2
          )

Here is how plot2by2.png looks like:

Further improvement of the graph can be done in Inkscape if a PDF file was generated.

Where to find help

If you have bugs, feature requests, please report the issues here: (https://github.com/readbio/ViewBS/issues).

Commercial use

ViewBS uses GNU GPLv3 and is free for use by academic users. If you want to use it in commercial settings, please contact us.

How to cite

Xiaosan Huang, Shaoling Zhang, Kongqing Li, Jyothi Thimmapuram, Shaojun Xie; ViewBS: a powerful toolkit for visualization of high-throughput bisulfite sequencing data, Bioinformatics, , btx633, https://doi.org/10.1093/bioinformatics/btx633

Authors

  • Nanjing Agricultural University

Drs. Xiaosan Huang ([email protected]), Kong-Qing Li ([email protected]) and Shaoling Zhang ([email protected]).

  • Purdue Univeristy

Drs. Shaojun Xie: (Email: [email protected]) and Jyothi Thimmapuram ([email protected])