Usage: Methylation Data Matrix Generation
Command:
getMatrices.sh [OPTIONS] BAM_FILE CHR_NUM
This step takes the BAM file BAM_FILE as input and generates the methylation data matrix for chromosome number CHR_NUM. By default, BAM_FILE and its associated index file (extension .bai) are expected to be in BAMDIR, and the output file is stored in a subdirectory of INTERDIR named after the chromosome (e.g. chr10). The output file keeps the prefix of BAM_FILE and appends the suffix '_matrices.mat' to it: for example, if BAM_FILE is normal_sample.bam and CHR_NUM is 10, the output is saved as INTERDIR/chr10/normal_sample_matrices.mat. For each genomic region, the file contains the following information, which is subsequently used for model estimation:
- data matrix with -1,0,1 values for methylation status
- CpG locations broken down by region
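The file naming rules above can be sketched in bash as follows. This is a minimal illustration, assuming INTERDIR and BAMDIR are set as shell variables; `matrices_path` and `check_inputs` are hypothetical helpers, not part of informME, and the check accepts either `sample.bam.bai` or `sample.bai` because the text only says the index has extension .bai.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the documented getMatrices.sh file naming.

# Print the expected output path for a BAM file name and chromosome number.
# e.g. matrices_path normal_sample.bam 10 -> $INTERDIR/chr10/normal_sample_matrices.mat
matrices_path() {
    local bam="$1" chr="$2"
    local prefix
    prefix="$(basename "$bam" .bam)"   # strip any directory and the .bam suffix
    echo "${INTERDIR:?INTERDIR must be set}/chr${chr}/${prefix}_matrices.mat"
}

# Return 0 if the BAM file and a .bai index are present in BAMDIR.
# Accepts both sample.bam.bai and sample.bai as index names (assumption).
check_inputs() {
    local bam="$1"
    local dir="${BAMDIR:?BAMDIR must be set}"
    local prefix="${bam%.bam}"
    if [[ ! -f "$dir/$bam" ]]; then
        echo "missing BAM: $dir/$bam" >&2
        return 1
    fi
    if [[ -f "$dir/$bam.bai" || -f "$dir/$prefix.bai" ]]; then
        return 0
    fi
    echo "missing index for $dir/$bam" >&2
    return 1
}
```

Running `check_inputs` before submitting jobs catches a missing index early instead of partway through a cluster run.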
NOTE1: We recommend using the job array feature of SGE- and SLURM-based clusters to submit an individual job for each chromosome.
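A minimal SLURM job-array sketch for NOTE1, with one array task per chromosome. The array range 1-22 (autosomes only), the resource requests, and the BAM name sample_1.bam are assumptions to adapt. Under SGE the equivalent directive is `#$ -t 1-22`, with the task index in `$SGE_TASK_ID` and the slot count in `$NSLOTS`; the script falls back to those. `run_chr` is a hypothetical helper.

```bash
#!/usr/bin/env bash
#SBATCH --job-name=getMatrices
#SBATCH --array=1-22            # one task per chromosome (assumed: autosomes 1-22)
#SBATCH --cpus-per-task=5
#SBATCH --time=04:00:00

BAM="sample_1.bam"              # placeholder BAM name, expected in $BAMDIR

# Run getMatrices.sh for one chromosome, using as many threads as CPUs granted
# by SLURM (or slots granted by SGE), defaulting to 1.
run_chr() {
    local chr="$1"
    getMatrices.sh -q "${SLURM_CPUS_PER_TASK:-${NSLOTS:-1}}" "$BAM" "$chr"
}

# Take the task index from SLURM, falling back to SGE; do nothing if neither is set.
TASK_ID="${SLURM_ARRAY_TASK_ID:-${SGE_TASK_ID:-}}"
if [[ -n "$TASK_ID" ]]; then
    run_chr "$TASK_ID"
fi
```

Submit it once with `sbatch`, and SLURM launches 22 independent tasks, each processing a single chromosome.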
NOTE2: See reference [1], "Online Methods: Quality control and alignment" for our suggested preprocessing steps when generating a sorted, indexed, deduplicated BAM file to input to informME.
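A quick pre-flight check for the sorted, indexed input can be sketched as follows, assuming samtools is installed. A coordinate-sorted BAM normally declares `SO:coordinate` in its `@HD` header line, and `samtools index` creates the .bai file. The helper names are hypothetical. The check only inspects the header declaration and the presence of an index; it does not verify deduplication, which depends on the tool used in preprocessing.

```bash
#!/usr/bin/env bash
# Return 0 if a SAM header (read on stdin) declares coordinate sort order.
header_is_coord_sorted() {
    grep -q $'^@HD\t.*SO:coordinate'
}

# Pre-flight check for one BAM file (requires samtools on PATH).
check_bam() {
    local bam="$1"
    if ! samtools view -H "$bam" | header_is_coord_sorted; then
        echo "$bam: header does not declare SO:coordinate (try: samtools sort -o sorted.bam $bam)" >&2
        return 1
    fi
    if [[ ! -f "$bam.bai" && ! -f "${bam%.bam}.bai" ]]; then
        echo "$bam: no .bai index found (try: samtools index $bam)" >&2
        return 1
    fi
}
```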
NOTE3: Here is the full help file for getMatrices.sh:
Description:
This function takes a BAM file as input and generates a methylation data matrix for
a given chromosome. The BAM file is expected to be in BAMDIR, while the output
file is stored in INTERDIR by default. These directories can be modified via
optional arguments that can be passed to the function. The output file produced
contains the following information for each genomic region which is subsequently
used for model estimation:
o data matrix with -1,0,1 values for methylation status
o CpG locations broken down by region
Usage:
getMatrices.sh [OPTIONS] BAM_FILE CHR_NUM
Mandatory arguments:
o BAM_FILE: BAM file for which the methylation data matrix will be generated
o CHR_NUM: chromosome to be processed
Options:
-h|--help help
-r|--refdir directory of reference genome and CpG location files (default: $REFGENEDIR)
-b|--bamdir directory of BAM file (default: $BAMDIR)
-d|--outdir output directory (default: $INTERDIR)
-q|--threads number of threads used (default: 1)
-t|--trim number or vector of bases to be trimmed (default: 0)
-c|--c_string name convention for chromosomes: 0 => 'X'; 1 => 'chrX' (default: 1)
-p|--paired_ends type of reads: 0 => single-end; 1 => paired-end (default: 1)
--tmpdir directory of intermediate files (default: $SCRATCHDIR)
--time_limit maximum time (in minutes) allowed for each thread to complete (default: 60)
-l|--MATLICENSE path to MATLAB's license
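The `-c` value must match the chromosome naming convention, presumably the names used in the BAM header. The sketch below infers it from the first `@SQ SN:` entry (1 if the name starts with `chr`, 0 otherwise) and passes non-default directories through the documented `-r`, `-b`, `-d`, and `--tmpdir` options. It assumes samtools is installed. `c_string_from_header` and `run_with_custom_dirs` are hypothetical helpers, and all paths are placeholders.

```bash
#!/usr/bin/env bash
# Infer the -c value from a SAM header on stdin:
# prints 1 if the first @SQ sequence name starts with "chr", else 0.
c_string_from_header() {
    local sn
    sn=$(awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) { print substr($i, 4); exit } }')
    if [[ "$sn" == chr* ]]; then echo 1; else echo 0; fi
}

# Example invocation with non-default directories (placeholder paths).
run_with_custom_dirs() {
    local bam="$1" chr="$2"
    local bamdir="/data/bams" c
    c=$(samtools view -H "$bamdir/$bam" | c_string_from_header)
    getMatrices.sh -r /data/refgenome -b "$bamdir" -d /data/interm \
        --tmpdir /scratch/informME -c "$c" -q 4 "$bam" "$chr"
}
```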
Examples:
* Usage keeping default options (e.g., no trimming or multithreading):
getMatrices.sh sample_1.bam 1
* Trimming 10 bases on each read and using 5 threads:
getMatrices.sh -q 5 -t 10 sample_1.bam 1
* Trimming different bases on each read, using 5 threads, and 'chrX' naming convention:
getMatrices.sh -q 5 -t '[15,20]' -c 1 sample_1.bam 1
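Without a cluster, the same command can be looped over chromosomes on a single machine. The sketch below assumes chromosomes 1-22. It collects failures instead of stopping at the first one, so a single bad chromosome does not block the rest. `run_all` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Run getMatrices.sh for chromosomes 1-22 in turn and report any that failed.
# Usage: run_all sample_1.bam 5
run_all() {
    local bam="$1" threads="${2:-1}"
    local failed=()
    local chr
    for chr in {1..22}; do
        if ! getMatrices.sh -q "$threads" "$bam" "$chr"; then
            failed+=("$chr")
        fi
    done
    if (( ${#failed[@]} > 0 )); then
        echo "failed chromosomes: ${failed[*]}" >&2
        return 1
    fi
}
```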
Output:
MATLAB .mat file with suffix *_matrices.mat
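After a batch of jobs finishes, the documented output layout ($INTERDIR/chr<N>/<prefix>_matrices.mat) makes it easy to find chromosomes that still need to be rerun. This is a minimal sketch, assuming chromosomes 1-22. `missing_outputs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print the chromosomes (1-22) whose <prefix>_matrices.mat file is missing.
# Usage: missing_outputs sample_1 [OUTDIR]   (OUTDIR defaults to $INTERDIR)
missing_outputs() {
    local prefix="$1" outdir="${2:-$INTERDIR}"
    local chr missing=()
    for chr in {1..22}; do
        [[ -f "$outdir/chr$chr/${prefix}_matrices.mat" ]] || missing+=("$chr")
    done
    echo "${missing[*]}"
}
```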
Dependencies:
* MATLAB
* xargs
* timeout
* matrixFromBam.sh
* mergeMatrices.sh
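A quick way to confirm that the dependencies listed above are reachable is to check each one with `command -v`. The executable name `matlab` for MATLAB is an assumption, since the help text does not say how MATLAB is invoked. `missing_deps` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print (space-separated) any of the given commands that are not found on PATH.
# Usage: missing_deps matlab xargs timeout matrixFromBam.sh mergeMatrices.sh
missing_deps() {
    local cmd missing=()
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
    done
    echo "${missing[*]}"
}
```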
Upstream:
fastaTCpG.sh
Downstream:
informME_run.sh
Authors:
Garrett Jenkinson
Jordi Abante
If you use informME, please cite:
[1] Jenkinson, G., Pujadas, E., Goutsias, J., and Feinberg, A.P. (2017), Potential energy landscapes identify the information-theoretic nature of the epigenome, Nature Genetics, 49:719-729.
[2] Jenkinson, G., Abante, J., Feinberg, A.P., and Goutsias, J. (2018), An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data, BMC Bioinformatics, 19:87, https://doi.org/10.1186/s12859-018-2086-5.
[3] Jenkinson, G., Abante, J., Koldobskiy, M., Feinberg, A.P., and Goutsias, J. (2019), Ranking genomic features using an information-theoretic measure of epigenetic discordance, BMC Bioinformatics, 20:175, https://doi.org/10.1186/s12859-019-2777-6.