###############################################################################################
##Package: Metagenomic Complex Sequence Scanning Tool (MetaCSST) ##
##Developer: Fazhe Yan ##
##Email: fazheyan33@163.com ; ccwei@sjtu.edu.cn ##
##Department: Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University ##
###############################################################################################
##################
## Introduction ##
##################
Metagenomic Complex Sequence Scanning Tool (MetaCSST) is a tool for predicting diversity-generating retroelements (DGRs) in sequenced genomes and in metagenomic datasets. It is based on a Generalized Hidden Markov Model (GHMM) and uses motif patterns to identify the elements of a DGR.
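MetaCSST's GHMM is not reproduced here. As a rough intuition for how a hidden Markov model assigns a label to each position of a sequence, the sketch below runs Viterbi decoding on an ordinary two-state HMM (background vs. an A/T-rich "motif" region). The states, probabilities and the `viterbi` function are illustrative assumptions, not MetaCSST parameters; a generalized HMM additionally models how long each state lasts instead of deciding one base at a time.

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable state path for seq under a plain (non-generalized) HMM, in log space."""
    # V[i][s]: best log-probability of any path that ends in state s at position i
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for i in range(1, len(seq)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(((p, V[i - 1][p] + math.log(trans_p[p][s])) for p in states),
                              key=lambda t: t[1])
            V[i][s] = score + math.log(emit_p[s][seq[i]])
            back[i][s] = prev
    # Trace back from the best final state to recover the full path
    path = [max(states, key=lambda s: V[-1][s])]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy model: "bg" = background, "motif" = A/T-rich region
states = ("bg", "motif")
start_p = {"bg": 0.9, "motif": 0.1}
trans_p = {"bg": {"bg": 0.8, "motif": 0.2}, "motif": {"bg": 0.2, "motif": 0.8}}
emit_p = {"bg": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
          "motif": {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45}}

if __name__ == "__main__":
    # prints 4 x 'bg', then 8 x 'motif', then 4 x 'bg'
    print(viterbi("GCGCATATATATGCGC", states, start_p, trans_p, emit_p))
```

The central ATATATAT stretch is labelled `motif` because its emission gain outweighs the cost of entering and leaving that state.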
###############
## Copyright ##
###############
This software is free for personal, academic and non-profit use and can be downloaded from https://github.com/fzyan/MetaCSST.
For commercial users, please contact <ccwei@sjtu.edu.cn>.
#########################
## System requirements ##
#########################
Linux operating system, with at least 2 GB of memory to run multiple threads.
Python 3.9+ (numpy, numba, pyfastx) and gcc with C++20 support.
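A quick pre-flight check for the requirements listed above might look like the sketch below. It is not part of MetaCSST: the function name `check_requirements` is hypothetical, and it only confirms that `g++` is on the PATH, not that the installed version supports C++20.

```python
import importlib.util
import shutil
import sys

def check_requirements(modules=("numpy", "numba", "pyfastx"), compiler="g++"):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ required, found {sys.version.split()[0]}")
    for name in modules:
        # find_spec returns None when the top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    # Presence only: C++20 support still has to be confirmed by compiling
    if shutil.which(compiler) is None:
        problems.append(f"compiler not found on PATH: {compiler}")
    return problems

if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("All listed requirements were found.")
```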
###########
## Usage ##
###########
1>Identify substructures of DGRs (TR: template repeat, VR: variable repeat, RT: reverse transcriptase):
./MetaCSSTsub -build config.json -in $fa [-out $out_dir] [-thread $thread]
# $fa : input file in FASTA format
# $out_dir : output directory (default: "out_metacsst")
# $thread : number of threads (default: 1)
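When MetaCSSTsub is called from a larger Python workflow, the command line can be assembled from the flags documented above. This is a sketch: `build_sub_command` is a hypothetical helper, and `input.fa` / `out_sub` are placeholder names. Optional flags are left out when they match the documented defaults.

```python
import shlex
from typing import List, Optional

def build_sub_command(fasta: str, config: str = "config.json",
                      out_dir: Optional[str] = None, threads: int = 1,
                      exe: str = "./MetaCSSTsub") -> List[str]:
    """Assemble a MetaCSSTsub argument list using only the documented flags."""
    cmd = [exe, "-build", config, "-in", fasta]
    if out_dir is not None:   # omitted: the tool writes to "out_metacsst"
        cmd += ["-out", out_dir]
    if threads != 1:          # omitted: the tool uses 1 thread
        cmd += ["-thread", str(threads)]
    return cmd

if __name__ == "__main__":
    # "input.fa" and "out_sub" are placeholders
    # prints: ./MetaCSSTsub -build config.json -in input.fa -out out_sub -thread 4
    print(shlex.join(build_sub_command("input.fa", out_dir="out_sub", threads=4)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` runs the tool and raises `CalledProcessError` if it exits with a non-zero status.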
2>DGR prediction:
Step1: ./MetaCSSTmain -build config.json -in $fa [-out $out_dir1] [-thread $thread]
# identify the substructures with the GHMM
Step2: python3 src/call_vr.py $out_dir1/raw.gtf $fa $out-DGR
# call VRs and remove duplicate TR-VR pairs
# Note: legacy *.config files are no longer supported.
# Use a single config.json, config.toml or config.yaml file instead.
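The two steps depend on each other (Step2 reads the `raw.gtf` written by Step1), so a wrapper should stop if Step1 fails. The sketch below is a hypothetical helper, not part of MetaCSST. The third `call_vr.py` argument is passed through unchanged as `dgr_out`; whether the script treats it as a file name or a prefix (the OUT files section lists `out-DGR.gtf`) is an open assumption, so check `src/call_vr.py`. It only prints the commands unless `dry_run=False`.

```python
import shlex
import subprocess
from typing import List

def dgr_pipeline_commands(fasta: str, out_dir1: str, dgr_out: str,
                          config: str = "config.json",
                          threads: int = 1) -> List[List[str]]:
    """Return the Step1 and Step2 command lines from the usage above, in order."""
    step1 = ["./MetaCSSTmain", "-build", config, "-in", fasta,
             "-out", out_dir1, "-thread", str(threads)]
    # Step2 consumes the GHMM hits that Step1 writes to <out_dir1>/raw.gtf;
    # dgr_out is forwarded as-is (its exact meaning is an assumption)
    step2 = ["python3", "src/call_vr.py", f"{out_dir1}/raw.gtf", fasta, dgr_out]
    return [step1, step2]

def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each step, and also execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True aborts before Step2 if Step1 exits with an error,
            # so call_vr.py never reads a missing or partial raw.gtf
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Placeholder names; pass dry_run=False to actually run both steps
    run_pipeline(dgr_pipeline_commands("input.fa", "out_tmp1", "out-DGR", threads=4))
```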
###############
## OUT files ##
###############
1>Identify substructures (TR, VR or RT) of DGRs:
$out_dir/out.txt : identified substructures
$out_dir/align.txt : count matrix for each position, used to build the PWMs
$out_dir/score.txt : PWMs (position weight matrices used for scoring)
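The step from a per-position count matrix (align.txt) to a scoring matrix (score.txt) is commonly a log-odds transform. The sketch below shows that textbook conversion; it is not necessarily MetaCSST's formula. The layout of align.txt is not documented here, so the input is assumed to be one A/C/G/T count dictionary per motif position, and the pseudocount of 1 and the uniform background are also assumptions. The counts in the demo are made up.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"

def counts_to_pwm(counts: List[Dict[str, float]], pseudocount: float = 1.0,
                  background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Turn per-position base counts into a log2-odds position weight matrix."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        # Pseudocounts keep bases never seen at a position from giving log(0)
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm

def score_window(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum the per-position log-odds; only A/C/G/T are handled (N raises KeyError)."""
    return sum(pwm[i][base] for i, base in enumerate(window.upper()))

def best_hit(pwm: List[Dict[str, float]], seq: str) -> Tuple[int, float]:
    """Return (0-based start, score) of the best window; seq must be at least motif-length."""
    width = len(pwm)
    return max(((i, score_window(pwm, seq[i:i + width]))
                for i in range(len(seq) - width + 1)), key=lambda hit: hit[1])

if __name__ == "__main__":
    # Made-up counts for a 3-position motif close to "ACG", not MetaCSST data
    counts = [{"A": 8, "C": 0, "G": 1, "T": 1},
              {"A": 0, "C": 9, "G": 1, "T": 0},
              {"A": 1, "C": 0, "G": 9, "T": 0}]
    print(best_hit(counts_to_pwm(counts), "TTACGTT"))  # best window starts at index 2
```

Positive scores mean a window fits the motif better than the background model does; a score threshold would then decide which windows count as hits.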
2>DGR prediction:
Step1:
$out_dir1/raw.gtf : TRs and RTs identified by the GHMM
Step2:
out-DGR.gtf : final DGR predictions generated by call_vr.py
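To post-process raw.gtf or out-DGR.gtf, a small reader is enough, assuming the files follow the standard 9-column, tab-separated GTF layout. The feature names and attribute keys that MetaCSST actually writes are not documented here, so the sample line in the demo is invented, and `read_gtf` / `GtfRecord` are hypothetical names.

```python
from typing import Dict, Iterable, Iterator, NamedTuple

class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int          # 1-based, inclusive (GTF convention)
    end: int            # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]

def parse_attributes(field: str) -> Dict[str, str]:
    """Parse 'key "value"; key2 "value2";' into a dict."""
    attrs = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs

def read_gtf(lines: Iterable[str]) -> Iterator[GtfRecord]:
    """Yield one record per 9-column, tab-separated line; skip comments and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a standard GTF line
        yield GtfRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                        cols[5], cols[6], cols[7], parse_attributes(cols[8]))

if __name__ == "__main__":
    # Invented example line; real MetaCSST features and attributes may differ
    sample = 'contig_1\tMetaCSST\tTR\t1200\t1315\t.\t+\t.\tgene_id "DGR_1";\n'
    for rec in read_gtf([sample]):
        print(rec.seqname, rec.feature, rec.start, rec.end, rec.attributes)
```

For a real run, pass an open file: `with open("out-DGR.gtf") as fh: records = list(read_gtf(fh))`.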
###########
## Files ##
###########
|-MetaCSSTmain executable program to predict DGRs
|-MetaCSSTsub executable program to identify TRs, VRs or RTs
|-config.json / config.toml / config.yaml single-file GHMM configuration (JSON, TOML or YAML)
|-align/*align alignment matrices used to build the GHMM
|-src/main_modern.cpp source code to build MetaCSSTmain
|-src/sub_modern.cpp source code to build MetaCSSTsub
|-src/ghmm_modern.hpp GHMM core
|-src/fun_modern.hpp utility functions
|-src/config_modern.hpp config parsing utilities
|-src/call_vr.py VR calling + duplicate removal
|-addition/* collected/training/test data
|-example and example.sh example pipeline to identify DGRs
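MetaCSST parses its configuration in C++ (src/config_modern.hpp). For inspecting or generating the same single-file configs from Python, a loader can pick the parser from the file extension and reject legacy `*.config` files, as the usage notes require. This is a sketch with hypothetical function names; the config keys themselves are not documented here. TOML parsing uses `tomllib` (standard library from Python 3.11), and YAML assumes the third-party PyYAML package is installed.

```python
import json
import sys
from pathlib import Path

SUPPORTED = {".json": "json", ".toml": "toml", ".yaml": "yaml"}

def config_format(path: str) -> str:
    """Map a config file name to its format; reject legacy *.config files."""
    suffix = Path(path).suffix.lower()
    if suffix == ".config":
        raise ValueError("legacy *.config files are no longer supported")
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported config format: {path}")
    return SUPPORTED[suffix]

def parse_config(text: str, fmt: str) -> dict:
    """Parse config text in the given format into a dict."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "toml":
        if sys.version_info < (3, 11):
            raise RuntimeError("tomllib needs Python 3.11+")
        import tomllib                 # standard library since Python 3.11
        return tomllib.loads(text)
    if fmt == "yaml":
        import yaml                    # assumption: PyYAML is installed
        return yaml.safe_load(text)
    raise ValueError(f"unknown format: {fmt}")

def load_config(path: str) -> dict:
    """Read a config file and parse it according to its extension."""
    return parse_config(Path(path).read_text(encoding="utf-8"), config_format(path))
```

Usage: `load_config("config.json")` returns the parsed dictionary, while `load_config("old.config")` raises `ValueError`.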
##################
## Installation ##
##################
MetaCSSTmain and MetaCSSTsub are ready-to-run executables.
To modify the code and recompile them, run:
g++ -std=c++20 -O2 -Wall -Wextra -pthread src/main_modern.cpp -o MetaCSSTmain
g++ -std=c++20 -O2 -Wall -Wextra -pthread src/sub_modern.cpp -o MetaCSSTsub
#############
## Contact ##
#############
If you have any questions, feel free to contact us:
fazheyan33@163.com
ccwei@sjtu.edu.cn