qiyuanhuakai/MetaCSST
###############################################################################################
##Package: Metagenomic Complex Sequence Scanning Tool (MetaCSST)                             ##
##Developer: Fazhe Yan                                                                       ##
##Email: fazheyan33@163.com ; ccwei@sjtu.edu.cn                                              ##
##Department: Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University  ##
###############################################################################################

##################
## Introduction ##
##################

Metagenomic Complex Sequence Scanning Tool (MetaCSST) is a tool to predict diversity-generating
retroelements (DGRs) in sequenced genomes as well as metagenomic datasets. It is based on a
Generalized Hidden Markov Model (GHMM) and uses motif patterns to identify the elements in DGRs.

###############
## Copyright ##
###############

This software is free for personal, academic and non-profit use from
https://github.com/fzyan/MetaCSST (GitHub website).
For commercial use, please contact <ccwei@sjtu.edu.cn>.

#########################
## System requirements ##
#########################

Linux operating system; 2 GB of memory is needed to use multiple threads.
Python 3.9+ (with numpy, numba and pyfastx) and gcc with C++20 support.

###########
## Usage ##
###########

1>Identify sub structures (TR, VR or RT) in DGRs:
   ./MetaCSSTsub -build config.json -in $fa [-out $out_dir] [-thread $thread]
   # $fa      : input file in FASTA format
   # $out_dir : output directory. If not given, the default output directory is "out_metacsst"
   # $thread  : number of threads, default 1
   # TR = template repeat, VR = variable repeat, RT = reverse transcriptase

2>DGR prediction
   Step1: ./MetaCSSTmain -build config.json -in $fa [-out $out_dir1] [-thread $thread]
          # Identification of the sub structures using GHMM
   Step2: python3 src/call_vr.py $out_dir1/raw.gtf $fa $out-DGR
          # calling VRs and removing duplicate TR-VR pairs

   # Note: legacy *.config files are no longer supported.
   # Use single-file config.json / config.toml / config.yaml only.

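The two DGR prediction steps above can be chained from a small Python helper. This is only a sketch: the function name build_dgr_commands and the file names genome.fa, out_dir1 and out-DGR are placeholders chosen for illustration, and it assumes the third argument of call_vr.py is an output prefix (suggested by the out-DGR.gtf file name, but not confirmed here). It only assembles the command lines shown in the Usage section; it does not run them.

```python
import shlex
from pathlib import Path


def build_dgr_commands(fasta, out_dir, out_prefix, config="config.json", threads=1):
    """Return the argv lists for Step1 and Step2 of the DGR prediction usage.

    All argument values are supplied by the caller; the flags are the ones
    listed in the Usage section (-build, -in, -out, -thread).
    """
    # Step1: identify the sub structures with the GHMM.
    step1 = [
        "./MetaCSSTmain",
        "-build", config,
        "-in", str(fasta),
        "-out", str(out_dir),
        "-thread", str(threads),
    ]
    # Step2 reads raw.gtf from the Step1 output directory.
    raw_gtf = (Path(out_dir) / "raw.gtf").as_posix()
    # Assumption: the last argument is an output prefix for the final GTF file.
    step2 = ["python3", "src/call_vr.py", raw_gtf, str(fasta), str(out_prefix)]
    return [step1, step2]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    for argv in build_dgr_commands("genome.fa", "out_dir1", "out-DGR", threads=4):
        print(shlex.join(argv))
```

To actually run the pipeline, each returned list could be passed to subprocess.run(argv, check=True) from the repository root, so that Step2 only starts if Step1 exits successfully.
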
###############
## OUT files ##
###############

1>Identify sub structures (TR, VR or RT) in DGRs:
   out_dir/out.txt   : identified sub structures
   out_dir/align.txt : count matrix for each position, used to build PWMs
   out_dir/score.txt : PWMs (scoring matrices)

2>DGR prediction
   Step1: out_dir1/raw.gtf : TRs and RTs identified
   Step2: out-DGR.gtf      : final DGR output generated by call_vr.py

###########
## Files ##
###########

|-MetaCSSTmain                              executable program to predict DGRs
|-MetaCSSTsub                               executable program to identify TRs, VRs or RTs
|-config.json / config.toml / config.yaml   single-file configuration for the GHMM
|-align/*align                              alignment matrices used to develop the GHMM
|-src/main_modern.cpp                       source code to build MetaCSSTmain
|-src/sub_modern.cpp                        source code to build MetaCSSTsub
|-src/ghmm_modern.hpp                       GHMM core
|-src/fun_modern.hpp                        utility functions
|-src/config_modern.hpp                     config parsing utilities
|-src/call_vr.py                            VR calling + duplicate removal
|-addition/*                                collected/training/test data
|-example and example.sh                    example pipeline to identify DGRs

##################
## Installation ##
##################

MetaCSSTmain and MetaCSSTsub are executable programs. If you want to modify the code and recompile:
   g++ -std=c++20 -O2 -Wall -Wextra -pthread src/main_modern.cpp -o MetaCSSTmain
   g++ -std=c++20 -O2 -Wall -Wextra -pthread src/sub_modern.cpp -o MetaCSSTsub

#############
## Contact ##
#############

If you have any questions, feel free to contact us:
   fazheyan33@163.com
   ccwei@sjtu.edu.cn
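
As a quick check on the GTF outputs listed under OUT files (raw.gtf and out-DGR.gtf), the records can be tallied per feature type. This is a minimal sketch that assumes the files follow the standard 9-column, tab-separated GTF layout with the feature type in column 3; the exact columns written by MetaCSST are not documented here, so treat that as an assumption. The function name count_gtf_features is a placeholder.

```python
import sys
from collections import Counter


def count_gtf_features(lines):
    """Count records per feature type (column 3) in GTF-style lines.

    Assumes the standard 9-column, tab-separated GTF layout; lines that are
    empty, start with '#', or have fewer than 9 columns are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a standard GTF record
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 count_features.py out_dir1/raw.gtf
    with open(sys.argv[1]) as handle:
        for feature, n in sorted(count_gtf_features(handle).items()):
            print(feature, n)
```

If the counts come back empty, the file probably does not follow the assumed column layout and should be inspected by hand.
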