  * The GTDB genomes are expected to be downloaded and annotated.
  *
  * The workflow starts from a set of annotated genomes in the format of faa.gz files (--inputfaas)
- * and gff.gz files (--inputgffs) plus a set of hmm profiles (--hmms). The protein sequences will be
- * searched with HMMER using the hmm files and subsequently classified into which profile it fits
- * best into. The latter uses a table describing the hierarchy of hmm profiles
+ * and, optionally, gff.gz files (--inputgffs) plus a set of hmm profiles (--hmms). The protein
+ * sequences will be searched with HMMER using the hmm files and subsequently classified into which
+ * profile it fits best into. The latter uses a table describing the hierarchy of hmm profiles
  * (--profiles_hierarchy; see --help).
  *
  * Requirements:
  * directory with faa.gz files
- * directory with .gff.gz files
  * directory with all hmm profiles to be run
  * file describing the hmm profile hierarchy
  *
  * Processing steps:
  * Concatenate all faa.gz files into a single one
- * Concatenate all gff.gz files into a single one
+ * Optionally, concatenate all gff.gz files into a single one
  * Perform an hmmsearch of all hmm profiles on all the proteomes
  * Download the metadata files for archaeal and bacterial genomes from gtdb latest version
  * repository and concatenate them into a single metadata file
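
The concatenation steps listed above can rely on a property of the gzip format: several gzip members written back to back form a valid gzip file, so compressed files can be joined without decompressing them first. The following is a minimal shell sketch of that idea, assuming standard `gzip` and `cat`. The temporary directory, file names, and sequences are hypothetical demo data, and the pipeline's own process may implement the step differently.

```shell
#!/bin/sh
# Hypothetical demo data: two tiny protein FASTA files, gzipped separately.
set -eu
demo=$(mktemp -d)
mkdir "$demo/in"
printf '>genomeA_p1\nMKV\n' | gzip -c > "$demo/in/a.faa.gz"
printf '>genomeB_p1\nMTT\n' | gzip -c > "$demo/in/b.faa.gz"

# Join the compressed files byte for byte; no decompression is needed.
# The output is written outside the input directory so the glob cannot match it.
cat "$demo"/in/*.faa.gz > "$demo/all.faa.gz"

# The combined file decompresses to the records of both inputs, in glob order.
gzip -dc "$demo/all.faa.gz"
```

The same approach applies to the gff.gz files when they are provided.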
@@ -50,11 +49,10 @@ def helpMessage() {

  The typical command for running the pipeline is as follows:

- nextflow run main.nf --inputfaas path/to/genomes.faa.gzs --inputgffs path/to/genomes.gff.gzs --outputdir path/to/results --hmm_mincov value --dbsource GTDB:GTDB:release
+ nextflow run main.nf --inputfaas path/to/genomes.faa.gzs [ --inputgffs path/to/genomes.gff.gzs] --outputdir path/to/results --hmm_mincov value --dbsource GTDB:GTDB:release

  Mandatory arguments:
  --inputfaas path/to/genomes.faa.gzs   Path of directory containing annotated genomes in the format faa.gz
- --inputgffs path/to/genomes.gff.gzs   Path of directory containing annotated genomes in the format gff.gz
  --gtdb_bac_metadata path/to/file      Path of tsv file including the metadata for bacterial genomes
  --gtdb_arc_metadata path/to/file      Path of tsv file including the metadata for archaeal genomes
  --hmms path/to/hmm_directory          Path of directory with HMM profile files
@@ -66,6 +64,7 @@ def helpMessage() {
  --featherprefix prefix                Prefix for generated feather files (default "pfitmap-gtdb").

  Non Mandatory parameters:
+ --inputgffs path/to/genomes.gff.gzs   Path of directory containing annotated genomes in the format gff.gz
  --max_cpus                            Maximum number of CPU cores to be used (default = 2)
  --max_time                            Maximum time per process (default = 10 days)
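
Because `--inputgffs` is now optional, a wrapper script can append it only when GFF files are available. The following is a minimal POSIX shell sketch of that pattern. The function name `build_cmd` and the example paths are hypothetical, the remaining mandatory arguments are left out for brevity, and the command is printed rather than executed so the sketch runs without Nextflow installed.

```shell
#!/bin/sh
# Build the pipeline command line, adding --inputgffs only when a GFF directory is given.
set -eu

build_cmd() {
    faas=$1   # directory with faa.gz files
    gffs=$2   # directory with gff.gz files; an empty string means "not provided"
    set -- nextflow run main.nf --inputfaas "$faas"
    if [ -n "$gffs" ]; then
        set -- "$@" --inputgffs "$gffs"
    fi
    set -- "$@" --outputdir path/to/results
    # Print the assembled command instead of running it.
    echo "$*"
}

build_cmd path/to/faas ""
build_cmd path/to/faas path/to/gffs
```

The first call prints a command without `--inputgffs`; the second includes it.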
@@ -98,7 +97,12 @@ if( !params.gtdb_bac_metadata ) {

  // Create channels to start processing
  genome_faas = Channel.fromPath(params.inputfaas, checkIfExists: true)
- genome_gffs = Channel.fromPath(params.inputgffs, checkIfExists: true)
+ if ( params.inputgffs ) {
+     genome_gffs = Channel.fromPath(params.inputgffs, checkIfExists: true)
+ }
+ else {
+     genome_gffs = Channel.empty()
+ }
  hmm_files = Channel.fromPath("$params.hmms/*.hmm")
  profiles_hierarchy = Channel.fromPath(params.profiles_hierarchy, checkIfExists: true)
  dbsource = Channel.value(params.dbsource)