diff --git a/README.md b/README.md
index 4a8dfc0..a06aace 100644
--- a/README.md
+++ b/README.md
@@ -274,6 +274,27 @@ dbxrefs | ``, `db:id`, `,` separated list | `VFDB:VF0511`
 
 Protein sequences provided in short Fasta or GenBank format are searched with default thresholds of 90%, 80% and 80% for minimal identity, query and subject coverage, respectively.
 
+#### User-provided HMMs
+
+Bakta accepts user-provided trusted HMMs via `--hmms` in HMMER's text format. If set, Bakta will adhere to the *trusted cutoff* specified in the HMM header. In addition, a maximum E-value threshold of 1e-6 is applied. By default, Bakta uses the HMM description line as the product description. Further information can be provided via the HMM description line using the *short* format as explained above.
+
+```bash
+# default
+HMMER3/f [3.1b2 | February 2015]
+NAME id
+ACC id
+DESC product
+LENG 435
+TC 600 600
+
+# short
+NAME id
+ACC id
+DESC gene~~~product~~~dbxrefs
+LENG 435
+TC 600 600
+```
+
 ### Output
 
 Annotation results are provided in standard bioinformatics file formats:
@@ -389,6 +410,7 @@ Annotation:
                         Replicon information table (tsv/csv)
   --regions REGIONS     Path to pre-annotated regions in GFF3 or Genbank format (regions only, no functional annotations).
   --proteins PROTEINS   Fasta file of trusted protein sequences for CDS annotation
+  --hmms HMMS           HMM file of trusted hidden markov models in HMMER format for CDS annotation
   --meta                Run in metagenome mode. This only affects CDS prediction.
 
 Workflow:
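
Reviewer note: the two `DESC` layouts documented in this change (plain `product` versus the *short* `gene~~~product~~~dbxrefs` form, with `dbxrefs` as a `,` separated list of `db:id` entries) can be illustrated with a small parser. This is only a sketch of how such a line might be interpreted, not Bakta's actual implementation. The function names `parse_desc` and `read_desc_lines` are hypothetical, and treating an empty gene field as "no gene" is an assumption.

```python
def parse_desc(desc: str) -> dict:
    """Interpret an HMM DESC value in either the default or the short format.

    Hypothetical helper for illustration; not part of Bakta.
    """
    desc = desc.strip()
    parts = desc.split("~~~")
    if len(parts) == 3:
        # short format: gene~~~product~~~dbxrefs
        gene, product, dbxrefs = parts
        return {
            "gene": gene or None,  # assumption: empty gene field means "no gene"
            "product": product,
            "dbxrefs": [x for x in dbxrefs.split(",") if x],
        }
    # default format: the whole description is used as the product
    return {"gene": None, "product": desc, "dbxrefs": []}


def read_desc_lines(hmm_text: str) -> list:
    """Collect the parsed DESC entries from HMMER text-format content."""
    return [
        parse_desc(line[len("DESC"):])
        for line in hmm_text.splitlines()
        if line.startswith("DESC")
    ]
```

For example, `parse_desc("gene~~~product~~~VFDB:VF0511")` yields a gene, a product and a single dbxref, while `parse_desc("product")` falls back to the default behaviour of using the whole line as the product description.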