E. coli serotyping with QC module and adaptive thresholding

@kbessonov1984 kbessonov1984 released this 24 Apr 05:39
· 4 commits to master since this release

Major improvements:

  • Incorporation of a Quality Control (QC) module that makes results easier to interpret and indicates when corrective measures (re-sequencing, wet-lab serotyping) are needed. Thresholds are applied at the allele level to determine whether a given allele and its query quality parameters (%identity and %coverage) are sufficient to resolve an antigen call unambiguously.
  • Cluster-friendly behaviour: multiple instances can run at the same time, with a .lock file preventing race conditions and simultaneous database updates by several instances
  • An updated allele database with duplicated and truncated alleles removed (e.g. O157 antigen)
  • Improved species identification resolution for highly similar non-E. coli species such as Shigella and E. albertii. Species identification is now done only via the MASH NCBI RefSeq sketch (https://gembox.cbcb.umd.edu/mash/refseq.genomes.k21s1000.msh)
  • Users can add new alleles to an existing allele database and make serotype predictions with a custom allele database via the --dbpath parameter
  • Improved O and H antigen call rates and accuracy thanks to decoupling of the %identity and %coverage thresholds for each antigen. Global thresholds can now be specified separately. This is especially important if one of the antigen genes (e.g. wzx/wzy or fliC) is truncated or has low coverage
  • Improved adaptive O antigen calling when only a single O antigen candidate is available in the preliminary BLAST results, allowing an accurate O antigen call even in poorly sequenced samples with minimal coverage
  • Addition of mixed O antigen calls for highly similar O antigens (e.g. O17/O77)
  • The allele names/keys used to make antigen calls are now also reported, which makes it easier to troubleshoot dubious alleles and to clean the allele database
  • More detailed error messages and support for 16 high-similarity O antigens (%identity > 99%) based on the reference publication PMID: 25428893
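
To illustrate the .lock file idea mentioned above, here is a minimal Python sketch of how several instances could coordinate a database update. This is not the tool's actual implementation: the function name `database_lock`, its parameters, and the polling strategy are assumptions made for illustration. It relies only on the fact that creating a file with `O_CREAT | O_EXCL` fails if the file already exists.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def database_lock(lock_path, timeout=10.0, poll_interval=0.1):
    """Hypothetical helper: hold an exclusive lock file during a with-block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL makes creation fail if the file already exists,
            # so only one process can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            # Another instance holds the lock: wait and retry until the deadline.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire lock file {lock_path}")
            time.sleep(poll_interval)
    try:
        # Record the owner's process ID to help debug stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield lock_path
    finally:
        # Release the lock so waiting instances can proceed.
        os.close(fd)
        os.remove(lock_path)
```

An instance would wrap its database update in `with database_lock("/path/to/db/.lock"):` (the path is a placeholder), so that any other instance waits until the update has finished. A lock left behind by a crashed process would need separate handling, which this sketch does not attempt.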
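
The decoupled %identity and %coverage thresholds described above can be pictured with a short Python sketch. The threshold values, the `Thresholds` class, and the `passes_thresholds` function are hypothetical and chosen only for illustration; they are not the tool's defaults or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_identity: float  # minimum %identity of the BLAST hit
    min_coverage: float  # minimum %coverage of the allele


# Illustrative values only (assumed, not taken from the tool):
# each antigen gets its own independent pair of thresholds.
EXAMPLE_THRESHOLDS = {
    "O": Thresholds(min_identity=90.0, min_coverage=50.0),
    "H": Thresholds(min_identity=95.0, min_coverage=50.0),
}


def passes_thresholds(antigen, identity, coverage, thresholds=EXAMPLE_THRESHOLDS):
    """Return True if a hit meets both thresholds for the given antigen type."""
    t = thresholds[antigen]
    return identity >= t.min_identity and coverage >= t.min_coverage
```

With separate thresholds per antigen, a hit at 92% identity and 60% coverage would be accepted for the O antigen in this example but rejected for the H antigen, so a truncated or poorly covered gene for one antigen does not force looser or stricter settings on the other.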