sccmec - A tool for typing SCCmec cassettes in assemblies
sccmec is a tool for typing SCCmec cassettes in assemblies. It was designed to be easy to
use. Unlike its predecessor, staphopia-sccmec, sccmec is much simpler to maintain and update.
This is because it is built on camlhmp, which allows the typing schema to be defined in a YAML file.
If you would like to become a curator for sccmec, please let me know! This could be in the
form of adding new SCCmec types, updating existing ones, or adjusting thresholds. I'm open
to any and all suggestions!
The following SCCmec types are supported by sccmec.
Type | Citation |
---|---|
I | Katayama et al. 2000 |
II | Katayama et al. 2000, Ito et al. 2001 |
III | Katayama et al. 2000 |
IV | Ma et al. 2002 |
V | Ito et al. 2004 |
VI | Oliveira et al. 2006 |
VII | Berglund et al. 2008 |
VIII | Zhang et al. 2009 |
IX | Li et al. 2011 |
X | Li et al. 2011 |
XI | García-Álvarez et al. 2011 |
XII | Wu et al. 2015 |
XIII | Baig et al. 2018 |
XIV | Urushibara et al. 2020 |
XV | Wang et al. 2022 |
The following SCCmec subtypes are supported by sccmec.
SubType | Citation |
---|---|
Ia | Ito et al. 2001 |
Ib | Han et al. 2009, Oliveira et al. 2006 |
IIa | Katayama et al. 2000, Ito et al. 2001 |
IIb | Hisata et al. 2005 |
IIc | Shore et al. 2005 |
IId | Kondo et al. 2007 |
IIe | Han et al. 2009 |
IVa | Ma et al. 2002 |
IVb | Ma et al. 2002 |
IVc | Ma et al. 2006 |
IVd | Ma et al. 2006 |
IVg | Kwon et al. 2005 |
IVh | Milheirico et al. 2007 |
IVi | Berglund et al. 2009 |
IVj | Berglund et al. 2009 |
IVk | - |
IVl | Iwao et al. 2012 |
IVm | Hosoya et al. 2014 |
IVn | - |
Va | Ito et al. 2004 |
Vb | Hisata et al. 2011 |
Vc | Li et al. 2011 |
You can install sccmec using conda:
conda create -n sccmec -c conda-forge -c bioconda sccmec
conda activate sccmec
sccmec --help
Note: sccmec utilizes the API from camlhmp with the defaults for --yaml-targets, --yaml-regions,
--regions and --targets already set. Please don't let this confuse you when you see all the camels!
Usage: sccmec [OPTIONS]
sccmec - typing SCCmec cassettes in assemblies
╭─ Required Options ──────────────────────────────────────────────────────────────────────────────╮
│ * --input -i TEXT Input file in FASTA format to classify [required] │
│ * --yaml-targets -yt TEXT YAML file documenting the targets and types [required] │
│ * --yaml-regions -yr TEXT YAML file documenting the regions and types [required] │
│ * --targets -t TEXT Query targets in FASTA format [required] │
│ * --regions -r TEXT Query regions in FASTA format [required] │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Filtering Options ─────────────────────────────────────────────────────────────────────────────╮
│ --min-targets-pident INTEGER Minimum percent identity of targets to count a hit │
│ [default: 90] │
│ --min-targets-coverage INTEGER Minimum percent coverage of targets to count a hit │
│ [default: 80] │
│ --min-regions-pident INTEGER Minimum percent identity of regions to count a hit │
│ [default: 85] │
│ --min-regions-coverage INTEGER Minimum percent coverage of regions to count a hit │
│ [default: 83] │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ────────────────────────────────────────────────────────────────────────────╮
│ --prefix -p TEXT Prefix to use for output files [default: sccmec] │
│ --outdir -o PATH Directory to write output [default: ./] │
│ --force Overwrite existing reports │
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version Print schema and camlhmp version │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
As mentioned above, sccmec utilizes the camlhmp API, with the --yaml-targets, --yaml-regions,
--regions and --targets options already set to the SCCmec defaults. This means you only need
to provide the --input option with your assembly file.
Here's an example of how to run sccmec on an assembly file (both uncompressed and
gzip-compressed files are supported):
sccmec --input tests/fasta/type-Va-AB121219.fasta.gz --prefix type-v
Running sccmec (via camlhmp) with following parameters:
--input tests/fasta/type-Va-AB121219.fasta.gz
--yaml-targets /home/rpetit3/repos/sccmec/data/sccmec-targets.yaml
--yaml-regions /home/rpetit3/repos/sccmec/data/sccmec-regions.yaml
--targets /home/rpetit3/repos/sccmec/data/sccmec-targets.fasta
--regions /home/rpetit3/repos/sccmec/data/sccmec-regions.fasta
--outdir ./
--prefix type-v
--min-targets-pident 90
--min-targets-coverage 80
--min-regions-pident 85
--min-regions-coverage 83
Starting camlhmp for SCCmec Typing (targets)...
Running blastn...
Processing target hits...
Starting camlhmp for SCCmec Typing (regions)...
Running blastn...
Processing region hits...
Final Results...
SCCmec Typing
┏━━━━━┳━━━━━┳━━━━━┳━━━━━┳━━━━━┳━━━━━┳━━━━━┳━━━━━┳━━━━━┳━━━━┳━━━━━┳━━━━┳━━━━━┳━━━━┳━━━━━┳━━━━┳━━━━━┓
┃ sa… ┃ ty… ┃ su… ┃ me… ┃ ta… ┃ re… ┃ co… ┃ hi… ┃ ta… ┃ t… ┃ re… ┃ r… ┃ ca… ┃ p… ┃ ta… ┃ r… ┃ co… ┃
┡━━━━━╇━━━━━╇━━━━━╇━━━━━╇━━━━━╇━━━━━╇━━━━━╇━━━━━╇━━━━━╇━━━━╇━━━━━╇━━━━╇━━━━━╇━━━━╇━━━━━╇━━━━╇━━━━━┩
│ ty… │ V │ Va │ + │ cc… │ Va │ 10… │ 12 │ sc… │ 1… │ sc… │ 1… │ 1.… │ m… │ │ C… │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ b… │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ on │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ 12 │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ h… │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ w… │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ o… │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ or │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ m… │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ o… │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ h… │ │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┴────┴─────┴────┴─────┴────┴─────┴────┴─────┘
Final predicted type written to ./type-v.tsv
Target-based results against each type written to ./type-v.targets.details.tsv
Target-based blastn results written to ./type-v.targets.blastn.tsv
Region-based results against each type written to ./type-v.regions.details.tsv
Region-based blastn results written to ./type-v.regions.blastn.tsv
If needed, you can adjust the --min-targets-pident, --min-targets-coverage,
--min-regions-pident and/or --min-regions-coverage options to be more or less strict
depending on your needs. But please note the defaults are set to the recommended values.
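For running several assemblies, the command line can be assembled programmatically. The following is a minimal sketch, not part of sccmec itself: `build_sccmec_command` is a hypothetical helper, and it only uses options that appear in the help text above (`--input`, `--prefix`, `--outdir` and the four `--min-*` thresholds). The list of file extensions it strips is an assumption about how your assemblies are named.

```python
from pathlib import Path


def build_sccmec_command(assembly, outdir="./", thresholds=None):
    """Return the argument list for one sccmec run (hypothetical helper)."""
    name = Path(assembly).name
    # Strip common FASTA extensions so the prefix is just the sample name.
    for suffix in (".gz", ".fasta", ".fna", ".fa"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    cmd = ["sccmec", "--input", str(assembly), "--prefix", name, "--outdir", str(outdir)]
    # Only pass thresholds that were explicitly overridden, so the
    # recommended defaults apply everywhere else.
    for option, value in (thresholds or {}).items():
        cmd += [f"--{option}", str(value)]
    return cmd


# Example: a stricter region coverage threshold for a single assembly.
example_cmd = build_sccmec_command(
    "tests/fasta/type-Va-AB121219.fasta.gz",
    thresholds={"min-regions-coverage": 90},
)
print(" ".join(example_cmd))
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` in a loop over your assembly files, assuming sccmec is on your `PATH`.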
Once the tool has completed, you will find five output files in the output directory
(the current directory by default), which are described below.
File Name | Description |
---|---|
{PREFIX}.tsv | A tab-delimited file with the predicted type |
{PREFIX}.targets.blastn.tsv | A tab-delimited file of all target-specific blast hits |
{PREFIX}.targets.details.tsv | A tab-delimited file with details for each type based on targets |
{PREFIX}.regions.blastn.tsv | A tab-delimited file of all full cassette blast hits |
{PREFIX}.regions.details.tsv | A tab-delimited file with details for each type based on full cassettes |
sample type subtype mecA targets regions coverage hits target_schema target_schema_version region_schema region_schema_version camlhmp_version params target_comment region_comment comment
type-v V Va + ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1 Va 100.00 12 sccmec_targets 1.2.0 sccmec_regions 1.2.0 1.0.1 min-targets-coverage=80;min-targets-pident=90;min-regions-coverage=83;min-regions-pident=85 Coverage based on 12 hits;There were one or more overlapping hits
Column | Description |
---|---|
sample | The sample name as determined by --prefix |
type | The predicted type (based on targets and full cassettes) |
subtype | The predicted subtype (based on full cassettes) |
mecA | The mecA gene status (+=present or -=absent or not a significant hit) |
targets | The targets for the given type that had a hit |
regions | The regions for the given type that had a hit |
coverage | The coverage of the full cassette in the regions column |
hits | The number of hits that made up the full cassette coverage |
target_schema | The schema used to determine the type based on targets |
target_schema_version | The version of the schema used to determine the type based on targets |
region_schema | The schema used to determine the type based on full cassettes |
region_schema_version | The version of the schema used to determine the type based on full cassettes |
camlhmp_version | The version of camlhmp used to determine the type |
params | The parameters used to determine the type |
target_comment | A small comment about the target results |
region_comment | A small comment about the region results |
comment | A small comment about the final result |
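If you want to pull the prediction into your own scripts, the columns above can be read with Python's standard `csv` module. This is a small sketch rather than an official API: `summarize_result` is a hypothetical helper, and the embedded row is the example result shown above, rebuilt with tab separators.

```python
import csv
import io

COLUMNS = [
    "sample", "type", "subtype", "mecA", "targets", "regions", "coverage", "hits",
    "target_schema", "target_schema_version", "region_schema", "region_schema_version",
    "camlhmp_version", "params", "target_comment", "region_comment", "comment",
]
VALUES = [
    "type-v", "V", "Va", "+", "ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1", "Va", "100.00", "12",
    "sccmec_targets", "1.2.0", "sccmec_regions", "1.2.0", "1.0.1",
    "min-targets-coverage=80;min-targets-pident=90;min-regions-coverage=83;min-regions-pident=85",
    "", "Coverage based on 12 hits;There were one or more overlapping hits", "",
]
EXAMPLE_TSV = "\t".join(COLUMNS) + "\n" + "\t".join(VALUES) + "\n"


def summarize_result(handle):
    """Read a {PREFIX}.tsv handle and return a few fields in convenient types."""
    row = next(csv.DictReader(handle, delimiter="\t"))
    return {
        "sample": row["sample"],
        "type": row["type"],
        "subtype": row["subtype"],
        "has_mecA": row["mecA"] == "+",  # "+" means a significant mecA hit
        "targets": row["targets"].split(",") if row["targets"] else [],
        # params is a ';'-separated list of key=value pairs
        "params": dict(item.split("=", 1) for item in row["params"].split(";")),
    }


summary = summarize_result(io.StringIO(EXAMPLE_TSV))
print(summary["sample"], summary["type"], summary["subtype"], summary["has_mecA"])
```

For a real run, replace the `io.StringIO` handle with `open("type-v.tsv", newline="")`.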
qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore
ccrC1 AB121219.1 100.000 100 1623 28612 1623 1623 0 0 1 1623 16132 17754 0.0 2998
ccrC1 AB121219.1 90.439 100 1677 28612 1684 1523 148 12 1 1677 16132 17809 0.0 2206
IS431_1 AB121219.1 100.000 100 791 28612 791 791 0 0 1 791 8221 9011 0.0 1461
IS431_1 AB121219.1 98.085 100 791 28612 731 717 14 0 1 731 3423 2693 0.0 1273
IS431_1 AB121219.1 99.704 100 675 28612 675 673 2 0 1 675 2693 3367 0.0 1236
...
This is tabular BLAST output (-outfmt 6) using the set of columns shown in the header line.
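To explore how the identity and coverage thresholds affect which hits would count, the hit table can be filtered by hand. The sketch below is illustrative only: `filter_hits` is a hypothetical helper, it assumes the `qcovs` column is the coverage value of interest, and it is not necessarily how camlhmp applies the thresholds internally. The rows are the five example hits shown above.

```python
import csv
import io

EXAMPLE_LINES = [
    "qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore",
    "ccrC1 AB121219.1 100.000 100 1623 28612 1623 1623 0 0 1 1623 16132 17754 0.0 2998",
    "ccrC1 AB121219.1 90.439 100 1677 28612 1684 1523 148 12 1 1677 16132 17809 0.0 2206",
    "IS431_1 AB121219.1 100.000 100 791 28612 791 791 0 0 1 791 8221 9011 0.0 1461",
    "IS431_1 AB121219.1 98.085 100 791 28612 731 717 14 0 1 731 3423 2693 0.0 1273",
    "IS431_1 AB121219.1 99.704 100 675 28612 675 673 2 0 1 675 2693 3367 0.0 1236",
]
# The real file is tab-delimited, so rebuild the example with tabs.
EXAMPLE_TSV = "\n".join("\t".join(line.split()) for line in EXAMPLE_LINES) + "\n"


def filter_hits(handle, min_pident=90.0, min_coverage=80.0):
    """Keep hits whose pident and qcovs both meet the given minimums."""
    kept = []
    for hit in csv.DictReader(handle, delimiter="\t"):
        if float(hit["pident"]) >= min_pident and float(hit["qcovs"]) >= min_coverage:
            kept.append(hit)
    return kept


for hit in filter_hits(io.StringIO(EXAMPLE_TSV), min_pident=95):
    print(hit["qseqid"], hit["pident"], hit["qcovs"])
```

Raising `min_pident` to 95 drops the second ccrC1 hit (90.439% identity), while the default of 90 keeps all five example rows.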
sample type status targets missing schema schema_version camlhmp_version params comment
type-v I False IS431,mecA,mecR1 ccrA1,ccrB1,IS1272 sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v II False IS431,mecA,mecR1 ccrA2,ccrB2,mecI sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v III False IS431,mecA,mecR1 ccrA3,ccrB3,mecI sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v IV False IS431,mecA,mecR1 ccrA2,ccrB2,IS1272 sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v V True ccrC1,IS431_1,mecA,mecR1,IS431_2 sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v VI False IS431,mecA,mecR1 ccrA4,ccrB4,IS1272 sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v VII False ccrC1,IS431_1,mecA,mecR1,IS431_2 IS12960D sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v VIII False IS431,mecA,mecR1 ccrA4,ccrB4,mecI sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80 Excluded target ccrC1 found, failing type VIII
type-v IX False IS431_1,mecA,mecR1,IS431_2 ccrA1,ccrB1 sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v X False IS431_1,mecA,mecR1,IS431_2 ccrA1,ccrB6 sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v XI False mecA,mecR1 ccrA1,ccrB3,blaZ,mecI sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v XII False IS431_1,mecA,mecR1,IS431_2 ccrC2 sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v XIII False IS431,mecA,mecR1 ccrC2,mecI sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v XIV False ccrC1,IS431,mecA,mecR1 mecI sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
type-v XV False IS431,mecA,mecR1 ccrA1,ccrB6,mecI sccmec_targets 1.2.0 1.0.1 min-coverage=90;min-pident=80
This file provides a detailed view of the results. The columns are:
Column | Description |
---|---|
sample | The sample name as determined by --prefix |
type | The type being tested |
status | The status of the type (True if the type passed, False otherwise) |
targets | The targets for the given type that had a match |
missing | The targets for the given type that were not found |
schema | The schema used to determine the type |
schema_version | The version of the schema used to determine the type |
camlhmp_version | The version of camlhmp used to determine the type |
params | The parameters used to determine the type |
comment | A small comment about the result |
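The details file is handy for seeing why a type was not called. As a rough sketch (`split_by_status` is a hypothetical helper), the snippet below separates types by their `status` value and lists the missing targets for the rest. In the example output above, the predicted type V is the only row with status `True`, which is the interpretation used here. Two of the example rows are embedded for illustration.

```python
import csv
import io

COLUMNS = ["sample", "type", "status", "targets", "missing", "schema",
           "schema_version", "camlhmp_version", "params", "comment"]
ROWS = [
    ["type-v", "I", "False", "IS431,mecA,mecR1", "ccrA1,ccrB1,IS1272",
     "sccmec_targets", "1.2.0", "1.0.1", "min-coverage=90;min-pident=80", ""],
    ["type-v", "V", "True", "ccrC1,IS431_1,mecA,mecR1,IS431_2", "",
     "sccmec_targets", "1.2.0", "1.0.1", "min-coverage=90;min-pident=80", ""],
]
EXAMPLE_TSV = "\n".join("\t".join(fields) for fields in [COLUMNS] + ROWS) + "\n"


def split_by_status(handle):
    """Return (types with status True, {other type: list of missing targets})."""
    passed, failed = [], {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] == "True":
            passed.append(row["type"])
        else:
            failed[row["type"]] = row["missing"].split(",") if row["missing"] else []
    return passed, failed


passed, failed = split_by_status(io.StringIO(EXAMPLE_TSV))
print("passed:", passed)
for sccmec_type, missing in failed.items():
    print(f"type {sccmec_type} is missing: {', '.join(missing)}")
```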
qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore
III AB121219.1 99.371 25 68256 28612 4132 4106 26 0 24230 28361 8220 4089 0.0 7487
III AB121219.1 86.738 25 68256 28612 5067 4395 628 42 59204 64248 17954 12910 0.0 5594
III AB121219.1 94.259 25 68256 28612 3240 3054 172 11 44582 47815 22419 19188 0.0 4940
III AB121219.1 98.421 25 68256 28612 1837 1808 25 4 27952 29787 4458 2625 0.0 3229
III AB121219.1 99.494 25 68256 28612 791 787 3 1 34225 35015 3423 2634 0.0 1437
...
This is tabular BLAST output (-outfmt 6) using the set of columns shown in the header line.
sample type status targets missing coverage hits schema schema_version camlhmp_version params comment
type-v Ia False Ia 17.67 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v Ib False Ib 16.61 2 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 2 hits
type-v IIa False IIa 11.85 11 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 11 hits;There were one or more overlapping hits
type-v IIb False IIb 0.00 0 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83
type-v IIc False IIc 17.39 4 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 4 hits;There were one or more overlapping hits
type-v IId False IId 0.00 0 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83
type-v IIe False IIe 1.54 1 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83
type-v III False III 24.50 18 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 18 hits;There were one or more overlapping hits
type-v IVa False IVa 29.35 13 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 13 hits;There were one or more overlapping hits
type-v IVb False IVb 33.19 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v IVc False IVc 23.56 14 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 14 hits;There were one or more overlapping hits
type-v IVd False IVd 7.78 1 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83
type-v IVg False IVg 30.66 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v IVi False IVi 30.85 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v IVj False IVj 30.58 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v IVk False IVk 16.00 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v IVl False IVl 19.79 13 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 13 hits;There were one or more overlapping hits
type-v IVm False IVm 25.73 14 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 14 hits;There were one or more overlapping hits
type-v IVn False IVn 28.15 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v Va True Va 100.00 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v Vb False Vb 64.55 17 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 17 hits;There were one or more overlapping hits
type-v Vc False Vc 50.14 17 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 17 hits;There were one or more overlapping hits
type-v VI False VI 29.79 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v VII False VII 45.86 15 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 15 hits;There were one or more overlapping hits
type-v VIII False VIII 16.95 9 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 9 hits;There were one or more overlapping hits
type-v IX False IX 15.33 11 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 11 hits;There were one or more overlapping hits
type-v X False X 13.68 16 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 16 hits;There were one or more overlapping hits
type-v XI False XI 0.00 0 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83
type-v XII False XII 19.37 15 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 15 hits;There were one or more overlapping hits
type-v XIII False XIII 28.39 12 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 12 hits;There were one or more overlapping hits
type-v XIV False XIV 14.50 16 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 16 hits;There were one or more overlapping hits
type-v XV False XV 17.21 11 sccmec_regions 1.2.0 1.0.1 min-coverage=85;min-pident=83 Coverage based on 11 hits;There were one or more overlapping hits
This file provides a detailed view of the results. The columns are:
Column | Description |
---|---|
sample | The sample name as determined by --prefix |
type | The type being tested |
status | The status of the type (True if the type passed, False otherwise) |
targets | The targets for the given type that had a match |
missing | The targets for the given type that were not found |
coverage | The coverage of the full cassette |
hits | The number of hits that made up the full cassette coverage |
schema | The schema used to determine the type |
schema_version | The version of the schema used to determine the type |
camlhmp_version | The version of camlhmp used to determine the type |
params | The parameters used to determine the type |
comment | A small comment about the result |
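When a subtype call is ambiguous, it can help to see which cassettes came closest. The following sketch ranks rows by the `coverage` column; `rank_by_coverage` and `example_row` are hypothetical helpers, and the embedded values are taken from four of the example rows above.

```python
import csv
import io

COLUMNS = ("sample type status targets missing coverage hits schema "
           "schema_version camlhmp_version params comment").split()


def example_row(subtype, status, coverage, hits):
    """Rebuild one example row of the regions details file (illustration only)."""
    comment = f"Coverage based on {hits} hits;There were one or more overlapping hits"
    return ["type-v", subtype, status, subtype, "", coverage, hits,
            "sccmec_regions", "1.2.0", "1.0.1", "min-coverage=85;min-pident=83", comment]


ROWS = [
    example_row("VII", "False", "45.86", "15"),
    example_row("Vc", "False", "50.14", "17"),
    example_row("Va", "True", "100.00", "12"),
    example_row("Vb", "False", "64.55", "17"),
]
EXAMPLE_TSV = "\n".join("\t".join(fields) for fields in [COLUMNS] + ROWS) + "\n"


def rank_by_coverage(handle, top=3):
    """Return the (type, coverage) pairs with the highest cassette coverage."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    rows.sort(key=lambda row: float(row["coverage"]), reverse=True)
    return [(row["type"], float(row["coverage"])) for row in rows[:top]]


for subtype, coverage in rank_by_coverage(io.StringIO(EXAMPLE_TSV)):
    print(f"{subtype}\t{coverage:.2f}")
```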
If you use sccmec in your research, please cite the following:
- camlhmp
  🐪 Classification through yAML Heuristic Mapping Protocol 🐪
  Petit III RA camlhmp: Classification through yAML Heuristic Mapping Protocol (GitHub)
- BLAST
  Basic Local Alignment Search Tool
  Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009)
I considered thinking of a fun name for this tool, but sometimes it's best to get straight
to the point! So, here we are with sccmec.
I'm not a lawyer and MIT has always been my go-to license. So, MIT it is!