- About
- Ontology based data specifications
- The Contextual Data Specification Package
- Contacts
- License
- Acknowledgements
Labs collect, encode and store information in different ways. They use different fields, terms and formats, they categorize variables in different ways, and the meanings of words change depending on the focus of the organization. This variability makes comparing, integrating and analyzing data generated by different organizations like trying to compare apples, oranges and bananas. Ontologies are collections of controlled vocabulary arranged in a hierarchy, where all the terms are linked using logical relationships. Ontologies are open source and meant to represent “universal truth” as much as possible (so they are not tied to one organization’s vocabulary or use case). Using ontology terms to standardize contextual data not only helps make data more interoperable by using a common language, it also helps to make contextual data FAIR (Findable, Accessible, Interoperable, Reusable).
Microbial genotypes are usually not included in sequence records in public repositories. As a result, community members routinely download large datasets and type organisms of interest repeatedly to answer different public health questions, resulting in a duplication of effort that is both time and resource intensive. Furthermore, provenance (methods) is often lacking when genotypes are shared, impacting comparability of results. It is also difficult to identify or filter desired genotypes because typing attributes are non-standardized or missing.
The goal of the PHA4GE Microbial Genotyping Specification project is to create a set of standardized, machine-readable attributes for communicating genotyping methods and results across laboratories.
These attributes could be implemented by public repositories when typing results are included in sequence records. Such “tags” could better enable searching, filtering, and querying of microbial types relevant to public health. The tags can also be used to standardize and summarize methods in the supplementary material of manuscripts, improving data ingestion for downstream applications (e.g. analytical/visualization tools). Public health analysis platforms can also use the standardized attributes to harmonize typing information ingested from different data streams and from different data providers.
The specification is currently at the draft stage. The initial draft was developed by members of the PHA4GE Data Structures Working Group, with contributions from participants of the pre-IMMEM XIV 12th Microbial Bioinformatics Hackathon held in Porto, Portugal (September, 2025).
New terms and/or term changes can be requested using issue request forms, with additional guidance on how to do so outlined in the New Term Request (NTR) SOP. These resources are available in the files of this repository and listed below under Package Contents.
Please note that development of the specification is dynamic and it will be updated periodically to address user needs. Versioning is done in the format x.y.z:
- x = Field level changes
- y = Term value / ID level changes
- z = Definition, guidance, example, formatting, or other uncategorized changes
Descriptions of changes are provided in release notes for every new version.
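
As an illustration of the x.y.z scheme, the following minimal Python sketch reports which kind of change separates two releases. The function names (`parse_version`, `change_level`) and the version numbers are hypothetical and are not part of the specification package.

```python
import re
from typing import Tuple

# Labels for the x, y and z components, paraphrased from the versioning scheme.
LEVELS = (
    "field level",
    "term value / ID level",
    "definition, guidance, example, formatting or other",
)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split an 'x.y.z' string into three integers."""
    match = re.fullmatch(r"([0-9]+)\.([0-9]+)\.([0-9]+)", version.strip())
    if match is None:
        raise ValueError(f"expected x.y.z, got {version!r}")
    x, y, z = (int(part) for part in match.groups())
    return x, y, z


def change_level(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    for label, a, b in zip(LEVELS, parse_version(old), parse_version(new)):
        if a != b:
            return label
    return "no change"


# Hypothetical version numbers, used only to exercise the function.
print(change_level("1.2.0", "1.3.0"))
```

Release notes remain the authoritative description of what changed; a helper like this only indicates where to expect the change.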
Ontology ID | Label | Definition | Guidance | Permissible values (enums) |
---|---|---|---|---|
OBI:0000435 | genotyping_method | A method which generates data about a genotype from a specimen of genomic DNA. A variety of techniques and instruments can be used to produce information about sequence variation at particular genomic positions. | Select the method from the picklist. If the desired method is missing, submit a New Term Request. | target loci based typing; MLST; cgMLST; wgMLST; rMLST; plasmidMLST; segment based typing; whole genome based typing; SNP typing; phylogenetic typing; LINcode; in silico serotyping; In silico species detection |
GENEPIO:0102163 | genotyping_schema_taxon | The taxon that the genotyping schema characterizes. | Select the taxon from the picklist. If the desired taxon is missing, submit a New Term Request. | Achromobacter [NCBITaxon:222]; Acinetobacter [NCBITaxon:469]; Acinetobacter baumannii [NCBITaxon:470]; Actinobacillus [NCBITaxon:713]; Aeromonas [NCBITaxon:642]; Aeromonas salmonicida [NCBITaxon:645]; Aggregatibacter actinomycetemcomitans [NCBITaxon:714]; Anaplasma phagocytophilum [NCBITaxon:948]; Arcobacter [NCBITaxon:28196]; Aspergillus fumigatus [NCBITaxon:746128]; Avibacterium paragallinarum [NCBITaxon:728]; Bacillus anthracis [NCBITaxon:1392]; Bacillus cereus [NCBITaxon:1396]; Bacillus licheniformis [NCBITaxon:1402]; Bacillus subtilis [NCBITaxon:1423]; Bacteroides fragilis [NCBITaxon:817]; Bartonella bacilliformis [NCBITaxon:774]; Bartonella henselae [NCBITaxon:38323]; Bartonella washoeensis [NCBITaxon:186739]; Blastocystis [NCBITaxon:12967]; Bordetella [NCBITaxon:517]; Bordetella pertussis [NCBITaxon:520]; Brachyspira [NCBITaxon:29521]; Brucella [NCBITaxon:234]; Brucella melitensis [NCBITaxon:29459]; Burkholderia cepacia [NCBITaxon:292]; Burkholderia mallei [NCBITaxon:13373]; Burkholderia pseudomallei [NCBITaxon:28450]; Campylobacter [NCBITaxon:194]; Campylobacter coli [NCBITaxon:195]; Campylobacter jejuni [NCBITaxon:197]; Candida albicans [NCBITaxon:5476]; Candida glabrata [NCBITaxon:5478]; Candida krusei [NCBITaxon:4909]; Candida tropicalis [NCBITaxon:5482]; Carnobacterium maltaromaticum [NCBITaxon:2751]; Chlamydiales [NCBITaxon:51291]; Citrobacter [NCBITaxon:544]; Clonorchis sinensis [NCBITaxon:79923]; Clostridioides [NCBITaxon:1870884]; Clostridioides difficile [NCBITaxon:1496]; Clostridium [NCBITaxon:1485]; Clostridium botulinum [NCBITaxon:1491]; Clostridium chauvoei [NCBITaxon:46867]; Clostridium perfringens [NCBITaxon:1502]; Clostridium septicum [NCBITaxon:1504]; Corynebacterium [NCBITaxon:1716]; Corynebacterium diphtheriae [NCBITaxon:1717]; Corynebacterium pseudotuberculosis [NCBITaxon:1719]; Cronobacter [NCBITaxon:413496]; Cronobacter malonaticus [NCBITaxon:413503]; Cronobacter sakazakii [NCBITaxon:28141]; Cutibacterium acnes [NCBITaxon:1747]; Dichelobacter nodosus [NCBITaxon:870]; Edwardsiella [NCBITaxon:635]; Enterobacter [NCBITaxon:547]; Enterobacter cloacae [NCBITaxon:550]; Enterobacter hormaechei [NCBITaxon:158836]; Enterococcus [NCBITaxon:1350]; Enterococcus faecium [NCBITaxon:1352]; Enterococcus faecalis [NCBITaxon:1351]; Enterococcus hirae [NCBITaxon:1354]; Escherichia [NCBITaxon:561]; Escherichia coli [NCBITaxon:562]; Escherichia fergusonii [NCBITaxon:564]; Flavobacterium psychrophilum [NCBITaxon:96345]; Francisella tularensis [NCBITaxon:263]; Gallibacterium anatis [NCBITaxon:750]; Geotrichum [NCBITaxon:43987]; Glaesserella parasuis [NCBITaxon:738]; Haemophilus [NCBITaxon:724]; Haemophilus influenzae [NCBITaxon:727]; Helicobacter cinaedi [NCBITaxon:213]; Helicobacter pylori [NCBITaxon:210]; Helicobacter suis [NCBITaxon:104628]; Influenza A virus [NCBITaxon:11320]; Klebsiella [NCBITaxon:570]; Klebsiella aerogenes [NCBITaxon:548]; Klebsiella grimontii [NCBITaxon:2058152]; Klebsiella oxytoca [NCBITaxon:571]; Klebsiella michiganensis [NCBITaxon:1134687]; Klebsiella pasteurii [NCBITaxon:2587529]; Klebsiella pneumoniae [NCBITaxon:573]; Klebsiella pneumoniae subsp. pneumoniae [NCBITaxon:72407]; Klebsiella quasipneumoniae [NCBITaxon:1463165]; Klebsiella variicola [NCBITaxon:244366]; Kluyvera [NCBITaxon:579]; Kudoa septempunctata [NCBITaxon:751907]; Lactobacillus salivarius [NCBITaxon:1624]; Lactococcus garvieae [NCBITaxon:1363]; Legionella [NCBITaxon:445]; Legionella pneumophila [NCBITaxon:446]; Leptospira [NCBITaxon:171]; Liberibacter solanacearum [NCBITaxon:556287]; Listeria [NCBITaxon:1637]; Listeria monocytogenes [NCBITaxon:1639]; Macrococcus canis [NCBITaxon:1855823]; Macrococcus caseolyticus [NCBITaxon:69966]; Mammaliicoccus sciuri [NCBITaxon:1296]; Mannheimia haemolytica [NCBITaxon:75985]; Melissococcus plutonius [NCBITaxon:33970]; Moraxella [NCBITaxon:475]; Morganella morganii [NCBITaxon:582]; Mycobacteriaceae [NCBITaxon:1762]; Mycobacterium [NCBITaxon:1763]; Mycobacterium africanum [NCBITaxon:33894]; Mycobacterium bovis [NCBITaxon:1765]; Mycobacterium canettii [NCBITaxon:78331]; Mycobacterium tuberculosis [NCBITaxon:1773]; Mycobacteroides abscessus [NCBITaxon:36809]; Mycoplasma agalactiae [NCBITaxon:2110]; Mycoplasma anserisalpingitidis [NCBITaxon:519450]; Mycoplasma bovis [NCBITaxon:28903]; Mycoplasma flocculare [NCBITaxon:2128]; Mycoplasma gallisepticum [NCBITaxon:2096]; Mycoplasma genitalium [NCBITaxon:2097]; Mycoplasma hominis [NCBITaxon:2098]; Mycoplasma hyopneumoniae [NCBITaxon:2099]; Mycoplasma hyorhinis [NCBITaxon:2100]; Mycoplasma hyosynoviae [Mycoplasma hyosynoviae]; Mycoplasma iowae [NCBITaxon:2116]; Mycoplasma pneumoniae [NCBITaxon:722438]; Mycoplasma synoviae [NCBITaxon:2109]; Neisseria [NCBITaxon:482]; Neisseria meningitidis [NCBITaxon:487]; Orientia tsutsugamushi [NCBITaxon:784]; Ornithobacterium rhinotracheale [NCBITaxon:28251]; Paenibacillus larvae [NCBITaxon:1464]; Pasteurella multocida [NCBITaxon:747]; Pediococcus pentosaceus [NCBITaxon:1255]; Photobacterium damselae [NCBITaxon:38293]; Photorhabdus [NCBITaxon:29487]; Piscirickettsia salmonis [NCBITaxon:1238]; Porphyromonas gingivalis [NCBITaxon:837]; Proteus [NCBITaxon:583]; Proteus mirabilis [NCBITaxon:584]; Providencia stuartii [NCBITaxon:588]; Pseudomonas [NCBITaxon:286]; Pseudomonas aeruginosa [NCBITaxon:287]; Pseudomonas fluorescens [NCBITaxon:294]; Pseudomonas putida [NCBITaxon:303]; Rhodococcus anatipestifer [NCBITaxon:1827]; Riemerella [NCBITaxon:34085]; Salmonella [NCBITaxon:590]; Salmonella enterica [NCBITaxon:28901]; Salmonella enterica subsp. enterica [NCBITaxon:59201]; Saprolegnia parasitica [NCBITaxon:101203]; Serratia [NCBITaxon:613]; Serratia marcescens [NCBITaxon:615]; Severe Acute Respiratory Syndrome Coronavirus 2 [NCBITaxon:2697049]; Shewanella [NCBITaxon:22]; Sinorhizobium [NCBITaxon:28105]; Staphylococcus [NCBITaxon:1279]; Staphylococcus argenteus [NCBITaxon:985002]; Staphylococcus aureus [NCBITaxon:1280]; Staphylococcus capitis [NCBITaxon:29388]; Staphylococcus chromogenes [NCBITaxon:46126]; Staphylococcus epidermidis [NCBITaxon:1282]; Staphylococcus haemolyticus [NCBITaxon:1283]; Staphylococcus hominis [NCBITaxon:1290]; Staphylococcus pseudintermedius [NCBITaxon:283734]; Stenotrophomonas maltophilia [NCBITaxon:40324]; Streptococcus [NCBITaxon:1301]; Streptococcus agalactiae [NCBITaxon:1311]; Streptococcus bovis [NCBITaxon:1335]; Streptococcus canis [NCBITaxon:1329]; Streptococcus dysgalactiae [NCBITaxon:1334]; Streptococcus equinus [NCBITaxon:1335]; Streptococcus gallolyticus [NCBITaxon:315405]; Streptococcus iniae [NCBITaxon:1346]; Streptococcus mitis [NCBITaxon:28037]; Streptococcus pneumoniae [NCBITaxon:1313]; Streptococcus pyogenes [NCBITaxon:1314]; Streptococcus suis [NCBITaxon:1307]; Streptococcus thermophilus [NCBITaxon:1308]; Streptococcus uberis [NCBITaxon:1349]; Streptococcus zooepidemicus [NCBITaxon:40041]; Shigella [NCBITaxon:620]; Taylorella [NCBITaxon:29574]; Tenacibaculum [NCBITaxon:104267]; Treponema pallidum [NCBITaxon:160]; Trichomonas vaginalis [NCBITaxon:5722]; Ureaplasma [NCBITaxon:2129]; Vibrio [NCBITaxon:662]; Streptomyces [NCBITaxon:1883]; Vibrio cholerae [NCBITaxon:666]; Vibrio parahaemolyticus [NCBITaxon:670]; Vibrio tapetis [NCBITaxon:52443]; Vibrio vulnificus [NCBITaxon:672]; Wolbachia [NCBITaxon:953]; Xanthomonas citri [NCBITaxon:346]; Xylella fastidiosa [NCBITaxon:2371]; Yersinia [NCBITaxon:629]; Yersinia enterocolitica [NCBITaxon:630]; Yersinia pseudotuberculosis [NCBITaxon:1649845]; Yersinia ruckeri [NCBITaxon:29486] |
GENEPIO:0102164 | genotyping_database_name | The name of the database containing a set of allelic profiles and sequences for genotyping. | Provide the name of the database containing the alleles and allele sequences. | free text (string) |
GENEPIO:0102165 | genotyping_database_version | The version of the database containing a set of allelic profiles and sequences for genotyping. | Provide the database version. If a semantic version is unavailable (e.g. x.y.z), provide the date of database download (or date of database creation if the database is developed in-house). Provide dates in ISO 8601 format (YYYY-MM-DD). | free text (string) |
GENEPIO:0102166 | genotyping_schema_name | The name of the genotyping schema containing the traits (such as loci and alleles) used to determine the genotype. | Include the schema name as provided by the database. | free text (string) |
GENEPIO:0102167 | genotyping_software_name | The name of the software used to determine the genotype. | Select the name of the typing tool from the pick list. If the desired tool is not in the list, submit a New Term Request. | ARIBA; assembly_typer; BIGSdb; BPagST; BTyper3; characterize_neisseria_capsule; ChewBBACA; CoreProfiler; emm-typer; FastMLST; GBS-SBG; hicap; Kleborate; legsta; meningotype; Mentalist; MiST; MLST; mykrobe; PneumoKITy; pyMLST; pyngoST; Ridom SeqSphere+; SeqSero2; ShigaTyper; SISTR; SpoTyping; SRST2; stringMLST; TBProfiler; Toxin |
GENEPIO:0102168 | genotyping_software_version | The version number of the software used to determine the sequence type. | Provide the software version. If a semantic version is unavailable (e.g. x.y.z), provide the date of database download (or date of database creation if the database is developed in-house). Provide dates in ISO 8601 format (YYYY-MM-DD). | free text (string) |
SO:0001027 | genotype | The set of alleles and variants an individual carries in particular genes or gene locations. | Provide the genotype as it is output from the typing software. Provide the most granular type. Add any notes about how to interpret untypable results or new sequence types in the "genotyping_details" field. | free text (string) |
GENEPIO:0102169 | genotype_confidence_value | The measure of confidence provided for a genotype call. | Some software provides a confidence value for a genotyping result. Different confidence scales are used across the community. Provide the confidence value as it is output from the tool. Add any notes about the confidence value in the "genotyping_details" field. | free text (string) |
GENEPIO:0102170 | genotyping_details | The details of the genotyping assay. | Provide any extra information that may be useful for interpreting the genotyping results, such as citations and explanations of notations denoting new sequence types or untypable results. | free text (string) |
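
The table above can be turned into simple automated checks. The sketch below is one possible approach, not an official validator: `check_record` and `classify_version` are hypothetical helper names, the field labels follow the spelling used in the worked examples, and only the `genotyping_method` picklist is checked.

```python
import re
from datetime import date
from typing import Dict, List

# Field labels as spelled in the worked examples of this specification.
FIELDS = (
    "genotyping_method",
    "genotyping_schema_taxon",
    "genotyping_database_name",
    "genotyping_database_version",
    "genotyping_schema_name",
    "genotyping_software_name",
    "genotyping_software_version",
    "genotype",
    "genotype_confidence_value",
    "genotyping_details",
)

# Picklist for genotyping_method, copied from the table.
GENOTYPING_METHODS = {
    "target loci based typing", "MLST", "cgMLST", "wgMLST", "rMLST",
    "plasmidMLST", "segment based typing", "whole genome based typing",
    "SNP typing", "phylogenetic typing", "LINcode", "in silico serotyping",
    "In silico species detection",
}


def classify_version(value: str) -> str:
    """Label a version string as 'semantic version', 'ISO 8601 date' or 'other'."""
    if re.fullmatch(r"[0-9]+\.[0-9]+\.[0-9]+", value):
        return "semantic version"
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        try:
            date.fromisoformat(value)  # rejects impossible dates such as month 13
            return "ISO 8601 date"
        except ValueError:
            pass
    return "other"


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for name in record:
        if name not in FIELDS:
            problems.append(f"unknown field: {name}")
    method = record.get("genotyping_method", "")
    if method and method not in GENOTYPING_METHODS:
        problems.append(f"genotyping_method not in picklist: {method}")
    return problems


print(check_record({"genotyping_method": "MLST", "genotype": "5"}))
print(classify_version("2025-09-01"))
```

Because the version fields are free text, `classify_version` only labels a value rather than rejecting it; a tool might use the label to decide whether to prompt the submitter for a download date.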
- In silico serotyping - Acinetobacter (Kaptive) - K antigen
Field | Value |
---|---|
genotyping_method: | in silico serotyping |
genotyping_schema_taxon: | Acinetobacter baumannii [NCBITaxon:470] |
genotyping_database_name: | kaptive |
genotyping_database_version: | 2.0.1 |
genotyping_schema_name: | Acinetobacter kleb k |
genotyping_software_name: | kaptive |
genotyping_software_version: | 3.0.1 |
genotype: | KL3 |
genotype_confidence_value: | Strong |
genotyping_details: | |
- MLST - Staphylococcus aureus (MLST tool)
Field | Value |
---|---|
genotyping_method: | MLST |
genotyping_schema_taxon: | Staphylococcus aureus [NCBITaxon:1280] |
genotyping_database_name: | pubmlst |
genotyping_database_version: | 2025-09-01 |
genotyping_schema_name: | saureus |
genotyping_software_name: | mlst |
genotyping_software_version: | 2.0 |
genotype: | 5 |
genotype_confidence_value: | |
genotyping_details: | |
- MLST - Acinetobacter baumannii (pubmlst)
Field | Value |
---|---|
genotyping_method: | MLST |
genotyping_schema_taxon: | Acinetobacter baumannii [NCBITaxon:470] |
genotyping_database_name: | pubmlst |
genotyping_database_version: | 2025-09-01 |
genotyping_schema_name: | MLST (Pasteur) |
genotyping_software_name: | pubmlst |
genotyping_software_version: | 2025-09-01 |
genotype: | ST2 |
genotype_confidence_value: | |
genotyping_details: | |
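
To show how such an example could be consumed by software, here is a minimal Python sketch that parses a Field/Value table into a dictionary and prints it as JSON. The parser `parse_field_value_table` is a hypothetical helper written for this illustration, and JSON is only one possible exchange format; the specification does not prescribe it. The values are those of the Staphylococcus aureus MLST example.

```python
import json
from typing import Dict

EXAMPLE = """\
Field | Value |
---|---|
genotyping_method: | MLST |
genotyping_schema_taxon: | Staphylococcus aureus [NCBITaxon:1280] |
genotyping_database_name: | pubmlst |
genotyping_database_version: | 2025-09-01 |
genotyping_schema_name: | saureus |
genotyping_software_name: | mlst |
genotyping_software_version: | 2.0 |
genotype: | 5 |
genotype_confidence_value: | |
genotyping_details: |
"""


def parse_field_value_table(text: str) -> Dict[str, str]:
    """Read 'field: | value |' rows into a dict, skipping header and rule rows."""
    record = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 2:
            continue
        name = cells[0].rstrip(":")
        # Skip blank names, the 'Field | Value' header and the '---|---' rule.
        if not name or name == "Field" or set(name) <= {"-"}:
            continue
        record[name] = cells[1]
    return record


record = parse_field_value_table(EXAMPLE)
print(json.dumps(record, indent=2))
```

Empty cells are kept as empty strings so that every field of the specification is present in the output, which keeps records from different providers structurally identical.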
For more information and/or assistance, contact Emma Griffiths at [email protected] or submit a repository issue request.
MIT License
Brought to you by the Centre for Infectious Disease Genomics and One Health and the Public Health Alliance for Genomic Epidemiology Data Structures Working Group.