Coding: Add RDF-ization code to convert mappings to RDF #34

Open
callahantiff opened this issue Sep 14, 2020 · 23 comments · May be fixed by #51

@callahantiff
Owner

callahantiff commented Sep 14, 2020

Needed Scripts: Write a script that converts mappings into RDF

@nicolevasilevsky -- thank you for meeting with me a few weeks ago and confirming our approach looks reasonable. I am just documenting this here as an issue since it's work I still need to do.

Planned Approach


NOT()
Details: Only occurs within the HP and only for Measurement and Drug domains
class_IRI: https://github.com/callahantiff/omop2obo/obo/ext/OMOP_4021360
Class_Name: 'Skin appearance normal'
Class Expression Syntax: not('Abnormality of the skin')

New Triples:

omop2obo: <https://github.com/callahantiff/omop2obo/obo/ext/>
obo: <http://purl.obolibrary.org/obo/>
oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
owl: <http://www.w3.org/2002/07/owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>

omop2obo:OMOP_4021360, oboInOwl:hasOBONamespace, OMOP2OBO
omop2obo:OMOP_4021360, oboInOwl:id, OMOP:4021360
omop2obo:OMOP_4021360, rdf:type, owl:Class
omop2obo:OMOP_4021360, rdfs:label, 'Skin appearance normal'

omop2obo:OMOP_4021360, owl:equivalentClass, ec1
ec1, rdf:type, owl:Class
ec1, owl:complementOf, obo:HP_0000951
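
The NOT() pattern above could be generated along these lines. This is only a minimal sketch that uses plain tuples instead of an RDF library; the function name `complement_triples` and the blank-node label `_:ec1` are hypothetical and not part of the OMOP2OBO code.

```python
def complement_triples(omop_id, label, negated_class, bnode="_:ec1"):
    """Return (subject, predicate, object) tuples for: OMOP class equivalentTo not(negated_class)."""
    subject = f"omop2obo:OMOP_{omop_id}"
    return [
        (subject, "rdf:type", "owl:Class"),
        (subject, "rdfs:label", f'"{label}"'),
        (subject, "oboInOwl:id", f'"OMOP:{omop_id}"'),
        (subject, "owl:equivalentClass", bnode),
        # the anonymous class that carries the negation
        (bnode, "rdf:type", "owl:Class"),
        (bnode, "owl:complementOf", negated_class),
    ]


triples = complement_triples("4021360", "Skin appearance normal", "obo:HP_0000951")
for triple in triples:
    print(", ".join(triple))
```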

OR()
Details: Only occurs within DOID and HP and only for the Condition domain
class_IRI: https://github.com/callahantiff/omop2obo/obo/ext/OMOP_434473
Class_Name: 'Longitudinal deficiency of tibia AND/OR fibula'
Class Expression Syntax: ('Abnormality of fibula morphology' or 'Abnormality of tibia morphology')

New Triples:

omop2obo: <https://github.com/callahantiff/omop2obo/obo/ext/>
obo: <http://purl.obolibrary.org/obo/>
oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
owl: <http://www.w3.org/2002/07/owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>

omop2obo:OMOP_434473, oboInOwl:hasOBONamespace, OMOP2OBO
omop2obo:OMOP_434473, oboInOwl:id, OMOP:434473
omop2obo:OMOP_434473, rdfs:label, "Longitudinal deficiency of tibia AND/OR fibula"
omop2obo:OMOP_434473, rdf:type, owl:Class
omop2obo:OMOP_434473, owl:equivalentClass, ec1

ec1, rdf:type, owl:Class
ec1, owl:unionOf, ec1_union1
ec1_union1, rdf:type, rdf:List
ec1_union1, rdf:first, obo:HP_0002991
ec1_union1, rdf:rest, ec1_union2

ec1_union2, rdf:type, rdf:List
ec1_union2, rdf:first, obo:HP_0002992
ec1_union2, rdf:rest, rdf:nil
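
The rdf:first / rdf:rest chain used for owl:unionOf follows a mechanical pattern, so it may be easier to generate than to write by hand. A minimal sketch, assuming plain tuples for triples; `rdf_list_triples` and the generated node names are hypothetical:

```python
def rdf_list_triples(members, prefix="_:list"):
    """Serialize members as an RDF collection; return (head_node, triples)."""
    if not members:
        return "rdf:nil", []  # the empty collection is rdf:nil itself
    nodes = [f"{prefix}{i + 1}" for i in range(len(members))]
    triples = []
    for i, (node, member) in enumerate(zip(nodes, members)):
        # the last cell points at rdf:nil, every other cell at its successor
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((node, "rdf:type", "rdf:List"))
        triples.append((node, "rdf:first", member))
        triples.append((node, "rdf:rest", rest))
    return head_of(nodes), triples


def head_of(nodes):
    """Return the first cell of the collection."""
    return nodes[0]


head, list_triples = rdf_list_triples(
    ["obo:HP_0002991", "obo:HP_0002992"], prefix="_:ec1_union"
)
union_triples = [
    ("_:ec1", "rdf:type", "owl:Class"),
    ("_:ec1", "owl:unionOf", head),
] + list_triples
```

The same helper would serve owl:intersectionOf, since both constructors take an RDF collection as their object.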

AND()
Details: Occurs within all ontologies and domains
class_IRI: https://github.com/callahantiff/omop2obo/obo/ext/OMOP_434165
Class_Name: 'Abnormal cervical smear'
Class Expression Syntax: ('Abnormal cell morphology' and 'Abnormality of the uterine cervix')

New Triples:

omop2obo: <https://github.com/callahantiff/omop2obo/obo/ext/>
obo: <http://purl.obolibrary.org/obo/>
oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
owl: <http://www.w3.org/2002/07/owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>

omop2obo:OMOP_434165, oboInOwl:hasOBONamespace, OMOP2OBO
omop2obo:OMOP_434165, oboInOwl:id, OMOP:434165
omop2obo:OMOP_434165, rdfs:label, "Abnormal cervical smear"
omop2obo:OMOP_434165, rdf:type, owl:Class
omop2obo:OMOP_434165, owl:equivalentClass, ec1

ec1, rdf:type, owl:Class
ec1, owl:intersectionOf, ec_intersection1
ec_intersection1, rdf:first, obo:HP_0012888
ec_intersection1, rdf:rest, ec_intersection2

ec_intersection2, rdf:first, obo:HP_0025461
ec_intersection2, rdf:rest, rdf:nil
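
Hand-written list chains are easy to get subtly wrong (for example a cell with two rdf:first triples, or one with none), so a small well-formedness check may be worth running over the generated triples. A minimal sketch, assuming triples are plain tuples; `list_members` is a hypothetical helper:

```python
def list_members(triples, head):
    """Walk an rdf:first/rdf:rest chain from head and return its members.

    Raises ValueError if a cell lacks exactly one rdf:first and one rdf:rest,
    or if the chain loops back on itself.
    """
    firsts, rests = {}, {}
    for subject, predicate, obj in triples:
        if predicate == "rdf:first":
            firsts.setdefault(subject, []).append(obj)
        elif predicate == "rdf:rest":
            rests.setdefault(subject, []).append(obj)
    members, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        if len(firsts.get(node, [])) != 1 or len(rests.get(node, [])) != 1:
            raise ValueError(f"{node} needs exactly one rdf:first and one rdf:rest")
        members.append(firsts[node][0])
        node = rests[node][0]
    return members


good = [
    ("ec_intersection1", "rdf:first", "obo:HP_0012888"),
    ("ec_intersection1", "rdf:rest", "ec_intersection2"),
    ("ec_intersection2", "rdf:first", "obo:HP_0025461"),
    ("ec_intersection2", "rdf:rest", "rdf:nil"),
]
members = list_members(good, "ec_intersection1")
```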

AND()/OR()
Details: Only occurs within DOID and HP and only for the Condition domain
class_IRI: https://github.com/callahantiff/omop2obo/obo/ext/OMOP_77072
Class_Name: 'Joint effusion of ankle AND/OR foot'
Class Expression Syntax:

('Joint swelling' and 'Abnormality of the ankles')
or
('Joint swelling' and 'Abnormality of the foot')

New Triples:

omop2obo: <https://github.com/callahantiff/omop2obo/obo/ext/>
obo: <http://purl.obolibrary.org/obo/>
oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
owl: <http://www.w3.org/2002/07/owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>

omop2obo:OMOP_77072, oboInOwl:hasOBONamespace, OMOP2OBO
omop2obo:OMOP_77072, oboInOwl:id, OMOP:77072
omop2obo:OMOP_77072, rdfs:label, "Joint effusion of ankle AND/OR foot"
omop2obo:OMOP_77072, rdf:type, owl:Class
omop2obo:OMOP_77072, owl:equivalentClass, ec1
 
ec1, rdf:type, owl:Class
ec1, owl:unionOf, ec_union1
ec_union1, rdf:type, rdf:List
 
ec_union1, rdf:first, ec_union_member_1
ec_union_member_1, rdf:type, owl:Class
ec_union_member_1, owl:intersectionOf, ec_intersection1
ec_intersection1, rdf:type, rdf:List
 
ec_intersection1, rdf:first, obo:HP_0001386
ec_intersection1, rdf:rest, ec_intersection1b
ec_intersection1b, rdf:type, rdf:List
ec_intersection1b, rdf:first,  obo:HP_0001760
ec_intersection1b, rdf:rest, rdf:nil
 
ec_union1, rdf:rest, ec_union_2
ec_union_2, rdf:type, rdf:List
ec_union_2, rdf:rest, rdf:nil
 
ec_union_2, rdf:first, ec_union_member_2
ec_union_member_2, rdf:type, owl:Class
ec_union_member_2, owl:intersectionOf, ec_intersection2
ec_intersection2, rdf:type, rdf:List
 
ec_intersection2, rdf:first, obo:HP_0001386
ec_intersection2, rdf:rest, ec_intersection2b
ec_intersection2b, rdf:type, rdf:List
ec_intersection2b, rdf:first, obo:HP_0003028
ec_intersection2b, rdf:rest, rdf:nil
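
Nested constructors like the AND()/OR() case suggest a recursive conversion. The sketch below is one possible way to do that, assuming the class expression has already been parsed into nested tuples; the tuple encoding, the function name `expression_triples`, and the generated `_:bN` node names are all hypothetical.

```python
from itertools import count

LIST_PREDICATES = {"and": "owl:intersectionOf", "or": "owl:unionOf"}


def expression_triples(expr, counter=None):
    """Convert a nested class expression into (root_node, triples).

    expr is a class identifier (str) or a tuple such as
    ("or", ("and", "A", "B"), ("not", "C")).
    """
    counter = counter if counter is not None else count(1)
    if isinstance(expr, str):
        return expr, []  # a named class needs no extra triples
    op, *operands = expr
    node = f"_:b{next(counter)}"
    triples = [(node, "rdf:type", "owl:Class")]
    if op == "not":
        if len(operands) != 1:
            raise ValueError("'not' takes exactly one operand")
        target, nested = expression_triples(operands[0], counter)
        return node, triples + [(node, "owl:complementOf", target)] + nested
    if op not in LIST_PREDICATES or len(operands) < 2:
        raise ValueError(f"unsupported expression: {expr!r}")
    # resolve operands first, then chain them into an RDF collection
    resolved = [expression_triples(operand, counter) for operand in operands]
    cells = [f"_:b{next(counter)}" for _ in resolved]
    triples.append((node, LIST_PREDICATES[op], cells[0]))
    for i, (cell, (target, nested)) in enumerate(zip(cells, resolved)):
        rest = cells[i + 1] if i + 1 < len(cells) else "rdf:nil"
        triples += [
            (cell, "rdf:type", "rdf:List"),
            (cell, "rdf:first", target),
            (cell, "rdf:rest", rest),
        ]
        triples += nested
    return node, triples


expr = (
    "or",
    ("and", "obo:HP_0001386", "obo:HP_0003028"),
    ("and", "obo:HP_0001386", "obo:HP_0001760"),
)
root, triples = expression_triples(expr)
```

The returned `root` is the node that would be the object of owl:equivalentClass on the OMOP class.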

AND()/NOT()
Details: Only occurs within DOID and HP and only for the Condition domain
class_IRI: https://github.com/callahantiff/omop2obo/obo/ext/OMOP_4120313
Class_Name: 'Non-diabetic disorder of endocrine pancreas'
Class Expression Syntax:
'Abnormality of the pancreas' and not('has phenotype' some 'Diabetes mellitus')

New Triples:

omop2obo: <https://github.com/callahantiff/omop2obo/obo/ext/>
obo: <http://purl.obolibrary.org/obo/>
oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
owl: <http://www.w3.org/2002/07/owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>

omop2obo:OMOP_4120313, oboInOwl:hasOBONamespace, OMOP2OBO
omop2obo:OMOP_4120313, oboInOwl:id, OMOP:4120313
omop2obo:OMOP_4120313, rdfs:label, "Non-diabetic disorder of endocrine pancreas"
omop2obo:OMOP_4120313, rdf:type, owl:Class
omop2obo:OMOP_4120313, owl:equivalentClass, ec1

ec1, owl:someValuesFrom, ec1_intersection1
ec1_intersection1, rdf:type, owl:Class
ec1_intersection1, owl:intersectionOf, ec1_intersection_member1
ec1_intersection_member1, rdf:first, obo:HP_0001732
ec1_intersection_member1, rdf:type, rdf:List

ec1_intersection_member1, rdf:rest, ec1_intersection_member2
ec1_intersection_member2, rdf:first, ec1_complement
ec1_intersection_member2, rdf:rest, rdf:nil
ec1_intersection_member2, rdf:type, rdf:List
ec1_complement, owl:complementOf, obo:HP_0000819
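
For reference, in the OWL 2 mapping to RDF an existential restriction such as 'has phenotype' some X is a node of type owl:Restriction carrying both owl:onProperty and owl:someValuesFrom. The sketch below shows one possible serialization of the class expression above under that mapping. It is not necessarily the serialization intended in this issue; the blank-node names are hypothetical, and it assumes 'has phenotype' refers to RO_0002200.

```python
HAS_PHENOTYPE = "obo:RO_0002200"  # assumption: RO 'has phenotype'


def restriction_triples(node, prop, filler):
    """Triples for the existential restriction: prop some filler."""
    return [
        (node, "rdf:type", "owl:Restriction"),
        (node, "owl:onProperty", prop),
        (node, "owl:someValuesFrom", filler),
    ]


# 'Abnormality of the pancreas' and not('has phenotype' some 'Diabetes mellitus')
triples = [
    ("omop2obo:OMOP_4120313", "owl:equivalentClass", "_:ec1"),
    ("_:ec1", "rdf:type", "owl:Class"),
    ("_:ec1", "owl:intersectionOf", "_:l1"),
    ("_:l1", "rdf:first", "obo:HP_0001732"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "_:neg"),
    ("_:l2", "rdf:rest", "rdf:nil"),
    ("_:neg", "rdf:type", "owl:Class"),
    ("_:neg", "owl:complementOf", "_:r1"),
] + restriction_triples("_:r1", HAS_PHENOTYPE, "obo:HP_0000819")
```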

AND()/OR()/NOT()
Details: Only occurs within DOID and HP and only for the Condition domain
class_IRI: https://github.com/callahantiff/omop2obo/obo/ext/OMOP_435352
Class_Name: 'Periostitis without osteomyelitis, of the pelvic region and/or thigh'
Class Expression Syntax:

((Periostitis and 'Abnormality of femur morphology')
    and not(Osteomyelitis and 'Abnormality of femur morphology'))
or
((Periostitis and 'Abnormality of pelvic girdle bone morphology')
    and not(Osteomyelitis and 'Abnormality of pelvic girdle bone morphology'))

New Triples:

omop2obo: <https://github.com/callahantiff/omop2obo/obo/ext/>
obo: <http://purl.obolibrary.org/obo/>
oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
owl: <http://www.w3.org/2002/07/owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>

_:pelvic rdfs:label "Periostitis without osteomyelitis, of the pelvic region"
_:pelvic rdf:type owl:Class
_:pelvic owl:equivalentClass _:ecp
_:ecp owl:intersectionOf _:ecp1
_:ecp1 rdf:type rdf:List
_:ecp1 rdf:first obo:HP_0002644 # abnormality of pelvic region
_:ecp1 rdf:rest _:ecp2
_:ecp2 rdf:type rdf:List
_:ecp2 rdf:first obo:HP_0040165 # periostitis
_:ecp2 rdf:rest _:ecp3
_:ecp3 rdf:type rdf:List
_:ecp3 rdf:first _:ecp4
_:ecp4 rdf:type owl:Class
_:ecp4 owl:complementOf obo:HP_0002754 # osteomyelitis
_:ecp3 rdf:rest rdf:nil
 
_:thigh rdfs:label "Periostitis without osteomyelitis, of the thigh"
_:thigh rdf:type owl:Class
_:thigh owl:equivalentClass _:ect
_:ect owl:intersectionOf _:ect1
_:ect1 rdf:type rdf:List
_:ect1 rdf:first obo:HP_0002823 # abnormality of femur morphology
_:ect1 rdf:rest _:ect2
_:ect2 rdf:type rdf:List
_:ect2 rdf:first obo:HP_0040165 # periostitis
_:ect2 rdf:rest _:ect3
_:ect3 rdf:type rdf:List
_:ect3 rdf:first _:ect4
_:ect4 rdf:type owl:Class
_:ect4 owl:complementOf obo:HP_0002754 # osteomyelitis
_:ect3 rdf:rest rdf:nil
 
omop2obo:OMOP_435352 oboInOwl:hasOBONamespace OMOP2OBO
omop2obo:OMOP_435352 oboInOwl:id OMOP:435352
omop2obo:OMOP_435352, rdfs:label, "Periostitis without osteomyelitis, of the pelvic region and/or thigh"
omop2obo:OMOP_435352, rdf:type, owl:Class
omop2obo:OMOP_435352, owl:equivalentClass, ec1
 
ec1 rdf:type owl:Class
ec1, owl:unionOf, ec2
ec2 rdf:type rdf:List
ec2, rdf:first, _:pelvic
ec2 rdf:rest ec3
ec3 rdf:type rdf:List
ec3 rdf:first _:thigh
ec3 rdf:rest rdf:nil
@callahantiff
Owner Author

callahantiff commented Oct 12, 2020

@bill-baumgartner and @nicolevasilevsky, I am hoping to get your feedback on my proposal for logically validating the mappings, which is described below with examples.

I am also including @LEHunter so he knows my plan. The reason I am adding this last step is twofold: (1) the use of reasoners as logical validation was Melissa's idea (and hinted at by an OHDSI reviewer), which I love, and (2) the resulting mappings will be compatible with PheKnowLator.

Background

We have generated mappings for several clinical domains (i.e. conditions, drug ingredients, and measurements). Each domain is mapped to a different set of ontologies, and several types of mappings were created. Mappings that include more than a single ontology concept (many ontology concepts to one clinical concept) were constructed using owl:intersectionOf, owl:unionOf, or owl:complementOf (additional content on how each of these mapping types will be RDF-ized is included at the top of this issue). The ontologies utilized for each domain and the different types of mappings that are generated are shown in the tables below:

Clinical Domains and Ontologies

| Clinical Domain | Ontologies |
| --- | --- |
| Conditions | HPO, MONDO |
| Drug Ingredients | CHEBI, PRO, NCBITaxon, VO |
| Measurements | HPO, UBERON, CL, CHEBI, NCBITaxon, PRO |

Mapping Categories

| Category | Definition |
| --- | --- |
| Automatic Exact - Concept | Exact label or synonym, dbXRef, or expert validated mapping @ concept-level; 1:1 |
| Automatic Exact - Ancestor | Exact label or synonym, dbXRef, or expert validated mapping @ concept ancestor-level; 1:1 |
| Manual Exact - Concept Similarity | Concept similarity score suggested mapping -- manually verified; 1:1 |
| Automatic Constructor - Concept | Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many |
| Automatic Constructor - Ancestor | Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many |
| Manual Constructor | Hand mapping created using expert suggested resources; 1:Many |
| Manual | Hand mapping created using expert suggested resources; 1:1 |
| UnMapped | No suitable mapping or not mapped type |

Mappings also have evidence, which varies according to the mapping type. An example is shown in the table below:

| Mapping category | Mapping Evidence |
| --- | --- |
| Automatic Exact - Concept | CONCEPT_DBXREF:snomed_68345001 |
| | OBO_LABEL-OMOP_CONCEPT_LABEL:apraxia |
| | OBO_LABEL-OMOP_CONCEPT_SYNONYM:apraxia |
| | CONCEPT_SIMILARITY:HP_0002186_1.0 |

Analysis Plan

For each ontology, I will create a new ontology class for all mappings that include 2 or more ontology concepts (since single concepts already exist within each ontology).

Reasoner(s):

  • Planning on using HermiT first and, if that does not work, then ELK
  • Run the reasoner on each ontology before and after adding new concepts

Output and Record Statistics:
For each run, I will record the following results (potentially by mapping type):

  • Whether or not the reasoner finished. If not, record the reasons why and how the issues were fixed.
  • The number of new inferences generated that include the mappings
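
The bookkeeping for each run could look something like the sketch below. It assumes the inferred axioms from the reasoner have already been exported as (subject, predicate, object) tuples; the function name, the result keys, and the `ex:` identifiers are hypothetical placeholders, and no particular reasoner API is implied.

```python
def summarize_reasoner_run(finished, inferred_before, inferred_after, mapping_classes, notes=""):
    """Summarize one reasoner run for the statistics listed above."""
    new_inferences = set(inferred_after) - set(inferred_before)
    # keep only new inferences that mention at least one mapping class
    with_mappings = {
        axiom for axiom in new_inferences
        if any(term in mapping_classes for term in axiom)
    }
    return {
        "finished": finished,
        "notes": notes,  # e.g. why a run failed and how it was fixed
        "new_inferences": len(new_inferences),
        "new_inferences_with_mappings": len(with_mappings),
    }


before = {("ex:B", "rdfs:subClassOf", "ex:A")}
after = before | {
    ("omop2obo:OMOP_434165", "rdfs:subClassOf", "ex:B"),
    ("ex:C", "rdfs:subClassOf", "ex:A"),
}
summary = summarize_reasoner_run(True, before, after, {"omop2obo:OMOP_434165"})
```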

ASK:

  • Do you think this plan makes sense?

Creating Release Version RDF

Once the validation described above is complete, I will create a more complex version of the mappings that spans all ontologies for a given mapping, rather than creating ontology-specific mappings. Creating mappings that span multiple ontologies will require some additional content not currently included in each mapping. I realize that there are many ways one could approach this; this is what I think is the easiest, quickest, and cleanest/most transparent. Each class created from this process will be created under the OMOP2OBO namespace, include the official OMOP concept label and synonyms, and have the OMOP concept and source codes assigned as DbXRefs (all of which we get for free from the mappings).

Steps:
1 - Merge all ontologies together (ontologies listed in table above)
2 - Create new OMOP2OBO classes (in OMOP2OBO namespace) for mappings that include 2 or more concepts (see details below for how to integrate mappings spanning multiple ontologies). DbXRefs and owl:equivalentClass added with the OMOP concept_id for all 1:1 mappings
3 - Add the following metadata for each mapping using ECO: Mapping Category and Mapping Evidence

Relations to Connect Mappings Spanning Multiple Ontologies
I'm also including the inverse for each relation.

CONDITIONS
Ontologies: HPO, MONDO
Relations: MONDO has phenotype HPO

DRUG INGREDIENTS
Ontologies: CHEBI, PRO, NCBITaxon, VO (every concept has a CHEBI annotation)
Relations:

MEASUREMENTS
Ontologies: HPO, UBERON, CL, CHEBI, NCBITaxon, PRO (every concept has an HPO and UBERON annotation)
Relations:


ASK:

  • Do you think this plan makes sense?
  • Creating mappings that span multiple ontologies will require some additional content not currently included in each mapping.
  • Do you agree with the relations I used in the table above for connecting each of the ontologies for each clinical domain? When there was more than one good option I included "OR"; please let me know what you think. These were tough.

@callahantiff
Owner Author

@LEHunter - can we please talk through how to align the mapping categories and evidence on Wednesday? Tables for each are re-printed below:

Mapping Categories

| Category | Definition |
| --- | --- |
| Automatic Exact - Concept | Exact label or synonym, dbXRef, or expert validated mapping @ concept-level; 1:1 |
| Automatic Exact - Ancestor | Exact label or synonym, dbXRef, or expert validated mapping @ concept ancestor-level; 1:1 |
| Manual Exact - Concept Similarity | Concept similarity score suggested mapping -- manually verified; 1:1 |
| Automatic Constructor - Concept | Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many |
| Automatic Constructor - Ancestor | Exact label or synonym, dbXRef, cosine similarity, or expert validated mapping @ concept-level; 1:Many |
| Manual Constructor | Hand mapping created using expert suggested resources; 1:Many |
| Manual | Hand mapping created using expert suggested resources; 1:1 |
| UnMapped | No suitable mapping or not mapped type |

Evidence Types

| Mapping category | Mapping Evidence |
| --- | --- |
| Automatic Exact - Concept | CONCEPT_DBXREF:snomed_68345001 |
| | OBO_LABEL-OMOP_CONCEPT_LABEL:apraxia |
| | OBO_LABEL-OMOP_CONCEPT_SYNONYM:apraxia |
| | CONCEPT_SIMILARITY:HP_0002186_1.0 |

@nicolevasilevsky
Collaborator

@callahantiff
I think this plan looks good. It will be cool to see your results. I think the ELK reasoner is meant to be faster than HermiT; that is what I use by default. I like your idea to try both.

One minor thing, I think MONDO has phenotype HPO but I don't know if it makes sense to say a phenotype has phenotype a disease.

@callahantiff
Owner Author

> @callahantiff
> I think this plan looks good. It will be cool to see your results. I think the ELK reasoner is meant to be faster than HermiT; that is what I use by default. I like your idea to try both.
>
> One minor thing, I think MONDO has phenotype HPO but I don't know if it makes sense to say a phenotype has phenotype a disease.

Thanks so much for your feedback @nicolevasilevsky! Good point about Mondo and the has phenotype relation. I will verify that what I add makes sense. I may ping you next week once I have a working prototype.

Very excited to share the results with you! Oh, did you see I created a figure 1 to cover the mapping method? I'd love to know what you think. You can access it here.

@callahantiff
Owner Author

@bill-baumgartner - thanks for your help on Friday! I think we have a great plan! The general assumptions for all clinical domains (i.e. conditions, medications, and measurements) are shown below. Updates/procedures for each specific clinical domain will be presented in 3 separate comments below, just to make it less overwhelming :D

General Steps

1 - Merge all ontologies together (ontologies listed in table above)
2 - Create mappings (add them to the merged ontologies) using the OMOP2OBO namespace
3 - For all OMOP2OBO classes, add dbXRefs for the omop_concept_id and the OMOP-provided standard terminology id (i.e. SNOMED-CT, RxNorm, and LOINC)
4 - For all 1:1 mappings (i.e. 1 clinical to 1 ontology concept), connect OMOP2OBO class to ontology class via owl:equivalentClass
5 - Add mapping category and metadata evidence for each mapping using ECO. THIS PART STILL NEEDS TO BE DISCUSSED -- sent you a meeting invite
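
Steps 3 and 4 could be sketched roughly as follows. This is an illustration only: the function name is hypothetical, triples are plain tuples, and `OMOP_0000000` is a placeholder concept id. The SNOMED code and HPO class are borrowed from the apraxia evidence example earlier in this issue.

```python
def general_step_triples(class_iri, omop_concept_id, source_code, ontology_classes):
    """dbXRefs for every OMOP2OBO class, plus owl:equivalentClass for 1:1 mappings."""
    triples = [
        (class_iri, "rdf:type", "owl:Class"),
        # step 3: dbXRefs for the OMOP concept id and the standard terminology id
        (class_iri, "oboInOwl:hasDbXref", f'"OMOP:{omop_concept_id}"'),
        (class_iri, "oboInOwl:hasDbXref", f'"{source_code}"'),
    ]
    # step 4: only 1:1 mappings are connected via owl:equivalentClass
    if len(ontology_classes) == 1:
        triples.append((class_iri, "owl:equivalentClass", ontology_classes[0]))
    return triples


one_to_one = general_step_triples(
    "omop2obo:OMOP_0000000",  # placeholder, not a real concept id
    "0000000",
    "SNOMED:68345001",
    ["obo:HP_0002186"],
)
```

Mappings with two or more ontology concepts would skip the last triple and use the constructor patterns at the top of this issue instead.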

@callahantiff
Owner Author

callahantiff commented Oct 25, 2020

Conditions

Ontologies: HPO, MONDO

Assumptions:

  • Mappings will be converted to RDF using these patterns
  • All classes created in OMOP2OBO namespace, using 7-digit numbers starting at OMOP2OBO_0000001
  • All mappings (annotated with a mapping category other than Unmapped) have at least 1 HPO or MONDO concept
  • All phenotypes will be subclasses of phenotypic abnormality (HP_0000118)
  • All diseases will be subclasses of disease or disorder (MONDO_0000001)

Intra-Ontology Relations:


Mapping Combinations:
The following mapping patterns exist for this clinical domain:

[Figure: mapping patterns for the Conditions domain (Screen Shot 2020-10-25 at 16 03 45)]


Class Construction Heuristics:
Map

  • Use owl:equivalentClass for all 1:1 mappings where the HPO and MONDO represent the same concept
  • Use RO relations for all mappings where the HPO and MONDO represent different diseases/phenotypes

Don't Map

  • Ignore individual ontology mappings when mapping category is Unmapped
  • Ignore multiple ontology mappings when the mapping category for all ontologies is Unmapped
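
The two "Don't Map" rules amount to a per-ontology filter. A minimal sketch, assuming each mapping row exposes its category per ontology as a dict (a hypothetical input format); the comparison is case-insensitive because the category is spelled both "Unmapped" and "UnMapped" in this issue.

```python
def ontologies_to_map(categories):
    """Return the ontologies whose mapping category is not Unmapped.

    An empty result means every ontology is Unmapped and the whole
    mapping should be skipped.
    """
    return [
        ontology for ontology, category in categories.items()
        if category.strip().lower() != "unmapped"
    ]


kept = ontologies_to_map({"HPO": "Automatic Exact - Concept", "MONDO": "UnMapped"})
skipped = ontologies_to_map({"HPO": "Unmapped", "MONDO": "UnMapped"})
```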


ASK:

@callahantiff
Owner Author

callahantiff commented Oct 25, 2020

Medications

Ontologies: CHEBI, PRO, NCBITaxon, VO

Assumptions:

  • All classes created in OMOP2OBO namespace, using 7-digit numbers starting at OMOP2OBO_0000001
  • Mappings will be converted to RDF using these patterns
  • For all ingredients used at least 1 time in clinical practice, all mappings have at least 1 CHEBI annotation
  • Additional annotations will be added to connect each ingredient to its RxNorm drug
  • All new OMOP2OBO classes for drug ingredients will be subclasses of chemical entity (CHEBI_24431)
  • Ignore all mappings for Standard RxNorm Concepts Not Used In Practice
  • Ignore individual ontology mappings when mapping category is Unmapped

Intra-Ontology Relations:


Mapping Combinations:
The following 4 mapping patterns exist for this clinical domain:

[Figure: mapping patterns for the Medications domain (Screen Shot 2020-10-29 at 22 49 39)]


Class Construction Heuristics:
Map

  • Assigning NCBITaxon from spreadsheet:
    • If PRO and VO → NCBITaxon to both
    • If only PRO → NCBITaxon to PRO
    • If only VO → NCBITaxon to VO
    • If no PRO/VO → NCBITaxon to CHEBI
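
Reading each rule above as "if the mapping includes these ontologies, attach the NCBITaxon annotation to them", the assignment for drug ingredients could be sketched as follows. The function name and boolean inputs are hypothetical.

```python
def medication_taxon_targets(has_pro, has_vo):
    """Return the ontologies that receive the NCBITaxon annotation."""
    if has_pro and has_vo:
        return ["PRO", "VO"]
    if has_pro:
        return ["PRO"]
    if has_vo:
        return ["VO"]
    return ["CHEBI"]  # every ingredient has at least one CHEBI annotation
```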

Don't Map

  • Ignore all mappings for Standard RxNorm Concepts Not Used In Practice
  • Ignore individual ontology mappings when mapping category is Unmapped


ASK:

@callahantiff
Owner Author

callahantiff commented Oct 26, 2020

Measurements

Ontologies: HPO, CHEBI, UBERON, NCBITaxon, CL, PRO

Assumptions:

  • All classes created in OMOP2OBO namespace, using 7-digit numbers starting at OMOP2OBO_0000001
  • Mappings will be converted to RDF using these patterns
  • For all lab tests used at least 1 time in clinical practice, all mappings (annotated with a mapping category other than Unmapped) have at least 1 HPO annotation and at least 1 UBERON annotation
  • Since mappings are to lab test results, but we know what LOINC code each test is mapped to, additional annotations will be added to connect each lab test result to its LOINC measurement_concept_id
  • All new OMOP2OBO classes for measurements will be subclasses of phenotypic abnormality (HP_0000118)

Intra-Ontology Relations:


Mapping Combinations:
The 12 mapping patterns that exist for this clinical domain are shown below. Note that a dashed line is used to show patterns that exist, but not for every case. There are two special cases of the patterns shown below: (1) IgE antibody tests and (2) IgA, IgD, IgG, and IgM (i.e. Antibody, but not IgE). These specific patterns are also demonstrated below.

[Figure: mapping patterns for the Measurements domain (Screen Shot 2020-10-30 at 22 46 18)]


Class Construction Heuristics:
Map

  • Assigning NCBITaxon from spreadsheet:
    • If PRO and CHEBI → NCBITaxon to both
      • If Antibody-related lab → NCBITaxon to CHEBI
    • If only PRO → NCBITaxon to PRO
    • If only CHEBI → NCBITaxon to CHEBI
    • All UBERON → NCBITaxon_9606
    • All CL → NCBITaxon_9606
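
A sketch of these measurement rules, under two assumptions of mine: the antibody rule is an exception that applies only when both PRO and CHEBI are present, and the spreadsheet supplies a single taxon per mapping. The function name is hypothetical and `NCBITaxon_0000` is a placeholder value.

```python
HUMAN_TAXON = "NCBITaxon_9606"


def measurement_taxa(ontologies, spreadsheet_taxon, antibody_lab=False):
    """Return {ontology: taxon} assignments for one measurement mapping."""
    ontologies = set(ontologies)
    if {"PRO", "CHEBI"} <= ontologies:
        # antibody-related labs attach the taxon to CHEBI only
        targets = ["CHEBI"] if antibody_lab else ["PRO", "CHEBI"]
    elif "PRO" in ontologies:
        targets = ["PRO"]
    elif "CHEBI" in ontologies:
        targets = ["CHEBI"]
    else:
        targets = []
    assignments = {ontology: spreadsheet_taxon for ontology in targets}
    # anatomy and cell annotations are always human
    for ontology in ("UBERON", "CL"):
        if ontology in ontologies:
            assignments[ontology] = HUMAN_TAXON
    return assignments


antibody = measurement_taxa(
    ["HPO", "UBERON", "PRO", "CHEBI"], "NCBITaxon_0000", antibody_lab=True
)
```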

Don't Map

  • Ignore multiple ontology mappings when the mapping category for all ontologies is Unmapped


ASK:

@bill-baumgartner
Collaborator

Looking good @callahantiff! Just a few questions:

  1. I think I can guess based on their names, but what is the difference between Concept Used in Practice and Standard Concept?
  2. Will there be "root" concepts representing Condition, Medication, and Measurement in the OMOP2OBO namespace? Or are those concepts defined elsewhere perhaps?
  3. For Measurement, did we also discuss an example that would require the following relations?
    • HPO - occurs in - CL and
    • CL - located in - UBERON
    • I thought we did, but maybe I'm not remembering correctly.

@callahantiff
Owner Author

callahantiff commented Oct 26, 2020

Thanks so much @bill-baumgartner!

> Looking good @callahantiff! Just a few questions:
>
>   1. I think I can guess based on their names, but what is the difference between Concept Used in Practice and Standard Concept?

Great question. Concepts Used in Practice include standard and non-standard (i.e. SNOMED and other terminologies) that have been used at least 1 time. Standard Concepts are specifically standard SNOMED-CT concepts that have not yet been used in clinical practice. I will be updating this label to: Standard SNOMED-CT Concepts Not Used in Practice. I think that makes it a bit clearer.

>   2. Will there be "root" concepts representing Condition, Medication, and Measurement in the OMOP2OBO namespace? Or are those concepts defined elsewhere perhaps?

I was thinking of creating the following subclass relations:

  • Conditions: abnormal phenotype (HPO) and disease (MONDO)
  • Drug Ingredients: chemical entity (CHEBI) or role (CHEBI)
  • Measurement: abnormal phenotype (HPO)

>   3. For Measurement, did we also discuss an example that would require the following relations?
>     • HPO - occurs in - CL and
>     • CL - located in - UBERON
>     • I thought we did, but maybe I'm not remembering correctly.

Good catch! I updated the figure above. Do you agree with that? Does this cover the human IgE resulting from non-human response?

Thanks so much for all of your help! 😄 🙇‍♀️

@bill-baumgartner
Collaborator

bill-baumgartner commented Oct 27, 2020

Do we need to keep HPO - has component - CL in order to represent something like white blood cell count?

Yep! Updated figure above. Although this will be somewhat tricky to distinguish from other mappings including cells. Even a red blood cell count is still measured from a blood sample. Noting that here so I make sure that I don't forget that.

@callahantiff
Owner Author

callahantiff commented Oct 27, 2020

@bill-baumgartner -- for our meeting tomorrow, we are planning to discuss representing:
1 - Mapping Categories
2 - Mapping Evidence

Mapping Categories

Mapping categories added as class annotation.

Mapping Evidence

Evidence can come in the following forms:

  • OBO DbXRef to OMOP Source Code

    • OBO_DbXRef-OMOP_CONCEPT_SOURCE_CODE:xxxxxxx
    • OBO_DbXRef-OMOP_ANCESTOR_SOURCE_CODE:xxxxxxx
  • OBO Label to OMOP Synonym or Label

    • OBO_LABEL-OMOP_CONCEPT_LABEL:xxxxxxx
    • OBO_LABEL-OMOP_ANCESTOR_LABEL:xxxxxxx
    • OBO_LABEL-OMOP_CONCEPT_SYNONYM:xxxxxxx
    • OBO_LABEL-OMOP_ANCESTOR_SYNONYM:xxxxxxx
  • OBO Synonym to OMOP Synonym or Label

    • OBO_hasSynonymType-OMOP_CONCEPT_LABEL:xxxxxxx
    • OBO_hasSynonymType-OMOP_ANCESTOR_LABEL:xxxxxxx
    • OBO_hasSynonymType-OMOP_CONCEPT_SYNONYM:xxxxxxx
    • OBO_hasSynonymType-OMOP_ANCESTOR_SYNONYM:xxxxxxx
  • Concept Similarity Score → CONCEPT_SIMILARITY:OBO_URI_x.x
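
Since every evidence string follows a TYPE:value shape, splitting them into structured records before RDF-ization might look like the sketch below. The function name and the record keys are hypothetical; the example strings are the ones from the evidence table earlier in this issue.

```python
def parse_evidence(evidence):
    """Split an evidence string into its type and value.

    Similarity evidence (CONCEPT_SIMILARITY:OBO_URI_x.x) additionally
    gets its trailing score separated out as a float.
    """
    kind, _, value = evidence.partition(":")
    if not kind or not value:
        raise ValueError(f"not a TYPE:value evidence string: {evidence!r}")
    record = {"type": kind, "value": value}
    if kind == "CONCEPT_SIMILARITY":
        obo_id, _, score = value.rpartition("_")
        record = {"type": kind, "value": obo_id, "score": float(score)}
    return record


dbxref = parse_evidence("CONCEPT_DBXREF:snomed_68345001")
similarity = parse_evidence("CONCEPT_SIMILARITY:HP_0002186_1.0")
```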

HOW TO REPRESENT THESE
For handling these, I think that it might be best to treat them each as annotations to the class, similar to how synonyms and dbxrefs are annotated to ontology classes. I am borrowing from @nicolevasilevsky's patterns in MONDO. You will see that each annotation also includes metadata for the original OMOP concepts, original OBO concepts, the OMOP Common Data Model (CDM) version used, the ontology version dates, and the URL for the current OMOP2OBO release.

So something like:

DbXRef Example 1: OBO_DbXRef-OMOP_CONCEPT_SOURCE_CODE:ABC_1234567
Pattern for all DbXref evidence to an OMOP concept.

class_id SKOS:exactMatch ABC_1234567

BNode owl:annotatedSource class_id
BNode owl:annotatedProperty SKOS:exactMatch
BNode owl:annotatedTarget ABC_1234567

BNode oboInOwl:source "Mapping Category"

BNode oboInOwl:source OBO_xxxxxxx
BNode oboInOwl:source "OBO:version date"
BNode oboInOwl:source OMOP_xxxxxxx
BNode oboInOwl:source "OMOP: common data model v5.0"
BNode oboInOwl:source http://omop2obo/wikiv1
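
The BNode pattern above is the same for every evidence type, so it could be produced by one helper. A minimal sketch with plain tuples; the function name and the node label `_:axiom1` are hypothetical. Note one addition relative to the pattern as written: the sketch types the blank node as owl:Axiom, which is how the OWL 2 mapping to RDF marks an annotated axiom.

```python
def annotation_axiom(source, prop, target, sources, node="_:axiom1"):
    """Return the asserted triple plus the reified axiom carrying its metadata."""
    triples = [
        (source, prop, target),
        (node, "rdf:type", "owl:Axiom"),
        (node, "owl:annotatedSource", source),
        (node, "owl:annotatedProperty", prop),
        (node, "owl:annotatedTarget", target),
    ]
    # one oboInOwl:source annotation per piece of metadata
    triples += [(node, "oboInOwl:source", value) for value in sources]
    return triples


axiom = annotation_axiom(
    "class_id",
    "SKOS:exactMatch",
    "ABC_1234567",
    ['"Mapping Category"', "OBO_xxxxxxx", '"OBO:version date"',
     "OMOP_xxxxxxx", '"OMOP: common data model v5.0"', "http://omop2obo/wikiv1"],
)
```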

DbXRef Example 2: OBO_DbXRef-OMOP_ANCESTOR_SOURCE_CODE:ABC_1234567
Pattern for all DbXref evidence that includes an OMOP concept ancestor.

class_id oboInOwl:hasDbXref ABC_1234567

BNode owl:annotatedSource class_id
BNode owl:annotatedProperty oboInOwl:hasDbXref
BNode owl:annotatedTarget ABC_1234567

BNode oboInOwl:source "Mapping Category"

BNode oboInOwl:source OBO_xxxxxxx
BNode oboInOwl:source "OBO:version date"
BNode oboInOwl:source OMOP_xxxxxxx
BNode oboInOwl:source "OMOP: common data model v5.0"
BNode oboInOwl:source http://omop2obo/wikiv1

Label Example: OBO_LABEL-OMOP_CONCEPT_LABEL:xxxxxxx
I think it makes the most sense to treat all OBO-OMOP label matches (even those to concept ancestors) as SKOS:exactMatch since this type of match only happens when the OBO and OMOP strings match exactly.

class_id SKOS:exactMatch OMOP_1234567

BNode owl:annotatedSource class_id
BNode owl:annotatedProperty SKOS:exactMatch
BNode owl:annotatedTarget OMOP_1234567

BNode oboInOwl:source "Mapping Category"

BNode oboInOwl:source OBO_xxxxxxx
BNode oboInOwl:source "OBO:version date"
BNode oboInOwl:source "LABEL STRING"
BNode oboInOwl:source "OMOP: common data model v5.0"
BNode oboInOwl:source http://omop2obo/wikiv1

OBO Synonym Example: OBO_hasSynonymType-OMOP_CONCEPT_LABEL:xxxxxxx
This would be the pattern for all OBO Synonym matches. Note that I am using a generic oboInOwl:hasSynonymType for this example, the actual axioms will use the specific types I recorded from each matched ontology.

class_id oboInOwl:hasSynonymType "Synonym string"

BNode owl:annotatedSource class_id
BNode owl:annotatedProperty oboInOwl:hasSynonymType
BNode owl:annotatedTarget "Synonym string"

BNode oboInOwl:source "Mapping Category"

BNode oboInOwl:source OBO_xxxxxxx
BNode oboInOwl:source "OBO:version date"
BNode oboInOwl:source OMOP_xxxxxxx
BNode oboInOwl:source "OMOP: common data model v5.0"
BNode oboInOwl:source http://omop2obo/wikiv1

Similarity Example: CONCEPT_SIMILARITY:OBO_URI_1.0
The pattern for all evidence generated from cosine similarity. I think that we can use the RO property "is evidence with support from" (RO_0002614) with the NCIT class Cosine Distance Method (NCIT_C27662). In addition, the metadata sources would be extended to include the similarity score as a float value.

class_id obo:RO_0002614 NCIT_C27662

BNode owl:annotatedSource class_id
BNode owl:annotatedProperty obo:RO_0002614
BNode owl:annotatedTarget NCIT_C27662

BNode oboInOwl:source "Cosine similarity score of x.x derived from applying a Bag-Of-Words TF-IDF vector space model to all available OMOP and OBO labels and synonyms"

BNode oboInOwl:source "Mapping Category"

BNode oboInOwl:source OBO_xxxxxxx
BNode oboInOwl:source "OBO:version date"
BNode oboInOwl:source OMOP_xxxxxxx
BNode oboInOwl:source "OMOP: common data model v5.0"
BNode oboInOwl:source http://omop2obo/wikiv1
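
For readers unfamiliar with the score being recorded, here is a small self-contained illustration of cosine similarity over bag-of-words TF-IDF vectors. It is not the OMOP2OBO implementation: the whitespace tokenizer and the smoothed IDF weighting are assumptions chosen for brevity, and the labels are just strings that appear in this issue.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # smoothed inverse document frequency (one common variant)
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


vectors = tfidf_vectors(["apraxia", "Apraxia", "joint swelling"])
same = cosine(vectors[0], vectors[1])
different = cosine(vectors[0], vectors[2])
```

Identical labels (ignoring case) score 1.0 up to floating-point rounding, and labels with no shared tokens score 0.0.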

@callahantiff
Owner Author

@bill-baumgartner - starting this work on branch mapping_rdfization. Will build out a separate class (and tests) for this work.

@callahantiff
Owner Author

callahantiff commented Nov 7, 2020

@bill-baumgartner - refined the representation. Hoping to go over this on Monday with you (click to enlarge). I have also created the following Wiki pages for this content:

[Figure: OMOP2OBO_kr_mappings]

UPDATE: Figure updated to reflect discussion with @linikujp (11/15/2020)

@callahantiff
Owner Author

Small Update: We will be using the OMOP2OBO namespace, but instead of creating new omop2obo ids we will use the OMOP concept ids. This is advantageous for two reasons: (1) it is easier for users to find classes and (2) we don't have to keep track of a new set of identifiers.

@linikujp

@callahantiff the Chemical "has component" Vaccine doesn't sound right to me.
I think it should be the other way around. Can you give me an example to explain your rationale?

Thanks,
Asiyah

@callahantiff
Owner Author

> @callahantiff the Chemical "has component" Vaccine doesn't sound right to me.
> I think it should be the other way around. Can you give me an example to explain your rationale?
>
> Thanks,
> Asiyah

Hi @linikujp! So excited to have your feedback on this! Happy to explain our logic on this decision and equally happy to get your thoughts on it 😄 I have been working on how to include examples for the knowledge representation figure and just last week decided that we will include a figure caption that provides examples for each square in the figure. Your comment has helped me to realize that I left out an important component which I think is causing confusion. I have updated the figure and included it below. Please let me know if this is more clear.

OK, to your question. In general, when representing drug ingredients, we make the assumption that all OMOP drug exposure ingredients are either some kind of CHEBI chemical entity or have some kind of CHEBI role (all Drug Exposure ingredients have at least one CHEBI annotation). For vaccines, we use the Vaccine Ontology (VO), which allows us to represent ingredients that are vaccines or components of vaccines (example 1) and, in a few cases, other types of ingredients that are not vaccines but are included in the VO (example 2).

Example 1: Varicella-Zoster Virus Vaccine Live (Oka-Merck) strain

This vaccine ingredient is represented using the CHEBI immunogen role (CHEBI_60816), which has component varicella-zoster virus vaccine live (oka-merck) strain (VO_0003273) that is in taxon human alphaherpesvirus 3 (NCBITaxon_10335).

Example 2: Gelatin, Iron, Catalase, Rho

This covers drug exposure ingredients that are not vaccines (e.g. gelatin, iron, catalase, or rho) but that have classes in the VO. The VO includes terms that explicitly represent these ingredients, which we model as being components of particular CHEBI chemical entities.

Does that help clarify the different representations?

As mentioned above, I have edited the figure for Drug Exposure Ingredients to make this distinction clearer.

[Image: Screen Shot 2020-11-15 at 12 44 52 (updated Drug Exposure Ingredients figure)]

@linikujp

linikujp commented Nov 16, 2020

Hi @callahantiff,

It makes better sense now. But I am afraid that your relation "has component" has a different meaning from how I understand it. Do you use the RO relation here: http://www.ontobee.org/ontology/RO?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FRO_0002180 ?

Having both a role and a chemical entity "has component" either a protein or a vaccine is problematic from the standpoint of BFO's structure as well as OBO's.

From what you said "drug exposure ingredients are either some kind of CHEBI chemical entity or they have some kind of CHEBI role (all Drug Exposure ingredients have at least one CHEBI annotation)."
A drug exposure ingredient is_a CHEBI chemical entity
OR
a drug exposure ingredient is_a immunogen (OBI_1110023), which has_role CHEBI immunogen (CHEBI:60816) /OBI immunogen role (OBI_1110082).
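
The two alternatives above can be sketched as edges like this. It is only an illustration under stated assumptions: `is_a` is written as `rdfs:subClassOf`, `has_role` is assumed to be RO_0000087, the role is attached directly to the immunogen class instead of through an OWL restriction, and `ingredient_edges` is a hypothetical helper.

```python
OBO = "http://purl.obolibrary.org/obo/"
IS_A = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
HAS_ROLE = OBO + "RO_0000087"       # assumed IRI for "has role"
IMMUNOGEN = OBO + "OBI_1110023"     # immunogen, as cited above


def ingredient_edges(ingredient_iri, chebi_entity=None, immunogen_role=None):
    """Return edges for one of the two patterns (hypothetical helper).

    Pass chebi_entity for "is_a CHEBI chemical entity", or immunogen_role
    (e.g. "CHEBI_60816" or "OBI_1110082") for the immunogen pattern.
    """
    if (chebi_entity is None) == (immunogen_role is None):
        raise ValueError("provide exactly one of chebi_entity or immunogen_role")
    if chebi_entity is not None:
        return [(ingredient_iri, IS_A, OBO + chebi_entity)]
    return [
        (ingredient_iri, IS_A, IMMUNOGEN),
        (IMMUNOGEN, HAS_ROLE, OBO + immunogen_role),
    ]


if __name__ == "__main__":
    # The ingredient IRI below is a placeholder, not a real OMOP2OBO class.
    for edge in ingredient_edges("https://example.org/ingredient", immunogen_role="CHEBI_60816"):
        print(edge)
```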

Do you have to have all terms from CHEBI for this case?

How do you represent drugs?

Thanks,
Asiyah

@callahantiff
Owner Author

Hi @linikujp -

I see your point. Let me check in with my project mentors and get back to you. We had initially come to an agreement on this after some discussion, but there have also been some changes since then. Thanks again for raising these points and helping me to make this better.

-Tiffany

@callahantiff
Owner Author

Dear @linikujp - Thank you again for your feedback and your great suggestions!

Over the last few days I met with my team and we spent a lot of time thinking about the drug and drug ingredient representations with respect to the points that you brought up. We agreed that we did not have things quite right and have overhauled our initial representation. As best we could, we also incorporated some of your suggestions. This new representation (shown below) explicitly models the different subdivisions of the CHEBI hierarchy, which we feel is important to highlight because each requires slightly different logic patterns. We also explicitly model how drugs relate to ingredients, which should provide a much clearer picture of our overall approach.

[Image: OMOP2OBO_kr_mappings_Drugs]

@linikujp

Dear @callahantiff
Thank you for the response. Your current approach may work -- it depends on how you define a vaccine and your use case.
I feel the BNode may refer to something that covers the biologic and chemical medical product, which is very hard to define correctly. Am I understanding correctly?

I'd love to follow up and see how your implementation goes with this approach.

Thanks,
Asiyah

@callahantiff
Owner Author

Absolutely, I think we are at a great place to start testing the representation. We have some good experiments planned, which I will be focusing on completing over the next few weeks. I will definitely reach out and share the results when those are complete!

Thank you again for your feedback and for helping make this work stronger. I am very appreciative! 😄

@linikujp

@callahantiff Cool! & +1
PS: Please reach out to me via my gmail due to my transition to NIH.

@callahantiff callahantiff linked a pull request Dec 15, 2020 that will close this issue