New properties for cross-species equivalents #107

Closed
gouttegd opened this issue May 18, 2022 · 18 comments

Comments

@gouttegd

We need some new properties to represent mappings between equivalent (or near-equivalent) terms across species.

That is, to be able to state:

  • term A in species-specific ontology X is equivalent to term B in another species-specific ontology Y, or
  • term A in species-specific ontology X is near-equivalent to term B in a species-neutral ontology Z.

Those properties could then replace the oboInOwl:hasDbXref property which is currently used for that (among other things). They could also be used as the mapping predicate in SSSOM-style mappings.
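To make the SSSOM use concrete, here is a minimal sketch of what one such mapping row might look like. The predicate `ex:crossSpeciesEquivalent` and the CURIEs `FBbt:neuron` and `UBERON:neuron` are placeholders invented for illustration, not real identifiers; the column names are standard SSSOM slots, and the metadata header a real SSSOM file carries is omitted.

```python
import csv
import io

# Standard SSSOM column names; the values used below are placeholders.
COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def to_sssom_tsv(rows):
    """Serialise mapping rows as a tab-separated table (no metadata header)."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {
        "subject_id": "FBbt:neuron",  # placeholder CURIE
        "predicate_id": "ex:crossSpeciesEquivalent",  # hypothetical predicate
        "object_id": "UBERON:neuron",  # placeholder CURIE
        "mapping_justification": "semapv:ManualMappingCuration",
    }
]
print(to_sssom_tsv(rows))
```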

How many nuances of “near-equivalent” we allow remains to be discussed, along with the exact labels to use. For the record, here is @dosumis’ proposal from a few months ago:

Simpler suggestion for X species mappings: just use a specific predicate for it as a subproperty of related. Cross-species mappings could never really be exact and need to be easily separable from mappings that are. We could potentially further specialise cross-species mappings with subproperties (less sure about these):

related
. cross-species
. . cross-species-equivalent ? # Use for identical logical def apart from clause in_taxon some X ?
. . cross-species broad ?
. . cross-species narrow ?
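
One way to read the outline above is as a small subproperty tree, where any consumer can ask whether a given predicate falls under `cross-species`. The sketch below assumes exactly the names in the outline (written with hyphens); none of them is an identifier from a published vocabulary.

```python
# Child -> parent, following the outline above. All names are hypothetical.
PARENT = {
    "cross-species": "related",
    "cross-species-equivalent": "cross-species",
    "cross-species-broad": "cross-species",
    "cross-species-narrow": "cross-species",
}


def ancestors(predicate):
    """Return the chain of superproperties, nearest first."""
    chain = []
    while predicate in PARENT:
        predicate = PARENT[predicate]
        chain.append(predicate)
    return chain


def is_cross_species(predicate):
    """True if the predicate is 'cross-species' or one of its subproperties."""
    return predicate == "cross-species" or "cross-species" in ancestors(predicate)


print(ancestors("cross-species-equivalent"))
print(is_cross_species("related"))
```

With this layout, a tool that only understands `related` can still treat every cross-species mapping as a related match, while a tool that cares can separate them out.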

Tagging @matentzn to join the discussion.

@StroemPhi

What speaks against using skos:exactMatch etc. for that, if I may ask?

@gouttegd
Author

I object to using skos:exactMatch for cross-species mappings. Per the SKOS Reference, skos:exactMatch “is used to link two concepts, indicating a high degree of confidence that the concepts can be used interchangeably across a wide range of information retrieval applications”. I don’t see how this could possibly apply across species – the concepts of “fly neuron” and “human neuron” are not interchangeable.

skos:closeMatch (“to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications” – emphasis not mine) does not seem correct either. It’s not a question of “degree of similarity”; no matter how similar fly neurons and human neurons are, the concepts can never be used interchangeably.

skos:relatedMatch (“to state an associated mapping link between two concepts”) could do, but in my opinion using it would be hardly better than our current use of oboInOwl:hasDbXref. It would just mean “there’s some kind of link between those two terms, go try figuring out by yourself what this link is”.

skos:broadMatch and skos:narrowMatch could do for the specific case of mapping between a species-neutral ontology and a species-specific one (e.g., FBbt’s neuron skos:broadMatch Uberon’s neuron) – I think that would be correct. But again, that would only be marginally better than using oboInOwl:hasDbXref.

With any of those properties, the fact that we are talking about a cross-species mapping could only be deduced by looking at which ontologies the mapped terms belong to, observing that they are about different species (or that one is species-neutral while the other is species-specific), and concluding “this must be a cross-species mapping”. Can’t we do better?
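
The indirect deduction described above could be sketched roughly as follows. The prefix-to-taxon table is an assumption maintained by hand for this illustration, and the CURIEs are placeholders; the point is only that the consumer needs outside knowledge that the predicate itself does not carry.

```python
# Assumption: a hand-maintained table of which taxon each ontology covers.
# None stands for "species-neutral". Entries are illustrative only.
TAXON_OF_PREFIX = {
    "FBbt": "Drosophila melanogaster",
    "ZFA": "Danio rerio",
    "UBERON": None,
    "CL": None,
}


def prefix(curie):
    """Return the part of a CURIE before the first colon."""
    return curie.split(":", 1)[0]


def looks_cross_species(subject_id, object_id):
    """Guess whether a mapping crosses species from the term prefixes alone.

    Raises KeyError for a prefix missing from the table, since no guess
    is possible without that outside knowledge.
    """
    subject_taxon = TAXON_OF_PREFIX[prefix(subject_id)]
    object_taxon = TAXON_OF_PREFIX[prefix(object_id)]
    return subject_taxon != object_taxon


print(looks_cross_species("FBbt:neuron", "UBERON:neuron"))
print(looks_cross_species("UBERON:neuron", "CL:neuron"))
```

With a dedicated predicate, the same question would reduce to comparing `predicate_id` against a known value, with no taxon table needed.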

Overall, I think none of the predicates currently recommended by the SSSOM specification accurately covers the case of cross-species mappings, and I do think those mappings are significant enough to warrant being clearly identified as such instead of overloading a more “generic” predicate the same way we already overloaded oboInOwl:hasDbXref.

@StroemPhi

@gouttegd thank you for taking the time to write up this detailed line of argumentation!

@cthoyt
Contributor

cthoyt commented May 19, 2022

Copying some thoughts from slack:

Biomappings (https://github.com/biopragmatics/biomappings) contains an ad-hoc relationship used between species-specific entities and their corresponding non-species-specific terms. In particular, we used it when mapping between KEGG pathways (not species-specific) and Reactome pathways (species-specific). Not exactly what you said, but in the same universe.

Similarly, RO has several “homology” terms that could be a sub-relationship of a cross-species equivalence/similarity relationship that might (e.g., see https://www.ebi.ac.uk/ols/ontologies/ro/properties?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FRO_HOM0000017, though it’s a bit complicated to navigate the hierarchies for these)

@cmungall
Contributor

See also https://academic.oup.com/sysbio/article/69/2/345/5584267

@matentzn
Contributor

The worry on the one side (adding these cross species mappings) is mapping predicate proliferation. The question is, what other proliferations do we expect? Is it really that much of a worry?

Another problem is that most users won’t care and won’t know about the difference between skos:exactMatch and OMO:12090908 (“cross-species exact match”). Is the precision really worth the confusion (they will have to “know” what to look for to get these!)?

Charlie’s dbxref argument on Slack is not quite right because either way, we will maintain precision (exact, narrow, broad).

Downside of using skos is that data transformations and clique merging could look a bit odd for some users (think multi hop queries in OXO).
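
A rough sketch of the clique-merging concern, using a small union-find over mapping triples. The CURIEs and the predicate `ex:crossSpeciesEquivalent` are placeholders, and this is not the algorithm of any particular tool, only an illustration of what merging on skos:exactMatch would do.

```python
def cliques(mappings, merge_predicates):
    """Group terms connected by any predicate in merge_predicates."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for subject, predicate, obj in mappings:
        root_s, root_o = find(subject), find(obj)  # registers both terms
        if predicate in merge_predicates:
            parent[root_s] = root_o

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Placeholder mappings: two species-specific terms mapped to one neutral term.
as_exact = [
    ("FBbt:neuron", "skos:exactMatch", "UBERON:neuron"),
    ("ZFA:neuron", "skos:exactMatch", "UBERON:neuron"),
]
as_cross_species = [
    (s, "ex:crossSpeciesEquivalent", o) for s, _, o in as_exact
]

print(cliques(as_exact, {"skos:exactMatch"}))
print(cliques(as_cross_species, {"skos:exactMatch"}))
```

In the first call the fly and zebrafish terms land in the same clique through the species-neutral term; in the second they stay apart because the dedicated predicate is not in the merge set.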

I could see a solution where we allow both kinds. or. use. gasp. gaaaaasp. the predicate_modifier column that we got.

@graybeal

graybeal commented May 19, 2022

Having skimmed @cmungall's reference long enough to know I don't have time to master it, I offer this. Two generalities emerge: (1) In countless pairs of hierarchies, different parents have almost-identical children. Wheels on bikes or tanks; children of people or dogs; roofs on houses or cars. Depending on the goal of the inferencing, having a purpose-built equivalent of exact match and close match (or any relation) in each domain may or may not be useful. (2) Using subclasses of relations to provide better precision always comes with a potential cost, as @matentzn points out; ideally the tooling overcomes that cost by supporting generalized and specific searching, following the appropriate properties (appropriately) for the user's needs. However, I see this as one of the bigger unmet challenges in practical use of semantic content: enabling users to find good (yet appropriately rigorous) matches for their purpose.

In other words, I don't think there's exactly one right answer here.

@gouttegd
Author

One clarification, since homologies have been mentioned.

The cross-species mappings we would like to specify using the newly requested properties do not necessarily represent homologies, and we do not want to state that they are. In fact I used the words “equivalent” and “near-equivalent” in my request precisely to avoid talking about “homologies”.

Of course, some of the terms we would like to map undoubtedly represent cases of homology. But there would also be several cases where we do not necessarily know the underlying evolutionary link, if any.

@gouttegd
Author

Regarding the problem of “mapping predicate proliferation”: I am not sure I understand what the concerns are exactly.

If the worry is that we could end up with dozens or hundreds of mapping properties, and that this would result in needless confusion:

  • No one is suggesting we open the flood gates. I’m happy with a policy that any new mapping property can only be added after thorough discussion (that’s what this issue is for) and once a consensus has been reached. There’s no reason why accepting this request (if we end up agreeing on it) should mean that we now have to accept any similar request without discussion.
  • I’d much rather have dozens of mapping properties, if there’s a need for it, than a handful of mapping properties with dozens of meanings associated with each of them.

@matentzn
Contributor

We need to remember that exactMatch is also only "sort of exact". In reality, exactMatch means something like: "A reasonable user can expect that the subject and the object id refer to the same real-world entity for most of the use cases". A fish head and a mouse head are not the same real-world entities, which speaks for @gouttegd. On the other hand, there are a good number of use cases where we need to "conflate" two kinds of entities, i.e. mapping a gene to its reference protein, a disease to its clearly same phenotype etc. Admitting a request like this one here (while I also tend towards admitting) will pose the question of how to support all of these "conflation" use cases. We do NOT want to say MONDO:Alzheimer skos:relatedMatch HPO:Alzheimer (we want to say exactMatch), i.e. giving up the precision here just to ensure we are not overloading exact. It's a thin line to tread. I would like to learn more about the risks of using skos. You could solve the issue here with something like this as well: mapping-commons/sssom#180.

@gouttegd
Author

gouttegd commented May 20, 2022

Admitting a request like this one here (while I also tend towards admitting) will pose the question of how to support all of these "conflation" use cases.

First, maybe that’s not a bad thing. Maybe this question should be raised anyway. (Instead of hiding it under the rug by using skos:exactMatch everywhere and letting everybody else figure out what we meant.)

Second, actually I don’t see how this request would necessarily lead to that question. I don’t want to use skos:*Match for cross-species mappings; but I am not advocating that we reject these properties everywhere and force everybody to come up with precise properties for their use cases. If people are happy with a generic, meaning-free skos:exactMatch for their mappings, they can still use it – the existence of a hypothetical property specifically intended for cross-species mappings does not threaten that.

@gouttegd
Author

Could we gather more opinions and try moving this issue forward?

I’d like to explain again my position by trying an analogy with a domain completely different from biology.

Let’s say you’re doing comparative law and you want to map legal texts from different countries and/or epochs. For example, you want to map the USA’s Roe v. Wade (1973) with the UK’s Abortion Act (1967) and with France’s Veil Law (1975). All three texts can be said to be kind of “equivalent”, in that they roughly fulfil the same purpose in their respective legal systems: they make it legal to have an abortion.¹ That said, should they be mapped with skos:exactMatch? I would argue that they should not:

  • they don’t have the same nature (the UK and French texts are codified laws voted by a parliament and enacted by a head of state, the US text is a court decision);
  • they don’t apply to the same persons in front of the same courts (try invoking the UK Abortion Act in front of a US court…);
  • they may very well differ in their exact provisions (e.g. under which conditions, if any, is abortion allowed).

Using skos:exactMatch, in my opinion, would only be possible by seriously twisting the definition of that relation, which I believe we should not do. (I would rather reserve the use of skos:exactMatch to map 410 U.S. 113 (1973) with 93 S.Ct. 705 (1973), which both refer to the Roe v. Wade decision – they are merely different identifiers from different publishers but referring to the same text.)

Could we use skos:relatedMatch instead? Probably. This would certainly be more correct in my opinion, but the problem is then that it is not very useful, because it wouldn’t say anything about how those texts are related.

I would thus argue that comparative law scholars should use a specific relation (which could be called has_foreign_equivalent or something like that), which would explicitly state the nature of the mapping (contrary to skos:relatedMatch) without implying that the mapped texts are the same (contrary to skos:exactMatch).

I believe cross-species mappings are of the same kind. They map entities that are clearly different but that are “equivalent” in the boundaries of their respective species. A fly neuron is not the same as a mouse neuron, but it does roughly the same thing in a fly brain as a mouse neuron does in a mouse brain. I believe this warrants using a specific relation, instead of overloading generic relations such as skos:*Match.


¹ I apologise for picking an example that may be seen as controversial. I picked it solely because, not being a law scholar, there are not many laws for which I know the equivalents in other countries… ^^'

@matentzn
Contributor

matentzn commented Jun 1, 2022

@cmungall one option would be to create a coherent hierarchy of mapping relations in https://github.com/mapping-commons/semantic-mapping-vocabulary

@graybeal

graybeal commented Jun 1, 2022

@gouttegd I don't think there is any issue with the conclusion that skos:exactMatch is not appropriate, and skos:relatedMatch is not very helpful. This is in the nature of SKOS properties: they are very general.

The challenge is the modeling work that is required to agree on a new property, even in the limited SKOSy world of general relations. To define a new property, I suspect one needs to: be fairly precise in its definition, to make the consistent application of that property unambiguous to all users, including defining transitivity, invertibility, and other characteristics of the property; be fairly convincing that this is a commonly needed property worthy of making IAO more complex and not leading down a slope of many other properties; and find a name for the property that gets general agreement.

Of these, the one I think would be most helpful is to describe your property more precisely than either the original proposal or the legal example does. What property exactly would you propose, and to which cross-species mappings will it apply or not apply? e.g., does 'equivalent' mean functional equivalence, genetic equivalence, behavioral equivalence, homologous (is that the right word?), all of the above? If it applies to any concept in a taxonomic ontology (pretty broad category, no?) then it's hard to immediately grok what it means across all different entities you might come across, and how similar something should be to apply it. (Or declare that it's a user judgment kind of assessment, like closeMatch. Which I might have used for the legal example, depending on the purpose of my mapping.)

@cthoyt
Contributor

cthoyt commented Jun 1, 2022

I think @matentzn's idea to mint a new property in SEMAPV is the most practical solution. We can use this property to mean exactly what we need to solve this problem, then potentially work on the "ontologization" of that property later, if there's a real need for it.

@gouttegd
Author

gouttegd commented Jun 6, 2022

I don't think there is any issue with the conclusion that skos:exactMatch is not appropriate, and skos:relatedMatch is not very helpful.

It was not clear to me that there was an agreement on that. If there is, good.

Of these, the one I think would be most helpful is to describe your property more precisely than either the original proposal or the legal example does.

All right, I can try. For my use-case (I maintain the mappings between FBbt and Uberon/CL), what I need is an annotation property which could be called cross-species-equivalent (@dosumis’ proposed name, as quoted in the initial message – I don’t have any better name in mind but I am open to suggestions) to map between species-neutral terms in Uberon/CL and Drosophila-specific terms in FBbt.

For which mappings should such a property be used? Practical and immediate answer: all mappings for which we are currently using oboInOwl:hasDbXref. If, as a Uberon/CL editor, someone had enough reasons to annotate a Uberon term with an oboInOwl:hasDbXref annotation pointing to another term from a foreign species-specific ontology (say, FBbt), with the knowledge that in the context of Uberon/CL such an annotation results in the creation of a bridging axiom, then that same person would have enough reasons to use the new cross-species-equivalent annotation instead.
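
As a rough sketch of that migration, the snippet below rewrites hasDbXref annotations that point into a species-specific ontology and leaves all other cross-references alone. The prefix filter is an assumption standing in for the editors’ judgment, and `ex:crossSpeciesEquivalent` and the CURIEs are placeholders, not real identifiers.

```python
# Assumption: prefixes of the foreign species-specific ontologies concerned.
SPECIES_SPECIFIC_PREFIXES = {"FBbt"}
# Hypothetical name for the new annotation property.
NEW_PROPERTY = "ex:crossSpeciesEquivalent"


def migrate(annotations):
    """Replace hasDbXref with the new property where the target is species-specific.

    `annotations` is a list of (subject, property, value) triples; other
    triples are passed through unchanged.
    """
    migrated = []
    for subject, prop, value in annotations:
        target_prefix = value.split(":", 1)[0]
        if prop == "oboInOwl:hasDbXref" and target_prefix in SPECIES_SPECIFIC_PREFIXES:
            prop = NEW_PROPERTY
        migrated.append((subject, prop, value))
    return migrated


annotations = [
    ("UBERON:neuron", "oboInOwl:hasDbXref", "FBbt:neuron"),  # placeholder CURIEs
    ("UBERON:neuron", "oboInOwl:hasDbXref", "Wikipedia:Neuron"),
]
for triple in migrate(annotations):
    print(triple)
```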

Can we define more precise rules as to when cross-species-equivalent would be appropriate? Not sure. @dosumis proposed “Use for identical logical def apart from clause in_taxon some X ?”, but this may not be workable – many terms referring to cells or structures I would without hesitation consider as “equivalent” do not actually have identical logical definitions across Uberon/CL and FBbt. I think it will ultimately be a judgment call from the editors: “Are those two terms referring to the same thing apart from the fact that the second term refers to that thing solely in the context of a specific species?”

I understand this is not ideal, but I’d like to point out that until now, Uberon/CL curators have been making this kind of judgment call when deciding whether to use an oboInOwl:hasDbXref-style mapping between Uberon/CL and all the other species-specific ontologies, and it has worked so far.

e.g., does 'equivalent' mean functional equivalence, genetic equivalence, behavioral equivalence, homologous (is that the right word?), all of the above?

In this proposal, “equivalent” (if we stick to this choice of term) would not imply any knowledge about how the cell types or anatomical structures are equivalent.

Maybe this could be refined later using sub-properties for each type of equivalence (to be used when we do know exactly how the structures are “equivalent”), but this would probably warrant further discussion, especially since many people seem concerned about property proliferation.

Of note: all of the above is regarding “my” use case (Uberon/CL to species-specific mappings). There may be other use cases for other people (e.g., in the Slack chat @matentzn mentioned something about mappings such as HPO:bighead skos:exactMatch ZFA:increasedheadsize, for which he would not want to use skos:exactMatch), I’d very much like to hear about those. If the proposed cross-species-equivalent (or however we name it) can be made to accommodate other people’s needs (thereby reducing maybe the need for more specific properties, one for each use case), that’s all good with me.

@dosumis
Contributor

dosumis commented Jul 18, 2022

Perhaps it is easier to list the various use cases we want to group. Core cases:

  • A homologous to B (with some acceptable level of certainty that this is true)
  • A and B are phenotypes affecting homologous structures & characterised by the same quality
  • A and B are functionally defined grouping classes e.g. photoreceptor cell; mechanosensory cell.
  • A and B are so distant that homology is controversial - but we have reasonable grounds for believing that there is deep homology (e.g. human eye <-> fly eye; mammalian neuron <-> arthropod neuron)

Is this core sufficient, or should we extend, if so with what?

@gouttegd
Author

Closing as the new mapping relations have found their way into the SEMAPV vocabulary.
