Mapping involving post-coordinated subjects or objects #108
Comments
Hey @tudorache We have been debating about it here:
I tried to push this through in our inaugural workshop, but we met strong resistance to introducing this level of complexity into SSSOM. We agreed during the workshop to wait for a very strong use case and then put the proposal up for a vote. Technically, it is not that difficult to realise, but the feeling was that if we start doing this, it would lead to "abuse" of the standard for essentially encoding entire ontologies. But please raise the issue again. I can also share the paper draft we are about to submit if you are interested. |
Thank you, @matentzn! The use case I am working on is the mapping of ICD-11 to other terminologies/ontologies, for example to prior versions of ICD (which do not support post-coordination), and to other potential targets (other WHO classifications, SNOMED CT, MeSH, etc.). As background, ICD-11 has a Foundational Component, which has an OWL representation. Postcoordination is one of the new features of ICD-11 (see here and here). Therefore, it is important to be able to represent "complex" mappings that link a postcoordinated entity in ICD-11 to a simple or complex entity in a target ontology. I think the suggestion in #36 of using subject/object patterns or templates would work. The postcoordinated entities in ICD-11 follow a clear pattern (which I suppose is also the case for other ontologies):
I guess in an ideal world, SSSOM should remain as simple as possible, and have these complex mappings as optional add-ons, with tools that can unambiguously interpret them. My feeling is that there will be other use cases in need of complex maps (as noted in the previous comment), and rather than have different groups come up with arbitrary workarounds, it would be great to have a uniform and unambiguous way of representing and interpreting them. But I do understand the reluctance to add complexity to an intentionally simple representation. @matentzn, I would be very interested in reading the paper draft, if you can share it. Thanks! |
I am very happy you brought this up again. I can't promise a fast turnaround on the implementation, but at least we should consider adding the relevant elements to the spec to allow this kind of expression. The main issue right now is to change the @cmungall - do you have a position on this? |
Sorry, I am not following the motivation for changing sub/obj to multivalued.
The first thing to decide is, if we support mapping class expressions, what profile of OWL (or a more expressive logic) we support. While I agree with @tudorache that obo-format/GO annotation extension-style genus-differentia expressions with no nesting are sufficient for most use cases I'm aware of, if we hardcode a solution for that profile, it's guaranteed someone will come along later and want unions/nesting/QCRs/complementOf.
I would also caution against assuming OWL is the right fit for all complex mapping issues. Many rules have a closed-world flavor, e.g. ICD
I think the broad approaches I can think of would be
I list 1 for completeness, but I think we would all vote against this. Same for 5.
2 has the usual problems that we already hashed out with negation: the differentia fields are non-ignorable, and it only supports the obo-format profile.
I like some parts of 3. It's backwards and forwards compatible; in fact there is nothing to stop me doing this right now with external OWL files that include the axioms. It is composable in an IMHO elegant way - SSSOM remains a simple format for mappings between named entities, and we use OWL (or other formalisms) externally to map named entities to expressions.
For 3 it would be nice to have conventions for hashing expressions, maybe even register a bioregistry entry for this backed by some kind of distributed hash table. It's not clear what the overall benefit is, though, if someone needs the composition to be able to meaningfully interpret the files, in which case we're back to 0 or 2. |
Alright, after careful consideration of the pros and cons of the issue, we propose the following:
We will work with @tudorache to ensure that her requirements are met to that end. I personally will be on extended leave until the end of January (from this week on), but after that we can set this process in motion. For now, I would suggest, @tudorache, that you just collect the complex expressions you want to map, and we will work out together what exactly the "complex profile" will look like. I hope this makes sense! |
Hi, I'd like to add to @tudorache's request for post-coordination mappings. Some people working with WHO are trying to harmonize terms in WHO classifications (ICD, ICF [for functioning and disability], and ICHI [for health interventions]). For example, there are signs and symptoms terms in ICD that are equivalent to ICF body functions and some impairment qualifiers. There are also people working on mappings of various national intervention classifications to ICHI who need to use conjunction, disjunction, and qualifier post-coordination. I can solicit examples of these mapping requirements. |
@samsontu thanks, some more concrete examples would be great. When you say "conjunction, disjunction, and qualifier post-coordination", do you explicitly mean those in the OWL sense? Or do you mean in a more abstract fashion, without any particular logical formalism in mind (i.e. some use cases would create some Common Logic, others OWL, others some other First Order Logic output), or can we assume that when you do post-composition, you are always talking in terms of OWL 2 class expressions? |
@cmungall just ran an idea by me which sounds extremely crazy at first, but would allow for a clear separation of concerns. I am not saying we go there, it sounds a bit crazy, but it allows us to 1) keep SSSOM simple and 2) offer maximal flexibility to the mapping process. The idea is this.
There is some risk of the connection between the SSSOM mapping file and the two template files breaking, but we can require versioned PURLs for this field to control the issue. Advantages of this solution:
Disadvantages:
I think I can get behind a solution like that. Let me know what you all think! |
Can you reformulate this so all of the extra metadata required to do this kind of transformation lives in a secondary configuration file? Would it be possible to make a standardized way of doing this that doesn't touch the SSSOM standard at all? |
Fewer than these three simple mapping-set-level elements? You can make a proposal, but I cannot think of any way that gives at least some kind of integrity to the mapping-template connection. What is your concern? |
I like the innovation, but I have a concern that may be the same @cthoyt is getting at. It's possible I've missed something along the way, but to say "System X supports the SSSOM standard", we need the functionality of the SSSOM standard to be clearly defined, understood, and easily implemented. (And my assumption is that the standard is defining a data file format, not a set of supported operational capabilities.) While I was thinking of it as a table of triples with some prefixes in front of it, easily converted to RDF, I was very confident that BioPortal could take that information and convert it to BioPortal mappings. With each complexity that might get added (additional columns, annotational specifications, indirect automation of information construction), I'm less sure about what is involved. If/when I have a few hours I'll be able to go through the whole thing in detail and maybe it's still straightforward. But in this case, it's clear that supporting the specification would require implementing tooling to recognize and apply the transformations on the fly. Far simpler (for a 'data file standard') would be if all those transformations happened in order to create the SSSOM-compatible file, rather than as a step in interpreting it. |
Unfortunately we are in this situation:
Both are diametrically opposed, and if we cannot agree on a standard way to do it, I will just promote the idea here as a non-standard way to deal with it (because I and my stakeholders need it) - outside the SSSOM standard. This means ad-hoc solutions for representing complex subjects will emerge, which may be fine, but may create difficulties later on. Rather than opposing any solution for complex subjects (we do need one), I want to encourage you (@cthoyt and @graybeal) to present concrete concerns which can be addressed. I am fine to package the proposal here into an SSSOMC complex extension if I have to, but we really need to weigh the complexity of maintaining two models against the perceived benefits. Right now the proposal does
Except for extremely specialised tooling, no one will ever need to deal with the complex mappings shared like this! |
So:
First you have to tell your ingest software to ignore those elements. Then if you ignore the elements, you don't get the mappings, right? So you think you have processed the mapping file successfully but really you missed some arbitrary (unknown) amount of the content. If I'm not getting that please clarify. (Of course what BioPortal does with a coordinated mapping statement like "Mild diabetic retinopathy" sameAs "Diabetic retinopathy and severity some Mild", I have no idea either. But I digress.) I think this is a concrete concern: "You are requiring any adopting system to create or integrate complex algorithms to implement post-coordination, with all the potential challenges and inconsistencies that implies". I'm not sure it's a fair one because I'm not sure how complex the algorithm(s) will have to be, if they will have multiple sources of implementation, or require complex installation/integration procedures. But if we can agree this is an SSSOMC extension, what about the following to minimize the programming/devops required and the variability that might ensue?
I'll stop here, I know I'm over-designing. I just wanted to show by example how everything could be tied together and examinable in a single file, eliminating all sorts of coupling and versioning issues; consistent results could be expected across any two systems that have installed the named template_system; the results could be recognizable as mappings; and there could be some control put in place up front that forces all the possible template_systems in the universe to satisfy a common specification for their operation (so, zero-coding installations). Whatever you can do to minimize the cost of implementation (for systems having to process SSSOM) and the cost of understanding (for users staring at an SSSOM file and wondering what it means) will increase adoption. |
@matentzn I actually wasn't thinking of adding any fields to sssom. All I'm proposing is a simple standard way of serializing or hashing an expression as a CURIE. There are multiple ways to distribute simple lookup tables alongside sssom - for example a simple templating system that maps tuple + pattern to the composed identifier. But sssom doesn't need to know about these. It's just another ID as far as sssom is concerned. |
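To make the "hash an expression as a CURIE plus a side lookup table" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `expr` prefix, the pattern name `severity_pattern`, the `EX:` fillers, and the function names are all hypothetical and not part of SSSOM or any registry.

```python
import hashlib

# Hypothetical prefix for hashed expressions; not a registered prefix.
EXPR_PREFIX = "expr"


def expression_curie(pattern_id: str, fillers: dict) -> str:
    """Derive a stable CURIE from a pattern ID and its named fillers."""
    # Sort fillers by slot name so the same expression always hashes identically.
    canonical = pattern_id + "|" + "|".join(
        f"{slot}={value}" for slot, value in sorted(fillers.items())
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{EXPR_PREFIX}:{digest}"


def build_lookup(expressions):
    """Build the side table mapping each CURIE back to (pattern, fillers)."""
    return {expression_curie(p, f): (p, f) for p, f in expressions}


# Hypothetical expression: "Diabetic retinopathy and severity some Mild".
expressions = [
    ("severity_pattern", {"genus": "EX:diabetic_retinopathy", "severity": "EX:mild"}),
]
lookup = build_lookup(expressions)
```

In this sketch the mapping file would only ever see the opaque CURIE in `subject_id` or `object_id`; the lookup table is distributed separately and is what makes the CURIE interpretable. A hash is not reversible on its own, which is why the table is needed.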
If you are proposing "a simple standard way of serializing or hashing an expression as a CURIE" then I totally misunderstood you in the call. OK then, back to square one. What is your proposal then? How will we connect the "simple lookup table" to the SSSOM mapping file if not through metadata? Through file naming or packaging conventions? Since the subject_id values won't be connected to actual (resolvable) term IRIs, they will have to be linked somewhere to be interpretable - maybe |
During a meeting I heard:
Unfortunately I didn't document which meeting. Another reason to be better at recording provenance. |
Ok, so @cmungall correct me if I am wrong. Your proposal is this. Rather than extending SSSOM, we define a convention based on A complex mapping is defined using a URL query pattern solution.
That's it. The mapping provider may choose to bind
Which would return:
If the service is self-describing (defined with LinkML, for example), you could even look up what it does in swagger or some such. In SSSOM TSV it would look something like this:
It's not a thing of extreme beauty, but it's practical and would help us serve two goals:
|
I've been thinking about this kind of technique, largely in the context of https://units-of-measurement.org/.
Canonical Form
There should be only one way to write the pattern URL. I think it should be more opaque rather than more transparent.
We could use numeric IDs for patterns and fixed order for the slots:
For even more opaqueness and brevity, we could compress/encode the arguments:
Equivalent to Pre-Composed?
What if
Offline Processing
Nico gives the example of using |
Excellent point about the order resulting in two distinct concepts... Didn't think of that. I thought about compression (base64 encoding) but felt it took away too much transparency.
Hmmm.. I still prefer them to be a bit readable.. You are sacrificing readability for data integration precision here, which I am not too sure I like the balance of.. I would prefer a pattern registry where all the
This is the real elephant in the room. I proposed something like this in my previous suggestions about how to deal with post-composition (having a specific field in SSSOM with the ordered list of fillers for the post-composed expression), and @cmungall shot it down. I am a bit on the fence; I don't like any of the proposals very much so far, so for me it's more about what I can tolerate the most :D
Offline processing: 100%, great idea. It would need some knowledge of the patterns being processed (perhaps this OBO-wide pattern registry I hinted at), but we should provide it as a Python library on PyPI. Nice point. Let's see how @cmungall reacts to the "ordered parameter" suggestion. Your order sensitivity argument is compelling, but you could solve it by requiring alphanumeric ordering of query parameters. |
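The "alphanumeric ordering of query parameters" suggestion can be sketched in a few lines of Python. This is a rough illustration, not an agreed convention: the `example.org` URL, the slot names `part` and `whole`, and the function name are made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_pattern_url(url: str) -> str:
    """Return the URL with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    # Keep ':' and ',' unescaped so CURIE fillers stay readable.
    query = urlencode(params, safe=":,")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# Two spellings of the same (hypothetical) pattern instance.
a = canonical_pattern_url("https://example.org/pattern/part?whole=UBERON:2&part=UBERON:1")
b = canonical_pattern_url("https://example.org/pattern/part?part=UBERON:1&whole=UBERON:2")
```

Both inputs normalise to the same string, so two parties writing the parameters in a different order would no longer mint two distinct identifiers. This only addresses parameter order; it does nothing about different expressions that happen to have the same extent.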
I think the canonical form issue is something that can be addressed:
but there will always be different expressions that have the same extent, including non-anonymous expressions (@jamesaoverton's example of an expression that is later assigned a named class). This is all unavoidable, but fine. You just don't make the Unique Name Assumption. There can be surrounding tools that infer equivalence and subsumption between expressions and either other expressions or named entities, and that can be used to normalize files and eliminate trivial matches, but these should be optional and considered best-effort. I admit this is a little unsatisfying. While OWL doesn't have a UNA, the convention in most ontologies is to make a best effort to follow the UNA.
We could make the URLs a bit shorter by assuming a fixed order and treating the arguments as a tuple. Serializations like protobuf do this under the hood. For this to work you need guarantees that ordering does not change; you can't later change your mind and insert intermediates or flip things around. This opens up a lot of possibilities for errors. Of course, named parameters are no cast-iron guarantee against analogous errors, but it's unlikely that someone will invert the semantics of
And ultimately the gain is IMO a bit marginal; the URLs are still ugly, and don't follow normal linked data idioms. I also think we may find tuples limiting. The tuple limitation in DOSDPs has led to pattern proliferation. I think if the system allows for both optional and multivalued (but no nesting) it will cover a broader range of use cases.
Base64 encoding: I am open to that (I originally discounted hashing as not reversible, but base64 or any reversible encoding would work). I take Nico's point about transparency. But it would be very easy to expand on the command line etc. And it's not like
In fact I am rapidly warming to this suggestion.
Offline processing: definitely! There should be no need for a dependency on a server.
There does need to be a way of resolving a pattern/template/class to some kind of computable description, but that can be entirely static. Of course having a lightweight service would be a nice thing to have, but not a necessity. Thanks everyone for engaging. This is a hard problem, there are difficult tradeoffs either way. But if we get this right, this solution could work for post-composition in general, not just in SSSOM. |
Ok lets pull some of the pieces of the URL apart. This is the basic grammar:
We have the "template portion", which is actually the part that most resembles a proper IRI. The grammar of the template portion is
|
Ok, just as a warning, we will probably solve this issue as follows:
The client can then call http://obofoundry.org/patterns/000001/robot/P2ZpbGxlcnM9VUJFUk9OOjEyMyxQQVRPOjEyMw== to refer to the pattern and instantiate it locally with, for example, |
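As a rough sketch of what such a client could do offline with the URL above, the Python below splits the path and base64-decodes the last segment. The path layout (`/patterns/<pattern_id>/<system>/<payload>`) and the meaning of the `fillers` parameter are inferred from this one example and should be treated as assumptions; the function name is hypothetical.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def parse_pattern_url(url: str) -> dict:
    """Split a pattern URL into pattern ID, templating system, and fillers."""
    # Assumed path layout: /patterns/<pattern_id>/<system>/<base64 payload>
    segments = urlsplit(url).path.strip("/").split("/")
    pattern_id, system, payload = segments[-3:]
    decoded = base64.b64decode(payload).decode("utf-8")
    # The decoded payload is a query string such as "?fillers=A:1,B:2".
    params = parse_qs(decoded.lstrip("?"))
    fillers = params["fillers"][0].split(",")
    return {"pattern_id": pattern_id, "system": system, "fillers": fillers}


url = "http://obofoundry.org/patterns/000001/robot/P2ZpbGxlcnM9VUJFUk9OOjEyMyxQQVRPOjEyMw=="
result = parse_pattern_url(url)
# result["fillers"] is ["UBERON:123", "PATO:123"]
```

No server is involved: the payload in the example decodes to `?fillers=UBERON:123,PATO:123`, and everything else is read from the path.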
Anyways, none of this affects SSSOM directly, but it is hugely important to the mapping community to find some way to distribute these. I guess you could be a horrible person and do: http://obofoundry.org/patterns/000001/ofn/P2ZUHIUYAGUJHGUY where P2ZUHIUYAGUJHGUY is a valid OWL class expression in OWL functional syntax :D |
Are you planning to use the OBO PURL system http://purl.obolibrary.org/? If not, why not? I don't quite understand how the static files would work for a ROBOT template. |
The OBO purl was just an example. Of course I would use the OBO PURL system if it was something OBO related! The point with the static files is that they would not "work" - they are like executable documentation. If I were to write a client for ROBOT template (which I would), I would basically
A webservice system would of course be able to hide the details of the templating system, but possibly at the cost of making versioning and maintenance harder. |
Hi @matentzn, we would like to use SSSOM as the main model for publishing our mappings. As @tudorache wrote, we will have many mappings from a simple code to a complex combination (ICD-11 postcoordination). Could we consider a solution that uses a blank node such as:
As a concrete example, using OWL we can obtain:
mappingcim10_cim11:MappingJ1001E30XN5SG
    a sssom:Mapping ;
    sssom:mapping_justification "Manuel" ;
    sssom:object_id "1E30&XN5SG" ;
    sssom:object_source [
        a owl:Class ;
        owl:intersectionOf (
            <http://id.who.int/icd/release/11/mms/1418788600>
            <http://id.who.int/icd/release/11/mms/753780243>
        ) ;
    ] ;
    sssom:predicate_id "skos:exact" ;
    sssom:subject_id "J10.0" ;
    sssom:subject_source "http://data.esante.gouv.fr/atih/cim10/J10.0"^^xsd:anyURI ;
    rdfs:label "Mapping_J100_1E30XN5SG" ;
.
which can be viewed in TopBraid EDG, for example, like this:
Or do you have a better solution? |
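@tayeb83 thanks for reaching out. @callahantiff and a few of us will propose a way to capture these kinds of mappings on the 23rd of April (during an SSSOM workshop on "non simple mappings") - if you are interested in working with us on our proposal, you can reach out via email or LinkedIn, and I will share the docs with you (we anticipate some resistance, so we have not shared it yet). The question is mostly what exactly should go into the object_id slot. We won't reach universal agreement here at the SSSOM level, but we hope for a nice convention that strikes a balance between interpretability and expressiveness. We won't suggest embedding the whole anonymous expression in the SSSOM file - we advocate decoupling logical concerns from SSSOM in externally defined pattern files. As an aside, you won't violate the SSSOM spec by base64-encoding a class expression in, say, OWL functional syntax and sticking this into the object_id - we will discourage this in our proposal, but you could, in theory, do this. |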
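For illustration only, this is roughly what the (discouraged) "base64-encode a class expression in OWL functional syntax" aside would look like in Python. The expression is the intersection of the two ICD-11 IRIs from the Turtle example earlier in the thread. The use of the URL-safe base64 alphabet and the function names are assumptions of this sketch, not something the thread specifies.

```python
import base64

# Intersection of the two ICD-11 entities from the example above,
# written in OWL functional syntax.
ofn = (
    "ObjectIntersectionOf("
    "<http://id.who.int/icd/release/11/mms/1418788600> "
    "<http://id.who.int/icd/release/11/mms/753780243>)"
)


def encode_expression(expression: str) -> str:
    """Encode an expression with the URL-safe base64 alphabet."""
    return base64.urlsafe_b64encode(expression.encode("utf-8")).decode("ascii")


def decode_expression(token: str) -> str:
    """Reverse encode_expression."""
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")


token = encode_expression(ofn)
```

The encoding is fully reversible, which is the property that distinguishes it from hashing, but the resulting token is long and opaque to a human reader. That trade-off is the reason given above for preferring externally defined pattern files.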
@matentzn sure!! We (my team and I) are very interested in working with you on it. It's urgent for us, since we have chosen to manage mappings in https://smt.esante.gouv.fr/ using SSSOM. |
Looking forward to working on this with you both! |
Several countries (Canada, Australia, Germany, among them) have projects mapping their ICD-10 national modifications to ICD-11. Recently, a “mapping task force” was convened in the WHO Family of Classifications Network. In its initial meeting, I raised the possibility of using SSSOM to standardize the mapping format produced in these efforts. Post-coordination is a huge issue, as national modifications of ICD-10 are invariably more granular than the international version. I’d be interested in how your proposal can be used for these cases.
With best regards,
Samson
On Feb 28, 2023, at 5:35 AM, Tiffany J. Callahan wrote:
@tayeb83 thanks for reaching out. @callahantiff and a few of us will propose a way to capture these kinds of mappings on the 23rd of April (during a SSSOM workshop on "non simple mappings") - if you are interested to work with us on our proposal, you can reach out via email or linked in, and I will share the docs with you (we anticipate some resistance, so we have not shared it yet).
The question is mostly what exactly should go into the object_id slot. We wont reach universal agreement here on SSSOM level, but we hope for a nice convention that strikes a balance between interpretability and expressiveness. We wont suggest to embed the whole anonymous expression in the sssom file - we advocate for decoupling logical concerns from SSSOM in externally defined pattern files.
As an aside, you won't violate the SSSOM spec when base64-encoding a class expression in, say, owl functional syntax and stick this into the object_id - we will discourage this in our proposal, but, you could, in theory, do this.
Looking forward to working on this with you both!
|
@samsontu This is very relevant for informing our work. Could you share with us 10 mappings, in whatever format, that use post-coordination, so we can make sure our proposal still works for these cases? @tayeb83 Please let me know how we can drive your issue forward - happy to meet as well (next week) |
I don’t have the mappings myself. I'll forward your request to the relevant people in the WHO-FIC community.
With best regards,
Samson
|
this may be a good alternative to encoding param value pairs as http params: |
Linking to slides from SSSOM workshop which relate to this issue https://docs.google.com/presentation/d/1kFD33S_WMgEGmCnT7IjVCeEyKI7OpcUw1ZzRXGqt1hs/edit#slide=id.g22c799fa946_0_0 |
Thank you for the SSSOM initiative!
I think I know the answer, but I would like to get your opinion on this topic: Is there a way to express in SSSOM the fact that one or both of subject/object are post-coordinated entities?
For example, in ontology one, there is a class: "Mild diabetic retinopathy" which should be mapped in ontology two to a post-coordinated entity: "Diabetic retinopathy and severity some Mild".
My guess is that this scenario is not supported in SSSOM. Do you envision some extensions to SSSOM in the future that would support post-coordinated entities, or what would you recommend as alternative mapping language for this case?
My strong preference would be to use SSSOM if there is a way to express these types of more complex mappings. Thank you!