Revised proposal for Interlex-based cross-references with CBRAIN #628

emmetaobrien · 2024-03-22T16:50:41Z

This issue replaces #626.

The objective of this exercise is to enable links between CONP and CBRAIN via use of interlex terms to annotate datasets in CONP and tools in CBRAIN, and then correlate these annotations to generate links to CBRAIN tools relevant to each specific CONP dataset. The suggested means to store this information at the CONP end is by modifying DATS.json files to contain interlex links as appropriate.

The current structure of the types entity in the DATS.json file is:

 "types": [
    {
      "information": {
        "value": "value1"
      }
    },
    {
      "information": {
        "value": "value2"
      }
    },
  ...
]

The proposal is to change types values in CONP to:

 "types": [
    {
      "information": {
        "value": "value1",
        "valueIRI": "http://uri.interlex.org/<link1>"
      }
    },
    {
      "information": {
        "value": "value2",
        "valueIRI": "http://uri.interlex.org/<link2>"
      }
    },
  ...
]

where we have the interlex link available.
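As a rough illustration of the change, the following Python sketch adds `valueIRI` to existing types entries from a lookup table. The function name `annotate_types`, the dict-based `LOOKUP` and the case-insensitive matching are assumptions made for this example, not the format of the actual lookup table, and the IRI is the placeholder from the structure above.

```python
import copy

# Hypothetical lookup: normalised term -> InterLex IRI.
# The IRI is the placeholder used in the proposal, not a real identifier.
LOOKUP = {
    "value1": "http://uri.interlex.org/<link1>",
}


def annotate_types(dats, lookup):
    """Return a copy of a DATS dict with valueIRI added where a match exists."""
    result = copy.deepcopy(dats)
    for entry in result.get("types", []):
        info = entry.get("information", {})
        # Assumed matching rule: ignore case and surrounding whitespace.
        iri = lookup.get(info.get("value", "").strip().lower())
        if iri and "valueIRI" not in info:
            info["valueIRI"] = iri
    return result


dats = {
    "types": [
        {"information": {"value": "value1"}},
        {"information": {"value": "value2"}},
    ]
}
annotated = annotate_types(dats, LOOKUP)
```

Entries with no match in the lookup are left exactly as they were, which corresponds to the "where we have the interlex link available" condition.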

1. Changing DATS editor functionality:
   a) offering a selectable list of the terms we already have with their interlex entries. (I have prepared a lookup table of this data, including those existing CONP types entities for which an interlex match is ambiguous or not found. At this point ~80% of distinct entries in CONP can be correlated with an interlex entry, and it should be possible to improve that further by seeking domain-expert input on some of the ambiguities.)
   b) allowing the user to search interlex for a term not already on that list, and then adding it to the list in future (maybe?)
   c) allowing the user to add terms not in interlex and save them without generating a link (i.e. the current default behaviour)
2. Additional validation: for each types entry, check whether it has a link, and if so confirm that the link is correctly formed and leads to a valid interlex page.
3. In the CONP portal display for each dataset:
   a) show interlex links from each relevant types entry (should be transparent with the proposed design)?
   [ types entries currently link to all datasets containing that entry ]
   b) use interlex links as cross references to look up appropriate tools for this dataset (how to present this TBD?)
4. A once-off curation pass through all our existing DATS.json files to update types entries with relevant interlex links. This will involve sufficient harmonisation (e.g. consistent capitalisation, confirming that acronyms and full names refer to the same entity, splitting the occasional case where multiple types have been entered in a single entity into separate entities) that I think it needs to be a manual exercise, and I am in a good position to go ahead with this based on preparing the abovementioned lookup table.
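The additional validation step described above could look roughly like the sketch below. The function name `check_types`, the `page_exists` callback and the exact well-formedness rule (an http or https URL on `uri.interlex.org` with a non-empty path) are assumptions for illustration; the real validator may apply different rules. The network check is passed in as a callable so the format check can run offline.

```python
from urllib.parse import urlparse

# Host taken from the example IRIs in the proposal.
INTERLEX_HOST = "uri.interlex.org"


def check_types(dats, page_exists=None):
    """Return a list of (index, problem) tuples for the types entries."""
    problems = []
    for i, entry in enumerate(dats.get("types", [])):
        iri = entry.get("information", {}).get("valueIRI")
        if iri is None:
            continue  # terms without a link are allowed (current default behaviour)
        if not isinstance(iri, str):
            problems.append((i, "valueIRI is not a string"))
            continue
        parsed = urlparse(iri)
        well_formed = (
            parsed.scheme in ("http", "https")
            and parsed.netloc == INTERLEX_HOST
            and parsed.path.strip("/") != ""
        )
        if not well_formed:
            problems.append((i, "malformed IRI: " + iri))
        elif page_exists is not None and not page_exists(iri):
            # page_exists is a hypothetical callable that fetches the page.
            problems.append((i, "IRI does not resolve: " + iri))
    return problems


# Placeholder data; the IRI path is made up for the example.
sample = {
    "types": [
        {"information": {"value": "a", "valueIRI": "http://uri.interlex.org/placeholder"}},
        {"information": {"value": "b"}},
        {"information": {"value": "c", "valueIRI": "interlex.org/x"}},
    ]
}
problems = check_types(sample)
```

Keeping the "leads to a valid interlex page" check behind a callback means the validator can skip or cache network lookups, which may matter if it runs over every dataset.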

An alternative to 1b) above would be for the validation process to look up every submitted term that does not have an interlex link and check whether one can be found; any new links or lookup-table entries derived automatically should receive manual confirmation. Either way, the updated lookup table will then need to be made available to CBRAIN.
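Once both sides carry interlex links, the correlation itself could be as simple as a set intersection. The sketch below assumes CBRAIN exposes its tool annotations as a mapping from tool name to a list of IRIs; that mapping, the tool names and the function name `tools_for_dataset` are all hypothetical, since the CBRAIN side is not specified here.

```python
def tools_for_dataset(dats, tool_terms):
    """Return sorted names of tools sharing at least one IRI with the dataset."""
    dataset_iris = {
        entry["information"]["valueIRI"]
        for entry in dats.get("types", [])
        if "valueIRI" in entry.get("information", {})
    }
    return sorted(
        name for name, iris in tool_terms.items() if dataset_iris & set(iris)
    )


# Placeholder data using the placeholder IRIs from the proposal.
dats_example = {
    "types": [
        {"information": {"value": "value1",
                         "valueIRI": "http://uri.interlex.org/<link1>"}},
        {"information": {"value": "value2"}},
    ]
}
tool_terms_example = {
    "tool_a": ["http://uri.interlex.org/<link1>"],
    "tool_b": ["http://uri.interlex.org/<link2>"],
}
matches = tools_for_dataset(dats_example, tool_terms_example)
```

Matching on the IRI rather than on the free-text value avoids the capitalisation and acronym issues that the curation pass is meant to clean up.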

(Where Interlex is mentioned in the text above, links to other controlled vocabularies could be used in the same general structure if there were demand.)

Documentation will also need to be updated, precise details TBD as and when the implementation decisions have been made.

carona898 · 2024-03-25T17:20:17Z

I think this proposal looks great. The implementation makes sense to me. I think I prefer the alternative for 1b) at the end of the proposal because it makes sense to me to kind of merge 1b) and 1c) together. Also, I think it would be a bit much to offer the user the option to select from all the other InterLex terms that we don't already have. (It might be a pretty huge list?)

emmetaobrien · 2024-03-26T18:53:53Z

I was thinking rather than a pull-down list of all the terms, which would indeed be huge and unwieldy, something reasonably prominent pointing the user to Interlex's own term search interface.

emmetaobrien · 2024-03-28T17:32:19Z

I have now completed adding valueIRI to DATS.json files in forks of all CONP datasets, according to the lookup table at https://github.com/CONP-PCNO/conp-documentation/blob/master/Developers-Notes/types_interlex_lookup.txt , and can incorporate those into the live version whenever that is agreed on.

emmetaobrien added the enhancement label Mar 22, 2024

emmetaobrien changed the title ~~Revised proposal for interlex link annotation in CONP~~ Revised proposal for Interlex-based cross-references with CBRAIN Mar 22, 2024

emmetaobrien self-assigned this Mar 25, 2024
