Revised proposal for Interlex-based cross-references with CBRAIN #628

Open
emmetaobrien opened this issue Mar 22, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@emmetaobrien
Collaborator

emmetaobrien commented Mar 22, 2024

This issue replaces #626.

The objective of this exercise is to enable links between CONP and CBRAIN by using interlex terms to annotate datasets in CONP and tools in CBRAIN, and then correlating these annotations to generate links to the CBRAIN tools relevant to each specific CONP dataset. The suggested way to store this information at the CONP end is to modify DATS.json files to contain interlex links as appropriate.

The current structure of the types entity in the DATS.json file is:

 "types": [
    {
      "information": {
        "value": "value1"
      }
    },
    {
      "information": {
        "value": "value2"
      }
    },
  ...
]

The proposal is to change types values in CONP to:

 "types": [
    {
      "information": {
        "value": "value1",
        "valueIRI": "http://uri.interlex.org/<link1>"
      }
    },
    {
      "information": {
        "value": "value2",
        "valueIRI": "http://uri.interlex.org/<link2>"
      }
    },
  ...
]

where we have the interlex link available. Implementing this would involve:

  1. Changing DATS editor functionality:
    a) offering a selectable list of the terms we already have, with their interlex entries. I have prepared a lookup table of this data, including those existing CONP types entities for which an interlex match is ambiguous or not found. At this point ~80% of distinct entries in CONP can be correlated with an interlex entry, and it should be possible to improve that further by seeking domain-expert input on some of the ambiguities.
    b) allowing the user to search interlex for a term not already on that list, and then adding it to the list in future (maybe?)
    c) allowing the user to add terms not in interlex and save them without generating a link (i.e. the current default behaviour)

  2. Additional validation: for each types entry, check whether it has a link, and if so confirm that the link is correctly formed and leads to a valid interlex page.

  3. In the CONP portal display for each dataset:
    a) show interlex links from each relevant types entry (should be transparent with the proposed design)?
    [ types entries currently link to all datasets containing that entry ]
    b) use interlex links as cross references to look up appropriate tools for this dataset (how to present this TBD?)

  4. A once-off curation pass through all our existing DATS.json files to update types entries with relevant interlex links. This will involve enough harmonisation (e.g. consistent capitalisation, confirming that acronyms and full names refer to the same entity, splitting the occasional entity in which multiple types have been entered into separate entities) that I think it needs to be a manual exercise. Having prepared the abovementioned lookup table, I am in a good position to go ahead with this.
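
As an illustration of the mechanical part of the curation pass in item 4, here is a minimal Python sketch that adds `valueIRI` to `types` entries from a lookup table. The `LOOKUP` dictionary, its keys, and the helper names are hypothetical, and the IRIs are the same placeholders used in the example above; the real lookup table may be structured differently, and ambiguous terms would still need manual review.

```python
import json

# Hypothetical lookup table: normalised type value -> InterLex IRI.
# The IRIs are placeholders, not real InterLex entries.
LOOKUP = {
    "mri": "http://uri.interlex.org/<link1>",
    "eeg": "http://uri.interlex.org/<link2>",
}


def normalise(value):
    """Collapse whitespace and case so spelling variants share one key."""
    return " ".join(value.split()).lower()


def annotate_types(dats, lookup):
    """Add valueIRI to each types entry with a known match; return count added."""
    added = 0
    for entry in dats.get("types", []):
        info = entry.get("information", {})
        iri = lookup.get(normalise(info.get("value", "")))
        # Leave entries alone if they already carry a link or have no match.
        if iri and "valueIRI" not in info:
            info["valueIRI"] = iri
            added += 1
    return added


# Small in-memory stand-in for the contents of a DATS.json file.
dats = {"types": [{"information": {"value": "MRI"}},
                  {"information": {"value": " eeg "}},
                  {"information": {"value": "unmatched term"}}]}
count = annotate_types(dats, LOOKUP)
print(json.dumps(dats, indent=2))
```

Entries with no match are left exactly as they were, which corresponds to the current default behaviour described in 1c).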

An alternative to 1b) above would be for the validation process to look up every submitted term that does not have an interlex link and see whether one can be found; any new links or lookup-table entries derived automatically should receive manual confirmation. Either way, the updated lookup table will then need to be made available to CBRAIN.
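
The validation step in item 2, combined with the alternative above, could look roughly like the following Python sketch. It only checks that a link is well formed and collects the terms that have no link, so they can be queued for lookup and manual confirmation. The function name and return shape are assumptions, and confirming that a link leads to a valid interlex page would need an HTTP request that is deliberately left out here.

```python
from urllib.parse import urlparse

# Host taken from the example IRIs in this proposal.
INTERLEX_HOST = "uri.interlex.org"


def check_types(dats):
    """Return (malformed, unlinked) lists of type values for one DATS record."""
    malformed, unlinked = [], []
    for entry in dats.get("types", []):
        info = entry.get("information", {})
        value = info.get("value", "")
        iri = info.get("valueIRI")
        if iri is None:
            # No link: candidate for automatic lookup plus manual confirmation.
            unlinked.append(value)
            continue
        if not isinstance(iri, str):
            malformed.append(value)
            continue
        parsed = urlparse(iri)
        well_formed = (parsed.scheme in ("http", "https")
                       and parsed.netloc == INTERLEX_HOST
                       and parsed.path.strip("/") != "")
        if not well_formed:
            malformed.append(value)
    return malformed, unlinked


example = {"types": [
    {"information": {"value": "a", "valueIRI": "http://uri.interlex.org/<link1>"}},
    {"information": {"value": "b", "valueIRI": "uri.interlex.org/missing-scheme"}},
    {"information": {"value": "c"}},
]}
malformed, unlinked = check_types(example)
print(malformed, unlinked)
```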

(Where Interlex is mentioned in the text above, links to other controlled vocabularies could be used in the same general structure if there were demand.)
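
For the cross-referencing in 3b), one possible approach is to match a dataset's `valueIRI` values against the IRIs annotated on each tool. The Python sketch below assumes a hypothetical `tool_annotations` mapping from tool name to a list of IRIs; how CBRAIN actually stores or exposes its annotations is not specified in this proposal, and the tool names are invented for illustration.

```python
def iris_of(types):
    """Collect the valueIRI strings present in a list of types entries."""
    return {t["information"]["valueIRI"] for t in types
            if "valueIRI" in t.get("information", {})}


def tools_for_dataset(dataset_types, tool_annotations):
    """Return sorted names of tools sharing at least one IRI with the dataset."""
    wanted = iris_of(dataset_types)
    return sorted(name for name, iris in tool_annotations.items()
                  if wanted & set(iris))


# Hypothetical data: the IRIs are the placeholders used in this proposal.
dataset_types = [
    {"information": {"value": "value1",
                     "valueIRI": "http://uri.interlex.org/<link1>"}},
    {"information": {"value": "free-text term"}},
]
tool_annotations = {
    "toolA": ["http://uri.interlex.org/<link1>"],
    "toolB": ["http://uri.interlex.org/<link2>"],
}
matches = tools_for_dataset(dataset_types, tool_annotations)
print(matches)
```

Matching on the IRI rather than on the free-text value means the result does not depend on capitalisation or acronym choices, and the same structure would work for links to other controlled vocabularies.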

Documentation will also need to be updated, precise details TBD as and when the implementation decisions have been made.

@emmetaobrien emmetaobrien added the enhancement New feature or request label Mar 22, 2024
@emmetaobrien emmetaobrien changed the title from "Revised proposal for interlex link annotation in CONP" to "Revised proposal for Interlex-based cross-references with CBRAIN" Mar 22, 2024
@emmetaobrien emmetaobrien self-assigned this Mar 25, 2024
@carona898
Contributor

I think this proposal looks great. The implementation makes sense to me. I think I prefer the alternative for 1b) at the end of the proposal because it makes sense to me to kind of merge 1b) and 1c) together. Also, I think it would be a bit much to offer the user the option to select from all the other InterLex terms that we don't already have. (It might be a pretty huge list?)

@emmetaobrien
Collaborator Author

Rather than a pull-down list of all the terms, which would indeed be huge and unwieldy, I was thinking of something reasonably prominent pointing the user to Interlex's own term search interface.

@emmetaobrien
Collaborator Author

emmetaobrien commented Mar 28, 2024

I have now completed adding valueIRI to DATS.json files in forks of all CONP datasets, according to the lookup table at https://github.com/CONP-PCNO/conp-documentation/blob/master/Developers-Notes/types_interlex_lookup.txt , and can incorporate those into the live version whenever that is agreed on.

2 participants