Documentation: create a specifications document on how to write a MIxS LinkML slot #944
I skimmed this and it looks like a fantastic starting point. I didn't see anything that I disagree with yet, and I'm sure we can add more over time. So I will read it again more carefully and am looking forward to advocating for it to be merged in! We should see if anything has come out of @only1chunts's related efforts about defining or clarifying the role of the different LinkML metaslots for MIxS terms/slots. @sierra-moxon and I have been talking about refining the definitions of LinkML metaslots, and this may serve as a contribution towards that.
nice :D I look forward to @only1chunts's thoughts, and if mostly happy we can move to a bigger discussion in one of the TWG/CIG meetings?
…-standards-consortium-mixs into docs-slot-specifications
Mostly updated based on your feedback @only1chunts, main outstanding one is about the mapping table
big positive step forward! Thanks @jfy133
src/docs/slot_specifications.md
Outdated
| Term | `slot` | A single discrete bit of information (metadata) that has various attributes on how this information should be represented and formatted |
| Item | `title` | A short human readable name for the metadata term/slot |
| MIXS ID | `slot_uri` | |
| Definition | `description` | A detailed human-readable explanation of what information the metadata term/slot should be holding |
We need description-writing guidelines. I recommend "Guidelines for writing definitions in ontologies" by Seppälä, Ruttenberg and Smith.
We would need agreement on this from the wider TWG/CIG groups
src/docs/slot_specifications.md
Outdated

The description SHOULD NOT include basic examples of the data the term is intended to hold (this is covered by the `examples` attribute).

The description MAY include examples when the information for the term requires different formatting depending on certain conditions. The description MAY also include examples when it requires additional understanding that cannot be inferred by looking purely at the `examples` section.
examples please
I would need suggestions from the CIG as to what good examples of this would be...

### 8.1 Number of keywords

All term (slots) MUST have at least one keyword.
We need to discuss alignment and maintenance of keywords
Personally I dislike keywords in general as they are by definition redundant, i.e. if it's a key feature of the term then it should be in the name/title and/or description. They are not useful for grouping things because they are uncontrolled free text, they are subjective based on the opinions of the person adding them, and they are not specified to any particular level of granularity. But I'll step off my soap box now. Rather than saying MUST, I would suggest we use COULD.
I honestly agree, I never saw the purpose of them given there are sections/subsets already...
I could remove this as a specification for now (leaving it entirely optional) unless @turbomam gives a useful use case and a way to standardise (in which case we keep it in, and bring it up for discussion with CIG/TWG).
src/docs/slot_specifications.md
Outdated

### 9.2 MIXS ID format

The MIXS ID (slot_uri) must begin with the string `MIXS`, a colon, and followed by a 7 digit number.
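
As a rough illustration of the format rule quoted above, the check could be sketched in Python as below. This is only a sketch under assumptions: the function name `is_valid_mixs_id` and the example ID `MIXS:0000001` are made up for illustration and are not taken from the schema or from any existing MIxS tooling.

```python
import re

# The rule as quoted: the string "MIXS", a colon, then exactly seven digits.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
MIXS_ID_PATTERN = re.compile(r"MIXS:[0-9]{7}")


def is_valid_mixs_id(slot_uri: str) -> bool:
    """Return True if slot_uri matches the quoted MIXS ID format (hypothetical helper)."""
    return MIXS_ID_PATTERN.fullmatch(slot_uri) is not None


# Hypothetical IDs, for illustration only.
print(is_valid_mixs_id("MIXS:0000001"))  # well-formed
print(is_valid_mixs_id("MIXS:123"))      # too few digits
```

A check like this only covers the format. It says nothing about who assigns the number or when, which is the open question in this thread.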
We should discuss who assigns these and when. Ideally, there would be no system of record other than the schema file, so there could never be any conflicts.
@lschriml said at the GSC25 MIxS working day that currently this is specified just prior to release, after a feature freeze, for all new terms.
She was interested in automating this though!
Can we also take this opportunity to encourage/require the use of term creator, maintainer, last edited date metaslots etc?
@turbomam @only1chunts I think I've addressed all comments now, please have a look when you have a chance! When I compare against https://www.gensc.org/pages/standards-intro.html I think I've hit everything.
I'm not sure how expected value fits in or if it's necessary, or whether what I've written in the preferred unit bit is valid.
I definitely think we should, but it needs agreement from CIG/board etc. I think...
src/docs/slot_specifications.md
Outdated

### 2.1 Term name length

The term (slot) name must be a maximum of 20 characters in length as per INSDC guidelines ([https://www.insdc.org/submitting-standards/feature-table/#3.1](https://www.insdc.org/submitting-standards/feature-table/#3.1)).
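
For illustration, the length rule quoted above amounts to a one-line check. The sketch below assumes the limit is counted in characters of the slot name as written in the schema. The helper name `name_within_limit` is hypothetical, and `abs_air_humidity` is used only because it appears elsewhere in this conversation.

```python
# Maximum name length, taken from the rule quoted above.
MAX_NAME_LENGTH = 20


def name_within_limit(name: str, limit: int = MAX_NAME_LENGTH) -> bool:
    """Return True if the name is at most `limit` characters long (hypothetical helper)."""
    return len(name) <= limit


print(name_within_limit("abs_air_humidity"))  # 16 characters, within the limit
print(name_within_limit("a" * 21))            # 21 characters, over the limit
```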
Replace "term (slot) name" with "structured comment name (name)". This should be done for heading 2.2 (NB that number needs to be corrected too, it's currently 2.1), 2.4, 2.5, 2.6 and 2.7, and in other places in this section's text.
I've updated, but I would like to lightly protest this on my own soapbox: what is a 'structured comment' even?! A comment to me refers to additional, possibly opinionated information... I wouldn't use 'comment' to refer to such a fundamental thing as a name... I think that's why I went simply for 'Term name', because I'm very worried a reader not particularly familiar with MIxS will not know what we are talking about (in case they skipped the mapping table).
Maybe we should bring up using the alias "short name" in place of "structured comment name" with CIG? I don't think we can entirely deprecate the use of "structured comment name", but we may be able to promote the use of "short name" in most instances.
Short name would definitely be better, but I think `key name` would be a more precise word for it... it's the 'computerised' version to refer to that 'item'(?)
It's not really meant to be an 'alias' for humans.
src/docs/slot_specifications.md
Outdated

## 4. Term data types

### 4.1 Term data types must be valid LinkML types
I guess there is no direct MIxS equivalent to this? But I think it should be included in the preamble terminology table, in the 'range' row. It just sort of jumped out of the blue here as a new concept. Maybe we want to change header 4.1 to "The data types of a range must be valid LinkML types".
I'm not sure now. Your comment above seemed to imply that `expected value` is the same as `range`, just that `range` is much more restricted in the different types? Or is it not equivalent then?
### 10.2 Specifying units

Terms (slots) that require the use of a measurement unit SHOULD specify the types of units through a dedicated structured string pattern component.
I'm not sure what this means? I thought we were encouraging the use of the 'preferred_unit:' linkml field for the unit requirements, which is also a MIxS word and should be included in the table at the top.
Is `preferred_unit` a LinkML thing? It's not specified anywhere in the existing mixs.yaml 🤔
I also don't find a reference to it in the LinkML docs: https://linkml.io/linkml/search.html?q=preferred_unit&check_keywords=yes&area=default
If you search for the word "preferred_unit" in the mixs.yaml file you will see 238 instances of it. If you look in the old v6 Excel spreadsheet there is a column for "Preferred unit". I don't know if it's from off-the-shelf LinkML or something that's been added specifically for our use case, but it's definitely there.
Scratch that: it's spelt `Preferred_unit` (why the capital!?)
Updated, but I find it sort of unsatisfying. Also, the way the preferred units have been specified is very messy, with some having loads of different 'preferred' units... which sort of defeats the point, e.g.:

    abs_air_humidity:
      annotations:
        Preferred_unit: gram per gram, kilogram per kilogram, kilogram, pound, gram per cubic meter, kilogram per cubic meter, percent
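
To make the "loads of different preferred units" point concrete, here is a small Python sketch that splits the annotation value from the example above and counts the entries. It assumes the annotation is a single comma-separated string, as in the snippet, and that the slot has already been loaded into a plain dict. The helper name `split_preferred_units` is hypothetical and not part of LinkML or the MIxS tooling.

```python
def split_preferred_units(slot: dict) -> list:
    """Split a comma-separated Preferred_unit annotation into a list (hypothetical helper)."""
    raw = slot.get("annotations", {}).get("Preferred_unit", "")
    # Drop surrounding whitespace and ignore empty fragments.
    return [unit.strip() for unit in raw.split(",") if unit.strip()]


# The slot from the example above, written as a plain dict.
abs_air_humidity = {
    "annotations": {
        "Preferred_unit": (
            "gram per gram, kilogram per kilogram, kilogram, pound, "
            "gram per cubic meter, kilogram per cubic meter, percent"
        )
    }
}

units = split_preferred_units(abs_air_humidity)
print(len(units))  # 7
print(units)
```

Counting entries this way would be one possible way to flag slots where 'preferred' has stopped meaning a single unit.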
- [`keywords`](https://linkml.io/linkml-model/latest/docs/keywords/).
- [`slot_uri`](https://linkml.io/linkml-model/latest/docs/slot_uri/).
- [`range`](https://linkml.io/linkml/schemas/slots.html#ranges).
Two points on section 4.1:
a. Should the structured comment name be included in the list of required attributes? Technically I guess it's not an attribute in LinkML speak because it's the slot identifier, but it is required.
b. For completeness, can we include the MIxS terminology in the list as well?
This is a natural extension of PR #943 .
Instead of providing examples to slots that can be used as templates by newcomers for writing/preparing new slots, this is meant to act as a precise and exact reference (as far as possible) of exactly how a slot should be designed.
I have based the structure (e.g. with numbering, which could likely be automated by a website rendering engine instead of being defined manually) on another bioinformatics community project I am heavily involved in (example).
This is not yet finished, and will likely need large community input. However, I place this here to kick-start a conversation.
I will write based on my impression of the MIxS LinkML schema.
Warning
This page is entirely based on the experiences of a novice user, and will likely require heavy editing by experts