
@jfy133 (Collaborator) commented Apr 28, 2025

This is a natural extension of PR #943.

Instead of providing example slots that newcomers can use as templates when writing or preparing new slots, this is meant to act as a precise reference (as far as possible) for exactly how a slot should be designed.

I have based the structure (e.g. the numbering, which could likely be generated automatically by a website rendering engine rather than defined manually) on another bioinformatics community project I am heavily involved in (example).

This is not yet finished and will likely need substantial community input, but I am placing it here to kick-start a conversation.

I will write based on my impression of the MIxS LinkML schema.

Warning

This page is entirely based on the experiences of a novice user, and will likely require heavy editing by experts.

jfy133 added a commit to jfy133/genomics-standards-consortium-mixs that referenced this pull request Apr 28, 2025
@jfy133 marked this pull request as ready for review May 5, 2025 19:37
@turbomam (Member) commented May 6, 2025

I skimmed this and it looks like a fantastic starting point. I didn't see anything that I disagree with yet and I'm sure we can add more over time. So I will read it again more carefully and am looking forward to advocating for it to be merged in!

We should see if anything has come out of @only1chunts's related efforts on defining or clarifying the role of the different LinkML metaslots for MIxS terms/slots.

@sierra-moxon and I have been talking about refining the definitions of LinkML metaslots, and this may serve as a contribution towards that.

@jfy133 (Collaborator Author) commented May 7, 2025

Nice :D I look forward to @only1chunts's thoughts, and if they're mostly happy we can move to a bigger discussion in one of the TWG/CIG meetings?

@jfy133 (Collaborator Author) commented Jun 20, 2025

Mostly updated based on your feedback @only1chunts; the main outstanding one is about the mapping table.

@jfy133 changed the title from "Create a specifications document on how to write a MIxS LinkML slot" to "Documentation: create a specifications document on how to write a MIxS LinkML slot" on Jul 14, 2025
@turbomam (Member) left a comment

big positive step forward! Thanks @jfy133

| Term | `slot` | A single discrete bit of information (metadata) that has various attributes on how this information should be represented and formatted |
| Item | `title` | A short human readable name for the metadata term/slot |
| MIXS ID | `slot_uri` | |
| Definition | `description` | A detailed human-readable explanation of what information the metadata term/slot should be holding |
Member

We need description-writing guidelines. I recommend "Guidelines for writing definitions in ontologies" by Seppälä, Ruttenberg and Smith.

@jfy133 (Collaborator Author)

We would need agreement on this from the wider TWG/CIG groups.


The description SHOULD NOT include basic examples of the data the term is intended to hold (this is covered by the `examples` attribute).

The description MAY include examples when the information for the term requires different formatting depending on certain conditions. The description MAY also include examples when it requires additional understanding that cannot be inferred by looking purely at the `examples` section.
Member

examples please

@jfy133 (Collaborator Author)

I would need suggestions from the CIG as to what good examples of this are...


### 8.1 Number of keywords

All terms (slots) MUST have at least one keyword.
Member

We need to discuss alignment and maintenance of keywords

@only1chunts (Member), Aug 12, 2025

Personally I dislike keywords in general as they are by definition redundant, i.e. if it's a key feature of the term then it should be in the name/title and/or description. They are not useful for grouping things because they are uncontrolled free text, they are subjective (based on the opinions of the person adding them), and they are not specified to any particular level of granularity. But I'll step off my soapbox now: rather than saying MUST, I would suggest we use COULD.

@jfy133 (Collaborator Author)

I honestly agree; I never saw the purpose of them given that sections/subsets already exist...

I could remove this from the specification for now (leaving keywords entirely optional), unless @turbomam gives a useful use case and a way to standardise them (in which case we keep it in and bring it up for discussion with the CIG/TWG).


### 9.2 MIXS ID format

The MIXS ID (slot_uri) must begin with the string `MIXS`, followed by a colon and a 7-digit number.
Member

We should discuss who assigns these and when. Ideally, there would be no system of record other than the schema file, so there could never be any conflicts.

@jfy133 (Collaborator Author)

@lschriml said at the GSC25 MIxS working day that these IDs are currently assigned just prior to a release, after a feature freeze for all new terms.

She was interested in automating this, though!
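Since automation came up, the following is a minimal sketch of how the 9.2 format rule and the "no conflicts" concern could be checked in CI. It assumes the slots have already been loaded into a plain Python dict of `{slot_name: attributes}` (e.g. from mixs.yaml with a YAML parser, not shown). The slot names and IDs in the demo are hypothetical placeholders, not real MIxS terms:

```python
import re

# Section 9.2: the literal prefix "MIXS", a colon, then exactly seven digits.
# [0-9] is used instead of \d so that non-ASCII digits are rejected.
MIXS_ID_PATTERN = re.compile(r"MIXS:[0-9]{7}")


def is_valid_mixs_id(slot_uri: str) -> bool:
    """Return True if the whole slot_uri string matches the 9.2 format."""
    return MIXS_ID_PATTERN.fullmatch(slot_uri) is not None


def find_duplicate_ids(slots: dict) -> dict:
    """Map each slot_uri used by more than one slot to the slot names using it.

    `slots` is assumed to look like {slot_name: {"slot_uri": ..., ...}}.
    """
    by_uri: dict = {}
    for name, attrs in slots.items():
        uri = attrs.get("slot_uri")
        if uri is not None:
            by_uri.setdefault(uri, []).append(name)
    return {uri: names for uri, names in by_uri.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical slot names and placeholder IDs, for illustration only.
    example_slots = {
        "slot_a": {"slot_uri": "MIXS:0000001"},
        "slot_b": {"slot_uri": "MIXS:0000001"},
        "slot_c": {"slot_uri": "MIXS:123"},
    }
    for name, attrs in example_slots.items():
        print(name, is_valid_mixs_id(attrs["slot_uri"]))
    print(find_duplicate_ids(example_slots))
```

Running both checks over the schema file alone fits the idea that the schema is the only system of record: a duplicate or malformed ID would fail the check before release.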

@turbomam (Member)

Can we also take this opportunity to encourage/require the use of term creator, maintainer, last edited date metaslots, etc.?

@jfy133 requested review from only1chunts and turbomam August 11, 2025 20:08
@jfy133 (Collaborator Author) commented Aug 11, 2025

@turbomam @only1chunts I think I've addressed all comments now, please have a look when you have a chance!

When I compare against https://www.gensc.org/pages/standards-intro.html, I think I've hit everything:

| MIxS attribute | Covered | Spec section |
|---|---|---|
| Structured comment name | YES | Term structured naming |
| Item (rdfs:label) | YES | Term item title attribute |
| Definition | YES | Term description |
| Expected value | NO | |
| Value syntax | YES | Slot range attribute |
| Example | YES | Term examples attribute |
| Section | YES | Term section attribute |
| Preferred unit | PARTIAL | Slot range attribute – Specifying units |
| Occurrence | YES | Multiple occurrence |
| MIXS ID | YES | Term MIxS ID attribute |

I'm not sure how expected value fits in (or whether it's necessary), nor whether what I've written for the preferred unit is valid.

@jfy133 (Collaborator Author) commented Aug 11, 2025

> Can we also take this opportunity to encourage/require the use of term creator, maintainer, last edited date metaslots etc?

* https://linkml.io/linkml-model/latest/docs/contributors/

* https://linkml.io/linkml-model/latest/docs/modified_by/

* https://linkml.io/linkml-model/latest/docs/created_on/

* https://linkml.io/linkml-model/latest/docs/last_updated_on/

I definitely think we should, but I think it needs agreement from the CIG/board etc.


### 2.1 Term name length

The term (slot) name must be a maximum of 20 characters in length as per INSDC guidelines ([https://www.insdc.org/submitting-standards/feature-table/#3.1](https://www.insdc.org/submitting-standards/feature-table/#3.1)).
@only1chunts (Member), Aug 12, 2025

Replace "term (slot) name" with "structured comment name (name)". This should be done for heading 2.2 (NB: that number needs to be corrected too, it's currently 2.1), 2.4, 2.5, 2.6 and 2.7, and in other places in this section's text.

@jfy133 (Collaborator Author)

I've updated it, but I would like to lightly protest this from my own soapbox: what is a 'structured comment' even?! To me, a comment refers to additional, possibly opinionated information... I wouldn't use 'comment' to refer to something as fundamental as a name. I think that's why I simply went for 'Term name': I'm very worried that a reader not particularly familiar with MIxS will not know what we are talking about (in case they skipped the mapping table).

Member

Maybe we should bring up using the alias "short name" in place of "structured comment name" with the CIG? I don't think we can entirely deprecate "structured comment name", but we may be able to promote the use of "short name" in most instances.

@jfy133 (Collaborator Author)

Short name would definitely be better, but I think 'key name' would be more precise... it's the 'computerised' way to refer to that 'item'(?)

It's not really meant to be an 'alias' for humans.
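Whatever the attribute ends up being called, the length rule in 2.1 is easy to check mechanically. The following is a minimal sketch that assumes the names are available as plain strings and counts characters as Python `len()` does. `abs_air_humidity` comes from the mixs.yaml excerpt quoted later in this thread. The second name is made up to trigger the check:

```python
# Section 2.1: maximum name length, per the INSDC feature-table guidance cited there.
MAX_NAME_LENGTH = 20


def names_too_long(slot_names, max_length: int = MAX_NAME_LENGTH) -> list:
    """Return the slot names longer than max_length characters, in input order."""
    return [name for name in slot_names if len(name) > max_length]


if __name__ == "__main__":
    print(names_too_long(["abs_air_humidity", "hypothetical_very_long_slot_name"]))
    # -> ['hypothetical_very_long_slot_name']
```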


## 4. Term data types

### 4.1 Term data types must be valid LinkML types
Member

I guess there is no direct MIxS equivalent to this? But I think it should be included in the terminology table in the preamble, in the 'range' row. It just sort of jumped out of the blue here as a new concept. Maybe we want to change the header 4.1 to "The data types of a range must be valid LinkML types".

@jfy133 (Collaborator Author)

I'm not sure now: your comment above seemed to imply that expected value is the same as range, just that range is much more restricted in the different types? Or are they not equivalent after all?
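The following is a rough sketch of what rule 4.1 could mean in practice. It assumes a slot's `range` must be either a LinkML built-in type or a class/enum/type defined in the MIxS schema itself, because LinkML ranges can point at any of these. The built-in list below is assumed to match the types in linkml-model's `types.yaml` and should be checked against the version the schema imports. `HypotheticalEnum` is an invented name:

```python
# Assumed list of LinkML built-in types (linkml:types); verify against the
# linkml-model version actually imported by the schema.
LINKML_BUILTIN_TYPES = frozenset({
    "string", "integer", "boolean", "float", "double", "decimal",
    "time", "date", "datetime", "date_or_datetime",
    "uriorcurie", "curie", "uri", "ncname",
    "objectidentifier", "nodeidentifier",
    "jsonpointer", "jsonpath", "sparqlpath",
})


def invalid_ranges(slots: dict, schema_defined=()) -> dict:
    """Return {slot_name: range} for each range that is neither a built-in
    LinkML type nor a class/enum/type name defined in the schema itself.

    Slots without an explicit `range` are skipped (LinkML would fall back to
    the schema's default_range for those).
    """
    allowed = LINKML_BUILTIN_TYPES | frozenset(schema_defined)
    return {
        name: attrs["range"]
        for name, attrs in slots.items()
        if "range" in attrs and attrs["range"] not in allowed
    }


if __name__ == "__main__":
    # Hypothetical slots: "text" is not a LinkML type, so slot_c is reported.
    slots = {
        "slot_a": {"range": "float"},
        "slot_b": {"range": "HypotheticalEnum"},
        "slot_c": {"range": "text"},
    }
    print(invalid_ranges(slots, schema_defined={"HypotheticalEnum"}))
    # -> {'slot_c': 'text'}
```

This is one reading of how "expected value" and `range` relate: `range` only says *what kind* of value is allowed, so any extra formatting would still need a pattern or the description.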

### 10.2 Specifying units

Terms (slots) that require the use of a measurement unit SHOULD specify the types of units through a dedicated structured string pattern component.

Member

I'm not sure what this means. I thought we were encouraging the use of the 'preferred_unit:' LinkML field for the unit requirements, which is also a MIxS attribute and should be included in the table at the top.

@jfy133 (Collaborator Author)

Is preferred_unit a LinkML thing? It's not specified anywhere in the existing mixs.yaml 🤔

I also can't find a reference to it in the LinkML docs: https://linkml.io/linkml/search.html?q=preferred_unit&check_keywords=yes&area=default

Member

If you search for the word "preferred_unit" in the mixs.yaml file you will see 238 instances of it. If you look in the old v6 Excel spreadsheet there is a column for "Preferred unit". I don't know if it's from off-the-shelf LinkML or something that's been added specifically for our use case, but it's definitely there.

@jfy133 (Collaborator Author), Aug 12, 2025

Yes, I'm aware of it in the MIxS Excel spreadsheet/original MIxS, but maybe I'm being crazy: I'm not finding it in the schema...


@jfy133 (Collaborator Author)

Scratch that: it's spelt Preferred_unit (why the capital!?)

@jfy133 (Collaborator Author)

Updated, but I find it sort of unsatisfying (also, the way the preferred units have been specified is very messy, with some terms having loads of different 'preferred' units, which sort of defeats the point).

e.g.

```yaml
  abs_air_humidity:
    annotations:
      Preferred_unit: gram per gram, kilogram per kilogram, kilogram, pound, gram per cubic meter, kilogram per cubic meter, percent
```
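To quantify how messy this is, one could split each `Preferred_unit` annotation and list the slots that have more than one "preferred" unit. The following is a minimal sketch. It assumes the slots are already loaded into a dict. It also assumes the annotation is either the compact string form shown above or LinkML's expanded `{tag: ..., value: ...}` form. Finally, it assumes that unit names never contain commas themselves. Only the demo value comes from the snippet above:

```python
def parse_preferred_units(slot_attrs: dict) -> list:
    """Split a slot's Preferred_unit annotation into a list of unit names.

    Handles the compact form (`Preferred_unit: "a, b"`) and, as an assumption,
    the expanded annotation form (`Preferred_unit: {tag: ..., value: "a, b"}`).
    """
    raw = slot_attrs.get("annotations", {}).get("Preferred_unit")
    if isinstance(raw, dict):
        raw = raw.get("value")
    if not raw:
        return []
    # Assumes units are comma-separated and contain no commas themselves.
    return [unit.strip() for unit in str(raw).split(",") if unit.strip()]


def slots_with_several_units(slots: dict) -> dict:
    """Return {slot_name: units} for slots listing more than one preferred unit."""
    report = {}
    for name, attrs in slots.items():
        units = parse_preferred_units(attrs)
        if len(units) > 1:
            report[name] = units
    return report


if __name__ == "__main__":
    # Annotation value copied from the abs_air_humidity excerpt above.
    slots = {
        "abs_air_humidity": {
            "annotations": {
                "Preferred_unit": (
                    "gram per gram, kilogram per kilogram, kilogram, pound, "
                    "gram per cubic meter, kilogram per cubic meter, percent"
                )
            }
        }
    }
    for name, units in slots_with_several_units(slots).items():
        print(name, len(units), units)
```

For `abs_air_humidity`, this reports seven units.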

- [`keywords`](https://linkml.io/linkml-model/latest/docs/keywords/).
- [`slot_uri`](https://linkml.io/linkml-model/latest/docs/slot_uri/).
- [`range`](https://linkml.io/linkml/schemas/slots.html#ranges).
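The required-attribute list above could be enforced with a simple presence check. The following is a minimal sketch that assumes slots are loaded as `{slot_name: attributes}` dicts. The required tuple holds only the three attributes visible in this excerpt of the draft, so the full list from the specification would need to be substituted. Empty values count as missing, so `keywords: []` would also fail the "at least one keyword" rule in 8.1 if that stays a MUST. The structured comment name is the dict key here, so it is always present by construction. Slot names in the demo are hypothetical:

```python
# Only the attributes visible in this excerpt of the draft's list;
# replace with the full list from the specification.
REQUIRED_ATTRIBUTES = ("keywords", "slot_uri", "range")


def missing_required(slots: dict, required=REQUIRED_ATTRIBUTES) -> dict:
    """Return {slot_name: [missing attributes]} for slots lacking any required
    attribute. Absent keys and empty values (e.g. `keywords: []`) both count
    as missing.
    """
    report = {}
    for name, attrs in slots.items():
        missing = [attr for attr in required if not attrs.get(attr)]
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    # Hypothetical slots: slot_b has an empty keyword list and no slot_uri.
    slots = {
        "slot_a": {"keywords": ["humidity"], "slot_uri": "MIXS:0000001", "range": "string"},
        "slot_b": {"keywords": [], "range": "string"},
    }
    print(missing_required(slots))
    # -> {'slot_b': ['keywords', 'slot_uri']}
```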

Member

Two points on section 4.1:
a. Should the structured comment name be included in the list of required attributes? Technically I guess it's not an attribute in LinkML speak because it's the slot identifier, but it is required.
b. For completeness, can we include the MIxS terminology in the list as well?
